Why do many websites not send the “Last-Modified” header?



























I am developing a Python program to scrape data from the web. When requesting pages, I checked the response headers: they contained other fields such as Server, Via, and Date, but most of the responses did not contain a "Last-Modified" field. What is the reason behind this?










      python webserver http website headers






      asked Dec 31 '18 at 16:25 by Himanshu Poddar (edited Dec 31 '18 at 16:48)






















          1 Answer














          Probably for this reason:




          If you remove the Last-Modified and ETag headers, you will totally eliminate If-Modified-Since and If-None-Match requests and their 304 Not Modified responses, so a file will stay cached without checking for updates until the Expires header indicates that new content is available!



          By removing both the ETag and Last-Modified headers from your static files (images, JavaScript, CSS), browsers and caches will not be able to validate the cached version of the file against the real version. By also including a Cache-Control header and an Expires header, you can specify that certain files be cached for a certain period of time, and you magically (this is a really unique trick, I promise) eliminate any validation requests!
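To make the quoted mechanism concrete, here is a minimal Python sketch of how a client could build a validation request from a stored response. The function name `conditional_headers` and the sample header values are made up for illustration; real browsers and proxies do considerably more than this.

```python
def conditional_headers(cached_headers):
    """Build revalidation request headers from a stored response's headers."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    stored = {name.lower(): value for name, value in cached_headers.items()}
    request_headers = {}
    if "last-modified" in stored:
        # The stored date is echoed back; the server replies 304 if unchanged.
        request_headers["If-Modified-Since"] = stored["last-modified"]
    if "etag" in stored:
        # The stored entity tag is echoed back in the same way.
        request_headers["If-None-Match"] = stored["etag"]
    return request_headers


# Hypothetical response headers, for illustration only.
with_validators = {
    "Last-Modified": "Mon, 31 Dec 2018 16:00:00 GMT",
    "ETag": '"abc123"',
}
without_validators = {
    "Cache-Control": "max-age=86400",
    "Expires": "Tue, 01 Jan 2019 16:00:00 GMT",
}

print(conditional_headers(with_validators))
print(conditional_headers(without_validators))  # empty dict: nothing to validate with
```

When neither validator was sent, the returned dictionary is empty, which is why such responses can only be reused until they expire and then fetched again in full.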




          The ETag header is just a unique code (typically a hash) that a browser can check to see if a resource has changed.
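As a rough illustration of the "typically a hash" idea, the sketch below derives a tag from the response body with SHA-256. The helper `make_etag` is hypothetical; actual servers choose their own scheme, and some derive the tag from file size and modification time rather than from the content.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Return a quoted entity tag derived from the body (illustrative scheme)."""
    # Truncating the digest keeps the header short; this is an arbitrary choice.
    digest = hashlib.sha256(body).hexdigest()[:16]
    return f'"{digest}"'


old = make_etag(b"<html>version 1</html>")
same = make_etag(b"<html>version 1</html>")
new = make_etag(b"<html>version 2</html>")

print(old == same)  # identical content gives an identical tag
print(old == new)   # changed content gives a different tag
```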



          So, by omitting both the Last-Modified and ETag headers, but including Expires and Cache-Control headers:




          • the browser will use its cached copy until the time given in the Expires header has passed

          • and it will also not send validation requests (these would be conditional GET requests carrying If-Modified-Since or If-None-Match, which the server answers with 304 Not Modified when nothing has changed) to check the modification state.


          Making browsers not send validation requests, but simply expire cached copies at a future date, cuts down on HTTP requests and improves web server performance, which matters for servers facing the Internet at large, where they are hit by bots, scrapers, and the like.
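The expiry-only behaviour described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the names `fresh_until` and `is_fresh` are invented here, only `max-age` and `Expires` are considered, and the heuristic freshness that real caches may apply is ignored.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def fresh_until(headers, received_at):
    """Return when a stored response stops being fresh, or None if unknown."""
    stored = {name.lower(): value for name, value in headers.items()}
    # Cache-Control: max-age takes precedence over Expires when both are present.
    match = re.search(r"\bmax-age=(\d+)", stored.get("cache-control", ""))
    if match:
        return received_at + timedelta(seconds=int(match.group(1)))
    if "expires" in stored:
        try:
            return parsedate_to_datetime(stored["expires"])
        except (TypeError, ValueError):
            # An unparseable Expires value is treated as already expired.
            return received_at
    return None


def is_fresh(headers, received_at, now):
    """True while the cached copy may be reused without contacting the server."""
    limit = fresh_until(headers, received_at)
    return limit is not None and now < limit


# Hypothetical timestamps and headers, for illustration only.
received = datetime(2018, 12, 31, 16, 0, tzinfo=timezone.utc)
headers = {"Cache-Control": "public, max-age=3600"}

print(is_fresh(headers, received, received + timedelta(minutes=30)))
print(is_fresh(headers, received, received + timedelta(hours=2)))
```

While `is_fresh` holds, no request is sent at all; once it stops holding and there is no validator, the only option left is a full re-download.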






























          • Then does this also mean that a web cache or proxy server will not work, as it won't be able to rely on Last-Modified (If-Modified-Since)?

            – Himanshu Poddar
            Dec 31 '18 at 19:13











          • It means that instead of reading the "Last-Modified" header each time to check whether it has changed, the cache simply waits until after the "Expires:" date to age out cached items. So it will still work, just in a different and more efficient way.

            – LawrenceC
            Dec 31 '18 at 19:14













          • As I am a beginner, it's still a little vague to me. Could you please be clearer or suggest some reading material?

            – Himanshu Poddar
            Dec 31 '18 at 19:21











          answered Dec 31 '18 at 16:40 by LawrenceC












