Why do many websites not send the “Last-Modified” header?
I am developing a Python program to scrape data from the web. When requesting pages, I checked the response headers: they contained fields such as Server, Via and Date, but most responses did not contain a "Last-Modified" field. What is the reason for this?
python webserver http website headers
edited Dec 31 '18 at 16:48
Himanshu Poddar
asked Dec 31 '18 at 16:25
Himanshu Poddar
1 Answer
Probably for this reason:
If you remove the Last-Modified and ETag headers, you eliminate If-Modified-Since and If-None-Match requests and their 304 Not Modified responses entirely, so a file stays cached, with no checks for updates, until the Expires header says the copy is stale.
By removing both the ETag and Last-Modified headers from your static files (images, JavaScript, CSS), browsers and caches can no longer validate their cached version of a file against the version on the server. By also including Cache-Control and Expires headers, you specify how long those files may be cached, and you eliminate validation requests for them.
The ETag header is an opaque identifier for one specific version of a resource (typically a hash of its content). A browser sends it back in If-None-Match so the server can tell whether the resource has changed.
So, by omitting Last-Modified and ETag, but including Expires and Cache-Control headers:
- the browser will use its cached copy until the time given in the Expires header has passed
- and it will not send validation requests to check the modification state (these are conditional GET requests carrying If-Modified-Since or If-None-Match, which the server answers with 304 Not Modified when nothing has changed).
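To make the expiry side concrete, here is a rough Python sketch of how a cache might decide whether a stored response can be reused without contacting the server. The function names `freshness_lifetime` and `is_fresh` are made up for illustration, and the logic is deliberately simplified: it only looks at Cache-Control max-age and Expires, and ignores things a real cache also handles, such as no-cache, no-store, the Age header and heuristic freshness.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return how long a stored response may be reused without any request.

    Simplified sketch: only Cache-Control max-age and Expires are considered.
    """
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}

    # max-age takes precedence over Expires when both are present.
    for directive in h.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip('"')
        if name.lower() == "max-age" and value.isdigit():
            return timedelta(seconds=int(value))

    if "expires" in h and "date" in h:
        try:
            lifetime = parsedate_to_datetime(h["expires"]) - parsedate_to_datetime(h["date"])
        except (TypeError, ValueError):
            # Unparseable dates (e.g. "Expires: 0") mean "already stale".
            return timedelta(0)
        return max(lifetime, timedelta(0))

    return timedelta(0)  # no explicit lifetime given


def is_fresh(headers, stored_at, now):
    """True while the cached copy can be served with no request at all."""
    return now - stored_at < freshness_lifetime(headers)


# Example: a response that expires one hour after it was generated.
example = {
    "Date": "Mon, 31 Dec 2018 16:40:00 GMT",
    "Expires": "Mon, 31 Dec 2018 17:40:00 GMT",
}
stored = datetime(2018, 12, 31, 16, 40, tzinfo=timezone.utc)
print(is_fresh(example, stored, stored + timedelta(minutes=30)))
print(is_fresh(example, stored, stored + timedelta(hours=2)))
```

While `is_fresh` is true, no request goes to the server at all. Once it turns false, a response without Last-Modified or ETag leaves nothing to validate against, so the client simply downloads the file again.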
Making browsers skip validation requests and simply expire cached copies at a future date cuts down on HTTP requests and improves web server performance, which matters for servers facing the Internet at large, where they get hit by bots, scrapers and the like.
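From the scraper's side, this is roughly what a validation request looks like, and why it cannot be made when both validators are missing. The sketch below uses only the standard library. `conditional_headers` and `fetch_if_changed` are hypothetical helper names, and `cached_headers` is assumed to be a dict of the headers saved from an earlier response to the same URL.

```python
import urllib.error
import urllib.request


def conditional_headers(cached_headers):
    """Build revalidation request headers from a previously stored response."""
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in cached_headers.items()}
    request_headers = {}
    if "etag" in h:
        request_headers["If-None-Match"] = h["etag"]
    if "last-modified" in h:
        request_headers["If-Modified-Since"] = h["last-modified"]
    # Empty dict: the server sent no validators, so only a full GET is possible.
    return request_headers


def fetch_if_changed(url, cached_headers):
    """Return (status, body); body is None when the server answers 304."""
    req = urllib.request.Request(url, headers=conditional_headers(cached_headers))
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 Not Modified as an HTTPError
            return 304, None
        raise


# With validators, the request becomes conditional.
print(conditional_headers({"ETag": '"abc123"', "Server": "nginx"}))
# Without them, there is nothing to send.
print(conditional_headers({"Server": "nginx", "Via": "1.1 varnish"}))
```

So when a site omits Last-Modified and ETag, a scraper cannot ask "has this changed?" cheaply. It has to download the page again and compare the content itself, for example by hashing the body.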
Does this also mean that a web cache or proxy server will not work, since it won't be able to check Last-Modified?
– Himanshu Poddar
Dec 31 '18 at 19:13
It means that instead of reading the "Last-Modified" header each time to check whether it's different, it will simply wait until after the "Expires:" date to age out cache items. So it'll work, just in a different and more efficient way.
– LawrenceC
Dec 31 '18 at 19:14
As I am a beginner, it's still a little vague to me. Can you please be more clear or suggest some reading material?
– Himanshu Poddar
Dec 31 '18 at 19:21
answered Dec 31 '18 at 16:40
LawrenceC