The birth of cookies
HTTP is a stateless protocol, but server-side services often need to keep state. Cookies were originally created to store such state on the client so that the server can use it, for example to tell whether a user is visiting the website for the first time. The current specification is RFC 6265, which both browsers and servers implement.
Cookie handling consists of three steps:
The server sends cookies to the client
The browser stores the cookies
On every subsequent HTTP request to a matching domain and path, the browser sends the cookies back to the server
Sending and parsing cookies on the server side
Sending cookies
The server sends cookies to the client in the Set-Cookie header of an HTTP response; each cookie the client should store is set in its own Set-Cookie header. The format is as follows:
Set-Cookie: name=value; domain=.domain.com; path=/; expires=Sat, 11 Jun 2016 11:29:42 GMT; HttpOnly; secure
Here name=value is required and the other attributes are optional. The main components of a cookie are:
name: a name that uniquely identifies the cookie. Treat cookie names as case-sensitive, since browsers and many servers do.
value: the string value stored in the cookie. It is best to URL-encode both the name and the value.
domain: the domain for which the cookie is valid; every request sent to that domain carries the cookie. The value can be a specific host (such as yq.aliyun.com) or a parent domain (such as .aliyun.com, which makes the cookie valid for aliyun.com and all of its subdomains). Under RFC 6265 a leading dot is ignored and a cookie with a domain attribute is also sent to that domain's subdomains; if the attribute is omitted, the cookie is sent only to the host that set it.
path: the path the cookie applies to. Within the matching domain, the browser sends the cookie only for requests whose path equals this value or lies beneath it; for example, path=/blog covers /blog and /blog/post but not /blogger.
expires: the expiration time, a timestamp indicating when the cookie should be deleted (that is, when the browser should stop sending it to the server). If neither expires nor max-age is set, the cookie is a session cookie and the browser deletes it when the browsing session ends (usually when the browser is closed, not when an individual page is closed). The value is an HTTP date in GMT. Because it is an absolute time, any difference between the client and server clocks makes expires inaccurate.
max-age: serves the same purpose as expires, but tells the browser how many seconds from now the cookie expires instead of giving a fixed point in time. If both are present, max-age takes precedence over expires.
HttpOnly: tells the browser not to expose the cookie to scripts; it does not appear in document.cookie and cannot be read or changed through it, but it is still sent with HTTP requests. Note that even though scripts cannot access it, the cookie is still stored as a file in the browser's profile directory. HttpOnly can only be set by the server: browsers ignore cookies that carry this attribute when they are written through document.cookie.
secure: a security flag; once it is set, the browser sends the cookie only over HTTPS (SSL/TLS) connections and never over plain HTTP. Even with secure set, anyone with access to your machine can still read the cookies stored locally, so do not put sensitive information in cookies; keep it on the server side.
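To make these attribute rules concrete, the sketch below parses a single Set-Cookie header value the way a client might. The function name parseSetCookie and the fields of the returned object are hypothetical, and the parsing is deliberately simpler than RFC 6265, section 5.2 (for example, it does not check the domain against the host that sent the header). It does follow the precedence rule described above: when both max-age and expires are present, max-age wins.

```javascript
// Hypothetical helper: split one Set-Cookie header value into its parts.
// Simplified: RFC 6265 (section 5.2) has more validation rules than this.
function parseSetCookie(header, now = Date.now()) {
  const [pair, ...attrs] = header.split(';').map((s) => s.trim());
  const eq = pair.indexOf('=');
  if (eq === -1) return null; // a cookie without "=" is ignored

  const cookie = {
    name: pair.slice(0, eq).trim(),
    value: pair.slice(eq + 1).trim(),
    domain: null,
    path: null,
    expiresAt: null, // null means a session cookie
    httpOnly: false,
    secure: false,
  };
  let expires = NaN;
  let maxAge = NaN;

  for (const attr of attrs) {
    const i = attr.indexOf('=');
    const key = (i === -1 ? attr : attr.slice(0, i)).trim().toLowerCase();
    const val = i === -1 ? '' : attr.slice(i + 1).trim();
    if (key === 'domain') cookie.domain = val.replace(/^\./, '').toLowerCase(); // leading dot ignored
    else if (key === 'path') cookie.path = val;
    else if (key === 'expires') expires = Date.parse(val);
    else if (key === 'max-age') maxAge = parseInt(val, 10);
    else if (key === 'httponly') cookie.httpOnly = true;
    else if (key === 'secure') cookie.secure = true;
  }

  // Max-Age takes precedence over Expires; zero or negative means "expire now".
  if (!Number.isNaN(maxAge)) {
    cookie.expiresAt = maxAge <= 0 ? 0 : now + maxAge * 1000;
  } else if (!Number.isNaN(expires)) {
    cookie.expiresAt = expires;
  }
  return cookie;
}
```

Passing now explicitly (as a millisecond timestamp) keeps the expiry calculation deterministic, which is convenient for testing.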
Here is an example of setting a cookie from a Node.js server:
var http = require('http');

http.createServer(function(req, res) {
  res.statusCode = 200;
  // Ask the browser to store isVisit=true for 1000 seconds on yourdomain.com
  res.setHeader('Set-Cookie', 'isVisit=true; domain=.yourdomain.com; path=/; max-age=1000');
  res.end('Hello World');
}).listen(8888);

console.log('running localhost:8888');
Writing the Set-Cookie string by hand is too low-level, so we can wrap the cookie-setting logic in a function:
var serialize = function(name, val, options) {
  if (!name) {
    throw new Error("cookie must have a name");
  }
  var enc = encodeURIComponent;
  var parts = [];
  val = (val !== null && val !== undefined) ? val.toString() : "";
  options = options || {};
  parts.push(enc(name) + "=" + enc(val));
  // Many browsers reject a domain without dots, such as localhost
  if (options.domain) {
    parts.push("domain=" + options.domain);
  }
  if (options.path) {
    parts.push("path=" + options.path);
  }
  // Without expires or max-age, this is a session cookie that the browser deletes when the session ends
  if (options.expires) {
    parts.push("expires=" + options.expires.toUTCString());
  }
  // max-age=0 is valid (it deletes the cookie), so check the type rather than truthiness
  if (typeof options.maxAge === "number") {
    parts.push("max-age=" + options.maxAge);
  }
  if (options.httpOnly) {
    parts.push("HttpOnly");
  }
  if (options.secure) {
    parts.push("secure");
  }
  return parts.join("; ");
};
Note that if a cookie's expiration time is set in the past, the browser deletes the cookie immediately. In addition, many browsers refuse a domain attribute that is not a proper domain name containing dots, so domain cannot be set to localhost (leave the attribute out when testing locally):
"Something that wasn't made clear to me here and totally confused me for a while was that domain names must contain at least two dots (.), hence 'localhost' is invalid and the browser will refuse to set the cookie!"
Parsing cookies on the server side
Because cookies can be scoped to different domains and paths, cookies with the same name can coexist under different domains and paths. The browser orders them so that the cookie that best matches the requested URL comes first (RFC 6265 recommends listing cookies with longer paths first).
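A minimal sketch of the path part of this matching and ordering, loosely following RFC 6265 (sections 5.1.4 and 5.4). The names pathMatches and cookiesForPath and the { name, value, path } cookie shape are assumptions for illustration; domain matching, expiry, and the secure flag are left out.

```javascript
// Simplified path-match: "/blog" matches "/blog" and "/blog/post" but not "/blogger".
function pathMatches(requestPath, cookiePath) {
  if (requestPath === cookiePath) return true;
  if (requestPath.startsWith(cookiePath)) {
    return cookiePath.endsWith('/') || requestPath[cookiePath.length] === '/';
  }
  return false;
}

// Build a Cookie header value for one request path.
function cookiesForPath(cookies, requestPath) {
  return cookies
    .filter((c) => pathMatches(requestPath, c.path))
    // Longer (more specific) paths first; Array.prototype.sort is stable,
    // so cookies with equally long paths keep their original order.
    .sort((a, b) => b.path.length - a.path.length)
    .map((c) => c.name + '=' + c.value)
    .join('; ');
}
```

For a request to /blog/post, a cookie set with path=/blog is listed before one set with path=/, which is why the server-side parser below can simply keep the first value for each name.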
Therefore, when the Cookie header received by the server contains several cookies with the same name, we only need the best match, which is the first one. The server-side parsing code is as follows:
var parse = function(cstr) {
  if (!cstr) {
    return null;
  }
  var dec = decodeURIComponent;
  var cookies = {};
  // Pairs in the Cookie header are separated by ";" plus optional whitespace
  var parts = cstr.split(/\s*;\s*/);
  parts.forEach(function(p) {
    if (!p) {
      return;
    }
    var pos = p.indexOf('=');
    // name and value were URL-encoded before the cookie was stored
    var name = pos > -1 ? dec(p.slice(0, pos)) : p;
    var val = pos > -1 ? dec(p.slice(pos + 1)) : null;
    // Keep only the first (best-matching) value for each name;
    // to keep every value instead, store an array per name
    if (!Object.prototype.hasOwnProperty.call(cookies, name)) {
      cookies[name] = val;
    }
  });
  return cookies;
};
Client-side access
The browser manages the cookies sent by the server and lets developers access them from JavaScript through document.cookie. This interface is awkward to use, however, because it behaves differently depending on whether you read it or write it.
When read, document.cookie returns a string of all the cookies available to the current page (based on each cookie's domain, path, expiration time, and secure setting), in the following format:
"name1=value1; name2=value2; name3=value3"
When written, document.cookie accepts a new cookie string, which is parsed and added to the existing set of cookies. For example:
document.cookie = "_fa=aaaffffasdsf;domain=.dojotoolkit.org;path=/"
Assigning to document.cookie does not overwrite existing cookies unless the new cookie has the same name, domain, and path as an existing one, in which case it replaces that cookie.
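The replacement rule can be modelled with a small in-memory store keyed on name, domain, and path. CookieJar is a hypothetical class, not a browser API: it ignores expiry and the secure and HttpOnly flags, and unlike document.cookie its toString() lists every stored cookie without filtering by the current page's domain and path.

```javascript
// Hypothetical in-memory model of the replacement rule behind document.cookie.
class CookieJar {
  constructor() {
    // Key: "name|domain|path". A Map keeps insertion order, and updating an
    // existing key keeps that entry's original position.
    this.cookies = new Map();
  }

  set(name, value, domain, path) {
    // Same name + domain + path replaces the old cookie; anything else adds a new one.
    this.cookies.set(name + '|' + domain + '|' + path, { name, value, domain, path });
  }

  toString() {
    return [...this.cookies.values()].map((c) => c.name + '=' + c.value).join('; ');
  }
}

const jar = new CookieJar();
jar.set('_fa', 'a', 'dojotoolkit.org', '/');
jar.set('_fa', 'b', 'dojotoolkit.org', '/docs'); // different path: a second cookie
jar.set('_fa', 'c', 'dojotoolkit.org', '/');     // same name, domain, path: replaces "a"
console.log(jar.toString()); // _fa=c; _fa=b
```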
Because reading and writing cookies through document.cookie is inconvenient, we can wrap the common operations (adding, modifying, and deleting cookies) in a few helper functions:
var cookieUtils = {
  get: function(name) {
    var target = encodeURIComponent(name);
    // document.cookie looks like "a=1; b=2". Split it into pairs and return the
    // first exact name match (the browser lists the best-matching cookie first)
    var pairs = document.cookie ? document.cookie.split('; ') : [];
    for (var i = 0; i < pairs.length; i++) {
      var pos = pairs[i].indexOf('=');
      var key = pos > -1 ? pairs[i].slice(0, pos) : pairs[i];
      if (key === target) {
        return pos > -1 ? decodeURIComponent(pairs[i].slice(pos + 1)) : '';
      }
    }
    return null;
  },
  set: function(name, val, options) {
    if (!name) {
      throw new Error("cookie must have a name");
    }
    var enc = encodeURIComponent;
    var parts = [];
    val = (val !== null && val !== undefined) ? val.toString() : "";
    options = options || {};
    parts.push(enc(name) + "=" + enc(val));
    if (options.domain) {
      parts.push("domain=" + options.domain);
    }
    if (options.path) {
      parts.push("path=" + options.path);
    }
    // Without expires or max-age, the browser deletes the cookie when the session ends
    if (options.expires) {
      parts.push("expires=" + options.expires.toUTCString());
    }
    if (typeof options.maxAge === "number") {
      parts.push("max-age=" + options.maxAge);
    }
    // No HttpOnly here: browsers ignore HttpOnly cookies written from script
    if (options.secure) {
      parts.push("secure");
    }
    document.cookie = parts.join("; ");
  },
  delete: function(name, options) {
    options = Object.assign({}, options);
    // A date in the past makes the browser delete the cookie immediately;
    // domain and path must match the ones used when the cookie was set
    options.expires = new Date(0);
    delete options.maxAge;
    this.set(name, null, options);
  }
};
Advantages of caching
What is commonly called a web cache is an HTTP device that automatically keeps copies of frequently requested HTTP resources. For front-end developers the browser cache plays the most important role, but various proxy servers can also serve as caches. When a request reaches a cache that holds a copy of the resource, the cache serves its local copy without contacting the origin server. This brings the following advantages:
Caching reduces redundant data transfer and saves traffic.
Caching relieves bandwidth bottlenecks: pages load faster without needing more bandwidth.
Caching absorbs sudden bursts of requests and reduces the load on the origin server.
Caching reduces distance latency, since loading pages from farther away is slower.
Cache types
A cache can be dedicated to a single user or shared by many users. A dedicated cache is called a private cache, and a shared cache is called a public cache.
Private cache
A private cache serves a single user, so it needs little space and is cheap. Web browsers have built-in private caches: most browsers cache common resources in memory and on your PC's disk. For example, Chrome on Windows stores its cache under C:/Users/Your_Account/AppData/Local/Google/Chrome/User Data/Default.
Public cache
A public cache is a special shared proxy server, called a caching proxy server or proxy cache (caching is one common use of a reverse proxy). Because a public cache accepts requests from many users, it is better at reducing redundant traffic.
In the figure below, each client repeatedly requests the same resource from the server (it is not yet in their private caches), so the server is hit many times and its load grows. With a shared public cache, the resource only needs to be fetched from the server once; later requests are answered by the cache without reaching the server, which significantly reduces the server's load.
In practice, caches are usually arranged in a hierarchy: small, inexpensive caches sit close to the clients, while larger and more powerful caches at higher levels hold resources shared by many users.
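The effect of a shared cache, and of stacking caches in a hierarchy, can be seen in a toy model that counts how often the origin server is reached. makeCache, origin, and sharedProxy are hypothetical names; freshness and revalidation are ignored, so every stored copy is treated as usable forever.

```javascript
// Toy cache: answers from its own store, otherwise asks the next level up.
function makeCache(fetchUpstream) {
  const store = new Map();
  return function get(url) {
    if (!store.has(url)) store.set(url, fetchUpstream(url));
    return store.get(url);
  };
}

let originHits = 0;
const origin = (url) => {
  originHits += 1; // count every request that reaches the origin server
  return 'content of ' + url;
};

// One shared proxy cache in front of the origin...
const sharedProxy = makeCache(origin);
// ...and one private (browser) cache per client in front of the proxy.
const clients = [makeCache(sharedProxy), makeCache(sharedProxy), makeCache(sharedProxy)];

for (const client of clients) client('/cache.png');
console.log(originHits); // 1
```

Three clients each request /cache.png once, yet the origin is reached only once: the first request misses both levels, and the other two are answered by the shared proxy. Without the proxy, each private cache would go to the origin on its own.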
Cache processing flow
Front-end developers mostly deal with the browser cache, so the process above simplifies to the following:
The following figure shows the request results for different resources on a website: some resources are read directly from the cache, some are revalidated with the server, and some are fetched again from the server.
Note that everything discussed here about cached resources applies only to GET requests. Operations such as POST, DELETE, and PUT are usually not cached.
Freshness limit
Through caching, HTTP keeps a copy of a server resource for a period of time, called the freshness limit (the freshness lifetime in the HTTP specification). Requests for the same resource within that period do not go to the server again. The Cache-Control and Expires headers set this limit: Cache-Control was added in HTTP/1.1, while Expires comes from HTTP/1.0. Both serve the same purpose, but Cache-Control uses a relative time, whereas Expires is an absolute date that can be wrong when the client and server clocks differ, so Cache-Control is preferred.
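A simplified sketch of how a cache might compute the freshness limit from these headers. It assumes response headers are passed as a plain object with lower-case names, and it ignores the heuristic freshness that caches may apply when neither header is present. When both headers are present, max-age is used and Expires is ignored; computing the Expires lifetime relative to the server's own Date header keeps the client's clock out of the calculation.

```javascript
// Freshness lifetime in seconds for a stored response (simplified).
function freshnessLifetimeSeconds(headers) {
  const cc = headers['cache-control'] || '';
  const m = /(?:^|,)\s*max-age\s*=\s*(\d+)/i.exec(cc);
  if (m) return parseInt(m[1], 10); // Cache-Control: max-age wins

  if (headers['expires'] && headers['date']) {
    // Both dates come from the server, so client clock skew does not matter here.
    const lifetimeMs = Date.parse(headers['expires']) - Date.parse(headers['date']);
    return Number.isNaN(lifetimeMs) ? 0 : Math.max(0, Math.floor(lifetimeMs / 1000));
  }
  return 0; // no explicit freshness information
}

// A stored response is fresh while its age is below the lifetime.
function isFresh(headers, ageSeconds) {
  return ageSeconds < freshnessLifetimeSeconds(headers);
}
```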
Cache-Control
Let's look at the directives Cache-Control supports:
max-age (in seconds) sets the maximum time a response may be cached, expressed as a duration. Within max-age seconds of receiving the response, the browser reuses its cached copy instead of sending a request to the server.
<!DOCTYPE HTML>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
    <meta http-equiv="X-UA-Compatible" content="IE=EDGE" />
    <title>Web Cache</title>
    <link rel="shortcut icon" href="./shortcut.png">
    <script>
    </script>
  </head>
  <body>
    <img src="./cache.png">
  </body>
</html>

var http = require('http');
var fs = require('fs');

http.createServer(function(req, res) {
  if (req.url === '/' || req.url === '' || req.url === '/index.html') {
    fs.readFile('./index.html', function(err, file) {
      console.log(req.url);
      // Main document: no-cache makes the browser revalidate it on every visit
      res.setHeader('Cache-Control', "no-cache, max-age=" + 5);
      res.setHeader('Content-Type', 'text/html');
      res.writeHead(200, "OK");
      res.end(file);
    });
  } else if (req.url === '/cache.png') {
    fs.readFile('./cache.png', function(err, file) {
      res.setHeader('Cache-Control', "max-age=" + 5); // cache for five seconds
      res.setHeader('Content-Type', 'image/png');
      res.writeHead(200, "OK");
      res.end(file);
    });
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8888);
When the page is visited again within 5 seconds, the browser takes cache.png directly from its cache.
public: the response may be stored in proxy caches and can therefore be shared by many users. A cacheable response without private can usually be stored by shared caches as well, but some responses, such as those to requests carrying an Authorization header, may only be reused by shared caches when public (or must-revalidate or s-maxage) is given.
private: the response may only be stored in a private cache (such as the browser's) and not in proxy caches. Resources that contain user-specific, sensitive information are usually marked private.
no-cache: the response may be stored, but before each reuse the cache must first check with the server whether the resource has changed (using ETag/If-None-Match or Last-Modified/If-Modified-Since) and only then decide whether to use its local copy.
If the handling of cache.png above is changed as follows, the browser has to ask the server whether the resource has changed every time the page is visited:
fs.readFile('./cache.png', function(err, file) {
  console.log(req.headers);
  console.log(req.url);
  var etag = '"ffff"'; // fixed validator for the demo; ETag values are quoted strings
  res.setHeader('Cache-Control', "no-cache");
  res.setHeader('ETag', etag);
  if (req.headers['if-none-match'] === etag) {
    // The browser's copy is still valid: reply 304 with no body
    res.writeHead(304, "Not Modified");
    res.end();
  } else {
    // First visit, or an outdated copy: send the full image
    res.setHeader('Content-Type', 'image/png');
    res.writeHead(200, "OK");
    res.end(file);
  }
});
no-store: absolutely forbids caching the response. Every time the user requests the resource, a request is sent to the server and the complete resource is downloaded again. It is usually used for confidential resources.
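To summarize how these directives interact, here is a hedged sketch that turns a Cache-Control value into a storage decision. parseCacheControl and cachePolicy are hypothetical helpers; the parser splits on commas naively, so quoted values that contain commas (such as private="Set-Cookie, X-Foo") are not handled.

```javascript
// Turn "no-cache, max-age=5" into { 'no-cache': true, 'max-age': '5' }.
function parseCacheControl(value) {
  const directives = {};
  for (const part of (value || '').split(',')) {
    const [rawKey, rawVal] = part.split('=');
    const key = rawKey.trim().toLowerCase();
    if (!key) continue;
    directives[key] = rawVal === undefined ? true : rawVal.trim().replace(/^"|"$/g, '');
  }
  return directives;
}

// isSharedCache: true for a proxy cache, false for the browser's private cache.
function cachePolicy(cacheControl, isSharedCache) {
  const cc = parseCacheControl(cacheControl);
  // no-store: never store; private: never store in a shared cache.
  const store = !cc['no-store'] && !(isSharedCache && cc['private']);
  return {
    store,
    // no-cache: may store, but must revalidate with the server before each reuse.
    revalidate: store && Boolean(cc['no-cache']),
    maxAge: cc['max-age'] !== undefined ? parseInt(cc['max-age'], 10) : null,
  };
}
```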
For an overview of how to choose Cache-Control directives, see the figure below.
Client freshness limit
Cache-Control can be set not only in response headers but also in request headers. By setting Cache-Control in a request, the browser decides whether it may read a resource from the cache. This is why clicking the browser's refresh button and pressing Enter in the address bar can produce completely different results in the Network panel.
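As a rough illustration, the check below combines the request's Cache-Control with the age and freshness of a stored response. canServeFromCache is a hypothetical helper, and only the no-cache and max-age request directives are handled. Which headers a browser sends depends on the browser and on how the page was reloaded; a hard reload, for instance, typically sends Cache-Control: no-cache.

```javascript
// May a cache answer this request from storage without asking the server?
// ageSeconds: how old the stored response is; freshnessLifetime: its max-age.
function canServeFromCache(requestCacheControl, ageSeconds, freshnessLifetime) {
  const req = {};
  for (const part of (requestCacheControl || '').split(',')) {
    const [k, v] = part.split('=');
    if (k.trim()) req[k.trim().toLowerCase()] = v === undefined ? true : v.trim();
  }
  // Request no-cache: the client wants the server to validate the stored copy first.
  if (req['no-cache']) return false;
  // Request max-age: the client will not accept a copy older than this many seconds.
  if (req['max-age'] !== undefined && ageSeconds > parseInt(req['max-age'], 10)) return false;
  // Otherwise the response's own freshness decides.
  return ageSeconds < freshnessLifetime;
}
```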
Expires
Expires is not recommended: it specifies an absolute expiration date rather than a number of seconds, and because the clocks of many servers and clients are out of sync, Cache-Control is the better choice.
Server verification
When a cached copy in the browser or a proxy cache expires, that does not mean it actually differs from the resource on the origin server; it only means it is time to check. This check is called server revalidation.
If the resource has changed, the cache fetches the new version and replaces the old copy.
If the resource has not changed, the cache only needs new response headers and a new expiration time, which it uses to update the stored copy's expiration.
HTTP/1.1 recommends validating with If-None-Match/ETag, while HTTP/1.0 uses If-Modified-Since/Last-Modified.
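Both outcomes can be sketched as a function that applies a revalidation response to a stored entry. The entry shape { status, headers, body, storedAt } and the name applyRevalidation are assumptions for illustration, not part of any library.

```javascript
// Apply the server's answer to a revalidation request to a stored entry.
function applyRevalidation(entry, response, now) {
  if (response.status === 304) {
    // Unchanged: keep the stored body, merge in the new headers
    // (e.g. a new Cache-Control) and restart the age clock.
    return {
      status: entry.status,
      headers: Object.assign({}, entry.headers, response.headers),
      body: entry.body,
      storedAt: now,
    };
  }
  // Changed (e.g. 200): replace the old copy entirely.
  return { status: response.status, headers: response.headers, body: response.body, storedAt: now };
}
```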
ETag and If-None-Match
An ETag is a string that the server generates to identify the current state of a resource, typically a hash of the entity content. The browser sends this string back to the server in If-None-Match to check whether the resource has been modified. If it has not been modified, the flow is as follows (the figure comes from "a brief discussion on web cache"):
The demo above already shows how the server validates the ETag.
Because the ETag is generated by the server, in a cluster every server must produce the same ETag for the same version of a resource (for example, avoid including per-machine data such as file inode numbers), while different versions must get different ETags.
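A minimal sketch of content-based ETags using Node's built-in crypto module. makeEtag and etagMatches are hypothetical helpers, and SHA-1 is an arbitrary choice of hash. Because the tag depends only on the bytes of the response, every server in a cluster produces the same ETag for the same version of a resource. If-None-Match may carry * or a comma-separated list of tags, and it uses weak comparison, so a W/ prefix is ignored when matching.

```javascript
const crypto = require('crypto');

// Strong ETag derived from the response body, quoted as HTTP requires.
function makeEtag(body) {
  return '"' + crypto.createHash('sha1').update(body).digest('hex') + '"';
}

// Does the client's If-None-Match header match the current ETag?
function etagMatches(ifNoneMatch, etag) {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === '*') return true;
  const strip = (t) => t.trim().replace(/^W\//, ''); // weak comparison ignores W/
  return ifNoneMatch.split(',').some((t) => strip(t) === strip(etag));
}
```

In the demo server, a handler could compute makeEtag(file) after reading cache.png and answer 304 when etagMatches(req.headers['if-none-match'], etag) is true, instead of using a fixed string.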
If-Modified-Since and Last-Modified
These are the request and response headers HTTP/1.0 uses to check whether a resource has changed, and both carry dates. The validation flow is similar to ETag's, so it is not covered in detail here. Validating with these headers has the following problems:
Some documents are rewritten periodically even though their content does not actually change. The file's latest modification date then differs from If-Modified-Since, which triggers unnecessary full responses.
Some documents are modified in unimportant ways (such as code comments) that do not require every cache to be updated.
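The second-level granularity of these dates is easy to trip over, so here is a small sketch of the server-side check. notModifiedSince is a hypothetical helper that takes the file's modification time in milliseconds (for example stats.mtimeMs from fs.stat) and the raw If-Modified-Since header. Because HTTP dates carry whole seconds, the modification time is truncated before comparing. The first problem above also shows up directly: rewriting a file with identical content moves its modification time forward, so the check fails even though nothing changed.

```javascript
// true means the server can answer 304 Not Modified.
function notModifiedSince(lastModifiedMs, ifModifiedSince) {
  const since = Date.parse(ifModifiedSince || '');
  if (Number.isNaN(since)) return false; // missing or invalid header: send the full response
  // HTTP dates have one-second resolution, so drop the milliseconds first.
  return Math.floor(lastModifiedMs / 1000) * 1000 <= since;
}

// The Last-Modified value a server would send for the same file.
function lastModifiedHeader(lastModifiedMs) {
  return new Date(lastModifiedMs).toUTCString();
}
```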
For how to update cached resources, see Zhang Yunlong's answer on the subject; this article does not go into detail.
The demo code in this article is as follows:
<!DOCTYPE HTML>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
    <meta http-equiv="X-UA-Compatible" content="IE=EDGE" />
    <title>Web Cache</title>
    <link rel="shortcut icon" href="./shortcut.png">
    <script>
    </script>
  </head>
  <body>
    <img src="./cache.png">
  </body>
</html>

var http = require('http');
var fs = require('fs');

http.createServer(function(req, res) {
  if (req.url === '/' || req.url === '' || req.url === '/index.html') {
    fs.readFile('./index.html', function(err, file) {
      console.log(req.url);
      // Cache headers on the main document: no-cache forces revalidation,
      // so the browser still requests it on every visit
      res.setHeader('Cache-Control', "no-cache, max-age=" + 5);
      res.setHeader('Content-Type', 'text/html');
      res.writeHead(200, "OK");
      res.end(file);
    });
  } else if (req.url === '/shortcut.png') {
    fs.readFile('./shortcut.png', function(err, file) {
      console.log(req.url);
      res.setHeader('Content-Type', 'image/png');
      res.writeHead(200, "OK");
      res.end(file);
    });
  } else if (req.url === '/cache.png') {
    fs.readFile('./cache.png', function(err, file) {
      console.log(req.headers);
      console.log(req.url);
      var etag = '"ffff"'; // fixed validator for the demo
      // Fresh for 5 seconds; after that the browser revalidates with If-None-Match
      res.setHeader('Cache-Control', "max-age=" + 5);
      res.setHeader('ETag', etag);
      if (req.headers['if-none-match'] === etag) {
        res.writeHead(304, "Not Modified");
        res.end();
      } else {
        res.setHeader('Content-Type', 'image/png');
        res.writeHead(200, "OK");
        res.end(file);
      }
    });
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8888);
That concludes this introduction to cookies and web caching; I hope you enjoyed it.