Web page encoding is the character encoding that a web page uses, and declares, for its content.
GBK is an extension of the national standard GB2312 and remains compatible with it. GBK represents each Chinese character with two bytes, while ASCII characters such as English letters still take a single byte. To distinguish Chinese characters from ASCII, the highest bit of the lead byte is set to 1. GBK covers the Chinese characters in common use, including traditional forms that GB2312 lacks, but it is a national encoding and is less universal than UTF-8. In exchange, Chinese text stored as UTF-8 takes more database space than the same text stored as GBK.
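As a minimal sketch of the byte layout described above, the following Python snippet encodes Chinese characters with the standard gbk and gb2312 codecs. The two sample characters are chosen only for illustration.

```python
# Illustrative sample characters, not taken from the article.
simplified = "中"    # present in both GB2312 and GBK
traditional = "漢"   # traditional form: present in GBK, absent from GB2312

gbk_bytes = simplified.encode("gbk")

# A Chinese character occupies two bytes in GBK.
gbk_length = len(gbk_bytes)

# The highest bit of the lead byte is 1, which marks it as non-ASCII.
high_bit_set = bool(gbk_bytes[0] & 0x80)

# GBK is compatible with GB2312: shared characters get identical bytes.
same_as_gb2312 = gbk_bytes == simplified.encode("gb2312")

# Characters added by the GBK extension cannot be encoded as GB2312.
try:
    traditional.encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False

print(gbk_length, high_bit_set, same_as_gb2312, in_gb2312)
```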
UTF-8 (Unicode Transformation Format, 8-bit) allows a BOM but usually does not contain one. It is a variable-length, multi-byte encoding designed to handle international text: it uses 8 bits (one byte) for each English character and 24 bits (three bytes) for each common Chinese character. UTF-8 can represent the characters used by all countries, so it is an international encoding with strong versatility. UTF-8 encoded text can be displayed in any browser that supports the UTF-8 character set, in any country. For example, a Chinese page encoded as UTF-8 can be displayed in an English-language version of IE without the user having to download the IE Chinese language support pack.
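The byte counts and the optional BOM can be checked with a short Python sketch. The sample characters are illustrative; "utf-8-sig" is the Python codec name for UTF-8 with a BOM.

```python
import codecs

english = "A"
chinese = "中"

# One byte for an English letter, three bytes for a common Chinese character.
english_len = len(english.encode("utf-8"))
chinese_len = len(chinese.encode("utf-8"))

# The plain "utf-8" codec writes no BOM; "utf-8-sig" prepends one.
without_bom = chinese.encode("utf-8")
with_bom = chinese.encode("utf-8-sig")

print(english_len, chinese_len)
print(without_bom.startswith(codecs.BOM_UTF8), with_bom.startswith(codecs.BOM_UTF8))
```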
Although UTF-8 has good international compatibility, Chinese text needs about 50% more database storage space in UTF-8 (three bytes per character) than in GBK or BIG5 (two bytes per character), so for Chinese sites it is not recommended unless there are special requirements for international compatibility. Simply put: for websites with mostly Chinese content, GBK is suitable because it saves database space; for websites with mostly English content, UTF-8 is suitable, since English text takes only one byte per character there.
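To make the storage comparison concrete, here is a small Python sketch that measures the extra space UTF-8 needs for a piece of Chinese text. The function name storage_overhead and the sample string are hypothetical, introduced only for this illustration.

```python
def storage_overhead(text: str) -> float:
    """Return how much larger the UTF-8 form is than the GBK form, as a fraction."""
    gbk_size = len(text.encode("gbk"))
    utf8_size = len(text.encode("utf-8"))
    return utf8_size / gbk_size - 1.0


sample = "网页编码"  # four Chinese characters: 8 bytes in GBK, 12 in UTF-8
print(f"{storage_overhead(sample):.0%}")
```

The figure applies to pure Chinese text; a real database row usually mixes in ASCII markup and digits, so the actual overhead of a page tends to be lower.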
How do you convert GBK, GB2312, etc. to UTF-8? GBK, GB2312, etc. and UTF-8 are converted to one another by way of Unicode: GBK/GB2312 -> Unicode -> UTF-8, and UTF-8 -> Unicode -> GBK/GB2312. With the Save As dialog of Windows Notepad, you can convert a file between the GBK, Unicode, Unicode big endian and UTF-8 encodings.
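The same two-step path through Unicode can be sketched in Python, where a str object plays the role of the Unicode intermediate. The helper name convert_encoding is hypothetical.

```python
def convert_encoding(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another by way of Unicode."""
    text = data.decode(source)   # source bytes -> Unicode
    return text.encode(target)   # Unicode -> target bytes


gbk_data = "网页编码".encode("gbk")
utf8_data = convert_encoding(gbk_data, "gbk", "utf-8")
round_trip = convert_encoding(utf8_data, "utf-8", "gbk")
print(len(gbk_data), len(utf8_data), round_trip == gbk_data)
```

Converting toward GBK or GB2312 raises UnicodeEncodeError if the text contains a character those encodings do not cover.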
How do you make the browser recognize the encoding of a web page correctly? Generally, the page must contain a line such as <meta http-equiv="Content-Type" content="text/html; charset=gb2312">, which declares that the character set of the page is GB2312 (use charset=UTF-8 for a UTF-8 page).
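As a rough illustration of how a program can read this declaration, the Python sketch below pulls the charset out of a page's raw bytes with a regular expression. The function name declared_charset is hypothetical, and the pattern is a simplification: real browsers follow a far more detailed sniffing algorithm.

```python
import re
from typing import Optional

# Simplified pattern: finds charset=... inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE
)


def declared_charset(html: bytes) -> Optional[str]:
    """Return the charset named in the first matching meta tag, or None."""
    match = META_CHARSET.search(html)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


page = b'<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
print(declared_charset(page))
```

The search runs on bytes rather than text because the encoding is not yet known at that point; the declaration itself is plain ASCII.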
Why does a page sometimes appear garbled even though it declares an encoding? Usually because the encoding declared in the page does not match the encoding the file is actually saved in. This is often caused by opening the page with the wrong encoding and then saving it, or by editing the file online with FTP software such as CuteFTP whose encoding conversion is configured incorrectly. To fix it, open the file in Windows Notepad and use Save As to save it in the encoding the page declares.
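The mismatch and its fix can be sketched in Python. The helper names is_consistent and resave are hypothetical. The check is only a heuristic: a successful decode does not prove that the declared encoding is right, but a failed decode does prove that it is wrong.

```python
def is_consistent(data: bytes, declared: str) -> bool:
    """Return False if the bytes cannot be decoded with the declared encoding."""
    try:
        data.decode(declared)
        return True
    except UnicodeDecodeError:
        return False


def resave(data: bytes, actual: str, declared: str) -> bytes:
    """Re-encode the file so its bytes match the declared encoding."""
    return data.decode(actual).encode(declared)


# A file saved as GBK whose meta tag claims UTF-8.
page = "中文".encode("gbk")
print(is_consistent(page, "utf-8"))

fixed = resave(page, actual="gbk", declared="utf-8")
print(is_consistent(fixed, "utf-8"))
```

The resave step corresponds to the Notepad Save As fix: the text stays the same and only its byte representation changes.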
This problem often occurs when using IE on Windows: when browsing a UTF-8 encoded page, the browser fails to recognize the page's encoding automatically, even though the page declares it with <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />. As a result, some UTF-8 pages containing Chinese are displayed as blank. Firefox and Safari do not have this problem. The reason is that IE, when determining a page's encoding, gives priority to the tag in the HTML and only then considers the information in the HTTP header, whereas the other browsers do just the opposite.
A Chinese character takes three bytes in UTF-8 but two bytes in ordinary GB2312 or BIG5. Because of the behavior described above, when the browser parses the content of <title></title> before it has seen the encoding declaration, IE treats the UTF-8 bytes as two-byte characters. If an odd number of full-width characters precedes </title>, half a character is left over. That leftover half merges with the start of the </title> tag that follows it, so the closing tag is no longer recognized and the entire page is displayed as blank. If you view the source at this point, you will find that the whole page has in fact been sent, but the browser does not display it. The easiest solution is to put <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> before <title></title>.
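The odd-versus-even effect can be reproduced with a deliberately naive Python simulation. The function naive_double_byte_tokens is hypothetical and is not IE's actual parser: it simply treats every byte with the high bit set as the first half of a two-byte character, which is the assumption the explanation above relies on.

```python
def naive_double_byte_tokens(data: bytes) -> list:
    """Split bytes the way a simple two-byte (GBK-style) reader would."""
    tokens = []
    i = 0
    while i < len(data):
        if data[i] >= 0x80:
            # High bit set: consume this byte and the next one as a pair.
            tokens.append(data[i:i + 2])
            i += 2
        else:
            # Plain ASCII byte.
            tokens.append(data[i:i + 1])
            i += 1
    return tokens


# One Chinese character = 3 UTF-8 bytes: the third byte pairs up with "<".
odd_title = "中".encode("utf-8") + b"</title>"
# Two Chinese characters = 6 UTF-8 bytes: the pairs come out even.
even_title = "中文".encode("utf-8") + b"</title>"

print(b"<" in naive_double_byte_tokens(odd_title))
print(b"<" in naive_double_byte_tokens(even_title))
```

In the odd case the "<" of the closing tag is swallowed into a garbled pair, so no standalone "<" remains; in the even case the tag survives intact.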