In project work we often establish specifications so that teams can cooperate and deliver more effectively. Protocols play the same role: GTalk, Google's IM software, uses the open XMPP protocol, so any other IM client that also implements XMPP can communicate with GTalk. Likewise, the Internet holds a countless amount of information, each piece existing independently; linking it together and presenting it to users depends on the HTTP protocol.
By the same token, browsers have different engines and different default styles, so they need a common rule to follow in order to render the same web document consistently. That rule is the doctype declaration, which tells the browser which document standard to apply to the page.
Because the Internet is interoperable, any two or more web documents may exchange data. XML lets users define their own tags, so two exchanged documents may use the same tag name with different meanings, which causes a conflict. A namespace is therefore needed to tell apart identically named tags in the exchanged documents.
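The tag conflict described above can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree module; the element names and the example.com namespace URIs are made up for illustration and do not come from any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define a <title> element.
# The namespace URIs are hypothetical identifiers used only for this example.
doc = """
<order xmlns:book="http://example.com/ns/book"
       xmlns:person="http://example.com/ns/person">
  <book:title>XML Basics</book:title>
  <person:title>Dr.</person:title>
</order>
""".strip()

root = ET.fromstring(doc)

# Map the prefixes used in our queries to the namespace URIs.
ns = {
    "book": "http://example.com/ns/book",
    "person": "http://example.com/ns/person",
}

# Same local name "title", but the namespace keeps the two apart.
book_title = root.find("book:title", ns).text
person_title = root.find("person:title", ns).text

print(book_title)
print(person_title)
```

Internally the parser stores each tag together with its namespace URI, which is why the two title elements never collide.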
XHTML is a transitional language between HTML and XML and does not support XML's user-defined tags, so every XHTML document uses the same namespace:
<html xmlns="http://www.w3.org/1999/xhtml">
xmlns is short for "XML namespace". Like the doctype declaration, xmlns is a declaration. The difference is that the doctype declaration is also used in HTML documents, whereas xmlns is not required there; the xmlns we usually see appears in XHTML documents.
When making a web page, the doctype (document type) is declared at the very beginning. An XHTML document must also declare its namespace, and the third thing to declare is the character encoding of the document:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
To be interpreted correctly by the browser and to pass W3C validation, every XHTML document should declare the character encoding it uses. Most garbled text in web documents is caused by an incorrect character encoding.
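A minimal sketch of how a wrong encoding produces garbled text, assuming Python's built-in codecs. The sample string is arbitrary, and latin-1 stands in for whatever wrong encoding a browser might guess.

```python
# Sample text; any simplified Chinese string covered by GB2312 would do.
text = "中文网页"

# The bytes a server would send for a gb2312-encoded page.
raw = text.encode("gb2312")

# Decoding with the declared (correct) encoding restores the text.
right = raw.decode("gb2312")

# Decoding with a wrong encoding still "succeeds" with latin-1,
# because every byte maps to some character, but the result is garbage.
wrong = raw.decode("latin-1")

print(right)
print(wrong)
```

The bytes are identical in both cases; only the interpretation differs, which is why the declaration matters.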
UTF-8 is a variable-length encoding of Unicode. As a globally universal character encoding, it is used in more and more web documents. Pages encoded in UTF-8 largely avoid the garbled text that arises when users in different regions access the same page under different character encodings.
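"Variable-length" can be seen directly by encoding a few characters. This is a small sketch using Python's built-in str.encode; the four sample characters are arbitrary picks from different Unicode ranges.

```python
# One character from each UTF-8 length class (written as escapes to be unambiguous).
samples = [
    "A",           # U+0041, ASCII letter
    "\u00e9",      # U+00E9, Latin small letter e with acute
    "\u4e2d",      # U+4E2D, a common Chinese character
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]

# Number of bytes each character occupies in UTF-8.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, utf8_lengths):
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")
```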
Yet when we open most Chinese websites, especially the large portals, the declared character encoding is not utf-8 but gb2312:
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
Besides gb2312, some websites use gbk or gb18030. All three are Chinese national encodings oriented toward simplified Chinese. In other words, if a computer has no simplified Chinese support installed, a Chinese page encoded as gb2312 will display as garbled text.
Since gb2312 can produce garbled text for users in other regions, why not use utf-8?
One reason is probably historical. The other, more important reason is that the two encodings store characters differently, which leads to different document sizes.
In gb2312 a Chinese character occupies 2 bytes, while in utf-8 a Chinese character usually occupies 3 bytes, and occasionally more. For the same Chinese document, the gb2312-encoded file is therefore smaller than the utf-8-encoded one.
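The size difference can be checked with a short sketch, again assuming Python's built-in codecs; the sample string is arbitrary.

```python
# Six common simplified Chinese characters, all covered by GB2312.
text = "中文网页编码"

gb2312_size = len(text.encode("gb2312"))  # 2 bytes per character
utf8_size = len(text.encode("utf-8"))     # 3 bytes per character here

print(f"gb2312: {gb2312_size} bytes")
print(f"utf-8:  {utf8_size} bytes")
```

Note that ASCII characters, including HTML markup, take 1 byte in both encodings, so the saving on a real page is smaller than the 2:3 ratio for pure Chinese text suggests.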
For Chinese websites with a lot of text and heavy traffic, gb2312-encoded pages save a considerable amount of bandwidth in download and transmission. In addition, the audience of Chinese websites consists almost entirely of Chinese users. These are the reasons many websites use gb2312 instead of UTF-8.
However, few websites in China actually have that much text and traffic. Given the garbled-text problems gb2312 can also bring, utf-8 is the recommended encoding when creating web pages.
Of course, whichever encoding is used, the most important thing is that the whole site uses the same one.
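One way to check that a site is consistent is to read the declared charset out of each page. The following is a minimal sketch using Python's standard html.parser module; the CharsetFinder class, the declared_charset helper, and the page names and contents are all hypothetical, and it only looks at the http-equiv style of declaration shown in this article.

```python
import re
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Records the charset from the first Content-Type meta tag seen."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() == "content-type":
            match = re.search(r"charset=([\w-]+)", attr.get("content") or "", re.I)
            if match:
                self.charset = match.group(1).lower()


def declared_charset(html):
    """Return the declared charset of an HTML string, or None if absent."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset


# Hypothetical pages of one site; the second declares a different encoding.
pages = {
    "index.html": '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /></head>',
    "about.html": '<head><meta http-equiv="Content-Type" content="text/html; charset=gb2312" /></head>',
}

charsets = {name: declared_charset(html) for name, html in pages.items()}
is_unified = len(set(charsets.values())) == 1

print(charsets)
print("unified" if is_unified else "mixed encodings")
```

The declaration only states what the page claims; the file must also actually be saved in that encoding.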
Besides the method above, you may also come across another form of declaration:
<meta http-equiv="Content-Language" content="gb2312" />
<meta http-equiv="Content-Language" content="zh-cn" />
This form of declaration targets old browser versions. Now that browsers have generally been updated, it is no longer recommended.