Introduction to the use of Unicode characters in web pages (&#, \u, etc.)

Author：Eve Cole Update Time：2024-12-25 11:22:58

The earliest computers could only handle ASCII characters. As computer use spread, many countries designed their own character sets so that their national scripts could be displayed and processed on computers, such as China's GB2312. Later the Internet connected the entire world, and it became practical to display the languages of many countries and peoples on one computer, or even in one interface. To meet the need for cross-language and cross-platform text conversion and processing, international organizations developed a character encoding scheme intended to accommodate all of the world's scripts and symbols, called Unicode. It is kept in step with ISO/IEC 10646, the Universal Coded Character Set (UCS). First released in 1991, it has been continuously expanded ever since and had reached Version 10 by 2017.

Detailed information, including the latest version of the code charts, is available at https://www.unicode.org/.

When designing a web page, you can use the Unicode character set. There are different ways to use it depending on whether it is in HTML, CSS, or JavaScript.

1) Use in HTML: &#dddd; or &#xhhhh;

Here dddd stands for the decimal value of the character's Unicode code and hhhh for its hexadecimal value (four digits in the common case). The decimal form is prefixed with &# and the hexadecimal form with &#x, and both end with a semicolon. Characters with 4-digit hexadecimal codes are relatively well supported and most of them display correctly on web pages. Other Unicode characters often cannot be displayed because the platform in use has no font or other Unicode support installed for them. Example:
<p>Display Unicode characters--&#x2230;</p>
A mathematical symbol (∰, the volume integral) is displayed. Its Unicode code is 2230 in hexadecimal, which is 8752 in decimal, so either "&#8752;" or "&#x2230;" outputs this character on the page.
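
As a rough illustration, both reference forms can be derived from a character's code in JavaScript. The helper name toNumericReferences is hypothetical and not part of any standard API; it relies only on the built-in codePointAt() and toString(16) methods.

```javascript
// Hypothetical helper: build both HTML numeric character references
// for the first character of a string.
function toNumericReferences(ch) {
  var code = ch.codePointAt(0);                          // 8752 for U+2230
  return {
    decimal: "&#" + code + ";",                          // &#dddd;
    hex: "&#x" + code.toString(16).toUpperCase() + ";"   // &#xhhhh;
  };
}

var refs = toNumericReferences("\u2230");
console.log(refs.decimal); // &#8752;
console.log(refs.hex);     // &#x2230;
```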

2) Use in CSS: \hhhh

Unicode characters are not often needed in CSS, but they do appear occasionally, most typically in the content property of generated content. They are written as the hexadecimal Unicode code (usually 4 digits, at most 6) prefixed by a backslash.
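
The sketch below builds such a CSS rule as a string in JavaScript. The helper name toCssEscape and the .symbol selector are made up for illustration. The code is padded to six hexadecimal digits, the maximum a CSS escape accepts, so that a hex-like character following the escape cannot be read as part of it.

```javascript
// Hypothetical helper: CSS escape (backslash + 6 hex digits) for one character.
function toCssEscape(ch) {
  var hex = ch.codePointAt(0).toString(16).toUpperCase();
  return "\\" + hex.padStart(6, "0");
}

// Assumed selector name, used only for illustration.
var rule = '.symbol::before { content: "' + toCssEscape("\u2230") + '"; }';
console.log(rule); // .symbol::before { content: "\002230"; }
```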

3) Use in JavaScript: \uhhhh

JavaScript code is often used to output special characters, such as temperature or angle symbols in an element, Greek letters, Roman numerals, and so on. You only need to put the prefix "\u" in front of the 4-digit hexadecimal Unicode code. Example:

document.body.innerHTML="\u25D0";

This uses the Unicode code 25D0. In the Geometric Shapes block it is a circle whose left half is filled with black and whose right half is white, like a half moon.
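
The \uhhhh form takes exactly four hexadecimal digits. For codes above FFFF, ES2015 and later also accept a form with braces, \u{h...}. A short sketch comparing the two (the variable names are arbitrary):

```javascript
var halfCircle = "\u25D0";   // CIRCLE WITH LEFT HALF BLACK, four hex digits
var degrees = "25\u00B0C";   // degree sign inside ordinary text
var emoji = "\u{1F600}";     // braces form for a code above FFFF

console.log(halfCircle.codePointAt(0).toString(16)); // 25d0
console.log(degrees);                                // 25°C
console.log(emoji.length);                           // 2 (stored as a surrogate pair)
```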

Of course, for Chinese users the most common use of Unicode codes is with Chinese characters. To cover more characters, the Chinese national character set was expanded from GB2312 to GBK and then to GB18030. The latest version of GB18030 includes more than 70,000 Chinese characters, along with the scripts of various ethnic minorities and some special characters, and the standard is designed to be consistent with Unicode. Some computers may not have complete fonts or support software for the new version installed, so often only part of the characters can be displayed.

In order to obtain the Unicode code of a Chinese character, you can use the JavaScript function charCodeAt(), for example:

var ucode="赵".charCodeAt();

In this way, the Unicode code of the Chinese character "赵" (Zhao) is stored in the variable ucode. The value obtained is 36213, which is the code in decimal. You can use the toString(16) method to convert it to hexadecimal:

var ucode="赵".charCodeAt().toString(16);

This gives the Unicode code of the Chinese character "赵" in hexadecimal form, and the value obtained is 8d75.
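
One caveat: charCodeAt() returns a single 16-bit UTF-16 code unit, so for characters with codes above FFFF (some rare Chinese characters, emoji) it returns only half of a surrogate pair. A minimal sketch, assuming an ES2015+ environment where codePointAt() is available; the helper name toUPlus is hypothetical:

```javascript
// Hypothetical helper: format the first character's code as U+XXXX.
function toUPlus(ch) {
  var hex = ch.codePointAt(0).toString(16).toUpperCase();
  return "U+" + hex.padStart(4, "0");
}

var rare = "\u{20000}";              // a CJK Extension B character

console.log(toUPlus("赵"));          // U+8D75
console.log(rare.charCodeAt(0));     // 55360, the high surrogate only
console.log(rare.codePointAt(0));    // 131072, the full code (hexadecimal 20000)
console.log(toUPlus(rare));          // U+20000
```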

Generally, when outputting Chinese characters, the string containing Chinese characters can be displayed directly. You can also use the Unicode code of Chinese characters to output the corresponding Chinese characters or other characters:

String.fromCharCode(36213);

In this way, the decimal Unicode code 36213 is converted into a string, and the Chinese character "赵" is displayed when the string is output. Because Chinese characters can be typed directly with an input method, this approach is more often used to output special characters.
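
String.fromCharCode() only accepts 16-bit values and silently truncates anything larger. Where codes above 65535 may occur, String.fromCodePoint() (ES2015) is the safer choice. A brief comparison:

```javascript
console.log(String.fromCharCode(36213));   // 赵
console.log(String.fromCharCode(0x8D75));  // 赵, the same value written in hexadecimal

// fromCharCode keeps only the low 16 bits: 0x1F600 becomes 0xF600.
console.log(String.fromCharCode(0x1F600) === "\uF600");     // true
// fromCodePoint produces the intended character.
console.log(String.fromCodePoint(0x1F600) === "\u{1F600}"); // true
```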

How &# encoding works

The number in an &# reference is the character's Unicode code. It can be found as follows:

For example, to encode "杨" (Yang), create a new file in Notepad, enter "杨", and choose the Unicode encoding when saving. Then view the binary content of the file. The first two bytes, FF FE, are the byte order mark of a little-endian UTF-16 file, and the following two bytes, 68 67, are the Unicode code of "杨" stored low byte first, that is, hexadecimal 6768. Converting this to decimal with a calculator gives 26472. Now you can write "&#26472;" in an HTML file, and IE (or any other browser) will display the character "杨" when the file is opened.
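
The byte layout described above can be reproduced without a hex editor. The sketch below is an illustration only, and the function name utf16leWithBom is hypothetical. It walks over the UTF-16 code units of a string and emits each one low byte first, preceded by the FF FE byte order mark.

```javascript
// Hypothetical helper: UTF-16 little-endian bytes of a string, with BOM.
function utf16leWithBom(str) {
  var bytes = [0xFF, 0xFE];               // byte order mark for little-endian
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);         // one 16-bit code unit
    bytes.push(unit & 0xFF, unit >> 8);   // low byte first, then high byte
  }
  return bytes;
}

var bytes = utf16leWithBom("杨");
console.log(bytes.map(function (b) {
  return b.toString(16).toUpperCase().padStart(2, "0");
}).join(" "));                            // FF FE 68 67

// Reading the two data bytes back, high byte first, gives the code.
console.log((bytes[3] << 8) | bytes[2]);  // 26472
```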

Of course, for ordinary ASCII characters the Unicode code is the same as the ASCII code, so "&#65;" displays the capital letter "A".

Convert &# encoding into characters

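function uncode(str) {
  // Match &#dddd; (decimal) or &#xhhhh; (hexadecimal); the semicolon is required.
  return str.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, function (match, hex, dec) {
    var code = hex ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range values untouched; fromCodePoint also handles codes above FFFF.
    return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
  });
}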

Convert characters to &# encoding

function encode(str) {
  var out = "";
  // Iterate by code point so characters above FFFF are not split into surrogate halves.
  for (var ch of str) out += "&#" + ch.codePointAt(0) + ";";
  return out; // an empty string stays empty
}
