The lexical structure of a programming language is the set of elementary rules that specify how you write programs in that language. It is the lowest level of a language's syntax: it specifies what variable names look like, how comments are written, and how one statement is separated from the next. This section gives a brief introduction to the lexical structure of JavaScript.
1. Character set
JavaScript programs are written using the Unicode character set. Unicode is a superset of ASCII and Latin-1 and supports virtually every written language in use today. ECMAScript 3 requires JavaScript implementations to support Unicode version 2.1 or later, and ECMAScript 5 requires implementations to support Unicode 3 or later.
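As a quick illustration of the superset relationship, the sketch below asks the engine for the Unicode code point of a few characters. It assumes a modern engine (for example Node.js), because `String.prototype.codePointAt` arrived in ES2015, later than the ECMAScript versions named above. ASCII and Latin-1 characters keep their familiar values (A is 0x41, é is 0xE9), while other scripts simply use higher code points.

```javascript
// Hex code points: ASCII (A, z) and Latin-1 (é) keep their original values in Unicode.
const codePoints = ["A", "z", "é", "π", "中"].map((ch) => ch.codePointAt(0).toString(16));
console.log(codePoints.join(" ")); // 41 7a e9 3c0 4e2d
```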
i. Case sensitivity
JavaScript is a case-sensitive language. This means that keywords, variables, function names, and other identifiers must always be typed with consistent capitalization. For example, the keyword while must be written as while, not While or WHILE.
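A small sketch of what case sensitivity means in practice (the variable names are made up for illustration): names that differ only in capitalization are different identifiers, and While is not the keyword while but an ordinary, here undeclared, identifier.

```javascript
// Four distinct variables: identifiers that differ only in case are different names.
var online = 1, Online = 2, OnLine = 3, ONLINE = 4;
console.log(online, Online, OnLine, ONLINE); // 1 2 3 4

// "While" is not the keyword while; it is just an identifier that was never declared.
console.log(typeof While); // undefined
```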
Note, however, that HTML is not case-sensitive (although XHTML is), and because HTML is closely tied to client-side JavaScript, this difference is easy to confuse. For example, an HTML event-handler attribute may be written as onclick or onClick in HTML, but in JavaScript code it must be written as onclick.
ii. Whitespace, line breaks, and format control characters
JavaScript ignores spaces that appear between tokens in a program. For the most part, JavaScript also ignores line breaks (the exceptions are described under optional semicolons below). Because spaces and line breaks can be used freely, you can format and indent your code in a neat and consistent style that makes it easier to read.
In addition to the regular space character (\u0020), JavaScript also recognizes the following characters as whitespace: tab (\u0009), vertical tab (\u000B), form feed (\u000C), non-breaking space (\u00A0), byte order mark (\uFEFF), and any character in Unicode category Zs. JavaScript recognizes the following characters as line terminators: line feed (\u000A), carriage return (\u000D), line separator (\u2028), and paragraph separator (\u2029). A carriage return followed by a line feed is parsed as a single line terminator.
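To see these rules without typing invisible characters, the sketch below builds source text with \u escapes inside ordinary string literals (the escapes are decoded before the engine parses the text) and hands it to the standard `Function` constructor. It assumes a modern engine such as Node.js. The first function mixes tab, non-breaking space, vertical tab, byte order mark and U+3000 (IDEOGRAPHIC SPACE, a Zs character) between tokens; the second uses U+2028 as a line terminator.

```javascript
// The \u escapes are decoded in these string literals, so the Function constructor receives
// source text that really contains tab, NBSP, vertical tab, BOM and U+3000 between tokens.
const spaced = new Function("return\u00091\u00A0+\u000B2\uFEFF+\u30003");
console.log(spaced()); // 6

// U+2028 (LINE SEPARATOR) ends the line, so a semicolon is inserted before "return".
const separated = new Function("var n = 1\u2028return n");
console.log(separated()); // 1
```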
Unicode format control characters (category Cf), such as RIGHT-TO-LEFT MARK (\u200F) and LEFT-TO-RIGHT MARK (\u200E), control the visual presentation of the text they occur in. They are essential for displaying some non-English languages correctly, and they are allowed in JavaScript comments, string literals, and regular expression literals, but not in identifiers (for example, variable names). As a special case, ZERO WIDTH JOINER (\u200D) and ZERO WIDTH NON-JOINER (\u200C) are allowed in identifiers, but not as the first character. As noted above, the byte order mark format control character (\uFEFF) is treated as a space character.
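The identifier rule for ZERO WIDTH JOINER can be checked the same way, by letting the engine parse source text built from escapes. This is a sketch assuming an ES2015+ engine; the identifier names a‍b and ‍ab are invented for the test. The joiner is accepted inside the name but rejected as its first character.

```javascript
// ZERO WIDTH JOINER (U+200D) may appear inside an identifier...
const zwj = new Function("var a\u200Db = 42; return a\u200Db;");
console.log(zwj()); // 42

// ...but not as the first character of an identifier.
let zwjFirstError = null;
try {
  new Function("var \u200Dab = 1;");
} catch (e) {
  zwjFirstError = e;
}
console.log(zwjFirstError instanceof SyntaxError); // true
```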
iii. Unicode escape sequences
Some computer hardware and software cannot display, input, or correctly process the full set of Unicode characters. To support programmers and systems using older technology, JavaScript defines escape sequences that use six ASCII characters to represent any 16-bit Unicode code unit. These Unicode escapes begin with the characters \u and are followed by exactly four hexadecimal digits (using the digits 0-9 and uppercase or lowercase letters A-F). Unicode escapes may appear in JavaScript string literals, regular expression literals, and identifiers (but not in keywords). For example, the Unicode escape for the character é is \u00E9, and the following two JavaScript strings are identical:
"café" === "caf\u00e9"   // => true
Unicode escapes may also appear in comments, but since comments are ignored, the escape is simply treated as ordinary ASCII characters in that context and is never interpreted as a Unicode character.
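A short sketch of where escapes are honored, assuming a modern engine such as Node.js; the variable name café and the string "coffee" are only illustrative. The escape works in an identifier, a string literal and a regular expression literal, while inside a comment \u000A stays six plain characters rather than becoming a line break.

```javascript
var caf\u00e9 = "coffee";              // declares the identifier café using a Unicode escape
console.log(café);                     // coffee
console.log("café" === "caf\u00e9");   // true
console.log(/caf\u00e9/.test("café")); // true

// "\\u" keeps the backslash, so the parsed source contains the six characters \u000A
// inside a // comment. They are not a line break, so "+ 1" stays commented out.
const commented = new Function("return 1 // \\u000A + 1");
console.log(commented()); // 1
```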
iv. Normalization
Unicode allows more than one way of encoding the same character. The character é, for example, can be encoded as the single Unicode character \u00E9 or as the ordinary ASCII character e followed by the combining acute accent \u0301. These two encodings may look exactly the same in a text editor, but their binary representations differ, so the computer considers them unequal. The Unicode standard defines the preferred encoding for every character and specifies a normalization procedure that converts text into a canonical form suitable for comparison. JavaScript assumes that the source code it interprets has already been normalized and makes no attempt to normalize identifiers, strings, or regular expressions itself.
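The two encodings of é can be compared directly. Since the JavaScript parser will not normalize them for you, the sketch below uses `String.prototype.normalize` (added in ES2015, so a modern engine is assumed) to convert between the composed (NFC) and decomposed (NFD) forms at run time.

```javascript
const composed = "caf\u00e9";    // é as the single code point U+00E9
const decomposed = "cafe\u0301"; // e followed by COMBINING ACUTE ACCENT (U+0301)

console.log(composed === decomposed);                  // false: different code units
console.log(composed.length, decomposed.length);       // 4 5
console.log(composed === decomposed.normalize("NFC")); // true
console.log(composed.normalize("NFD") === decomposed); // true
```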
2. Comments
JavaScript supports two styles of comments. Any text between // and the end of a line is treated as a comment and ignored by JavaScript.
Any text between the characters /* and */ is also treated as a comment. This kind of comment may span multiple lines, but comments of this kind cannot be nested.
// This is a single-line comment.
/*
 * This is a multi-line comment.
 * It can span several lines.
 */
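Both rules can be checked by letting the engine parse small snippets through the standard `Function` constructor (a sketch assuming any modern engine). A // comment hides the rest of its line, and because /* ... */ comments do not nest, the first */ closes the comment and leaves the remaining text to be parsed as code, which fails.

```javascript
// Text after // is ignored, so this function returns 1, not 3.
const lineComment = new Function("return 1 // + 2");
console.log(lineComment()); // 1

// The first */ ends the comment, leaving "still outer? */" as invalid code.
let nestedError = null;
try {
  new Function("/* outer /* inner */ still outer? */");
} catch (e) {
  nestedError = e;
}
console.log(nestedError instanceof SyntaxError); // true
```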
3. Literals
A literal is a data value that appears directly in a program. The following are all literals:
12               // A number
1.2              // A decimal number
"Hello World"    // A string of text
'hi'             // Another string
true             // A Boolean value
false            // The other Boolean value
/javascript/gi   // A regular expression literal (used for pattern matching)
null             // The absence of an object
Number and string literals are explained in detail in Chapter 3, and regular expression literals in Chapter 10. JavaScript also has more complex expressions that serve as object and array literals:
{x:1,y:2} //Object
[1,2,3,4,5] //Array
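One way to see what each literal produces is to ask `typeof` about it. In this sketch (any modern engine assumed) the regular expression, null, the object and the array all report "object"; `typeof null` being "object" is a long-standing quirk of the language, and `Array.isArray` is the usual way to tell an array apart from a plain object.

```javascript
const literals = [12, 1.2, "Hello World", 'hi', true, false, /javascript/gi, null, { x: 1, y: 2 }, [1, 2, 3, 4, 5]];
const kinds = literals.map((value) => typeof value).join(" ");
console.log(kinds);
// number number string string boolean boolean object object object object
console.log(Array.isArray(literals[9])); // true
```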
4. Identifiers and reserved words
An identifier is simply a name. In JavaScript, identifiers are used to name variables and functions and to provide labels for certain loops in JavaScript code. A JavaScript identifier must begin with a letter, an underscore (_), or a dollar sign ($). Subsequent characters can be letters, digits, underscores, or dollar signs. (Digits are not allowed as the first character so that JavaScript can easily distinguish identifiers from numbers.) The following are all legal identifiers:
my_variable_name
b13
_dummy
$str
For portability and ease of editing, it is common to use only ASCII letters and digits in identifiers. Note, however, that JavaScript allows identifiers to contain letters and digits from the entire Unicode character set. (Technically, the ECMAScript standard also allows characters from the Unicode categories Mn, Mc, and Pc to appear in identifiers after the first character.) Programmers can therefore write identifiers in non-English languages or use mathematical symbols:
var sá = true;
var π = 3.14;
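The identifier rules can be probed by asking the engine to compile a `var` declaration. The helper `isValidIdentifier` below is hypothetical, written only for this illustration: it is a rough check for non-strict code, not a complete validator (input such as "a = 1" would also compile). It assumes a modern engine such as Node.js.

```javascript
// Hypothetical helper: returns true if `name` compiles as the name in a var declaration.
function isValidIdentifier(name) {
  try {
    new Function("var " + name + ";");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const names = ["my_variable_name", "b13", "_dummy", "$str", "sá", "π", "13b", "my-name", "while"];
console.log(names.map((n) => n + ": " + isValidIdentifier(n)).join(", "));
// my_variable_name: true, b13: true, _dummy: true, $str: true, sá: true, π: true,
// 13b: false, my-name: false, while: false
```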
JavaScript reserves a number of identifiers as keywords of the language itself, so you cannot use these words as identifiers in your programs:
break
case
catch
continue
debugger
default
delete
do
else
finally
for
function
if
in
instanceof
new
return
switch
this
throw
try
typeof
var
void
while
with
JavaScript also reserves the following words, which are not currently used by the language but might be used in future versions:
class const enum export extends import super
In addition, the following words are legal in ordinary JavaScript code but are reserved in strict mode:
implements interface let package private protected public static yield
Strict mode also restricts the following identifiers. They are not fully reserved, but they cannot be used as variable names, function names, or parameter names:
arguments eval
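The difference between always-reserved words and strict-mode restrictions shows up when the engine compiles a declaration. The helper `compilesOk` below is hypothetical and exists only for this illustration; a "use strict" directive at the start of the function body switches that body into strict mode. A modern engine such as Node.js is assumed.

```javascript
// Hypothetical helper: true if `src` compiles as a function body, false on a SyntaxError.
function compilesOk(src) {
  try {
    new Function(src);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(compilesOk("var while = 1;"));                    // false: keyword
console.log(compilesOk("var class = 1;"));                    // false: reserved word
console.log(compilesOk("var implements = 1;"));               // true: allowed in non-strict code
console.log(compilesOk('"use strict"; var implements = 1;')); // false: reserved in strict mode
console.log(compilesOk("var eval = 1;"));                     // true: allowed in non-strict code
console.log(compilesOk('"use strict"; var eval = 1;'));       // false: restricted in strict mode
```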
A JavaScript implementation may define its own global variables and functions, and each specific JavaScript host environment (client-side, server-side, and so on) has its own list of global properties. Keep these in mind when choosing names. (See the Window object for the list of global variables and functions defined by client-side JavaScript.)
5. Optional semicolon
Like many programming languages, JavaScript uses the semicolon (;) to separate statements. This is important for making the meaning of your code clear: without a separator, the end of one statement might appear to be the beginning of the next, or vice versa.
In JavaScript, you can usually omit the semicolon between two statements if they are written on separate lines. (You can also omit a semicolon at the end of a program, or before the closing curly brace } of a block.) Many JavaScript programmers (and the code examples in this book) use semicolons to mark the ends of statements explicitly, even where they are not required. Another style is to omit semicolons whenever possible and use them only where they are required. Whichever style you choose, there are a few details of JavaScript's semicolon handling that you should understand.
In the following code, the first semicolon could be omitted:
a=3;
b=4;
However, if the two statements are written on the same line, the first semicolon is required:
a=3; b=4;
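The sketch below shows the difference by compiling both layouts with the `Function` constructor (a modern engine is assumed; `var` is used instead of bare assignments so no accidental globals are created). With a line break the engine inserts the missing semicolon, while the one-line version without a semicolon is a syntax error.

```javascript
// Two statements on separate lines: the line break lets JavaScript insert the semicolon.
const twoLines = new Function("var a = 3\nvar b = 4\nreturn a + b");
console.log(twoLines()); // 7

// The same statements on one line without a semicolon do not compile.
let oneLineError = null;
try {
  new Function("var a = 3 var b = 4");
} catch (e) {
  oneLineError = e;
}
console.log(oneLineError instanceof SyntaxError); // true
```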
Note that JavaScript does not treat every line break as a semicolon: it usually inserts a semicolon at a line break only if it cannot parse the code without one. More precisely (and with the two exceptions described below), if the current statement and the next non-whitespace character cannot be parsed as a single statement, JavaScript inserts a semicolon at the end of the current line. Consider the following code:
var a
a
=
3
console.log(a)
JavaScript parses it as:
var a;a=3;console.log(a);
JavaScript inserts a semicolon after the first line because it cannot parse var a a without one. The second a could stand alone as the statement a;, but JavaScript does not insert a semicolon at the end of the second line, because the second line can be parsed together with the third line as a=3;.
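The same four lines can be fed to the engine as source text to confirm how they are split. In this sketch (modern engine assumed) the final console.log(a) is replaced with return a so the result can be checked: the value 3 shows that lines two to four were joined into the single statement a=3.

```javascript
// Source: "var a", "a", "=", "3" on separate lines, then "return a".
const asiDemo = new Function("var a\na\n=\n3\nreturn a");
console.log(asiDemo()); // 3
```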
These statement-termination rules can lead to some surprising cases. The following code is split across two lines and looks like two independent statements:
var y = x + f
(a+b).toString()
But the parentheses on the second line can be interpreted as a call to the f on the first line, so JavaScript parses the code as:
var y = x+f(a+b).toString();
This is almost certainly not what the author intended. For the code to be parsed as two separate statements, you must add an explicit semicolon at the end of the first line.
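A runnable sketch of this trap, with hypothetical values chosen for illustration: x = 1, f = 2 (deliberately a number, not a function), a = 3 and b = 4, passed in as parameters of a function built with the `Function` constructor. With the explicit semicolon the code returns 3; without it, the second line becomes the call f(a + b), which throws a TypeError because f is not a function.

```javascript
// Hypothetical values: x = 1, f = 2 (a number, not a function), a = 3, b = 4.
const runWith = (src) => new Function("x", "f", "a", "b", src)(1, 2, 3, 4);

// With an explicit semicolon, the two lines are separate statements.
console.log(runWith("var y = x + f;\n(a + b).toString()\nreturn y")); // 3

// Without it, "f\n(a + b)" is parsed as the call f(a + b), which fails.
let callError = null;
try {
  runWith("var y = x + f\n(a + b).toString()\nreturn y");
} catch (e) {
  callError = e;
}
console.log(callError instanceof TypeError); // true
```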
In general, if a statement begins with (, [, /, +, or -, there is a chance that it will be parsed as a continuation of the statement before it. Statements beginning with /, +, and - are quite rare in practice, but statements beginning with ( and [ are not uncommon, at least in some JavaScript coding styles. Some programmers like to put a defensive semicolon at the beginning of any such statement, so that it is still parsed correctly even if the statement before it is later edited and its terminating semicolon is removed by mistake.
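The sketch below, assuming a modern engine, compares a statement starting with [ with and without a defensive semicolon; the array `out` is a hypothetical collector used only to observe the result. Without the leading semicolon, the [ line continues var x = 0 as the property access 0[...], whose result is undefined, so calling .forEach on it throws a TypeError.

```javascript
// No defensive semicolon: the "[" line is parsed as var x = 0[x, x + 1, x + 2].forEach(...).
const risky = "var out = []\nvar x = 0\n[x, x + 1, x + 2].forEach(function (n) { out.push(n) })\nreturn out";
// Defensive semicolon: the array literal starts a new statement.
const defensive = "var out = []\nvar x = 0\n;[x, x + 1, x + 2].forEach(function (n) { out.push(n) })\nreturn out";

console.log(new Function(defensive)()); // [ 0, 1, 2 ]

let riskyError = null;
try {
  new Function(risky)();
} catch (e) {
  riskyError = e;
}
console.log(riskyError instanceof TypeError); // true
```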
The general rule is that JavaScript inserts a semicolon at the end of a line when the current statement and the next line cannot be parsed together as a single statement. There are two exceptions to this rule. The first involves the return, break, and continue statements: if a line break appears immediately after any of these keywords, JavaScript always inserts a semicolon at that line break. For example, this code:
return
true;
is parsed by JavaScript as:
return; true;
whereas the intended meaning was probably:
return true;
This means that you must not put a line break between return, break, or continue and the expression (or, for break and continue, the label) that follows the keyword. If you do, the program is likely to fail without any error message, in a way that is hard to debug.
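A minimal sketch of the return pitfall (the function names are hypothetical). The engine reports no error: the first function simply returns undefined, because a semicolon is inserted right after return and true becomes a separate, unreachable statement.

```javascript
function broken() {
  return   // JavaScript inserts a semicolon right here
    true;  // never reached: this is now a separate statement
}

function fixed() {
  return true;
}

console.log(broken()); // undefined
console.log(fixed());  // true
```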
The second exception involves the ++ and -- operators. These operators can be used either as prefix operators, before an expression, or as postfix operators, after an expression. If you want to use one of them as a postfix operator, it must appear on the same line as the expression it applies to; otherwise, a semicolon is inserted at the end of the line, and the ++ or -- is parsed as a prefix operator applied to the code that follows. For example:
x
++
y
The above code is parsed as
x;
++y
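To confirm which variable is incremented, the sketch below (modern engine assumed) starts x and y at 1, runs the three-line x / ++ / y sequence as source text, and returns both values. Only y changes, because ++ on its own line is parsed as a prefix operator applied to y.

```javascript
// x and y both start at 1; "x", "++", "y" are on separate lines.
const incrementDemo = new Function("var x = 1, y = 1\nx\n++\ny\nreturn [x, y]");
console.log(incrementDemo()); // [ 1, 2 ]
```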