Microsoft's regular expression tutorial (II): Regular expression syntax

Author: Eve Cole  Update Time: 2025-03-20 21:32:01

Regular expression syntax

A regular expression is a text pattern made up of ordinary characters (such as the letters a through z) and special characters (called metacharacters). The pattern describes one or more strings to match when searching a body of text. The regular expression serves as a template: the searched string is compared against it to find text that fits the character pattern.

Here are some examples of regular expressions you might encounter:

JScript	VBScript	Description
/^[ \t]*$/	^[ \t]*$	Matches a blank line.
/\d{2}-\d{5}/	\d{2}-\d{5}	Validates an ID number made up of two digits, a hyphen, and five digits.
/<(.*)>.*<\/\1>/	<(.*)>.*<\/\1>	Matches an HTML element: an opening tag, its content, and the matching closing tag.
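The three example patterns can be tried out in JavaScript, the modern descendant of JScript. This is a minimal sketch: the helper names (`isBlankLine`, `isIdNumber`, `findElement`) are hypothetical, and the `^`/`$` anchors added to the ID check are an assumption, since the table's pattern is unanchored and would also match inside a longer string. The last lines show that the VBScript-style string form can be passed to the `RegExp` constructor, provided each backslash is doubled inside the string literal.

```javascript
// Blank line: nothing but spaces or tabs between the start and the end.
const blankLine = /^[ \t]*$/;

// ID number: 2 digits, a hyphen, 5 digits.
// Assumption: ^ and $ are added so the whole string must have this shape.
const idNumber = /^\d{2}-\d{5}$/;

// HTML element: an opening tag, content, and a closing tag named by \1.
const htmlElement = /<(.*)>.*<\/\1>/;

// Same pattern built from a string, as in the VBScript column.
// Backslashes are doubled because this is a JavaScript string literal.
const htmlElementFromString = new RegExp("<(.*)>.*<\\/\\1>");

function isBlankLine(line) {
  return blankLine.test(line);
}

function isIdNumber(text) {
  return idNumber.test(text);
}

// Returns the whole element and the captured tag name, or null.
function findElement(html) {
  const m = htmlElement.exec(html);
  return m ? { whole: m[0], tag: m[1] } : null;
}

console.log(isBlankLine(" \t "));                 // true
console.log(isBlankLine("  x "));                 // false
console.log(isIdNumber("12-34567"));              // true
console.log(isIdNumber("123-4567"));              // false
console.log(findElement("<b>bold</b> text"));     // { whole: '<b>bold</b>', tag: 'b' }
console.log(htmlElementFromString.source === htmlElement.source); // true
```

Note that the greedy `(.*)` first tries to swallow as much as possible and then backtracks until the backreference `\1` can find a matching closing tag, so the pattern is convenient for simple markup but is not a general HTML parser.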

The following table is a complete list of metacharacters and their behavior in the context of regular expressions:

Character	Description
\	Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^	Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position immediately after '\n' or '\r'.
$	Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position immediately before '\n' or '\r'.
*	Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+	Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?	Matches the preceding subexpression zero times or once. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}.
{n}	n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the "o" in "Bob", but matches the two o's in "food".
{n,}	n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the "o" in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m}	m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
?	When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.	Matches any single character except '\n' (in JavaScript, also except the other line terminators '\r', '\u2028', and '\u2029'). To match any character including '\n', use a pattern such as '[\s\S]'. Inside square brackets '.' is a literal period, so '[.\n]' matches only a period or a newline.
(pattern)	Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, through the SubMatches collection in VBScript, or through the $1…$9 properties in JScript. To match a parenthesis character, use '\(' or '\)'.
(?:pattern)	Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)	Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match; the lookahead text is not captured for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that make up the lookahead.
(?!pattern)	Negative lookahead: matches the search string at any point where a string not matching pattern begins. This is a non-capturing match; the lookahead text is not captured for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but not "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that make up the lookahead.
x|y	Matches either x or y. For example, 'z|food' matches "z" or "food", while '(z|f)ood' matches "zood" or "food".
[xyz]	A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the "a" in "plain".
[^xyz]	A negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches the "p" in "plain".
[a-z]	A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from "a" through "z".
[^a-z]	A negated range of characters. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not a lowercase letter from "a" through "z".
\b	Matches a word boundary, that is, the position between a word character and a non-word character such as a space (or the start or end of the string). For example, 'er\b' matches the "er" in "never" but not the "er" in "verb".
\B	Matches a non-word boundary. For example, 'er\B' matches the "er" in "verb" but not the "er" in "never".
\cx	Matches the control character indicated by x. For example, \cM matches a Control-M, which is a carriage return. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal "c" character.
\d	Matches a digit character. Equivalent to [0-9].
\D	Matches a non-digit character. Equivalent to [^0-9].
\f	Matches a form feed. Equivalent to \x0c and \cL.
\n	Matches a newline character. Equivalent to \x0a and \cJ.
\r	Matches a carriage return. Equivalent to \x0d and \cM.
\s	Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v]. Modern JavaScript engines also count other Unicode whitespace, such as U+00A0, as \s.
\S	Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t	Matches a tab character. Equivalent to \x09 and \cI.
\v	Matches a vertical tab. Equivalent to \x0b and \cK.
\w	Matches any word character, including underscore. Equivalent to '[A-Za-z0-9_]'.
\W	Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn	Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A", while '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num	Matches num, where num is a positive integer: a backreference to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n	Here n is a single digit. Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm	Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither condition holds and both n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml	If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml.
\un	Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
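The quantifier rows of the metacharacter table above can be checked directly in JavaScript. The sketch below only replays the table's own examples; `firstMatch` is a hypothetical convenience helper, not part of any library.

```javascript
// Hypothetical helper: return the first match as a string, or null.
function firstMatch(pattern, text) {
  const m = text.match(pattern);
  return m ? m[0] : null;
}

// * (zero or more) and + (one or more)
console.log(firstMatch(/zo*/, "z"));           // z
console.log(firstMatch(/zo+/, "z"));           // null
console.log(firstMatch(/zo+/, "zoo"));         // zoo

// ? (zero or one)
console.log(firstMatch(/do(es)?/, "do"));      // do
console.log(firstMatch(/do(es)?/, "does"));    // does

// {n}, {n,} and {n,m}
console.log(firstMatch(/o{2}/, "Bob"));        // null
console.log(firstMatch(/o{2}/, "food"));       // oo
console.log(firstMatch(/o{2,}/, "foooood"));   // ooooo
console.log(firstMatch(/o{1,3}/, "fooooood")); // ooo

// Greedy (default) versus non-greedy (trailing ?)
console.log(firstMatch(/o+/, "oooo"));         // oooo
console.log(firstMatch(/o+?/, "oooo"));        // o
```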
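For the grouping and backreference rows of the table above, JavaScript returns captured groups as elements 1, 2, … of the array produced by `exec` or `match`, which plays roughly the role of the VBScript SubMatches collection. The sketch below shows a capturing group, a backreference, and a non-capturing group. Its last lines assume the legacy octal-escape behaviour that JavaScript engines such as V8 keep for patterns without the `u` flag (Annex B of the ECMAScript specification); the variable names are illustrative only.

```javascript
// Capturing groups: index 0 holds the whole match, 1, 2, ... hold the groups.
const id = /(\d{2})-(\d{5})/.exec("ID: 12-34567");
console.log(id[0], id[1], id[2]); // 12-34567 12 34567

// Backreference: \1 must repeat exactly what group 1 captured.
const pair = /(.)\1/.exec("hello");
console.log(pair[0]); // ll

// Non-capturing group: (?:...) groups the alternation but adds no entry.
const nonCapturing = /industr(?:y|ies)/.exec("three industries");
console.log(nonCapturing[0], nonCapturing.length); // industries 1

// The same alternation in a capturing group adds one entry to the array.
const capturing = /industr(y|ies)/.exec("three industries");
console.log(capturing.length, capturing[1]); // 2 ies

// Legacy octal escape (non-Unicode pattern, no capturing groups before it):
// \101 is octal 101, i.e. character code 65, which is "A".
const octalMatchesA = /\101/.test("A");
console.log(octalMatchesA); // true
```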
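The lookahead rows of the table above can be demonstrated with the table's own Windows example. This is a small sketch; the combined global pattern at the end is an illustrative construction (not from the original) showing that a lookahead consumes no characters, so the text it inspects is still available to the next match.

```javascript
const positive = /Windows (?=95|98|NT|2000)/;
const negative = /Windows (?!95|98|NT|2000)/;

console.log(positive.test("Windows 2000")); // true
console.log(positive.test("Windows 3.1"));  // false
console.log(negative.test("Windows 3.1"));  // true
console.log(negative.test("Windows 2000")); // false

// The lookahead is not part of the match: only "Windows " is consumed.
console.log(positive.exec("Windows 2000")[0]); // "Windows "

// Because the lookahead consumed nothing, the next search resumes right
// after "Windows ", so the digits are still there for the \d+ branch.
const versionParts = "Windows 2000".match(/Windows (?=\d+)|\d+/g);
console.log(versionParts); // [ 'Windows ', '2000' ]
```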
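In JavaScript, the Multiline property described for ^ and $ in the table above corresponds to the `m` flag (readable as the `multiline` property of a RegExp object). A minimal sketch of the anchors with and without that flag, followed by the \b and \B examples; the sample text is an assumption chosen for illustration.

```javascript
const text = "first line\nsecond line";

// Without the m flag, ^ and $ match only at the start and end of the input.
console.log(/^second/.test(text));      // false
console.log(/line$/.exec(text).index);  // 18 (the final "line")

// With the m flag, ^ also matches just after "\n",
// and $ also matches just before "\n".
console.log(/^second/m.test(text));     // true
console.log(/line$/m.exec(text).index); // 6 (the "line" before "\n")

// \b: word boundary; \B: a position that is not a word boundary.
console.log(/er\b/.test("never")); // true
console.log(/er\b/.test("verb"));  // false
console.log(/er\B/.test("verb"));  // true
console.log(/er\B/.test("never")); // false
```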
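The character-class and escape rows of the table above can be exercised the same way. This sketch also checks the `[\s\S]` idiom for "any character including a newline" and shows why `[.\n]` does not work; the sample strings are arbitrary choices for illustration.

```javascript
// . does not match "\n"; [\s\S] matches any character at all.
console.log(/a.b/.test("a\nb"));      // false
console.log(/a[\s\S]b/.test("a\nb")); // true
// Inside brackets "." is literal, so [.\n] matches only "." or "\n".
console.log(/a[.\n]b/.test("axb"));   // false
console.log(/a[.\n]b/.test("a.b"));   // true

// Character sets and ranges
console.log("plain".match(/[abc]/)[0]);  // a
console.log("plain".match(/[^abc]/)[0]); // p
console.log(/^[a-z]+$/.test("regex"));   // true
console.log(/[^a-z]/.test("regex"));     // false

// Shorthand classes
console.log("Order 66".match(/\d+/)[0]);     // 66
console.log("snake_case-1".match(/\w+/)[0]); // snake_case
console.log("a\tb".replace(/\s/, "_"));      // a_b

// Hexadecimal, Unicode and control-character escapes
console.log(/\x41/.test("A"));        // true
console.log(/^\x041$/.test("\x041")); // true (\x04 followed by "1")
console.log(/\u00A9/.test("©"));      // true
console.log(/\cM/.test("\r"));        // true (Control-M is a carriage return)
```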

Order of precedence in regular expressions

Once a regular expression has been constructed, it is evaluated much like an arithmetic expression: from left to right, following an order of precedence.

The following table lists the precedence of the regular expression operators, from highest to lowest:

Operator	Description
\	Escape
(), (?:), (?=), []	Parentheses and square brackets
*, +, ?, {n}, {n,}, {n,m}	Quantifiers
^, $, \anymetacharacter	Anchors and sequence (concatenation)
|	Alternation ("or")
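The precedence table has two practical consequences worth checking: a quantifier binds tighter than a sequence of characters, and | binds loosest of all, so it splits the whole pattern unless a group limits it. A minimal JavaScript sketch; the test strings are arbitrary examples chosen for illustration.

```javascript
// Quantifiers bind tighter than concatenation: ab+ means a(b+), not (ab)+.
console.log(/^ab+$/.test("abbb"));   // true
console.log(/^ab+$/.test("abab"));   // false
console.log(/^(ab)+$/.test("abab")); // true

// | has the lowest precedence: ^cat|dog$ means (^cat) or (dog$).
console.log(/^cat|dog$/.test("catalog"));     // true (^cat matches)
console.log(/^cat|dog$/.test("hotdog"));      // true (dog$ matches)
// Grouping the alternation applies both anchors to it.
console.log(/^(?:cat|dog)$/.test("catalog")); // false
console.log(/^(?:cat|dog)$/.test("dog"));     // true
```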