A regular expression is a text pattern composed of ordinary characters (such as the letters a through z) and special characters (called metacharacters). The pattern describes one or more strings to match when searching a body of text. The regular expression acts as a template for matching a character pattern against the string being searched.
Here are some examples of regular expressions you might encounter:
| JScript | VBScript | Matches |
|---|---|---|
| `/^[ \t]*$/` | `^[ \t]*$` | A blank line. |
| `/\d{2}-\d{5}/` | `\d{2}-\d{5}` | An ID number consisting of a 2-digit number, a hyphen, and a 5-digit number. |
| `/<(.*)>.*<\/\1>/` | `<(.*)>.*<\/\1>` | An HTML element: an opening tag and its matching closing tag. |
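As a quick check, the three patterns in the table can be tried out in modern JavaScript, which shares the JScript literal syntax. This is only a sketch: the variable names and sample strings are illustrative assumptions, and the blank-line pattern is written with an unescaped character class, `[ \t]`.

```javascript
// The three table patterns as JavaScript regular expression literals.
// Variable names and sample strings are illustrative, not from the source.
const blankLine = /^[ \t]*$/;          // only spaces and tabs between start and end
const idNumber = /\d{2}-\d{5}/;        // two digits, a hyphen, five digits
const htmlElement = /<(.*)>.*<\/\1>/;  // \1 repeats whatever the first group captured

const results = {
  blankOnlyWhitespace: blankLine.test(" \t "),      // true
  blankWithText: blankLine.test("  x "),            // false
  idValid: idNumber.test("12-34567"),               // true
  idTooShort: idNumber.test("1-34567"),             // false
  elementMatched: htmlElement.test("<b>bold</b>"),  // true
  elementMismatch: htmlElement.test("<b>bold</i>"), // false
};

console.log(results);
```

Note that the ID pattern is not anchored, so it also succeeds when the ID appears inside a longer string. To validate a whole string, it would need `^` at the start and `$` at the end.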
The following table is a complete list of metacharacters and their behavior in the context of regular expressions:
| Character | Description |
|---|---|
| `\` | Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, `n` matches the character n, while `\n` matches a newline character. The sequence `\\` matches a literal backslash and `\(` matches a literal opening parenthesis. |
| `^` | Matches the position at the start of the input string. If the Multiline property of the RegExp object is set, `^` also matches the position after `\n` or `\r`. |
| `$` | Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, `$` also matches the position before `\n` or `\r`. |
| `*` | Matches the preceding subexpression zero or more times. For example, `zo*` matches z and zoo. `*` is equivalent to `{0,}`. |
| `+` | Matches the preceding subexpression one or more times. For example, `zo+` matches zo and zoo, but not z. `+` is equivalent to `{1,}`. |
| `?` | Matches the preceding subexpression zero or one time. For example, `do(es)?` matches do and does. `?` is equivalent to `{0,1}`. |
| `{n}` | n is a non-negative integer. Matches exactly n times. For example, `o{2}` does not match the o in Bob, but matches the two o's in food. |
| `{n,}` | n is a non-negative integer. Matches at least n times. For example, `o{2,}` does not match the o in Bob, but matches all the o's in foooood. `o{1,}` is equivalent to `o+`. `o{0,}` is equivalent to `o*`. |
| `{n,m}` | m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, `o{1,3}` matches the first three o's in fooooood. `o{0,1}` is equivalent to `o?`. Note that no space is allowed between the comma and the numbers. |
| `?` | When this character immediately follows any other quantifier (`*`, `+`, `?`, `{n}`, `{n,}`, `{n,m}`), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string oooo, `o+?` matches a single o, while `o+` matches all the o's. |
| `.` | Matches any single character except `\n`. To match any character including `\n`, use a pattern such as `[\s\S]`. |
| `(pattern)` | Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parenthesis characters, use `\(` or `\)`. |
| `(?:pattern)` | Matches pattern but does not capture the match; that is, it is a non-capturing match and is not stored for later use. This is useful for combining parts of a pattern with the "or" character (`\|`). For example, `industr(?:y\|ies)` is a more economical expression than `industry\|industries`. |
| `(?=pattern)` | Positive lookahead. Matches the search string at any point where a string matching pattern begins. This is a non-capturing match; that is, the match is not stored for later use. For example, `Windows (?=95\|98\|NT\|2000)` matches Windows in Windows 2000, but not Windows in Windows 3.1. Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead. |
| `(?!pattern)` | Negative lookahead. Matches the search string at any point where a string not matching pattern begins. This is a non-capturing match; that is, the match is not stored for later use. For example, `Windows (?!95\|98\|NT\|2000)` matches Windows in Windows 3.1, but not Windows in Windows 2000. Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead. |
| `x\|y` | Matches either x or y. For example, `z\|food` matches z or food. `(z\|f)ood` matches zood or food. |
| `[xyz]` | A character set. Matches any one of the enclosed characters. For example, `[abc]` matches the a in plain. |
| `[^xyz]` | A negative character set. Matches any character that is not enclosed. For example, `[^abc]` matches the p in plain. |
| `[a-z]` | A range of characters. Matches any character in the specified range. For example, `[a-z]` matches any lowercase alphabetic character in the range a through z. |
| `[^a-z]` | A negative range of characters. Matches any character that is not in the specified range. For example, `[^a-z]` matches any character that is not in the range a through z. |
| `\b` | Matches a word boundary, that is, the position between a word and a space. For example, `er\b` matches the er in never, but not the er in verb. |
| `\B` | Matches a non-word boundary. `er\B` matches the er in verb, but not the er in never. |
| `\cx` | Matches the control character indicated by x. For example, `\cM` matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal c character. |
| `\d` | Matches a digit character. Equivalent to `[0-9]`. |
| `\D` | Matches a non-digit character. Equivalent to `[^0-9]`. |
| `\f` | Matches a form-feed character. Equivalent to `\x0c` and `\cL`. |
| `\n` | Matches a newline character. Equivalent to `\x0a` and `\cJ`. |
| `\r` | Matches a carriage return character. Equivalent to `\x0d` and `\cM`. |
| `\s` | Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to `[ \f\n\r\t\v]`. |
| `\S` | Matches any non-whitespace character. Equivalent to `[^ \f\n\r\t\v]`. |
| `\t` | Matches a tab character. Equivalent to `\x09` and `\cI`. |
| `\v` | Matches a vertical tab character. Equivalent to `\x0b` and `\cK`. |
| `\w` | Matches any word character, including the underscore. Equivalent to `[A-Za-z0-9_]`. |
| `\W` | Matches any non-word character. Equivalent to `[^A-Za-z0-9_]`. |
| `\xn` | Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, `\x41` matches A. `\x041` is equivalent to `\x04` followed by the character 1. This allows ASCII codes to be used in regular expressions. |
| `\num` | Matches num, where num is a positive integer. A backreference to a captured match. For example, `(.)\1` matches two consecutive identical characters. |
| `\n` | Identifies either an octal escape value or a backreference; here n stands for a digit. If `\n` is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value. |
| `\nm` | Identifies either an octal escape value or a backreference. If `\nm` is preceded by at least nm captured subexpressions, nm is a backreference. If `\nm` is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and both n and m are octal digits (0-7), `\nm` matches the octal escape value nm. |
| `\nml` | Matches the octal escape value nml when n is an octal digit (0-3) and m and l are both octal digits (0-7). |
| `\un` | Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, `\u00A9` matches the copyright symbol (©). |
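Several of the behaviors in the table are easier to see by running them. The following is a minimal sketch in modern JavaScript, not JScript or VBScript: captures are read from the array returned by `String.prototype.match` instead of the SubMatches collection or the `$` properties, and all variable names are illustrative assumptions.

```javascript
// Greedy versus non-greedy quantifiers on the string "oooo".
const greedy = "oooo".match(/o+/)[0];   // "oooo"
const lazy = "oooo".match(/o+?/)[0];    // "o"

// Positive and negative lookahead: the version number is checked but not consumed.
const positive = /Windows (?=95|98|NT|2000)/;
const negative = /Windows (?!95|98|NT|2000)/;
const lookahead = [
  positive.test("Windows 2000"), // true
  positive.test("Windows 3.1"),  // false
  negative.test("Windows 3.1"),  // true
  negative.test("Windows 2000"), // false
];

// A capturing group with a backreference: two consecutive identical characters.
const doubled = "hello".match(/(.)\1/); // doubled[0] is "ll", doubled[1] is "l"

// A non-capturing group stores nothing, so the result holds only the full match.
const plural = "industries".match(/industr(?:y|ies)/); // plural.length is 1

// Word boundaries.
const boundaries = [
  /er\b/.test("never"), // true
  /er\b/.test("verb"),  // false
  /er\B/.test("verb"),  // true
  /er\B/.test("never"), // false
];

// The dot does not match a newline, but [\s\S] matches any character.
const dotSkipsNewline = /a.b/.test("a\nb");      // false
const classMatchesAny = /a[\s\S]b/.test("a\nb"); // true

console.log(greedy, lazy, lookahead, doubled[0], plural.length, boundaries);
```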
Once constructed, a regular expression is evaluated much like an arithmetic expression: from left to right, following an order of precedence.
The following table lists the regular expression operators in order of precedence, from highest to lowest:
| Operator | Description |
|---|---|
| `\` | Escape |
| `()`, `(?:)`, `(?=)`, `[]` | Parentheses and brackets |
| `*`, `+`, `?`, `{n}`, `{n,}`, `{n,m}` | Quantifiers |
| `^`, `$`, `\anymetacharacter` | Anchors and sequences |
| `\|` | Alternation ("or") |
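The effect of this ordering can be sketched with a few JavaScript comparisons. The sample strings and variable names are illustrative assumptions. Each pair contrasts a pattern that relies on the default precedence with one that uses parentheses to override it.

```javascript
// Alternation binds most loosely: z|food means "z" or "food", not "zood" or "food".
const looseAlt = "zood".match(/z|food/)[0];      // "z"
const groupedAlt = "zood".match(/(z|f)ood/)[0];  // "zood"

// A quantifier binds more tightly than a sequence: in ab*, the * applies to b only.
const quantOnChar = "ababab".match(/ab*/)[0];     // "ab"
const quantOnGroup = "ababab".match(/(ab)*/)[0];  // "ababab"

// Anchors bind more tightly than alternation: ^a|b$ means (^a) or (b$).
const anchoredLoose = /^a|b$/.test("ac");      // true, because the string starts with a
const anchoredGrouped = /^(a|b)$/.test("ac");  // false, the whole string must be a or b

console.log(looseAlt, groupedAlt, quantOnChar, quantOnGroup, anchoredLoose, anchoredGrouped);
```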