Creation of RegExp object:
Regular expressions can be created directly as literals, i.e. a pattern enclosed in slashes "/". However, when the pattern has to change at run time (for example, when it is built from parameters), the RegExp() constructor is the better choice:
var reg1 = /'\w+'/g;
var reg2 = new RegExp('\'\\w+\'','g');
Comparing the two forms: the first argument to RegExp() is the pattern given as a string. Because it is not a literal, it is not enclosed in slashes "/"; instead, the quotation mark "'" and the escape character "\" must be escaped a second time inside the string.
Whether you use a literal or the RegExp() constructor, a new RegExp object is created and assigned to the variable.
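As a minimal sketch of the equivalence described above, the literal and the constructor string can be compared through their source and flags properties. The sample text and the variable names are hypothetical and only serve the illustration.

```javascript
// The same pattern written as a literal and as a constructor string.
const literalRe = /'\w+'/g;
const ctorRe = new RegExp('\'\\w+\'', 'g');

// Both objects carry the same pattern source and the same flags.
const samePattern =
  literalRe.source === ctorRe.source && literalRe.flags === ctorRe.flags;

// Hypothetical sample text, used only to show that both behave alike.
const quoted = "say 'hello' then 'bye'";
const fromLiteral = quoted.match(literalRe);
const fromCtor = quoted.match(ctorRe);

console.log(samePattern, fromLiteral, fromCtor);
```

The double escaping is only needed because the string literal consumes one level of backslashes before RegExp() sees the pattern.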
Similarities and differences between match() and exec():
match and exec are the two common ways to match a string against a regular expression. They do similar things, with some subtle differences:
1. How to use
match is a method of the String wrapper object, usage: String.match(RegExp);
exec is a method of the regular expression object, usage: RegExp.exec(String);
2. Returned results
When the RegExp does not have the global flag "g" set:
Both return the same result: null when there is no match, and an array (call it array) when there is one. array[0] is the matched string, and array[1], array[2]... are the substrings matched by the parenthesized groups in the regular expression. The array also has two properties: array.index is the start position of the match, and array.input is the string that was searched.
When the RegExp has the global flag "g" set:
match returns an array when there is a match. Each item in the array is one of the matched strings, so the substrings matched by parentheses are no longer included. In this case the array has no index property and no input property.
exec behaves the same as it does without the global flag "g": in the returned array, array[0] is the current matched string and array[1], array[2]... are the substrings matched by the parenthesized groups in the current match. Here, pay attention to the lastIndex property of the RegExp object, which holds the position just after the end of the current match in the original string; the next exec call starts searching from there. When there are no further matches, exec returns null and lastIndex is reset to 0. You can therefore call exec in a loop to find every match.
A way to get multiple matches:
js code
var testStr = "now test001 test002";
var re = /test(\d+)/ig;
var r = "";
while(r = re.exec(testStr)) {
    alert(r[0] + " " + r[1]);
}
You can also use testStr.match(re), but to get the captured groups that way the regular expression must not have the g flag, and then only the first match is returned.
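The differences between match() and exec() listed above can be sketched side by side. This is an illustration only; the variable names are hypothetical and console.log stands in for alert.

```javascript
const sampleText = "now test001 test002";

// Without "g": match() and exec() return the same shape (first match plus groups).
const single = /test(\d+)/i;
const viaMatch = sampleText.match(single);
const viaExec = single.exec(sampleText);

// With "g": match() returns only the full matches, with no groups and no index.
const globalRe = /test(\d+)/ig;
const allMatches = sampleText.match(globalRe);

// With "g": exec() returns one match per call and advances lastIndex.
const steps = [];
let hit;
while ((hit = globalRe.exec(sampleText)) !== null) {
  steps.push({ text: hit[0], group: hit[1], index: hit.index, lastIndex: globalRe.lastIndex });
}

console.log(viaMatch, viaExec, allMatches, steps, globalRe.lastIndex);
```

After the loop ends, lastIndex is back at 0, so the same RegExp object can be reused for another search.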
1. Regular expression rules
1.1 Normal characters
Letters, digits, Chinese characters, underscores, and any punctuation not given a special meaning in the following sections are all "ordinary characters". An ordinary character in an expression matches the identical character in the string.
Example 1: When the expression "c" is matched against the string "abcde", the match result is: success; the matched content is: "c"; the matched position is: start at 2 and end at 3. (Note: whether subscripts start from 0 or 1 may vary with the programming language.)
Example 2: When the expression "bcd" is matched against the string "abcde", the match result is: success; the matched content is: "bcd"; the matched position is: start at 1 and end at 4.
1.2 Simple escape characters
Some characters that are inconvenient to write directly are written with a "\" in front. We are already familiar with these:
Expression | Matches |
\r, \n | Carriage return and line feed |
\t | Tab |
\\ | The "\" character itself |
Other punctuation marks have special uses that are described in the following sections. With a "\" in front, they stand for the symbol itself. For example, ^ and $ have special meanings; to match the "^" and "$" characters in a string, the expression must be written as "\^" and "\$".
Expression | Matches |
\^ | The ^ symbol itself |
\$ | The $ symbol itself |
\. | The decimal point (.) itself |
These escaped characters match in the same way as "ordinary characters": each one matches the identical character.
Example 1: When the expression "\$d" is matched against the string "abc$de", the match result is: success; the matched content is: "$d"; the matched position is: start at 3 and end at 5.
1.3 Expressions that can match 'multiple characters'
Some notations in regular expressions match any one of several characters. For example, the expression "\d" matches any digit. Although it can match any character in its set, it matches only one character, not several. This is like the jokers in a deck of cards: a joker can stand in for any card, but only for one card.
Expression | Matches |
\d | Any digit, any one of 0~9 |
\w | Any letter, digit or underscore, that is, any one of A~Z, a~z, 0~9, _ |
\s | Any one whitespace character, including space, tab, form feed, etc. |
. | The decimal point matches any character except the newline character (\n) |
Example 1: When the expression "\d\d" is matched against "abc123", the match result is: success; the matched content is: "12"; the matched position is: start at 3 and end at 5.
Example 2: When the expression "a.\d" is matched against "aaa100", the match result is: success; the matched content is: "aa1"; the matched position is: start at 1 and end at 4.
1.4 Customize expressions that can match 'multiple characters'
Square brackets [ ] enclose a set of characters and match any one of them. [^ ] encloses a set of characters and matches any one character that is not in the set. As before, although it can match any one of them, it matches only one character, not several.
Expression | Matches |
[ab5@] | Matches "a" or "b" or "5" or "@" |
[^abc] | Matches any character other than "a", "b", "c" |
[f-k] | Matches any letter between "f"~"k" |
[^A-F0-3] | Matches any character other than "A"~"F", "0"~"3" |
Example 1: When the expression "[bcd][bcd]" is matched against "abc123", the match result is: success; the matched content is: "bc"; the matched position is: start at 1 and end at 3.
Example 2: When the expression "[^abc]" is matched against "abc123", the match result is: success; the matched content is: "1"; the matched position is: start at 3 and end at 4.
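The "start" and "end" positions quoted in the examples of sections 1.1 to 1.4 can be checked with a small helper. This is a sketch that assumes 0-based positions with an exclusive end; firstMatch is a hypothetical name, not a built-in function.

```javascript
// Hypothetical helper: report the first match with a 0-based start and an exclusive end.
function firstMatch(pattern, text) {
  const m = pattern.exec(text);
  return m === null ? null : { text: m[0], start: m.index, end: m.index + m[0].length };
}

const ex11 = firstMatch(/bcd/, "abcde");      // ordinary characters
const ex12 = firstMatch(/\$d/, "abc$de");     // escaped "$"
const ex13 = firstMatch(/a.\d/, "aaa100");    // "." and "\d"
const ex14 = firstMatch(/[^abc]/, "abc123");  // negated character set

console.log(ex11, ex12, ex13, ex14);
```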
1.5 Special symbols that modify the number of matches
The expressions in the previous sections, whether they match one specific character or any one of several, match only once. By adding a special symbol that specifies the number of matches after an expression, you can repeat it without writing the expression out several times.
Usage: the "count modifier" is placed after the "expression being modified". For example, "[bcd][bcd]" can be written as "[bcd]{2}".
Expression | Effect |
{n} | The expression is repeated n times, for example: "\w{2}" is equivalent to "\w\w"; "a{5}" is equivalent to "aaaaa" |
{m,n} | The expression is repeated at least m times and at most n times, for example: "ba{1,3}" can match "ba" or "baa" or "baaa" |
{m,} | The expression is repeated at least m times, for example: "\w\d{2,}" can match "a12", "_456", "M12344"... |
? | The expression is matched 0 or 1 times, equivalent to {0,1}, for example: "a[cd]?" can match "a", "ac", "ad" |
+ | The expression appears at least once, equivalent to {1,}, for example: "a+b" can match "ab", "aab", "aaab"... |
* | The expression does not appear or appears any number of times, equivalent to {0,}, for example: "\^*b" can match "b", "^^^b"... |
Example 1: When the expression "\d+\.?\d*" is matched against "It costs $12.5", the match result is: success; the matched content is: "12.5"; the matched position is: start at 10 and end at 14.
Example 2: When the expression "go{2,8}gle" is matched against "Ads by gooooogle", the match result is: success; the matched content is: "gooooogle"; the matched position is: start at 7 and end at 16.
1.6 Some other special symbols representing abstract meaning
Some symbols carry an abstract, special meaning in expressions:
Expression | Effect |
^ | Matches the position where the string starts; matches no characters |
$ | Matches the position where the string ends; matches no characters |
\b | Matches a word boundary, that is, the position between a word and a space; matches no characters |
A text description alone is still rather abstract, so here are some examples.
Example 1: When the expression "^aaa" is matched against "xxx aaa xxx", the match result is: failure. Because "^" requires the match to be at the start of the string, "^aaa" can match only when "aaa" is at the beginning of the string, for example: "aaa xxx xxx".
Example 2: When the expression "aaa$" is matched against "xxx aaa xxx", the match result is: failure. Because "$" requires the match to be at the end of the string, "aaa$" can match only when "aaa" is at the end of the string, for example: "xxx xxx aaa".
Example 3: When the expression ".\b." is matched against "@@@abc", the match result is: success; the matched content is: "@a"; the matched position is: start at 2 and end at 4.
Further explanation: "\b" is similar to "^" and "$": it matches no characters itself, but it requires that, at its position in the match result, one side is in the "\w" range and the other side is not.
Example 4: When the expression "\bend\b" is matched against "weekend,endfor,end", the match result is: success; the matched content is: "end"; the matched position is: start at 15 and end at 18.
Some symbols can affect the relationship between subexpressions inside an expression:
Expression | Effect |
| | An "or" relationship between the expressions on its left and right; matches either the left side or the right side |
( ) | (1) When a count modifier is applied, the expression in brackets is modified as a whole. (2) When reading the match result, the content matched by the expression in brackets can be obtained separately |
Example 5: When the expression "Tom|Jack" is matched against the string "I'm Tom, he is Jack", the match result is: success; the matched content is: "Tom"; the matched position is: start at 4 and end at 7. When matching the next one, the match result is: success; the matched content is: "Jack"; the matched position is: start at 15 and end at 19.
Example 6: When the expression "(go\s*)+" is matched against "Let's go go go!", the match result is: success; the matched content is: "go go go"; the matched position is: start at 6 and end at 14.
Example 7: When the expression "¥(\d+\.?\d*)" is matched against "$10.9,¥20.5", the match result is: success; the matched content is: "¥20.5"; the matched position is: start at 6 and end at 11. The content matched by the bracketed part, obtained separately, is: "20.5".
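Examples 4 to 6 above can be reproduced with exec(). The sketch below is illustrative; the variable names are hypothetical.

```javascript
// "\b" asserts a position between a "\w" character and a non-"\w" character.
const boundary = /\bend\b/.exec("weekend,endfor,end");

// "|" matches either side; the "g" flag lets exec() continue to the next match.
const names = /Tom|Jack/g;
const found = [];
let nameHit;
while ((nameHit = names.exec("I'm Tom, he is Jack")) !== null) {
  found.push([nameHit[0], nameHit.index]);
}

// "( )" lets "+" apply to the whole group and also captures a substring.
const repeated = /(go\s*)+/.exec("Let's go go go!");

console.log(boundary.index, found, repeated[0], repeated.index, repeated[1]);
```

Note that a repeated group keeps only what its last repetition matched, so repeated[1] holds the final "go" rather than the whole run.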
2. Some advanced rules in regular expressions
2.1 Greedy and non-greedy match counts
Among the special symbols that modify the number of matches, several allow the same expression to match a varying number of times, such as: "{m,n}", "{m,}", "?", "*", "+". The actual number of matches depends on the string being matched. An expression with an indeterminate repeat count always matches as many times as possible during matching. For example, for the text "dxxxdxxxd":
Expression | Match result |
(d)(\w+) | "\w+" matches all the characters after the first "d": "xxxdxxxd" |
(d)(\w+)(d) | "\w+" matches all the characters between the first "d" and the last "d": "xxxdxxx". Although "\w+" could also match the last "d", it "gives up" that last "d" so that the whole expression can match. |
As you can see, "\w+" always matches as many characters that satisfy its rule as it can. In the second example it does not match the last "d", but only so that the whole expression can succeed. Similarly, expressions with "*" and "{m,n}" match as many times as possible, and an expression with "?" will "match" whenever it has the choice of matching or not. This matching principle is called "greedy" mode.
Non-greedy mode:
Adding a "?" after a count modifier makes an expression with a variable number of matches match as few times as possible, so that an expression that may either match or not will "not match" wherever possible. This matching principle is called "non-greedy" mode, also called "reluctant" mode. If matching fewer times would cause the whole expression to fail, then, as in greedy mode, non-greedy mode matches a little more, just the minimum needed for the whole expression to succeed. For example, for the text "dxxxdxxxd":
Expression | Match result |
(d)(\w+?) | "\w+?" matches as few characters after the first "d" as possible; the result is that "\w+?" matches only one "x" |
(d)(\w+?)(d) | For the whole expression to succeed, "\w+?" has to match "xxx" so that the following "d" can match. The result is that "\w+?" matches "xxx" |
For more cases, see the following:
Example 1: When the expression "<td>(.*)</td>" is matched against the string "<td><p>aa</p></td> <td><p>bb</p></td>", the match result is: success; the matched content is the entire string "<td><p>aa</p></td> <td><p>bb</p></td>", and the "</td>" in the expression matches the last "</td>" in the string.
Example 2: In contrast, when the expression "<td>(.*?)</td>" is matched against the same string as in Example 1, you get only "<td><p>aa</p></td>". When you match the next one, you get the second one, "<td><p>bb</p></td>".
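The greedy and non-greedy results described in this section can be compared directly. This is a sketch using the section's own sample strings; the variable names are hypothetical, and "/" is written as "\/" because it sits inside a JavaScript regex literal.

```javascript
const greedySample = "dxxxdxxxd";
const greedy = /(d)(\w+)(d)/.exec(greedySample);  // "\w+" takes as much as it can
const lazy = /(d)(\w+?)(d)/.exec(greedySample);   // "\w+?" takes as little as it can

const cells = "<td><p>aa</p></td> <td><p>bb</p></td>";
const greedyCell = /<td>(.*)<\/td>/.exec(cells)[1];

// The non-greedy version stops at the nearest "</td>", so a loop finds both cells.
const lazyCells = [];
const lazyRe = /<td>(.*?)<\/td>/g;
let cell;
while ((cell = lazyRe.exec(cells)) !== null) {
  lazyCells.push(cell[1]);
}

console.log(greedy[2], lazy[2], greedyCell, lazyCells);
```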
2.2 Backreferences \1, \2...
When an expression matches, the engine records the string matched by each subexpression enclosed in brackets "( )". When reading the match result, the string matched by a bracketed subexpression can be obtained separately. This has been shown several times in the previous examples. In practice, when you search using some delimiter but the content you want does not include the delimiter, you must use brackets to mark the range you want, as in the earlier "<td>(.*?)</td>".
In fact, "the string matched by a bracketed subexpression" can be used not only after the match is over, but also during matching. A later part of the expression can refer to an earlier "sub-match that has already been matched" in brackets. The reference is written as "\" followed by a number. "\1" refers to the string matched by the first pair of brackets, "\2" refers to the string matched by the second pair, and so on. If one pair of brackets contains another pair, the outer pair is numbered first. In other words, whichever pair has its left bracket "(" earlier is numbered first.
As an example:
Example 1: When the expression "('|")(.*?)(\1)" is matched against " 'Hello', "World" ", the match result is: success; the matched content is: " 'Hello' ". When you match the next one, you get " "World" ".
Example 2: When the expression "(\w)\1{4,}" is matched against "aa bbbb abcdefg ccccc 111121111 999999999", the match result is: success; the matched content is "ccccc". When you match the next one, you get 999999999. This expression requires a character in the "\w" range to be repeated at least 5 times; note the difference from "\w{5,}".
Example 3: The expression "<(\w+)\s*(\w+(=('|").*?\4)?\s*)*>.*?</\1>" matches "<td id='td1' style="bgcolor:white"></td>". If "<td>" is not paired with "</td>", the match fails; if both are changed to another matching pair of tags, the match also succeeds.
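The backreference behaviour above can be sketched as follows, including the contrast with "\w{5,}" that Example 2 points to. The variable names are hypothetical.

```javascript
// "\1" must repeat whatever the first group matched, so the quotes have to pair up.
const quoteRe = /('|")(.*?)\1/g;
const quotedParts = ` 'Hello', "World" `.match(quoteRe);

// "(\w)\1{4,}": one word character followed by at least 4 more copies of itself.
const runText = "aa bbbb abcdefg ccccc 111121111 999999999";
const runs = runText.match(/(\w)\1{4,}/g);

// By contrast, "\w{5,}" accepts any 5 or more word characters, repeated or not.
const anyFive = runText.match(/\w{5,}/g);

console.log(quotedParts, runs, anyFive);
```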
2.3 Forward pre-search (lookahead) and reverse pre-search (lookbehind), neither of which matches characters
The previous chapter covered several special symbols with abstract meanings: "^", "$", "\b". They have one thing in common: they match no characters themselves, but only attach a condition to "the two ends of the string" or to "a gap between characters". With that concept in mind, this section introduces another, more flexible way of attaching conditions to the "two ends" or to a "gap".
Forward pre-search (lookahead): "(?=xxxxx)", "(?!xxxxx)"
Format: "(?=xxxxx)". In the string being matched, it attaches a condition to the "gap" or "end" where it sits: the text to the right of the gap must be able to match the expression xxxxx. Because it is only a condition on the gap, it does not stop the following expressions from actually matching the characters after the gap. This is similar to "\b", which matches no characters itself: "\b" only inspects the characters before and after the gap and does not affect what the following expressions actually match.
Example 1: When the expression "Windows (?=NT|XP)" is matched against "Windows 98, Windows NT, Windows 2000", it matches only the "Windows " in "Windows NT"; the other occurrences of "Windows " are not matched.
Example 2: When the expression "(\w)((?=\1\1\1)(\1))+" is matched against the string "aaa ffffff 999999999", it matches the first 4 of the 6 "f" and the first 7 of the 9 "9". The expression can be read as: for a run of 4 or more repetitions of the same letter or digit, match the part before the last 2 characters. Of course, the expression need not be written this way; it is shown here for demonstration purposes.
Format: "(?!xxxxx)". The text to the right of the gap must not match the expression xxxxx.
Example 3: When the expression "((?!\bstop\b).)+" is matched against "fdjka ljfdl stop fjdsla fdj", it matches from the beginning up to the position before "stop". If there is no "stop" in the string, the entire string is matched.
Example 4: When the expression "do(?!\w)" is matched against the string "done, do, dog", it matches only the standalone "do". In this example, following "do" with "(?!\w)" has the same effect as following it with "\b".
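Examples 1, 3 and 4 above can be checked in JavaScript, which supports both forward forms. The sketch below uses the section's sample strings; the variable names are hypothetical.

```javascript
// Positive form: "Windows " is matched only where "NT" or "XP" follows.
const winText = "Windows 98, Windows NT, Windows 2000";
const winHit = /Windows (?=NT|XP)/.exec(winText);

// Negative form: "do" is matched only where no word character follows.
const doText = "done, do, dog";
const doLook = /do(?!\w)/.exec(doText);
const doBoundary = /do\b/.exec(doText);

// Consume characters one at a time, stopping before the standalone word "stop".
const beforeStop = /((?!\bstop\b).)+/.exec("fdjka ljfdl stop fjdsla fdj")[0];

console.log(winHit.index, JSON.stringify(winHit[0]), doLook.index, doBoundary.index, JSON.stringify(beforeStop));
```

The matched text in winHit is only "Windows " with its trailing space; "NT" is inspected but not consumed.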
Reverse pre-search (lookbehind): "(?<=xxxxx)", "(?<!xxxxx)"
The concept of these two formats is similar to forward pre-search, except that reverse pre-search places its requirement on the "left side" of the gap: the text to the left must be able to match, or must not be able to match, the specified expression, rather than the text to the right. As with forward pre-search, both are conditions attached to the gap where they sit and match no characters themselves.
Example 5: When the expression "(?<=\d{4})\d+(?=\d{4})" is matched against "1234567890123456", it matches the middle 8 digits, excluding the first 4 and the last 4. Since JScript.RegExp does not support reverse pre-search, this example cannot be demonstrated in this article. Many other engines do support it, such as the java.util.regex package in Java 1.4 and above, the System.Text.RegularExpressions namespace in .NET, and the DEELX regular expression engine, which this site recommends as the simplest and easiest to use.
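JScript.RegExp lacks reverse pre-search, as noted above, but the "(?<=...)" and "(?<!...)" syntax was added to JavaScript in ES2018. Assuming a runtime that implements it (an assumption about your environment, not something the original article covers), Example 5 can be sketched as:

```javascript
// Assumes an ES2018+ engine; older engines reject "(?<=" with a SyntaxError.
const middleDigits = /(?<=\d{4})\d+(?=\d{4})/.exec("1234567890123456");

console.log(middleDigits[0], middleDigits.index);
```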
3. Other common rules
There are also some rules that are more common among various regular expression engines, which were not mentioned in the previous explanation.
3.1 In expressions, "\xXX" and "\uXXXX" can be used to represent a character ("X" stands for a hexadecimal digit)
Form | Character range |
\xXX | Characters with a code in the range 0 to 255, for example: a space can be written as "\x20" |
\uXXXX | Any character can be written as "\u" followed by its code as a 4-digit hexadecimal number, for example: "\u4E2D" (the character 中) |
3.2 The expressions "\s", "\d", "\w", "\b" have special meanings, and the corresponding capital letters have the opposite meanings
Expression | Matches |
\S | Matches any non-whitespace character ("\s" matches a single whitespace character) |
\D | Matches any non-digit character |
\W | Matches any character other than letters, digits, and underscores |
\B | Matches a non-word boundary, that is, a gap between characters where both sides are in the "\w" range or neither side is |
3.3 Summary of characters that have a special meaning in expressions and need a "\" in front to match the character itself
Character | Description |
^ | Matches the start position of the input string. To match the "^" character itself, use "\^" |
$ | Matches the end position of the input string. To match the "$" character itself, use "\$" |
( ) | Marks the start and end of a subexpression. To match brackets, use "\(" and "\)" |
[ ] | Defines a custom expression that can match 'multiple characters'. To match square brackets, use "\[" and "\]" |
{ } | Symbols that modify the number of matches. To match braces, use "\{" and "\}" |
. | Matches any character except the newline (\n). To match the decimal point itself, use "\." |
? | Modifies the number of matches to 0 or 1. To match the "?" character itself, use "\?" |
+ | Modifies the number of matches to at least 1. To match the "+" character itself, use "\+" |
* | Modifies the number of matches to 0 or any number. To match the "*" character itself, use "\*" |
| | An "or" relationship between the expressions on its left and right. To match "|" itself, use "\|" |
3.4 For a subexpression in brackets "( )", if you do not want the match result to be recorded for later use, you can use the "(?:xxxxx)" format
Example 1: When the expression "(?:(\w)\1)+" is matched against "a bbccdd efg", the result is "bbccdd". The match of the "(?:)" bracket range is not recorded, so "\1" refers to "(\w)".
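A short sketch of Example 1 shows that only the inner "(\w)" occupies a slot in the result array. The variable name is hypothetical.

```javascript
// Only "(\w)" captures; the outer "(?:...)" groups without recording anything.
const pairs = /(?:(\w)\1)+/.exec("a bbccdd efg");

// The result has the full match plus exactly one captured group.
console.log(pairs[0], pairs.length, pairs[1]);
```

Because the group is repeated, pairs[1] holds the character from the last repetition.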
3.5 Introduction to commonly used expression property settings: Ignorecase, Singleline, Multiline, Global
Expression property | Description |
Ignorecase | By default, letters in expressions are case sensitive. Setting Ignorecase makes matching case-insensitive. Some expression engines extend the concept of "case" to the whole UNICODE range. |
Singleline | By default, the decimal point "." matches any character except the newline (\n). Setting Singleline makes the decimal point match all characters, including newlines. |
Multiline | By default, the expressions "^" and "$" match only the start ① and end ④ positions of the string, as in: ①xxxxxxxxx②\n③xxxxxxxxx④ With Multiline set, "^" matches ① and also position ③ at the start of the next line, and "$" matches ④ and also position ② before the newline, at the end of a line. |
Global | Mainly takes effect when an expression is used for replacement; setting Global replaces all matches. |
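In JavaScript these four settings correspond to the flags i, s, m and g. The sketch below assumes an engine that supports the "s" flag (ES2018 and later); the sample text and variable names are hypothetical.

```javascript
const twoLines = "first line\nSecond line";

const caseHit = /second/i.test(twoLines);                  // Ignorecase
const dotStopsAtNewline = /first.*Second/.test(twoLines);  // "." does not cross "\n"
const dotAll = /first.*Second/s.test(twoLines);            // Singleline ("s" flag)
const lineStarts = twoLines.match(/^\w+/gm);               // Multiline plus Global
const replacedAll = twoLines.replace(/line/g, "row");      // Global replacement

console.log(caseHit, dotStopsAtNewline, dotAll, lineStarts, JSON.stringify(replacedAll));
```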
4. Other tips
4.1 To learn about the more complex syntax that advanced regular expression engines support, see the documentation of the DEELX regular expression engine on this site.
4.2 If you want the expression to match the entire string, rather than find a part of the string, put "^" and "$" at the beginning and end of the expression, for example: "^\d+$" requires that the entire string consist only of digits.
4.3 If the content to match must be a complete word and not part of a word, put "\b" at the beginning and end of the expression, for example: use "\b(if|while|else|void|int...)\b" to match keywords in a program.
4.4 Do not let an expression match the empty string. Otherwise the match always succeeds while matching nothing. For example, when writing an expression to match "123", "123.", "123.5", ".5" and so on, the integer part, the decimal point and the fractional part can each be omitted, but do not write the expression as "\d*\.?\d*", because it also matches successfully when there is nothing at all. A better way to write it is: "\d+\.?\d*|\.\d+".
4.5 Do not let a sub-match that can match the empty string repeat without limit. If every part of the subexpression inside the brackets can match 0 times, and the brackets as a whole can repeat an unlimited number of times, the situation may be more serious than in the previous item: the matching process may loop forever. Although some regular expression engines, such as .NET regular expressions, now avoid the endless loop in this situation, we should still try to avoid it. If you run into an endless loop when writing an expression, check whether this is the cause.
4.6 Choose between greedy and non-greedy mode sensibly; see the topic discussion.
4.7 For the two sides of "|", it is best that any given text can match only one side, so that the result does not change when the expressions on either side of "|" are swapped.
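Tip 4.4 can be illustrated by running both number expressions against text that contains no number at all. This is a sketch; the variable names and the "abc" sample are hypothetical.

```javascript
const loose = /\d*\.?\d*/;
const strict = /\d+\.?\d*|\.\d+/;

// The loose pattern "succeeds" on text with no number in it, matching nothing.
const looseOnWords = loose.exec("abc");

// The stricter pattern still accepts every intended form and rejects plain words.
const inputs = ["123", "123.", "123.5", ".5"];
const strictResults = inputs.map((s) => strict.exec(s)[0]);
const strictOnWords = strict.exec("abc");

console.log(JSON.stringify(looseOnWords[0]), strictResults, strictOnWords);
```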
Next article ----------------------------------------
1. Define regular expressions
1) There are two ways to define a regular expression: the ordinary (literal) form and the constructor form.
2) Ordinary form: var reg=/expression/additional parameters
Expression: a string that represents a rule. Certain special characters can be used in it to represent special rules, which are explained in detail later.
Additional parameters: flags that extend the meaning of the expression. There are currently three main ones:
g: global matching.
i: case-insensitive matching.
m: multi-line matching.
The three parameters can be combined freely to express a compound meaning, and of course they can also be omitted entirely.
example:
var reg=/a*b/;
var reg=/abc+f/g;
3) Constructor form: var reg=new RegExp("expression", "additional parameters");
"expression" and "additional parameters" have the same meaning as in the ordinary form above.
example:
var reg=new RegExp("a*b");
var reg=new RegExp("abc+f","g");
4) The difference between the ordinary form and the constructor form
In the ordinary form the expression must be a constant, while in the constructor form it can be a constant string or a js variable, for example an expression built from user input:
var reg=new RegExp(document.forms[0].exprfiled.value,"g");
2. Expression pattern
1) The expression pattern is the way an expression is written, that is, how the "expression" in var reg=/expression/additional parameters is described.
2) Broadly, expression patterns are divided into simple patterns and compound patterns.
3) Simple pattern: a pattern written as a combination of ordinary characters, for example
var reg=/abc0d/;
As you can see, a simple pattern can only express one specific match.
4) Compound pattern: a pattern written with wildcard characters, for example:
var reg=/a+b?\w/;
Here +, ? and \w are all wildcards with special meanings, so a compound pattern can express more abstract logic.
The rest of this section covers the meaning and use of each wildcard in compound patterns.
5) Explanation of special characters in compound patterns:
1>\: It is used as an escape character in many programming languages. Generally speaking:
If \ is followed by an ordinary character c, then \c has a special meaning. For example, n normally stands for the character n, but \n stands for a newline.
If \ is followed by a special character c, then \c stands for the ordinary character c. For example, \ is normally the escape character, but \\ stands for the ordinary character \.
\ is used the same way in Javascript regular expressions, although the table of special characters may differ between programming languages.
2>^: Matches the beginning of the input string. In multi-line matching, that is, when the additional parameters of the expression include m, it also matches right after a newline.
example:
/^B/ matches the first B in "Bab Bc"
Example 2:
/^B/gm matches, in
“Badd B
cdaf
B dsfB”
the first B of the first line and the first B of the third line.
3>$: Matches the end of the input string. In multi-line matching, that is, when the additional parameters of the expression include m, it also matches right before a newline.
Its usage is the opposite of ^.
Example: /t$/ matches t in "bat", but does not match t in "hate"
Example 2: /t$/gm matches, in
“tag at
bat”
the last t of the first line and the t of the second line.
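The multi-line examples for ^ and $ can be sketched with the same sample strings, written with "\n" for the line breaks. The variable names are hypothetical.

```javascript
const bLines = "Badd B\ncdaf\nB dsfB";

// With "m", "^" also matches right after each newline.
const lineStartBs = [];
const startRe = /^B/gm;
let bHit;
while ((bHit = startRe.exec(bLines)) !== null) {
  lineStartBs.push(bHit.index);
}

const tLines = "tag at\nbat";
const withoutM = tLines.match(/t$/g);  // only the very end of the input
const withM = tLines.match(/t$/gm);    // the end of every line

console.log(lineStartBs, withoutM, withM);
```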
4>*: Match the previous character 0 or more times.
Example: /ab*/ matches "abbbb" in "dddabbbbc", and also matches "a" in "ddda"
5>+: Matches the previous character 1 or more times.
Example: /ab+/ matches "abbbb" in "dddabbbbc", but does not match "ddda"
This is similar to the {1,} form described later (general form: {n,})
6>?: The usage of ? is rather special. Ordinarily it matches the previous character 0 or 1 times, but it has two other special uses:
When it immediately follows *, +, ? or { }, it makes that quantifier match as few times as possible, for example:
/ba*/ matches "baaaa" in "baaaa", but /ba*?/ matches only "b" (because * means 0 or more times, and the added ? asks for the minimum number of matches, that is, 0).
Similarly: /ba+?/ matches "ba" in "baaaa".
As a syntax marker, it is used in lookahead assertions, that is, the x(?=y) and x(?!y) forms described later.
7>.: The "." (decimal point) matches any single character except the newline character.
For the full set of characters defined by the standard, see: Character Set
For example: /a.b/ matches "acb" in "acbaa", but does not match "ab".
8>(x): Matches x (x here is not the character x or any single character; it stands for a string), and the match is remembered. In the syntax, this kind of () is called "capturing parentheses", that is, brackets used for capturing.
Matches are remembered because some of the functions provided for expressions, such as the exec() function, return an array that holds all the matched strings.
Also note that x in () is remembered only if x actually matches.
Example 1:
var regx=/a(b)c/;
var rs=regx.exec("abcddd");
As can be seen from the above, /a(b)c/ matches "abc" in "abcddd". Because of the (), b is also recorded, so the content of the array returned in rs is:
{abc,b}
Example 2:
var regx=/a(b)c/;
var rs=regx.exec("acbcddd");
rs is null, because /a(b)c/ does not match "acbcddd", so the b in () is not recorded (although the string contains b)
9>(?:x): Match x, but don't remember x. () in this format is called "non-capturing parentses", that is, brackets for non-capturing.
example:
var regx=/a(?:b)c/;
var rs=regx.exec("abcddd");
As can be seen from the above, /a(?:b)c/ matches "abc" in "abcddd". Because of the (?:), b is not recorded, so the content of the array returned in rs is:
{abc}
10>x(?=y): Matches x only if it is immediately followed by y. On a match, only x is remembered; y is not.
example:
var regx=/user(?=name)/;
var rs=regx.exec("The username is Mary");
Result: The match is successful, and the value of rs is {user}
11>x(?!y): Matches x only if it is not immediately followed by y. On a match, only x is remembered; y is not.
example:
var regx=/user(?!name)/;
var rs=regx.exec("The user name is Mary");
Result: The match is successful, and the value of rs is {user}
Example 2:
var regx=/\d+(?!\.)/;
var rs=regx.exec("54.235");
Result: The match is successful, and the value of rs is {5}. 54 is not matched because it is followed by the "." sign. 235 also satisfies the pattern, but because of how the exec method behaves, it is not returned.
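The examples for items 10 and 11 can be run as follows. Adding the g flag and looping is one way to see the later "235" match that a single exec call does not return. This is a sketch, and the variable names are hypothetical.

```javascript
const followedByName = /user(?=name)/.exec("The username is Mary");
const notFollowedByName = /user(?!name)/.exec("The user name is Mary");
const rejected = /user(?!name)/.exec("The username is Mary");

// With "g", repeated exec() calls walk through every match in "54.235".
const numberRe = /\d+(?!\.)/g;
const pieces = [];
let piece;
while ((piece = numberRe.exec("54.235")) !== null) {
  pieces.push(piece[0]);
}

console.log(followedByName[0], notFollowedByName[0], rejected, pieces);
```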
12>x|y: Match x or y. Note that if both x and y match, then remember only x.
example:
var regx=/beijing|shanghai/;
var rs=regx.exec("I love beijing and shanghai");
Result: The match is successful, the value of rs is {beijing}. Although shanghai also matches, it will not be remembered.
13>{n}: Matches exactly n occurrences of the previous character.
n should be a non-negative integer. If it is a negative number or a decimal, no syntax error is reported, but the braces are then treated as literal characters instead of a quantifier.
example:
var regx=/ab{2}c/;
var rs=regx.exec("abbcd");
Result: The match is successful, and the value of rs is: {abbc}.
14>{n,}: Match at least n occurrences of the previous character.
example:
var regx=/ab{2,}c/;
var rs=regx.exec("abbcdabbbc");
Result: The match is successful, and the value of rs is: {abbc}. Note that abbbc also meets the condition but is not remembered. This is due to the behavior of the exec method, which is explained later.
15>{n,m}: Matches at least n and at most m occurrences of the previous character.
As long as n and m are non-negative integers and m>=n, no syntax error is reported.
example:
var regx=/ab{2,5}c/;
var rs=regx.exec("abbbcd");
Result: The match is successful, the value of rs is: {abbbc}.
Example 2:
var regx=/ab{2,2}c/;
var rs=regx.exec("abbcd");
Result: The match is successful, and the value of rs is: {abbc}.
Example 3:
var regx=/ab{2,5}/;
var rs=regx.exec("abbbbbbbbbbbbb");
Result: The match is successful, and the value of rs is: {abbbbb}. This shows that if the previous character appears more than m times, only m occurrences are matched. In addition:
var regx=/ab{2,5}c/;
var rs=regx.exec("abbbbbbbbbbbc");
Result: The match fails, and the value of rs is: null. Why does the match fail? Because there are more than 5 b's, b{2,5} matches at most the first 5 of them. The expression /ab{2,5}c/ then requires a c, but the character after those 5 b's in the string is still b, so the match fails.
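The three quantifier forms can be compared on one input. The sketch below uses a shorter test string than the examples above (six b's, chosen here for readability).

```javascript
const quantified = "abbbbbbc"; // one "a", six "b", one "c"

console.log(/ab{2,5}/.exec(quantified)[0]); // abbbbb (stops after five b's)
console.log(/ab{2,5}c/.exec(quantified));   // null: a sixth b follows, not "c"
console.log(/ab{2,}c/.exec(quantified)[0]); // abbbbbbc (no upper limit)
console.log(/ab{6}c/.test(quantified));     // true (exactly six b's)
```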
16>[xyz]: A character set. The pattern matches any one of the characters listed in []. A range can be written with a hyphen, so [xyz] is equivalent to [x-z].
example:
var regx=/a[bc]d/;
var rs=regx.exec("abddgg");
Result: The match is successful, the value of rs is: {abd}
Example 2:
var regx=/a[bc]d/;
var rs=regx.exec("abcd");
Result: The match fails, and the value of rs is: null. The reason is that [bc] matches exactly one character, either b or c, never both in sequence.
17>[^xyz]: A negated character set. The pattern matches any one character that is not listed in []. As above, [^xyz] is equivalent to [^x-z].
example:
var regx=/a[^bc]d/;
var rs=regx.exec("afddgg");
Result: The match is successful, the value of rs is: {afd}
Example 2:
var regx=/a[^bc]d/;
var rs=regx.exec("abd");
Result: The match fails, and the value of rs is: null.
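The character set examples can be run together as a quick check. The last line uses a hyphen range and a + quantifier, which are additions made here for illustration and are not part of the examples above.

```javascript
console.log(/a[bc]d/.exec("abddgg")[0]);   // abd
console.log(/a[bc]d/.exec("abcd"));        // null: [bc] consumes only one character
console.log(/a[^bc]d/.exec("afddgg")[0]);  // afd
console.log(/a[^bc]d/.exec("abd"));        // null: b is excluded by [^bc]
console.log(/a[b-d]+e/.exec("xabcde")[0]); // abcde
```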
18>[\b]: Matches a backspace character.
19>\b: Matches a word boundary, that is, the position between a word character and a non-word character such as a space or a line break, or the start or end of the string. It is a zero-width assertion: it consumes no characters, and it does not need the m flag.
example:
var regx=/\bc./;
var rs=regx.exec("Beijing is a beautiful city");
Result: The match is successful, and the value of rs is: {ci}. Note that the space before c is not part of the result; that is, { ci} would be incorrect.
20>\B: Matches a non-word boundary, that is, a position inside a word or between two non-word characters.
example:
var regx=/\Bi./;
var rs=regx.exec("Beijing is a beautiful city");
Result: The match is successful, and the value of rs is: {ij}, which means that the ij in Beijing is matched.
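Both boundary assertions can be checked on the same sentence. The sketch below also prints the match position, which helps confirm that the boundary itself takes up no characters; the variable names are illustrative.

```javascript
const sentence = "Beijing is a beautiful city";

// \b is a zero-width position, so the space before "c" is not in the result.
const atBoundary = /\bc./.exec(sentence);
console.log(atBoundary[0], atBoundary.index); // ci 23

// \B requires that there is no boundary, so the "i" must sit inside a word.
const insideWord = /\Bi./.exec(sentence);
console.log(insideWord[0], insideWord.index); // ij 2
```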
21>\cX: Matches a control character. For example, \cM matches Control-M, i.e. a carriage return. X must be a letter in A-Z or a-z; otherwise c is treated as a literal 'c' character. (Practical examples need to be added)
21>\d: Matches a digit character, equivalent to [0-9].
example:
var regx=/user\d/;
var rs=regx.exec("user1");
Result: The match is successful, the value of rs is: {user1}
22>\D: Matches a non-digit character, equivalent to [^0-9].
example:
var regx=/user\D/;
var rs=regx.exec("userA");
Result: The match is successful, the value of rs is: {userA}
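23>\f: Matches a form feed character.
24>\n: Matches a newline character. The m flag is not required for this; m only affects how ^ and $ behave.
example:
var regx=/a\nbc/;
var str="a\nbc";
var rs=regx.exec(str);
Result: The match is successful, and the value of rs is {a\nbc}, that is, "a", a newline, then "bc". If the expression were /a\n\rbc/, it would not match: where the Enter key is stored as two characters, the order is "\r\n" (carriage return, line feed), not "\n\r". The author observed this in a textarea field.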
25>\r: Matches a carriage return.
26>\s: Matches a whitespace character, including [ \f\n\r\t\v\u00A0\u2028\u2029].
example:
var regx=/\si/;
var rs=regx.exec("Beijing is a city");
Result: The match is successful, and the value of rs is: { i}
27>\S: Matches a non-whitespace character, the complement of \s, i.e. [^ \f\n\r\t\v\u00A0\u2028\u2029].
example:
var regx=/\Si/;
var rs=regx.exec("Beijing is a city");
Result: The match is successful, and the value of rs is: {ei}
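28>\t: Matches a tab character.
example:
var regx=/a\tb/;
var rs=regx.exec("a\tbc");
Result: The match is successful, and the value of rs is {a\tb}, that is, "a", a tab, then "b". The trailing c is not part of the match.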
29>\v: Matches a vertical tab.
30>\w: Matches a digit, an underscore, or a letter, i.e. [A-Za-z0-9_].
example:
var regx=/\w/;
var rs=regx.exec("$25.23");
Result: The match is successful, and the value of rs is: {2}
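31>\W: Matches any character that is not a digit, an underscore, or a letter, i.e. [^A-Za-z0-9_].
example:
var regx=/\W/;
var rs=regx.exec("$25.23");
Result: The match is successful, and the value of rs is: {$}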
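32>\n: Note that this is not the newline escape. Here n is a positive integer, and \n matches the same text that was matched by the n-th () in the expression.
example:
var regx=/user([,-])group\1role/;
var rs=regx.exec("user-group-role");
Result: The match is successful, and the value of rs is: {user-group-role,-}. "user,group,role" also matches, but strings such as "user-group,role" do not.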
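The backreference behavior can be verified with a short sketch. The name sameSeparator is chosen here for illustration.

```javascript
// \1 must repeat exactly what the first group ([,-]) matched.
const sameSeparator = /user([,-])group\1role/;

console.log(sameSeparator.exec("user-group-role")[1]); // -
console.log(sameSeparator.test("user,group,role"));    // true
console.log(sameSeparator.test("user-group,role"));    // false: separators differ
```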
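33>\0: Matches a NUL character.
34>\xhh: Matches the character represented by two hexadecimal digits.
35>\uhhhh: Matches the character represented by four hexadecimal digits.
3. Expression operations
1) Expression operations here means the methods related to expressions. We will introduce six of them.
2) Methods of the expression object (RegExp):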
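1>exec(str): Returns the first string in str that matches the expression, in the form of an array. If the expression contains capturing parentheses, the returned array also contains the strings matched inside (). For example:
var regx=/\d+/;
var rs=regx.exec("3432ddf53");
The value of rs is: {3432}
var regx2=new RegExp("ab(\\d+)c");
var rs2=regx2.exec("ab234c44");
The value of rs2 is: {ab234c,234}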
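In addition, if there are several matches, the first call to exec returns the first match, and further calls return the second match, the third match, and so on. For example:
var regx=/user\d/g;
var rs=regx.exec("ddduser1dsfuser2dd");
var rs1=regx.exec("ddduser1dsfuser2dd");
The value of rs is {user1} and the value of rs1 is {user2}. Note that the g flag on regx is required; without it, exec returns the first match no matter how many times it is called. This behavior is explained further below.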
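Calling exec repeatedly is usually written as a loop. The following is a minimal sketch of that pattern, with illustrative variable names; it records each match together with its position and the lastIndex value that exec leaves behind.

```javascript
const userPattern = /user\d/g;
const haystack = "ddduser1dsfuser2dd";
const found = [];

let hit;
// With the g flag, each call resumes at lastIndex; null ends the loop.
while ((hit = userPattern.exec(haystack)) !== null) {
  found.push({ text: hit[0], index: hit.index, next: userPattern.lastIndex });
}

console.log(found);
// [ { text: 'user1', index: 3, next: 8 }, { text: 'user2', index: 11, next: 16 } ]
console.log(userPattern.lastIndex); // 0, reset after the failed final call
```

Without the g flag this loop would never end, because exec would keep returning the first match.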
2>test(str): Determines whether the string str matches the expression and returns a Boolean value. For example:
var regx=/user\d+/g;
var flag=regx.test("user12dd");
The value of flag is true.
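One detail is worth checking when the expression carries the g flag, as regx does here: test() also advances lastIndex, so a second call on the same expression object continues from where the first match ended. A small sketch, with illustrative names:

```javascript
const withG = /user\d+/g;
console.log(withG.test("user12dd"), withG.lastIndex); // true 6
console.log(withG.test("user12dd"), withG.lastIndex); // false 0

// Without g, every call starts from the beginning of the string.
const withoutG = /user\d+/;
console.log(withoutG.test("user12dd"), withoutG.test("user12dd")); // true true
```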
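3) String object methods
1>match(expr): Returns an array of the strings that match expr. Without the g flag it returns the first match; with g it returns all matches.
example:
var regx=/user\d/g;
var str="user13userddduser345";
var rs=str.match(regx);
The value of rs is: {user1,user3}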
2>search(expr): Returns the index of the first match of expr in the string.
example:
var regx=/user\d/g;
var str="user13userddduser345";
var rs=str.search(regx);
The value of rs is: 0
3>replace(expr,str): Replaces the parts of the string that match expr with str. In addition, str may contain the variable symbol $ in the form $n, which stands for the n-th remembered match (remember that parentheses record matches).
example:
var regx=/user\d/g;
var str="user13userddduser345";
var rs=str.replace(regx,"00");
The value of rs is: 003userddd0045
Example 2:
var regx=/u(se)r\d/g;
var str="user13userddduser345";
var rs=str.replace(regx,"$1");
The value of rs is: se3userdddse45
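One more point about replace(expr,str) deserves attention. If expr is an expression object, a global replacement is possible (the expression must carry the g flag, otherwise only the first match is replaced). If expr is a string, only the first matching part is replaced. For example:
var regx="user";
var str="user13userddduser345";
var rs=str.replace(regx,"00");
The value of rs is: 0013userddduser345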
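4>split(expr): Splits the string at the parts that match expr and returns an array. Whether or not the expression carries the g flag makes no difference; the result is the same.
example:
var regx=/user\d/g;
var str="user13userddduser345";
var rs=str.split(regx);
The value of rs is: {"",3userddd,45}. The first element is an empty string because the input begins with a match.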
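The four String methods can be compared side by side on the same input. This is a minimal sketch; the variable name subject is chosen here for illustration.

```javascript
const subject = "user13userddduser345";

console.log(subject.match(/user\d/g));            // [ 'user1', 'user3' ]
console.log(subject.search(/user\d/));            // 0
console.log(subject.replace(/user\d/g, "00"));    // 003userddd0045
console.log(subject.replace(/u(se)r\d/g, "$1"));  // se3userdddse45
console.log(subject.replace("user", "00"));       // 0013userddduser345
console.log(subject.split(/user\d/));             // [ '', '3userddd', '45' ]
```

Note the leading empty string from split: the input starts with a match, so the piece before it is empty.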
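4. Expression-related properties
1) Expression-related properties are the properties associated with an expression, as in the following form:
var regx=/myexpr/;
var rs=regx.exec(str);
Two properties relate to the expression regx itself, and three relate to the match result rs. Each is introduced below.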
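2) The two properties related to the expression itself:
1>lastIndex: The position at which the next match will start. Note that lastIndex only advances from match to match when the expression is global (carries the g flag). Without g, exec always searches from the start of the string and does not update lastIndex. For example:
var regx=/user\d/;
var rs=regx.exec("sdsfuser1dfsfuser2");
var lastIndex1=regx.lastIndex;
rs=regx.exec("sdsfuser1dfsfuser2");
var lastIndex2=regx.lastIndex;
rs=regx.exec("sdsfuser1dfsfuser2");
var lastIndex3=regx.lastIndex;
Without g, lastIndex1, lastIndex2 and lastIndex3 are all 0 in standards-compliant engines (some older browsers reported 9 each time). If regx=/user\d/g, the first is 9, the second is 18, and the third is 0.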
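The lastIndex values are easy to check in your own environment. The sketch below uses illustrative names; in engines that follow the ECMAScript specification the non-global expression reports 0, because lastIndex is only written when the g (or y) flag is set.

```javascript
const sample = "sdsfuser1dfsfuser2";

const plainRe = /user\d/;
plainRe.exec(sample);
console.log(plainRe.lastIndex); // 0 in engines that follow the ECMAScript specification

const globalRe = /user\d/g;
const positions = [];
for (let i = 0; i < 3; i++) {
  globalRe.exec(sample);          // third call finds nothing and resets lastIndex
  positions.push(globalRe.lastIndex);
}
console.log(positions); // [ 9, 18, 0 ]
```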
2>source: Returns the text of the expression itself. For example:
var regx=/user\d/;
var rs=regx.exec("sdsfuser1dfsfuser2");
var source=regx.source;
The value of source is user\d
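3) The three properties related to the match result:
1>index: Returns the position of the current match. For example:
var regx=/user\d/;
var rs=regx.exec("sdsfuser1dfsfuser2");
var index1=rs.index;
rs=regx.exec("sdsfuser1dfsfuser2");
var index2=rs.index;
rs=regx.exec("sdsfuser1dfsfuser2");
var index3=rs.index;
index1 is 4, index2 is 4, and index3 is 4. If the g flag is added to the expression, index1 is 4 and index2 is 13, while reading index3 raises an error (index is null or not an object), because the third exec returns null.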
2>input: The string that was matched against. For example:
var regx=/user\d/;
var rs=regx.exec("sdsfuser1dfsfuser2");
var input=rs.input;
The value of input is sdsfuser1dfsfuser2.
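3>[0]: Returns the first value in the match result, i.e. the whole matched string. A global match() may return an array with several values, in which case [1], [2] and so on can be read as well as [0]. For example:
var regx=/user\d/;
var rs=regx.exec("sdsfuser1dfsfuser2");
var value1=rs[0];
rs=regx.exec("sdsfuser1dfsfuser2");
var value2=rs[0];
The value of value1 is user1, and the value of value2 is also user1 because regx has no g flag. With regx=/user\d/g, value2 would be user2.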
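The three result properties can be read from a single exec call. The sketch below is illustrative: the capture group (\d) and the variable names are additions made here, and the null check shows one way to avoid the error described above for a failed match.

```javascript
const outcome = /user(\d)/.exec("sdsfuser1dfsfuser2");
if (outcome !== null) {
  console.log(outcome[0]);    // user1 (the whole match)
  console.log(outcome[1]);    // 1 (first capture group, added here for illustration)
  console.log(outcome.index); // 4
  console.log(outcome.input); // sdsfuser1dfsfuser2
}

// exec returns null when nothing matches, so check before reading properties.
const nothing = /user\d/.exec("no digits here");
console.log(nothing === null ? "no match" : nothing.index); // no match
```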
5. Practical applications
1) Practical application 1
Description: A form contains a "username" input field.
Requirement: Chinese characters only, no fewer than 2 and no more than 4.
Implementation:
<script>
function checkForm(obj){
  var username=obj.username.value;
  var regx=/^[\u4e00-\u9fa5]{2,4}$/;
  if(!regx.test(username)){
    alert("Invalid username!");
    return false;
  }
  return true;
}
</script>
<form name="myForm" onSubmit="return checkForm(this)">
  <input type="text" name="username"/>
  <input type="submit" value="submit"/>
</form>
2) Practical application 2
Description: Given a string containing HTML tags, remove the tags.
Implementation:
<script>
function toPlainText(htmlStr){
  var regx=/<[^>]*>|<\/[^>]*>/gm;
  var str=htmlStr.replace(regx,"");
  return str;
}
</script>
<form name="myForm">
  <textarea id="htmlInput"></textarea>
  <input type="button" value="submit" onclick="toPlainText(document.getElementById('htmlInput').value)"/>
</form>
III. Summary
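1. Among programmers in general, I suspect relatively few use JavaScript regular expressions, because the pages we handle are usually not very complex and complex logic is usually handled on the back end. That trend is now reversing: rich clients are being accepted by more and more people, and JavaScript is one of their key technologies. For complex client-side logic, regular expressions play an important role, and they are one of the techniques a JavaScript expert must master.
2. To give everyone a more comprehensive and deeper understanding of the material above, I will systematically summarize the key points and the places where it is easy to get confused. This part is important!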
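Summary 1: Usage of the additional flag g
Adding the g flag to an expression means that global matching can be performed. Note the meaning of "can" here. In detail:
1) For the exec method of the expression object: without g, only the first match is returned, no matter how many times exec is called. With g, the first call returns the first match, the next call returns the second match, and so on. For example:
var regx=/user\d/;
var str="user18dsdfuser2dsfsd";
var rs=regx.exec(str);// rs is {user1}
var rs2=regx.exec(str);// rs2 is still {user1}
If regx=/user\d/g, then rs is {user1} and rs2 is {user2}.
This example shows that, for exec, adding g does not mean that one call returns all matches. It means that all matches can be obtained in some way, and for exec that way is to call the method repeatedly.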
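2) For the test method of the expression object, a single call returns the same result with or without g. With g, however, test also advances lastIndex, so repeated calls on the same expression object continue from the previous match and return false once no further match is found.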
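3) For the match method of the String object: without g, only the first match is returned, and calling match again always returns the first match. With g, all matches are returned at once (note that this differs from the exec method of the expression object, which never returns all matches at once, even with g). For example:
var regx=/user\d/;
var str="user1sdfsffuser2dfsdf";
var rs=str.match(regx);// rs is {user1}
var rs2=str.match(regx);// rs2 is still {user1}
If regx=/user\d/g, then rs is {user1,user2} and rs2 is also {user1,user2}.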
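4) For the replace method of the String object: without g, only the first match is replaced; with g, all matches are replaced. (The three test questions at the beginning illustrate this well.)
5) For the split method of the String object, the result is the same with or without g, that is:
var sep=/user\d/;
var array="user1dfsfuser2dfsf".split(sep);
The value of array is {"", dfsf, dfsf}; the leading empty string is there because the input begins with a match.
With sep=/user\d/g the return value is the same.
6) For the search method of the String object, the result is also the same with or without g.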
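Summary 2: Usage of the additional flag m
The m flag enables multi-line matching, but it only has an effect when the ^ and $ patterns are used. For other patterns, matching across lines works with or without m (a multi-line string is, after all, just an ordinary string). Some examples follow; the line breaks inside the strings are written as \n.
1) An example using ^
var regx=/^b./g;
var str="bd76 dfsdf\nsdfsdfs dffs\nb76dsf sdfsdf";
var rs=str.match(regx);
Here, with or without g, only the first match {bd} is returned. If regx=/^b./gm, all matches {bd,b7} are returned. Note that regx=/^b./m also returns only the first match. So m allows multi-line matching, g allows global matching, and together they allow global multi-line matching.
2) An example using other patterns:
var regx=/user\d/;
var str="sdfsfsdfsdf\nsdfsuser3 dffs\nb76dsf user6";
var rs=str.match(regx);
Without g this returns {user3}; with g it returns {user3,user6}. Adding m or not has no effect here.
3) So we need to be clear about how m is used: it affects only the ^ and $ patterns. Without m, ^ and $ match only at the start and end of the whole string; with m, they also match at the start and end of every line. Here is another example with ^:
var regx=/^b./;
var str="ret76 dfsdf\nbjfsdfs dffs\nb76dsf sdfsdf";
var rs=str.match(regx);
Here rs is null, and with g added rs is still null. With m added, rs is {bj} (no match is found on the first line, but because of m the search continues on the following lines). With both m and g, {bj,b7} is returned. (With m but not g, matching can reach other lines but stops at the first match; adding g returns all matches on all lines. This holds for match; exec has to be called several times to return them one by one.)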
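The four flag combinations from the last example can be run side by side. In this sketch the positions of the line breaks in the test string are an assumption, placed so that "bj" and "b7" each start a line as the stated results require.

```javascript
// Assumed layout: three lines, the second and third starting with "b".
const lines = "ret76 dfsdf\nbjfsdfs dffs\nb76dsf sdfsdf";

console.log(lines.match(/^b./));     // null: ^ only matches at the start of the string
console.log(lines.match(/^b./g));    // null: g alone does not change what ^ means
console.log(lines.match(/^b./m)[0]); // bj: first line start that begins with "b"
console.log(lines.match(/^b./gm));   // [ 'bj', 'b7' ]
```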
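Summary 3:
In an HTML textarea field, pressing Enter once corresponded to the control characters "\r\n" (carriage return, line feed) in the author's tests, not "\n\r" (line feed, carriage return). This depends on the environment: current browsers normalize line breaks in a textarea's value to "\n" alone, so check the actual value before relying on either form. Here is the example from earlier:
var regx=/a\r\nbc/;
var str="a\r\nbc";
var rs=regx.exec(str);
Result: The match is successful, and the value of rs is {a\r\nbc}. If the expression were /a\n\rbc/, it would not match. So where an editor stores the Enter key as two characters, they are "carriage return, line feed" and not "line feed, carriage return"; at least that was the case in the textarea field.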