Alternation
Alternation uses the '|' character to allow a choice between two or more alternatives. By extending the chapter-title regular expression, you can make it apply to more than just chapter titles. However, this is not as straightforward as you might expect. When alternation is used, the largest possible expression on each side of the '|' character is matched. You might think that the following JScript and VBScript expressions match either 'Chapter' or 'Section' at the beginning of a line, followed by a one- or two-digit number at the end of the line:
For JScript:
/^Chapter|Section [1-9][0-9]{0,1}$/
For VBScript:
^Chapter|Section [1-9][0-9]{0,1}$
Unfortunately, what actually happens is that the regular expression shown above matches either the word 'Chapter' at the beginning of a line, or 'Section' followed by a one- or two-digit number at the end of a line. If the input string is 'Chapter 22', the expression matches only the word 'Chapter'. If the input string is 'Section 22', the expression matches 'Section 22'. That is not the intent here, so there must be a way to make the regular expression more responsive to what you are trying to do, and there is.
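To see the problem concretely, here is a minimal sketch in modern JavaScript (assumed to run under Node.js; the pattern syntax is the same as in JScript). The helper `firstMatch` is hypothetical and simply returns the text of the first match, or `null` when there is none.

```javascript
// Ungrouped alternation: '^' binds only to "Chapter", and '$' binds only to
// the "Section ..." branch, so the two sides are independent alternatives.
const ungrouped = /^Chapter|Section [1-9][0-9]{0,1}$/;

// Hypothetical helper: return the matched text, or null if nothing matches.
function firstMatch(re, input) {
  const m = re.exec(input);
  return m === null ? null : m[0];
}

console.log(firstMatch(ungrouped, "Chapter 22"));      // logs: Chapter
console.log(firstMatch(ungrouped, "Section 22"));      // logs: Section 22
// The "Section" branch is not anchored to the start of the line at all.
console.log(firstMatch(ungrouped, "Read Section 22")); // logs: Section 22
```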
Parentheses can be used to limit the scope of the alternation, that is, to make sure it applies only to the two words 'Chapter' and 'Section'. However, parentheses are tricky as well, because they are also used to create subexpressions, which are covered later in the section on subexpressions. By taking the regular expression shown above and adding parentheses in the appropriate places, you can make it match either 'Chapter 1' or 'Section 3'.
The following regular expression uses parentheses to group 'Chapter' and 'Section' so that the expression works correctly. For JScript:
/^(Chapter|Section) [1-9][0-9]{0,1}$/
For VBScript:
^(Chapter|Section) [1-9][0-9]{0,1}$
These expressions work correctly but produce an interesting by-product. Placing parentheses around 'Chapter|Section' establishes the proper grouping, but it also causes whichever of the two words matches to be captured for future use. Because there is only one set of parentheses in the expression shown above, there is only one captured submatch. This submatch can be referenced using the SubMatches collection in VBScript or the $1-$9 properties of the RegExp object in JScript.
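A minimal sketch, again assuming Node.js, that runs the grouped pattern on a few sample strings (the inputs are illustrative). Modern code usually reads the submatch from the match array (`m[1]`) rather than from the legacy `RegExp.$1` property, which engines still provide but which is deprecated.

```javascript
// Parentheses limit the alternation to the two words, so both anchors apply
// to the whole line. The group also captures whichever word matched.
const grouped = /^(Chapter|Section) [1-9][0-9]{0,1}$/;

for (const input of ["Chapter 22", "Section 3", "Chapter 100", "Read Section 3"]) {
  const m = grouped.exec(input);
  if (m === null) {
    console.log(`${input}: no match`);
  } else {
    // m[0] is the whole match; m[1] is the captured submatch.
    console.log(`${input}: matched "${m[0]}", captured "${m[1]}"`);
  }
}
```

"Chapter 100" fails because the number part allows at most two digits, and "Read Section 3" fails because '^' now applies to both alternatives.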
Sometimes capturing a submatch is desirable and sometimes it is not. In the example shown above, all you really want is to use parentheses to group the choice between the words 'Chapter' and 'Section'; you do not need to refer to the match later. In fact, unless you actually need to capture submatches, do not use capturing parentheses. The regular expression is then more efficient, because it does not spend time and memory storing those submatches.
You can place '?:' immediately after the opening parenthesis of a regular expression pattern to prevent the match from being stored for future use. The following modification of the regular expression shown above provides the same functionality without storing a submatch. For JScript:
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/
For VBScript:
^(?:Chapter|Section) [1-9][0-9]{0,1}$
In addition to the '?:' metacharacter, there are two other non-capturing metacharacters, used for matches called lookaheads. A positive lookahead, written '?=', matches at any position in the search string where a string matching the parenthesized pattern begins. A negative lookahead, written '?!', matches at any position where a string matching the pattern does not begin.
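A small comparison sketch, assuming Node.js: the array returned by `exec` has one element more than the number of capturing groups, which makes the effect of '?:' visible. The lookahead patterns are illustrative variations on the 'Section' example, not taken from the text above.

```javascript
const capturing = /^(Chapter|Section) [1-9][0-9]{0,1}$/;
const nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;

const a = capturing.exec("Section 3");
const b = nonCapturing.exec("Section 3");
console.log(a.length); // 2: the whole match plus one submatch
console.log(b.length); // 1: the whole match only

// Lookaheads are zero-width: they test what follows without consuming it,
// so the characters they examine are not part of the reported match.
const positive = /Section(?= \d)/; // "Section" followed by a space and a digit
const negative = /Section(?! \d)/; // "Section" NOT followed by a space and a digit

console.log(positive.exec("Section 3")[0]); // Section
console.log(negative.exec("Section 3"));    // null
console.log(negative.exec("Section A")[0]); // Section
```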
For example, suppose you have a document that contains references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Suppose further that you need to update the document by finding all references to Windows 95, Windows 98, and Windows NT and changing them to Windows 2000. You can use the following JScript regular expression, which is a positive lookahead, to match Windows 95, Windows 98, and Windows NT:
/Windows(?= 95| 98| NT)/
To make the same match in VBScript, use the following expression:
Windows(?= 95| 98| NT)
Once a match is found, the search for the next match begins immediately after the matched text, not after the characters examined by the lookahead. For example, if the expression shown above matches 'Windows 98', the search resumes after 'Windows', not after '98'.
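The lookahead behavior can be observed with a global search, sketched here in JavaScript (Node.js assumed; the sample sentence is made up). Each match is only 'Windows', and `lastIndex` shows where the next search starts.

```javascript
const doc = "Windows 3.1, Windows 95, Windows 98, and Windows NT";
// The 'g' flag makes exec() resume from windowsRe.lastIndex on each call.
const windowsRe = /Windows(?= 95| 98| NT)/g;

const results = [];
let hit;
while ((hit = windowsRe.exec(doc)) !== null) {
  // hit[0] is just "Windows": the version text was checked, not consumed,
  // so lastIndex points right after "Windows", not after the version.
  results.push({ match: hit[0], index: hit.index, nextSearch: windowsRe.lastIndex });
}
console.log(results);
```

"Windows 3.1" at index 0 is skipped because the lookahead fails there.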
One of the most important features of regular expressions is the ability to store part of a successfully matched pattern for later use. Recall that placing parentheses around a regular expression pattern, or part of one, causes that part of the match to be stored in a temporary buffer. You can use the non-capturing metacharacters '?:', '?=', or '?!' to suppress this storage.
Each captured submatch is stored in the order in which it is encountered, from left to right, in the regular expression pattern. The buffers that store submatches are numbered starting at 1 and continue consecutively up to a maximum of 99 subexpressions. Each buffer can be accessed using '\n', where n is a one- or two-digit decimal number that identifies a particular buffer.
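As an illustration of buffer numbering, a sketch assuming Node.js; both patterns are hypothetical examples. The first uses backreferences inside the pattern; the second shows that nested groups are numbered by the position of their opening parenthesis.

```javascript
// (\d) is buffer 1 and the second (\d) is buffer 2; \2 and \1 then require
// the same digits again in reverse order, i.e. a four-digit palindrome.
const palindrome4 = /^(\d)(\d)\2\1$/;

console.log(palindrome4.test("1221")); // true
console.log(palindrome4.test("1234")); // false

// Nested groups: the outer group opens first, so it is number 1.
const nested = /((\w+)@(\w+))/.exec("mail admin@example now");
console.log(nested[1], nested[2], nested[3]); // admin@example admin example
```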
One of the simplest and most useful applications of backreferences is finding two identical words that occur one after the other in a text. Consider the following sentence:
Is is the cost of of gasoline going up up?
As written, the sentence above clearly contains several repeated words. It would be useful to have a way to fix the sentence without searching for each repetition individually. The following JScript regular expression does this with a single subexpression:
/\b([a-z]+) \1\b/gi
The equivalent VBScript expression is:
\b([a-z]+) \1\b
In this example, the subexpression is everything between the parentheses. The captured expression consists of one or more alphabetic characters, as specified by '[a-z]+'. The second part of the regular expression is a reference to the previously captured submatch, that is, the second occurrence of the word just matched by the parenthesized expression. '\1' specifies the first submatch. The word-boundary metacharacters ('\b') ensure that only separate words are detected. Without them, phrases such as 'is issued' or 'this is' would be incorrectly matched by the expression.
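The effect of the word-boundary metacharacters can be checked directly. A minimal sketch assuming Node.js, using the phrases from the text; the 'g' flag is left off so that `test()` does not carry `lastIndex` between calls.

```javascript
const bounded = /\b([a-z]+) \1\b/;  // whole words only
const unbounded = /([a-z]+) \1/;    // same pattern without word boundaries

const checks = ["this is", "is issued", "up up"].map((phrase) => ({
  phrase,
  bounded: bounded.test(phrase),
  unbounded: unbounded.test(phrase),
}));
console.log(checks);
```

Without '\b', "this is" matches because "is is" appears across the two words, and "is issued" matches because "is is" is a prefix of it.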
In the JScript expression, the global flag ('g') following the regular expression indicates that the expression should find as many matches as possible in the input string. Case-insensitive matching is specified by the ignore-case flag ('i') at the end of the expression. The multiline flag ('m') makes '^' and '$' match at the start and end of each line, that is, on either side of a newline character, as well as at the start and end of the whole input. In VBScript, these flags cannot be set in the expression itself; they must be set explicitly using properties of the RegExp object.
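A short sketch of the three flags in JavaScript (Node.js assumed; the sample strings are arbitrary). In VBScript the same choices would be made through the Global, IgnoreCase, and MultiLine properties instead.

```javascript
const lines = "first line\nsecond line";

// Without 'm', '^' matches only at the very start of the input.
const startOnly = /^second/.test(lines);       // false
// With 'm', '^' also matches right after each newline.
const perLine = /^second/m.test(lines);        // true

// Without 'i', matching is case-sensitive.
const caseSensitive = /FIRST/.test(lines);     // false
const ignoreCase = /FIRST/i.test(lines);       // true

// Without 'g', replace() changes only the first match; with 'g', every match.
const firstOnly = "a a a".replace(/a/, "b");   // "b a a"
const everyMatch = "a a a".replace(/a/g, "b"); // "b b b"

console.log({ startOnly, perLine, caseSensitive, ignoreCase, firstOnly, everyMatch });
```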
Using the regular expression shown above, the following JScript code uses the submatch information to replace each word that appears twice in a row in a string literal with a single occurrence of that word:
var ss = "Is is the cost of of gasoline going up up?.\n";
var re = /\b([a-z]+) \1\b/gim;   // Create the regular expression.
var rv = ss.replace(re, "$1");   // Replace each pair of words with one word.
The closest equivalent VBScript code is as follows:
Dim ss, re, rv
ss = "Is is the cost of of gasoline going up up?." & vbNewLine
Set re = New RegExp
re.Pattern = "\b([a-z]+) \1\b"
re.Global = True
re.IgnoreCase = True
re.MultiLine = True
rv = re.Replace(ss, "$1")
Note that in the VBScript code, the global, ignore-case, and multiline flags are set using the corresponding properties of the RegExp object.
The $1 in the replace method refers to the first saved submatch. If there are multiple submatches, you can refer to them in turn with $2, $3, and so on.
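A sketch of using more than one submatch, assuming Node.js; the name list is made up for illustration. JavaScript's `replace` also accepts a function, which receives the whole match followed by each submatch, as an alternative to `$n` strings.

```javascript
// Two capturing groups: $1 is the first name, $2 the last name.
// 'm' lets ^ and $ match at each line; 'g' processes every line.
const names = "John Smith\nJane Doe";
const swapped = names.replace(/^(\w+) (\w+)$/gm, "$2, $1");
console.log(swapped); // two lines: "Smith, John" and "Doe, Jane"

// A replacement function gets (wholeMatch, submatch1, ...) as arguments.
const dedup = "Is is the cost of of gasoline going up up?".replace(
  /\b([a-z]+) \1\b/gi,
  (whole, word) => word
);
console.log(dedup); // Is the cost of gasoline going up?
```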
Another use of backreferences is to break a Uniform Resource Identifier (URI) down into its component parts. Suppose you want to break the following URI down into the protocol (ftp, http, and so on), the domain address, and the page/path:
http://msdn.microsoft.com:80/scripting/default.htm
The following regular expressions provide this functionality. For JScript:
/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/
For VBScript:
(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)
The first parenthesized subexpression captures the protocol part of the web address. It matches any word that precedes a colon and two forward slashes. The second parenthesized subexpression captures the domain address. It matches any sequence of characters that does not include '/' or ':'. The third parenthesized subexpression captures the port number, if one is specified. It matches a colon followed by zero or more digits. Finally, the fourth parenthesized subexpression captures the path and/or page information specified by the web address. It matches zero or more characters other than '#' or a space.
After the regular expression is applied to the URI shown above, the submatches contain the following:
RegExp.$1 contains http
RegExp.$2 contains msdn.microsoft.com
RegExp.$3 contains :80
RegExp.$4 contains /scripting/default.htm
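The results above can be reproduced with a short JavaScript sketch (Node.js assumed). Reading the groups from the array returned by `exec` avoids the deprecated `RegExp.$n` properties; the second URI is a hypothetical example without a port.

```javascript
const uri = "http://msdn.microsoft.com:80/scripting/default.htm";
const uriPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;

// Skip element 0 (the whole match) and name the four submatches.
const parts = uriPattern.exec(uri);
const [, protocol, host, port, path] = parts;
console.log(protocol); // http
console.log(host);     // msdn.microsoft.com
console.log(port);     // :80 (the colon is part of the third group)
console.log(path);     // /scripting/default.htm

// The port group is optional; when it does not take part, it is undefined.
const noPort = uriPattern.exec("ftp://example.com/pub/file.txt");
console.log(noPort[3]); // undefined
```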