Summary of usage of regular expressions in Java programming

Author：Eve Cole Update Time：2025-03-28 15:16:01

1. Regular expressions in strings
Regular expressions can be used to search, extract, split, and replace text in strings. The String class provides the following methods for this purpose:

boolean matches(String regex): Determines whether the entire string matches the specified regular expression.
String replaceAll(String regex, String replacement): Replaces every substring of this string that matches regex with replacement.
String[] split(String regex): Splits the string into multiple substrings, using regex as the separator.
All of the above methods rely on the regular expression support provided by Java.
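
The three methods above can be tried out with a minimal sketch like the following. The class name StringRegexDemo and its helper methods are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

public class StringRegexDemo {
    // True only if the whole string is a run of one or more digits.
    static boolean isAllDigits(String s) {
        return s.matches("[0-9]+");
    }

    // Replaces every run of digits with a single '#'.
    static String maskDigits(String s) {
        return s.replaceAll("[0-9]+", "#");
    }

    // Splits on commas, ignoring whitespace around each comma.
    static String[] splitCsv(String s) {
        return s.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2025"));                  // true
        System.out.println(isAllDigits("2025-03"));               // false
        System.out.println(maskDigits("a12b345"));                // a#b#
        System.out.println(Arrays.toString(splitCsv("x, y ,z"))); // [x, y, z]
    }
}
```

Note that matches tests the entire string, not a substring, which is why "2025-03" is rejected.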

2. Create regular expressions
x: The character x (x can be any legal character);
\0mnn: The character with octal value 0mnn;
\xhh: The character with hexadecimal value 0xhh;
\uhhhh: The Unicode character with hexadecimal value 0xhhhh;
\t: Tab ('\u0009');
\n: Newline (line feed) character ('\u000A');
\r: Carriage return character ('\u000D');
\f: Form feed (page break) character ('\u000C');
\a: Alert (bell) character ('\u0007');
\e: Escape character ('\u001B');
\cx: The control character corresponding to x. For example, \cM matches Ctrl-M. The value of x must be one of A~Z or a~z;
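
In Java source code, each backslash of a regular expression has to be doubled inside a string literal, so the regex \t is written "\\t". The following sketch checks a few of the escapes listed above. The class EscapeDemo and the helper fullMatch are hypothetical names used only for this illustration.

```java
public class EscapeDemo {
    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\t", "\t"));     // regex \t matches a tab
        System.out.println(fullMatch("\\x41", "A"));    // hexadecimal 41 is 'A'
        System.out.println(fullMatch("\\u0041", "A"));  // Unicode 0041 is 'A'
        System.out.println(fullMatch("\\0101", "A"));   // octal 101 is 'A'
        System.out.println(fullMatch("\\cM", "\r"));    // Ctrl-M is carriage return
    }
}
```

Each of the five calls is expected to print true.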

3. Special characters in regular expressions
$: Matches the end of a line. To match the $ character itself, use \$;
^: Matches the beginning of a line. To match the ^ character itself, use \^;
(): Mark the start and end of a subexpression. To match these characters, use \( and \);
[]: Mark the start and end of a bracket expression. To match these characters, use \[ and \];
{}: Mark a repetition count for the preceding subexpression. To match these characters, use \{ and \};
*: Specifies that the preceding subexpression can occur zero or more times. To match the * character itself, use \*;
+: Specifies that the preceding subexpression can occur one or more times. To match the + character itself, use \+;
?: Specifies that the preceding subexpression can occur zero times or once. To match the ? character itself, use \?;
.: Matches any single character except the line break \n. To match the . character itself, use \.;
\: Escapes the next character, or introduces an octal or hexadecimal character code. To match the \ character itself, use \\;
|: Matches either of two alternatives. To match the | character itself, use \|;
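
A short sketch of why this escaping matters in practice. The class MetaCharDemo and its methods are hypothetical. Pattern.quote is the standard library helper that turns an arbitrary string into a literal pattern, which is convenient when the text to match is not known in advance.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class MetaCharDemo {
    // Splits on a literal dot: the regex is \. and the Java literal is "\\.".
    static String[] splitOnDot(String s) {
        return s.split("\\.");
    }

    // Splits on an arbitrary literal delimiter by quoting it first.
    static String[] splitOnLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    // True if the input is exactly the text 1+1=2; the + is escaped.
    static boolean isOnePlusOne(String s) {
        return s.matches("1\\+1=2");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnDot("a.b.c")));          // [a, b, c]
        System.out.println("a.b.c".split(".").length);                     // 0: an unescaped dot matches every character
        System.out.println(Arrays.toString(splitOnLiteral("a|b|c", "|"))); // [a, b, c]
        System.out.println(isOnePlusOne("1+1=2"));                         // true
    }
}
```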

4. Predefined characters
.: Matches any character (line terminators are excluded unless DOTALL mode is enabled);
\d: Matches any digit from 0~9;
\D: Matches any non-digit;
\s: Matches any whitespace character, including spaces, tabs, carriage returns, form feeds, and line breaks;
\S: Matches any non-whitespace character;
\w: Matches any word character: the digits 0~9, the 26 English letters in either case, and the underscore (_);
\W: Matches any non-word character;
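
The predefined classes combine naturally with the String methods from section 1. The following is a minimal sketch. The class PredefinedClassDemo and its method names are hypothetical.

```java
public class PredefinedClassDemo {
    // Removes every non-digit character (\D).
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Trims the string and collapses each run of whitespace (\s+) into one space.
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // True if the string consists solely of word characters (\w).
    static boolean isWordLike(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("(555) 010-9999"));       // 5550109999
        System.out.println(collapseWhitespace("  a \t b\n c ")); // a b c
        System.out.println(isWordLike("user_01"));              // true
        System.out.println(isWordLike("user-01"));              // false
    }
}
```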

5. Boundary matching characters
^: The beginning of a line
$: The end of a line
\b: A word boundary
\B: A non-word boundary
\A: The beginning of the input
\G: The end of the previous match
\Z: The end of the input, except for the final line terminator, if any
\z: The end of the input
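
The following sketch illustrates a few of these boundary matchers. The class BoundaryDemo and its methods are hypothetical. One detail worth knowing: by default Java treats ^ and $ as the beginning and end of the whole input, and they only match at individual line boundaries when the pattern is compiled with Pattern.MULTILINE.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Counts how many times the pattern is found in the text.
    static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Counts occurrences of word as a whole word, using \b on both sides.
    static int countWholeWord(String text, String word) {
        return count(Pattern.compile("\\b" + Pattern.quote(word) + "\\b"), text);
    }

    // Counts lines that begin with '#'; MULTILINE makes ^ match at each line start.
    static int countHashLines(String text) {
        return count(Pattern.compile("^#", Pattern.MULTILINE), text);
    }

    // \Z tolerates one final line terminator; \z does not.
    static boolean endsWithAbcBigZ(String text) {
        return Pattern.compile("abc\\Z").matcher(text).find();
    }

    static boolean endsWithAbcSmallZ(String text) {
        return Pattern.compile("abc\\z").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(countWholeWord("cat concat cat.", "cat")); // 2
        System.out.println(countHashLines("#a\nb\n#c"));              // 2
        System.out.println(endsWithAbcBigZ("abc\n"));                 // true
        System.out.println(endsWithAbcSmallZ("abc\n"));               // false
    }
}
```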

6. Symbols representing the number of matches
The following figure shows the symbols that specify the number of matches. Each one determines how many times the element immediately to its left may occur:
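
As a rough illustration, the sketch below exercises the standard Java quantifiers: *, +, ?, {n}, {n,}, and {n,m}. The class QuantifierDemo and the helper fullMatch are hypothetical names.

```java
public class QuantifierDemo {
    // True if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("ab*", "a"));      // true:  * allows zero b's
        System.out.println(fullMatch("ab+", "a"));      // false: + needs at least one b
        System.out.println(fullMatch("ab?", "abb"));    // false: ? allows at most one b
        System.out.println(fullMatch("a{3}", "aaa"));   // true:  exactly three a's
        System.out.println(fullMatch("a{2,}", "aaaa")); // true:  two or more a's
        System.out.println(fullMatch("a{1,2}", "aaa")); // false: between one and two a's
    }
}
```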

(1) Suppose we want to search a text file for US Social Security numbers, which have the format 999-99-9999. The regular expression used to match them is shown in Figure 1. Inside a bracket expression, the hyphen ("-") has a special meaning: it denotes a range, such as 0 to 9. For that reason, the hyphens in the Social Security number are preceded by the escape character "\"; outside a bracket expression this escape is optional but harmless.

(2) Suppose you want the hyphens to be optional when searching, so that both 999-99-9999 and 999999999 count as correctly formatted. In that case, add the "?" quantifier after each hyphen, as shown in the figure:
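
Examples (1) and (2) can be sketched in Java as follows. The patterns are one plausible way to write the expressions the text describes, not necessarily the exact ones in the figures. The class SsnDemo and its members are hypothetical.

```java
import java.util.regex.Pattern;

public class SsnDemo {
    // Example (1): hyphens are required (escaped here, as the text describes).
    static final Pattern STRICT =
            Pattern.compile("[0-9]{3}\\-[0-9]{2}\\-[0-9]{4}");

    // Example (2): each hyphen is followed by ?, making it optional.
    static final Pattern OPTIONAL_HYPHENS =
            Pattern.compile("[0-9]{3}\\-?[0-9]{2}\\-?[0-9]{4}");

    static boolean isStrict(String s) {
        return STRICT.matcher(s).matches();
    }

    static boolean isLenient(String s) {
        return OPTIONAL_HYPHENS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isStrict("123-45-6789"));  // true
        System.out.println(isStrict("123456789"));    // false
        System.out.println(isLenient("123-45-6789")); // true
        System.out.println(isLenient("123456789"));   // true
    }
}
```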

(3) Here is another example. One format for American car license plates is four digits followed by two letters. Its regular expression consists of the digit part "[0-9]{4}" followed by the letter part "[A-Z]{2}". The following figure shows the complete regular expression.
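
A minimal sketch of example (3), simply concatenating the two parts given in the text. The class PlateDemo and the method isPlate are hypothetical names.

```java
public class PlateDemo {
    // Four digits followed by two uppercase letters.
    static boolean isPlate(String s) {
        return s.matches("[0-9]{4}[A-Z]{2}");
    }

    public static void main(String[] args) {
        System.out.println(isPlate("1234AB")); // true
        System.out.println(isPlate("1234ab")); // false: lowercase letters
        System.out.println(isPlate("123AB"));  // false: only three digits
    }
}
```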

7. Some examples

Example 1

    // JavaScript: replace every [name] placeholder with an image tag
    function replace(content) {
        var reg = '\\[(\\w+)\\]',
            pattern = new RegExp(reg, 'g');
        return content.replace(pattern, '<img src="img/$1.png">');
    }

    // or, with a regex literal
    function replace(content) {
        return content.replace(/\[(\w+)\]/g, '<img src="img/$1.png">');
    }
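
The example above is written in JavaScript. A rough Java equivalent, assuming the same [name] placeholder convention, could look like this. The class ImageTagDemo and the method replacePlaceholders are hypothetical names.

```java
public class ImageTagDemo {
    // Replaces every [name] placeholder with an <img> tag.
    // In the replacement string, $1 refers to capture group 1.
    static String replacePlaceholders(String content) {
        return content.replaceAll("\\[(\\w+)\\]", "<img src=\"img/$1.png\">");
    }

    public static void main(String[] args) {
        System.out.println(replacePlaceholders("Hello [smile]!"));
        // prints: Hello <img src="img/smile.png">!
    }
}
```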

Example 2

    // Alternatives to zero-width lookbehind: (?<=...) and (?<!...)

    // Method 1: reverse the string, match with a lookahead, replace, then reverse the result back.
    String.prototype.reverse = function () {
        return this.split('').reverse().join('');
    };
    // Simulates 'foo.bar|baz'.replace(/(?<=\.)b/, 'c'): replace the b immediately preceded by '.' with c
    'foo.bar|baz'.reverse().replace(/b(?=\.)/g, 'c').reverse(); // foo.car|baz

    // Method 2: skip the zero-width assertion and decide in a replacement callback.
    // Simulates 'foo.bar|baz'.replace(/(?<=\.)b/, 'c'): change the b preceded by '.' to c
    'foo.bar|baz'.replace(/(\.)?b/, function ($0, $1) {
        return $1 ? $1 + 'c' : $0;
    }); // foo.car|baz
    // Simulates 'foo.bar|baz'.replace(/(?<!\.)b/, 'c'): change the b not preceded by '.' to c
    // (the g flag is needed so the scan continues past the first, unchanged, '.b')
    'foo.bar|baz'.replace(/(\.)?b/g, function ($0, $1) {
        return $1 ? $0 : 'c';
    }); // foo.bar|caz

    // This method works in some relatively simple scenarios and can be combined with lookahead.
    // However, it fails in many others, for example:
    // 'tttt'.replace(/(?<=t)t/g, 'x') should produce 'txxx'
    'tttt'.replace(/(t)?t/g, function ($0, $1) {
        return $1 ? $1 + 'x' : $0;
    }); // txtx
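
Java's regex engine supports lookbehind directly, so these workarounds are not needed there. The following sketch runs the same three cases. The class LookbehindDemo and its methods are hypothetical names.

```java
public class LookbehindDemo {
    // Replaces the first b that is immediately preceded by '.'.
    static String replaceAfterDot(String s) {
        return s.replaceFirst("(?<=\\.)b", "c");
    }

    // Replaces the first b that is NOT immediately preceded by '.'.
    static String replaceNotAfterDot(String s) {
        return s.replaceFirst("(?<!\\.)b", "c");
    }

    // Replaces every t that is immediately preceded by another t.
    static String replaceFollowingT(String s) {
        return s.replaceAll("(?<=t)t", "x");
    }

    public static void main(String[] args) {
        System.out.println(replaceAfterDot("foo.bar|baz"));    // foo.car|baz
        System.out.println(replaceNotAfterDot("foo.bar|baz")); // foo.bar|caz
        System.out.println(replaceFollowingT("tttt"));         // txxx
    }
}
```

The last case gives txxx because a lookbehind inspects the original input and consumes no characters, so each of the last three t's still sees a t before it.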

Example 3

Use of $& symbol

    function escapeRegExp(str) {
        return str.replace(/[abc]/g, "($&)");
    }
    var str = 'a12b34c';
    console.log(escapeRegExp(str)); // (a)12(b)34(c)
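
Java replacement strings have no $& token. The whole match is referenced as $0 instead. A minimal sketch of the same example, with the hypothetical names WholeMatchDemo and wrapAbc:

```java
public class WholeMatchDemo {
    // Wraps each a, b, or c in parentheses; $0 is the entire matched text.
    static String wrapAbc(String s) {
        return s.replaceAll("[abc]", "($0)");
    }

    public static void main(String[] args) {
        System.out.println(wrapAbc("a12b34c")); // (a)12(b)34(c)
    }
}
```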