Detailed introduction to Java regular expressions

Author：Eve Cole Update Time：2025-06-06 02:48:01

In program development we constantly need to match, search, replace, and validate strings. These tasks can get complicated, and solving them with hand-written string-handling code often costs programmers a lot of time and effort. Regular expressions are the main tool for dealing with this problem.

A regular expression is a specification for pattern matching and replacement. It is a text pattern made up of ordinary characters (such as the letters a to z) and special characters (metacharacters), and it describes one or more strings to look for when searching a body of text. The regular expression acts as a template that is matched against the string being searched.

Since JDK 1.4 introduced the java.util.regex package, Java has had a good built-in platform for working with regular expressions.

Because regular expressions are a very large subject, I will only cover some introductory concepts with examples here. For more, please refer to books on the topic and experiment on your own.

\\ Backslash
\t Tab ('\u0009')
\n Line feed ('\u000A')
\r Carriage return ('\u000D')
\d A digit, equivalent to [0-9]
\D A non-digit, equivalent to [^0-9]
\s A whitespace character [ \t\n\x0B\f\r]
\S A non-whitespace character [^ \t\n\x0B\f\r]
\w A word character [a-zA-Z_0-9]
\W A non-word character [^a-zA-Z_0-9]
\f Form feed (page break)
\e Escape
\b A word boundary
\B A non-word boundary
\G The end of the previous match
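In Java source code each of these constructs sits inside a string literal, so the backslash itself has to be escaped: the regex \d is written "\\d". The following is a minimal sketch of that; the class and method names (EscapeDemo, isAllDigits, countWords) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeDemo {
    // True when the whole input consists of one or more digits.
    static boolean isAllDigits(String input) {
        // "\\d+" in Java source is the regex \d+
        return Pattern.matches("\\d+", input);
    }

    // Counts the words in the input by counting matches of \b\w+\b.
    static int countWords(String input) {
        Matcher matcher = Pattern.compile("\\b\\w+\\b").matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2025"));
        System.out.println(isAllDigits("20a5"));
        System.out.println(countWords("Java is not a human"));
    }
}
```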

^ Matches at the beginning
^java Matches text that begins with "java"
$ Matches at the end
java$ Matches text that ends with "java"
. Matches any single character except \n
java.. Matches "java" followed by any two characters other than a newline
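As a rough illustration of the three constructs above, the sketch below uses Matcher.find(), which looks for the pattern anywhere in the input, so the anchors are what pin the match to the start or the end. The helper names are hypothetical.

```java
import java.util.regex.Pattern;

final class AnchorDemo {
    // True when the input begins with "java".
    static boolean startsWithJava(String input) {
        return Pattern.compile("^java").matcher(input).find();
    }

    // True when the input ends with "java".
    static boolean endsWithJava(String input) {
        return Pattern.compile("java$").matcher(input).find();
    }

    // True when "java" is followed by at least two more non-newline characters.
    static boolean javaPlusTwo(String input) {
        return Pattern.compile("java..").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithJava("javascript"));
        System.out.println(endsWithJava("I like java"));
        System.out.println(javaPlusTwo("java8"));
    }
}
```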

To restrict a position to a specific set of characters, use "[]"

[a-z] One character in the lowercase range a to z
[A-Z] One character in the uppercase range A to Z
[a-zA-Z] One character in the lowercase range a to z or the uppercase range A to Z
[0-9] One digit in the range 0 to 9
[0-9a-z] One character that is a digit 0 to 9 or a lowercase letter a to z
[0-9[a-z]] One character that is a digit 0 to 9 or a lowercase letter a to z (union)

To exclude a set of characters instead, put "^" right after the opening bracket: "[^]"

[^a-z] One character outside the lowercase range a to z
[^A-Z] One character outside the uppercase range A to Z
[^a-zA-Z] One character that is neither a lowercase nor an uppercase letter
[^0-9] One character that is not a digit 0 to 9
[^0-9a-z] One character that is neither a digit 0 to 9 nor a lowercase letter a to z
[^0-9[a-z]] Intended as one character that is neither a digit nor a lowercase letter (negated union); how "^" combines with a nested class has differed between JDK releases, so [^0-9a-z] is the safer spelling
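The character classes above can be tried out with a short sketch like the following. It uses the union form for a positive test and the flat negated form for stripping characters; the class and method names are assumptions made for this example.

```java
import java.util.regex.Pattern;

final class CharClassDemo {
    // True when c is a digit or a lowercase ASCII letter (union of two ranges).
    static boolean isLowerOrDigit(char c) {
        return Pattern.matches("[0-9[a-z]]", String.valueOf(c));
    }

    // Removes every character that is not an ASCII letter or digit.
    static String stripNonAlnum(String input) {
        return input.replaceAll("[^0-9a-zA-Z]", "");
    }

    public static void main(String[] args) {
        System.out.println(isLowerOrDigit('7'));
        System.out.println(isLowerOrDigit('Q'));
        System.out.println(stripNonAlnum("J-a_v.a 8!"));
    }
}
```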

To require that something appears zero or more times, use "*"

J* Zero or more J
.* Zero or more of any character
J.*D J and D with zero or more characters between them

To require that something appears one or more times, use "+"

J+ One or more J
.+ One or more of any character
J.+D J and D with one or more characters between them

To require that something appears zero times or once, use "?"

JA? J or JA
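The difference between the three quantifiers is easiest to see on inputs at the boundary. The sketch below checks whole-string matches with Pattern.matches; fullMatch is a hypothetical helper name.

```java
import java.util.regex.Pattern;

final class QuantifierDemo {
    // True when the regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // "*" allows nothing at all between J and D, "+" needs at least one character.
        System.out.println(fullMatch("J.*D", "JD"));
        System.out.println(fullMatch("J.+D", "JD"));
        System.out.println(fullMatch("J.+D", "JavaD"));
        // "?" makes the A optional, but does not allow it twice.
        System.out.println(fullMatch("JA?", "J"));
        System.out.println(fullMatch("JA?", "JAA"));
    }
}
```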

To require an exact number of consecutive occurrences, use "{a}"

J{2} JJ
J{3} JJJ
At least a occurrences: "{a,}"
J{3,} JJJ, JJJJ, JJJJJ, ... (three or more J in a row)
At least a and at most b occurrences: "{a,b}"
J{3,5} JJJ or JJJJ or JJJJJ
Either one of two alternatives: "|"
J|A J or A
Java|Hello Java or Hello
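A small sketch of counted repetition and alternation follows. It collects every match that find() reports, which also shows that a bounded quantifier such as {3,5} stops after five characters even when more are available. The names CountDemo and findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CountDemo {
    // Returns every non-overlapping match of regex in input, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<String>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Runs of one or two J are too short; the run of six yields a match of five.
        System.out.println(findAll("J{3,5}", "J JJ JJJ JJJJJJ"));
        // Alternation picks whichever alternative matches at each position.
        System.out.println(findAll("Java|Hello", "Hello Java World"));
    }
}
```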

"()" defines a group. For example, to extract the text between the tags in <a href="index.html">index</a>, I can write <a.*href=".*">(.+?)</a>

When calling the Pattern.compile method, you can pass an extra argument that controls how the regular expression is matched:

Pattern Pattern.compile(String regex, int flags)

The flags argument can take the following values:

Pattern.CANON_EQ Two characters match if and only if their full canonical decompositions are identical. With this flag, for example, the expression "a\u030A" matches the string "\u00E5". By default, canonical equivalence is not taken into account.

Pattern.CASE_INSENSITIVE(?i) This flag makes the expression ignore case when matching. By default, case-insensitive matching only covers the US-ASCII character set. To match Unicode characters case-insensitively, combine UNICODE_CASE with this flag.

Pattern.COMMENTS(?x) In this mode, whitespace in the regular expression is ignored when matching (translator's note: this means literal spaces, tabs, carriage returns, and so on written in the expression, not "\s"). Comments start at # and run to the end of the line. Comments mode can also be enabled through the embedded flag (?x).

Pattern.DOTALL(?s) In this mode, the expression '.' matches any character, including a line terminator. By default, '.' does not match line terminators.

Pattern.MULTILINE(?m) In this mode, '^' and '$' match at the beginning and end of each line. '^' still also matches at the beginning of the string, and '$' at the end of the string. By default, these two expressions only match at the beginning and end of the whole string.

Pattern.UNICODE_CASE(?u) In this mode, if the CASE_INSENSITIVE flag is also enabled, case-insensitive matching is applied to Unicode characters. By default, case-insensitive matching only covers the US-ASCII character set.

Pattern.UNIX_LINES(?d) In this mode, only '\n' is recognized as a line terminator in the behavior of '.', '^', and '$'.
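To make the effect of a few of these flags concrete, here is a minimal sketch that counts matches of the same expression under different flag settings. Flags are bit masks and combine with |, and the embedded forms such as (?mi) go at the start of the expression. FlagDemo and countMatches are names chosen for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlagDemo {
    // Counts how many times the compiled pattern is found in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Java one\njava two";
        // Default: ^ matches only at the start of the input and case matters.
        System.out.println(countMatches(Pattern.compile("^java"), text));
        // MULTILINE: ^ also matches right after a line terminator.
        System.out.println(countMatches(Pattern.compile("^java", Pattern.MULTILINE), text));
        // Two flags combined with |.
        System.out.println(countMatches(
                Pattern.compile("^java", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE), text));
        // The same two flags in embedded form.
        System.out.println(countMatches(Pattern.compile("(?mi)^java"), text));
        // DOTALL lets '.' match the newline between "one" and "java".
        System.out.println(Pattern.compile("one.java", Pattern.DOTALL).matcher(text).find());
    }
}
```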

With the concepts out of the way, here are a few simple Java regular expression use cases (the snippets assume imports of java.util.regex.Pattern and java.util.regex.Matcher):

◆Checking whether a string matches a pattern

 // Check for a string that starts with "Java" and ends with anything
 Pattern pattern = Pattern.compile("^Java.*");
 Matcher matcher = pattern.matcher("Java is not a human");
 boolean b = matcher.matches();
 // Prints true when the condition is satisfied, otherwise false
 System.out.println(b);
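Note that matches() only succeeds when the whole input matches the pattern. Matcher also offers lookingAt(), which matches from the start without requiring the end, and find(), which searches anywhere in the input. A short sketch comparing the three, with hypothetical helper names:

```java
import java.util.regex.Pattern;

final class MatchModes {
    // The entire input must match.
    static boolean whole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // The input must begin with a match; trailing text is allowed.
    static boolean prefix(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // A match may occur anywhere in the input.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "Java is not a human";
        System.out.println(whole("Java", text));
        System.out.println(prefix("Java", text));
        System.out.println(anywhere("human", text));
    }
}
```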

◆Splitting a string on several delimiters

 // Split on any run of commas, spaces, or vertical bars
 Pattern pattern = Pattern.compile("[, |]+");
 String[] strs = pattern.split("Java Hello World Java,Hello,,World|Sun");
 for (int i = 0; i < strs.length; i++) {
     System.out.println(strs[i]);
 }

◆Text replacement (first occurrence only)

 Pattern pattern = Pattern.compile("regular expression");
 Matcher matcher = pattern.matcher("regular expression Hello World,regular expression Hello World");
 // Replace only the first match of the pattern
 System.out.println(matcher.replaceFirst("Java"));

◆Text replacement (all occurrences)

 Pattern pattern = Pattern.compile("regular expression");
 Matcher matcher = pattern.matcher("regular expression Hello World,regular expression Hello World");
 // Replace every match of the pattern
 System.out.println(matcher.replaceAll("Java"));

◆Text replacement (match by match)

 Pattern pattern = Pattern.compile("regular expression");
 Matcher matcher = pattern.matcher("regular expression Hello World,regular expression Hello World ");
 StringBuffer sbr = new StringBuffer();
 while (matcher.find()) {
     // Copy the text before this match, then the replacement
     matcher.appendReplacement(sbr, "Java");
 }
 // Copy whatever follows the last match
 matcher.appendTail(sbr);
 System.out.println(sbr.toString());
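The appendReplacement loop earns its keep when the replacement depends on what was matched, which replaceAll with a fixed string cannot express. The sketch below doubles every number in a string; the method name doubleNumbers is invented for this example, and Matcher.quoteReplacement is used so that the computed text is always inserted literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AppendDemo {
    // Replaces every run of digits with twice its numeric value.
    static String doubleNumbers(String input) {
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            int doubled = Integer.parseInt(matcher.group()) * 2;
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(doubled)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("a1 b22 c"));
    }
}
```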

◆Validating an email address

 // Placeholder string: substitute the address you want to validate
 String str = "[email protected]";
 Pattern pattern = Pattern.compile("[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);
 Matcher matcher = pattern.matcher(str);
 System.out.println(matcher.matches());

◆Removing HTML tags

 // "<.+?>" matches each tag reluctantly, up to the nearest '>'
 Pattern pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
 Matcher matcher = pattern.matcher("<a href=\"index.html\">Home</a>");
 String string = matcher.replaceAll("");
 System.out.println(string);

◆Extracting a matching string from HTML

 Pattern pattern = Pattern.compile("href=\"(.+?)\"");
 Matcher matcher = pattern.matcher("<a href=\"index.html\">Home</a>");
 if (matcher.find()) {
     // group(1) is the text captured by the parentheses
     System.out.println(matcher.group(1));
 }
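The reluctant quantifier in (.+?) matters once the input holds more than one tag. As a sketch, the code below runs two patterns over a line with two links: one that uses greedy ".*" around the href attribute, and a tighter variant (an assumption of this example, not something from the text above) that uses negated character classes so that no part of the pattern can run past a tag or a quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyDemo {
    // Greedy ".*" swallows as much as it can before the last href and the last "> pair.
    static final String GREEDY = "<a.*href=\".*\">(.+?)</a>";
    // Negated classes keep each piece inside a single tag.
    static final String TIGHT = "<a[^>]*href=\"[^\"]*\"[^>]*>(.+?)</a>";

    // Returns the text captured by group 1 for every match.
    static List<String> linkTexts(String regex, String html) {
        List<String> texts = new ArrayList<String>();
        Matcher matcher = Pattern.compile(regex).matcher(html);
        while (matcher.find()) {
            texts.add(matcher.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<a href=\"a.html\">A</a> <a href=\"b.html\">B</a>";
        System.out.println(linkTexts(GREEDY, html)); // one match spanning both links
        System.out.println(linkTexts(TIGHT, html));  // one match per link
    }
}
```

Regular expressions remain a rough tool for HTML in general; for anything beyond simple, well-known input a real HTML parser is usually the safer choice.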

◆Extracting an http:// address

 // Extract the URL
 Pattern pattern = Pattern.compile("(http://|https://){1}[\\w\\.\\-/:]+");
 Matcher matcher = pattern.matcher("dsdsds<http://dsds//gfgffdfd>fdf");
 StringBuffer buffer = new StringBuffer();
 while (matcher.find()) {
     // Collect each URL on its own line
     buffer.append(matcher.group());
     buffer.append("\r\n");
 }
 System.out.println(buffer.toString());

◆Replacing specified {} placeholders

 String str = "The current development history of Java is from {0} years - {1} years";
 // Each entry pairs a regex for one placeholder (braces escaped) with its replacement
 String[][] object = {
     new String[] {"\\{0\\}", "1995"},
     new String[] {"\\{1\\}", "2007"}
 };
 System.out.println(replace(str, object));

 public static String replace(final String sourceString, Object[] object) {
     String temp = sourceString;
     for (int i = 0; i < object.length; i++) {
         String[] result = (String[]) object[i];
         Pattern pattern = Pattern.compile(result[0]);
         Matcher matcher = pattern.matcher(temp);
         temp = matcher.replaceAll(result[1]);
     }
     return temp;
 }
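Escaping the braces by hand works, but Pattern.quote can build the literal pattern and Matcher.quoteReplacement can protect the replacement text, which avoids mistakes when a value contains characters such as $. The following is a sketch of that variant; the class name TemplateDemo and the method fill are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateDemo {
    // Replaces {0}, {1}, ... in template with the corresponding values.
    static String fill(String template, String... values) {
        String result = template;
        for (int i = 0; i < values.length; i++) {
            // Pattern.quote turns "{0}" into a pattern that matches it literally.
            String placeholder = Pattern.quote("{" + i + "}");
            // quoteReplacement keeps '$' and '\' in the value literal.
            result = result.replaceAll(placeholder, Matcher.quoteReplacement(values[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fill("from {0} to {1}", "1995", "2007"));
        System.out.println(fill("cost {0}", "$5"));
    }
}
```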

◆Listing the files in a given directory that match a regular expression

 import java.io.File;
 import java.io.FileFilter;
 import java.io.PrintStream;
 import java.util.ArrayList;
 import java.util.Iterator;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 public class FilesAnalyze {
     // Caches the list of matching files
     private ArrayList<File> files = new ArrayList<File>();
     // Holds the directory path
     private String _path;
     // Holds the not-yet-compiled regular expression
     private String _regexp;

     class MyFileFilter implements FileFilter {
         /** Accepts a file when its name matches the regular expression. */
         public boolean accept(File file) {
             try {
                 Pattern pattern = Pattern.compile(_regexp);
                 Matcher match = pattern.matcher(file.getName());
                 return match.matches();
             } catch (Exception e) {
                 // If the pattern cannot be compiled, every file is accepted
                 return true;
             }
         }
     }

     /** Collects the files in path whose names match regexp. */
     FilesAnalyze(String path, String regexp) {
         getFileName(path, regexp);
     }

     /** Lists the directory and adds the matching files to the cache. */
     private void getFileName(String path, String regexp) {
         // Directory
         _path = path;
         _regexp = regexp;
         File directory = new File(_path);
         File[] filesFile = directory.listFiles(new MyFileFilter());
         if (filesFile == null) return;
         for (int j = 0; j < filesFile.length; j++) {
             files.add(filesFile[j]);
         }
     }

     /** Prints the path of every cached file to out. */
     public void print(PrintStream out) {
         Iterator<File> elements = files.iterator();
         while (elements.hasNext()) {
             File file = elements.next();
             out.println(file.getPath());
         }
     }

     public static void output(String path, String regexp) {
         FilesAnalyze fileGroup1 = new FilesAnalyze(path, regexp);
         fileGroup1.print(System.out);
     }

     public static void main(String[] args) {
         output("C:\\", "[A-z|.]*");
     }
 }

Java regular expressions can do a great deal. In fact, for almost any character-processing task there is little that regular expressions cannot handle. (Of course, explaining all of it would take a lot of time...)
