Java method of extracting data using regular expressions

Author: Eve Cole | Update Time: 2025-05-21 20:16:01

What is a regular expression

A regular expression is a notation for matching and replacing text patterns. It is a pattern made up of ordinary characters (such as the letters a to z) and special characters (metacharacters), and it describes one or more strings to match when searching a body of text. The regular expression acts as a template that is matched against the string being searched.
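As a small illustration of matching and replacement, the sketch below uses the standard java.util.regex classes. The class name RegexBasics, the two helper methods, and the sample sentence are made up for this example and are not part of the original article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexBasics {
    // Ordinary characters match themselves; \d is a metacharacter meaning "any digit"
    // and + means "one or more of the preceding item".
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Collects every run of digits found in the text.
    static List<String> findNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Replaces every run of digits with a '#' placeholder.
    static String maskNumbers(String text) {
        return DIGITS.matcher(text).replaceAll("#");
    }

    public static void main(String[] args) {
        String text = "order 42 shipped in 3 days";
        System.out.println(findNumbers(text)); // [42, 3]
        System.out.println(maskNumbers(text)); // order # shipped in # days
    }
}
```

The same compiled Pattern serves both searching and replacing, which is why it is kept in a constant rather than recompiled on each call.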

Java uses regular expressions to extract data

Java regular expressions are very useful. I previously used regular expressions to split a large 3 MB text file into several smaller files. The C# version was very concise, only about twenty lines of code. Today I rewrote it in Java and, sure enough, Java is much more verbose.

I won't post the code that splits the file. Instead, I will mainly show how to use regular expressions to divide a large string into groups:

For example, there is now an endlist.txt text file with the following content:

    1300102, Beijing
    1300103, Beijing
    1300104, Beijing
    1300105, Beijing
    1300106, Beijing
    1300107, Beijing
    1300108, Beijing
    1300109, Beijing
    1300110, Beijing
    1300111, Beijing
    1300112, Beijing
    1300113, Beijing
    1300114, Beijing
    1300115, Beijing
    1300116, Beijing
    1300117, Beijing
    1300118, Beijing
    1300119, Beijing

Each seven-digit number is the first seven digits of a mobile phone number, and the Chinese characters after it give the place where the number is registered (in the original file the place name is written in Chinese characters; it appears as "Beijing" in the listing above). Now I want to write these records into separate files according to their first three digits: 130.txt, 131.txt, 132.txt, and so on.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // The class name is arbitrary; the original listing shows only main().
    public class SplitByPrefix {

        public static void main(String[] args) {
            String childPath = "src/endlist.txt";
            String data = "";

            // Read the whole file into one string, decoding it as UTF-8.
            StringBuilder buffer = new StringBuilder();
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(new FileInputStream(childPath), "utf-8"))) {
                int s;
                while ((s = br.read()) != -1) {
                    buffer.append((char) s);
                }
                data = buffer.toString();
            } catch (Exception e) {
                e.printStackTrace();
            }

            // One bucket for each prefix from 130 to 139.
            Map<String, ArrayList<String>> resultMap = new HashMap<String, ArrayList<String>>();
            for (int i = 0; i < 10; i++) {
                resultMap.put("13" + i, new ArrayList<String>());
            }

            // Group 1: the three-digit prefix. Group 2: the rest of the record.
            Pattern pattern = Pattern.compile("(\\d{3})(\\d{4},[\u4e00-\u9fa5]*\\n)");
            Matcher matcher = pattern.matcher(data);
            while (matcher.find()) {
                ArrayList<String> bucket = resultMap.get(matcher.group(1));
                if (bucket != null) { // skip prefixes outside 130-139
                    bucket.add(matcher.group(2));
                }
            }

            // Write each non-empty bucket to src/13x.txt.
            for (int i = 0; i < 10; i++) {
                ArrayList<String> tempList = resultMap.get("13" + i);
                if (tempList.size() > 0) {
                    File outFile = new File("src/13" + i + ".txt");
                    try (OutputStreamWriter writer = new OutputStreamWriter(
                            new FileOutputStream(outFile), "utf-8")) {
                        for (int j = 0; j < tempList.size(); j++) {
                            writer.append(tempList.get(j));
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }

The Pattern.compile call uses the regular expression "(\\d{3})(\\d{4},[\u4e00-\u9fa5]*\\n)". Each pair of parentheses forms a group; group indices start at 1, and index 0 refers to the entire match. This expression therefore has two groups: the first matches three digits, and the second matches four digits, a comma, any number of Chinese characters, and a newline. The extraction happens in the while (matcher.find()) loop, where group(1) selects the list and group(2) is the text added to it.
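To see the groups in isolation, the following sketch runs the same pattern over a short in-memory string instead of a file. The class name GroupDemo, the groupByPrefix helper, and the sample data are assumptions for illustration only. The place name is written as Unicode escapes for the Chinese characters of "Beijing", because the character class in the pattern only matches that range.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Group 1: three digits. Group 2: four digits, a comma, Chinese characters, a newline.
    private static final Pattern RECORD =
            Pattern.compile("(\\d{3})(\\d{4},[\\u4e00-\\u9fa5]*\\n)");

    // Hypothetical helper: maps each three-digit prefix to the record remainders under it.
    static Map<String, List<String>> groupByPrefix(String data) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher m = RECORD.matcher(data);
        while (m.find()) {
            // m.group(0) would be the whole matched record; groups 1 and 2 are its two parts.
            result.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three sample records; the place name is "Beijing" in Chinese characters.
        String data = "1300102,\u5317\u4eac\n1310001,\u5317\u4eac\n1300103,\u5317\u4eac\n";
        Map<String, List<String>> groups = groupByPrefix(data);
        System.out.println(groups.keySet());          // [130, 131]
        System.out.println(groups.get("130").size()); // 2
    }
}
```

Unlike the listing above, this sketch creates a bucket the first time a prefix appears (computeIfAbsent), so it does not need to know the set of prefixes in advance. Note that the pattern expects each record to end with a bare newline; a file with Windows line endings would need the pattern adjusted.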

Summary

That is all for this article. I hope it is of some help in your study or work. If you have any questions, feel free to leave a comment.