Using Dom4j to parse and generate XML documents in Java

Author：Eve Cole Update Time：2025-06-15 05:48:02

1. Preface

dom4j is an open source Java API for reading and writing XML documents. It performs well, is feature-rich, and is convenient to use. XML is also widely used as a data-exchange format, for example for the parameters passed to a web service call or for data synchronization, so knowing how to parse XML with dom4j is well worth the effort.

2. Prerequisites

dom4j.jar

Download address: http://sourceforge.net/projects/dom4j/

3. Using dom4j in practice

1. Parsing the XML document

Approach:

<1> Pass the path of the XML file to a SAXReader, which returns a Document object;

<2> Then work with this Document object to read the information held in its nodes and their children.

The specific code is as follows:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.Iterator;
    import java.util.List;

    import org.dom4j.Document;
    import org.dom4j.DocumentHelper;
    import org.dom4j.Element;
    import org.dom4j.io.SAXReader;

    /**
     * Use dom4j to parse xml documents
     * @author Administrator
     */
    public class Dom4jParseXmlDemo {

        public void parseXml01() {
            try {
                // The XML file under src. Relative paths are resolved against the
                // working directory, so an absolute path is used here.
                File xmlFile = new File("D:/project/dynamicWeb/src/resource/module01.xml");
                // The file can also be opened as an input stream:
                //InputStream inputStream = new FileInputStream(xmlFile);
                // or located on the classpath, relative to the compiled classes:
                //InputStream inputStream = this.getClass().getResourceAsStream("/module01.xml");

                // Create a SAXReader, which is used to read XML
                SAXReader saxReader = new SAXReader();
                // read() is overloaded: it accepts an InputStream as well as a File
                //Document document = saxReader.read(inputStream);
                Document document = saxReader.read(xmlFile);
                // DocumentHelper can also parse XML held in a string:
                //Document document = DocumentHelper.parseText("<?xml version=\"1.0\" encoding=\"UTF-8\"?><modules id=\"123\"><module> This is the text information of the module tag</module></modules>");

                // Get the root element
                Element rootElement = document.getRootElement();
                System.out.println("root node name:" + rootElement.getName()); // element name
                System.out.println("How many attributes does the root node have:" + rootElement.attributeCount()); // number of attributes
                System.out.println("The value of the root node id attribute: " + rootElement.attributeValue("id")); // value of the id attribute
                // getText() returns only the element's own text. Here that is the tabs
                // and line breaks used to lay out the child tags, which also count as
                // text, so the output shows blank lines.
                System.out.println("Text in the root node: " + rootElement.getText());
                // getTextTrim() trims leading and trailing whitespace and collapses
                // runs of whitespace inside the text into single spaces
                System.out.println("Text(1):" + rootElement.getTextTrim());
                // getStringValue() returns the text of the element and of all its descendants
                System.out.println("Text content of the root node child node:" + rootElement.getStringValue());

                // Get a child element. Child elements and the root are both Element
                // objects, so they are handled the same way.
                Element element = rootElement.element("module");
                // element() returns null when no such child exists, and real-world XML
                // is not always consistent, so check for null before using the result
                // to avoid a java.lang.NullPointerException.
                if (element != null) {
                    System.out.println("Sub-node text: " + element.getText());
                }

                rootElement.setName("root"); // the element name can be changed
                System.out.println("The name after the root node is modified: " + rootElement.getName());
                rootElement.setText("text"); // and so can the text inside the tag
                System.out.println("Text after the root node is modified: " + rootElement.getText());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static void main(String[] args) {
            Dom4jParseXmlDemo demo = new Dom4jParseXmlDemo();
            demo.parseXml01();
        }
    }

The XML file used above, module01.xml, sits under src and looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <modules id="123">
        <module> This is the text information of the module tag</module>
    </modules>

Next, run the main method of this class and inspect the console output.

From this we know:

<1> There are several ways to read an XML file;

<2> Reading the text and tag name of an element object is straightforward;

<3> Modifying the text and tag name of an element is just as easy, but the change is made only in memory and is not written back to the XML file on disk.
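Point <3> can be checked with a short sketch: the rename below changes only the in-memory Document, and the new name becomes visible only when the document is serialized again, here with asXML(). The class name InMemoryEditSketch and the method renameRoot are illustrative and not part of the original example.

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;

// Hypothetical class name, used only for this sketch.
class InMemoryEditSketch {

    // Parses the XML text, renames the root element in memory and
    // returns the serialized result. Nothing is written to disk.
    static String renameRoot(String xml, String newName) {
        try {
            Document document = DocumentHelper.parseText(xml);
            document.getRootElement().setName(newName);
            return document.asXML();
        } catch (DocumentException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<modules id=\"123\"><module>text</module></modules>";
        System.out.println(renameRoot(xml, "root"));
    }
}
```

To persist such a change, the Document still has to be written out explicitly, for example with the XMLWriter shown later in this article.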

The example above only reads the root element of the XML. The next example walks through the child elements of the Document object with an Iterator.

The specific code is as follows:

    public void parseXml02() {
        // Load the XML under src from the classpath as an input stream
        try (InputStream inputStream = this.getClass().getResourceAsStream("/module02.xml")) {
            // Create a SAXReader, which is used to read XML
            SAXReader saxReader = new SAXReader();
            // read() is overloaded: it accepts an InputStream as well as a File
            Document document = saxReader.read(inputStream);
            Element rootElement = document.getRootElement();
            // rootElement.element("name")   returns a single child element
            // rootElement.elements("name")  returns the matching child elements as a List
            // rootElement.elements("module").iterator() iterates over the module
            // children of the root element
            Iterator<Element> modulesIterator = rootElement.elements("module").iterator();
            while (modulesIterator.hasNext()) {
                Element moduleElement = modulesIterator.next();
                Element nameElement = moduleElement.element("name");
                System.out.println(nameElement.getName() + ":" + nameElement.getText());
                Element valueElement = moduleElement.element("value");
                System.out.println(valueElement.getName() + ":" + valueElement.getText());
                Element descriptionElement = moduleElement.element("descript");
                System.out.println(descriptionElement.getName() + ":" + descriptionElement.getText());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

The XML file used above, module02.xml, sits under src and looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <modules id="123">
        <module>
            <name>oa</name>
            <value> Basic system configuration</value>
            <descript> Basic system configuration root directory</descript>
        </module>
    </modules>

Next, run the main method of this class and inspect the console output.

From this we know:

<1> Iterating over XML child elements with dom4j is efficient and convenient.

However, the example above only iterates over child elements that all share the same structure. XML with a less regular structure needs more care, for example module03.xml, which is used in the next test:

    <?xml version="1.0" encoding="UTF-8"?>
    <modules id="123">
        <module>This is the text information of the module tag</module>
        <module id="">
            <name>oa</name>
            <value>Basic system configuration</value>
            <descript>Basic configuration for the system</descript>
            <module>This is the text information of the submodule tag</module>
        </module>
        <module>
            <name>Management configuration</name>
            <value>none</value>
            <descript>Instructions for managing configuration</descript>
            <module id="106">
                <name>System management</name>
                <value>0</value>
                <descript>Config</descript>
                <module id="107">
                    <name>Department number</name>
                    <value>20394</value>
                    <descript>Number</descript>
                </module>
            </module>
        </module>
    </modules>

Because the module elements have different structures, iterating over them in the same way as before raises an error:

java.lang.NullPointerException

So more care is needed here: the child elements cannot be read blindly on every iteration. The specific implementation code is as follows:

    public void parseXml03() {
        // Load the XML under src from the classpath as an input stream
        try (InputStream inputStream = this.getClass().getResourceAsStream("/module03.xml")) {
            // Create a SAXReader, which is used to read XML
            SAXReader saxReader = new SAXReader();
            // read() is overloaded: it accepts an InputStream as well as a File
            Document document = saxReader.read(inputStream);
            Element rootElement = document.getRootElement();
            // elements() returns an empty list, never null, when nothing matches.
            // The first module tag contains only text, so element("name") returns
            // null for it and using the result throws java.lang.NullPointerException.
            // Check the text of each module before reading its children.
            List<Element> elementList = rootElement.elements("module");
            for (Element element : elementList) {
                if (!element.getTextTrim().equals("")) {
                    System.out.println("【1】" + element.getTextTrim());
                } else {
                    Element nameElement = element.element("name");
                    System.out.println(" 【2】" + nameElement.getName() + ":" + nameElement.getText());
                    Element valueElement = element.element("value");
                    System.out.println(" 【2】" + valueElement.getName() + ":" + valueElement.getText());
                    Element descriptionElement = element.element("descript");
                    System.out.println(" 【2】" + descriptionElement.getName() + ":" + descriptionElement.getText());

                    List<Element> subElementList = element.elements("module");
                    for (Element subElement : subElementList) {
                        if (!subElement.getTextTrim().equals("")) {
                            System.out.println(" 【3】" + subElement.getTextTrim());
                        } else {
                            Element subnameElement = subElement.element("name");
                            System.out.println(" 【3】" + subnameElement.getName() + ":" + subnameElement.getText());
                            Element subvalueElement = subElement.element("value");
                            System.out.println(" 【3】" + subvalueElement.getName() + ":" + subvalueElement.getText());
                            Element subdescriptElement = subElement.element("descript");
                            System.out.println(" 【3】" + subdescriptElement.getName() + ":" + subdescriptElement.getText());
                        }
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

Next, run the main method of this class and inspect the console output.

This avoids the null reference problem when iterating over the document.

The code can also be refactored: the logic for reading child elements is repeated at each level of the loop, so it could be replaced with recursion, at some cost to readability.
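A minimal sketch of that recursive refactoring is shown below. It is not the original author's code: the class ModuleWalker, the methods walk and describe, and the line format are assumptions made for illustration. It collects lines into a list instead of printing them, and it uses elementTextTrim(), which returns null for a missing child, so that modules of any shape and depth can be visited without a NullPointerException.

```java
import java.util.ArrayList;
import java.util.List;

import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

// Hypothetical class name, used only for this sketch.
class ModuleWalker {

    // The child tags read for every module, as in module03.xml.
    private static final String[] FIELDS = {"name", "value", "descript"};

    // Visits every module below parent, however deeply nested, and
    // appends one indented line per piece of information found.
    @SuppressWarnings("unchecked") // elements() returns a raw List in dom4j 1.x
    static void walk(Element parent, String indent, List<String> out) {
        List<Element> modules = parent.elements("module");
        for (Element module : modules) {
            String text = module.getTextTrim();
            if (!text.isEmpty()) {
                out.add(indent + "text:" + text);
            }
            for (String field : FIELDS) {
                // null means the module has no such child; skip it
                String value = module.elementTextTrim(field);
                if (value != null) {
                    out.add(indent + field + ":" + value);
                }
            }
            walk(module, indent + "  ", out); // recurse into nested modules
        }
    }

    // Convenience wrapper: parses XML text and returns the collected lines.
    static List<String> describe(String xml) {
        try {
            List<String> out = new ArrayList<>();
            walk(DocumentHelper.parseText(xml).getRootElement(), "", out);
            return out;
        } catch (DocumentException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<modules><module>plain</module><module><name>oa</name>"
                + "<module><name>sub</name></module></module></modules>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

Unlike the nested loops, this version is not limited to a fixed number of levels, so the innermost module of module03.xml would also be visited.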

Sometimes you need to extract all the text in an XML document, or the XML you receive from others is not well standardized, for example the case of the tag names is inconsistent. XML is case-sensitive, and every start tag must be paired with an end tag of exactly the same name. To avoid such problems, you can simply convert all tags to upper case. The specific code is as follows:

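    // Requires java.util.regex.Matcher and java.util.regex.Pattern
    public static void main(String[] args) {
        String str = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><modules id=\"123\"><module> This is the text information of the module tag <name>oa</name><value>Basic configuration</value><descript>Basic configuration of the system</descript></module></modules>";
        // Replace every tag with "_" so that only the text remains
        System.out.println(str.replaceAll("<[^<]*>", "_"));
        Pattern pattern = Pattern.compile("<[^<]*>");
        Matcher matcher = pattern.matcher(str);
        while (matcher.find()) {
            String tag = matcher.group(0);
            // Leave the XML declaration as it is
            if (tag.startsWith("<?")) {
                continue;
            }
            // replace() treats the tag as literal text, whereas replaceAll()
            // would interpret it as a regular expression
            str = str.replace(tag, tag.toUpperCase());
        }
        System.out.println(str);
    }

Running it prints the text with every tag replaced by an underscore, followed by the XML with its tags converted to upper case.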
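The loop above upper-cases each tag as a whole, attribute names and values included. Where only the element names should change, a narrower variant can be sketched with Matcher.appendReplacement from the JDK. The class TagNameUpperCaser and its regular expression are assumptions made for this illustration, and the pattern is a simplification that does not handle CDATA sections or comments containing "<".

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class TagNameUpperCaser {

    // "<" or "</" followed by an element name. The declaration ("<?xml")
    // does not match because a name must start with a letter or "_".
    private static final Pattern TAG_NAME =
            Pattern.compile("(</?)([A-Za-z_][A-Za-z0-9_.:-]*)");

    // Upper-cases element names only; attributes and text stay unchanged.
    static String upperCaseTagNames(String xml) {
        Matcher matcher = TAG_NAME.matcher(xml);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String replacement = matcher.group(1)
                    + matcher.group(2).toUpperCase(Locale.ROOT);
            // quoteReplacement() keeps "$" and "\" from being interpreted
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<Modules id=\"123\"><module>oa text</MODULE></modules>";
        System.out.println(upperCaseTagNames(xml));
    }
}
```

Because the string is rewritten in a single pass, a tag that occurs several times is handled once per occurrence rather than through repeated whole-string replacements.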

2. Generating an XML document

dom4j can parse XML, and it can generate XML as well, which is even easier.

Approach:

<1> DocumentHelper provides a method for creating a Document object;

<2> Work with this Document object, adding nodes together with their text, names, and attribute values;

<3> Then use an XMLWriter to write the finished Document object to disk.

The specific code is as follows:

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.Writer;

    import org.dom4j.Document;
    import org.dom4j.DocumentHelper;
    import org.dom4j.Element;
    import org.dom4j.io.XMLWriter;

    /**
     * Use dom4j to generate xml documents
     * @author Administrator
     */
    public class Dom4jBuildXmlDemo {

        public void build01() {
            try {
                // DocumentHelper provides a method to create a Document object
                Document document = DocumentHelper.createDocument();
                // Add the root element
                Element rootElement = document.addElement("modules");
                // Child elements can be added to it, or its text can be set
                rootElement.setText("This is the text information of the module tag");
                Element element = rootElement.addElement("module");
                Element nameElement = element.addElement("name");
                Element valueElement = element.addElement("value");
                Element descriptionElement = element.addElement("description");
                nameElement.setText("name");
                nameElement.addAttribute("language", "java"); // add an attribute to the element
                valueElement.setText("value");
                valueElement.addAttribute("language", "c#");
                descriptionElement.setText("description");
                descriptionElement.addAttribute("language", "sql server");

                // Convert the Document object directly into a string and print it
                System.out.println(document.asXML());

                Writer fileWriter = new FileWriter("c://module.xml");
                // XMLWriter is the dom4j class for writing documents out
                XMLWriter xmlWriter = new XMLWriter(fileWriter);
                xmlWriter.write(document);
                xmlWriter.flush();
                xmlWriter.close(); // also closes the underlying FileWriter
                System.out.println("xml document was added successfully! ");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        public static void main(String[] args) {
            Dom4jBuildXmlDemo demo = new Dom4jBuildXmlDemo();
            demo.build01();
        }
    }

Run the code, then check the C drive to confirm that the file was created. The content of the XML file is the same as the content printed to the console.

The XML generated above does not specify an encoding, yet UTF-8 still appears in the output, which shows that UTF-8 is the default encoding. To specify a different one, add document.setXMLEncoding("GBK"); before writing to disk.
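One point to keep in mind is that a FileWriter encodes characters with the JVM's default charset, so the encoding named in the XML declaration and the bytes actually written can disagree. The sketch below shows one way to keep them in step: the encoding is set on an OutputFormat and the XMLWriter writes to an OutputStream. The class EncodedXmlSketch and its methods are illustrative names, and a ByteArrayOutputStream stands in for the FileOutputStream a real program would use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

// Hypothetical class name, used only for this sketch.
class EncodedXmlSketch {

    // Serializes the document with the given encoding and returns the bytes.
    static byte[] toBytes(Document document, String encoding) throws IOException {
        OutputFormat format = OutputFormat.createPrettyPrint(); // indented output
        format.setEncoding(encoding); // used for the declaration and for the bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        XMLWriter writer = new XMLWriter(out, format);
        writer.write(document);
        writer.close();
        return out.toByteArray();
    }

    // Builds a small document and returns it as text in the given encoding.
    static String render(String encoding) {
        Document document = DocumentHelper.createDocument();
        document.addElement("modules").addElement("module")
                .addElement("name").setText("oa");
        try {
            return new String(toBytes(document, encoding), encoding);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("ISO-8859-1"));
    }
}
```

OutputFormat.createPrettyPrint() also indents the output, which makes the generated file easier to read than the single line produced by the default format.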
