Detailed explanation of four methods of generating and parsing XML documents in Java (introduction + comparison of advantages and disadvantages + examples)

Author：Eve Cole Update Time：2025-06-11 21:16:01

There are more and more ways to parse XML, but only four are mainstream: DOM, SAX, JDOM, and DOM4J.

First, here is where to get the jar package for each of the four methods:

DOM: Bundled with the current Java JDK (also distributed separately as xml-apis.jar)

SAX: http://sourceforge.net/projects/sax/

JDOM: http://jdom.org/downloads/index.html

DOM4J: http://sourceforge.net/projects/dom4j/

1. Introduction and analysis of advantages and disadvantages

1. DOM (Document Object Model)

DOM is the official W3C standard for representing XML documents in a platform- and language-independent manner. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets developers search the tree for specific information. Analyzing the structure usually requires loading the entire document and constructing the hierarchy before any work can be done. Because it is built on this hierarchy of information, the DOM is considered tree-based or object-based.

【Advantages】

① Allows applications to change both data and structure.

② Access is bidirectional: you can navigate up and down the tree at any time and read or modify any part of the data.

【Disadvantages】

① The entire XML document usually has to be loaded to build the hierarchy, which consumes a lot of resources.
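The two advantages above can be seen in a few lines. The following is a minimal sketch, assuming the JDK's built-in JAXP parser; the class name `DomNavigationSketch`, the method `describe`, and the values "Alice" and "alice@example.com" are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the article's project
class DomNavigationSketch {

    // Parses an in-memory document, walks down and up the tree, then edits it
    static String describe(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));

            // Downward: from the document to the first <name> element
            Element name = (Element) doc.getElementsByTagName("name").item(0);

            // Upward: from <name> back to its parent <user>
            Element user = (Element) name.getParentNode();

            // Modification: change data (text) and structure (a new child element)
            name.setTextContent("Alice");
            Element email = doc.createElement("email");
            email.setTextContent("alice@example.com");
            user.appendChild(email);

            // "*" matches every descendant element of <user>
            int children = user.getElementsByTagName("*").getLength();
            return user.getTagName() + "#" + user.getAttribute("id")
                    + " name=" + name.getTextContent()
                    + " children=" + children;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<users><user id=\"0\"><name>Alexia</name><age>23</age></user></users>";
        System.out.println(describe(xml)); // prints: user#0 name=Alice children=3
    }
}
```

Note that the whole document is parsed into memory before the first lookup, which is exactly the cost listed under the disadvantages.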

2. SAX (Simple API for XML)

The advantages of SAX processing are very similar to those of streaming. Analysis can start immediately instead of waiting for all the data to be processed. And because the application only examines the data as it is read, it does not need to keep the data in memory. This is a huge advantage for large documents. In fact, the application does not even have to parse the entire document; it can stop parsing once a certain condition is met. Generally speaking, SAX is much faster than the alternative, DOM.

DOM or SAX? For developers who need to write their own code to process XML documents, choosing between the DOM and SAX parsing models is a very important design decision. DOM accesses an XML document by building a tree structure, while SAX uses an event model.

The DOM parser converts an XML document into a tree containing its contents, which can then be traversed. The advantage of the DOM parsing model is that it is easy to program: developers only need to call the tree-building instructions and then use the navigation APIs to reach the tree nodes they need. Adding and modifying elements in the tree is also easy. However, because the DOM parser has to process the entire XML document, its performance cost and memory requirements are relatively high, especially for large XML files. Thanks to its traversal capabilities, the DOM parser is often used in services where XML documents change frequently.

The SAX parser adopts an event-based model. It fires a series of events while parsing an XML document; when a given tag is found, it invokes a callback method to report that the tag has been found. SAX usually needs little memory because it lets developers decide which tags to process. Its scalability shows best when developers only need part of the data in the document. However, coding against a SAX parser is harder, and it is difficult to access several different pieces of data in the same document at the same time.

【Advantages】

① Analysis can begin immediately, without waiting for all the data to be processed.

② Data is examined only as it is read and does not need to be kept in memory.

③ Parsing can stop as soon as a certain condition is met, without parsing the entire document.

④ High efficiency and performance; it can parse documents larger than system memory.

【Disadvantages】

① The application is responsible for the tag-processing logic (such as maintaining parent/child relationships), so the more complex the document, the more complicated the program.

② Navigation is one-way, so the position in the document hierarchy cannot be looked up; it is difficult to access different parts of the same document at the same time, and XPath is not supported.
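Advantage ③ and disadvantage ① can be sketched together. SAX itself has no "stop" call, so a common approach, and the one assumed here, is to throw an exception from a callback and catch it outside `parse`. The names `SaxEarlyStopSketch`, `StopParsing`, `FirstNameHandler`, and `findFirstName` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the article's project
class SaxEarlyStopSketch {

    // Marker exception used only to abort parsing on purpose
    static class StopParsing extends SAXException {
        StopParsing() {
            super("stop requested by handler");
        }
    }

    static class FirstNameHandler extends DefaultHandler {
        // The application has to track state itself: SAX keeps no tree
        private final StringBuilder text = new StringBuilder();
        String firstName;  // text of the first <name> element
        int elementsSeen;  // start tags reported before parsing stopped

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementsSeen++;
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one element, so append
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals("name")) {
                firstName = text.toString().trim();
                throw new StopParsing(); // condition met: skip the rest of the document
            }
        }
    }

    static FirstNameHandler findFirstName(String xml) {
        FirstNameHandler handler = new FirstNameHandler();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing expected) {
            // Normal early exit, nothing to do
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        String xml = "<users><user id=\"0\"><name>Alexia</name><age>23</age></user>"
                + "<user id=\"1\"><name>Edward</name></user></users>";
        FirstNameHandler h = findFirstName(xml);
        System.out.println(h.firstName + " after " + h.elementsSeen + " start tags");
    }
}
```

The second `<user>` is never reported to the handler, which is the point: the cost of the lookup depends on where the match sits, not on the size of the document.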

3. JDOM (Java-based Document Object Model)

JDOM aims to be a Java-specific document model that simplifies interaction with XML and is faster than using DOM. As the first Java-specific model, JDOM has been heavily promoted, with the eventual aim of adopting it as a "Java Standard Extension" through Java Specification Request JSR-102. JDOM development began in early 2000.

JDOM and DOM differ mainly in two respects. First, JDOM uses only concrete classes, not interfaces. This simplifies the API in some ways but also limits flexibility. Second, the API makes heavy use of the Collections classes, which makes it easier for Java developers who are already familiar with them.

The JDOM documentation states that its purpose is to "use 20% (or less) of effort to solve 80% (or more) Java/XML problems" (the 20% presumably refers to the learning curve). JDOM is certainly useful for most Java/XML applications, and most developers find its API much easier to understand than DOM's. JDOM also performs fairly extensive checks on program behavior to stop users from doing anything that makes no sense in XML. However, you still need a thorough understanding of XML to go beyond the basics (or, in some cases, even to understand the errors). That is probably a larger undertaking than learning either the DOM or the JDOM interface.

JDOM itself does not contain a parser. It usually uses a SAX2 parser to parse and validate input XML documents (although it can also take a previously constructed DOM representation as input). It includes converters that output a JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.

【Advantages】

① Uses concrete classes instead of interfaces, which simplifies the API compared with DOM.

② Makes heavy use of the Java collection classes, which is convenient for Java developers.

【Disadvantages】

① Limited flexibility.

② Poor performance.

4. DOM4J (Document Object Model for Java)

Although DOM4J is now a completely independent development effort, it started out as an intelligent fork of JDOM. It incorporates many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streamed documents. It also offers the option of building a document representation that can be accessed in parallel through the DOM4J API and the standard DOM interfaces. It has been under development since the second half of 2000.

To support all these features, DOM4J uses interfaces and abstract base classes. DOM4J uses the Collections classes extensively in its API, but in many cases it also provides alternatives that allow better performance or a more straightforward coding style. The direct benefit is that, although DOM4J pays the price of a more complex API, it offers much greater flexibility than JDOM.

While adding flexibility, XPath integration, and large-document processing as goals, DOM4J shares JDOM's aims: ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, one that handles essentially all Java/XML problems. In pursuing that goal, it places less emphasis than JDOM on preventing incorrect application behavior.

DOM4J is an excellent Java XML API: it performs well, is powerful, and is very easy to use. It is also open source software. More and more Java software uses DOM4J to read and write XML; notably, even Sun's JAXM uses DOM4J.

【Advantages】

① Makes heavy use of the Java collection classes, which is convenient for Java developers, and provides alternative methods for better performance.

② Supports XPath.

③ Good performance.

【Disadvantages】

① Makes heavy use of interfaces, so the API is relatively complex.

2. Comparison

1. DOM4J has the best performance; even Sun's JAXM uses DOM4J. Many open source projects rely heavily on DOM4J. The well-known Hibernate, for example, uses DOM4J to read its XML configuration files. If portability is not a concern, use DOM4J.

2. JDOM and DOM performed poorly in performance testing and ran out of memory when tested with a 10 MB document, but they are portable. Both are still worth considering for small documents. Although the JDOM developers have said they expect to focus on performance before the official release, JDOM really cannot be recommended from a performance point of view. DOM, on the other hand, remains a very good choice. DOM implementations are widely used across programming languages. Because it is an official W3C recommendation (unlike the non-standard Java models), it is also the basis of many other XML-related standards, so some types of projects may require it anyway (for example, using the DOM in JavaScript).

3. SAX performs well, which follows from the way it parses: it is event-driven. A SAX parser inspects the incoming XML stream without loading it into memory (although parts of the document are, of course, held in memory temporarily while the stream is read).

My opinion: if the XML document is large and portability is not a concern, use DOM4J; if the XML document is small, use JDOM; if the data has to be processed as it arrives and does not need to be kept, consider SAX. In any case, the same rule applies: the best tool is the one that fits the job. If time allows, try all four methods and choose the one that suits you.
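One concrete case of DOM serving as a base for other XML standards is XPath, which the JDK can evaluate directly against a DOM tree through `javax.xml.xpath`. The following is a minimal sketch under that assumption; the class name `DomXPathSketch` and the method `namesByAge` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the article's project
class DomXPathSketch {

    // Returns the <name> text of every <user> whose <age> equals the given value
    static List<String> namesByAge(String xml, int age) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // The predicate compares the numeric value of <age> with the argument
            String expression = "/users/user[age=" + age + "]/name";
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<users>"
                + "<user id=\"0\"><name>Alexia</name><age>23</age></user>"
                + "<user id=\"1\"><name>Edward</name><age>24</age></user>"
                + "<user id=\"2\"><name>wjm</name><age>23</age></user>"
                + "</users>";
        System.out.println(namesByAge(xml, 23)); // prints: [Alexia, wjm]
    }
}
```

A query like this replaces the nested loops and manual state tracking that the same lookup would need in SAX, at the price of holding the whole tree in memory.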

3. Examples

To save space, the code for creating XML documents with these four methods, and the differences among them, is not given here for the time being; only the code for parsing XML documents is given. A complete project (building XML documents + parsing XML + a test comparison) is not included here.
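For readers who want a starting point for the creation side, here is a minimal sketch of one way to build a single `<user>` entry with DOM and serialize it with the JDK's `Transformer`. It is an assumption about how such code could look, not the author's omitted code; the class name `DomBuildSketch` and the helpers `buildUsersXml` and `appendTextChild` are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of the article's project
class DomBuildSketch {

    // Creates <tag>text</tag> and appends it to the given parent
    private static void appendTextChild(Document doc, Element parent, String tag, String text) {
        Element child = doc.createElement(tag);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    // Builds a one-user document in memory and returns it as a string
    static String buildUsersXml() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

            Element users = doc.createElement("users");
            doc.appendChild(users);

            Element user = doc.createElement("user");
            user.setAttribute("id", "0");
            users.appendChild(user);

            appendTextChild(doc, user, "name", "Alexia");
            appendTextChild(doc, user, "age", "23");
            appendTextChild(doc, user, "sex", "Female");

            // An identity transform copies the DOM tree to the output as text
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUsersXml());
    }
}
```

To write to a file instead, the `StreamResult` can wrap a `java.io.File` rather than a `StringWriter`.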

The following XML content is used as the example to parse:

    <?xml version="1.0" encoding="UTF-8"?>
    <users>
        <user id="0">
            <name>Alexia</name>
            <age>23</age>
            <sex>Female</sex>
        </user>
        <user id="1">
            <name>Edward</name>
            <age>24</age>
            <sex>Male</sex>
        </user>
        <user id="2">
            <name>wjm</name>
            <age>23</age>
            <sex>Female</sex>
        </user>
        <user id="3">
            <name>wh</name>
            <age>24</age>
            <sex>Male</sex>
        </user>
    </users>

First define the interface for XML document parsing:

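    // Same package as the implementations below, so they can reference this interface
    package com.xml;

    /**
     * @author Alexia
     *
     * Defines the interface for XML document parsing
     */
    public interface XmlDocument {

        /**
         * Parses an XML document
         *
         * @param fileName
         *            full path of the file
         */
        public void parserXml(String fileName);
    }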

1. DOM Example

    package com.xml;

    import java.io.File;
    import java.io.IOException;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;

    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;

    /**
     * @author Alexia
     *
     * Parses an XML document with DOM
     */
    public class DomDemo implements XmlDocument {

        public void parserXml(String fileName) {
            try {
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                DocumentBuilder db = dbf.newDocumentBuilder();
                // Pass a File: parse(String) would treat the argument as a URI
                Document document = db.parse(new File(fileName));

                // Children of the root <users>: <user> elements plus whitespace text nodes
                NodeList users = document.getDocumentElement().getChildNodes();
                for (int i = 0; i < users.getLength(); i++) {
                    Node user = users.item(i);
                    if (user.getNodeType() != Node.ELEMENT_NODE) {
                        continue; // skip whitespace between elements
                    }
                    NodeList userInfo = user.getChildNodes();
                    for (int j = 0; j < userInfo.getLength(); j++) {
                        Node node = userInfo.item(j);
                        // Check the node type instead of comparing names with != on strings
                        if (node.getNodeType() == Node.ELEMENT_NODE) {
                            System.out.println(node.getNodeName() + ":" + node.getTextContent());
                        }
                    }
                    System.out.println();
                }
            } catch (ParserConfigurationException e) {
                e.printStackTrace();
            } catch (SAXException e) {
                // Caught once only; repeated catch blocks for the same type do not compile
                e.printStackTrace();
            } catch (IOException e) {
                // Also covers FileNotFoundException
                e.printStackTrace();
            }
        }
    }

2. SAX Example

    package com.xml;

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.AttributesImpl;
    import org.xml.sax.helpers.DefaultHandler;

    /**
     * @author Alexia
     *
     * Parses an XML document with SAX
     */
    public class SaxDemo implements XmlDocument {

        public void parserXml(String fileName) {
            SAXParserFactory saxfac = SAXParserFactory.newInstance();
            // try-with-resources closes the stream even if parsing fails
            try (InputStream is = new FileInputStream(fileName)) {
                SAXParser saxparser = saxfac.newSAXParser();
                saxparser.parse(is, new MySAXHandler());
            } catch (ParserConfigurationException e) {
                e.printStackTrace();
            } catch (SAXException e) {
                e.printStackTrace();
            } catch (IOException e) {
                // Also covers FileNotFoundException
                e.printStackTrace();
            }
        }
    }

    class MySAXHandler extends DefaultHandler {

        // Text of the element currently being read
        private final StringBuilder text = new StringBuilder();
        // Copy of the current element's attributes, or null if it has none
        private Attributes attributes = null;

        public void startDocument() throws SAXException {
            // System.out.println("The document has started printing");
        }

        public void endDocument() throws SAXException {
            // System.out.println("The document has ended printing");
        }

        public void startElement(String uri, String localName, String qName,
                Attributes attributes) throws SAXException {
            text.setLength(0);
            this.attributes = null;
            if (qName.equals("users") || qName.equals("user")) {
                return;
            }
            if (attributes.getLength() > 0) {
                // The parser may reuse the Attributes object after this call, so keep a copy
                this.attributes = new AttributesImpl(attributes);
            }
        }

        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (qName.equals("users")) {
                return;
            }
            if (qName.equals("user")) {
                System.out.println(); // blank line between users
                return;
            }
            if (attributes != null) {
                // Use the loop index i (the original always read index 0)
                for (int i = 0; i < attributes.getLength(); i++) {
                    System.out.println(attributes.getQName(i) + ":" + attributes.getValue(i));
                }
                attributes = null;
            }
            System.out.println(qName + ":" + text.toString().trim());
        }

        public void characters(char[] ch, int start, int length) throws SAXException {
            // May be called several times per element, so accumulate the text
            text.append(ch, start, length);
        }
    }

3. JDOM Example

    package com.xml;

    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    import org.jdom2.Document;
    import org.jdom2.Element;
    import org.jdom2.JDOMException;
    import org.jdom2.input.SAXBuilder;

    /**
     * @author Alexia
     *
     * Parses an XML document with JDOM
     */
    public class JDomDemo implements XmlDocument {

        public void parserXml(String fileName) {
            SAXBuilder builder = new SAXBuilder();
            try {
                // Pass a File: build(String) would treat the argument as a system ID (URI)
                Document document = builder.build(new File(fileName));
                Element users = document.getRootElement();
                // JDOM 2 returns typed lists, so no casts are needed
                List<Element> userList = users.getChildren("user");
                for (Element user : userList) {
                    for (Element info : user.getChildren()) {
                        System.out.println(info.getName() + ":" + info.getValue());
                    }
                    System.out.println();
                }
            } catch (JDOMException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

4. DOM4J Example

    package com.xml;

    import java.io.File;
    import java.util.Iterator;

    import org.dom4j.Document;
    import org.dom4j.DocumentException;
    import org.dom4j.Element;
    import org.dom4j.io.SAXReader;

    /**
     * @author Alexia
     *
     * Parses an XML document with DOM4J
     */
    public class Dom4jDemo implements XmlDocument {

        public void parserXml(String fileName) {
            File inputXml = new File(fileName);
            SAXReader saxReader = new SAXReader();
            try {
                Document document = saxReader.read(inputXml);
                Element users = document.getRootElement();
                // Outer loop: each <user>; inner loop: its child elements
                for (Iterator i = users.elementIterator(); i.hasNext();) {
                    Element user = (Element) i.next();
                    for (Iterator j = user.elementIterator(); j.hasNext();) {
                        Element node = (Element) j.next();
                        System.out.println(node.getName() + ":" + node.getText());
                    }
                    System.out.println();
                }
            } catch (DocumentException e) {
                System.out.println(e.getMessage());
            }
        }
    }
