A summary of four methods for parsing XML in Java

Author: Eve Cole   Update Time: 2025-04-29 23:48:01

1. DOM (JAXP Crimson parser)

DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy, and this hierarchy lets developers search the tree for specific information. Working with this structure usually requires loading the entire document and building the hierarchy before any work can be done. Because it is built on a hierarchy of information, DOM is considered tree-based or object-based. DOM, and tree-based processing in general, has several advantages. First, because the tree persists in memory, it can be modified, so the application can change both the data and the structure. Second, the application can navigate up and down the tree at any time, instead of making a single pass as with SAX. DOM is also much simpler to use.
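
A minimal sketch of those two advantages (in-memory navigation in both directions, and in-place modification) follows. It assumes the JDK's built-in JAXP DOM implementation and an inline sample string instead of a file. The class and method names (`DomTreeDemo`, `pathUpFromFirst`, `edit`) are hypothetical and not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomTreeDemo {
    // Parses an XML string into an in-memory DOM tree with the JDK's built-in JAXP parser.
    // Checked parser exceptions are wrapped to keep the example short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Goes down to the first element named `tag`, then walks back up through its
    // ancestors, returning e.g. "NO < VALUE < RESULT".
    static String pathUpFromFirst(Document doc, String tag) {
        StringBuilder path = new StringBuilder();
        for (Node n = doc.getElementsByTagName(tag).item(0); n != null && n != doc; n = n.getParentNode()) {
            if (path.length() > 0) path.append(" < ");
            path.append(n.getNodeName());
        }
        return path.toString();
    }

    // Edits the tree in place: changes the first NO's text (data) and
    // removes the first ADDR element (structure).
    static void edit(Document doc, String newPlate) {
        Element no = (Element) doc.getElementsByTagName("NO").item(0);
        no.setTextContent(newPlate);
        Node addr = doc.getElementsByTagName("ADDR").item(0);
        if (addr != null) addr.getParentNode().removeChild(addr);
    }

    public static void main(String[] args) {
        Document doc = parse("<RESULT><VALUE><NO>A1234</NO><ADDR>XX Road</ADDR></VALUE></RESULT>");
        System.out.println(pathUpFromFirst(doc, "NO"));                                // NO < VALUE < RESULT
        edit(doc, "B5678");
        System.out.println(doc.getElementsByTagName("NO").item(0).getTextContent()); // B5678
        System.out.println(doc.getElementsByTagName("ADDR").getLength());            // 0
    }
}
```

Because the whole tree stays in memory, the walk back up to `RESULT` and the edits can happen in any order and at any time after parsing.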

2. SAX

The advantages of SAX processing are much like the advantages of streaming media. Analysis can begin immediately instead of waiting for the entire document to be processed. Because the application examines the data as it is read, it does not need to store the data in memory, which is a huge advantage for large documents. In fact, the application does not even have to parse the entire document; it can stop parsing once a certain condition is met. Generally speaking, SAX is also much faster than its alternative, DOM.
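
SAX has no dedicated "stop" call. A common approach is to throw an exception from a callback once the condition is met and treat that exception as a normal end of parsing. The sketch below shows this, assuming the JDK's built-in SAX parser. `SaxEarlyStop`, `FirstPlateHandler` and `firstPlate` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxEarlyStop {
    // Records the text of the first NO element, then aborts the parse.
    static final class FirstPlateHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        boolean inNo;
        boolean stopped;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            inNo = qName.equals("NO");
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inNo) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals("NO")) {
                stopped = true; // remember that the exception below is intentional
                throw new SAXException("found first NO, stopping");
            }
        }
    }

    static String firstPlate(String xml) {
        FirstPlateHandler handler = new FirstPlateHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            // Only a real error if we did not stop on purpose.
            if (!handler.stopped) throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) {
        // Deliberately truncated after the first record: parsing stops at the first
        // </NO>, so the missing end tags are never reached and no error is raised.
        String truncated = "<RESULT><VALUE><NO>A1234</NO></VALUE><VALUE><NO>B";
        System.out.println(firstPlate(truncated));
    }
}
```

Checking the `stopped` flag, rather than relying on the exact exception type that reaches the caller, keeps the "intentional stop" separate from genuine parse errors.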

DOM or SAX? For developers who must write their own code to process XML documents, choosing between the DOM and SAX parsing models is a very important design decision. DOM accesses an XML document by building a tree structure, while SAX uses an event model.

A DOM parser converts an XML document into a tree that holds its contents, and that tree can then be traversed. The advantage of the DOM parsing model is that it is easy to program: developers only need to call the tree-building API and then use the navigation API to reach the tree nodes they need. Adding and modifying elements in the tree is also easy. However, because a DOM parser must process the entire XML document, its performance cost and memory requirements are relatively high, especially for large XML files. Because of their traversal capabilities, DOM parsers are often used in services where XML documents must be changed frequently.
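
To illustrate how easy adding elements is, the sketch below builds a small document in the sample file's shape from scratch and serializes it to text with an identity `Transformer`. It assumes only the JDK's JAXP classes, not any particular parser named in this article. `DomBuildDemo`, `buildRecord` and `appendText` are hypothetical names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomBuildDemo {
    // Builds <RESULT><VALUE><NO>..</NO><ADDR>..</ADDR></VALUE></RESULT> in memory
    // and returns it as an XML string without the XML declaration.
    static String buildRecord(String plate, String address) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element result = doc.createElement("RESULT");
            doc.appendChild(result);

            Element value = doc.createElement("VALUE");
            result.appendChild(value);
            appendText(doc, value, "NO", plate);
            appendText(doc, value, "ADDR", address);

            // An identity Transformer copies the DOM tree to a text stream.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build or serialize the document", e);
        }
    }

    // Hypothetical helper: creates <name>text</name> and appends it to parent.
    private static void appendText(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        System.out.println(buildRecord("A1234", "Sichuan Province"));
    }
}
```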

A SAX parser uses an event-based model. It fires a series of events while it parses an XML document; when it encounters a given tag, it can invoke a callback method to report that the tag has been found. SAX usually needs little memory because developers decide which tags to process, and its scalability shows most clearly when developers only need part of the data contained in a document. However, code that uses a SAX parser is harder to write, and it is difficult to access several different pieces of data in the same document at the same time.
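
The callback style described above can be sketched with a handler that reacts to one tag name and ignores every other event. This sketch uses the JDK's SAX API; `SaxSelectiveDemo` and `collect` are hypothetical names. The handler has to track its own state (`current`) to know which text belongs to which element. That extra bookkeeping is what makes SAX code harder to write than DOM code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxSelectiveDemo {
    // Collects the text of every element named `tag`. All other events are ignored,
    // so only the requested values are kept in memory.
    static List<String> collect(String xml, String tag) {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(tag)) current = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text node in several chunks, so append.
                if (current != null) current.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(tag) && current != null) {
                    found.add(current.toString().trim());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<RESULT><VALUE><NO>A1</NO><ADDR>Road 1</ADDR></VALUE>"
                + "<VALUE><NO>B2</NO><ADDR>Road 2</ADDR></VALUE></RESULT>";
        System.out.println(collect(xml, "ADDR")); // [Road 1, Road 2]
    }
}
```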

3. JDOM http://www.jdom.org

JDOM aims to be a Java-specific document model that simplifies interaction with XML and is faster than DOM. As the first Java-specific model, JDOM has been heavily promoted. It is being considered for eventual adoption as a "Java Standard Extension" through "Java Specification Request JSR-102". JDOM has been under development since early 2000.

JDOM differs from DOM in two main ways. First, JDOM uses only concrete classes, not interfaces. This simplifies the API in some respects but also limits flexibility. Second, the API makes heavy use of the Collections classes, which makes it easier to use for Java developers who already know those classes.
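
The Collections point is easiest to see next to the W3C DOM, whose `NodeList` is not a `java.util.Collection` and cannot be used directly in an enhanced `for` loop or a stream. The sketch below uses only the JDK, not JDOM. Its adapter `asElementList` is a hypothetical helper that gives a DOM `NodeList` the kind of `List`-style access JDOM's API offers directly. `NodeListAdapter` and `plates` are likewise hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.AbstractList;
import java.util.List;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class NodeListAdapter {
    // Hypothetical helper: presents a DOM NodeList as a read-only List<Element>.
    // Assumes every node is an Element, which holds for getElementsByTagName results.
    static List<Element> asElementList(NodeList nodes) {
        return new AbstractList<Element>() {
            @Override
            public Element get(int index) {
                if (index < 0 || index >= size()) throw new IndexOutOfBoundsException("index " + index);
                return (Element) nodes.item(index);
            }

            @Override
            public int size() {
                return nodes.getLength();
            }
        };
    }

    // Reads each VALUE record's NO text with ordinary Collections/stream operations.
    static List<String> plates(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return asElementList(doc.getElementsByTagName("VALUE")).stream()
                .map(v -> v.getElementsByTagName("NO").item(0).getTextContent())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(plates("<RESULT><VALUE><NO>A1</NO></VALUE><VALUE><NO>B2</NO></VALUE></RESULT>"));
    }
}
```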

The JDOM documentation states that its goal is to "solve 80% (or more) of Java/XML problems with 20% (or less) of the effort" (the 20% being an estimate based on the learning curve). JDOM is certainly useful for most Java/XML applications, and most developers find its API much easier to understand than DOM's. JDOM also includes fairly extensive checks on program behavior to keep users from doing anything that is meaningless in XML. However, it still requires a thorough understanding of XML to do anything beyond the basics (or, in some cases, even to understand the errors). That understanding may be a more worthwhile investment than learning either the DOM or the JDOM interface.

JDOM itself does not include a parser. It usually uses a SAX2 parser to parse and validate input XML documents (although it can also take a previously constructed DOM representation as input). It includes converters that output a JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.

4. DOM4J http://dom4j.sourceforge.net

Although DOM4J is a completely independent development effort, it began as a clever fork of JDOM. It incorporates many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streamed documents. It also offers the option of building a document representation that can be accessed in parallel through the DOM4J API and the standard DOM interfaces. It has been under development since the second half of 2000.
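
DOM4J's own XPath API is not shown here. As a stand-in, the sketch below uses the JDK's `javax.xml.xpath` package on a W3C DOM tree to show the kind of query that integrated XPath support enables: one expression selects a record by the value of a child element. `XPathDemo` and `addressFor` are hypothetical names, and the inline sample XML is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class XPathDemo {
    // Returns the ADDR text of the VALUE record whose NO equals `plate`,
    // or "" when no record matches (the string value of an empty node-set).
    static String addressFor(String xml, String plate) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $plate through a variable resolver instead of splicing the value into
            // the expression. This resolver answers every variable with `plate`, which is
            // enough here because the expression uses only $plate.
            xpath.setXPathVariableResolver(name -> plate);
            return xpath.evaluate("/RESULT/VALUE[NO = $plate]/ADDR", doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<RESULT><VALUE><NO>A1</NO><ADDR>Road 1</ADDR></VALUE>"
                + "<VALUE><NO>B2</NO><ADDR>Road 2</ADDR></VALUE></RESULT>";
        System.out.println(addressFor(xml, "B2")); // Road 2
    }
}
```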

To support all these features, DOM4J uses interfaces and abstract base classes. DOM4J makes heavy use of the Collections classes in its API, but in many cases it also provides alternatives that allow better performance or more direct coding. The net result is that, although DOM4J pays the price of a more complex API, it offers much more flexibility than JDOM.

Beyond adding flexibility, XPath integration, and support for processing large documents, DOM4J shares JDOM's goals: ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, one that can handle essentially all Java/XML problems. In pursuing this goal, it puts less emphasis than JDOM on preventing incorrect application behavior.

DOM4J is an excellent Java XML API, offering strong performance, powerful features, and great ease of use. It is also open-source software. More and more Java software now uses DOM4J to read and write XML; notably, even Sun's JAXM uses DOM4J.

Comparison of the four methods

DOM4J has the best performance; as noted, even Sun's JAXM uses it. Many open-source projects currently rely heavily on DOM4J; for example, the well-known Hibernate uses it to read its XML configuration files. If portability is not a concern, use DOM4J.

JDOM and DOM performed poorly in performance testing, running out of memory when tested with a 10 MB document. For small documents, DOM and JDOM are still worth considering. Although the JDOM developers have said they expect to focus on performance issues before the official release, from a performance standpoint it is hard to recommend. DOM, on the other hand, remains a very good choice. DOM implementations are widely available in many programming languages. DOM is also the basis of many other XML-related standards because it is an official W3C recommendation (as opposed to the non-standard, Java-specific models), so it may be required in some kinds of projects (for example, when using DOM in JavaScript).

SAX performs better because of its event-driven parsing approach. SAX detects the incoming XML stream as it arrives but does not load it into memory (although, of course, while the stream is being read, some of the document is held in memory temporarily).
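
The comparison can be checked roughly on one's own machine. The sketch below is an assumed benchmark setup, not the tests the comparison refers to, and it claims no results. It generates a synthetic document shaped like the sample file, counts the `VALUE` records once with DOM and once with SAX using the JDK's JAXP classes, and prints the elapsed time for each. `ParserTiming` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ParserTiming {
    // Builds a synthetic document shaped like the sample file, with `records` VALUE elements.
    static byte[] syntheticXml(int records) {
        StringBuilder sb = new StringBuilder("<RESULT>");
        for (int i = 0; i < records; i++) {
            sb.append("<VALUE><NO>A").append(i).append("</NO><ADDR>XX Road No. ")
              .append(i).append("</ADDR></VALUE>");
        }
        return sb.append("</RESULT>").toString().getBytes(StandardCharsets.UTF_8);
    }

    // DOM: the complete tree is built in memory before the records can be counted.
    static int countWithDom(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getElementsByTagName("VALUE").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // SAX: records are counted as their start events stream past; no tree is kept.
    static int countWithSax(byte[] xml) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("VALUE")) count[0]++;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(new ByteArrayInputStream(xml), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        byte[] xml = syntheticXml(100_000);
        long t0 = System.nanoTime();
        int dom = countWithDom(xml);
        long t1 = System.nanoTime();
        int sax = countWithSax(xml);
        long t2 = System.nanoTime();
        // Single cold runs: the numbers include JVM warm-up and are only a rough indication.
        System.out.printf("DOM: %d records in %d ms%n", dom, (t1 - t0) / 1_000_000);
        System.out.printf("SAX: %d records in %d ms%n", sax, (t2 - t1) / 1_000_000);
    }
}
```

A single cold run mostly measures class loading and JIT warm-up. For numbers worth relying on, repeat the runs or use a harness such as JMH. To approach the document sizes mentioned above, raise the record count and watch heap usage.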

Basic usage of the four XML parsing methods

XML file:

    <?xml version="1.0" encoding="GB2312"?>
    <RESULT>
        <VALUE>
            <NO>A1234</NO>
            <ADDR>No. XX Section XX Road, XX Town, XX County, Sichuan Province</ADDR>
        </VALUE>
        <VALUE>
            <NO>
            <ADDR>Group XX Village, XX Township, XX City, Sichuan Province</ADDR>
        </VALUE>
    </RESULT>

1) DOM implementation

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class MyXMLReader {
        public static void main(String[] args) {
            long lasting = System.currentTimeMillis();
            try {
                File f = new File("data_10k.xml");
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                DocumentBuilder builder = factory.newDocumentBuilder();
                // Parse the whole file into an in-memory tree
                Document doc = builder.parse(f);
                NodeList nl = doc.getElementsByTagName("VALUE");
                for (int i = 0; i < nl.getLength(); i++) {
                    // Look up NO and ADDR inside the current VALUE record
                    Element value = (Element) nl.item(i);
                    System.out.print("licence plate number: "
                            + value.getElementsByTagName("NO").item(0).getTextContent());
                    System.out.println(" Owner's address: "
                            + value.getElementsByTagName("ADDR").item(0).getTextContent());
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println("Run time: " + (System.currentTimeMillis() - lasting) + " milliseconds");
        }
    }

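2) SAX implementation

    import java.io.File;
    import java.util.ArrayDeque;
    import java.util.Deque;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class MyXMLReader extends DefaultHandler {
        // Names of the currently open elements; the top is the element being read
        private final Deque<String> tags = new ArrayDeque<>();
        // Text of the current element (characters() may deliver it in several chunks)
        private final StringBuilder text = new StringBuilder();

        public static void main(String[] args) {
            long lasting = System.currentTimeMillis();
            try {
                SAXParserFactory sf = SAXParserFactory.newInstance();
                SAXParser sp = sf.newSAXParser();
                sp.parse(new File("data_10k.xml"), new MyXMLReader());
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println("Run time: " + (System.currentTimeMillis() - lasting) + " milliseconds");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            tags.push(qName);
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String tag = tags.peek();
            if ("NO".equals(tag) || "ADDR".equals(tag)) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Pop the element so whitespace after it is not attributed to it
            String tag = tags.pop();
            if (tag.equals("NO")) {
                System.out.print("licence plate number: " + text.toString().trim());
            } else if (tag.equals("ADDR")) {
                System.out.println(" Address: " + text.toString().trim());
            }
            text.setLength(0);
        }
    }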

3) JDOM implementation

    import java.io.File;
    import java.util.List;
    // JDOM 1.x package names; JDOM 2 moved these classes to org.jdom2
    import org.jdom.Document;
    import org.jdom.Element;
    import org.jdom.input.SAXBuilder;

    public class MyXMLReader {
        public static void main(String[] args) {
            long lasting = System.currentTimeMillis();
            try {
                // SAXBuilder uses an underlying SAX parser to build the JDOM tree
                SAXBuilder builder = new SAXBuilder();
                Document doc = builder.build(new File("data_10k.xml"));
                Element root = doc.getRootElement();
                List allChildren = root.getChildren(); // raw List in JDOM 1.x
                for (int i = 0; i < allChildren.size(); i++) {
                    Element value = (Element) allChildren.get(i);
                    System.out.print("licence plate number: " + value.getChild("NO").getText());
                    System.out.println(" Owner's address: " + value.getChild("ADDR").getText());
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println("Run time: " + (System.currentTimeMillis() - lasting) + " milliseconds");
        }
    }

4) DOM4J implementation

    import java.io.File;
    import java.util.Iterator;
    import org.dom4j.Document;
    import org.dom4j.Element;
    import org.dom4j.io.SAXReader;

    public class MyXMLReader {
        public static void main(String[] args) {
            long lasting = System.currentTimeMillis();
            try {
                File f = new File("data_10k.xml");
                SAXReader reader = new SAXReader();
                Document doc = reader.read(f);
                Element root = doc.getRootElement();
                // Iterate over the VALUE children only
                for (Iterator<?> i = root.elementIterator("VALUE"); i.hasNext();) {
                    Element foo = (Element) i.next();
                    System.out.print("licence plate number: " + foo.elementText("NO"));
                    System.out.println(" Owner's address: " + foo.elementText("ADDR"));
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println("Run time: " + (System.currentTimeMillis() - lasting) + " milliseconds");
        }
    }