DOM stands for "Document Object Model". When a DOM parser parses an XML document, it builds an in-memory structured representation of the document. The whole document is read in at once, and the DOM tree is created in memory as a Java data structure that can be navigated by calling methods. In this exercise we use a DOM parser inside a JSP to display an XML file.
Level of Difficulty: 3 (medium)
Estimated time: 45 minutes
Pre-requisites:
In the section below, we will walk through the code provided and explain what is happening. Let's create a file called dom.jsp:
<%@ page import="javax.xml.parsers.*" %>
<%@ page import="org.w3c.dom.*" %>
<%@ page import="java.io.*" %>
<html>
<head>
  <title>DOM Parser</title>
</head>
<body>
<h1>XML DOM parser test</h1>
<hr />
The javax.xml.parsers package contains the basic classes for working with XML parsers (either DOM or SAX). The second package, org.w3c.dom, contains DOM-specific objects and methods. There is also a related package, org.xml.sax, that we will use in another exercise. The third import, java.io, provides the InputStream class used to read the XML file.
<%
  // Create the stream we will read from
  InputStream is = application.getResourceAsStream("/addressbook.xml");

  // Create an instance of the DOM parser and parse the document
  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
  DocumentBuilder db = dbf.newDocumentBuilder();
  Document doc = db.parse(is);

  // Begin traversing the document
  traverseTree(doc, out);
%>
This section of code is where we set up the DOM parser to parse the document. The steps involved are:

1. Create an InputStream object that refers to the particular XML file we want to open. getResourceAsStream() reads a file from our war file.

2. Get a reference to a DocumentBuilderFactory and a DocumentBuilder object. This step is needed because there are potentially many different implementations of DOM parsers available. For example, the implementation that we will be using is called Xerces, and is part of the Apache project. Another implementation of a DOM parser comes from IBM.

   From an application programmer's perspective, you aren't usually interested in which implementation of the DOM parser is being used. You just want access to whichever DOM parser happens to be installed on the system you are using. The DocumentBuilderFactory class provides a generic way of locating the "default" DOM parser implementation that is installed on any system. When you call DocumentBuilderFactory.newInstance(), it returns a factory for that implementation. Calling newDocumentBuilder() on the factory then returns a DocumentBuilder object, which refers to the actual DOM parser itself.

3. Parse the XML file by calling the parse() method on the DocumentBuilder object. With DOM, whenever you call the parse() method, you get back a reference to a Document object that is the starting point for the parsed DOM tree.

   If there is a syntax error during parsing and the DOM tree cannot be built, a Java exception is thrown and an error message appears in the browser. This error message will look like the message generated when testing well-formedness of XML documents in an earlier exercise.

4. Finally, as a result of parsing we have a Document object which represents a DOM tree that we can traverse. In this exercise, there is a specific Java method for performing the traversal, called traverseTree(). We call the traverseTree() method and pass it a reference to the Document, and also to the pre-defined JSP object called out, which is used for printing data into the HTML that is sent back to the user's web browser.
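The same parsing steps can be tried outside a JSP. The following is a minimal sketch, not part of the exercise files: the class name DomParseSketch, the helper rootElementName(), and the sample XML strings (including the person element) are assumptions made for illustration. It reads from an in-memory string instead of the war file, and it treats a well-formedness error as a null result instead of letting the exception reach the browser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical class name, used only for this sketch.
public class DomParseSketch {

    /**
     * Parses the given XML text and returns the name of its root element,
     * or null if the text is not well-formed XML.
     */
    static String rootElementName(String xml) {
        // Step 1: create the stream we will read from (here, an in-memory string)
        try (InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            // Step 2: locate the default parser implementation
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Step 3: parse, which returns the starting point of the DOM tree
            Document doc = db.parse(is);
            // Step 4: use the tree (here we only look at the root element)
            return doc.getDocumentElement().getNodeName();
        } catch (SAXException e) {
            // Thrown when the document is not well-formed
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        // Well-formed sample: prints "addressbook"
        System.out.println(rootElementName("<addressbook><person/></addressbook>"));
        // Mismatched tags: prints "null"
        System.out.println(rootElementName("<addressbook><person></addressbook>"));
    }
}
```

In the JSP itself the exception is simply allowed to propagate, which is why the error message shows up in the browser.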
Here we declare a Java method that will be used to perform the traversal. We will call this method to handle each node in the DOM tree that has been built in memory by the parser.
<%!
  /**
   * Handles one node of the tree. It accepts any type of node, and
   * will check the node type before processing it. This function is
   * recursive - if one node contains other "sub-nodes", this function
   * will call itself again to process the sub-nodes.
   *
   * @param currnode the current node
   * @param out where to write the output
   */
  private void traverseTree(Node currnode, JspWriter out) throws Exception {
    // If the current node is null, do nothing
    if (currnode == null) {
      return;
    }

    // Find the type of the current node
    int type = currnode.getNodeType();

    // Check the node type, and process it accordingly
    switch (type) {
Notice that for the current node we are processing, we first find out the node type, and then use a switch statement to branch to a block of code to handle that particular type of node.
Now we will examine each of the different handlers in turn.
      /*
       * Handle the top-level document node.
       * Just print out the word "DOCUMENT", and then get the
       * root element of the document, and process it using
       * the traverseTree method
       */
      case Node.DOCUMENT_NODE: {
        out.println("<p>DOCUMENT</p>");
        traverseTree(((Document) currnode).getDocumentElement(), out);
        break;
      }
There is only one "document" node for each XML document. In this case, we first print a message to indicate that we have encountered a document node. Secondly, we call the getDocumentElement() method to retrieve the root element of the document. With that root element, we then call the traverseTree() method to handle it. Note that from within the traverseTree() method, we are calling the same method again. This is an example of recursion in programming.
      /*
       * Handle an element node
       * This is the most complex type of node dealt with.
       * First, print out the name of the element, before
       * processing any other sub-nodes (i.e. a preorder traversal).
       * Secondly, check if this element has any attributes, and
       * if it does, process those next, by calling the traverseTree()
       * method.
       * Finally, retrieve the children of this node (if any), and
       * process them one by one using the traverseTree() method.
       */
      case Node.ELEMENT_NODE: {
        String elementName = currnode.getNodeName();
        out.println("<p>ELEMENT: [" + elementName + "]</p>");

        if (currnode.hasAttributes()) {
          NamedNodeMap attributes = currnode.getAttributes();
          for (int i = 0; i < attributes.getLength(); i++) {
            Node currattr = attributes.item(i);
            traverseTree(currattr, out);
          }
        }

        NodeList childNodes = currnode.getChildNodes();
        if (childNodes != null) {
          for (int i = 0; i < childNodes.getLength(); i++) {
            traverseTree(childNodes.item(i), out);
          }
        }
        break;
      }
This is the most complex of the handlers. There are three main parts to it:

1. Find out the name of this element (elementName) and print it out.

2. Check to see if this element has any attributes associated with it. If it does, then we retrieve them (attributes) and then loop through them one by one using a for loop. In DOM, every attribute is treated as a Node as well. So in this example, for each attribute, we simply call the traverseTree() method to handle it.

3. The final step in this example is to process any child nodes of this element. We retrieve a list of all the child nodes, and use a for loop to process each one in turn, using the traverseTree() method to do the processing. Note that children of element nodes are typically either text nodes (if the element contains text) or further element nodes (if the element contains other XML elements nested inside it).
Note that this is where we decide the traversal algorithm to use. In this case, we are using a preorder traversal: each element is printed before its attributes and children are processed. This is the most common kind of traversal for processing documents with DOM.
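To see the visiting order without a servlet container, the same traversal can be sketched as a standalone program that records each visit in a list instead of writing HTML. The class name PreorderSketch, the methods visit() and preorder(), and the sample XML are assumptions for illustration. The node handling mirrors traverseTree(), but this is not the exercise code itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
public class PreorderSketch {

    /** Visits one node and, recursively, everything below it, in preorder. */
    static void visit(Node node, List<String> visited) {
        if (node == null) {
            return;
        }
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: {
                visited.add("DOCUMENT");
                visit(((Document) node).getDocumentElement(), visited);
                break;
            }
            case Node.ELEMENT_NODE: {
                // Preorder: record the element itself before anything below it
                visited.add("ELEMENT " + node.getNodeName());
                NamedNodeMap attributes = node.getAttributes();
                for (int i = 0; i < attributes.getLength(); i++) {
                    visit(attributes.item(i), visited);
                }
                NodeList children = node.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    visit(children.item(i), visited);
                }
                break;
            }
            case Node.ATTRIBUTE_NODE: {
                visited.add("ATTRIBUTE " + node.getNodeName() + "=" + node.getNodeValue());
                break;
            }
            case Node.TEXT_NODE: {
                String text = node.getNodeValue().trim();
                if (!text.isEmpty()) {
                    visited.add("TEXT " + text);
                }
                break;
            }
            default:
                // Other node types (comments, etc.) are ignored in this sketch
                break;
        }
    }

    /** Parses the XML text and returns the visits in the order they happened. */
    static List<String> preorder(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> visited = new ArrayList<>();
            visit(doc, visited);
            return visited;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints: [DOCUMENT, ELEMENT person, ATTRIBUTE id=1, ELEMENT name, TEXT Wayne]
        System.out.println(preorder("<person id=\"1\"><name>Wayne</name></person>"));
    }
}
```

Moving the line that records the element to after the two loops would turn this into a postorder traversal, which is why the position of that line decides the algorithm.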
      /*
       * Handle attribute nodes.
       * Just print out the word "ATTRIBUTE", and then the name
       * and value of the attribute itself.
       */
      case Node.ATTRIBUTE_NODE: {
        String attributeName = currnode.getNodeName();
        String attributeValue = currnode.getNodeValue();
        out.println("<p>ATTRIBUTE: name=[" + attributeName
            + "], value=[" + attributeValue + "]</p>");
        break;
      }
In the case of attribute nodes, we just retrieve the attribute name and value, and print them out.
Attribute nodes are leaf nodes in the DOM tree. They have no children to process.
      /*
       * Handle text nodes.
       * Trim whitespace off the beginning and end of the text.
       * Then check whether there is any real text, and if so,
       * print it out. This avoids printing out text nodes that
       * consist of only whitespace characters.
       */
      case Node.TEXT_NODE: {
        String text = currnode.getNodeValue().trim();
        if (text.length() > 0) {
          out.println("<p>TEXT: [" + text + "]</p>");
        }
        break;
      }
    }
  }
%>
In the case of text nodes, we retrieve the value, and "trim" it. Trimming it means that we remove whitespace from either end of the string.
If the resulting string has any characters left after trimming, then we print it out. This avoids printing text nodes that consist entirely of whitespace.
Text nodes are leaf nodes in the DOM tree. They have no children to process.
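Whitespace-only text nodes come from the line breaks and indentation between elements in the XML file. The sketch below counts them for a small indented document. The class name WhitespaceTextSketch, the method countChildren(), and the sample XML are assumptions for illustration, and the sketch assumes the default, non-validating parser configuration used in this exercise.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
public class WhitespaceTextSketch {

    /**
     * Returns a two-element array: the number of child nodes of the root
     * element, and how many of those are whitespace-only text nodes.
     */
    static int[] countChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int blankText = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Same test as the TEXT_NODE handler: trim, then check for leftovers
                if (child.getNodeType() == Node.TEXT_NODE
                        && child.getNodeValue().trim().isEmpty()) {
                    blankText++;
                }
            }
            return new int[] { children.getLength(), blankText };
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<person>\n"
                + "  <name>Wayne</name>\n"
                + "  <email>wayne@example.org</email>\n"
                + "</person>";
        int[] counts = countChildren(xml);
        // Prints: 5 children, 3 whitespace-only text nodes
        System.out.println(counts[0] + " children, " + counts[1] + " whitespace-only text nodes");
    }
}
```

The root element has only two child elements, yet the parser reports five child nodes. Without the trim-and-check step, the JSP would print an empty TEXT entry for each of the other three.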
First, copy your dom.jsp file to a new file named dom1.jsp. Make the following changes to dom1.jsp.
At the moment, the sample JSP prints all nodes at the same level of indenting (against the left-hand margin). The first goal of this exercise is to modify the code so that each time the traversal algorithm enters a new level of "depth" in the DOM tree, we indent the output one level further, and each time the traversal algorithm goes up one level in the DOM tree, we remove one level of indenting.
The easiest way to achieve indenting is to use the HTML <blockquote> tag. When you want to increase the indenting by one level, print out the following line of HTML:
<blockquote>
When you want to decrease the indenting by one level, print out the corresponding closing tag:
</blockquote>
Think about how the code works. Each time you process a node, the traverseTree() method is called. Another way to think of it is that the start of the traverseTree() method is the time at which you "enter" (i.e. start processing) a node, and the end of the traverseTree() method is when you "exit" (i.e. finish processing) the node.
The solution is quite short - it can be done by adding only two lines of code - but it does require you to think about and understand how the code works (particularly the traverseTree() method).
The next exercise with the DOM parser is to print out the data from the addressbook.xml file in an HTML table. Your resulting output should look something like the following:
Name | Address | Email | Phone extn | Birthday |
Wayne | Room 4/536 | brookes@it.uts.edu.au | 1872 | 2001-01-01 |
Maolin | Room 4/520 | maolin@it.uts.edu.au | 1858 | 2001-01-01 |
Copy the original dom.jsp file to become dom2.jsp, and make your changes to dom2.jsp.
The final exercise is to selectively print data from the DOM tree. Copy the original dom.jsp file to become dom3.jsp, and make your changes to dom3.jsp.
Suppose that, using the addressbook.xml file, we only want to print out a list of names and email addresses, and none of the other information.
Modify the code so that only the <name> and <email> element values are printed. It's not as easy as it sounds - remember that the actual value isn't stored in the DOM "element" node, it is stored in a "text" node that is a child of the element.
This implies that you need to store some "state" information about where you are in the document.
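When testing your dom3.jsp, it can help to have an independent list of the values you expect to see. The sketch below gets that list a different way: it asks the DOM for elements by tag name and reads their text with getTextContent(). This is not the state-tracking traversal the exercise asks for, only a cross-check. The class name ElementTextSketch, the method textOf(), and the sample XML structure (including the person element and the example.org addresses) are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
public class ElementTextSketch {

    /** Returns the trimmed text content of every element with the given tag name. */
    static List<String> textOf(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Matching elements are returned in document order
            NodeList matches = doc.getElementsByTagName(tagName);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                // getTextContent() gathers the text stored in the child text nodes
                values.add(matches.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<addressbook>"
                + "<person><name>Wayne</name><email>wayne@example.org</email></person>"
                + "<person><name>Maolin</name><email>maolin@example.org</email></person>"
                + "</addressbook>";
        // Prints: [Wayne, Maolin]
        System.out.println(textOf(xml, "name"));
        // Prints: [wayne@example.org, maolin@example.org]
        System.out.println(textOf(xml, "email"));
    }
}
```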