This exercise uses the DOM parser to parse an XML document and print out the document in a JSP.
Level of Difficulty: Difficult (new/difficult concepts)
Estimated time: 1 hour
Pre-requisites: a public_html directory set up as indicated in lab 2.
As in the previous labs, copy this lab's files into your public_html/dca directory from:
/pub/dca/lab04
There are two sample files related to the DOM parsing example for this lab exercise:
dom.jsp
cd.xml
The dom.jsp file is the code that we will be working on during the exercise. The cd.xml file is a sample data file that we will read in.
First, you need to tell the JSP to read the XML data file from YOUR home directory. Open the JSP file in a text editor:
gedit dom.jsp &
Look for the line that contains the path name to cd.xml; the path will look like:
/home/USERNAME/public_html/dca/lab04/cd.xml
Change this path so that instead of USERNAME, it says your own Faculty login name.
Then open the JSP file in your browser, making sure that the URL contains charlie.it.uts.edu.au or sally.it.uts.edu.au, e.g.
http://charlie.it.uts.edu.au/~username/dca/lab04/dom.jsp
When this file executes, it prints out the contents of the cd.xml file, as seen by the DOM parser.
To compare this output with the data being read, open cd.xml in a text editor.
In the section below, we will walk through the code provided and explain what is happening.
<%@ page import="javax.xml.parsers.*" %>
<%@ page import="org.w3c.dom.*" %>
<%@ page import="java.io.*" %>
These three lines appear at the top of the file. They import the Java class libraries needed by the JSP code.
Note that they are JSP directives, because they are enclosed by the delimiters "<%@" and "%>".
The javax.xml.parsers package contains the classes for obtaining and configuring an XML parser (either DOM or SAX), such as DocumentBuilderFactory and DocumentBuilder.
The second package, org.w3c.dom, contains the DOM-specific interfaces and methods, such as Document, Node and NodeList. There is also a related package, org.xml.sax, that we will use in another exercise.
Finally, the java.io package is needed because we will be using the Java File class to read in a file.
<%
    // Create the file object we will read from
    File file = new File("/home/USERNAME/public_html/dca/lab04/cd.xml");

    // Create an instance of the DOM parser and parse the document
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(file);

    // Begin traversing the document
    traverseTree(doc, out);
%>
This section of code is where we set up the DOM parser to parse the document. Note that the code is a JSP scriptlet, as it is contained within the delimiters "<%" and "%>". The steps involved are:
Create a File object that refers to the particular XML file we want to open.
Get a reference to a DocumentBuilderFactory and a DocumentBuilder object. This step is needed because there are potentially many different implementations of DOM parsers available. For example, the implementation that we will be using is called Xerces, and is part of the Apache project. Another implementation of a DOM parser comes from IBM.
From an application programmer's perspective, you usually aren't interested in which implementation of the DOM parser is being used. You just want to get access to whichever DOM parser happens to be installed on the system you are using. The DocumentBuilderFactory class provides a generic way of locating the "default" DOM parser implementation that is installed on any system. When you call DocumentBuilderFactory.newInstance(), it returns a factory for some implementation of a DOM-compliant parser; calling newDocumentBuilder() on that factory then returns a DocumentBuilder. The DocumentBuilder object refers to the actual DOM parser itself.
Parse the XML file by calling the parse() method on the DocumentBuilder object. With DOM, whenever you call the parse() method, you get back a reference to a Document object that is the starting point for the parsed DOM tree.
If there was a syntax error during parsing and the DOM tree could not be built, then a Java exception (a SAXException) would be thrown and an error message would appear in the browser. This error message will look like the message generated when testing the well-formedness of XML documents in an earlier exercise.
Finally, as a result of parsing we have a Document object, which represents a DOM tree that we can traverse. In this exercise, there is a specific Java method for performing the traversal, called traverseTree(). We call the traverseTree() method and pass to it a reference to the Document, and also to the pre-defined JSP object called out, which is used for printing data into the HTML code that is sent back to the user's web browser.
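The same parsing steps can also be tried in a plain Java program, outside a JSP. The sketch below is an illustration under stated assumptions, not part of the lab files. It parses an in-memory string instead of a File, so cd.xml is not needed, and the XML fragment is hypothetical. The helper isWellFormed() is a name invented for this sketch. It demonstrates the error case: a malformed document makes parse() throw a SAXException. When this happens, the JDK's built-in parser may also print a [Fatal Error] line to standard error.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class ParseSketch {
    // The same steps as the scriptlet in dom.jsp, except that the input is a
    // string in memory rather than a File (an assumption of this sketch).
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); // locate the default DOM implementation
        DocumentBuilder db = dbf.newDocumentBuilder();                      // the parser itself
        return db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Hypothetical helper: true if the text parses, false if the parser
    // reports a syntax (well-formedness) error by throwing a SAXException.
    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical fragment, not the actual contents of cd.xml.
        Document doc = parse("<cd><title>A Funk Odyssey</title></cd>");
        System.out.println(doc.getDocumentElement().getNodeName()); // cd
        System.out.println(isWellFormed("<cd><title>Oops</cd>"));   // false
    }
}
```

In dom.jsp, no try/catch is written around parse(). Any exception simply propagates, and the JSP container turns it into the error page described above.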
<%! private void traverseTree(Node currnode, JspWriter out) throws Exception {
Here we declare a Java method that will be used to perform the traversal. We will call this method to handle each node in the DOM tree that has been built in memory by the parser.
Some points to note:
The method is enclosed in a block that is delimited by the symbols "<%!" and "%>". Note the extra exclamation mark (!), which you may not have encountered before: this kind of code block in a JSP file is called a declaration.
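In a declaration block, the only kind of Java code you are allowed to have is class-member declarations, such as method definitions (like traverseTree()) and instance variables. Ordinary statements that run each time the page is requested belong in a scriptlet instead.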
This method declares that it throws an exception. If anything at all goes wrong, the method will just generate an exception, which will be displayed as an error message in the browser.
The overall structure of the traverseTree() method is shown below:
int type = currnode.getNodeType();
switch (type) {
    case Node.DOCUMENT_NODE: {
        // handle a document node
        break;
    }
    case Node.ELEMENT_NODE: {
        // handle an element node
        break;
    }
    case Node.ATTRIBUTE_NODE: {
        // handle an attribute node
        break;
    }
    case Node.TEXT_NODE: {
        // handle a text node
        break;
    }
}
This shows an outline of the traverseTree() method without all of the details filled in yet. Notice that for the current node we are processing, we first find out the node type, and then use a switch statement to branch to a block of code that handles that particular type of node.
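The handler bodies are explained one by one in the rest of this section. For experimenting outside Tomcat, here is a standalone sketch of the complete method. It is an adaptation, not the lab's code. A StringBuilder replaces the JSP out object, and plain text lines replace the <p> markup. The dump() helper and the genre attribute in the sample input are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class TraverseSketch {
    // Same dispatch as traverseTree() in dom.jsp; a StringBuilder stands in for out.
    static void traverseTree(Node currnode, StringBuilder out) {
        switch (currnode.getNodeType()) {
            case Node.DOCUMENT_NODE:
                out.append("DOCUMENT\n");
                traverseTree(((Document) currnode).getDocumentElement(), out);
                break;
            case Node.ELEMENT_NODE: {
                out.append("ELEMENT: [").append(currnode.getNodeName()).append("]\n");
                NamedNodeMap attributes = currnode.getAttributes(); // never null for an element
                for (int i = 0; i < attributes.getLength(); i++) {
                    traverseTree(attributes.item(i), out);
                }
                NodeList children = currnode.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    traverseTree(children.item(i), out);
                }
                break;
            }
            case Node.ATTRIBUTE_NODE:
                out.append("ATTRIBUTE: name=[").append(currnode.getNodeName())
                   .append("], value=[").append(currnode.getNodeValue()).append("]\n");
                break;
            case Node.TEXT_NODE: {
                String text = currnode.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.append("TEXT: [").append(text).append("]\n");
                }
                break;
            }
            default:
                // Comments, processing instructions, etc. are skipped, as in dom.jsp.
                break;
        }
    }

    // Hypothetical helper: parse a string and return the traversal output.
    static String dump(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            traverseTree(doc, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input; prints DOCUMENT, ELEMENT: [cd], ATTRIBUTE: ..., ELEMENT: [title], TEXT: ...
        System.out.print(dump("<cd genre=\"funk\">\n  <title>A Funk Odyssey</title>\n</cd>"));
    }
}
```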
Now we will examine each of the different handlers in turn.
case Node.DOCUMENT_NODE: {
    out.println("<p>DOCUMENT</p>");
    traverseTree(((Document) currnode).getDocumentElement(), out);
    break;
}
There is only one "document" node for each XML document. In this case, we first print a message to indicate that we have encountered a document node. Secondly, we call the getDocumentElement() method to retrieve the root element of the document. (The cast to Document is needed because getDocumentElement() is defined on Document, not on Node.) With that root element, we then call the traverseTree() method to handle it. Note that from within the traverseTree() method, we are calling the same method again. This is an example of recursion in programming.
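The Document node and the root element are separate nodes, which is why the handler calls getDocumentElement() before recursing. The small sketch below checks that relationship. It also uses a second recursive method, depth(), which climbs from a node up to the Document node via getParentNode(). The class name, method names and sample XML are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DocumentNodeSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Recursive: a node's depth is one more than its parent's depth. The
    // Document node has no parent, so it is at depth 0 and the root element at 1.
    static int depth(Node node) {
        Node parent = node.getParentNode();
        return parent == null ? 0 : 1 + depth(parent);
    }

    public static void main(String[] args) {
        Document doc = parse("<cd><title>A Funk Odyssey</title></cd>");
        Element root = doc.getDocumentElement();
        System.out.println(doc.getNodeName());           // #document
        System.out.println(root.getNodeName());          // cd
        System.out.println(root.getParentNode() == doc); // true
        System.out.println(depth(root.getFirstChild())); // 2 (the <title> element)
    }
}
```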
case Node.ELEMENT_NODE: {
    String elementName = currnode.getNodeName();
    out.println("<p>ELEMENT: [" + elementName + "]</p>");

    if (currnode.hasAttributes()) {
        NamedNodeMap attributes = currnode.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node currattr = attributes.item(i);
            traverseTree(currattr, out);
        }
    }

    NodeList childNodes = currnode.getChildNodes();
    if (childNodes != null) {
        for (int i = 0; i < childNodes.getLength(); i++) {
            traverseTree(childNodes.item(i), out);
        }
    }
    break;
}
This is the most complex of the handlers. There are three main parts to it:
Find out the name of this element (elementName) and print it out.
Check to see if this element has any attributes associated with it. If it does, then we retrieve them (attributes) and loop through them one by one using a for loop. In DOM, every attribute is treated as a Node as well, so in this example we simply call the traverseTree() method to handle each attribute.
The final step in this example is to process any child nodes of this element. We retrieve a list of all the child nodes and use a for loop to process each one in turn, using the traverseTree() method to do the processing. Note that children of element nodes are typically either text nodes (if the element contains text) or further element nodes (if the element contains other XML elements nested inside it).
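Note that this is where we decide which traversal algorithm to use. In this case, we are using a preorder traversal: each element is printed before its attributes and children are visited. Preorder is a natural choice for processing documents with DOM, because it visits elements in the same order as their start tags appear in the document.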
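The sketch below makes "preorder" concrete. It records element names in preorder, as traverseTree() does, and in postorder for comparison. The sample XML and the method names are invented for illustration. Only element nodes are recorded, to keep the output short.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class PreorderSketch {
    // Preorder: record the element first, then visit its children (as dom.jsp does).
    static void preorder(Node node, List<String> visited) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            visited.add(node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            preorder(children.item(i), visited);
        }
    }

    // Postorder, for comparison: visit the children first, then record the element.
    static void postorder(Node node, List<String> visited) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            postorder(children.item(i), visited);
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            visited.add(node.getNodeName());
        }
    }

    static List<String> elementOrder(String xml, boolean pre) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> visited = new ArrayList<>();
            if (pre) {
                preorder(doc, visited);
            } else {
                postorder(doc, visited);
            }
            return visited;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<cd><title>A</title><track><title>B</title></track></cd>";
        System.out.println(elementOrder(xml, true));  // [cd, title, track, title]
        System.out.println(elementOrder(xml, false)); // [title, title, track, cd]
    }
}
```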
case Node.ATTRIBUTE_NODE: {
    String attributeName = currnode.getNodeName();
    String attributeValue = currnode.getNodeValue();
    out.println("<p>ATTRIBUTE: name=[" + attributeName + "], value=[" + attributeValue + "]</p>");
    break;
}
In the case of attribute nodes, we just retrieve the attribute name and value, and print them out.
Attribute nodes are treated as leaf nodes in this traversal: we read the value directly with getNodeValue(), so there are no children to process.
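The sketch below offers a quick way to confirm that attributes are Node objects, but not children of their element. The <track num="1"> markup is an assumption for illustration and may not match cd.xml. The sketch also shows Element.getAttribute(), a convenience method for reading a single attribute by name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class AttributeSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical markup: the num attribute is an assumption, not taken from cd.xml.
        Element track = parse("<track num=\"1\"><title>Feels So Good</title></track>")
                .getDocumentElement();
        Node attr = track.getAttributes().getNamedItem("num");
        System.out.println(attr.getNodeType() == Node.ATTRIBUTE_NODE);      // true
        System.out.println(attr.getNodeName() + "=" + attr.getNodeValue()); // num=1
        System.out.println(track.getAttribute("num"));                      // 1
        // Attributes are not in the element's child list, and report no parent node.
        System.out.println(track.getChildNodes().getLength());              // 1 (only <title>)
        System.out.println(attr.getParentNode());                           // null
    }
}
```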
case Node.TEXT_NODE: {
    String text = currnode.getNodeValue().trim();
    if (text.length() > 0) {
        out.println("<p>TEXT: [" + text + "]</p>");
    }
    break;
}
In the case of text nodes, we retrieve the value and "trim" it. Trimming means removing whitespace from both ends of the string.
If the resulting string has any characters left after trimming, then we print it out. This avoids printing text nodes that consist entirely of whitespace, such as the line breaks and indentation between elements.
Text nodes are leaf nodes in the DOM tree.They have no children to process.
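The whitespace-only text nodes that trim() filters out come from the line breaks and indentation between tags. The sketch below counts them for a small, hypothetical indented document. With the default DocumentBuilderFactory settings and no DTD, this whitespace is kept as text nodes. The class and method names are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class WhitespaceSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Returns {number of direct text children, number still non-empty after trim()}.
    static int[] countTextChildren(Node element) {
        int all = 0;
        int nonBlank = 0;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                all++;
                if (!child.getNodeValue().trim().isEmpty()) {
                    nonBlank++;
                }
            }
        }
        return new int[] {all, nonBlank};
    }

    public static void main(String[] args) {
        // Hypothetical indented document.
        String xml = "<cd>\n  <title>A Funk Odyssey</title>\n  <artist>Jamiroquai</artist>\n</cd>";
        Element root = parse(xml).getDocumentElement();
        int[] counts = countTextChildren(root);
        System.out.println(root.getChildNodes().getLength());                        // 5
        System.out.println(counts[0] + " text nodes, " + counts[1] + " non-blank"); // 3 text nodes, 0 non-blank
    }
}
```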
First, copy your dom.jsp file to a new file named dom1.jsp. Make the following changes to dom1.jsp.
At the moment, the sample JSP prints all nodes at the same level of indenting (against the left-hand margin). The first goal of this exercise is to modify the code so that each time the traversal algorithm enters a new level of "depth" in the DOM tree, the output is indented one level further, and each time the traversal algorithm goes up one level in the DOM tree, one level of indenting is removed.
The easiest way to achieve indenting is to use the HTML <blockquote> tag. When you want to increase the indenting by one level, print out the following line of HTML:
<blockquote>
When you want to decrease the indenting by one level, print out the corresponding closing tag:
</blockquote>
Think about how the code works. Each time you process a node, the traverseTree() method is called. Another way to think of it is that the start of the traverseTree() method is the time at which you "enter" (i.e. start processing) a node, and the end of the traverseTree() method is when you "exit" (i.e. finish processing) the node.
The solution is quite short (it can be done by adding only two lines of code), but it does require you to think about and understand how the code works, particularly the traverseTree() method.
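The enter/exit idea can also be seen with a different technique from the <blockquote> approach. The sketch below passes the current depth as an extra parameter and indents plain text by two spaces per level. It prints element names only. printTree() and indented() are hypothetical names, not part of dom.jsp, and the sample XML is invented.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class IndentSketch {
    // Prints element names with two spaces of indentation per level of depth.
    static void printTree(Node node, int depth, StringBuilder out) {
        int childDepth = depth;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // "Enter" the element: print it at the current depth.
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
            childDepth = depth + 1; // its children sit one level deeper
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            printTree(children.item(i), childDepth, out);
        }
        // "Exit": returning to the caller restores the caller's own depth,
        // so the indentation drops back one level without extra code.
    }

    static String indented(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            printTree(doc, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Prints: cd / "  title" / "  track" / "    title", one per line.
        System.out.print(indented("<cd><title>A</title><track><title>B</title></track></cd>"));
    }
}
```

The <blockquote> version in the exercise instead relies on HTML nesting, so it needs no depth counter.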
The next exercise is to selectively print data from the DOM tree. Copy the original dom.jsp file to become dom2.jsp, and make your changes to dom2.jsp.
Suppose that, using the cd.xml file, we only want to print out a list of track titles, and none of the other information.
Modify the code so that only the <title> element values are printed. It's not as easy as it sounds: remember that the actual value isn't stored in the DOM "element" node; it is stored in a "text" node that is a child of the element.
What about the fact that the same element name (<title>) is used to represent both the CD title and the track title, depending upon where it appears in the XML document? Don't worry about this in your first attempt at a solution, but see if you can find a way to solve it.
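One possible approach, sketched below under assumptions, sidesteps the traversal. It uses getElementsByTagName() to find every <title> element, then keeps only those whose parent is a <track> element. The element names track and title, and the way they are nested, are assumptions about cd.xml, so check them against the real file. The sketch also shows that an element's own node value is null, which is why the text has to be read from its child text node. trackTitles() is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class TrackTitleSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    static List<String> trackTitles(Document doc) {
        NodeList titles = doc.getElementsByTagName("title"); // every <title>, in document order
        List<String> result = new ArrayList<>();
        for (int i = 0; i < titles.getLength(); i++) {
            Node title = titles.item(i);
            // Skip <title> elements that are not inside a <track> (e.g. the CD title).
            if (!"track".equals(title.getParentNode().getNodeName())) {
                continue;
            }
            // The element's own value is null; the text lives in its child text node.
            Node child = title.getFirstChild();
            if (child != null && child.getNodeType() == Node.TEXT_NODE) {
                result.add(child.getNodeValue().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical structure; the real cd.xml may differ.
        Document doc = parse("<cd><title>A Funk Odyssey</title>"
                + "<track><title>Feels So Good</title></track>"
                + "<track><title>Little L</title></track></cd>");
        System.out.println(doc.getElementsByTagName("title").item(0).getNodeValue()); // null
        System.out.println(trackTitles(doc)); // [Feels So Good, Little L]
    }
}
```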
The final exercise with the DOM parser is to print out the data from the cd.xml file in an HTML table. Your resulting output should look something like the following:
A Funk Odyssey | Jamiroquai | | |
Track Num | Title | Time | Rating |
---|---|---|---|
1 | Feels So Good | 4:38 | 2 |
2 | Little L | 4:10 | 5 |
3 | You Give Me Something | 5:02 | 3 |
4 | Corner of the Earth | 3:57 | 1 |
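Below is a standalone sketch of how such a table could be generated. It assumes that each <track> holds <num>, <title>, <time> and <rating> child elements. It also assumes that the CD's <title> and <artist> are direct children of the root. cd.xml may differ (for example, the track number could be an attribute), so adjust the names to match the real file. childText(), escape() and toHtmlTable() are hypothetical helpers. Each value is read with getTextContent(), which collects the text of an element's descendants in a single call.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class TableSketch {
    // Minimal HTML escaping so that characters such as & in a title stay valid HTML.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Text of the first direct child element with the given name, or "" if none.
    static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return escape(n.getTextContent().trim());
            }
        }
        return "";
    }

    static String toHtmlTable(String xml) {
        Element cd;
        try {
            cd = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        StringBuilder html = new StringBuilder("<table border=\"1\">\n");
        // First row: CD title and artist, each spanning two columns.
        html.append("<tr><th colspan=\"2\">").append(childText(cd, "title"))
            .append("</th><th colspan=\"2\">").append(childText(cd, "artist"))
            .append("</th></tr>\n");
        html.append("<tr><th>Track Num</th><th>Title</th><th>Time</th><th>Rating</th></tr>\n");
        // One row per <track> child of the root element.
        for (Node n = cd.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE || !n.getNodeName().equals("track")) {
                continue;
            }
            Element track = (Element) n;
            html.append("<tr>");
            for (String field : new String[] {"num", "title", "time", "rating"}) {
                html.append("<td>").append(childText(track, field)).append("</td>");
            }
            html.append("</tr>\n");
        }
        return html.append("</table>\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical structure for one track; the real cd.xml may differ.
        String xml = "<cd><title>A Funk Odyssey</title><artist>Jamiroquai</artist>"
                + "<track><num>1</num><title>Feels So Good</title>"
                + "<time>4:38</time><rating>2</rating></track></cd>";
        System.out.print(toHtmlTable(xml));
    }
}
```

Inside a JSP, the returned string could be written to the page with out.print().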