COMP207-Assignment2

Comp 207: Database development
Assignment 2: XML processing
DEADLINE: 5 pm, Tuesday 13th December 2011

The second practical assignment consists of designing an XML schema that corresponds to a given specification, and writing a Java program that validates, prints and queries the XML schema produced. For this assignment you are therefore asked to complete the following tasks:

1. Write an XML schema that models information about Actors and the Movies they star in, following the given XML document MovieActors.xml. For each Actor, record their ID, first name and surname, and the movies they appear in. For each Movie, provide the Title of the movie, the actors starring in it, and the year of release. For each movie, the schema should also indicate the main scenes (possibly identified by an ID), and the actors participating in each scene. Assume that each Actor ID and each Movie Title uniquely identifies Actors and Movies respectively, and that the corresponding XML schema includes key and foreign key constraints.
2. Write a Java program that prints out the tree structure corresponding to the XML schema.
3. Validate the given XML document against the schema and print out the results of the validation. The validation should be successful and exceptions should be handled appropriately.
4. Query the XML document file using XPath in order to provide the results of the following queries:
   a. Find all the movies starring Tom Cruise;
   b. Find all the actors whose first name is “Daniel”.

Hints

Watch this space as new hints will be published regularly.

XML Schema

When designing the XML schema, pay attention to the order in which the elements are defined in the given XML document. Make sure you use the appropriate representation of keys and foreign keys as presented during the lectures. Make sure you also define cardinality constraints appropriately.

Java code

A word on validation

In Lab Class 5 we started looking at the notion of validation.
For this assignment we are using the validating version of the DOM parser, therefore we need to set the appropriate configuration for the parser. However, this depends on the way we associate the document with its schema. If the schema has a namespace, then the XML document carries the corresponding schema-location instructions in its preamble. Alternatively, you can choose not to declare a namespace at all. If you follow this route, the preamble is different. The location can be a URL or an absolute path to a file in a directory.

If you use namespaces, then the parser associates the schema location with the document using the following code:

    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Source;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.SchemaFactory;

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // setValidating controls DTD validation; schema validation is enabled by setSchema below
    factory.setValidating(false);
    factory.setNamespaceAware(true);
    // Whitespace in element content is deliberately not ignored here
    //factory.setIgnoringElementContentWhitespace(true);
    SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
    // uriXSD is the location of the schema file
    factory.setSchema(schemaFactory.newSchema(new Source[] {new StreamSource(uriXSD)}));

The validation is performed by creating a parser and parsing the document:

    import javax.xml.parsers.DocumentBuilder;
    import org.w3c.dom.Document;

    DocumentBuilder builder = factory.newDocumentBuilder();
    // uriXML is the location of the XML document
    Document document = builder.parse(uriXML);

XPath

An XPath query can be created through an XPath factory (from the package javax.xml.xpath) after having parsed the document.

    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathFactory;

    //Ask the XPath queries
    XPathFactory factory1 = XPathFactory.newInstance();
    XPath xpath = factory1.newXPath();
    // PersonalNamespaceContext is your own implementation of javax.xml.namespace.NamespaceContext
    xpath.setNamespaceContext(new PersonalNamespaceContext());
    // search is a String holding the XPath expression
    XPathExpression expr = xpath.compile(search);

The fragment of code above also includes a call that deals with namespaces. If the elements in the XML document are in a namespace, then the XPath expression for querying that document must use the same namespace. The XPath expression does not need to use the same prefixes, only the same namespace URIs. In fact, if the XML document uses the default namespace, the XPath expression must use a prefix even though the target document does not.
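The point about default namespaces can be illustrated with a minimal sketch. It is not the assignment's solution: the inline document, the namespace URI urn:example:people, the prefix p and the class names are all hypothetical, chosen only to show that an unprefixed XPath step finds nothing in a document that uses a default namespace, while a prefixed step bound to the same URI does.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathNamespaceSketch {

    // Hypothetical namespace URI and document, used only for illustration.
    static final String NS = "urn:example:people";
    static final String XML =
          "<people xmlns='" + NS + "'>"
        + "<person><first>Ada</first><last>Lovelace</last></person>"
        + "<person><first>Alan</first><last>Turing</last></person>"
        + "</people>";

    // Binds the prefix "p" (our own choice) to the document's namespace URI.
    static class SketchNamespaceContext implements NamespaceContext {
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            return "p".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return NS.equals(uri) ? "p" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            return NS.equals(uri)
                ? Collections.singletonList("p").iterator()
                : Collections.<String>emptyIterator();
        }
    }

    // Parses the inline document with a namespace-aware DOM parser.
    static Document parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(XML)));
    }

    // Creates an XPath object that knows about the "p" prefix.
    static XPath newXPath() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new SketchNamespaceContext());
        return xpath;
    }

    // Evaluates the expression as a node set and returns how many nodes matched.
    static int countMatches(String expression) throws Exception {
        NodeList nodes = (NodeList) newXPath().compile(expression)
                                              .evaluate(parse(), XPathConstants.NODESET);
        return nodes.getLength();
    }

    // Evaluates the expression and returns its string value.
    static String stringValue(String expression) throws Exception {
        return newXPath().evaluate(expression, parse());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countMatches("//person"));                    // 0: no prefix, no match
        System.out.println(countMatches("//p:person"));                  // 2
        System.out.println(countMatches("//p:person[p:first='Alan']"));  // 1
        System.out.println(stringValue("//p:person[p:first='Alan']/p:last")); // Turing
    }
}
```

Note that the prefix p never appears in the document itself; only the namespace URI has to agree. The same sketch also shows two of the result types an XPath expression can be evaluated to: a node set (requested with XPathConstants.NODESET) and a string.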
Finally, the XPath data model is different from the Java one, and the results of XPath expressions are of types such as Node, Number, Boolean, etc., as described in the table in Lab Class 5. More information can be found in the links listed in the useful resources.

Assignment submission

Assignments are handed in electronically through the departmental coursework submission system. For technical problems with the submission site please email either Dr. Tamma or Mr Shield. When submitting, check that you have adhered to the following list:

1. The solution to the exercises is contained in one zip file containing the Java source, XMLProcessing.java, and any other file needed to create XMLProcessing.class. The Java file's name MUST be 'XMLProcessing.java'. This means that the main class name must also be 'XMLProcessing'.
2. Your code should be clearly laid out and documented.
3. Make sure that your program compiles from the command line, and that it can be executed correctly. Do not submit code that does not compile or it will be marked 0.
4. The naming convention for submitting the assignment is to name the file as: YourSurname-Assignment2.zip
5. Failure to follow the submission details explained above will cause a deduction of 15 marks.
6. Your solution must not bear undue resemblance to anybody else's! Electronic checks for similarity will be performed on all submissions and instances of plagiarism will be severely dealt with. The rules on plagiarism and collusion are explicit: do not copy code from anyone else, do not let anyone else copy from your code and do not hand in 'jointly developed' solutions.

Marking Scheme:

Below is the breakdown of the mark scheme for this assignment. Each category will be judged on the correctness, efficiency and modularity of the code, as well as whether or not it compiles and produces the desired output.
Generation of the XML Schema following the specification = 20 marks;
Key and foreign key constraints = 10 marks;
Validation of the given XML document against the created XML schema = 20 marks;
Implementation of the queries = 10 marks for each query (20 marks in total);
Print out of the DOM tree corresponding to the schema = 10 marks;
Comments and layout = 10 marks;
Implementation of the solution: modularity, generality of the solution, etc. = 10 marks.

This assignment contributes 10% to your overall mark for Comp 207.

Useful resources:

Processing of XML in Java

DOM API
http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/package-summary.html
http://www.iwebie.com/java-xml-dom-example

Validation and XPath
http://www.edankert.com/validate.html
http://www.ibm.com/developerworks/library/x-javaxpathapi/index.html
http://www.ibm.com/developerworks/xml/library/x-javaxmlvalidapi/index.html
http://docs.oracle.com/javaee/1.4/tutorial/doc/JAXPXSLT3.html
http://www.ibm.com/developerworks/xml/library/x-javaxmlvalidapi.html