EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 1 EE984 Laboratory Experiment 4: Database Access Table of Content 1. Session 1 ....................................................................................................................2 1.1. Introduction .........................................................................................................2 1.2. Assessment ..........................................................................................................2 1.3. Relational database schema.................................................................................2 1.4. Implementing the database ..................................................................................3 1.5. Registering the database as an ODBC data source..............................................4 1.6. Simple Java Program to access the database. ......................................................4 1.7. Creating data for the database .............................................................................6 1.8. The Timer ............................................................................................................6 2. Session 2 ....................................................................................................................7 2.1. Introduction .........................................................................................................7 2.2. Scenario ...............................................................................................................7 2.3. Dates in Java........................................................................................................7 2.4. Populating the database .......................................................................................8 2.5. Timing measurements..........................................................................................8 3. Session 3 ..................................................................................................................10 3.1. Introduction .......................................................................................................10 3.2. Three Tier Application ......................................................................................10 3.3. Java Database Server .........................................................................................10 3.4. Client .................................................................................................................12 3.5. Experiments .......................................................................................................14 Appendix. Java Streams...............................................................................................15 EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 2 1. Session 1 1.1. Introduction The aim of this experiment is to measure the performance of a Java program accessing a relational database using a JDBC:ODBC ‘bridge’. The performance is to be measured when the database process (the ‘server’) and program (the ‘client’) are on 1) the same computer and 2) different machines on the same LAN. The experiment will take three 4-hour sessions and has the form of a ‘structured mini- project’. In the first session in week 9 you will create an MS Access relational database and a Java program to populate it with data of a specified statistical nature. You will also design and carry out an experiment to investigate the nature of the timing facilities provided by Java. In the second session in week 10, you will design and carry out an experiment to measure the time it takes a Java program to execute specified queries on the database as a function of the size and structure of the database. In the third session in week 11 you will repeat the experiment of the second session but with the database server residing on a different machine to the Java client program accessing it. TCP will be used to communicate between the database and the Java program. Hopefully, you will be able to carry out most of the experiment within the allotted time. If you are less familiar with relational databases/Java you may have to spend some extra time on the experiment, particularly the work of the first session in week 9. 1.2. Assessment The experiment will be assessed by means of a report, which you will submit by the end of week 16 (Spring Term) to ESE Stores. The report is to have a section for each of the sessions and final conclusions for the whole experiment at the end. DO NOT PAD OUT THE REPORT WITH BACKGROUND MATERIAL! Each section should carefully explain the aims (see below), design and results of the corresponding experiment. Credit will be given for clear and concise explanations. Given the limited time it will not be possible to carry out wide ranging experiments on database access performance that lead to general conclusions (this could easily be the subject for a whole doctoral thesis!). Instead, for each session you should think of a simple and unambiguous question (this is the ‘aim’ above) that you could answer by means of experimentation in the time available. It may take some thinking time to come up with such a question! In this first session, which concerns the investigation of the timing facilities of Java, you will be given some ideas on how to do this. 1.3. Relational database schema EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 3 The following scenario attempts to incorporate some realistic features on which to base the experiments of this laboratory. A company called ‘A.COM’ records its purchase orders in a database consisting of two normalised tables (‘relations’):- ORDER(orderId, supplierName, orderDate) ORDERITEM(orderItemId, orderId, description, quantity, pricePerUnit) The underlined ‘fields’ are the table keys i.e., those that uniquely identify a row in its table. The tables are normalised so that given information items are not repeated. An unnormalised version of the above database consisting of one table would be UORDER(orderId, supplierName, orderDate, orderItemId, orderId, description, quantity, pricePerUnit) This database repeats information. However, it may be more efficient to query than the normalised one because the latter in general requires its two tables to be joined together to retrieve a query. You may wish to make this observation the subject of your later performance investigations. 1.4. Implementing the database In this experiment we will implement the above database using MsAccess. This can be done in several ways. The simplest way is to directly use MsAccess itself but an alternative is to use code (e.g., Java or Visual Basic). We will first illustrate the use of MsAccess. Select ‘Start|Programs|Microsoft Access’ starting from the ‘start bar’ and select the ‘Blank Database’ option to create a database called ‘acomdb’ in a directory of your choice. Here you may have a choice – either you place the database on your ‘M’ drive (which of course is really on a different machine to the one you are using, i.e., a networked drive) or on a drive local to the machine (the D or E temporary drives). This choice is likely to affect the performance you later measure. If the database is correctly created you will be presented with a ‘database view’ called ‘acomdb:Database’ consisting of six ‘tab’ views which default to the ‘Tables’ tab. Click the ‘new’ button on the ‘Tables’ tab, select ‘Design View’ and enter the following:- Field Name Data Type orderId Number supplierName Text orderDate Date/Time Highlight the ‘orderId’ row and press the button with the ‘key’ icon on the menu bar to set ‘orderId’ as the key. Save the table as ‘tblOrder’ and switch to ‘datasheet view’ by clicking the leftmost icon on the same menu bar. Enter one row with the values “RS” (don’t use the quotes) and “1/1/00”. EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 4 Create the table ‘tblOrderItem’ in the same way. Ask the demonstrators about the domains of the fields. Also find out from them how to implement indexes. These will be useful for speeding up queries. 1.5. Registering the database as an ODBC data source. You will write a Java program that accesses the database via a standard interface called ODBC. The database will have to be registered for this purpose as an ODBC source. The way to do this is described in the laboratory. NOTE: YOU CAN ‘CUT AND PASTE’ CODE IN THIS EXPERIMENT FROM THE COPY OF THIS SCRIPT ON THE WEB. 1.6. Simple Java Program to access the database. Enter and run the following Java program (e.g., using the command line editor):- import java.io.*; import java.sql.*; public class acomdb{ public static void main(String args[]){ try{ Class.forName("sun.jdbc.odbc.JdbcOdbcDriver"); Connection con = DriverManager.getConnection ("jdbc:odbc:ee984source", "sm", "”); Statement stmt = con.createStatement(); long start=System.currentTimeMillis(); ResultSet rs=stmt.executeQuery("SELECT * FROM tblorder"); while (rs.next()) { String supplier = rs.getString("supplierName"); String orderDate=rs.getString("orderDate"); String orderId=rs.getString("orderId"); System.out.println(supplier); System.out.println(orderDate); System.out.println(orderId); } long end=System.currentTimeMillis(); System.out.println("Time taken= " +(end-start)); } catch (Exception e) { System.out.println(e); } } Listing 1.1 Note:- If this program fails to work after you are sure that it has been debugged it may be that the ‘classpath’ setting is incorrect. If this is the case then the command set classpath= EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 5 may solve the problem. For each new command line window the ‘classpath’ may have to be set in this way. This program illustrates how a relational database is accessed using embedded SQL in Java. Listing 1.1 Explained. How is this program able to run the Microsoft product Access? First of all it loads the package:- import java.sql.*; This package contains the classes of the Java Database Connectivity (JDBC) API, which is used by Java programmers to access relational databases. In particular it contains the Connection, DriverManager, Statement and ResultSet classes used in the program. The JDBC will access MS Access by itself using another API called ODBC (JDBC is acting as a ‘wrapper’ around ODBC, which is a standard C language API for relational database access) that contains the necessary functionality to open and extract data from MsAccess. In order to do this, JDBC needs to call methods of a class called JdbcOdbcDriver. In the program this class is loaded using the static method of the class Class:- Class.forName("sun.jdbc.odbc.JdbcOdbcDriver"); The connection to the database is created using the static method getConnection() of the ODBC class DriverManager:- Connection con = DriverManager.getConnection ("jdbc:odbc:ee984source", "sm", "”); This method has three arguments. The first is the URL of the database. In this case it is “jdbc:odbc:ee984source”, which tells the method that the database is being opened using jdbc, which in turn uses odbc to open the odbc-registered database with the odbc name ‘ee984source’. In Windows 2000 databases are registered using the Data Sources (ODBC) application that resides under Administrative Tools in the Control Panel. Registration details will be covered in detail in this laboratory. The second and third arguments are the username and password required in order to open the database. Next an instance of the ODBC class ‘Statement’ is created. This instance manages queries and other operations on the database:- Statement stmt = con.createStatement(); EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 6 An instance of ‘Statement’ has an executeQuery() method that runs an SQL statement and places the results in an instance of the class ResultSet. The methods of ResultSet allow the results to be manipulated. Deeper into Database Access The JdbcOdbcDriver class is stored in Windows\JAVA\6zvhbhjv.zip together with related classes. Somehow forName() manages to find and load it despite the fact that its actual location is very different from the search path provided to it! An interesting implementation detail is that the JdbcOdbvDriver class is supported by the JdbcOdbc.dll in c:\jdk1.2.1\jre\bin. Using the ‘dependency walker’ utility it can be seen that this .dll consists of functions like _Java_sun_jdbc_odbc_JdbcOdbc_allocConnect@16 whose name clearly corresponds to the nominal directory structure. Thus when we compile a Java program, even to byte code, the basic Java methods are replaced by calls to functions in these dlls. ‘c:\jdk1.2.1\jre\bin’ contains other *.dll and *.ocx binaries clearly created by the Java environment developers to allow the system Java class methods to access Windows functions. You will need to learn how to use the ‘INSERT’ statement of SQL to enter data into the database as well as read data. See the Access Help file for further information and also your Java textbook (The book by Deitel & Deitel contains all the necessary information). The essential changes to the above code are the use of the Connection class ‘executeUpdate()’ method instead of the ‘executeQuery()’ method and to explicitly use the Connection ‘close()’ method to close the connection (if this is not done the inserted records will not appear in the database). Details concerning the execution of SQL statements in MsAccess will be covered in this laboratory. 1.7. Creating data for the database Write a Java program to create dummy data for the database. The creation of this data will be the main task of the next laboratory. 1.8. The Timer You will be performing timing measurements in the next laboratories using the timer facility illustrated in the code segment above. You should now investigate the properties of this timer such as its accuracy and what it actually measures (real elapsed time or the time allocated to your process). Design and carry out simple experiments to answer these questions. End of Session 1 EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 7 2. Session 2 2.1. Introduction In session 1 you will have familiarised yourself with the creation of a relational database using MsAccess, the use of SQL to query and update the database and the execution of these SQL operations from a Java program that connects to the database using an JDBC:ODBC ‘bridge’. You will also have carried out some simple experiments to understand the meaning and accuracy of timing measurements performed using Java’s ‘System.currentTimeMillis()’ timing function. In this session you will carry out a simulation to estimate the query performance of a relational database using Java to access the local database. In session 3 you will repeat the simulation with the database situated on a different networked machine to the executing Java program. 2.2. Scenario A small company ‘A.com’ has a relational database to store its order information. You created this database in the previous laboratory session, namely ORDER(orderId, supplierName, orderDate) ORDERITEM(orderItemId, orderId, description, quantity, pricePerUnit). It has 10 suppliers whose names are (conveniently!) S1.com, S2.com, … S10.com and is equally likely to place its next order with any of them. Each order has between 1 and 10 order items, with a mean number of order items equal to 5.5. It places between 0 and 100 orders per day, with a mean number of orders in a day equal to 50. Assume uniform probability distributions for the number of orders per day and the number of order items per order. Each order item specifies a ‘description’ ∈ {W1, .., W10} of a widget and integers ‘quantity’ and ‘pricePerUnit’, both uniformly distributed in the range [1,10]. 2.3. Dates in Java The order date is best dealt with using Java’s date-related classes, which are illustrated by the following simple program:- import java.util.*; import java.text.*; class demjavaDates { public static void main(String args[]) { // The abstract class Calendar returns time/date. It is //abstract because an instance of a local extension is required. //This is provided by the static ‘getInstance’ member of Calendar Calendar myCalendar; EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 8 myCalendar=Calendar.getInstance(); // A DateFormat object converts the date/time into the required //format DateFormat shortDate=DateFormat.getDateInstance(DateFormat.SHORT); java.util.Date currentDate; currentDate= myCalendar.getTime(); System.out.println("Current Date is”+shortDate.format(currentDate)); int i; // The following function sets the calendar date myCalendar.set(2001,0,1); for(i=0;i<40;i++) { // add() with the arguments below gets the date of the next day myCalendar.add(Calendar.DAY_OF_MONTH,1); System.out.println(shortDate.format(myCalendar.getTime())); } }; A.com is to operate every day of the week starting (‘going live’ – to use the current vernacular) on January 1, 2003. Given the above scenario, we wish to predict the speed with which the following query will be executed at the end of each month in 2003:- “Calculate the total cost of all orders placed with supplier S1” NOTE: This is the query to be investigated. Do not invent one of your own. You will lose marks if you do! 2.4. Populating the database The main task in programming the above simulation is to write two functions, one to create an order record and the other to create an order item record. Both functions will require the use of a random number generator. Make sure you record the random number generator seed (if you choose not to use the default) so that you will be able to repeat any measurement if required. If necessary, the use of random number generators in this fashion will be reviewed in the laboratory. Once the functions are working populate the database one month at a time and perform your timing measurements on the above query at the end of each month’s of data. 2.5. Timing measurements Construct the SQL for the query to be executed. For students less familiar with SQL this query will be discussed in the laboratory session. Perform the required timing measurements. If you repeat a measurement make sure the same conditions apply. Remember that a database often ‘caches’ data in (process) memory that it has retrieved from disk. This means that if the data is again requested it may be retrieved quicker. In the latter case the database is said to be ‘hot’. You can make sure the database is ‘cold’ before each measurement by shutting and reopening it. Repeat the experiment several times to get an average time. EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 9 Repeat the above measurements in two of the following cases:- 1) Use a Java ‘prepared’ statement instead of the usual statement to execute the query. 2) Use the unnormalised database UORDER of session1. This will require the joining of the two tables in the database into a single one. 3) Implement indexes on ‘supplierName’ in ORDER and ‘orderId’ in ORDERITEM.. Interestingly, if this experiment is carried out using Visual Basic to access the database the times taken are typically 100 times less! Can you guess why? If you are sceptical about this claim the please ask for a demonstration! Try to explain the timings you observe and compare them with your earlier measurements. Next week, in the final session of this experiment, you will repeat the above experiment but with the difference that the database and Java program will be running on different computers communicating using TCP. End of Session 2 EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 10 3. Session 3 3.1. Introduction In session 1 you will have familiarised yourself with the creation of a relational database using MsAccess, the use of SQL to query and update the database and the execution of these SQL operations from a Java program that connects to the database using an JDBC:ODBC ‘bridge’. You will also have carried out some simple experiments to understand the meaning and accuracy of timing measurements performed using Java’s ‘System.currentTimeMillis()’ timing function. In session 2 you carried out a simulation to estimate the query performance of a relational database using Java to access the local database. In this session you will repeat the simulation with the database situated on a different networked machine to the executing Java program. 3.2. Three Tier Application You will implement a three-tier application consisting of the (backend) database, the network server and the client. The client Java program will send an SQL statement to the server, which will then via an JDBC:ODBC connection execute the statement on the database and receive the query results in a Resultset object. The server will then return the results to the client. One advantage of allowing a server to mediate the transaction between client and database is that the server can manage several connections to the database. Hopefully, by the end of the session you will have seen the operation of this system with the client on the same computer and on a different computer to that of the server and the database. In the next two sections is the code for a simple database server and its client. A Java framed window is used for the GUI of each application in order to help clarify the operation of the system. As a result a large part of these programs is not specifically devoted to the database access, which is identical to that in Session 1. In this example the database contains the ORDER table of the previous sessions and has the ODBC name “tcptestdb”. The objective of this session is to get this code to run and then to modify it so as to repeat the experiments of Session 2 (use the already populated databases of this session) both with the client and server on the same and different computers. Note that the code below assumes that they are on the same computer. After everyone has successfully run the code, some time in the session will be devoted to a discussion of its important features. Helpful information about the code can be found in “Java”, by Deitel and Deitel. The code below is based on code in chapters 18 and 21 of this book. 3.3. Java Database Server import java.io.*; EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 11 import java.net.*; import java.awt.*; import javax.swing.*; import java.awt.event.*; import java.sql.*; public class Server extends JFrame { private JTextArea display; ObjectOutputStream output; ObjectInputStream input; Connection dbConnection; public Server() { super("Simple Database Server"); Container c=getContentPane(); display = new JTextArea(); c.add(new JScrollPane(display), BorderLayout.CENTER); setSize( 300, 150); show(); } public void runServer() { ServerSocket server; Socket connection; int counter=1; try { server= new ServerSocket(5000,100); while (true) { display.setText("Waiting for a connection\n"); connection=server.accept(); display.append("connection " + counter + " received from: " + connection.getInetAddress().getHostName() ); output= new ObjectOutputStream(connection.getOutputStream()); output.flush(); input= new ObjectInputStream(connection.getInputStream()); display.append("\nGot I/O streams"); String message = "Connection successful"; output.writeObject(message); output.flush(); do { try { message = (String) input.readObject(); display.append( "\nCLIENT>>>" + message + "\n" ); display.setCaretPosition(display.getText().length()); Class.forName("sun.jdbc.odbc.JdbcOdbcDriver"); Connection dbConnection = DriverManager.getConnection( "jdbc:odbc:tcptestdb", "",""); Statement stmt=dbConnection.createStatement(); ResultSet rs=stmt.executeQuery(message); while (rs.next()) { EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 12 String supplier=rs.getString("supplierName"); sendData(supplier); } dbConnection.close(); } catch (ClassNotFoundException cnfex) { System.out.println("\nUnknown object received"); } catch (Exception e) {} } while (!message.equals("TERMINATE")); display.append("\nUser terminated connection"); output.close(); input.close(); connection.close(); ++counter; } // end of while } // end of try catch( EOFException eof) { System.out.println("Client terminated connection"); } catch( IOException io ) { io.printStackTrace(); } }// end of runServer private void sendData( String s) { try { output.writeObject(s); output.flush(); display.append("SERVER>>> " + s+"\n"); } catch (IOException cnfex ) { display.append( "\nError writing object"); } } public static void main( String args[]) { Server app = new Server(); app.addWindowListener( new WindowAdapter() { public void windowClosing (WindowEvent e) { System.exit(0); } } ); app.runServer(); } }// end of Server 3.4. Client import java.io.*; import java.net.*; import java.awt.*; import java.awt.event.*; import javax.swing.*; public class Client extends JFrame { private JTextField enter; EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 13 private JTextArea display; ObjectOutputStream output; ObjectInputStream input; String message = ""; public Client() { super("Simple Database Client"); Container c=getContentPane(); enter=new JTextField(); enter.setEnabled(false); enter.addActionListener( new ActionListener() { public void actionPerformed (ActionEvent e) { sendData(e.getActionCommand() ); } } ); c.add(enter, BorderLayout.NORTH ); display = new JTextArea(); c.add(new JScrollPane(display), BorderLayout.CENTER ); setSize(300,150); show(); } public void runClient() { Socket client; try { display.setText("Attempting connection\n"); client= new Socket(InetAddress.getByName("127.0.0.1"), 5000); String serverName; serverName=client.getInetAddress().getHostName(); display.append("Connected to: " + serverName + "\n"); output= new ObjectOutputStream(client.getOutputStream()); output.flush(); input= new ObjectInputStream(client.getInputStream()); display.append("Got I/O Streams"); enter.setEnabled(true); do { try { message = (String) input.readObject(); display.append("\nSERVER>>> " + message ); display.setCaretPosition( display.getText().length()); } catch(ClassNotFoundException cnfex ) { display.append( "\nUnknown object type received"); } } while (!message.equals("SERVER>>> TERMINATE")); display.append("Closing connection. \n"); EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 14 input.close(); output.close(); client.close(); } catch (EOFException eof) { System.out.println("Server terminated connection"); } catch (IOException e ) { e.printStackTrace(); } } private void sendData(String s) { try { message=s; output.writeObject(s); output.flush(); // display.append(s); } catch( IOException cnfex ) { display.append( "\nError writing object" ); } } public static void main(String args[]) { Client app = new Client(); app.addWindowListener( new WindowAdapter() { public void windowClosing( WindowEvent e ) { System.exit(0); } } ); app.runClient(); } } 3.5. Experiments Modify the above code so that you can perform the same timing experiments as those in laboratory session 2. If you think the above server and client can be implemented better than has been done above (do you think the code to access the database has been put in the best possible place?) then please do so. Your aim will be to investigate the change in query retrieval performance resulting from accessing the database using a TCP connection, both in the case that the client and server share the same computer and in the case they run on different computers. Write up your experiments in the fashion described in laboratory session 1. Credit will be given for a concise report that nonetheless contains the important information. End of Session 3 EE984 Laboratory Experiment 4 (Database Access) Dept. of Electronic Systems Engineering ©University of Essex, 2003-2004 15 Appendix. Java Streams In this appendix we review the Java concept of a stream. With this concept understood, and a knowledge of TCP, the above code should be easier to understand. Java Streams A stream in Java is one or more bytes being read or written by a process. It does not matter what the source (in the case of a read operation) or the destination (in the case of a write operation) of these bytes is. The source could be the hard disk (ie., a file) or the keyboard and the destination could be the standard output (ie the screen of a terminal) or the hard disk. In fact the source and/or the destination could be some other hardware reachable over the network. Bytes transferred to and from a process over a network is treated in a Java program in the same way as transfer to and from the hard disk (ie, to and from a file). Both cases are described by a stream of bytes. Both are I/O operations. The only difference is the type of hardware attached to the stream, which is specified when the stream is first created. A stream is an object of a stream class. The base stream classes are called InputStream and OutputStream. These classes provide the capability to read/write one or more bytes from the stream to/from program variables and to skip bytes in the stream. Filtering in the I/O context means providing extra stream processing capabilities such as buffering (storing the transferred bytes in a temporary location before later being sent to the final location – this is done to improve performance) and the aggregation of the basic bytes into more complex types such a ‘int’, ‘long. A stream class providing filtering has to be derived from the FilterInputStream or FilterOutputStream classes, which are an extension of InputStream/OutputStream overriding many of the methods of the latter. DataInputStream is such a class. It implements the DataInput interface, which specifies functions such as readInt(), readLine() for treating larger aggregates of bytes in the stream as single more complex types. Other classes derived from FilterOutputStream and FilterInputStream are ObjectInputStream and ObjectOutputStream. These classes allow the reading/writing of objects belonging to known classes. For example, they can read/write objects of the String class. They automatically read/write objects that are trees of other objects. The constructor of these classes requires an argument that is of type InputStream or OutputStream. In the case of a network stream the stream orginates from a socket object. Such an object has a getOutputStream()/getInputStream() method that returns an InputStream/OutputStream object that can be used to connect the network socket to and ObjectInputStream or ObjectOutputStream object. After this is done the network details disappear as far as the programmer is concerned. He/she simply invokes methods of stream attached to the network.