School of Information Technology & Computer Science Session: 1, January 2012 University of Wollongong Lecturer: Janusz R. Getta CSCI315 Database Design and Implementation Singapore 2012-1 Assignment 3 13 February 2012 This assignment includes the tasks in conceptual modeling and physical database design. This assignment is due by Sunday, 4 March, 2012, 10.00pm Singaporean time. This assignment is worth 7% of the total evaluation in the subject. Only electronic submission through eLearning (http://vista.uow.edu.au) will be accepted. All email submissions will be immediately deleted ! A submission procedure is explained at the end of this specification. Tasks Task 1 (2 marks) Consider the following description of a sample database domain. We would like to design a database to store information about kids' constructions built from the Lego bricks (http://www.lego.com). A complete construction is described by unique name of its creator, dates when it has been started and completed, a name of construction and type of a construction. At the moment we consider only three types of constructions: Architecture, Technic, and City. A construction consists of the elementary and composite parts. An elementary part also called as "Lego brick" is described by its unique name, colour, short specification and type. There are two types of elementary parts: Bricks and Items. A composite part is described by a unique name, short specification how it has been built, and a name of file with its jpg image. Composite parts are created from elementary parts and other composite parts through the application of the operations. An operation is described by a unique name, types of its arguments, the total number of arguments, type of its result, and short specification how the arguments suppose to be combined to form a composite part being the result of an operation. A type of an argument is equivalent to a name of either composite or elementary part. A type of result is also a name of either composite or elementary part. The argument of some operations maybe overloaded, which means that the same operation may have arguments of different types. The database must contain information about all operations. For example, assume, that we would like to build a column, which consists of elementary parts of type brick2x2. We can start from an operation JOIN2x2 with two arguments: brick2x2, brick2x2 to get a composite part column2x2 and then by the iterative applications of an operation JOIN2x2 with two arguments: column2x2, brick2x2 we can build the rest of the column. Note, that in such a case the first argument of an operation JOIN2x2 is overloaded because it can take two types of arguments: brick2x2 or column2x2. A construction is created by the initial applications of the operations to only elementary parts and later on by the applications of the operations to elementary and composite parts. A complete formal description of a construction is a sequence of operations. The database must contain the formal descriptions of all constructions. Use a notation of simplified UML classes of objects to design a conceptual schema of the database. Make sure that a schema of your database does not contain redundancies and it correctly represents all information that should be included in the database. Verify the correctness of your design with a "checklist" included in a presentation 09 Quality of conceptual schemas, slides 34 and 35. Only the final conceptual schema is expected. There is no need to present the intermediate stages of your design. Deliverables A file solution1.pdf with a drawing of conceptual schema expressed in a notation of simplified UML object classes. To draw a conceptual schema you may use a file drawing-patterns.ppt that contains all graphical components of simplified UML classes in Powerpoint format. Task 2 (1 marks) The following conceptual schema represents a database domain where vehicles are repaired at repair facilities. (1) Perform simplification of the conceptual schema above and re-draw the simplified conceptual schema. (2) We would like to improve the performance of the following class of applications: Find the locations of repair facilities (attributes city, street, bldg# in a class REPAIR-FACILITY) where a truck with a given registration number (attribute rego# in a class VEHICLE) and given capacity (attribute capacity in a class TRUCK) has been repaired at least one time in a given period of time (attributes start-date, end- date in a class REPAIR) A sample application that belongs to a class described above is a as follows: Find the locations of repair facilities (attributes city, street, bldg# in a class REPAIR-FACILITY) where a truck with a registration number PKR856 (attribute rego# in a class VEHICLE) and capacity 2T has been repaired at least one time in 2010. Find the denormalizations of the simplified conceptual schema that improves the performance of the calls of applications described above. When performing the denormalizations apply the following transformations of the simplified conceptual schema: migration of attributes, partitioning of classes of objects, and elimination of generalization. Re-draw the simplified conceptual schema after the denormalizations. Deliverables A file solution2.pdf with a drawing of the simplified conceptual schema and a drawing of the optimised conceptual schema expressed in a notation of simplified UML object classes. To draw a conceptual schema you may use a file drawing- patterns.ppt that contains all graphical components of simplified UML classes in Powerpoint format. A file task2.ppt contains a drawing of the original schema. Task 3 (1 mark) Consider a relational table EMPLOYEE(e#, name, salary, position) where an attribute e# is a primary key. Assume that: (i) a relational table EMPLOYEE occupies 102 data blocks, (ii) a relational table EMPLOYEE contains 103 rows, (iii) an attribute name has 800 distinct values, (iv) an attribute salary has 20 distinct values, (v) an attribute position has 50 distinct values, (vi) a primary key is automatically indexed, (vii) the attributes salary and position are indexed, (viii) all indexes are implemented as B*-trees with a fanout equal to 10, (ix) a leaf level of an index on attribute salary consists of 5 data blocks, (x) a leaf level of an index on attribute position consists of 20 data blocks. Find each of the following queries describe how a database system plans to compute the queries, (i.e. provide detailed query execution plans) and determine the total number of read block operations needed to compute the query. Show ALL computations performed to get the final answer. (1) SELECT DISTINCT salary FROM EMPLOYEE; (2) SELECT position, COUNT(*) FROM EMPLOYEE GROUP BY position; (3) SELECT * FROM EMPLOYEE WHERE e# = 007 AND position = 'boss'; (4) SELECT * FROM EMPLOYEE WHERE position = 'boss'; (5) SELECT * FROM EMPLOYEE WHERE position = 'boss ' AND salary = 1000; (6) SELECT MAX(SALARY) FROM EMPLOYEE; (7) SELECT * FROM EMPLOYEE WHERE position = 'boss ' OR salary = 1000; (8) SELECT salary FROM EMPLOYEE; (9) SELECT salary, position FROM EMPLOYEE; (10) SELECT * FROM EMPLOYEE ORDER BY salary DESC; Deliverables A file solution3.pdf with the computations of the total number of read block operations needed to compute the query. The file must contain the comprehensive descriptions of query execution plans and ALL computations performed to get the final answer. Task 4 (1.0 mark) Find the smallest collection of indexes that speeds up the processing of the queries listed above. Write CREATE INDEX statements that create all indexes found. For each SELECT statement explain how it will benefit from existence of the indexes. (1) SELECT I_TITLE, I_PUB_DATE, I_COST FROM ITEM ORDER BY I_PUB_DATE; (2) SELECT COUNT( I_TITLE ) FROM ITEM; (3) SELECT I_SUBJECT, COUNT(*) FROM ITEM GROUP BY I_SUBJECT; (4) SELECT COUNT( DISTINCT I_SUBJECT ) FROM ITEM; (5) SELECT I_SUBJECT, I_PUB_DATE FROM ITEM WHERE I_SUBJECT = 'ACTION' AND I_TITLE = 'JAVA'; Deliverables A file solution4.pdf with CREATE INDEX statements and with the comprehensive explanations how each query will benefit from the indexes. Task 5 (2 marks) Consider a relational database, that consists of the following relational tables. U(P) PK = (P) R(A,B,C) PK = (A), FK1 = (C) references Z(C) Z(C,N,P,M) PK = (C), FK1 = (P) references U(P), FK2 = (M) references Y(M) Y(K,L,M) PK = (M), FK1 = (K) references X(K) X(G,K) PK = (K), FK1 = (G) references V(G) V(G,F,E) PK = (G), FK1 = (F) references W(F), FK2 = (E) references T(E) W(H,F) PK = (F) T(D,E) PK = (E) FK1 = (D) references S(D) S(A,C,D) PK = (D) FK1 = (A) references R(A) Assume the following sizes of the relational tables. R 200 blocks S 100 blocks T 500 blocks U 800 blocks V 500 blocks W 300 blocks X 200 blocks Y 600 blocks Z 100 blocks Assume, that R and S are joined on average 10 times per day and join needs 900 read operations, S and T are joined on average 5 times per day and join needs 1000 read operations, T and V are joined on average 10 times per day and join needs 3000 read operations, V and W are joined on average 20 times per day and join needs 900 read operations, V and X are joined on average 10 times per day and join needs 1000 read operations, X and Y are joined on average 30 times per day and join needs 2000 read operations, Y and Z are joined on average 10 times per day and join needs 3000 read operations, Z and U are joined on average 20 times per day and join needs 2500 read operations, R and Z are joined on average 50 times per day and join needs 900 read operations. Find a clustering graph and the best clustering of the relational tables R, S, T, U, V, W, X, Y, Z. Deliverables A file solution5.pdf with a drawing of a clustering graph and specification of the best clustering of the relational tables R, S, T, U, V, W, X, Y, Z. Submissions Submit the files solution1.pdf, solution2.pdf, solution3.pdf, solution4.pdf, and solution5.pdf through eLearning in the following way: (1) Connect to eLearning at http://vista.uow.edu.au (2) Select a link to CSCI315 Database Design and Implementation (3) Navigate to a folder SUBMISSIONS (4) Click at Assignment 3 Submit your solution here link. (5) Click at Add Attachments button. (6) Navigate to a location where the file solution1.pdf, solution2.pdf, solution3.pdf, solution4, and solution5.pdf have been saved. (7) Select the file and click at Open button. Repeat the steps (5), (6) and (7) for all files. (8) Click at Submit button. (9) Click at OK button to return to Home Page. A policy regarding late submissions is included in the subject outline. Only one submission per student is accepted. Only electronic submission through eLearning (http://vista.uow.edu.au) will be accepted. All email submissions will be immediately deleted ! The first assignment is an individual assignment and it is expected that all its tasks will be solved individually without any cooperation with the other students. However, it is allowed to declare that a particular component or task of this assignment has been implemented in cooperation with another student (please read a text of declaration on a cover page). In such a case evaluation of a task or component may be shared with another student. In all other cases plagiarism will result in a FAIL grade being recorded for entire assignment. If you have any doubts, questions, etc. please consult your lecturer or tutor during lab classes or over e-mail. End of specification