CS 133 Lab 1: SimpleDB

Deadlines

- Part 0: Wednesday, September 11: Read Section 1 Getting started and Exercise 0 (nothing to submit).
- Part 1: Wednesday, September 18, 11:59 PM PT: Exercises 1-3
- Final: Wednesday, September 25, 11:59 PM PT: Exercises 4-6

In the lab assignments in CS 133 you will write a basic database management system called SimpleDB. For this lab, you will focus on implementing the core modules required to access stored data on disk; in future labs, you will add support for various query processing operators, as well as transactions, locking, and concurrent queries.

SimpleDB is written in Java. We have provided you with a set of mostly unimplemented classes and interfaces. You will need to write the code for these classes. We will grade your code by running a set of system tests written using JUnit. We have also provided a number of unit tests that you may find useful in verifying that your code works.

Throughout the lab you will see numbered exercises, offset in boxes, that describe where you should write code and which unit tests you should expect to pass once you are done. The due dates above correspond to these exercises. Note that there is no code to write for Exercise 0.

The remainder of this document describes the basic architecture of SimpleDB, and gives suggestions for coding, including the exercises you should complete. Section 3.2 discusses how to submit your code for both Part 1 and the Final version of the lab.

We strongly recommend that you start as early as possible on this lab. It requires you to write a fair amount of code!

Quick jump to exercises:

- Section 1 for Exercise 0
- Section 2.2 for Exercise 1
- Section 2.3 for Exercise 2
- Section 2.4 for Exercise 3
- Section 2.5 for Exercise 4 and 5
- Section 2.6 for Exercise 6

Submission instructions: Section 3.2.

1. Getting started

These instructions are written for a Unix-based platform (e.g., Linux or macOS).
Because the code is written in Java, it should work under Windows as well, though the directions in this document may not apply. See Section 1.4 for information on working with the Eclipse IDE.

Download the code from http://www.cs.hmc.edu/~beth/courses/cs133/lab/cs133-lab1.tar.gz and untar it. For example, you can do this by issuing the following commands on the command line:

    $ wget http://www.cs.hmc.edu/~beth/courses/cs133/lab/cs133-lab1.tar.gz
    $ tar xvzf cs133-lab1.tar.gz
    $ cd cs133-lab1

SimpleDB uses the Ant build tool to compile the code and run tests. Ant is similar to make, but the build file is written in XML and is somewhat better suited to Java code. Most modern Linux distributions include Ant. You can run ant on knuth.cs.hmc.edu. Eclipse also comes with a plugin for Ant; see Section 1.4.

To help you during development, we have provided a set of unit tests in addition to the end-to-end system tests discussed in Section 1.1. These are by no means comprehensive, and you should not rely on them exclusively to verify the correctness of your lab.

To run the unit tests use the test build target:

    $ cd cs133-lab1
    $ # run all unit tests
    $ ant test
    $ # run a specific unit test
    $ ant runtest -Dtest=TupleTest

You should see output similar to:

    # build output...

    test:
        [junit] Running simpledb.CatalogTest
        [junit] Testsuite: simpledb.CatalogTest
        [junit] Tests run: 6, Failures: 5, Errors: 1, Skipped: 0, Time elapsed: 0.079 sec
        [junit] Tests run: 6, Failures: 5, Errors: 1, Skipped: 0, Time elapsed: 0.079 sec

    # ... stack traces and error reports ...

The output above indicates that five test failures and one error occurred when the tests ran; this is because the code we have given you doesn't yet work. As you complete parts of the lab, you will work towards passing additional unit tests. If you wish to write new unit tests as you code, they should be added to the test/simpledb directory.

For more details about how to use Ant, see the manual.
The Running Ant section provides details about using the ant command. However, the quick reference table below should be sufficient for working on the labs.

    Command                           Description
    ant                               Build the default target (for simpledb, this is dist).
    ant -projecthelp                  List all the targets in build.xml with descriptions.
    ant javadocs                      Build javadoc documentation.
    ant dist                          Compile the code in src and package it in dist/simpledb.jar.
    ant test                          Compile and run all the unit tests.
    ant runtest -Dtest=testname       Run the unit test named testname.
    ant systemtest                    Compile and run all the system tests.
    ant runsystest -Dtest=testname    Compile and run the system test named testname.
    ant handin                        Generate tarball for submission.

Exercise 0.

To make sure that you have your environment set up, you will (a) try running a unit test and (b) generate and view the Java documentation, aka Javadocs.

(a) Try to run the TupleDescTest unit test to make sure that you are able to compile the code.

If you are using ant, use this command to run just TupleDescTest:

    ant runtest -Dtest=TupleDescTest

You should see in the output that the tests have errors, e.g., a java.lang.AssertionError indicating assertions in the unit test are failing and probably a java.lang.NullPointerException too. This makes sense since you haven't written any code yet! At the bottom of the output, you should see something like the following:

    BUILD FAILED
    The following error occurred while executing this line:
    Test simpledb.TupleDescTest failed

You should not see something like "Compile failed".

If you are using Eclipse, see Section 1.4 for setup. Then be sure to run TupleDescTest to familiarize yourself with viewing unit test results.

(b) Now run the ant target to generate documentation:

    ant javadocs

A directory called javadoc should have been created inside your cs133-lab1 directory. Note: in Eclipse, you may need to refresh the project contents in Package Explorer to see it (right-click the project name and find "refresh").
Open the file index.html in a web browser (or inside of Eclipse). Yay documentation!

1.1. Running end-to-end tests (unit tests in systemtest)

In addition to small unit tests, such as TupleDescTest, we have also provided a set of end-to-end tests, called system tests, for checking the functionality of the system you have built so far. These tests are structured as JUnit tests that live in the test/simpledb/systemtest directory. To run all the system tests, use the systemtest build target:

    $ ant systemtest

    # ... build output ...

        [junit] Testcase: testSmall took 0.017 sec
        [junit]     Caused an ERROR
        [junit] expected to find the following tuples:
        [junit]     19128
        [junit]
        [junit] java.lang.AssertionError: expected to find the following tuples:
        [junit]     19128
        [junit]
        [junit]     at simpledb.systemtest.SystemTestUtil.matchTuples(SystemTestUtil.java:122)
        [junit]     at simpledb.systemtest.SystemTestUtil.matchTuples(SystemTestUtil.java:83)
        [junit]     at simpledb.systemtest.SystemTestUtil.matchTuples(SystemTestUtil.java:75)
        [junit]     at simpledb.systemtest.ScanTest.validateScan(ScanTest.java:30)
        [junit]     at simpledb.systemtest.ScanTest.testSmall(ScanTest.java:40)

    # ... more error messages ...

This indicates that this test failed, showing the stack trace where the error was detected. To debug, start by reading the source code where the error occurred. When the tests pass, you will see something like the following:

    $ ant systemtest

    # ... build output ...

        [junit] Testsuite: simpledb.systemtest.ScanTest
        [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 7.278 sec
        [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 7.278 sec
        [junit]
        [junit] Testcase: testSmall took 0.937 sec
        [junit] Testcase: testLarge took 5.276 sec
        [junit] Testcase: testRandom took 1.049 sec

    BUILD SUCCESSFUL
    Total time: 52 seconds

1.2 Creating dummy tables

You will likely want to create your own tests and your own data tables to test your implementation of SimpleDB.
You can create any .txt file and convert it to a .dat file in SimpleDB's HeapFile format using the command:

    $ java -jar dist/simpledb.jar convert file.txt N

where file.txt is the name of the file and N is the number of columns in the file. Note: dist/simpledb.jar is created after running ant dist.

Notice that file.txt has to be in the following format:

    int1,int2,...,intN
    int1,int2,...,intN
    int1,int2,...,intN
    int1,int2,...,intN
    ...

where each intN is a non-negative integer. Be sure file.txt ends in a newline.

Note: To run the Jar file from within Eclipse instead of on the command-line, see the directions in "Running a Jar file Using Eclipse" in Section 1.4 below.

To view the contents of a table, use the print command:

    $ java -jar dist/simpledb.jar print file.dat N

where file.dat is the name of a table created with the convert command, and N is the number of columns in the file.

1.3. Implementation Notes

Before beginning to write code, we strongly encourage you to read through this entire document to get a feel for the high-level design of SimpleDB.

You will need to fill in any piece of code that is not implemented. It should be obvious from the comments within and above each Java method where we think you should write code. You may need to add private methods, instance variables, and/or helper classes. You may change APIs, but make sure our grading tests still run and make sure to mention, explain, and defend your decisions in your writeup.

In addition to the methods that you need to fill out for this lab, the class interfaces contain numerous methods that you need not implement until subsequent labs. These will either be indicated per class:

    // Not necessary for lab1.
    public class Insert implements DbIterator {

or per method:

    public boolean deleteTuple(Tuple t) throws DbException {
        // some code goes here
        // not necessary for lab1
        return false;
    }

The code that you submit should compile without having to modify these methods.
Section 2 below walks you through the implementation steps for this lab and the unit tests corresponding to each one in more detail.

Transactions, locking, and recovery

As you look through the interfaces we have provided you, you will see a number of references to locking, transactions, and recovery. You do not need to support these features in this lab, but you should keep these parameters in the interfaces of your code because you will be implementing transactions and locking in a future lab. The test code we have provided you with generates a fake transaction ID that is passed into the operators of the query it runs; you should pass this transaction ID into other operators and the buffer pool.

1.4. Working in Eclipse

Eclipse is a graphical software development environment that you might be more comfortable working in. The instructions we provide were tested most recently with Eclipse Photon with Java 1.8.

Setting the Lab Up in Eclipse

- Once Eclipse is installed, start it, and note that the first screen asks you to select a location for your workspace (we will refer to this directory as $W).
- On the file system, copy cs133-lab1.tar.gz to $W/cs133-lab1.tar.gz. Un-GZip and un-tar it, which will create a directory $W/cs133-lab1 (to do this, you can type tar -pzxvf cs133-lab1.tar.gz).
- In Eclipse, select File->New->Java Project.
- Enter "cs133-lab1" as the project name. Make sure the location is set to $W/cs133-lab1.
- Click finish, and you should be able to see "cs133-lab1" as a new project in the Project Explorer tab on the left-hand side of your screen. Opening this project reveals the directory structure discussed above -- implementation code can be found in "src", and unit tests and system tests are found in "test".

Running Ant Build Targets

If you want to run commands such as ant test or ant systemtest, there are a couple of options from within Eclipse:

- Add an Ant window in Eclipse: Go to Window -> Show view -> Ant. When the view opens (likely on the right), click the "add buildfile" icon. In the pop-up dialog box, select the build.xml under "cs133-lab1". Once the window appears in your workspace, you can run an ant target by right-clicking it, selecting "Run As," and then "Ant Build...". In the dialog box that pops up, you should see the target you right-clicked with a checkbox next to it in the "Targets" tab. Arguments such as "-Dtest=testname" can be specified in the "Main" tab, "Arguments" textbox. Clicking the "Run" button should run the build targets and show you the results in Eclipse's console window.
- Using build.xml in the Package Explorer: You can bring up that same dialog box by right-clicking build.xml in the Package Explorer (on the left) and selecting "Run As," and then "Ant Build...". You'll have to check off your desired targets in the "Targets" tab.

Running Individual Unit and System Tests in the JUnit Tab

You could also choose to run the unit or system tests by using the Package Explorer to run JUnit tests. Go to the Package Explorer tab on the left side of your screen. Under the "cs133-lab1" project, open the "test" directory. Unit tests are found in the "simpledb" package, and system tests are found in the "simpledb.systemtest" package. To run one of these tests, select the test (they are all called *Test.java - don't select TestUtil.java or SystemTestUtil.java), right click on it, select "Run As," and select "JUnit Test." This will bring up a JUnit tab, which will tell you the status of the individual tests within the JUnit test suite, and will show you exceptions and other errors that will help you debug problems.

Running a Jar file Using Eclipse

You can run a jar file from within Eclipse.
For example, to run the jar command using dist/simpledb.jar described at the end of Section 1.2, take the following steps (note: ant dist creates the jar file; follow the directions above for running ant targets within Eclipse):

- Right-click on simpledb.jar in the Package Explorer (under dist) then click "Run As" and then "Run Configurations...".
- On the left side of the dialog box, select "Java Application", then click the icon for "New Launch Configuration" (a picture of a document with a plus sign, near the top on the left).
- You can pick a name for the configuration at the top of the right part of the dialog box, e.g., "SimpleDb".
- Set the project, which will likely be cs133-lab1.
- Set the main class to SimpleDb.
- Your configuration is set up! You can use the "Arguments" tab to run the jar with command-line arguments. For example, if you want to run the convert command from Section 1.2, you would enter the following in the program arguments box: convert file.txt N, substituting the correct file location and value for N. Then click the "Run" button at the bottom of the dialog box.
- The next time you want to use this particular run configuration, you should see its name listed under "Java Application".

2. SimpleDB Architecture and Implementation Guide

SimpleDB consists of:

- Classes that represent fields, tuples, and tuple schemas;
- Classes that apply predicates and conditions to tuples;
- One or more access methods (e.g., heap files) that store relations on disk and provide a way to iterate through tuples of those relations;
- A collection of operator classes (e.g., select, join, insert, delete, etc.) that process tuples;
- A buffer pool that caches active tuples and pages in memory and handles concurrency control and transactions (neither of which you need to worry about for this lab); and,
- A catalog that stores information about available tables and their schemas.

SimpleDB does not include many things that you may think of as being a part of a "database."
In particular, SimpleDB does not have:

- (In this lab), a SQL front end or parser that allows you to type queries directly into SimpleDB. Instead, queries are built up by chaining a set of operators together into a hand-built query plan (see Section 2.7). We will provide a simple parser for use in later labs.
- Views.
- Data types except integers and fixed length strings.
- (In this lab) Query optimizer.
- Indexes.

In the rest of this Section, we describe each of the main components of SimpleDB that you will need to implement in this lab. You should use the exercises in this discussion to guide your implementation. This document is by no means a complete specification for SimpleDB; you will need to make decisions about how to design and implement various parts of the system. Note that for Lab 1 you do not need to implement any operators (e.g., select, join, project) except sequential scan. You will add support for additional operators in future labs.

2.1. The Database Class

The Database class provides access to a collection of static objects that are the global state of the database. In particular, this includes methods to access the catalog (the list of all the tables in the database), the buffer pool (the collection of database file pages that are currently resident in memory), and the log file. You will not need to worry about the log file in this lab. We have implemented the Database class for you. You should take a look at this file as you will need to access these objects.

2.2. Fields and Tuples

Tuples in SimpleDB are quite basic. They consist of a collection of Field objects, one per field in the Tuple. Field is an interface that different data types (e.g., integer, string) implement; see IntField and StringField. Tuple objects are created by the underlying access methods (e.g., heap files, or B-trees), as described in the next section. Tuples also have a type (or schema), called a tuple descriptor, represented by a TupleDesc object.
This TupleDesc object consists of a collection of Type objects, one per field in the tuple, each of which describes the type of the corresponding field. If a tuple is stored on disk, it will have a RecordId that identifies where in a file the tuple is located.

Exercise 1.

In this exercise, you will write code to manage tuples. Implement the skeleton methods in:

- src/simpledb/TupleDesc.java
- src/simpledb/Tuple.java

At this point, your code should pass the unit tests in TupleTest and TupleDescTest, with one exception: the test modifyRecordId() should still fail because you haven't implemented anything in RecordId.java yet.

Some helpful notes:

- For many of the SimpleDb Java classes throughout the labs, you will need to add your own instance variables as needed. E.g., Tuple will need a data structure to hold its Fields; looking at how Tuple will need to use that data structure will help you pick one.
- Remember that in Java, you check for equality using == only for primitive types. Any object, including String, should be compared using the equals() method.
- Standard Java classes that implement the interface java.lang.Iterable, such as java.util.ArrayList, can be useful for getting simple iterators over a collection like those required in TupleDesc and Tuple.
- The Java keyword instanceof can be used to check if an object is an instance of a particular Java class.
- Be sure to check the Javadoc comments above methods to see if you need to throw an exception under certain circumstances. E.g., if (someCondition) throw new NoSuchElementException();

2.3. Catalog

The catalog (class Catalog in SimpleDB) keeps track of the tables and the schemas of the tables that are currently in the database. You will need to support the ability to add a new table, as well as getting information about a particular table. Associated with each table, or DbFile, is a TupleDesc object that allows query plan operators to determine the types and number of fields in a table.
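To make the lookup requirements concrete, here is a minimal sketch of a table registry that supports adding a table and looking it up by id or by name. It is an illustration only, not the SimpleDB Catalog class: the name TableRegistry, its methods addTable, schemaOf, and idOf, and the use of a plain String as a stand-in for a schema are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a catalog; NOT the SimpleDB Catalog API.
class TableRegistry {
    // Map keys use the wrapper class Integer, because generic type
    // arguments cannot be primitive types such as int.
    private final Map<Integer, String> schemaById = new HashMap<>();
    private final Map<String, Integer> idByName = new HashMap<>();

    // Register a table under both its id and its name.
    void addTable(int tableId, String name, String schema) {
        schemaById.put(tableId, schema);
        idByName.put(name, tableId);
    }

    // Look up a schema by table id; unknown ids raise an exception.
    String schemaOf(int tableId) {
        String schema = schemaById.get(tableId);
        if (schema == null) {
            throw new NoSuchElementException("no table with id " + tableId);
        }
        return schema;
    }

    // Look up a table id by name; unknown names raise an exception.
    int idOf(String name) {
        Integer id = idByName.get(name);
        if (id == null) {
            throw new NoSuchElementException("no table named " + name);
        }
        return id;
    }

    public static void main(String[] args) {
        TableRegistry registry = new TableRegistry();
        registry.addTable(7, "test", "int,int,int");
        System.out.println(registry.schemaOf(registry.idOf("test")));
    }
}
```

Keeping one map per lookup key is only one possible design; which structures fit best depends on the lookups the real class has to answer.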
The global catalog is a single instance of Catalog that is allocated for the entire SimpleDB process. The global catalog can be retrieved via the method Database.getCatalog(), and the same goes for the global buffer pool (using Database.getBufferPool()).

Exercise 2.

Implement the skeleton methods in:

- src/simpledb/Catalog.java

At this point, your code should pass the unit tests in CatalogTest.

Some helpful notes:

- Before deciding which data structure(s) you want to use to store the Catalog information, take a look at the methods in Catalog to see what lookup functionality it will support (the Javadoc is helpful for this!).
- In Java, you can't use primitive types such as int as type arguments for generic types like ArrayList or Iterator. Check out the wrapper class java.lang.Integer.

2.4. BufferPool

The Buffer Pool (class BufferPool in SimpleDB) is responsible for caching pages in memory that have been recently read from disk. All operators read and write pages from various files on disk through the buffer pool. The Buffer Pool consists of a fixed number of pages, defined by the numPages parameter to the BufferPool constructor. In later labs, you will implement an eviction policy. For this lab, you only need to implement the constructor and the BufferPool.getPage() method used by the Sequential Scan (SeqScan) operator. The BufferPool should store up to numPages pages. For this lab, if more than numPages requests are made for different pages, then instead of implementing an eviction policy, you will throw a DbException.

Recall that the Database class provides a static method, Database.getBufferPool(), that returns a reference to the single BufferPool instance for the entire SimpleDB process.

Exercise 3.

Implement the getPage() method in:

- src/simpledb/BufferPool.java

We have not provided unit tests for BufferPool. The functionality you implemented will be tested in the implementation of HeapFile below.
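The caching behaviour described in Section 2.4 (return a cached page if present, otherwise load it, and refuse new pages once the pool is full) can be sketched generically as follows. This is not the SimpleDB BufferPool: the class name BoundedPageCache, the use of int page ids and byte[] pages, the loader callback standing in for a disk read, and IllegalStateException standing in for DbException are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical bounded cache; NOT the SimpleDB BufferPool API.
class BoundedPageCache {
    private final int capacity;
    // Stand-in for "read this page from disk".
    private final IntFunction<byte[]> loader;
    private final Map<Integer, byte[]> pages = new HashMap<>();

    BoundedPageCache(int capacity, IntFunction<byte[]> loader) {
        this.capacity = capacity;
        this.loader = loader;
    }

    byte[] getPage(int pageId) {
        byte[] cached = pages.get(pageId);
        if (cached != null) {
            return cached; // hit: no disk read needed
        }
        if (pages.size() >= capacity) {
            // No eviction policy in this sketch, so a full cache is an error.
            throw new IllegalStateException("cache full: cannot load page " + pageId);
        }
        byte[] loaded = loader.apply(pageId);
        pages.put(pageId, loaded);
        return loaded;
    }

    public static void main(String[] args) {
        BoundedPageCache cache = new BoundedPageCache(2, id -> new byte[] {(byte) id});
        cache.getPage(0);
        cache.getPage(1);
        cache.getPage(0); // still fine: page 0 is already cached
        try {
            cache.getPage(2); // third distinct page exceeds the capacity of 2
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that repeated requests for an already cached page never count against the capacity; only distinct pages do.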
Some helpful notes:

- You should use the DbFile.readPage method to access pages of a DbFile. Think about how to get the correct DbFile!
- If you haven't yet, you'll want to look at the Java docs for PageId and Page.
- For this lab, you won't need to use the arguments tid or perm since you won't be implementing transactions and locking until Lab 4.

2.5. HeapFile access method

Access methods provide a way to read or write data from disk that is arranged in a specific way. Common access methods include heap files (unsorted files of tuples) and B-trees; for this assignment, you will implement a heap file access method; we have written some of the code for you.

In SimpleDb, all access methods are encapsulated in classes that implement the DbFile interface, such as HeapFile. A HeapFile object provides access to a set of pages stored on disk, each of which consists of a fixed number of bytes (defined by the constant BufferPool.PAGE_SIZE) for storing tuples, including a header. In SimpleDB, there is one HeapFile object for each table in the database. Each page in a HeapFile is arranged as a set of slots, each of which can hold one tuple (tuples for a given table in SimpleDB are all of the same size). In addition to these slots, each page has a header that consists of a bitmap with one bit per tuple slot. If the bit corresponding to a particular tuple is 1, it indicates that the tuple is valid; if it is 0, the tuple is invalid (e.g., has been deleted or was never initialized). Recall that a tuple's RecordId indicates which page (of the file) and which slot on that page the tuple is located in. Pages of HeapFile objects are of type HeapPage, which implements the Page interface. Pages are stored in the Buffer Pool but are read and written to/from disk using the HeapFile class.

SimpleDB stores the pages in heap files on disk in more or less the same format they are stored in memory. Each file consists of page data arranged consecutively on disk.
Each page consists of one or more bytes representing the header, followed by the tuple slots; together they fit within page size bytes. Each tuple requires tuple size * 8 bits for its content and 1 bit for the header. Thus, the number of tuples that can fit in a single page is:

    tuples per page = floor((page size * 8) / (tuple size * 8 + 1))

where tuple size is the size of a tuple in the page in bytes. The idea here is that each tuple requires one additional bit of storage in the header. We compute the number of bits in a page (by multiplying page size by 8), and divide this quantity by the number of bits in a tuple (including this extra header bit) to get the number of tuples per page. The floor operation rounds down to the nearest integer number of tuples (we don't want to store partial tuples on a page!).

Once we know the number of tuples per page, the number of bytes required to store the header is simply:

    header bytes = ceiling(tuples per page / 8)

The ceiling operation rounds up to the nearest integer number of bytes (we never store less than a full byte of header information).

The low (least significant) bits of each byte represent the status of the slots that are earlier in the page. Hence, the lowest bit of the first byte represents whether or not the first slot in the page is in use. Also, note that the high-order bits of the last byte may not correspond to a slot that is actually in the page, since the number of slots may not be a multiple of 8. Also note that Java reads and writes multi-byte values (for example, through DataInputStream and DataOutputStream) in big-endian order on every platform.

Exercise 4.

Implement the skeleton methods in:

- src/simpledb/HeapPageId.java
- src/simpledb/RecordID.java
- src/simpledb/HeapPage.java

Although you will not use them directly in Lab 1, we ask you to implement getNumEmptySlots() and isSlotFree() in HeapPage. These require manipulating bits in the page header. You may find it helpful to look at the other methods that have been provided in HeapPage or in src/simpledb/HeapFileEncoder.java to understand the layout of pages.
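As a worked example of the page-layout arithmetic and the header bitmap convention described in this section, the sketch below computes the two quantities and tests a single header bit. The class and method names (PageLayout, tuplesPerPage, headerBytes, isSlotUsed) are hypothetical, and the 4096-byte page and 12-byte tuple (three 4-byte integers) are assumed values chosen only to produce concrete numbers.

```java
// Hypothetical helper for illustration; not part of the SimpleDB code base.
class PageLayout {
    // floor((pageSize * 8) / (tupleSize * 8 + 1)); integer division of
    // non-negative ints already rounds down.
    static int tuplesPerPage(int pageSizeBytes, int tupleSizeBytes) {
        return (pageSizeBytes * 8) / (tupleSizeBytes * 8 + 1);
    }

    // ceiling(tuplesPerPage / 8), done in integer arithmetic to avoid
    // accidentally truncating before rounding up.
    static int headerBytes(int tuplesPerPage) {
        return (tuplesPerPage + 7) / 8;
    }

    // Slot i lives in byte i / 8; within that byte, the least significant
    // bit belongs to the earliest slot.
    static boolean isSlotUsed(byte[] header, int slot) {
        return ((header[slot / 8] >> (slot % 8)) & 1) == 1;
    }

    public static void main(String[] args) {
        int tuples = tuplesPerPage(4096, 12); // 32768 / 97 = 337
        int header = headerBytes(tuples);     // (337 + 7) / 8 = 43
        System.out.println(tuples + " tuples, " + header + " header bytes");

        byte[] bitmap = {0b0000_0101};        // slots 0 and 2 in use
        System.out.println(isSlotUsed(bitmap, 0)); // true
        System.out.println(isSlotUsed(bitmap, 1)); // false
    }
}
```

With these assumed sizes, 337 tuples of 12 bytes take 4044 bytes and the header takes 43, for a total of 4087 bytes, which fits in the 4096-byte page with a few bytes left unused.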
At this point, your code should pass the unit tests in HeapPageIdTest, RecordIDTest, and HeapPageReadTest.

Some helpful notes:

- You will not need to modify the HeapPage constructor; however, it is worth reading since it calls methods you will implement.
- Be careful with rounding when doing arithmetic on Java int values (integer division truncates), as discussed in class.
- Implementing the iterator() method in HeapPage will likely be more complex than the iterator methods you have written so far. You will need to implement a Java Iterator for the existing tuples in the page, which can be done by creating a class that implements the interface java.util.Iterator. Note: this is a different interface than java.lang.Iterable.
- For iterators, remember you should be able to call hasNext() repeatedly without skipping elements; calling next() twice in a row should yield consecutive items.

Now that you have implemented HeapPage, you will write methods for HeapFile in this lab to calculate the number of pages in a file and to read a page from the file. HeapFile is given the filename, represented as an instance of a java.io.File, that it will read from. After implementing HeapFile, you will then be able to fetch tuples from a file stored on disk!

Exercise 5.

Implement the skeleton methods in:

- src/simpledb/HeapFile.java

To read a page from disk, you will first need to calculate the correct offset in the file. Hint: you will need random access to the file in order to read and write pages at arbitrary offsets. You should not call BufferPool methods when reading a page from disk. Instead, use the File passed into the constructor.

You will also need to implement the HeapFile.iterator() method, which should provide an Iterator of type DbFileIterator to iterate through the tuples of each page in the HeapFile. The iterator must use the BufferPool.getPage() method to access pages in the HeapFile.
This method loads the page into the buffer pool and will eventually be used (in a later lab) to implement locking-based concurrency control and recovery. Do not load the entire table into memory on the open() call in the Iterator -- this will cause an out of memory error for very large tables!

At this point, your code should pass the unit tests in HeapFileReadTest.

Some helpful notes:

- All pages are the same size; you can use the static method BufferPool.getPageSize().
- The Java class java.io.RandomAccessFile may be useful for reading and writing to the on-disk file for a HeapFile.
- Pages in a HeapFile are numbered 0, 1, ... numPages()-1. You may find the method length() in java.io.File useful.
- You can use the HeapFile's TransactionId to call getPage().

2.6. Operators

Operators are responsible for the actual execution of the query plan. They implement the operations of the relational algebra. In SimpleDB, operators are iterator based; each operator implements the DbIterator interface.

Operators are connected together into a query execution plan, or simply "plan", by passing operators as input into other operators via their constructors, i.e., by 'chaining them together', forming a tree of operators. Access method operators at the leaves of the plan are responsible for reading data from the disk (and hence do not have any operators below them).

At the top of the query plan tree, the program interacting with SimpleDB simply calls next on the root operator; this operator then calls next on its children, and so on, until these leaf operators are called. They fetch tuples from disk and pass them up the tree (as return values of next); tuples propagate up the plan in this way until they are output at the root or combined or rejected by another operator in the plan.

For this lab, you will only need to implement one SimpleDB operator, the sequential scan (SeqScan); this operator is one of the access methods at the leaves of a query plan.

Exercise 6.
Implement the skeleton methods in: src/simpledb/SeqScan.java This operator sequentially scans all of the tuples from the pages of the table specified by the tableid in the constructor. This operator should access tuples through the iterator() method provided by the HeapFile. At this point, you should be able to complete the ScanTest system test. Good work! Some helpful notes: SeqScan implements the interface DbIterator, so be sure to read the Java docs for DbIterator.java. Note: this is *not* the same as the DbFileIterator interface. Much of the work of the SeqScan class should mostly be accomplished using the iterator for the appropriate HeapFile (think about how to find the correct DbFile!). You can use the TransactionId passed into the constructor to instantiate the HeapFile's iterator. You will add other operators in subsequent labs. 2.7. A simple query The purpose of this section is to illustrate how these various components are connected together to process a simple query. Suppose you have a data file, "some_data_file.txt", with the following contents: 1,1,1 2,2,2 3,4,4 You can convert this into a binary file that SimpleDB can query as follows: java -jar dist/simpledb.jar convert some_data_file.txt 3 Here, the argument "3" tells convert that the input has 3 columns. The following code implements a simple selection query over this file. This code is equivalent to the SQL statement SELECT * FROM some_data_file. package simpledb; import java.io.*; public class test { public static void main(String[] argv) { // construct a 3-column table schema Type types[] = new Type[]{ Type.INT_TYPE, Type.INT_TYPE, Type.INT_TYPE }; String names[] = new String[]{ "field0", "field1", "field2" }; TupleDesc descriptor = new TupleDesc(types, names); // create the table, associate it with some_data_file.dat // and tell the catalog about the schema of this table. 
HeapFile table1 = new HeapFile(new File("some_data_file.dat"), descriptor); Database.getCatalog().addTable(table1, "test"); // construct the query: we use a simple SeqScan, which spoonfeeds // tuples via its iterator. TransactionId tid = new TransactionId(); SeqScan f = new SeqScan(tid, table1.getId()); try { // and run it f.open(); while (f.hasNext()) { Tuple tup = f.next(); System.out.println(tup); } f.close(); Database.getBufferPool().transactionComplete(tid); } catch (Exception e) { System.out.println ("Exception : " + e); } } } The table we create has three integer fields. To express this, we create a TupleDesc object and pass it an array of Type objects, and optionally an array of String field names. Once we have created this TupleDesc, we initialize a HeapFile object representing the table stored in some_data_file.dat. Once we have created the table, we add it to the catalog. If this were a database server that was already running, we would have this catalog information loaded. We need to load it explicitly to make this code self-contained. Once we have finished initializing the database system, we create a query plan. Our plan consists only of the SeqScan operator that scans the tuples from disk. In general, these operators are instantiated with references to the appropriate table (in the case of SeqScan) or child operator (in the case of e.g. Filter). The test program then repeatedly calls hasNext and next on the SeqScan operator. As tuples are output from the SeqScan, they are printed out on the command line. We strongly recommend you try this out as a fun end-to-end test that will help you get experience writing your own test programs for simpledb. You should create the file "test.java" in the src/simpledb directory with the code above, and place the some_data_file.dat file in the top level directory. Then run: ant java -classpath dist/simpledb.jar simpledb.test Note that ant compiles test.java and generates a new jarfile that contains it. 3. 
Submission and Grading Details

You must submit your code (see below) as well as a short (1 page, maximum) writeup describing your approach. This writeup should:

  - Describe any design decisions you made. These may be minimal for Lab 1.
  - Discuss and justify any changes you made to the API.
  - Describe any missing or incomplete elements of your code.
  - Describe how long you (and your partner) spent on the lab, and whether there was anything you found particularly difficult or confusing.

3.1. Collaboration

This lab can be completed alone or with a partner. Please indicate clearly who you worked with, if anyone, on your writeup. Only one person needs to submit. On Gradescope, click "Group Members" at the bottom of the page after uploading your files to add your partner.

3.2. Submitting your assignment

You will submit a tarball of your code on Gradescope for intermediate deadlines and for your final version. You only need to include your writeup for the final version.

Generating Tarball

You can generate the tarball by using the ant handin target. This will create a file called cs133-lab.tar.gz that you can submit. You can rename the tarball file if you want, but the filename must end in tar.gz.

If you prefer, you can create the tarball with just your source code on the command line:

    $ cd cs-133lab1
    $ tar czvf cs133-lab.tar.gz src/

The autograder won't be able to handle it if you package your code any other way!

Submitting on Gradescope

Click Lab 1 on your Gradescope dashboard.

  - For deadlines besides the final version, you only need to upload cs133-lab.tar.gz.
  - For the final version: click Lab 1 and then click the "Resubmit" button on the bottom of the page; upload both cs133-lab.tar.gz and writeup.txt containing your writeup.

If you worked with a partner, be sure to enter them as a group member on Gradescope after uploading your files.

3.3. Grading

Your grade for the lab will be based on the final version after all exercises are complete.
75% of your grade will be based on whether or not your code passes the test suite. Before handing in your code, you should make sure it produces no errors (passes all of the tests) from both ant test and ant systemtest.

Important: before testing, we will replace your build.xml and the entire contents of the test directory with our version of these files. This means you cannot change the format of .dat files! You should also be careful changing our APIs. You should test that your code compiles against the unmodified tests. In other words, we will untar your tarball, replace the files mentioned above, compile it, and then grade it. It will look roughly like this:

    $ tar xvzf cs133-lab.tar.gz
    [replace build.xml and test]
    $ ant test
    $ ant systemtest

If any of these commands fails, we'll be unhappy, and, therefore, so will your grade.

An additional 25% of your grade will be based on the quality of your writeup, our subjective evaluation of your code, and on-time submission for the intermediate deadlines.

ENJOY!!

Acknowledgements

Thanks to our friends and colleagues at MIT and UWashington for doing all the heavy lifting on creating SimpleDB!