Lab 3: DoubleChains and Collections CS 61B Lecture Lab/Homework Discussion Piazza Lab 3: Java Collections Submit by Friday, Sept 19 at 11:59 PM To avoid having to attend lab, submit by Wednesday at 2:00 PM Table of Contents Pre-lab Collection Classes List Iterators (part one) List Iterators (part two) Collection Utilities Sets Debugging Pre-lab If you haven't yet, take the background survey. If the links are broken, try logging into the website first. Before coming to lab, take this quiz to test your understanding of pointers. This quiz is not graded. You will first need to use the Account Administration link or the (Alternative Login) link to log into our system with your Calnet ID. to log into our system with your Calnet ID. If you feel at all shaky with these topics, consider coming to the guerrilla section which we'll be holding at time: TBA in place: TBA. Collection Classes So far, we've seen fairly primitive types for representing lists of things. Arrays: Advantages: fast random access to items. Disadvantages: hard to insert items or delete items. Hand-made linked lists (e.g., IntList, DoubleChain): Advantages: easy to expand, insert, delete. Disadvantages: random access (as opposed to in-sequence access) is slow, requires more space (for pointers) than arrays, and pointer manipulation can be tricky. Both allow us to represent a sequence of objects, but their syntax is very different, so it's hard to switch from one to the other if you change your mind. The Java library has types to represent collections of objects and save us from the burden of implementing such collections ourselves, including: Lists (sequences) of objects: ArrayList, LinkedList. Sets (like lists, but with no order and disallowing duplicates) of objects: TreeSet, HashSet. Maps (a.k.a. associative arrays, symbol tables, and dictionaries): TreeMap, HashMap. All of the collections above are found in the package java.util. A package is a set of (supposedly related) classes and subpackages. As elsewhere in Java, the notation `java.util.ArrayList' means "The class named "ArrayList" in the (sub)package named "util" in the package named "java". The point here is that you'll need to import them before you can use them, e.g. import java.util.ArrayList;
import java.util.LinkedList; In 61B, we'll not only learn how use these classes, but also how to implement them ourselves. Implementation will come later in the course. For today's lab, we'll familiarize ourselves with how to use lists and sets, which you'll be welcome to use on any future homeworks, projects, labs, discussion worksheets, and exams, except where explicitly prohibited. For example, the following code creates an ArrayList L of integers and adds 5 and 23 to L: ArrayList L = new ArrayList();
L.add(5);
L.add(23);
In the code above, don't worry too much about the mysterious syntax. We'll learn about this in the coming weeks -- for now, just accept that somehow this means our ArrayList L will contain integers. Since the LinkedList class is supposed to represent essentially the same abstraction (a numbered list of items), it has almost the same API (Application Programming Interface) as ArrayList. For our purposes today, that means it supports almost the same methods. This makes it easy to change back and forth between an ArrayList and a LinkedList. In our toy example, we'll note that LinkedList also has an add method, so we could easily change out our code so that it read: LinkedList L = new LinkedList();
L.add(5);
L.add(23);
It is nice that the Java designers gave us many implementations of the same collection, since each of these implementations has its own strengths and weaknesses (we'll see a glimpse of these tradeoffs in today's lab). However, with what we know so far, we run into a potential problem. Imagine that Alice and Bob are working on a project together. Alice needs Bob to write a method readList that reads information and returns it as some kind of list. Alice doesn't care much what kind of list it is, since her application is just going to read it in order, a task at which every list implementation excels. Bob, on the other hand, isn't 100% sure what kind of list he needs for his task. With what we know so far, Alice would have to guess what Bob is going to end up deciding and write either: ArrayList myList = readList() or LinkedList myList = readList() This is a huge pain. One potential fix is for Bob stop being such a slacker and get his work done so Alice can use his code, but as we all know, this is just not going to happen. Bob is just that kind of a guy. Through a mechanism known as an interface (to be discussed fully in class on Friday), we can avoid this problem. Java provides not just ArrayList and LinkedList types, but also a type called List, of which ArrayList and LinkedList are said to be implementations. List is not a Class, but instead an Interface. There are interfaces Set, and Map in java.util as well. What this means is that Alice can actually just write: List myList = readList() In Friday's lecture, we'll discuss about how a Java interface is an abstraction of an API (without any implementation) that may be shared by many classes. By specifying such an interface as the type of a variable, you get a program that is able to accommodate any of the implementing classes without having to change most of the program. Do not worry about what sort of magic syntax is under the hood that allows this to be possible (i.e. don't worry about how the List interface and ArrayList and LinkedList classes are implemented). For today, we just want you to know how to use interfaces and implementation classes. Let's get started with a concrete example by considering the program in Dups1.java, provided with this lab. Dups1 goes through the provided input, identifies all duplicates, and then prints all duplicates to the screen. It takes a single command line argument that tells it whether to use a linked list or an array list to solve this problem. Compile this program and execute it with java Dups1 linked < magus.txt and see what happens. Now try java Dups1 arrays < magus.txt You should find that the results of the two programs are identical. Look at the code, and observe that the duplicates method, which actually does the work of finding duplicates, doesn't actually know whether it's using a linked list or an array list. It just tells the List to do its job, and since the LinkedList and ArrayList classes implement the List class, it can make the exact same calls to both. ListIterators (Part 1) Look at the duplicates function and get a feeling for how it works. Observe that it involves repeated calls to the get(i) method of the List, which returns the ith item of a list. Think back to the DoubleChain class that you implemented for lab 2. We know that getFront() and getBack() have to crawl along a potentially long list, whereas getting the front or the back of an array can be done trivially by looking at only one item total. It seems possible that the get(i) method might be slow for a LinkedList for the same reason. Let's do our first computational experiment to measure performance. Try the Unix time commands
time java Dups1 linked < hammurabi110.txt
time java Dups1 array < hammurabi110.txt
Compare the results. (We've also set up 'gmake time' to run these and a couple of other timings. You might want to look at Makefile to how that is done). In this case, the problem isn't the LinkedList class itself, but the bone-headed way in which we're trying to use it. Supposing L is a LinkedList, it's rather silly to be making the sequence of calls L.get(65), L.get(66), L.get(67), and so forth. Under the hood (this is not an obvious fact), the LinkedList is seeking all the way to item #65 and returning it, and then when we tell it to get item #66, it starts all the way over from the beginning instead of starting from 65. Our fundamental problem is that we are using indices to iterate through the LinkedList using get() instead of letting it handle the iteration using abstractions that are built specifically around iteration. As a warmup to the rather tricky next part of this lab, we'll use another interfaces build to supports iteration. The interface types Iterator and ListIterator describe "moving fingers" that effectively point at elements in the middle of one of the collection types from the Java library and allow a program to move through the collection sequentially, regardless of how the collection is actually implemented. Iterators move only forward, and ListIterators move in either direction (and make sense only for Lists, which have an order). As an example of code using a ListIterator, see ListTour.java. Read and run the code, and make sure you understand quickIteratorExamples. Then fill out the printTour method as described. If you want to fix the style errors, feel free to delete the quickIteratorExamples code, which was provided only for pedagogical purposes. ListIterators (Part 2) The program Dups2.java uses the ListIterator abstraction to get around the problem in Dups1.java and puts the two implementations on a more nearly equal footing, albeit in a rather obtuse way. Again time the two programs: time java Dups2 linked < hammurabi110.txt
time java Dups2 array < hammurabi110.txt
(In the unlikely event that your hackles are raised regarding the speed of the .previous() method, it's useful to know that LinkedLists are more like DoubleChains than IntLists: a bit more sophisticated than IntLists: each item has two pointers, one to the next and one to the previous element. As an aside, the list itself has a pointer to both the first and last elements, allowing for easy access from either end.) If you haven't already, read and understand both programs. The continue keyword (covered in the reading, but not lecture) means to terminate the current iteration of the current loop and move on to the next one immediately. You may find to be useful the Java library documentation for the java.util.ArrayList and java.util.LinkedList classes and the java.util.List and java.util.ListIterator interfaces. We'll finish off this section of the lab by finally having you do some coding: Clean up our asinine implementation of Dups2. The Dups2.duplicates method uses two int variables, m and n, to keep track of positions in the list L (and stops once we have scanned back to the position of element we are checking for duplication). By making better use of the ListIterator interface, we can remove both of these variables. Copy and paste Dups2.java to Dups3.java, and modify the duplicates method so that duplicates does not use any integer variables. Your code should end up looking much more elegant than Dups2.java. If you're stuck make sure to see the java.util.ListIterator documentation. We're throwing you in the deep end here with strange new entitites. There are weird symbols and syntactical strangenesses like the angle brackets in ListIterator, but by doing some pattern matching, you should be fine -- we'll learn these new things later. Don't be afraid to experiment with method calls you aren't sure will work (recall that you can use the hw command to restore older copies of committed work in case you find yourself boxed in). Try and cajole your neighbors into discussion for more efficient progress. This will probably seem very hard because of all the new syntax, so collaborate away! If you are absolutely stuck, a solution is available at this link. You should only use this link as a last resort. Utilities Currently, the list of duplicate words gets printed in whatever order the words occur in the text. By adding one short line to the program, you can arrange for the list to be printed in sorted order. Figure out how to do this (there is a hint about where to look towards the top of the file) and change Dups3.java accordingly. Optional: With a bit more searching (this time in the documentation for java.lang.String), you may be able to find out how to get the sort to ignore the case of letters, instead of putting all capital letters before all lower-case letters. If you do this, modify Dups3.java accordingly. Sets We used an ArrayList called result to store our result in Dups1 through Dups3. To avoid having duplicate duplicates in result, we used the result.contains() method to continue to the next iteration whenever a redundant duplicate was found. The problem is that the contains method scans the ArrayList sequentually. While this is not much of a problem for the small inputs we've been using, it can cause difficulties with larger ones. The Java library provides collections that are faster for the purpose of collecting sets of objects without duplicates, namely the TreeSet and HashSet. Create a new program Dups4 that uses a java.util.TreeSet for the storing results in the .duplicate method. See the TreeSet documentation for a list of methods supported by the TreeSet class. You'll only need a couple of the methods from the list. In case you are tempted by dark forces beyond my comprehension, do not try to read the entirety of the TreeSet documentation. To make things a little more interesting, you must still return the result as a List, which a TreeSet is not. Hint: A TreeSet is a kind of Collection. Take a look through the documentation of ArrayList. More Debugging The program Natural.java gives another implementation of the naturalRuns method you did for homework, but this time with Java library Lists. Unfortunately, the program does not work, as you can see by compiling it (don't forget the -g option) and executing java Natural 1 3 7 5 4 6 9 10 You can probably figure this out by inspection, but try it in the gjdb debugger (or if you're using another IDE, the debugger of your choice). Step through the code of naturalRuns, and make sure you can see what is happening (that is, that you can execute each statement in turn and see the values of the variables and observe how they get updated). Now is your chance to ask the TA if the debugger doesn't seem to work for you. It is a good idea to know how to use some debugger or other when it comes time for projects. While you're at it, stop execution of your program at the start of the naturalRuns function, and print the value of L (the parameter to the function). The command below will result in an uninformative output that basically tells you that L is some kind of list. p L But try: p L.toString() This is actually a call that the debugger will execute, producing a handy String representation of L. You can produce the same string (in the debugger and in real Java) by typing p "" + L because the "+" operator on Strings, if given an operand that is a non-String object, will convert it into a String by calling toString() on it. Submit Use hw submit to commit all your lab3 files (Dups3.java, Dups4.java, and Natural.java at least). Check that your submission attempt was successful (that is, if you don't know how to do this, now is a great time to find out!). If you have any questions about ignoring files, etc., this is also a good time to ask your lab TA. Project 0 If you still have time, start working on Project 0 by at least using hw init proj0, making it, and then running it. You'll need to use java game2048.Main to start the game.