Concurrency Assignment Home Schedule Piazza Q&A Blackboard announcements lectures quizzes grades Lecture notes PLP CS Other resources Policies Grading Assignments Concurrency Assignment Your task in this assignment is to parallelize an existing implementation (in Java) of Conway’s game of Life. The game is a cellular automaton. The board consists of a conceptually infinite rectangular array of cells, each of which potentially contains an “organism” (in the version I’m giving you the board is a finite torus). The organisms move through a series of “generations.” If an organism has two or three neighbors (counting diagonals), it survives to the next generation. If it has fewer than two neighbors it dies of loneliness. If it has more than three neighbors it dies of overcrowding. If an empty cell has exactly three neighbors a new organism is born in that cell in the next generation. The potential patterns on the board are surprisingly rich. In fact, it has been shown that the game is Turing equivalent: one can build self-replicating and evolving structures capable of computing. Additional detail can be found in Martin Gardner’s “Mathematical Games” column in the October 1970 issue of Scientific American. The code we are giving you displays its output in a graphical window. There are five buttons at the bottom of the main window, to run, pause, stop, clear, or quit the game. When the game is stopped, you can click the mouse anywhere on the board to toggle cells on and off. Create some initial patterns and see what sort of results you get. Suggested examples:
o o
o o
o o o
o o o o o o o
(Life enthusiasts call these the glider and the spaceship.) You should find that the code runs approximately 2 generations per second. It would run much faster, except that I have inserted a spin loop that lingers on each cell when updating the board. This makes the animation slow enough to watch. It’s also reminiscent of more ambitious iterative scientific applications in which each update is a time-consuming mathematical computation. Note that you are not permitted to remove or modify the spin delays, but you can adjust the speed of your animation by specifying a specific spin value with the -s command-line argument (see below). Starting source code is available in Life.java and Coordinator.java, which you can view in, and save from, your browser. After you compile these with javac, you can run the result as java Life. Feel free to develop and run on any machine you like, but please make sure your final code will compile and run successfully with the javac and java on node2x14a.csug.rochester.edu and node2x18a.csug.rochester.edu. Your assigned task is to create two new versions of the game in which the board is updated by a collection of T threads, rather than a single thread. One version will use threads directly, as was standard in Java 2; the second version will use the Executor facilities introduced in Java 5. With N rows of board to be updated, I suggest you allocate N/T contiguous rows to each of T threads in the initial (Thread-based) version of the code (taking appropriate care to handle round-off errors cleanly). In the Executor version, you can use newFixedThreadPool(T) to create an Executor with exactly T threads behind it. You can then experiment with varying numbers of tasks K >= T. Do you get better performance with K = T or K >> T? For correctness, you will need to make sure that your threads or tasks move through time generations in lock step—you don’t want to have one thread updating the cells in row i while some other thread is trying simultaneously to read them. The easiest way to achieve the needed synchronization among true threads is to have them share an instance of java.util.concurrent.CyclicBarrier. In the Executor framework, you can use built-in Executor methods to force all extant tasks to complete before starting the next generation. The code we are giving you accepts several optional command-line arguments: -s num Number of iterations of a spin loop to execute per dot when updating the board. If unspecified, the program will attempt to self-calibrate to ensure about 0.5 seconds per full-board update. This is only approximate, however, and will vary a bit from run to run, so when you’re running experiments, be sure to specify an actual value (and the same one every time). -t num Number of threads (max) that should be running at any given time. This argument is currently unused; it’s there to support your parallelization efforts. --headless Starts the game without the user interface, so you can run accurate timing experiments. --glider Starts the game with a glider in the top-left corner of the window. You can run your application remotely, with X forwarding over ssh. You will probably get better results with -Y (insecure) forwarding rather than -X. Alternatively, sit down at a workstation in one of the CSUG labs. Machine resources You will be running this assignment on node2x14a.csug.rochester.edu and node2x18a.csug.rochester.edu. Each of these machines has two processor chips. The smaller machine has 14 cores per chip, the larger 18 cores per chip. Each core has 2 hardware contexts (hyperthreads). This means the machines can execute up to 56 or 72 threads in parallel. You will probably find that your code runs faster with 2, 4, or even 8 threads, but probably slows down again before it gets to 32, due to thread creation overhead, lack of available concurrency, and/or bus, memory, or ALU contention. As the due date approaches, we will reserve much of the time on node2x18a for timing experiments, with a sign-up system that allows you to obtain exclusive access to the machine. Note that you will almost certainly not be able to get last-minute exclusive access, and since results of timing experiments are required for full credit on the assignment, you will need to plan to have your code ready for testing several days ahead of the due date. Analyzing speedup The write-up requirements are more extensive for this assignment. In addition to parallelizing the code and describing what you did, you must evaluate the success of your parallelization. Using node2x18a, create a graph that plots, for numbers of threads from 1 to 64, the time required for a glider (see above) to travel from one corner of the game board to the diagonally opposite corner. (You do not necessarily have to plot every possible thread count—that would take a lot of experimentation time. Thread counts of, say, 1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, and 64 should suffice. Be sure to use -s to specify a fixed spin count for all experiments.) Also plot the speedup of your code: the run time of the original (unmodified!) sequential version divided by the run time of your parallel version. Ideally, you’d see a speedup of k with k threads. How close do you come? What bottleneck(s) keep you from doing better? Division of labor and parallelization strategy As in previous assignments, you may work alone or in teams of two. If you choose to work in pairs, one possible division of labor is for one partner to write the Thread version and one to write the Executor version. If you do this, you’ll want to consult with one another frequently to avoid duplication of effort. Be sure to follow all the rules on the Grading page. As with all assignments, use the turn-in script: ~cs254/bin/TURN_IN. Put your write-up in a README.pdf file in the directory in which you run the script. Be sure to describe any features of your code that the TAs might not immediately notice. On-line resources Java 17 documentation Tutorial Concurrency “trail” Language Reference Manual Extra credit suggestions Add a button that will step the application exactly one generation. Modify the program so that it can read start-up configurations from a file or web page. Can you find a way to write configurations to a file? Modify the program so that it performs work at each time step proportional to the number of occupied cells, rather than the total number of cells on the board. Instead of assigning a contiguous band of rows to each thread, tile the board into rectangular regions in such a way that the total number of boundary elements is minimized. (This may improve cache locality, and thus performance.) Translate the code into C# and experiment with that language’s concurrency features. Trivia assignment By 10:25am on Wednesday, November 10, each student should complete the T5 trivia assignment found on Blackboard. MAIN DUE DATE (revised): Friday December 3, at 11:59 pm; no extensions. Last Change: 23 November 2021 /