CS2110 Assignment 1 — Basic Java and Object-Oriented Programming, Summer 2008 Due Friday June 27, 2008, 10:00PM 0 Introduction 0.1 Goals This assignment will help you refresh your knowledge of Java and/or learn Java’s syntax. It is roughly a comprehensive review of CS100. In addition it will help you practice using some concepts we will discuss in the first few days of class, such as recursion. 0.2 Instructions Be sure to read the entire assignment before answering the questions! Do all the tasks in the following sections. In later assignments, we will not necessarily repeat the blitz of instructions in Section 0. 0.3 No Partners! (this time) and Academic Integrity In general, assignments will allow you to work a partner. However, because we need to ensure that your Java and programming skills suffice, you must work alone on this assignment. You must abide by the academic integrity standards of CS 2110, the Computer Science department, and Cornell University. Links to these academic integrity policies are provided on the course website. 0.4 Submission Follow the Submission Requirements on the course website. The last section of this assignment summarizes the files that you need to submit on CMS. 0.5 Grading All code that you submit must run without warnings or errors. Otherwise, you will receive a zero for that portion of the assignment. The assignment will be graded out of 100 points and is worth 10% of your course grade. 0.6 Advice Start early! For a Java refresher and help with using its command-line environment, refer to the bootcamp notes (available on the website). You may also read the first three chapters of the textbook for a Java review. Remember to use good programming style. Review the style guidelines set out on the course website. In particular, use meaningful variable and method names. Add comments to clarify your code, and include a comment at the top of the source file that includes your name, NetId, the date, and a brief description of your solution. Correct solutions that are short, simple, and efficient are superior to submissions that are unnecessarily long, complex, or slow. 1 Communication Purpose: To get connected Points: 2 Review the information about Usenet on the course website. In your favorite newsreader, subscribe to cornell.class.cs2110 and cornell.class.cs2110.talk. Post a message in cornell.class.cs2110.talk. 1/7 CS2110 Assignment 1, Summer 2008 100 points 2/7 In the subject line, enter only A1:. In the message body write something witty. (If you’re not in a witty mood, writing “Hello world!” will suffice.) Recall that the newsgroup cornell.class.cs2110.talk is meant for test posts and casual conversation on topics related to 2110. The newsgroup cornell.class.cs2110 is the actual course newsgroup that we reserve for technical questions regarding course material. You may ignore this until the newsgroup starts working again. When it works it will be announced in class and on the website. 2 Course website and policies Purpose: To become familiar with the class website and course policies Points: 8 Review the entire course website and answer the following questions. 1. How long do you think it will take to do this assignment? How long did it actually take? 2. What version of Java should you use? 3. What are the dates and times of the CS2110 exams? 4. What is the procedure to submit a regrade for an assignment? 5. Suppose Alice, a CS2110 student, writes some of the solutions for a particular assignment with a partner named Bob. Then, the two students divorce because of irreconcilable differences and wish to work alone or perhaps with other partners for the remainder of the current assignment. According to the 2110 rules of academic integrity on the course website, may Alice and Bob divorce? May they divorce for the remaining assignments? 6. Suppose that about two days before an assignment deadline, another student who is not your partner pleads with you to share some of your code for the assignment. According to the course website, under what conditions may you show your group’s code to him or her? 7. Where is the CS2110 consulting office? 8. Where can copies of the optional textbooks be found? Submit your answers in a plain text file called problem2.txt. If we cannot open the file with a text editor, you will receive no credit. If you do not know what plain text is, refer to the submission format requirements on the website. 3 Computing grades Purpose: Basic Java syntax, keyboard input, error checking and exceptions, course grading policies Points: 40 Imagine the future. Specifically, six weeks into the future. You’ve just completed all of the CS 2110 coursework and you’re anxious to know your overall course score and letter grade. Write a program called GradeCalc that asks for your scores on the individual 2110 coursework items and outputs your overall course score. The program should also output the minimum letter grade that you are guaranteed by that course score. The program must be user-friendly. It should ask the user to input scores for assignments, quizzes, and exams. Part of this problem is to get practice writing programs that are robust to user input errors, so you should check user input for errors, such as a score that is too high or too low (one of many possible input errors). If an error is detected, the program should print a helpful message and allow the user to try again. CS2110 Assignment 1, Summer 2008 100 points 3/7 You should begin this assignment by reviewing the course grading policies set out on the website. For the purpose of this assignment, assume that six quizzes are given but that the lowest quiz score is dropped. You can assume that scores on individual coursework items are integral (i.e. no fractional points are given). Also assume that extra credit is not given. For homework assignments, the program should ask whether the homework was submitted on-time, during the late submission period, or after the late submission period, and adjust the grade accordingly. Here is a sample of what an interactive session might look like: Enter grade for homework #1: 85 Was homework #1 submitted on time? y Enter grade for homework #2: 90 Was homework #2 submitted on time? y Enter grade for homework #3: 100 Was homework #3 submitted on time? n Was homework #3 submitted within 24 hours of the deadline? y Enter grade for homework #4: 89 Was homework #4 submitted on time? y Enter grade for homework #5: 90 Was homework #5 submitted on time? n Was homework #5 submitted within 24 hours of the deadline? n Enter grade for prelim: 950 Not a legal entry: please enter an integer between 0 and 100 Enter grade for prelim: 95 Enter grade for final: 99 Enter grade for quiz #1: 10 Enter grade for quiz #2: 6 Enter grade for quiz #3: 0 Enter grade for quiz #4: 9 Enter grade for quiz #5: 8 Enter grade for quiz #6: 9 ----------------- Your final course score: 81.8 Your letter grade will be at least: B 4 Farmer Munson’s Soybeans Purpose: Recursion, file input, command-line parameters, permutation tests Points: 50 Due to the increasing global demand for soybeans this year, Farmer Munson decided to plant three fields of soybeans. Unfortunately, the cost of chemical fertilizers is twice as much this year as last year due to the increasing price of natural gas. Ever resourceful, Farmer Munson decided to try using some organic fertilizer being sold by his friend Old MacDonald. (The fertilizer comes from MacDonald’s chickens; the farmers have a colorful name for it that we won’t repeat here.) Farmer Munson planted 1 field using chicken fertilizer and 1 field using chemical fertilizer (Munson found some leftovers at the back of his storage shed). After hearing about this natural experiment, Professor Cornellia from Cornell’s Agriculture School convinced Munson to plant the last field without fertilizer to get an idea of crop yield without fertilizer. In return, she agreed to pay Munson the profit difference between field 3 and field 1. Last week Cornellia went to the fields and measured some heights of the fledgling soybean plants in each of the fields to see what, if any, growth differences there were. Here are a few of the measurements: chemical chicken nofert CS2110 Assignment 1, Summer 2008 100 points 4/7 6 6 3 12 7 6 11 14 3 9 Is the chemical fertilizer better than the organic fertilizer? How strong is the evidence that fertilizer (of either kind) produces taller soybean plants (vs plants without fertilizer)? Cornellia wants to support her conclusions with tests of statistical significance. Unfortunately, she does not know anything about how soybean heights are distributed, and she only measured a few plants (it started to rain). She asks you to help her test her hypotheses using a permutation test. For this problem you will write a program that tests a hypothesis using a permutation test (details below). To help you, we have prepared an incomplete solution which you should use as the basis for your work. You can download the incomplete solution through CMS. There are detailed comments in these files that will hopefully clarify anything that is confusing. Your task is to add code to the incomplete solution in order to make it work properly. The instructions in the next two sections will help you do this. You may add code (including new methods and/or new member variables), but you must not change the public interface of any classes. Exception: you may add a main() method to a class to do unit testing of the class independent of the rest of the program. 4.1 Background: Permutation Tests for Hypothesis Testing Often in science we want to know if changing a condition (e.g. fertilizer type) produces a change in a quantity we care about (e.g. height of soybean plants). Naturally, we measure the quantity under the different conditions. Typically there are some random effects that influence our measurements under all conditions. As a result, there is always a possibility that differences in the quantity are due solely to random variations, and not due to the changing condition. Basically, perceived differences can happen just because we were unlucky in our measurement samples. Before concluding, for example, that chemical fertilizer produces taller plants than organic fertilizer, we need to rule out the possibillity that fertilizer type has no effect (called the null hypothesis and noted H0). This involves a hypothesis test, which proceeds kind of like a proof-by-contradiction. Assume that the null hypothesis is true. Under this assumption, compute the probability of observing the collected data. If the probability is (very) low, then we can reject H0 with some confidence and conclude that the alternative hypothesis (e.g. chemical > organic) is most likely true. One kind of hypothesis test is a permutation test. The details for the permutation test are a little long, but the basic idea is fairly simple. We are going to enumerate all possible permutations of assigning measurements to one sample or another. Given this list, we will apply some criteria to each permutation to see if the criteria is true or not. How often the criteria is true is used to measure the statistical significance of a hypothesis. Let s1 be measurements in sample 1, and s2 be measurements in sample 2 (e.g. chemical vs. organic). Also let n1 and n2 be the number of measurements in each sample. Under the null hypothesis, s1 and s2 have the same distribution, and the assignment of measurements to each group is random. Suppose we summarize the measurements in each group by taking the mean (average) of each group. We can then look at the difference in the means to see how different the groups are. For example we can compute diff(chemical, organic) = mean(chemical) - mean(organic) A large positive number would suggest that chemical fertilizer is bigger than organic. For this problem, the difference of two means is the test statistic we will use for the hypothesis test. Let obsDiff = diff(s1, s2) where s1 and s2 are the observed data samples. Assuming the null hy- pothesis is true, how likely are we to get such a large obsDiff? Is obsDiff so large and so unlikely that we should reject the null hypothesis? With access to fast computers, we can answer this directly when n is small. We will just generate all the possible ways the measurements can be split into groups of size n1 and n2. For each of these random assignments (call the random groups t1 and t2), we can compute randDiff = diff(t1, t2) and compare randDiff with obsDiff. The fraction of the time that randDiff >= obsDiff CS2110 Assignment 1, Summer 2008 100 points 5/7 is the probability of seeing such a large test statistic value when the group assignments are random. This probability is often called the p-value. If the p-value is small, we can reject the null hypothesis as being very unlikely. For example, if the p-value is 0.05, then the difference is real at the 95% confidence level. 4.2 Implementing Permutation Tests The partial solution given to you has two classes: PermTest and Labeling. Class PermTest (in PermTest.java) is responsible for implementing the permutation test logic, reading the measurements from files, and provid- ing a command line interface for running permutation tests. Labeling (in Labeling.java) is a helper class that represents which group a measurement is assigned to. The rest of this section highlights the parts you need to implement. 4.2.1 Tracking Sample Members There will be lots of shuffling measurements into different samples as part of the permutation test. To make this easier, implement the methods in Labeling found in the file Labeling.java. These objects track which items (aka measurements) are assigned to which samples, and which items are not yet assigned to any group. It might be helpful to add a main() method to make sure your implementation works. 4.2.2 Reading in the Measurement Samples Next, implement reading in the measurement samples. More precisely, implement PermTest.readSample() and add code in PermTest.main() to call readSample. Chapter 2 in the course textbook might be helpful if you need a reference for how to do File I/O. 4.2.3 The Test Statistic We need some statistic, or summary, of how a sample t1 differs from the other sample t2. For this assignment, we will summarize each sample by its mean (average), and look at mean(t1) − mean(t2) to quantify the differences between the samples. Implement PermTest.sampleDifference() to meet your needs. This was called diff() above. 4.2.4 The Permutation Test We recommend you start by implementing PermTest.nChooseR(), since it can serve as a check that you are generating the right number of random assignments. For this assignment we require that you implement the function using recursion. There is helpful information about this in Wednesday’s slides. Finally, implement PermTest.computeSignificance() and PermTest.recListCombinations(). We have left pseudo-code for computeSignificance() in the starter file so that the details of the hypoth- esis test do not distract you too much from focusing on writing recListCombinations(). Note that recListCombinations() must be implemented using recursion. 4.3 Sample Data You can find Cornellia’s data samples in the files chemical.txt, chicken.txt, and nofert.txt. There are also smaller samples (collected by a lazy grad student) in chemical.small.txt and chicken.small.txt. We strongly recommend you use these for testing your code while you are in development. Using the ‘-verbose’ option on PermTest may also be useful. 4.4 Example Output Here is the results of a permutation test of whether the chemical fertilizer is giving better results than the chicken fertilizer (using the small samples): CS2110 Assignment 1, Summer 2008 100 points 6/7 \$java -cp . PermTest -verbose chemical.small.txt chicken.small.txt No. | 6.0 12.0 11.0 6.0 7.0 14.0 | diff(s1,s2) -------------------------------------------- 0 | 1 1 1 2 2 2 | 0.6666666666666661 1 | 1 1 2 1 2 2 | -2.666666666666666 2 | 1 1 2 2 1 2 | -2.0 3 | 1 1 2 2 2 1 | 2.666666666666666 4 | 1 2 1 1 2 2 | -3.333333333333333 5 | 1 2 1 2 1 2 | -2.666666666666666 6 | 1 2 1 2 2 1 | 2.0 7 | 1 2 2 1 1 2 | -6.000000000000001 8 | 1 2 2 1 2 1 | -1.333333333333334 9 | 1 2 2 2 1 1 | -0.6666666666666661 10 | 2 1 1 1 2 2 | 0.6666666666666661 11 | 2 1 1 2 1 2 | 1.333333333333334 12 | 2 1 1 2 2 1 | 6.000000000000001 13 | 2 1 2 1 1 2 | -2.0 14 | 2 1 2 1 2 1 | 2.666666666666666 15 | 2 1 2 2 1 1 | 3.333333333333333 16 | 2 2 1 1 1 2 | -2.666666666666666 17 | 2 2 1 1 2 1 | 2.0 18 | 2 2 1 2 1 1 | 2.666666666666666 19 | 2 2 2 1 1 1 | -0.6666666666666661 -------------------------------------------- Randomly assigning data to samples shows >= sample differences than observed samples in 10/20 random permutations. -------------------------------------------- pvalue: 0.5 5 What to submit Submit the following files: 1. problem1.txt 2. GradeCalc.java 3. Labeling.java 4. PermTest.java Extra Credit, Extra Fun! If you have finished everything above and are craving for some more programming, here are a couple extra credit items you can try your hand at. We will award extra credit based on how well they are done. However, if the answers to the main part of the homework are not well done, we will not award very many extra credit points at all. Make sure you keep a pristine backup copy of the solutions before you start. 5.1 Non-Recursive Permutation Tests Write a variant of PermTest that does not use recursion. Hint: you probably need a data structure like a list or stack, that we will cover in class shortly. CS2110 Assignment 1, Summer 2008 100 points 7/7 5.2 Alternative Permutation Test Statistics In question 4 we used the difference between two sample means as a summary of how much better sample 1 was than sample 2. What other reasonable summary statistics can you think of? Justify your alternative(s) and extend PermTest to use the alternative(s) based on command line options. 5.3 Computing Grade Ranges Part way through the summer you have half your grades and are wondering what is the range of grades you could receive, based on worst and best performance on the rest of the assignments and exams. Write a variant of GradeCalc that computes the range of the user’s final grade. To make it interesting, the user should be able to enter a range of values for the unknown future evaluations—often you have a rough idea of how well you will do in the future based on past grades, comfortableness with the material, etc.