CS2011/2711 hw2 COMP 2011/2711 Data Organisation, Semester 1, 2006 Project 2 Due: Friday 5 May, 6:00 pm Marks: 15% of final assessment This project will involve writing a Java class to implement Adaptive Data Compression and Static Compression. Data Compression The purpose of data compression is to take a file A and compress it to another file B in such a way that: B is smaller than A, and it is possible to exactly reconstruct A by decompressing B. It is desirable that such an algorithm be efficient, and that B be a lot smaller than A. The size of B divided by that of A is called the compression ratio. While it would be nice to have an algorithm which can achieve a compression ratio less than 1.0 for all possible input files, unfortunately such an algorithm cannot exist. To see this, let us suppose that all possible strings of length less than or equal to k can be compressed to strings of length less than or equal to l, with l queue = new PriorityQueue();
To use the priority queue, the methods you will need are:
public boolean add(BinaryTreeNode o) - adds o to priority queue
public BinaryTreeNode poll() - removes and returns head of the queue
public int size() - returns number of items in the queue
Testing your code: We have made available hw2test.java which you can use for testing the tree creation and splaying methods in your program. The following command will use your CodeTree class to generate a tree with the given number of leaves, then perform a series of splay operations on the tree. After each step it will display the tree, so you can see the results of the splay operation. java hw2test num_leaves [ item ... ] Note that this program will not be used for testing your program in automarking. It is provided only as an aid for debugging your program. We have put up some sample test data in samples/. The table below lists the performance of our compressors on the sample test data. The first three numbers in each line are the original and compressed sizes of the files (for both compressors), given in bytes. The last number gives the weighted path length of the Huffman tree generated by the Huffman compressor. Notice that for some files, the compressed file is actually bigger than the uncompressed file.
File Original Adaptive Huffman WPL
----------------------------------------------------
image1.tiff 124120 35906 23143 1.4253
image2.jpg 33365 44284 34375 7.9954
image3.tiff 60286 15930 60905 7.9455
image4.jpg 66378 87367 67246 7.9805
random 65536 87995 66593 8.0033
text1.txt 8400 6637 5745 4.4914
text2.txt 8547 6794 5853 4.5156
text3.txt 5744 4707 4298 4.5528
text4.txt 12731 9951 8010 4.3867
Notes:
image[13].tiff - uncompressed TIFF file
image[24].jpg - compressed JPEG file
random - 65536 random bytes
text[1234].txt - plain English text files
Your program will be assessed on unseen test data and not just the sample test data provided. Adaptive compression: if your compressor is within 10% of ours you get full performance marks, prorated accordingly for worse performance. Huffman compression: if your compressor implements the Huffman algorithm correctly it will achieve the same performance as ours - we will look for this when marking. There are other assessment components such as style and question answering for the total mark. Question At the top of your code, in a block of comments, you should provide a brief answer (one or two short paragraphs) to the following question: (i) How well do these methods of data compression work? (ii) On which types of files do they work well (i.e. achieve a low compression ratio) and on which do they work poorly? (iii) What is the main advantage of the adaptive method over the static method? Submission Once submissions are open, you should submit by typing give cs2011 hw2 CodeTree.java HuffmanTree.java You can submit as many times as you like - later submissions will overwrite earlier ones. Remember that you should not edit the files BinaryTreeNode.java, InternalNode.java, LeafNode.java, BitInputStream.java, BitOutputStream.java or hw2test.java. If you attempt to submit an altered version of any of these files, your submitted version will be ignored and the standard version will be used instead. The submission deadline is as above. 15% penalty will be applied to the (maximum) mark for every 24 hours late after the deadline. Additional information may be found in the FAQ as it is built up, and will be considered as part of the specification for the project. If you have a question that has not already been answered on the FAQ, you can email it to cs2011.hw2@cse.unsw.edu.au Assessment The project will be marked on style and comments as well as functionality and performance. Programs which generate compilation errors will receive a very low mark, no matter what other virtues they may have. In general, a program that attempts a substantial part of the job but does that part correctly will receive more marks than one attempting to do the entire job but with many errors. The answer to the question is worth 3 (out of 15) marks. Plagiarism Policy Group submissions will not be allowed. Your program must be entirely your own work. Plagiarism detection software will be used to compare all submissions pairwise. These checks may be run after the automarking and release of marks, so the released marks are not necessarily final. First detection: negative mark for the assignment Second detection: failure of the course Third detection: faculty disciplinary action (including possible expulsion from the university) DO NOT COPY FROM OTHERS; DO NOT ALLOW ANYONE TO SEE YOUR CODE Please refer to the Yellow Form, or to section on Originality of Assignment Submissions in the Unix Primer, if you require further clarification on this matter. Good luck!