Big Data Management COMP9313 21T3

Notices

Project 1 Mark Released
Posted by Xin Cao Monday 01 November 2021, 04:20:34 PM.

Dear All,

The marks for Project 1 have been released. I will explain the solution to the first project in tomorrow's lecture. After tomorrow's lecture, if you still have questions about your mark, please contact the tutor who marked your submission. If they cannot make a decision, I can check your code further.

Some students were granted an extension of several days, and I have sent the list of names to the tutors. However, it is possible that the late submission penalty was still applied to your mark. If so, please contact the tutor as well.

For Project 2, please start working on it as soon as possible. As with Project 1, first work on Labs 5-7. These lab problems aim to help you solve the project problems. You can also use Eclipse to write and debug your code; the instructions are here: https://webcms3.cse.unsw.edu.au/COMP9313/21T3/resources/68851

Regards,
Xin

Some FAQs about Project 1
Posted by Xin Cao Friday 15 October 2021, 11:54:41 PM.

1. I saw some requests in the forum for more test cases, so one more test data set is provided. If you use Java, a running script is also provided; it packages your Java files and runs your jar file on Hadoop (please change the document number and the reducer number accordingly).
2. Please use the pre-installed Hadoop at ~/hadoop. Lab 1 only aims to show you how to install and configure Hadoop. Please delete the ~/workdir folder after you complete Lab 1, as well as the corresponding configuration in ~/.bashrc.
3. Before you run mrjob code on Hadoop, please start both HDFS and YARN.
Please check that you have configured YARN correctly by following the instructions in Lab 1, which involve two files: $HADOOP_CONF_DIR/mapred-site.xml and $HADOOP_CONF_DIR/yarn-site.xml.
4. The "\t" means the tab character, not the two-character string "\t". Because one tab character may be displayed as wide as 4 or 8 spaces, the same text may look different in the editor and in the terminal.
5. Please try your best to debug your code before asking questions. You can first test your code locally, and then run it on Hadoop. Note that it is quite possible for your code to generate correct results locally but fail on Hadoop. In that case, something is most likely wrong with how the keys are partitioned. In mrjob, you can test your mapper first and then your reducer. To test the mapper, you can write a simple reducer that emits the key-value pairs it receives unchanged. By doing so, you can see whether your mapper sends the key-value pairs to the reducers as expected. Once the mapper is OK, you can proceed to test your reducer.
6. Variables defined in mapper_init/reducer_init and in mapper/reducer have different scopes. A variable defined in mapper/reducer is visible only within that function call for the current input. A variable defined in mapper_init/reducer_init (for example, as an attribute on self) can be seen by all mapper/reducer calls within the same mapper/reducer task.
7. It is strongly recommended that you complete the two problems in Lab 4 first, and then work on the project. Otherwise, you will run into many problems while working on the project.

Some tips about the first project
Posted by Xin Cao Saturday 09 October 2021, 02:06:13 AM.

Dear All,

You still have one weekend plus one week to work on the first project.

1. Lab 4 has already been released. It will help you write code for Project 1, especially on how to use the partitioner and comparator classes in mrjob (if you use Java, the lab gives you some practice in defining a custom partitioner and defining an order for your keys).
If you do not know how to approach the project yet, please first complete the problems in Lab 4; you will then have a better idea of how to solve the project problem.
2. You are allowed to pass the number of documents as an argument in the Python version of the project. To be fair, if you use Java, you are also allowed to do so. I have updated the project description for the Java version. Please download the new document.
3. I made a mistake in slide 21 of Chapter 3.1 on how to use the partitioner class in mrjob. I have updated that slide; please download the new version as well.

Regards,
Xin

Upcoming Due Dates
There is nothing due!

COMP9313 21T3 (Big Data Management) is powered by WebCMS3
CRICOS Provider No. 00098G
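
The point in FAQ 4 of "Some FAQs about Project 1" (a tab character versus the literal text "\t") can be sketched in plain Python. This is only a minimal illustration, not part of the project specification: the helper name format_record and the sample key and value are hypothetical.

```python
# Sketch for FAQ 4: "\t" in Python source is ONE tab character (U+0009).
# format_record, line and literal are hypothetical names for illustration.

def format_record(key: str, value: int) -> str:
    """Join a key and a value with a single real tab character."""
    return f"{key}\t{value}"


line = format_record("ab", 3)   # contains a real tab
literal = r"ab\t3"              # raw string: a backslash followed by 't'

print(len("\t"))                # the tab escape is one character
print(len(r"\t"))               # the raw text is two characters
print(line.split("\t"))         # splitting on the tab recovers both fields
print(repr(line.expandtabs(4))) # how a 4-column tab stop would display it
print(repr(line.expandtabs(8))) # how an 8-column tab stop would display it
```

The last two lines show why the same output can look different in an editor and in a terminal: the stored text is identical, and only the width used to display the tab changes.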