COMP9313 2016s2 Assignment 2 
Problem 1 (5 pts): HBase Bulk Load 
Download the file “Comments” from: 
https://webcms3.cse.unsw.edu.au/COMP9313/16s2/resources/5019. The 
data format of “Comments” is as below: 
- **comments**.xml 
       - Id 
       - PostId 
       - Score 
       - Text, e.g.: "@Stu Thompson: Seems possible to me - why not 
try it?" 
       - CreationDate, e.g.:"2008-09-06T08:07:10.730" 
       - UserId 
 
Your task is to create a table “comments” using HBase Java API, which 
contains three column families: “postInfo” (containing “PostId”), 
“commentInfo” (containing “Score”, “Text”, and “CreationDate”), and 
“userInfo” (containing “UserId”), and to bulk load data into “comments” 
from the file “Comments”. 
Create a class “HBaseBulkLoadComments.java” in package 
“comp9313.ass2” to finish this task.  
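As a rough illustration of the column layout described above, the sketch below maps the fields of one comment record to `family:qualifier` keys. It is plain Java with no HBase dependency, so it is not a bulk-load implementation. It assumes each record is a single `<row ... />` element carrying the fields as XML attributes, which this handout does not state; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentRowMapper {
    // Matches name="value" attribute pairs (assumed record layout; XML
    // entities such as &quot; are left undecoded in this sketch).
    private static final Pattern ATTR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Column family for each field, as listed in the task description.
    static String familyOf(String field) {
        switch (field) {
            case "PostId":
                return "postInfo";
            case "Score":
            case "Text":
            case "CreationDate":
                return "commentInfo";
            case "UserId":
                return "userInfo";
            default:
                return null; // Id and unknown fields are not stored as columns here
        }
    }

    // Returns "family:qualifier" -> value for every field that has a family.
    static Map<String, String> toColumns(String rowLine) {
        Map<String, String> columns = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(rowLine);
        while (m.find()) {
            String family = familyOf(m.group(1));
            if (family != null) {
                columns.put(family + ":" + m.group(1), m.group(2));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String line = "<row Id=\"1\" PostId=\"35\" Score=\"2\" Text=\"why not try it?\" "
                + "CreationDate=\"2008-09-06T08:07:10.730\" UserId=\"9\" />";
        toColumns(line).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In an HBase job, each entry of such a map would typically become one cell of a row; using the comment Id as the row key is one plausible choice, not something the handout prescribes.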
Compile and Test 
Your Java code will be compiled and packaged as a jar file, and we will use 
the following command to check the correctness of your solution: 
$ $HADOOP_HOME/bin/hadoop jar YOURJAR.jar YOURCLASS input output 
Please ensure that the code you submit can be compiled and packaged. Any 
solution that has compilation errors will receive no more than 2 points for 
this problem. 
Problem 2 (5 pts): HBase and MapReduce 
Read input data from table “comments” in HBase, and calculate the number 
of comments per UserId.  
Your MapReduce job should write the result to another HBase table 
“user_comment_stats”, with only one column family “stats” containing 
column “count”. Create the table using HBase Java API.  
Write your code in “ReadHBaseComments.java” in package 
“comp9313.ass2”. 
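The counting logic itself can be sketched independently of HBase. The example below is an in-memory stand-in, assuming the map step emits one (UserId, 1) pair per comment and the reduce step sums the values per key; it does not read from or write to any table, and the class name is hypothetical. Skipping comments without a UserId is also an assumption, since the handout does not say how to treat them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class UserCommentCount {
    // "Map" emits (UserId, 1) for each comment; "reduce" sums the ones per UserId.
    static Map<String, Long> countByUser(List<String> userIds) {
        Map<String, Long> counts = new TreeMap<>();
        for (String userId : userIds) {
            if (userId == null || userId.isEmpty()) {
                continue; // assumption: comments without a UserId are ignored
            }
            counts.merge(userId, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One entry per comment, holding that comment's UserId.
        List<String> userIds = Arrays.asList("9", "1", "9", "", "9");
        System.out.println(countByUser(userIds)); // prints {1=1, 9=3}
    }
}
```

In the real job, each resulting pair would be written as one row of “user_comment_stats” with the count stored under `stats:count`.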
Compile and Test 
Your Java code will be compiled and packaged as a jar file, and we will use 
the following command to check the correctness of your solution: 
$ $HADOOP_HOME/bin/hadoop jar YOURJAR.jar YOURCLASS 
Please ensure that the code you submit can be compiled and packaged. Any 
solution that has compilation errors will receive no more than 2 points for 
this problem. 
Problem 3 (5 pts): Hive 
Download files “Votes.fmt” and “Comments.fmt” from: 
https://webcms3.cse.unsw.edu.au/COMP9313/16s2/resources/4732. The two 
files are converted from “Votes” and “Comments”, in which the fields are 
separated by ‘ctrl+A’ and the lines are separated by ‘\n’. The data format of 
“Votes.fmt” is as below: 
     - Id 
     - PostId 
     - VoteTypeId 
        - ` 1`: AcceptedByOriginator 
        - ` 2`: UpMod 
        - ` 3`: DownMod 
        - ` 4`: Offensive 
        - ` 5`: Favorite - if VoteTypeId = 5 UserId will be populated 
        - ` 6`: Close 
        - ` 7`: Reopen 
        - ` 8`: BountyStart 
        - ` 9`: BountyClose 
        - `10`: Deletion 
        - `11`: Undeletion 
        - `12`: Spam 
        - `13`: InformModerator 
     - UserId (only for VoteTypeId 5) 
     - CreationDate 
The data format of “Comments.fmt” is as below: 
       - Id 
       - PostId 
       - Score 
       - UserId 
       - CreationDate, e.g.:"2008-09-06T08:07:10.730" 
       - Text, e.g.: "@Stu Thompson: Seems possible to me - why not 
try it?" 
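To make the “.fmt” layout concrete, here is a minimal sketch that splits one “Comments.fmt” line on the ctrl+A character (`\u0001`) and pairs each value with the field names listed above. The class name and the sample values are illustrative only.

```java
class FmtLineParser {
    // ctrl+A, the field separator used by the .fmt files.
    static final String FIELD_SEP = "\u0001";

    // Field order of Comments.fmt as given in the list above.
    static final String[] COMMENT_FIELDS =
            {"Id", "PostId", "Score", "UserId", "CreationDate", "Text"};

    // A limit of -1 keeps trailing empty fields (e.g. an empty Text).
    static String[] splitFields(String line) {
        return line.split(FIELD_SEP, -1);
    }

    public static void main(String[] args) {
        String line = String.join(FIELD_SEP,
                "1", "35", "2", "9", "2008-09-06T08:07:10.730", "why not try it?");
        String[] fields = splitFields(line);
        for (int i = 0; i < COMMENT_FIELDS.length && i < fields.length; i++) {
            System.out.println(COMMENT_FIELDS[i] + " = " + fields[i]);
        }
    }
}
```

The same separator is what a Hive table definition or a Pig loader has to be told about when reading these files.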
Please put the two files in the home folder of HDFS, i.e., /user/comp9313 
Your tasks include: 
1. (1 pt) Create a table for Votes and another table for Comments and 
load data into the two tables from the two files provided. 
2. (2 pt) Write “Select … from …” to compute the number of comments 
generated by each user (the result contains two columns: UserId and 
number of comments). 
3. (2 pt) Write “Select … from …” to find the Ids of all posts that have 
been favoured by more than five users (the result only contains one 
column: PostId). 
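The third task boils down to filter, group, count, and filter again. The sketch below expresses that logic in plain Java as a way to reason about or cross-check query results on a small sample; it is not a Hive script. It assumes each favourite vote (VoteTypeId 5) comes from a distinct user, so counting votes equals counting users; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FavouredPosts {
    static final int FAVORITE = 5; // VoteTypeId for "Favorite"

    // Each vote is {Id, PostId, VoteTypeId, UserId, CreationDate},
    // following the Votes.fmt field order.
    static List<String> favouredByMoreThan(List<String[]> votes, int threshold) {
        Map<String, Integer> favouritesPerPost = new TreeMap<>();
        for (String[] vote : votes) {
            // Keep only favourite votes, then count them per PostId.
            if (Integer.parseInt(vote[2].trim()) == FAVORITE) {
                favouritesPerPost.merge(vote[1], 1, Integer::sum);
            }
        }
        List<String> postIds = new ArrayList<>();
        for (Map.Entry<String, Integer> e : favouritesPerPost.entrySet()) {
            if (e.getValue() > threshold) { // strictly more than the threshold
                postIds.add(e.getKey());
            }
        }
        return postIds;
    }

    public static void main(String[] args) {
        List<String[]> votes = new ArrayList<>();
        for (int i = 0; i < 6; i++) { // six favourites for post 10
            votes.add(new String[] {"v" + i, "10", "5", "u" + i, "2008-09-06T08:07:10.730"});
        }
        votes.add(new String[] {"v6", "20", "2", "", "2008-09-06T08:07:10.730"}); // an UpMod
        System.out.println(favouredByMoreThan(votes, 5));
    }
}
```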
You should put everything in a Hive script “ass2.sql”, and we will use the 
following command to run the script and to check the results. 
$ $HIVE_HOME/bin/hive -f ass2.sql > hive.result 
Problem 4 (5 pts): Pig 
Your tasks include: 
1. (1 pt) Load data into schemas from the files converted from Votes 
and Comments. 
Hint: use PigStorage('\u0001') to delimit the fields when loading data 
2. (2 pt) Compute the number of comments generated by each user (the 
result contains two fields: UserId and number of comments). 
3. (2 pt) Find the Ids of all posts that have been favoured by more than 
five users (the result only contains one field: PostId). 
Hint: you will need to use “group by”, “count”, and the “filter” 
command, http://pig.apache.org/docs/r0.16.0/basic.html#filter. 
Use the “dump” command to print out the result of a query. You should put 
everything in a Pig script “ass2.pig”, and we will use the following 
command to run the script and to check the results. 
$ $PIG_HOME/bin/pig ass2.pig > pig.result 
Documentation and code readability 
Your source code will be inspected and marked based on readability and 
ease of understanding. The documentation (comments in the code) in your 
source code is also important. 
Submission: 
Deadline: Friday 23rd September 09:59:59 
Log in to any CSE server (williams or wagner), and use the give command 
below to submit your solutions: 
$ give cs9313 assignment2 HBaseBulkLoadComments.java 
ReadHBaseComments.java ass2.sql ass2.pig 
Or you can submit through: 
https://cgi.cse.unsw.edu.au/~give/Student/give.php 
If you submit your assignment more than once, the last submission will 
replace the previous one. To prove successful submission, please take a 
screenshot as shown in the assignment submission instructions and keep it 
for your own records.  
Late submission penalty 
A 10% reduction of your marks applies for the 1st day late, and a 30% 
reduction per day applies for each following day. 
Plagiarism: 
The work you submit must be your own work. Submission of work partially 
or completely derived from any other person or jointly written with any 
other person is not permitted. The penalties for such an offence may include 
negative marks, automatic failure of the course and possibly other academic 
discipline. Assignment submissions will be examined manually.  
 
Relevant scholarship authorities will be informed if students holding 
scholarships are involved in an incident of plagiarism or other misconduct.  
 
Do not provide or show your assignment work to any other person - apart 
from the teaching staff of this subject. If you knowingly provide or show 
your assignment work to another person for any reason, and work derived 
from it is submitted, you may be penalized, even if the work was submitted 
without your knowledge or consent.