
Alex Chengelis 
LAB 4_1 
1. Create a new virtual machine with Ubuntu. I am using VMware Player to do this.
Fill in the installer prompts with the settings you want and let it install.
2. Download the appropriate Java and Hadoop files.
I am using 2.7.3 since it is the latest stable release of Hadoop. You can either download it from the website or use curl:
curl -O
For Java, go to this page and download the Linux tar.gz.
Place both the Hadoop and Java archives in ~/Downloads.
3. Configure the SSH server. 
sudo apt-get update 
sudo apt-get install openssh-server 
4. Configure password-less SSH login.
ssh-keygen -t rsa -P "" 
cat ./.ssh/ >> ./.ssh/authorized_keys 
chmod 600 ~/.ssh/authorized_keys 
sudo service ssh restart 
5. Standalone mode setup (you start with this and add more functionality step by step). Start by
extracting the downloaded files.
cd Downloads 
tar xzvf hadoop-2.7.3.tar.gz
The v flag makes tar list every file it extracts, so the terminal will quickly fill up.
Verify that Hadoop has been extracted.
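One way to verify the extraction is to look for a few entries that a Hadoop 2.x tarball contains. The helper below is hypothetical, and its list of paths is an assumption based on the standard 2.7 layout (bin/hadoop, bin/hdfs, sbin/start-dfs.sh, etc/hadoop, share/hadoop/mapreduce).

```shell
#!/bin/sh
# Hypothetical helper: check_hadoop_tree DIR
# Prints every expected entry missing from an extracted Hadoop 2.x directory
# and returns non-zero if anything is missing. The path list is an assumption.
check_hadoop_tree() {
  dir=$1
  missing=0
  for entry in bin/hadoop bin/hdfs sbin/start-dfs.sh etc/hadoop share/hadoop/mapreduce; do
    if [ ! -e "$dir/$entry" ]; then
      echo "missing: $dir/$entry"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_hadoop_tree ~/Downloads/hadoop-2.7.3 && echo "extraction looks complete" (adjust the directory to the version you actually extracted).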
6. Create soft links  
ln -s ./Downloads/hadoop-2.7.3/ ./hadoop
7. Configure .bashrc 
vi ./.bashrc 
export HADOOP_HOME=/home/alex/hadoop 
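Besides HADOOP_HOME, the hadoop and hdfs commands used later have to be on the PATH; they live in $HADOOP_HOME/bin, and the cluster start/stop scripts live in $HADOOP_HOME/sbin. A minimal sketch that appends both settings once, assuming the soft link from step 6 is ~/hadoop in lowercase. The marker comment is a hypothetical convention that only keeps reruns from adding duplicate lines.

```shell
#!/bin/sh
# Sketch: add the Hadoop environment variables to ~/.bashrc exactly once.
# Assumption: the soft link from step 6 is ~/hadoop (all lowercase).
set -eu

BASHRC="$HOME/.bashrc"
MARKER='# >>> hadoop env >>>'   # hypothetical marker to detect a previous run

touch "$BASHRC"
if ! grep -qF "$MARKER" "$BASHRC"; then
  # The quoted 'EOF' keeps $HOME, $PATH, etc. literal so they expand when
  # a new shell reads .bashrc, not when this script runs.
  cat >> "$BASHRC" <<'EOF'
# >>> hadoop env >>>
export HADOOP_HOME="$HOME/hadoop"
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
# <<< hadoop env <<<
EOF
fi
```

As with the manual edit, the new variables only apply to terminals opened afterwards.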
8. Configure Hadoop’s file 
vi ./hadoop/etc/hadoop/ 
export JAVA_HOME=/home/alex/jdk 
9. Run a Hadoop job in standalone mode. First close and reopen the terminal so the new .bashrc settings take effect. Then type the
hadoop command.
If it prints a usage message, that is a sign our installation is good so far.
Run a Hadoop job:
- Create a testhadoop directory
- Create an input directory inside testhadoop
- Copy some input files into it (Hadoop's .xml configuration files)
- Run the MapReduce example job
- View the output with the cat command
mkdir testhadoop 
cd testhadoop 
mkdir input 
cp ~/hadoop/etc/hadoop/*.xml input 
hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
cat output/* 
You'll see the job's log messages scroll by in the terminal.
Finally, check the output:
This is working.  
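The grep example counts every match of the regular expression in the input files and lists each match with its count, most frequent first. As a cross-check that does not involve Hadoop, the hypothetical helper below builds the same kind of count<TAB>match table with standard Unix tools. It is a sketch, assuming a grep that supports -o (GNU and BSD grep both do) and matches without tab characters; ties are ordered alphabetically here, which may differ from Hadoop's ordering.

```shell
#!/bin/sh
# Hypothetical helper: local_grep_count DIR REGEX
# Counts every substring of the files in DIR that matches REGEX and prints
# "count<TAB>match" lines, highest count first (ties sorted alphabetically).
local_grep_count() {
  dir=$1
  regex=$2
  tab=$(printf '\t')
  # -o: one match per line; -h: no file names; -E: extended regex syntax
  grep -ohE "$regex" "$dir"/* |
    awk '{ c[$0]++ } END { for (m in c) printf "%s\t%s\n", c[m], m }' |
    sort -t "$tab" -k1,1nr -k2,2
}
```

For example, from inside testhadoop, local_grep_count input 'dfs[a-z.]+' should list essentially the same matches and counts as cat output/*.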
10. Now transform this into a pseudo-distributed mode setup, starting without YARN.
a. Configure core-site.xml and hdfs-site.xml 
vi ./hadoop/etc/hadoop/core-site.xml 
## adding these lines to the file ## 
vi ./hadoop/etc/hadoop/hdfs-site.xml 
## adding these lines to the file ## 
Replace the IP address with your machine's own IP address.
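For reference, here is a minimal sketch of what these two files typically contain in pseudo-distributed mode, following the Apache Hadoop 2.7 single-node setup guide rather than necessarily the exact lines used here. It assumes the configuration directory is $HADOOP_CONF_DIR if set, else ~/hadoop/etc/hadoop, and it overwrites both files. If you use your machine's IP address as described above, put it in place of localhost.

```shell
#!/bin/sh
# Sketch: write minimal pseudo-distributed HDFS settings.
# Assumptions: config dir is $HADOOP_CONF_DIR or ~/hadoop/etc/hadoop;
# values follow the Apache single-node guide. Overwrites existing files.
set -eu

CONF_DIR="${HADOOP_CONF_DIR:-$HOME/hadoop/etc/hadoop}"
mkdir -p "$CONF_DIR"

# fs.defaultFS tells clients where the NameNode listens.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# A single-node cluster can only hold one replica of each block.
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

echo "Wrote core-site.xml and hdfs-site.xml to $CONF_DIR"
```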
11. Format the namenode 
hdfs namenode -format 
12. Start/stop the Hadoop cluster
13. Create a user directory on HDFS
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/alex
Copy some input files into HDFS:
$ hdfs dfs -put ~/hadoop/etc/hadoop input
14. Now run a Hadoop job:
$ hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
Check the output  
$hdfs dfs -cat output/* 
15. Since everything is working so far, we are going to extend our pseudo-distributed mode with a
YARN setup.
a. Configure mapred-site.xml and yarn-site.xml 
$nano ./hadoop/etc/hadoop/mapred-site.xml 
Add the following lines 
$nano ./hadoop/etc/hadoop/yarn-site.xml
Add the following lines
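As a sketch of the usual minimal YARN settings (again following the Apache Hadoop 2.7 single-node guide, not necessarily the exact lines used here), mapred-site.xml switches MapReduce onto YARN and yarn-site.xml enables the shuffle service that reducers rely on. It assumes the same configuration directory as before and overwrites both files.

```shell
#!/bin/sh
# Sketch: minimal settings to run MapReduce jobs on YARN.
# Assumptions: config dir is $HADOOP_CONF_DIR or ~/hadoop/etc/hadoop;
# values follow the Apache single-node guide. Overwrites existing files.
set -eu

CONF_DIR="${HADOOP_CONF_DIR:-$HOME/hadoop/etc/hadoop}"
mkdir -p "$CONF_DIR"

# Submit MapReduce jobs to YARN instead of running them locally.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

# Run the MapReduce shuffle as an auxiliary service on each NodeManager.
cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

echo "Wrote mapred-site.xml and yarn-site.xml to $CONF_DIR"
```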
16. Start the YARN cluster
Go to http://localhost:8088 (the YARN ResourceManager web UI) to make sure it is working.
17. Let’s test. 
$ cd testhadoop
$ hdfs dfs -rm -r output
$ hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
$hdfs dfs -cat output/* 
The output will look the same as the previous run.
18. Time to run the word count.  
a. Let's get a file from Project Gutenberg: a copy of Huckleberry Finn.
b. Use wget to get it. 
c. Create a directory for our word count and an input directory inside it:
$mkdir wordcount && cd wordcount 
$mkdir input  
d. Move the downloaded text file into the input directory.
e. Navigate back to the wordcount directory 
$cd wordcount 
f. Remove the existing output directory from HDFS:
$ hdfs dfs -rm -r /user/alex/output
g. Now remove the old input directory from HDFS and copy over the new one:
$ hdfs dfs -rm -r /user/alex/input 
$ hdfs dfs -put input /user/alex/input 
$ hdfs dfs -ls /user/alex/input (to check that it is there)
h. Finally it is time to run the wordcount program.  
$ hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output
i. Check the output 
$ hdfs dfs -cat output/* 
j. Copy the output to the “local” machine.
$ hdfs dfs -get /user/alex/output/ . 
$ ls (to verify) 
$ ls output (to verify) 
k. Open it up in your favorite editor. Have fun looking through the results.  
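The WordCount example splits each line on whitespace, counts every token, and writes one word<TAB>count line per distinct word, sorted by word. To spot-check the results, the hypothetical helper below computes the same table locally. It is a sketch, assuming the job ran with the default single reducer and that byte-order sorting (LC_ALL=C) matches Hadoop's key order. As in the Hadoop job, punctuation stays attached to words, so "Huck," and "Huck" are counted separately.

```shell
#!/bin/sh
# Hypothetical helper: local_wordcount FILE...
# Prints "word<TAB>count" for every whitespace-separated token, sorted
# bytewise by word.
local_wordcount() {
  # Turn carriage returns and form feeds into spaces so CRLF line endings
  # (common in Project Gutenberg texts) do not stick to the last word.
  cat "$@" |
    tr '\r\f' '  ' |
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) printf "%s\t%d\n", w, count[w] }' |
    LC_ALL=C sort
}
```

For example, from the wordcount directory: local_wordcount input/* > local.txt, then diff local.txt output/part-r-00000 (the usual name of a single reducer's output file).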
Guide was taken from: