Alex Chengelis 
2632220 
CIS-612 
LAB 4_1 
1. Create a new virtual machine with Ubuntu. I am using VMware Player to do this. 
 
Just keep filling in the info you want and let it install.  
 
 
 
2. Download the appropriate Java and Hadoop files. 
 
I am using 2.7.3 since it is the latest stable release of Hadoop. You can either use the website to 
download it or use curl. (The closer.cgi link on the Hadoop site is a mirror-picker page, so point 
curl at the Apache archive directly.) 
 
curl -O https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz 
 
For Java, go to this page: 
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html 
 
and download the Linux tar.gz. 
 
 
Place both the Hadoop and Java tarballs in ~/Downloads. 
 
 
 
 
3. Configure the SSH server. 
sudo apt-get update 
sudo apt-get install openssh-server 
 
 
 
4. Configure password-less SSH login. 
cd 
ssh-keygen -t rsa -P "" 
cat ./.ssh/id_rsa.pub >> ./.ssh/authorized_keys 
chmod 600 ~/.ssh/authorized_keys 
 
Then restart the SSH service: 
sudo service ssh restart 
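 
To confirm that key-based login works, you should be able to reach the local machine without a 
password prompt (accept the host key the first time): 
ssh localhost 
exit 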
 
 
 
 
 
5. Standalone Mode Setup (you start with this and add more and more functionality). Start by 
extracting the downloaded files. 
 
cd Downloads 
tar xzvf hadoop-2.7.3.tar.gz 
 
After running the tar command the terminal quickly fills with the names of the extracted files (the 
v flag lists each one). 
 
 
Verify that Hadoop has been extracted 
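One quick way to check, assuming the default extraction path: 
ls ~/Downloads/hadoop-2.7.3 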
 
 
 
6. Create soft links (lowercase hadoop, so the link matches the /home/alex/hadoop path used later) 
cd 
ln -s ./Downloads/hadoop-2.7.3/ ./hadoop 
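 
The JAVA_HOME path in step 8 points at /home/alex/jdk, so the JDK tarball needs the same 
treatment. A minimal sketch, assuming the Oracle archive unpacks to a versioned directory (the 
exact names here are hypothetical; match them to your actual download): 
 
cd ~/Downloads 
tar xzvf jdk-8u101-linux-x64.tar.gz   # hypothetical file name; use your download 
cd 
ln -s ./Downloads/jdk1.8.0_101/ ./jdk   # directory name comes from the tar output 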
 
 
 
7. Configure .bashrc 
cd 
vi ./.bashrc 
export HADOOP_HOME=/home/alex/hadoop 
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin 
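 
To pick up the new variables in the current shell without restarting the terminal: 
source ~/.bashrc 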
 
 
 
8. Configure Hadoop's hadoop-env.sh file 
cd 
vi ./hadoop/etc/hadoop/hadoop-env.sh 
export JAVA_HOME=/home/alex/jdk 
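 
A quick sanity check that the JAVA_HOME path resolves (assuming the jdk link from step 6): 
~/jdk/bin/java -version 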
 
 
9. Run a Hadoop job on a standalone cluster. First exit and restart the terminal so the .bashrc 
changes take effect. Then type the hadoop command; getting its usage text back is a sign that our 
installation is good so far. 
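For example, this should print the installed version rather than "command not found": 
hadoop version 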
Run a Hadoop job 
 Create a testhadoop directory 
 Create input directory inside testhadoop 
 Create some input files (the .xml files) 
 Run MapReduce example job 
 View the output directory using cat command 
cd 
mkdir testhadoop 
cd testhadoop 
mkdir input 
cp ~/hadoop/etc/hadoop/*.xml input 
hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+' 
cat output/* 
 
 
You’ll see some output in the terminal 
 Finally check the output 
 
This is working.  
10. Now transform this into Pseudo-Distributed Mode without a YARN setup (to start). 
a. Configure core-site.xml and hdfs-site.xml 
cd 
vi ./hadoop/etc/hadoop/core-site.xml 
## adding these lines to the file ## 
 
<configuration> 
  <property> 
    <name>fs.defaultFS</name> 
    <value>hdfs://10.1.37.12:9000</value> 
  </property> 
</configuration> 
 
vi ./hadoop/etc/hadoop/hdfs-site.xml 
## adding these lines to the file ## 
 
<configuration> 
  <property> 
    <name>dfs.replication</name> 
    <value>1</value> 
  </property> 
</configuration> 
 
Replace the IP with your own machine's address (ifconfig or ip addr will show it). 
 
 
11. Format the namenode 
hdfs namenode -format 
  
12. Start/Stop Hadoop cluster 
$ start-dfs.sh 
(use stop-dfs.sh to stop it) 
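 
With the daemons up, jps (shipped with the JDK) should list NameNode, DataNode, and 
SecondaryNameNode: 
$ jps 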
 
 
 
13. Create a user on the HDFS system 
$ hdfs dfs -mkdir /user 
$ hdfs dfs -mkdir /user/alex 
 
 
 
Put some input data into HDFS (the relative path input resolves to /user/alex/input): 
 
$ hdfs dfs -put ~/hadoop/etc/hadoop input 
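 
To confirm the upload: 
$ hdfs dfs -ls input 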
 
14. Run a Hadoop job now 
$ hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+' 
 
Check the output 
$ hdfs dfs -cat output/* 
 
15. Since everything is working so far, we are going to extend our Pseudo-Distributed Mode with a 
YARN setup. 
a. Configure mapred-site.xml and yarn-site.xml 
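In Hadoop 2.7.x only a template for mapred-site.xml ships by default, so copy it into place first 
(assuming the layout used throughout this guide): 
$ cp ~/hadoop/etc/hadoop/mapred-site.xml.template ~/hadoop/etc/hadoop/mapred-site.xml 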
$cd 
$nano ./hadoop/etc/hadoop/mapred-site.xml 
 
Add the following lines 
 
<configuration> 
  <property> 
    <name>mapreduce.framework.name</name> 
    <value>yarn</value> 
  </property> 
</configuration> 
 
  $nano ./hadoop/etc/hadoop/yarn-site.xml 
  Add the following lines 
 
<configuration> 
  <property> 
    <name>yarn.nodemanager.aux-services</name> 
    <value>mapreduce_shuffle</value> 
  </property> 
</configuration> 
16. Start YARN cluster 
$start-yarn.sh 
 
 
 
 Go to http://localhost:8088 (the YARN ResourceManager web UI) to make sure it is working 
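 
jps should now additionally show ResourceManager and NodeManager: 
$ jps 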
 
17. Let's test. 
$ cd 
$ cd testhadoop 
 
Remove the old job output in HDFS first (MapReduce refuses to write into an existing output 
directory): 
$ hdfs dfs -rm -r output 
 
$ hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+' 
 
 
$hdfs dfs -cat output/* 
 
The output will look the same as the previous run. 
 
 
18. Time to run the word count. 
a. Let's get a file from Project Gutenberg: http://www.gutenberg.org/files/76/76-0.txt 
(it's a copy of Huckleberry Finn). 
b. Use wget to get it, as shown below. 
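 
Assuming it is run from the home directory (the file saves as 76-0.txt): 
$ wget http://www.gutenberg.org/files/76/76-0.txt 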
 
c. Create a directory for our wordcount, and the input directory 
$mkdir wordcount && cd wordcount 
$mkdir input  
 
d. Move our test file into the input directory 
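 
For example, if the download landed in the home directory: 
$ mv ~/76-0.txt input/ 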
 
e. Navigate back to the wordcount directory (if you have moved elsewhere) 
$ cd ~/wordcount 
f. Remove the output directory currently in HDFS 
$ hdfs dfs -rm -r /user/alex/output 
 
 
 
g. Now remove and copy over our current input directory. 
 
$ hdfs dfs -rm -r /user/alex/input 
$ hdfs dfs -put input /user/alex/input 
$ hdfs dfs -ls /user/alex/input (just to check that it is there) 
 
 
h. Finally it is time to run the wordcount program. 
$ hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output 
 
i. Check the output 
$ hdfs dfs -cat output/* 
 
 
j. Copy over the output to the “local” machine.  
$ hdfs dfs -get /user/alex/output/ . 
$ ls (to verify) 
$ ls output (to verify) 
 
k. Open it up in your favorite editor. Have fun looking through the results.  
  
Guide was taken from: 
https://medium.com/@luck/installing-hadoop-2-7-2-on-ubuntu-16-04-3a34837ad2db