Yassmeen Abu Hasson - 2555788
Extra Lab 2
Part #1: Install Hadoop
Step 1: Install Homebrew
$ ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go/install)"
Installation of the command line tools is done automatically.
Step 2: Install Hadoop
Step 3: Configure Hadoop
Note: The installed version is Hadoop 1.2.1.
Add the following line to conf/hadoop-env.sh:
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
Add the following lines to conf/core-site.xml inside the configuration tags:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
Add the following lines to conf/hdfs-site.xml inside the configuration tags:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
Add the following lines to conf/mapred-site.xml inside the configuration tags:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
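The three configuration files above all use the same name/value property layout. As a rough illustration of how such a file is structured, the sketch below parses that layout with the JDK's built-in XML parser and prints each setting. The class name SiteConfigReader is hypothetical, and the three properties are combined into one string only to keep the example short; in the actual setup they live in three separate files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop: reads name/value pairs from
// a <configuration> document laid out like the files above.
class SiteConfigReader {

    static Map<String, String> read(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the three settings from this lab, merged into one document for brevity.
        String xml = "<configuration>"
                + "<property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>"
                + "<property><name>dfs.replication</name><value>1</value></property>"
                + "<property><name>mapred.job.tracker</name><value>localhost:9001</value></property>"
                + "</configuration>";
        read(xml).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A check like this can help spot a property that was pasted outside the configuration tags, since it would then not parse as expected.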
Step 4: Enable SSH to localhost
Go to System Preferences > Sharing. Make sure “Remote Login” is checked.
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Step 5: Format Hadoop filesystem
$ bin/hadoop namenode -format
Step 6: Start Hadoop
$ bin/start-all.sh
Make sure that all Hadoop processes are running:
$ jps
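The jps check can also be done programmatically. The sketch below takes the text printed by jps (one "pid name" pair per line) and reports which of the five daemons started by start-all.sh in Hadoop 1.x are missing. The class name JpsCheck and the process IDs in the sample are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: compares jps output against the expected Hadoop 1.x daemons.
class JpsCheck {

    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "DataNode", "SecondaryNameNode", "JobTracker", "TaskTracker");

    static List<String> missingDaemons(String jpsOutput) {
        // Collect the second column (the process name) of every line.
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]);
            }
        }
        List<String> missing = new ArrayList<>();
        for (String daemon : EXPECTED) {
            if (!running.contains(daemon)) {
                missing.add(daemon);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample output with invented PIDs; TaskTracker is deliberately left out.
        String sample = "1201 NameNode\n1290 DataNode\n1385 SecondaryNameNode\n1450 JobTracker\n1602 Jps\n";
        System.out.println("Missing daemons: " + missingDaemons(sample));
        // prints: Missing daemons: [TaskTracker]
    }
}
```

An empty list means all five daemons appear in the jps output.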
Run a Hadoop example:
$ bin/hadoop jar /usr/local/Cellar/hadoop/1.2.1/libexec/hadoop-examples-1.2.1.jar pi 10 100
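In the pi example, the two arguments are the number of map tasks (10) and the number of samples per map (100). To give a feel for what the job computes, here is a minimal single-machine sketch that estimates pi by sampling points in the unit square and counting how many fall inside the quarter circle. This is an assumption-laden illustration: the bundled Hadoop example is documented as using a quasi-Monte Carlo method, whereas this sketch uses plain pseudo-random sampling, and the class name PiSketch is hypothetical.

```java
import java.util.Random;

// Hypothetical stand-in for the idea behind the "pi" example job;
// uses pseudo-random points rather than the example's own sampling scheme.
class PiSketch {

    static double estimate(int maps, long samplesPerMap, long seed) {
        Random rng = new Random(seed);
        long total = (long) maps * samplesPerMap;
        long inside = 0;
        for (long i = 0; i < total; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            // Point lies inside the quarter circle of radius 1.
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        // Quarter-circle area is pi/4, so scale the hit ratio by 4.
        return 4.0 * inside / total;
    }

    public static void main(String[] args) {
        // Same sizes as "pi 10 100": 10 maps x 100 samples = 1000 points.
        System.out.println("Estimated pi: " + estimate(10, 100, 42L));
    }
}
```

With only 1000 points the estimate is coarse; raising either argument should bring it closer to 3.14159.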
Step 7: Verify that Hadoop started properly
The output of the following command must be 6:
$ ps ax | grep hadoop | wc -l
Part #2: Install and Run Eclipse
Step 1: Create a Java project (WordCount)
Step 2: Configure the project
- Select the project WordCount in the Package Explorer.
- Select File > Properties.
- Select Java Build Path.
- Select Libraries.
- Press Add External JARs and select the following file:
Step 3: Add a Java class to the project (WordCount.java)
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in an input line.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WordCount");
        job.setJarByClass(WordCount.class);

        // Types of the final (word, count) output pairs.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0] is the input folder, args[1] the output folder.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
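To make the data flow of WordCount easier to follow, the sketch below mimics its three stages (map, group by key, reduce) in plain Java with no Hadoop dependency. It is only an approximation of what the framework does: the class LocalWordCount and its method names are hypothetical, and the grouping step stands in for Hadoop's shuffle and sort.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the WordCount job's stages.
class LocalWordCount {

    // "Map" stage: emit a (word, 1) pair for every token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // "Reduce" stage: sum all counts collected for one word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Stand-in for shuffle/sort: group values by key, keys kept in sorted order.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("to be or not", "to be")));
        // prints: {be=2, not=1, or=1, to=2}
    }
}
```

As in the real job, tokens are split on whitespace only, so "Word" and "word," would be counted as different keys.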
Step 4: Configure the application
* Select the project WordCount in the Package Explorer.
* Select Run > Run Configurations…
* Select Java Application.
* Press the icon for New launch configuration.
* Enter wordcount as the name.
* Enter WordCount as the main class.
* Select Arguments.
* Add the following line to Program arguments:
input output
* Add the following line to VM arguments:
-Djava.security.krb5.realm= -Djava.security.krb5.kdc=
* Press Apply.
* Press Close.
Step 5: Create input files
$ cd ~/Documents/workspace/WordCount
$ mkdir input
$ curl http://www.gutenberg.org/cache/epub/1342/pg1342.txt > input/pg1342.txt
$ curl http://www.gutenberg.org/cache/epub/4300/pg4300.txt > input/pg4300.txt
$ curl http://www.gutenberg.org/cache/epub/5000/pg5000.txt > input/pg5000.txt
$ curl http://www.gutenberg.org/cache/epub/20417/pg20417.txt > input/pg20417.txt
Step 6: Run the application
This will create the output files _SUCCESS and part-r-00000 in a folder named output.
Examine the output:
$ cat output/part-r-00000
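The output file is long, so it can help to look at only the most frequent words. TextOutputFormat writes one key and value per line separated by a tab, so each line of part-r-00000 has the form word, tab, count. The sketch below, under that assumption, sorts such lines by count and returns the top entries. The class name TopWords and the sample lines are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: ranks "word<TAB>count" lines by descending count.
class TopWords {

    static List<String> top(List<String> lines, int n) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip lines that do not match the expected layout
            }
            rows.add(new String[] { line.substring(0, tab), line.substring(tab + 1).trim() });
        }
        // Highest count first; ties keep their original order.
        rows.sort((a, b) -> Integer.compare(Integer.parseInt(b[1]), Integer.parseInt(a[1])));

        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, rows.size()); i++) {
            result.add(rows.get(i)[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the same layout as part-r-00000.
        List<String> sample = Arrays.asList("the\t3", "and\t5", "a\t1");
        System.out.println(top(sample, 2));
        // prints: [and, the]
    }
}
```

To run it against the real results, the lines could be loaded with java.nio.file.Files.readAllLines on output/part-r-00000 and passed to top.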
- Create my own file and run it with the WordCount program:
* First, I removed the output folder before rerunning the application:
$ rm -rf output
* This is the output I got when I ran my own file in Eclipse.