Hadoop Example Program in Java

This tutorial mirrors the Pythonic example of multifetch, but accomplishes the same task using the Hadoop Java API.

Prereqs

Same as for the Pythonic example.

What You Will Create

Again, the same as in the Pythonic example, except in Java.

Let's Get Right to the Code

View the source code for MultiFetch.java.

Notes

Notice the package declaration; you must either change it or put MultiFetch.java in edu/brandeis/cs147a/examples.

The class MultiFetch contains two nested classes, Map and Reduce. Nesting the classes this way improves organization but is not necessary; arbitrary classes can be set as the mapper and reducer (and combiner) tasks when you provide the job configuration.

This class ignores URLs that are malformed or cannot be fetched for any reason. Consider the try/catch block in the map method of class Map. Each of the errors that can occur (a malformed URL, an unmatched title, or a page that cannot be fetched) results in a warning printed on stderr, and processing then continues.

Output on stderr is found in logs/userlogs/[map_task_id]. This directory can be hard to search: look up the task id in the web interface, then search each of the mapper invocation ids to find the stderr output, something like this:

    cat task_${TASKID}_m_*/stderr

where ${TASKID} is the id of the task, found in the MapReduce console output or the web interface.

Like the Python example, the reducer essentially does nothing; it is just an identity function that outputs all the input tuples. Technically, you can achieve the same behavior by setting the number of reduce tasks to 0, but we wanted you to have an example of setting up the reduce-task scaffolding.

Unlike the Python example (which uses Hadoop Streaming and treats all pairs as lines of plain text), a Hadoop program in Java needs types for the keys and values of each pair. Keys and values must be of a type that implements Writable (i.e., one of the implementing classes listed here). This is because keys and values must be serialized in the particular way that Hadoop understands.

Rough sketches of the mapper/reducer structure and of the job configuration appear at the end of this page.

How to Build

    mkdir multifetch_classes
    javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar \
        -d multifetch_classes MultiFetch.java
    jar -cvf $HOME/proj/hadoop/MultiFetch.jar -C multifetch_classes/ .

How to Initialize Data

Put the input URLs into the DFS in the same way described in the Pythonic example.

How to Run

    bin/hadoop jar $HOME/proj/hadoop/MultiFetch.jar \
        edu.brandeis.cs147a.examples.MultiFetch \
        urls/* \
        titles

What's Next?

Set up a real Hadoop cluster, or go back to the Python version of the example.
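Appendix: Code Sketches

The linked MultiFetch.java is the authoritative source. As an illustration of the structure described in the notes above (nested Map and Reduce classes, a try/catch that skips bad URLs, and Writable key/value types), here is a minimal sketch against the old org.apache.hadoop.mapred API that shipped with Hadoop at the time. The class name MultiFetchSketch and the fetchTitle helper are inventions for this sketch, not the course's actual code.

    package edu.brandeis.cs147a.examples;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.Iterator;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class MultiFetchSketch {

        // Mapper: input value is one URL per line; output is (url, page title).
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, Text> {

            private static final Pattern TITLE = Pattern.compile(
                    "<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

            public void map(LongWritable key, Text value,
                            OutputCollector<Text, Text> output, Reporter reporter)
                    throws IOException {
                String spec = value.toString().trim();
                String title;
                try {
                    title = fetchTitle(new URL(spec));
                } catch (MalformedURLException e) {
                    // Warnings printed here end up in logs/userlogs/[map_task_id]/stderr.
                    System.err.println("WARN: malformed url: " + spec);
                    return;
                } catch (IOException e) {
                    System.err.println("WARN: could not fetch or parse: " + spec);
                    return;
                }
                output.collect(new Text(spec), new Text(title));
            }

            // Hypothetical helper: download the page and extract its <title>.
            private String fetchTitle(URL url) throws IOException {
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(url.openStream()));
                StringBuilder page = new StringBuilder();
                String line;
                while ((line = in.readLine()) != null) {
                    page.append(line).append('\n');
                }
                in.close();
                Matcher m = TITLE.matcher(page);
                if (!m.find()) {
                    throw new IOException("no <title> element");  // the "unmatched title" case
                }
                return m.group(1).trim();
            }
        }

        // Reducer: a pure identity function, kept only to show the scaffolding.
        public static class Reduce extends MapReduceBase
                implements Reducer<Text, Text, Text, Text> {

            public void reduce(Text key, Iterator<Text> values,
                               OutputCollector<Text, Text> output, Reporter reporter)
                    throws IOException {
                while (values.hasNext()) {
                    output.collect(key, values.next());  // pass each pair through unchanged
                }
            }
        }
    }

Note that MalformedURLException is a subclass of IOException, so its catch clause must come first; the second catch then covers both fetch failures and the missing-title case.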
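The notes above also say that the mapper, reducer, and combiner classes are wired up when you provide the job configuration. A minimal driver sketch, again using the old JobConf API and assuming the real MultiFetch.Map and MultiFetch.Reduce classes are available in the same package, might look like this (the class name MultiFetchDriver is hypothetical):

    package edu.brandeis.cs147a.examples;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MultiFetchDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MultiFetchDriver.class);
            conf.setJobName("multifetch");

            // Any classes can be plugged in here; nesting Map and Reduce inside
            // MultiFetch is a convenience, not a requirement.
            conf.setMapperClass(MultiFetch.Map.class);
            conf.setReducerClass(MultiFetch.Reduce.class);
            // A combiner would be set the same way, via conf.setCombinerClass.
            // conf.setNumReduceTasks(0);  // alternative: skip the reduce phase entirely

            // Output keys and values must be Writable types so Hadoop can serialize them.
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }

With a driver like this, the two command-line arguments correspond to the urls/* input and the titles output directory used in the How to Run section above.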