Data analysis and comparison of the execution times of a MapReduce program that computes word counts for sample input text files of different sizes on Hadoop. The Mapper and Reducer classes are written in Java, and the job is run in local cluster mode on Hadoop.
- Hadoop local mode
- JDK 8
- apache-hadoop-wiki.txt: 46.5 kB
- big.txt: 6.5 MB
- Start the HDFS and YARN daemons:
  start-dfs.sh
  start-yarn.sh
- Open a Java project in Eclipse.
- In the Java project, add the Hadoop libraries to the build path (Build Path > Add External Archives). From the Hadoop installation folder, add:
  /common/*.jar, /common/lib/*.jar, /hdfs/*.jar, /mapreduce/*.jar, /yarn/*.jar
- Write the Mapper, Reducer, and Driver classes (a sketch of these classes is given after this list).
- Export the project as a .jar file.
- Prepare a sample input file (source: a text data set).
- Copy the input file from the local file system to HDFS:
  hdfs dfs -copyFromLocal <sourceFile> <destinationPath>
- Submit the job to the cluster (see the example run after this list):
  hadoop jar <source.jar path> <MainDriverClass> <source file path in HDFS> <destination folder path in HDFS>
- Read the output file to view the results:
  hdfs dfs -cat /DestinationFolder/*
- Stop the HDFS and YARN daemons:
  stop-dfs.sh
  stop-yarn.sh
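The exact Mapper, Reducer, and Driver sources are not reproduced here; the following is a minimal sketch of what they might look like, following the standard Hadoop MapReduce word-count pattern on the org.apache.hadoop.mapreduce API. The class names WordCountDriver, WordCountMapper, and WordCountReducer are illustrative, not necessarily the names used in this project.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    // Mapper: emits (word, 1) for every token in an input line.
    public static class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums all counts emitted for each word.
    public static class WordCountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : values) {
                sum += count.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: configures and submits the job.
    // args[0] = input path in HDFS, args[1] = output folder in HDFS.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Reusing the reducer as a combiner is safe for word count because summation is associative and commutative, and it cuts down the intermediate data shuffled between map and reduce tasks.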
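As a usage example, assuming the exported jar is named wordcount.jar, the driver class is WordCountDriver as in the sketch above, and /input and /output/big are the chosen HDFS paths (the jar name and HDFS paths are illustrative), a run on big.txt could look like:

hdfs dfs -mkdir -p /input
hdfs dfs -copyFromLocal big.txt /input
hadoop jar wordcount.jar WordCountDriver /input/big.txt /output/big
hdfs dfs -cat /output/big/*

Note that the output folder must not already exist in HDFS; MapReduce fails the job rather than overwrite an existing output directory.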
The average execution times for the word count program on Hadoop are:
- apache-hadoop-wiki.txt: 3 seconds
- big.txt: 12 seconds