Spark Word Count Data Analysis

Apache Spark

Overview

A comparison of the execution time taken to compute word counts for input text files of different sizes, run in Scala on spark-shell in local cluster mode.

File Sizes

  • apache-hadoop-wiki.txt: 46.5 kB
  • big.txt: 6.5 MB

Prerequisites

  • Linux System
  • Hadoop
  • Spark 2.0 set up in Local cluster mode

Execution

In a terminal, run the following command:

spark-shell -i "SparkWordCount.scala"
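
The contents of SparkWordCount.scala are not reproduced here. As a rough illustration of what a script loaded with `spark-shell -i` might look like, the sketch below counts words with the RDD API and prints the wall-clock time per file. The function names `wordCounts` and `timedWordCount`, the whitespace-based tokenization, and the assumption that the input files sit in the working directory are all hypothetical; `sc` is the SparkContext that spark-shell defines on startup.

```scala
// Sketch only: meant to be loaded with `spark-shell -i`, where `sc` (the SparkContext)
// is already defined by the shell. Function names here are hypothetical.
import org.apache.spark.rdd.RDD

// Build the (word, count) pairs for one text file, splitting on whitespace.
def wordCounts(path: String): RDD[(String, Int)] = {
  sc.textFile(path)
    .flatMap(line => line.split("\\s+"))
    .filter(word => word.nonEmpty)
    .map(word => (word, 1))
    .reduceByKey(_ + _)
}

// Run the job and return the wall-clock seconds it took.
// count() is the action that actually triggers execution.
def timedWordCount(path: String): Double = {
  val start = System.nanoTime()
  val distinctWords = wordCounts(path).count()
  val seconds = (System.nanoTime() - start) / 1e9
  println(f"$path: $distinctWords distinct words in $seconds%.2f s")
  seconds
}

// Assumes the input files are in the directory spark-shell was started from;
// files that are not present are skipped.
val inputs = Seq("apache-hadoop-wiki.txt", "big.txt").filter(p => new java.io.File(p).exists)
inputs.foreach(path => timedWordCount(path))
```

Because RDD transformations are lazy, the timer wraps the `count()` action rather than only the transformations; timing the `flatMap`/`reduceByKey` chain alone would measure almost nothing.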

Observations


The average execution times for the Spark jobs in Spark local mode are:

  • apache-hadoop-wiki.txt: 1 second
  • big.txt: 3 seconds
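
How the averages above were measured is not stated. One possible way to obtain such a figure is to run the same job several times and take the mean of the wall-clock times, as in the sketch below. The helper name `averageSeconds` and the run count are assumptions, and the workload shown is a stand-in that runs without Spark.

```scala
// Hypothetical helper: run a block `runs` times and return the mean wall-clock seconds.
def averageSeconds(runs: Int)(job: => Any): Double = {
  require(runs > 0, "runs must be positive")
  val timings = (1 to runs).map { _ =>
    val start = System.nanoTime()
    job // evaluated once per iteration because it is a by-name parameter
    (System.nanoTime() - start) / 1e9
  }
  timings.sum / runs
}

// Stand-in workload so the snippet runs anywhere; inside spark-shell the block
// could instead be something like sc.textFile("big.txt").count()
val avg = averageSeconds(3) { (1 to 100000).map(_ * 2).sum }
println(f"average: $avg%.3f s")
```

The first run in a fresh JVM typically includes warm-up cost, so an average over a few runs tends to be more representative than a single measurement.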

Source code

To view the source code of the word counts of:

Word Count Results

To view the results of the word counts

File Sources