Spark Word Count Data Analysis

Overview

A comparison of the execution time taken to compute word counts for input text files of different sizes, run in Scala from spark-shell with Spark set up in local cluster mode.

File Sizes

apache-hadoop-wiki.txt: 46.5 kB
big.txt: 6.5 MB

Prerequisites

Linux System
Hadoop
Spark 2.0 set up in local cluster mode

Execution

In a terminal, execute the following command:

spark-shell -i "SparkWordCount.scala"
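
As a rough illustration of what a script passed to spark-shell -i might contain, the sketch below counts the words in one input file. It is not the repository's actual script: the input path, the output directory, and the whitespace tokenization are all assumptions, and sc is the SparkContext that spark-shell creates on startup.

```scala
// Sketch only: paths and tokenization are assumptions, not taken from the repository.
// `sc` is the SparkContext provided by spark-shell.
val inputPath = "big.txt"            // assumed location of the input file
val outputPath = "word-count-output" // hypothetical output directory; must not exist yet

// Read the file as an RDD of lines.
val lines = sc.textFile(inputPath)

// Split each line on whitespace and drop empty tokens.
val words = lines.flatMap(line => line.split("\\s+")).filter(_.nonEmpty)

// Pair each word with 1, then sum the counts per word.
val counts = words.map(word => (word, 1)).reduceByKey(_ + _)

// Write the (word, count) pairs as text part files.
counts.saveAsTextFile(outputPath)
```

saveAsTextFile fails if the output directory already exists, so the directory has to be removed or renamed between runs.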

Observations

The average execution times for the Spark jobs in Spark local mode are:

apache-hadoop-wiki.txt: 1 second
big.txt: 3 seconds
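
The README does not say how these averages were measured. One possible approach, sketched below, is to time a block of code with System.nanoTime and average over several runs. The helper names timed and averageSeconds are hypothetical and exist only for this illustration.

```scala
// Hypothetical helper: runs `block` once and returns its result with the elapsed seconds.
def timed[T](block: => T): (T, Double) = {
  val start = System.nanoTime()
  val result = block
  val elapsedSeconds = (System.nanoTime() - start) / 1e9
  (result, elapsedSeconds)
}

// Hypothetical helper: runs `block` `runs` times and returns the mean elapsed seconds.
def averageSeconds(runs: Int)(block: => Unit): Double = {
  require(runs > 0, "runs must be positive")
  val samples = (1 to runs).map(_ => timed(block)._2)
  samples.sum / runs
}
```

In spark-shell, the block passed to averageSeconds needs to end in an action such as count() or saveAsTextFile, because Spark transformations are lazy and do no work until an action runs. The first run typically includes start-up overhead, so an average over a few runs is usually more representative than a single measurement.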

Source code

To view the source code of the word counts, see smallWordCount.scala and bigWordCount.scala in this repository.

Word Count Results

To view the results of the word counts, see the apache_wiki-result and big-result directories in this repository.
