
Aim

To determine which programming languages and execution engines take the most and the least time to compute the word count of text files.

Methodology

This 📗 project analyzes 📊 and compares the execution times ⌚ taken to compute the word count of input text files, ranging from very small to very large, across several programming languages and execution engines. It includes sample word count programs along with findings, observations, and comparisons. We measure the time taken to process each file individually and gather the results. The findings from the individual analyses are then combined in a Google Colab notebook, where we plot graphs with matplotlib and draw conclusions.
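The measurement step described above can be sketched in Python as follows. This is a minimal illustration, not the project's actual benchmark code: the function names `word_count` and `time_word_count`, the whitespace-based tokenization, and the choice to report the best of several runs are all assumptions made for this example.

```python
import time
from collections import Counter
from pathlib import Path


def word_count(path):
    """Count whitespace-separated words in a text file (assumed tokenization)."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="ignore") as handle:
        for line in handle:
            counts.update(line.split())
    return counts


def time_word_count(path, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        word_count(path)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # File names taken from the File Sizes table; skipped if not present locally.
    for name in ["apache-hadoop-wiki.txt", "big.txt"]:
        if Path(name).exists():
            print(f"{name}: {time_word_count(name):.4f} s")
```

Taking the best of several runs reduces noise from disk caching and other processes; a single run, as the methodology may have used, would simply be `repeats=1`.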

File Sizes

| File Name | Size |
| --- | --- |
| apache-hadoop-wiki.txt | 46.5 kB |
| big.txt | 6.5 MB |

File Sources

Programming Languages

Results computed for each language. Click an image to open the corresponding data analysis results.


Python Java Scala

Execution engines

Results computed for each execution engine. Click an image to open the corresponding data analysis results.


Hadoop Spark

Visualizing Results

Comparing Programming Languages

Languages findings

Comparing Execution Engines

Execution engines findings

Conclusions

The graphs show that, among the programming languages, Python has the shortest execution time for both small and large files, while Scala has the longest.

Among the execution engines, Spark has the shortest execution time, while Hadoop has the longest.
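The comparison behind these conclusions amounts to picking the smallest and largest measured times. A minimal sketch, assuming the timings are held in a dictionary keyed by label; the function name `fastest_and_slowest` and the sample values are hypothetical placeholders, not measurements from this project:

```python
def fastest_and_slowest(timings):
    """Return (fastest, slowest) labels from a {label: seconds} mapping."""
    if not timings:
        raise ValueError("timings must not be empty")
    fastest = min(timings, key=timings.get)
    slowest = max(timings, key=timings.get)
    return fastest, slowest


# Placeholder values for illustration only; not results from this project.
example = {"tool_a": 0.8, "tool_b": 2.5, "tool_c": 1.2}
print(fastest_and_slowest(example))  # ('tool_a', 'tool_b')
```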

Notebook

The complete analysis, with graphs, is available in the Google Colab notebook: Notebook