SparkBLAST is a parallelization of the BLAST sequence alignment application that uses cloud computing to provision computational resources and Apache Spark as the coordination framework. Here you will find the supplementary material for the article "SparkBLAST: Scalable BLAST processing using in-memory operations", submitted to the BMC Bioinformatics journal.
Contents:
bacteria - Contains the FASTA files with genomic data for all 11 bacteria used in the experiments.
com/rbh - Java source code for finding reciprocal best hits (RBH).
images - The figures of the paper.
src/main/scala - SparkBLAST source code in Scala.
HowTo - How to run SparkBLAST.
RBH.xls - Results with RBH data.
README.md - This file.
Results for Google and Azure.xlsx - Results with performance data.
simple.sbt - sbt build definition used to compile the src folder.