Big Data Machine Learning using Apache Spark MLlib
Mehdi Assefi, Ehsun Behravesh, Guangchi Liu, and Ahmad P. Tafti
Motivation
Big Data World!
Applications: healthcare informatics, genomic data analysis, text mining, stochastic modeling
Challenges: cost, time
Major Libraries
Apache Spark Streaming (enhanced situational awareness), Apache Spark SQL, Spark GraphX, Apache Spark MLlib
Apache Spark MLlib
Platform-independent, open-source libraries
Distributed architecture and automatic data parallelization
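To make "automatic data parallelization" concrete, here is a minimal sketch in plain Python, not Spark's API. It assumes a toy setting: the data is split into partitions, each worker computes partial statistics for a one-variable least-squares fit, and the partial results are merged before solving. The helper names (`split_into_partitions`, `partial_stats`, `merge_stats`, `fit_line`) are hypothetical and chosen only for illustration; a thread pool stands in for cluster workers.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(rows, num_partitions):
    """Split rows into roughly equal contiguous chunks (a stand-in for data partitions)."""
    size = max(1, -(-len(rows) // num_partitions))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partial_stats(partition):
    """Per-partition work: sufficient statistics for simple linear regression."""
    n = len(partition)
    sx = sum(x for x, _ in partition)
    sy = sum(y for _, y in partition)
    sxy = sum(x * y for x, y in partition)
    sxx = sum(x * x for x, _ in partition)
    return (n, sx, sy, sxy, sxx)


def merge_stats(a, b):
    """Combine two partial results element-wise (the 'reduce' step)."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(rows, num_partitions=4):
    """Fit y = slope * x + intercept from merged per-partition statistics."""
    partitions = split_into_partitions(rows, num_partitions)
    # Each partition is processed independently; threads emulate workers here.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_stats, partitions))
    n, sx, sy, sxy, sxx = reduce(merge_stats, partials)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Synthetic data on the line y = 2x + 1 (made up for this sketch).
data = [(float(x), 2.0 * x + 1.0) for x in range(100)]
slope, intercept = fit_line(data)
print(slope, intercept)
```

The point of the sketch is that the per-partition step never needs to see the whole dataset, so the same computation can be spread over many machines; in a framework such as Spark, the partitioning and scheduling are handled for the user rather than written by hand.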
Functions
Regression, dimension reduction, classification, clustering, rule extraction
Pathway
Experimental Evaluation
Datasets, VMware cluster environment, machine learning algorithms
Results
Conclusion
Questions?