K-means Clustering
Group 15: Swathi Gurram, Prajakta Purohit
Goal: Implement K-means on Twister (iterative MapReduce) and on Hadoop (MapReduce), and see how the change of framework affects the execution time.
Survey: Twister
- Configurable, long-running (cacheable) map/reduce tasks
- Pub/sub messaging-based communication and data transfers
- Efficient support for iterative MapReduce computation
- Combine phase to collect all reduce outputs
- Data access via local disks
Survey: Hadoop
- A software framework that supports data-intensive distributed applications
- Uses the MapReduce programming model
- Has its own filesystem, HDFS (Hadoop Distributed File System, based on the Google File System), which is specifically tailored to large files
- Manages the distribution of processing and files, breaking files down into more manageable chunks for processing
Survey: HaLoop
- A modified version of the Hadoop MapReduce framework
- Provides caching options for loop-invariant data access
- Lets users reuse major building blocks from their applications' Hadoop implementations
- Has intra-job fault-tolerance mechanisms similar to Hadoop's
- Reduces query runtimes by a factor of 1.85 compared with Hadoop
K-means Clustering
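The slide content for the algorithm itself did not survive, so the following is only a minimal sketch of the standard K-means (Lloyd's) iteration in plain Python, not the group's actual code. The function names `nearest`, `mean`, and `kmeans`, the random initialisation, and the stopping rule are assumptions made for illustration.

```python
import math
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iters=100, seed=0):
    """Assumed setup: pick k input points as initial centroids, then iterate."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # assignments are stable, so stop
            break
        centroids = new_centroids
    return centroids
```

Every iteration rescans the same input points and changes only the centroids, which is why the choice between an iterative and a non-iterative MapReduce framework matters for this algorithm.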
Twister K-means
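As a rough illustration of how K-means fits the iterative MapReduce pattern, the sketch below simulates it in plain Python. It does not use the Twister API: `kmeans_map`, `kmeans_reduce`, and `iterative_kmeans` are hypothetical names, and the in-memory `partitions` list merely stands in for data that long-running map tasks would keep cached between iterations.

```python
import math


def kmeans_map(partition, centroids):
    """Map task: per-centroid partial sums and counts for one cached partition."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in partition:
        j = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def kmeans_reduce(map_outputs, old_centroids):
    """Reduce/combine: merge the partial sums of all map tasks into new centroids."""
    k, dim = len(old_centroids), len(old_centroids[0])
    total = [[0.0] * dim for _ in range(k)]
    n = [0] * k
    for sums, counts in map_outputs:
        for j in range(k):
            n[j] += counts[j]
            for d in range(dim):
                total[j][d] += sums[j][d]
    # A centroid with no assigned points keeps its old position.
    return [tuple(t / n[j] for t in total[j]) if n[j] else old_centroids[j]
            for j in range(k)]


def iterative_kmeans(partitions, centroids, max_iters=50, tol=1e-9):
    """Driver loop: only the small centroid list changes between iterations."""
    for _ in range(max_iters):
        outputs = [kmeans_map(part, centroids) for part in partitions]
        new = kmeans_reduce(outputs, centroids)
        shift = max(math.dist(a, b) for a, b in zip(new, centroids))
        centroids = new
        if shift < tol:
            break
    return centroids
```

The point of the sketch is that the partitions are loaded once and reused, while each iteration sends out only the current centroids and collects small partial sums.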
Hadoop K-means
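With plain MapReduce, one common way to express K-means is to run one job per iteration: the mapper emits the nearest cluster id for each point and the reducer averages the points of each cluster. The sketch below simulates that flow in plain Python rather than with the Hadoop API. The names `mapper`, `reducer`, `run_job`, `hadoop_style_kmeans`, and the `read_input` callback are hypothetical, and the dictionary stands in for the shuffle phase.

```python
import math
from collections import defaultdict


def mapper(point, centroids):
    """Emit (index of nearest centroid, point) for one input record."""
    j = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    yield j, point


def reducer(cluster_id, points):
    """Average all points assigned to one cluster and emit the new centroid."""
    n = len(points)
    yield cluster_id, tuple(sum(c) / n for c in zip(*points))


def run_job(read_input, centroids):
    """One simulated job = one K-means iteration; the input is read again each time."""
    groups = defaultdict(list)  # stands in for the shuffle/sort phase
    for point in read_input():
        for key, value in mapper(point, centroids):
            groups[key].append(value)
    new_centroids = list(centroids)  # clusters with no points keep their centroid
    for key, values in groups.items():
        for cid, centroid in reducer(key, values):
            new_centroids[cid] = centroid
    return new_centroids


def hadoop_style_kmeans(read_input, centroids, max_iters=50, tol=1e-9):
    """Driver: chain jobs until the centroids stop moving."""
    for _ in range(max_iters):
        new = run_job(read_input, centroids)
        shift = max(math.dist(a, b) for a, b in zip(new, centroids))
        centroids = new
        if shift < tol:
            break
    return centroids
```

In this sketch `read_input` is called once per iteration, mirroring the fact that each job starts from the stored input again instead of reusing cached data.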
Implementation Timeline

Week                | Task                                                  | Team member
Oct 24th – Oct 31st | Understand K-means algorithm and design               | Prajakta, Swathi
Nov 1st – Nov 7th   | Implement K-means                                     |
Nov 8th – Nov 21st  | Implement K-means on Twister and performance analysis |
Nov 21st – Nov 28th | Optimized validation method for K-means algorithm     |
Nov 29th – Dec 3rd  | Implement K-means on Hadoop                           |
Dec 4th – Dec 5th   | Performance analysis and presentation                 |
Dec 6th – Dec 12th  | Final technical report                                |
Validation methods
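The slide does not say which validation method was used. As one common example only, the within-cluster sum of squared errors (SSE) could be computed as sketched below; the function name `sse` and the choice of this metric are assumptions, not the group's stated method.

```python
import math


def sse(points, centroids):
    """Sum over all points of the squared distance to the nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
```

A lower SSE for the same k indicates tighter clusters, so the value can be used to compare the centroids produced by two implementations on the same input.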
Conclusion: The Twister framework is faster than Hadoop for iterative MapReduce applications.
References
- http://salsahpc.indiana.edu
- http://www.iterativemapreduce.org/samples.html
- http://hadoop.apache.org/
- http://en.wikipedia.org/wiki/Apache_Hadoop
- http://clue.cs.washington.edu/node/14
- http://code.google.com/p/haloop/
- http://www.cs.washington.edu/homes/billhowe/pubs/HaLoop.pdf
Demo
Thank you