Published by Ira Bruno Sullivan. Modified over 9 years ago.
By Shivaraman Janakiraman, Magesh Khanna Vadivelu
Introduction
Mining frequent itemsets from large databases is an important problem in data mining.
Proposal: implement the Apriori algorithm in Hadoop MapReduce.
MapReduce is a programming model for processing large data sets.
Programs written in this functional style are automatically parallelized and executed on a large cluster of machines.
Programmers without any experience with parallel and distributed systems can easily utilize the resources of a large distributed system.
The Apriori Algorithm
Generating 1-itemset Frequent Pattern
Generating 2-itemset Frequent Pattern
Generating 3-itemset Frequent Pattern
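The three generation steps above can be sketched sequentially in Python. This is a minimal illustration, not the authors' code: the function name `apriori`, the sample transactions, and the `min_support` value of 2 are all assumptions chosen for the example. Each pass joins the frequent (k-1)-itemsets into k-item candidates, prunes any candidate that has an infrequent subset, and then counts the survivors against the transactions.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {k: {itemset: count}} for every frequent k-itemset.

    Hypothetical helper written for illustration only.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items and keep those meeting min_support.
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    frequent = {s: c for s, c in counts.items() if c >= min_support}

    result = {}
    k = 1
    while frequent:
        result[k] = frequent
        k += 1

        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: scan the transactions once per pass.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}

    return result


# Made-up sample data (not from the slides).
transactions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["B", "C"],
    ["A", "B", "C"],
]
frequent_itemsets = apriori(transactions, min_support=2)
for size, itemsets in sorted(frequent_itemsets.items()):
    print(size, {tuple(sorted(s)): c for s, c in itemsets.items()})
```

The loop stops as soon as a pass yields no frequent itemsets, so the number of database scans equals the size of the largest frequent itemset plus one.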
MapReduce
Isolated processes: Hadoop limits communication between tasks.
Individual records are processed in isolation from one another by tasks called Mappers.
The output from the Mappers is then brought together into a second set of tasks called Reducers, where results from different Mappers can be merged.
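To show how one Apriori counting pass might fit this model, here is a small pure-Python simulation of the map, shuffle, and reduce phases. It does not use the Hadoop API; the names `mapper`, `reducer`, and `run_pass`, along with the sample data, are hypothetical and assume the candidate itemsets are already available to every mapper.

```python
from collections import defaultdict


def mapper(transaction, candidates):
    """Process one record in isolation: emit (candidate, 1) for each
    candidate itemset contained in this transaction."""
    items = set(transaction)
    for candidate in candidates:
        if set(candidate) <= items:
            yield candidate, 1


def reducer(candidate, values, min_support):
    """Merge the counts emitted by all mappers for one candidate and
    keep it only if it meets the support threshold."""
    total = sum(values)
    if total >= min_support:
        yield candidate, total


def run_pass(transactions, candidates, min_support):
    """Simulate one MapReduce job in-process (illustration only)."""
    # Shuffle: group mapper output by key, as the framework would.
    grouped = defaultdict(list)
    for transaction in transactions:
        for key, value in mapper(transaction, candidates):
            grouped[key].append(value)

    # Reduce: one call per distinct key.
    frequent = {}
    for key, values in grouped.items():
        for itemset, total in reducer(key, values, min_support):
            frequent[itemset] = total
    return frequent


# Made-up sample data (not from the slides).
sample_transactions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["B", "C"],
    ["A", "B", "C"],
]
sample_candidates = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B", "C")]
pass_result = run_pass(sample_transactions, sample_candidates, min_support=3)
print(pass_result)
```

Because each mapper call touches only its own transaction, the calls could run on separate machines; only the grouped counts need to meet at the reducers.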
Implementation Timeline
Time              Task
Week 11/07/2011   Discuss the algorithm and design the coding methodology sequentially
Week 11/14/2011   Complete coding the algorithm sequentially
Week 11/21/2011   Complete coding the algorithm sequentially
Week 11/28/2011   Discuss the design and implementation in Twister
Project Review    Discuss the design with partial implementation in Twister
Week 12/05/2011   Complete the implementation in Twister
Week 12/12/2011   Do validation
Project Review    Presentation
Thank you
Questions?