By Shivaraman Janakiraman, Magesh Khanna Vadivelu.

Introduction

Mining frequent itemsets from large databases is an important problem in data mining. We propose to implement the Apriori algorithm in Hadoop MapReduce. MapReduce is a programming model for large data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of machines, so programmers without any experience with parallel and distributed systems can easily utilize the resources of a large distributed system.

The Apriori Algorithm

Generating 1-itemset Frequent Pattern
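The slide gives only this title, so the following is a minimal sequential sketch of what the first Apriori pass usually looks like, not the authors' code: count how many transactions contain each item, then keep the items whose count reaches a minimum support. The function name `frequent_1_itemsets`, the use of an absolute count for `min_support`, and the toy transactions are all assumptions made for illustration.

```python
from collections import Counter


def frequent_1_itemsets(transactions, min_support):
    """Return {item: support_count} for items found in at least
    min_support transactions (min_support is an absolute count here)."""
    counts = Counter()
    for transaction in transactions:
        # set() ensures an item repeated inside one transaction counts once
        counts.update(set(transaction))
    return {item: n for item, n in counts.items() if n >= min_support}


# Made-up toy data, for illustration only
transactions = [
    ["I1", "I2", "I5"],
    ["I2", "I4"],
    ["I2", "I3"],
    ["I1", "I2", "I4"],
]
print(frequent_1_itemsets(transactions, min_support=2))
```

With this toy input, I3 and I5 each appear in only one transaction and are pruned; the surviving items would seed candidate generation for 2-itemsets in the next pass.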

MapReduce

Hadoop runs tasks as isolated processes and limits communication between them. Each individual record is processed in isolation from the others by tasks called Mappers. The output of the Mappers is then brought together into a second set of tasks called Reducers, where results from different Mappers can be merged.
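The Mapper and Reducer roles described above can be imitated in plain Python to make the data flow concrete. This is a single-process simulation under stated assumptions, not the Hadoop API: `map_record`, `shuffle`, `reduce_counts`, and `run_job` are hypothetical names, and `shuffle` merely stands in for the grouping step that the framework performs between the two phases.

```python
from collections import defaultdict


def map_record(transaction):
    """Mapper: sees a single record and emits (item, 1) pairs."""
    for item in set(transaction):
        yield item, 1


def shuffle(pairs):
    """Group mapper output by key (a stand-in for the framework's shuffle)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(item, values):
    """Reducer: merges every value emitted for one key."""
    return item, sum(values)


def run_job(transactions):
    # Each record is mapped independently; no mapper sees another's record.
    pairs = [pair for t in transactions for pair in map_record(t)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


# Made-up records, for illustration only
records = [["bread", "milk"], ["bread", "eggs"], ["milk", "bread"]]
item_counts = run_job(records)
print(item_counts)
```

Because `map_record` needs nothing beyond its own record, the map calls could run on different machines in any order; only the reduce step needs all values for a key in one place.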

Implementation Timeline

Time             Task
Week 11/07/2011  Discuss the algorithm and design the coding methodology sequentially
Week 11/14/2011  Complete coding the algorithm sequentially
Week 11/21/2011  Complete coding the algorithm sequentially
Week 11/28/2011  Discuss the design and implementation in twister
Project Review   Discuss the design with partial implementation in twister
Week 12/05/2011  Complete the implementation in twister
Week 12/12/2011  Do validation
Project Review   Presentation

Thank you. Questions?
