Sample Projects

Hadoop-based Real-World Applications
- Do you have a large-scale data processing problem in your work or research?
- Can you formalize your data analysis problem?
- Can you use Hadoop to scale the computation?
- Make sure you can finish it in about one month.
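
One way to check whether a problem "formalizes" for Hadoop is to write it as a map function that emits (key, value) pairs and a reduce function that folds all values for one key. The sketch below simulates the map, shuffle, and reduce phases in a single process so the decomposition can be tested before touching a cluster. It is not Hadoop's API: `run_mapreduce`, `word_count_mapper`, and `word_count_reducer` are hypothetical names chosen for illustration, and word count is used only as the standard toy example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Local, single-process stand-in for a MapReduce job (hypothetical helper, not Hadoop's API)."""
    groups = defaultdict(list)
    # Map phase: each input record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect every value emitted under the same key.
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key, visited in sorted key order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_count_mapper(line):
    """Hypothetical mapper: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    """Hypothetical reducer: total occurrences of one word."""
    return sum(counts)


if __name__ == "__main__":
    lines = ["Hadoop scales the computation", "the data and the computation"]
    print(run_mapreduce(lines, word_count_mapper, word_count_reducer))
```

If a problem can be expressed with a mapper and reducer of this shape, the same two functions usually translate directly into a Hadoop job; if it needs global state shared across records, that is a sign the formalization needs more thought.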

Algorithm Design and Programming Project
- Implement a Hadoop-based frequent itemset mining algorithm:
  - Apriori
  - Eclat
  - FP-tree
- MapReduce PLSA (EM algorithm)
- Hadoop-based subgraph matching
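
As a starting point for the Apriori option, the sketch below runs Apriori level by level and writes the support-counting step in map/reduce form: the "map" emits (candidate, 1) for each candidate contained in a transaction, and the "reduce" sums the counts and keeps candidates that meet an absolute support threshold. It assumes one common parallelization strategy, namely one MapReduce pass per itemset size; it runs in a single process, and the function names (`generate_candidates`, `apriori_mapper`, `apriori`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Apriori join + prune: size-k candidates whose every (k-1)-subset is frequent."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            # Join: two (k-1)-itemsets that differ in exactly one item.
            # Prune: drop the candidate if any (k-1)-subset is infrequent.
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def apriori_mapper(transaction, candidates):
    """Hypothetical map step: emit (candidate, 1) for each candidate in the transaction."""
    for candidate in candidates:
        if candidate <= transaction:
            yield candidate, 1


def apriori(transactions, min_support):
    """Return {frozenset itemset: support count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: every distinct item is a candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Map + shuffle: group the emitted 1s by candidate itemset.
        counts = defaultdict(int)
        for t in transactions:
            for candidate, one in apriori_mapper(t, candidates):
                counts[candidate] += one
        # Reduce: keep only candidates that reach the support threshold.
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        candidates = generate_candidates(level.keys(), k)
    return frequent


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]
    result = apriori(baskets, min_support=3)
    for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

On a cluster, each iteration of the `while` loop would become one job: mappers read a split of the transactions plus the current candidate set, and reducers emit the frequent itemsets that seed the next level. The repeated full scans of the data are the main cost that Eclat and FP-tree approaches try to reduce.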

Analysis of Hadoop and Hadoop-based Systems
- Hadoop, Giraph, Mahout…
- Open-source projects:
  - Lack good design documentation
  - Have many inefficiencies and bottlenecks
- Read and analyze one of these open-source projects