Data Mining Techniques Cluster Analysis Induction Neural Networks OLAP Data Visualization.


Association Rule An association rule expresses an association relationship among a set of objects in a database (such as “occur together” or “one implies the other”). Given a set of transactions, where each transaction is a set of literals (called items), an association rule is an expression of the form X => Y, where X and Y are sets of items. Intuitively, such a rule means that transactions in the database that contain X tend to also contain Y.

Support The support of an itemset S is the percentage of the transactions in T, the set of all transactions, that contain S. If U is the set of all transactions that contain all items in S, then support(S) = (|U| / |T|) × 100%, where |U| and |T| are the numbers of elements in U and T, respectively.
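The support definition can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `support` and the toy database `T` are assumptions chosen for the example, and transactions are modeled as frozensets of item names.

```python
def support(itemset, transactions):
    """Return the percentage of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    if not transactions:
        raise ValueError("the transaction database T must not be empty")
    # U: all transactions that contain all items in S
    containing = [t for t in transactions if itemset <= t]
    return 100.0 * len(containing) / len(transactions)


# Hypothetical toy database T with four transactions (illustrative data only).
T = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "cheese"}),
    frozenset({"bread", "butter", "coke"}),
    frozenset({"ham", "salt"}),
]

print(support({"bread"}, T))            # 75.0
print(support({"bread", "butter"}, T))  # 50.0
```

Bread appears in three of the four transactions, and bread together with butter in two of them, which gives the 75% and 50% shown.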

Confidence The confidence of a candidate rule X => Y is calculated as support(X ∪ Y) / support(X). It represents the percentage of transactions containing the items in X that also contain the items in Y.

Example: Association Rule In a store we might have I = {cheese, ham, bread, butter, salt, coke}. A transaction could look like t = {bread, butter} for a customer who bought bread and butter. An association rule could look like bread => butter with support 60% and confidence 80%: 60% of all transactions contain both bread and butter, and 80% of the customers who bought bread also bought butter.
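To make the 60% / 80% figures concrete, here is a small sketch that reproduces them. The 20 transactions are invented purely for illustration (the slides give no data), and the helper names `count` and `rule_stats` are assumptions. Confidence is computed as a ratio of counts, which equals support(X ∪ Y) / support(X) because the database size cancels.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def rule_stats(x, y, transactions):
    """Return (support %, confidence %) of the rule x => y."""
    both = count(set(x) | set(y), transactions)   # transactions with X and Y
    x_count = count(x, transactions)              # transactions with X
    support_pct = 100 * both / len(transactions)
    # Assumes x occurs at least once; otherwise confidence is undefined.
    confidence_pct = 100 * both / x_count
    return support_pct, confidence_pct


# Hypothetical database: 12 bread+butter, 3 bread+cheese, 5 ham+coke baskets.
transactions = (
    [frozenset({"bread", "butter"})] * 12
    + [frozenset({"bread", "cheese"})] * 3
    + [frozenset({"ham", "coke"})] * 5
)

sup, conf = rule_stats({"bread"}, {"butter"}, transactions)
print(sup, conf)  # 60.0 80.0
```

Twelve of the 20 baskets contain both items (60% support), and 12 of the 15 baskets with bread also contain butter (80% confidence).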

Apriori Algorithm Find all combinations of items that have transaction support above minimum support. Call those combinations frequent itemsets. Use the frequent itemsets to generate the desired rules.
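The second phase, generating rules from frequent itemsets, might be sketched as follows. This is one straightforward way to do it, not necessarily the procedure the presenter had in mind: every non-empty proper subset of a frequent itemset is tried as the left-hand side, and the rule is kept if its confidence reaches a threshold. The name `generate_rules`, the `min_confidence` parameter, and the support counts are assumptions for illustration. The sketch relies on every subset of a frequent itemset also being present in `support_counts`.

```python
from itertools import combinations


def generate_rules(support_counts, min_confidence):
    """Derive rules X => Y from frequent itemsets and their support counts.

    support_counts maps each frequent itemset (a frozenset) to the number
    of transactions containing it, and must include all subsets.
    """
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty X and a non-empty Y
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                y = itemset - x
                confidence = count / support_counts[x]
                if confidence >= min_confidence:
                    rules.append((x, y, confidence))
    return rules


# Hypothetical support counts (illustrative only).
support_counts = {
    frozenset({"bread"}): 15,
    frozenset({"butter"}): 13,
    frozenset({"bread", "butter"}): 12,
}

for x, y, conf in generate_rules(support_counts, min_confidence=0.9):
    print(sorted(x), "=>", sorted(y), round(conf, 3))
```

With these counts, bread => butter has confidence 12/15 = 0.8 and is dropped at a 0.9 threshold, while butter => bread has confidence 12/13 and is kept.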

Apriori Algorithm (cont’d)
Pass 1
1. Generate the candidate itemsets in C_1
2. Save the frequent itemsets in L_1
Pass k
1. Generate the candidate itemsets in C_k from the frequent itemsets in L_{k-1}
  a. Join L_{k-1} with L_{k-1}, as follows: insert into C_k select p.item_1, p.item_2, ..., p.item_{k-1}, q.item_{k-1} from L_{k-1} p, L_{k-1} q where p.item_1 = q.item_1, ..., p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}

Apriori Algorithm (cont’d)
  b. Generate all (k-1)-subsets of the candidate itemsets in C_k
  c. Prune from C_k every candidate itemset that has some (k-1)-subset not in the frequent itemsets L_{k-1}
2. Scan the transaction database to determine the support of each candidate itemset in C_k
3. Save the frequent itemsets in L_k
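The passes above can be sketched in Python as follows. This is a minimal, unoptimized illustration under a few assumptions: itemsets are represented as sorted tuples, minimum support is given as an absolute count (`min_count`), and the names `apriori_gen` and `apriori` as well as the toy transactions are chosen for the example rather than taken from the slides.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Build C_k from L_{k-1} (a set of sorted (k-1)-tuples): join, then prune."""
    prev = sorted(prev_frequent)
    candidates = set()
    for p in prev:
        for q in prev:
            # Join: first k-2 items equal, last item of p < last item of q.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                candidates.add(p + (q[-1],))
    # Prune: drop candidates that have a (k-1)-subset missing from L_{k-1}.
    return {
        c for c in candidates
        if all(sub in prev_frequent for sub in combinations(c, k - 1))
    }


def apriori(transactions, min_count):
    """Return a dict mapping each frequent itemset (sorted tuple) to its count."""
    transactions = [frozenset(t) for t in transactions]
    # Pass 1: count single items and keep the frequent ones (L_1).
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {c: n for c, n in counts.items() if n >= min_count}
    all_frequent = dict(frequent)
    k = 2
    while frequent:
        # Pass k: generate C_k, scan the database, keep L_k.
        candidates = apriori_gen(set(frequent), k)
        counts = {
            c: sum(1 for t in transactions if t.issuperset(c))
            for c in candidates
        }
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent


# Hypothetical transactions (illustrative only).
baskets = [
    {"bread", "butter", "cheese"},
    {"bread", "butter"},
    {"bread", "cheese"},
    {"butter", "coke"},
]
for itemset, n in sorted(apriori(baskets, min_count=2).items()):
    print(itemset, n)
```

In this toy run the candidate (bread, butter, cheese) is produced by the join in pass 3 but removed by the prune step, because its subset (butter, cheese) is not frequent, so no third database scan finds anything.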

Smart Web Search Agents Data Search Engines >> Information Search Agents. Traditional searching on the Web is done using one of the following three: directories (Yahoo, Lycos, etc.); search engines (AltaVista, NorthernLight, etc.); metasearch engines (MetaCrawler, SavvySearch, AskJeeves, etc.). All of these involve keyword searches. Drawback: not easily personalized, too many results (although many give relevancy factors).

- local cache databases (containing frequently asked queries/results; possibly updated periodically - nightly!) - local cache information base (containing mined information and discovered knowledge for efficient personal use) - domain-based agents (e.g. Job Search; Sports- NBA Stats, Bibliography-Digital Libraries)

Intelligent Tools for E-Business: Computational Intelligence, Neural Networks, Fuzzy Logic, Genetic Algorithms, Hybrid Systems; Learning Algorithms, Heuristic Searching; Data Analysis and Modeling, Data Fusion and Mining, Knowledge Discovery; Prediction & Time Series Analysis; Information Retrieval, Intelligent User Interface; Intelligent Agents, Distributed IA and Multi-Agents, Cooperative Knowledge-based Systems

Enhancing E-Business Process Through Data Mining. Quality of discovered knowledge depends on: having the right data; having appropriate data mining tools. Traditional data mining tools: simple query and reporting; visualization-driven data exploration tools, OLAP; the discovery process is user driven.

Intelligent Data Mining Tools: Automate the process of discovering patterns/knowledge in data. Require hypothesis, exploration. Derive business knowledge (patterns) from data. Combine business knowledge of users with results of discovery algorithms.

Intelligent Information Agents The Data Mining Problem: –Clustering/ Classification –Association –Sequencing Viewed as an Optimization Problem Tools: Genetic Algorithms

Fuzzy Rule Discovery. Rule discovery: the discovery of associations between business events, i.e. which items are purchased together. To support flexible querying and intelligent searching, fuzzy queries are developed to uncover potentially valuable knowledge. A fuzzy query uses fuzzy terms like tall, small, and near to define linguistic concepts and formulate a query. The automated search for fuzzy rules is carried out by the discovery of fuzzy clusters or segmentation in data.
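As a rough illustration of how a fuzzy term such as "tall" can drive a query, the sketch below assigns each record a degree of membership between 0 and 1 and returns the records above a threshold, best match first. Everything specific here is an assumption made for the example: the ramp-shaped membership function, the 160 cm and 190 cm breakpoints, the function names, and the sample records.

```python
def ramp(x, low, high):
    """Membership rising linearly from 0 at low to 1 at high."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def tall(height_cm):
    """Degree to which a height counts as 'tall' (hypothetical breakpoints)."""
    return ramp(height_cm, 160.0, 190.0)


def fuzzy_query(records, membership, key, threshold):
    """Return (degree, record) pairs with degree >= threshold, best first."""
    scored = [(membership(r[key]), r) for r in records]
    hits = [(m, r) for m, r in scored if m >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


# Hypothetical records (illustrative only).
people = [
    {"name": "A", "height_cm": 150},
    {"name": "B", "height_cm": 175},
    {"name": "C", "height_cm": 195},
]

for degree, person in fuzzy_query(people, tall, "height_cm", threshold=0.5):
    print(person["name"], degree)
```

Unlike a crisp filter such as height > 180, the fuzzy version grades each match, so borderline records can still be returned with a lower degree.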