I don’t need a title slide for a lecture

I don’t need a title slide for a lecture Long long ago, in a galaxy far, far away… 11/28/2018

Outline
Background; Data mining; Association Rules; Classification; Clustering; Sequential Patterns; Sequence Similarity

Knowledge Discovery in Databases (KDD)
What is it? Finding useful patterns in data.
Why do we need it? There are terabytes of data, and searching them for patterns by hand is impractical.
Where does data mining come in?

Steps of a KDD process
1. Learn the application domain.
2. Create a target dataset.
3. Clean and preprocess the data.
4. Choose the type of data mining.
5. Pick an algorithm.
6. Perform data mining.
7. Interpret the results.

Databases vs. Data warehousing
Data warehousing: storage of all data (details or summaries), metadata, data cleaning and integration.
Databases: queries over current data, persistent storage, atomic updates.

Databases vs. Data warehouses
Databases provide: queries over current data, persistent storage, atomic updates.
Data warehouses provide: storage of all data, metadata, data cleaning and integration, fast access to data.

Who’s interested?
Databases - large amounts of data
Artificial Intelligence - search, planning, machine learning
Information Retrieval - searching for similar documents
Image Processing - finding similar images

Types of data mining
Association Rules, Classification, Clustering, Sequential Patterns, Sequence Similarity

Association rules
What are they? Common co-occurrence relationships in basket data: items that tend to be bought together (co-occurrence, not causation).
Where are they used? Store layout, catalog design, customer segmentation.

Association rules example
Find all itemsets that occur in at least two baskets, and the association rules that can be formed from each.
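The slide's own basket table was an image and did not survive, so as a rough illustration only, here is a brute-force version of the task on four made-up baskets (the item names and the function name are hypothetical). It counts every subset of every basket, which costs 2^m subsets for a basket of m items; that blow-up is what the level-wise methods on the next slides avoid.

```python
# Illustrative sketch; function name and data are hypothetical.
from collections import Counter
from itertools import combinations

def frequent_itemsets_brute_force(baskets, min_count):
    """Count every non-empty subset of every basket and keep those that
    appear in at least min_count baskets. Itemsets are sorted tuples."""
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}

# Hypothetical baskets; not the data from the original slide.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

frequent = frequent_itemsets_brute_force(baskets, min_count=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, n)
```

With these baskets every single item and four of the pairs occur at least twice; no triple does.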

Association rules metrics
For a rule a → b:
support = the fraction of the n baskets in which a and b occur together; the rule requires at least s%.
confidence = of all the baskets containing a, the fraction that also contain b; the rule requires at least c%.
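A minimal sketch of the two metrics, assuming baskets are Python sets and that s and c are given as fractions rather than percentages. The function names and the basket contents are hypothetical, chosen only to show that confidence, unlike support, depends on the direction of the rule.

```python
# Illustrative sketch; names and data are hypothetical.
def support(baskets, itemset):
    """Fraction of all baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= set(b)) / len(baskets)

def confidence(baskets, lhs, rhs):
    """Of the baskets that contain lhs, the fraction that also contain rhs."""
    lhs, rhs = set(lhs), set(rhs)
    with_lhs = [set(b) for b in baskets if lhs <= set(b)]
    if not with_lhs:
        return 0.0  # the rule never applies
    return sum(1 for b in with_lhs if rhs <= b) / len(with_lhs)

def rule_holds(baskets, lhs, rhs, s, c):
    """True if lhs -> rhs has support >= s and confidence >= c (fractions, not %)."""
    both = set(lhs) | set(rhs)
    return support(baskets, both) >= s and confidence(baskets, lhs, rhs) >= c

# Hypothetical baskets for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
print(support(baskets, {"beer", "diapers"}))        # 0.5
print(confidence(baskets, {"beer"}, {"diapers"}))   # 1.0
print(confidence(baskets, {"diapers"}, {"beer"}))   # 2 of 3 baskets
```

Both rules between beer and diapers share the same support, but beer → diapers holds in every basket with beer, while diapers → beer holds in only two of the three baskets with diapers.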

Association rules algorithms
Focus on finding the support of “itemsets”.
The naïve method:
1. Combine the large itemsets of size k-1 that differ only in the last item to form Candidates_k.
2. Measure the support of the itemsets from step 1 to form Large_k, the large itemsets of size k.
3. Increase k and repeat until no new large itemsets are found.
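The naïve level-wise method can be sketched as follows, under a few assumptions: items are strings, itemsets are kept as sorted tuples so that "differ only in the last item" becomes a prefix comparison, and support is a raw basket count. The function names and the baskets are hypothetical.

```python
# Illustrative sketch; names and data are hypothetical.
from collections import Counter

def join_step(large_prev):
    """Combine pairs of large (k-1)-itemsets (sorted tuples) that differ only
    in their last item; the result is the candidate set for size k."""
    prev = sorted(large_prev)
    candidates = set()
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:
                # a < b with equal prefixes, so a[-1] < b[-1] and the result stays sorted
                candidates.add(a + (b[-1],))
    return candidates

def count_support(baskets, candidates):
    """Number of baskets containing each candidate (one pass over the data)."""
    counts = Counter()
    for basket in baskets:
        basket = set(basket)
        for cand in candidates:
            if basket.issuperset(cand):
                counts[cand] += 1
    return counts

def level_wise(baskets, min_count):
    """Return {itemset: count} for every large itemset."""
    singletons = {(item,) for basket in baskets for item in basket}
    large = {c: n for c, n in count_support(baskets, singletons).items() if n >= min_count}
    all_large = dict(large)
    while large:  # stop when no new large itemsets appear
        candidates = join_step(large)
        large = {c: n for c, n in count_support(baskets, candidates).items() if n >= min_count}
        all_large.update(large)
    return all_large

# Hypothetical baskets for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
result = level_wise(baskets, min_count=2)
```

On these baskets the loop finds eight large itemsets (four items, four pairs). The only size-3 candidate, (bread, diapers, milk), appears in a single basket and is dropped, which ends the loop.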

Itemsets of size 1
Looking for a support of 2 (itemsets that appear in at least two baskets)

Finding candidate set 2

Finding candidate set 3

Apriori algorithm
An itemset cannot be a large itemset unless all of its subsets are large itemsets.
This reduces the number of candidate itemsets considered.
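The Apriori property turns into a prune step between joining and counting. A minimal sketch, assuming itemsets are sorted tuples and the join is the same prefix join as the naïve method; the function name is hypothetical, and the large 2-itemsets below are invented (letters stand for items).

```python
# Illustrative sketch; name and data are hypothetical.
from itertools import combinations

def apriori_gen(large_prev):
    """Candidate k-itemsets from the large (k-1)-itemsets (sorted tuples).
    Join as in the naive method, then prune any candidate that has a
    (k-1)-subset which is not large: such a candidate cannot be large."""
    large_prev = set(large_prev)
    prev = sorted(large_prev)
    candidates = set()
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] != b[:-1]:
                continue
            cand = a + (b[-1],)
            # combinations() keeps the sorted order, so subsets match stored tuples
            if all(sub in large_prev for sub in combinations(cand, len(cand) - 1)):
                candidates.add(cand)
    return candidates

# Hypothetical large 2-itemsets.
large_2 = {("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")}
print(sorted(apriori_gen(large_2)))  # [('a', 'b', 'c')]
```

The join alone would produce (a, b, c), (a, b, d) and (a, c, d) and count all three against the data. Pruning drops the last two without a database pass, because (b, d) and (c, d) are not large.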

Research directions
Online construction of rules: CARMA (Berkeley)
Pre-filtering the data: a posteriori (Limburgs Universitair Centrum)

Classification
What is it? Rules that partition data into separate groups (classes).
Where is it used? Classifying people as good or bad credit risks, weather prediction, fraud detection.
Variation: best k of n (for example, choosing whom to send flyers to).

Classification example

Possible solutions
Bayesian classification, neural networks, genetic algorithms, decision trees

Decision trees
(Figure: an example tree that tests Salary < 25,000 and Graduate education?, with Accept and Reject leaves.)

Decision trees
Build the tree in two steps:
1. Build a perfect tree on sample data: at each node, pick a “good” attribute, split the data according to that attribute, and recursively build the tree on the children.
2. Prune the tree using Minimum Description Length (MDL), which adds up the cost of encoding the tree structure, the cost of encoding the split attributes, and the cost of encoding the data records at the leaves.
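The slide does not say how a "good" attribute is chosen. The sketch below assumes Gini impurity (entropy-based information gain is another common choice) and categorical attributes with one branch per value. It covers only the building step; MDL pruning is not shown. Function names and the training records are hypothetical.

```python
# Illustrative sketch; Gini is an assumed split criterion, names/data are hypothetical.
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels; 0.0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def partition(rows, labels, attr):
    """Split the records by the value of one categorical attribute."""
    parts = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = parts.setdefault(row[attr], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return parts

def pick_attribute(rows, labels, attrs):
    """Choose the attribute whose split has the lowest weighted Gini impurity."""
    def split_cost(attr):
        parts = partition(rows, labels, attr)
        return sum(len(sub) / len(labels) * gini(sub) for _, sub in parts.values())
    return min(attrs, key=split_cost)

def build_tree(rows, labels, attrs):
    """Building step only: recurse until a node is pure or attributes run out.
    Leaves are class labels; internal nodes are (attribute, {value: subtree})."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority class
    attr = pick_attribute(rows, labels, attrs)
    rest = [a for a in attrs if a != attr]
    return (attr, {value: build_tree(sub_rows, sub_labels, rest)
                   for value, (sub_rows, sub_labels) in partition(rows, labels, attr).items()})

def classify(tree, row):
    """Follow the tree to a leaf (KeyError on a value never seen in training)."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return tree

# Hypothetical credit-screening records, invented for illustration.
rows = [
    {"employed": "yes", "has_debt": "no"},
    {"employed": "yes", "has_debt": "yes"},
    {"employed": "no", "has_debt": "no"},
    {"employed": "no", "has_debt": "yes"},
    {"employed": "yes", "has_debt": "yes"},
]
labels = ["accept", "accept", "accept", "reject", "accept"]
tree = build_tree(rows, labels, ["employed", "has_debt"])
print(tree)  # ('employed', {'yes': 'accept', 'no': ('has_debt', {'no': 'accept', 'yes': 'reject'})})
```

Splitting on "employed" leaves a weighted impurity of 0.2 versus about 0.27 for "has_debt", so it becomes the root. A tree grown this way fits the sample exactly, which is why the slide follows it with a pruning step.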

Research directions
Integrate building and pruning: PUBLIC (Bell Labs)
Incremental updates: BOAT (University of Wisconsin)

Clustering
What is it? Given n points, separate them into k clusters.
Where is it used? Information retrieval (text classification), identifying similar web documents, mapping the universe.

Clustering example

Traditional clustering algorithms
Partitional: determine k partitions that optimize some function. A common choice is the square-error function E = Σ_i Σ_{p ∈ C_i} |p − m_i|², where m_i is the mean of cluster C_i.
Hierarchical: each point starts as its own cluster, and clusters are merged until k clusters remain.
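One possible instance of each family, as a sketch under assumptions: for the partitional case it uses k-means (Lloyd-style iterations), which locally reduces the square error; for the hierarchical case it uses single-link agglomerative merging. The slide names neither algorithm, and the function names and points are hypothetical.

```python
# Illustrative sketch; k-means and single link are assumed choices, names/data hypothetical.
import random

def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def square_error(clusters):
    """Sum over clusters of squared distances from each point to its cluster mean."""
    return sum(dist2(p, mean(c)) for c in clusters for p in c)

def kmeans(points, k, max_iter=100, seed=0):
    """Partitional: assign points to the nearest center, move centers to the
    cluster means, repeat until the centers stop moving (a local optimum)."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist2(p, centers[i]))].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return [c for c in clusters if c]

def agglomerative(points, k):
    """Hierarchical: start with one cluster per point and merge the closest pair
    (single link: distance between their nearest members) until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        _, i, j = min(
            (min(dist2(p, q) for p in clusters[i] for q in clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return clusters

# Hypothetical 2-D points forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for clusters in (kmeans(points, 2), agglomerative(points, 2)):
    print(sorted(sorted(c) for c in clusters), square_error(clusters))
```

Both methods recover the two groups on this toy data. In general k-means only finds a local optimum that depends on the starting centers, and single link tends to chain clusters together through outliers, one of the difficulties the next slides allude to.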

Clustering difficulties

Research directions
Subspace clustering in high dimensions: CLIQUE (IBM Almaden)
Incremental clustering: Incremental DBSCAN (University of Munich)
Removing problems with outliers: CURE (Bell Labs)

Sequential patterns
What is it? Given sequences of events (for example, each customer's purchases over time), find patterns that frequently occur in order.
Where is it used? Analyzing basket data, medical diagnosis.

Sequential patterns example

AprioriAll
1. Find all large itemsets; each one is a large sequence of length 1.
2. Map each large itemset to a number.
3. While there are still large sequences: find the candidate sequences of length k, find the large sequences of length k among them, and increase k.
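A compact sketch of the AprioriAll loop, under stated assumptions. The large itemsets are already known and passed in. A customer's history is a time-ordered list of transactions, support counts customers, and a candidate is contained in a history when its elements occur in successively later transactions. The join keeps order, so (0, 1) and (1, 0) are different candidates. Names and data are hypothetical, and reducing the output to maximal sequences is not shown.

```python
# Illustrative sketch; names and data are hypothetical.
from itertools import combinations

def transform(customers, large_itemsets):
    """Map each large itemset to an integer id, then replace every transaction
    by the set of ids of the large itemsets it contains."""
    ids = {frozenset(s): i for i, s in enumerate(large_itemsets)}
    return [[{i for s, i in ids.items() if s <= set(t)} for t in seq] for seq in customers]

def contains(seq, cand):
    """True if the ids in cand occur in order in seq, each in a later transaction."""
    pos = 0
    for transaction in seq:
        if pos < len(cand) and cand[pos] in transaction:
            pos += 1
    return pos == len(cand)

def candidates(large_prev):
    """Join (k-1)-sequences that agree on all but the last element, keeping order,
    then prune candidates with a (k-1)-subsequence that is not large."""
    large_prev = set(large_prev)
    result = set()
    for p in large_prev:
        for q in large_prev:
            if p[:-1] == q[:-1]:
                cand = p + (q[-1],)
                if all(sub in large_prev for sub in combinations(cand, len(cand) - 1)):
                    result.add(cand)
    return result

def apriori_all(seqs, min_customers):
    """Return {sequence: number of supporting customers} for all large sequences."""
    ids = {i for seq in seqs for t in seq for i in t}
    large = {}
    for i in ids:
        n = sum(contains(s, (i,)) for s in seqs)
        if n >= min_customers:
            large[(i,)] = n
    found = dict(large)
    while large:  # k = 2, 3, ... until no large sequences remain
        counts = {c: sum(contains(s, c) for s in seqs) for c in candidates(large)}
        large = {c: n for c, n in counts.items() if n >= min_customers}
        found.update(large)
    return found

# Hypothetical customer histories (transactions in time order).
customers = [
    [{"a"}, {"b"}],
    [{"a", "c"}, {"b"}, {"d"}],
    [{"a"}, {"d"}],
]
large_itemsets = [{"a"}, {"b"}, {"d"}]        # assumed already found (min. 2 customers)
seqs = transform(customers, large_itemsets)   # ids: a -> 0, b -> 1, d -> 2
print(apriori_all(seqs, min_customers=2))
```

With these histories the large sequences are <a>, <b>, <d>, <a, b> and <a, d>. The sequence <b, d> is supported by only one customer, so every length-3 candidate is pruned and the loop stops.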

Mapping the itemsets

Research directions
Time limitations: WINEPI (Helsinki/Microsoft)
Itemsets over multiple transactions: GSP (IBM Almaden)

Sequence Similarity
What is it? Given a number of sequences (such as time series), look for similar trends.
Where is it used? Finding stocks with similar price movements, finding geological irregularities.

Example
Are the two sequences similar?

Basic algorithm
1. Scale the data.
2. Match all gap-free subsequences (windows).
3. Form pairs of large similar subsequences from those matches.
4. Find the longest common subsequence.
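The steps above come from the time-series similarity work listed in the references (Agrawal et al., VLDB 95), which scales and matches short windows and then stitches them together. The sketch below is a deliberate simplification and not that paper's algorithm. It min-max scales each whole sequence, then runs a longest-common-subsequence DP in which two values match when they differ by at most eps; the gaps allowed by the LCS stand in for the separate window-matching and stitching steps. The eps value, function names and prices are hypothetical.

```python
# Illustrative simplification; not the referenced paper's method. Names/data hypothetical.
def scale(seq):
    """Min-max scale a sequence to [0, 1], removing offset and amplitude differences."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0] * len(seq)
    return [(x - lo) / (hi - lo) for x in seq]

def similar_lcs_length(a, b, eps):
    """Longest common subsequence of the scaled sequences, where two values
    'match' when they differ by at most eps (gaps are allowed)."""
    a, b = scale(a), scale(b)
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def similarity(a, b, eps=0.1):
    """Fraction of the longer sequence covered by the matched subsequence."""
    return similar_lcs_length(a, b, eps) / max(len(a), len(b))

# Hypothetical price series.
stock_a = [10, 12, 11, 15, 14]
stock_b = [100, 120, 110, 150, 140]   # same shape, ten times the price
stock_c = [15, 14, 13, 11, 10]        # mostly falling
print(similarity(stock_a, stock_b))   # 1.0
print(similarity(stock_a, stock_c))   # 0.4
```

Scaling is what makes stock_b count as identical to stock_a despite the tenfold price difference. Scaling whole sequences, rather than each window as the paper does, is the main thing this sketch gives up.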

Research directions
Finding surprising patterns (IBM Almaden)

Data mining directions
Sampling, fractals, pre-partitioning data, making data mining more accessible, user-defined aggregation support

References
General data mining: http://www.almaden.ibm.com/cs/quest, www.bell-labs.com/project/serendip
Association rules: “Fast Algorithms for Mining Association Rules”, Agrawal and Srikant; VLDB 94.
Classification: “PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning”, Rastogi and Shim; VLDB 98.

References (cont.)
Clustering: “CURE: An Efficient Clustering Algorithm for Large Databases”, Guha, Rastogi, and Shim; SIGMOD 98.
Sequential patterns: “Mining Sequential Patterns: Generalizations and Performance Improvements”, Srikant and Agrawal; EDBT 96.
Similarity search: “Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases”, Agrawal, Lin, Sawhney, and Shim; VLDB 95.