A Data Mining Tutorial
David Madigan
dmadigan@rci.rutgers.edu
http://stat.rutgers.edu/~madigan

Overview
- Brief Introduction to Data Mining
- Data Mining Algorithms
- Specific Examples
  - Algorithms: Disease Clusters
  - Algorithms: Model-Based Clustering
  - Algorithms: Frequent Items and Association Rules
- Future Directions, etc.

Of "Laws", Monsters, and Giants…
- Moore's law: processing "capacity" doubles every 18 months (CPU, cache, memory)
- Its more aggressive cousin: disk storage "capacity" doubles every 9 months
- What do the two "laws" combined produce? A rapidly growing gap between our ability to generate data and our ability to make use of it.

What is Data Mining?
Finding interesting structure in data
- Structure: statistical patterns, predictive models, hidden relationships
- Examples of tasks addressed by data mining:
  - Predictive modeling (classification, regression)
  - Segmentation (data clustering)
  - Summarization
  - Visualization

[Three slides adapted from Ronny Kohavi, ICML 1998]

Stories: Online Retailing

Data Mining Algorithms
"A data mining algorithm is a well-defined procedure that takes data as input and produces output in the form of models or patterns" (Hand, Mannila, and Smyth)
- "well-defined": can be encoded in software
- "algorithm": must terminate after some finite number of steps

Algorithm Components
1. The task the algorithm is used to address (e.g., classification, clustering)
2. The structure of the model or pattern we are fitting to the data (e.g., a linear regression model)
3. The score function used to judge the quality of the fitted models or patterns (e.g., accuracy, BIC)
4. The search or optimization method used to search over parameters and/or structures (e.g., steepest descent, MCMC)
5. The data management technique used for storing, indexing, and retrieving data (critical when the data are too large to reside in memory)

Backpropagation as a Data Mining Algorithm
[Figure: feedforward network with inputs x1..x4, two hidden units h1, h2, and output y]
- A vector of p input values is multiplied by a p × d1 weight matrix
- The resulting d1 values are individually transformed by a non-linear function
- The resulting d1 values are multiplied by a d1 × d2 weight matrix

Backpropagation (cont.)
- Parameters: the entries of the weight matrices
- Score: e.g., sum of squared errors, S = Σ_i (y_i − ŷ_i)²
- Search: steepest descent; search for structure?
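To make these components concrete, here is a minimal sketch (not from the original slides) of the forward pass and one steepest-descent update for the small 4-2-1 network above, assuming a logistic non-linearity on the hidden layer and a squared-error score; the function names and toy data are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    """Forward pass: p inputs -> d1 hidden units -> d2 outputs."""
    h = sigmoid(W1 @ x)   # p values times p x d1 weights, then nonlinearity
    y_hat = W2 @ h        # d1 values times d1 x d2 weights
    return h, y_hat

def sgd_step(x, y, W1, W2, lr=0.1):
    """One steepest-descent update of the squared-error score."""
    h, y_hat = forward(x, W1, W2)
    err = y_hat - y                            # gradient of 0.5*(y_hat - y)^2
    grad_W2 = np.outer(err, h)
    grad_h = W2.T @ err                        # backpropagate to hidden layer
    grad_W1 = np.outer(grad_h * h * (1 - h), x)  # through the logistic
    return W1 - lr * grad_W1, W2 - lr * grad_W2

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))   # 4 inputs -> 2 hidden units
W2 = rng.normal(size=(1, 2))   # 2 hidden units -> 1 output
x, y = np.array([1.0, 0.0, 1.0, 1.0]), np.array([1.0])
W1, W2 = sgd_step(x, y, W1, W2)
```

Each call to sgd_step performs one steepest-descent move on the score; iterating over the data gives the usual training loop.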

Models and Patterns
Models:
- Prediction
  - Linear regression
  - Piecewise linear
  - Nonparametric regression
  - Classification: logistic regression, naïve Bayes/TAN/Bayesian networks, neural networks, support vector machines, trees, etc.
- Probability Distributions
  - Parametric models
  - Mixtures of parametric models
  - Graphical Markov models (categorical, continuous, mixed)
- Structured Data
  - Time series: Markov models, Mixture Transition Distribution models, hidden Markov models
  - Spatial models

Bias-Variance Tradeoff
- High bias, low variance
- Low bias, high variance
- "Overfitting": modeling the random component
- The score function should embody the compromise
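A toy illustration (with made-up data, not from the original slides): fitting polynomials of increasing degree to noisy samples of a smooth curve shows the two extremes the slide describes.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1, 1, 30))
y = np.sin(3 * x) + rng.normal(0, 0.3, size=30)   # noisy training data
x_test = np.linspace(-1, 1, 200)
y_test = np.sin(3 * x_test)                       # noise-free truth

for degree in (1, 3, 12):
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The degree-1 fit has high bias; the degree-12 fit drives training error down while test error typically grows, which is exactly the overfitting a good score function must guard against.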

The Curse of Dimensionality
X ~ MVN_p(0, I)
- Gaussian kernel density estimation
- Bandwidth chosen to minimize MSE at the mean
- Suppose we want the relative MSE at the mean to be small (e.g., under 10%); the required sample size explodes with dimension:

Dimension    # data points
1            4
2            19
3            67
6            2,790
10           842,000

Patterns
- Local: outlier detection, changepoint detection, bump hunting, scan statistics, association rules
- Global: clustering via partitioning, hierarchical clustering, mixture models

Scan Statistics via Permutation Tests
[Figure: a curve representing a road, marked with x's; each "x" is an accident, red for an injury accident, black for no injury]
Is there a stretch of road where there is an unusually large fraction of injury accidents?

Scan with Fixed Window
If we know the length of the "stretch of road" that we seek, we can slide a window of that length along the road and find the most "unusual" window location.
[Figure: the same road, with a fixed-length window positioned over a run of accidents]

How Unusual is a Window?
Let p_W and p_¬W denote the true probability of being red inside and outside the window, respectively. Let (x_W, n_W) and (x_¬W, n_¬W) denote the corresponding counts. Use the GLRT for comparing H0: p_W = p_¬W versus H1: p_W ≠ p_¬W:

λ = [ p̂^x (1 − p̂)^(n − x) ] / [ p̂_W^{x_W} (1 − p̂_W)^{n_W − x_W} · p̂_¬W^{x_¬W} (1 − p̂_¬W)^{n_¬W − x_¬W} ]

where x = x_W + x_¬W, n = n_W + n_¬W, and p̂ = x/n, p̂_W = x_W/n_W, p̂_¬W = x_¬W/n_¬W are the maximum likelihood estimates under the two hypotheses. λ measures how unusual a window is; −2 log λ has an asymptotic chi-squared distribution with 1 df.

Permutation Test
Since we look at the smallest λ over all window locations, we need the distribution of the smallest λ under the null hypothesis that there are no clusters:
- Look at the distribution of the smallest λ over, say, 999 random relabellings of the colors of the x's.
[Figure: four example relabellings with their smallest-λ values: 0.376, 0.233, 0.412, 0.222, …]
- Look at the position of the observed smallest λ in this distribution to get the scan statistic p-value (e.g., if the observed smallest λ is the 5th smallest, the p-value is 0.005).
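A minimal sketch of the whole procedure, assuming the Bernoulli model and a window defined as a fixed number of consecutive accidents; the data, window width, and function names are illustrative. Minimizing λ is the same as maximizing −2 log λ, which is what the code tracks:

```python
import numpy as np

def neg2_log_lambda(x_w, n_w, x_out, n_out):
    """-2 log GLRT statistic for H0: p_W = p_notW vs H1: p_W != p_notW."""
    def loglik(x, n, p):
        if p in (0.0, 1.0):
            return 0.0 if x in (0, n) else -np.inf
        return x * np.log(p) + (n - x) * np.log(1 - p)
    p0 = (x_w + x_out) / (n_w + n_out)
    l0 = loglik(x_w + x_out, n_w + n_out, p0)
    l1 = loglik(x_w, n_w, x_w / n_w) + loglik(x_out, n_out, x_out / n_out)
    return 2 * (l1 - l0)

def best_window_stat(colors, width):
    """Largest -2 log lambda over all windows of `width` consecutive points."""
    n, total, best = len(colors), colors.sum(), -np.inf
    for start in range(n - width + 1):
        x_w = colors[start:start + width].sum()
        best = max(best, neg2_log_lambda(x_w, width, total - x_w, n - width))
    return best

def scan_p_value(colors, width, n_perm=999, seed=0):
    """Permutation p-value: rank of the observed statistic among relabellings."""
    rng = np.random.default_rng(seed)
    observed = best_window_stat(colors, width)
    exceed = sum(
        best_window_stat(rng.permutation(colors), width) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)

# 1 = injury accident (red x), 0 = no injury (black x)
colors = np.array([0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0])
print(scan_p_value(colors, width=5))
```

The +1 in numerator and denominator counts the observed labelling itself among the relabellings, matching the "5th smallest out of 1000 gives 0.005" arithmetic above.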

Variable Length Window
No need to use a fixed-length window; examine all possible windows up to, say, half the length of the entire road.
[Figure legend: O = fatal accident, O = non-fatal accident (distinguished by color in the original)]

Spatial Scan Statistics
The spatial scan statistic uses, e.g., circles instead of line segments.

Spatial-Temporal Scan Statistics
The spatial-temporal scan statistic uses cylinders, where the height of the cylinder represents a time window.

Other Issues
- A Poisson model is also common (instead of the Bernoulli model)
- Covariate adjustment
- Andrew Moore's group at CMU: efficient algorithms for scan statistics

Software: SaTScan + others
- http://www.satscan.org
- http://www.phrl.org
- http://www.terraseer.com

Association Rules: Support and Confidence
[Figure: Venn diagram of transactions with circles "customer buys beer" and "customer buys diaper"; the intersection is "customer buys both"]
Find all rules Y ⇒ Z with minimum support and confidence:
- support, s: the probability that a transaction contains both Y and Z
- confidence, c: the conditional probability that a transaction containing Y also contains Z
With minimum support 50% and minimum confidence 50%, we have:
- A ⇒ C (support 50%, confidence 66.6%)
- C ⇒ A (support 50%, confidence 100%)

Mining Association Rules: An Example
Min. support 50%, min. confidence 50%. For the rule A ⇒ C:
- support = support({A, C}) = 50%
- confidence = support({A, C}) / support({A}) = 66.6%
The Apriori principle: any subset of a frequent itemset must be frequent.
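The transaction database for this example appeared as a table on the original slide; the four transactions below are a reconstruction consistent with the numbers above (A and C co-occur in 2 of 4 transactions, and in 2 of the 3 transactions containing A). A small sketch of the two measures:

```python
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """P(transaction contains rhs | transaction contains lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(support({"A", "C"}, transactions))       # 0.5
print(confidence({"A"}, {"C"}, transactions))  # 0.666...
print(confidence({"C"}, {"A"}, transactions))  # 1.0
```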

Mining Frequent Itemsets: the Key Step
- Find the frequent itemsets: the sets of items that have minimum support
  - A subset of a frequent itemset must also be a frequent itemset; i.e., if {A, B} is frequent, both {A} and {B} must be frequent
  - Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
- Use the frequent itemsets to generate association rules

The Apriori Algorithm
- Join step: C_k is generated by joining L_{k-1} with itself
- Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset

Pseudo-code (C_k: candidate itemsets of size k; L_k: frequent itemsets of size k):

  L_1 = {frequent items};
  for (k = 1; L_k ≠ ∅; k++) do begin
      C_{k+1} = candidates generated from L_k;
      for each transaction t in the database do
          increment the count of all candidates in C_{k+1} that are contained in t;
      L_{k+1} = candidates in C_{k+1} with min_support;
  end
  return ∪_k L_k;
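A compact, runnable sketch of the pseudo-code above (illustrative only; real implementations speed up the counting step with hash trees, or avoid candidate generation entirely, as in FP-growth):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all frequent itemsets (as frozensets) with their support counts."""
    n_required = min_support * len(transactions)
    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= n_required}
    all_frequent = dict(frequent)
    k = 1
    while frequent:
        # Join step: combine frequent k-itemsets into (k+1)-candidates.
        # Prune step: keep only candidates whose k-subsets are all frequent.
        prev = set(frequent)
        candidates = {
            a | b
            for a in prev for b in prev
            if len(a | b) == k + 1
            and all(frozenset(sub) in prev for sub in combinations(a | b, k))
        }
        # Scan the database, counting candidate occurrences.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            t = frozenset(t)
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= n_required}
        all_frequent.update(frequent)
        k += 1
    return all_frequent

transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
print(apriori(transactions, min_support=0.5))
# {A}, {B}, {C}, and {A, C} are frequent at 50% support
```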

The Apriori Algorithm: An Example
[Figure: trace of Apriori on a small database D, alternating candidate generation (C1, C2, C3) with scans of D that yield the frequent itemsets L1, L2, L3]

Association Rule Mining: A Road Map
- Boolean vs. quantitative associations (based on the types of values handled)
  - buys(x, "SQLServer") ∧ buys(x, "DMBook") ⇒ buys(x, "DBMiner") [0.2%, 60%]
  - age(x, "30..39") ∧ income(x, "42..48K") ⇒ buys(x, "PC") [1%, 75%]
- Single-dimensional vs. multidimensional associations (see the examples above)
- Single-level vs. multiple-level analysis
  - What brands of beers are associated with what brands of diapers?
- Various extensions (thousands!)

Model-based Clustering Padhraic Smyth, UCI

Mixtures of {Sequences, Curves, …}
Generative model:
- Select a component c_k for individual i
- Generate data according to p(D_i | c_k)
- p(D_i | c_k) can be very general, e.g., sets of sequences, spatial patterns, etc.
[Note: given p(D_i | c_k), we can define an EM algorithm]

Example: Mixtures of SFSMs
- A simple model for traversal on a Web site (equivalent to a first-order Markov model with an end state)
- A generative model for large sets of Web users: different behaviors ⟺ mixture of SFSMs
- The EM algorithm is quite simple: weighted counts (see the sketch below)
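To show the "weighted counts" idea, here is a minimal EM sketch for a mixture of plain first-order Markov chains (dropping the SFSM's explicit end state for brevity); the toy sessions and all parameter names are illustrative:

```python
import numpy as np

def em_markov_mixture(seqs, n_states, K, n_iter=50, seed=0):
    """EM for a mixture of first-order Markov chains over states 0..n_states-1.

    seqs: list of state-index sequences (one per user session).
    Returns mixture weights w, initial distributions pi, transition matrices T.
    """
    rng = np.random.default_rng(seed)
    w = np.full(K, 1.0 / K)
    pi = rng.dirichlet(np.ones(n_states), size=K)              # K x S
    T = rng.dirichlet(np.ones(n_states), size=(K, n_states))   # K x S x S

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sequence
        logp = np.zeros((len(seqs), K))
        for i, s in enumerate(seqs):
            logp[i] = np.log(w) + np.log(pi[:, s[0]])
            for a, b in zip(s[:-1], s[1:]):
                logp[i] += np.log(T[:, a, b])
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)                      # N x K

        # M-step: re-estimate everything from responsibility-weighted counts
        w = r.mean(axis=0)
        pi = np.zeros((K, n_states)) + 1e-9                    # small smoothing
        T = np.zeros((K, n_states, n_states)) + 1e-9
        for i, s in enumerate(seqs):
            pi[:, s[0]] += r[i]
            for a, b in zip(s[:-1], s[1:]):
                T[:, a, b] += r[i]
        pi /= pi.sum(axis=1, keepdims=True)
        T /= T.sum(axis=2, keepdims=True)
    return w, pi, T

# Toy sessions over 3 pages (0, 1, 2), with two visible navigation styles
seqs = [[0, 1, 0, 1], [0, 1, 1, 0], [2, 2, 2], [2, 2], [0, 1], [2, 2, 2, 2]]
w, pi, T = em_markov_mixture(seqs, n_states=3, K=2)
print(np.round(w, 2))
```

The E-step computes each component's responsibility for each session; the M-step is literally re-normalized, responsibility-weighted transition counts.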

WebCanvas: Cadez, Heckerman, et al., KDD 2000

Discussion
- What is data mining? Hard to pin down; but who cares?
- Textbook statistical ideas with a new focus on algorithms
- Lots of new ideas too

Privacy and Data Mining
Ronny Kohavi, ICML 1998