A task of induction to find patterns


Classification: a task of induction to find patterns

Outline
- Data and its format
- The problem of classification
- Learning a classifier
- Different approaches
- Key issues

Data and its format
- Data: attribute-value pairs, with or without class labels
- Data types: continuous, discrete, nominal
- Data format: flat (a single table); a small example follows below
- If the data are not flat, what should we do?
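
To make the flat, attribute-value format concrete, here is a minimal Python sketch. The attribute names follow the golf/weather toy example used later in these slides; the particular rows are illustrative, not the exact table on the sample-data slide.

# A flat representation: each instance is a set of attribute-value pairs
# plus a class label ("Play"); all instances fit in one table.
golf_data = [
    {"Outlook": "sunny",    "Humidity": "high",   "Wind": "weak",   "Play": "no"},
    {"Outlook": "sunny",    "Humidity": "normal", "Wind": "strong", "Play": "yes"},
    {"Outlook": "overcast", "Humidity": "high",   "Wind": "weak",   "Play": "yes"},
    {"Outlook": "rain",     "Humidity": "normal", "Wind": "weak",   "Play": "yes"},
    {"Outlook": "rain",     "Humidity": "high",   "Wind": "strong", "Play": "no"},
]
attributes = ["Outlook", "Humidity", "Wind"]   # nominal attributes
class_attr = "Play"                            # the class label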

Sample data
[Table of sample instances shown on the slide.]

Induction from databases
- Induction: inferring knowledge from data
- The task of deduction: inferring information that is a logical consequence of querying a database
  - Who conducted this class before?
  - Which courses are attended by Mary?
- Deductive databases: extending the RDBMS (relational database management system)
- An RDBMS offers simple operators for the deduction of information, such as join

Classification
- Classification is one type of induction, on data with class labels
- Example: if the weather is rainy, then no golf
- Induction is different from deduction, and a DBMS does not support induction
- The result of induction is higher-level information or knowledge: general statements about the data
- There are many approaches; refer to the lecture notes for CS3244, available at the Co-Op. We focus on three approaches here
- Other approaches: instance-based learning, other neural networks, concept learning (version space, Focus, AQ11, ...), genetic algorithms, reinforcement learning

Different approaches
There exist many techniques:
- Decision trees
- Neural networks
- K-nearest neighbors
- Naïve Bayesian classifiers
- Support vector machines
- Ensemble methods
- Semi-supervised learning
- and many more ...

A decision tree
[Figure: a decision tree for the golf data. The root tests Outlook (sunny / overcast / rain); the sunny branch tests Humidity (high / normal), the rain branch tests Wind (strong / weak); the leaves are YES / NO.]
Issues:
- How do we build such a tree from the data?
- What are the criteria for measuring performance? Correctness, conciseness
- What are the key components? The test at each node, the stopping criterion
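
Read as code, the tree above is just a set of nested tests. A minimal Python sketch, assuming the usual branch assignment of the golf tree (sunny with high humidity and rain with strong wind lead to NO, everything else to YES):

def classify(instance):
    """Classify one instance with the hand-built tree from the slide.
    `instance` is a dict with keys "Outlook", "Humidity", and "Wind"."""
    if instance["Outlook"] == "overcast":
        return "YES"
    if instance["Outlook"] == "sunny":
        # Sunny days: play only when the humidity is normal.
        return "NO" if instance["Humidity"] == "high" else "YES"
    # Rainy days: play only when the wind is weak.
    return "YES" if instance["Wind"] == "weak" else "NO"

print(classify({"Outlook": "sunny", "Humidity": "high", "Wind": "weak"}))   # NO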

Inducing a decision tree
- There are many possible trees; let's try it on the golfing data
- How do we find the most compact tree that is consistent with the data?
- Why the most compact? Occam's razor principle
- The issue of efficiency w.r.t. optimality: how do we find an optimal tree?

Information gain and entropy
- [Entropy formula shown on the slide]
- Information gain: the difference in entropy between the node before splitting and the nodes after splitting

Building a compact tree
- The key to building a decision tree: which attribute to choose in order to branch
- The heuristic: choose the attribute with the maximum information gain (see the sketch below)
- Another explanation: reduce uncertainty as much as possible
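
The standard definitions behind these two slides: Entropy(S) = -sum_i p_i log2(p_i) over the class proportions p_i, and Gain(S, A) = Entropy(S) - sum_v (|S_v|/|S|) Entropy(S_v), where the S_v are the subsets produced by splitting on attribute A. A minimal Python sketch (the toy rows are illustrative, not the full golf table):

from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels: -sum p_i * log2(p_i)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(instances, attribute, class_attr):
    """Entropy before the split minus the weighted entropy after splitting on `attribute`."""
    before = entropy([inst[class_attr] for inst in instances])
    after = 0.0
    for value in set(inst[attribute] for inst in instances):
        subset = [inst[class_attr] for inst in instances if inst[attribute] == value]
        after += len(subset) / len(instances) * entropy(subset)
    return before - after

data = [
    {"Outlook": "sunny",    "Wind": "weak",   "Play": "no"},
    {"Outlook": "sunny",    "Wind": "strong", "Play": "no"},
    {"Outlook": "overcast", "Wind": "weak",   "Play": "yes"},
    {"Outlook": "rain",     "Wind": "weak",   "Play": "yes"},
    {"Outlook": "rain",     "Wind": "strong", "Play": "no"},
]
# Branch on the attribute with the maximum information gain.
best = max(["Outlook", "Wind"], key=lambda a: information_gain(data, a, "Play"))
print(best)   # "Outlook" for this toy data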

Learning a decision tree
[Figure: the learned tree. Root: Outlook. Sunny branch: Humidity (high -> NO, normal -> YES). Overcast branch: YES. Rain branch: Wind (strong -> NO, weak -> YES).]

Issues of decision trees
- The number of values of an attribute (your solution?)
- When to stop
- The data fragmentation problem (any solution?)
- Mixed data types
- Scalability

Rules and tree stumps
- Generating rules from decision trees: one path is one rule
- We can do better. Why?
- Tree stumps and 1R (sketched in code below):
  - For each attribute value, determine a default class (# of values = # of rules)
  - Calculate the number of errors for each rule
  - Find the number of errors for that attribute's rule set
  - Choose the rule set with the fewest errors
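
A minimal Python sketch of the 1R procedure: for each attribute, build one rule per value (predict the majority class of that value), then keep the attribute whose rule set makes the fewest errors. The toy rows are made up for illustration.

from collections import Counter

def one_r(instances, attributes, class_attr):
    """Return (best_attribute, {value: predicted_class}) with the fewest errors."""
    best = None
    for attr in attributes:
        rules, errors = {}, 0
        for value in set(inst[attr] for inst in instances):
            labels = [inst[class_attr] for inst in instances if inst[attr] == value]
            majority, count = Counter(labels).most_common(1)[0]
            rules[value] = majority          # one rule per attribute value
            errors += len(labels) - count    # instances the majority rule gets wrong
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    return best[1], best[2]

data = [
    {"Outlook": "sunny", "Wind": "weak",   "Play": "no"},
    {"Outlook": "sunny", "Wind": "strong", "Play": "no"},
    {"Outlook": "rain",  "Wind": "weak",   "Play": "yes"},
    {"Outlook": "rain",  "Wind": "strong", "Play": "no"},
]
print(one_r(data, ["Outlook", "Wind"], "Play"))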

K-nearest neighbor
- One of the most intuitive classification algorithms
- An unseen instance's class is determined by its nearest neighbor
- The problem: a single neighbor is sensitive to noise
- Instead of using one neighbor, we can use k neighbors
- Is there any need for a quick review of basic probability theory?
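
A minimal k-NN sketch in Python: Euclidean distance on numeric features, majority vote among the k closest training points. The 2-D points are made up for illustration.

from collections import Counter
from math import dist   # Euclidean distance (Python 3.8+)

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs.
    Return the majority label among the k training points closest to `query`."""
    neighbors = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((3.0, 3.2), "B"), ((3.1, 2.9), "B"), ((2.8, 3.0), "B")]

print(knn_predict(train, (1.1, 0.9), k=1))   # the single nearest neighbor decides: "A"
print(knn_predict(train, (2.0, 2.0), k=3))   # a borderline point: let 3 neighbors vote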

K-NN (continued)
- New problems: how large should k be?
- Lazy learning: does it learn? Large storage
- A toy example (noise, majority)
- How good is k-NN? How to compare: speed, accuracy

Naïve Bayes classifier
- A direct application of Bayes' rule: P(C|X) = P(X|C)P(C)/P(X), where X is a vector (x1, x2, ..., xn)
- That is the best classifier we can build
- But there are problems: with only a limited number of instances, how do we estimate P(X|C)?
- Your suggestions?

NBC (2)
- Assume conditional independence between the xi's
- We have P(C|X) ∝ P(x1|C) P(x2|C) ... P(xn|C) P(C)
- What's missing? Is it really correct? Why?
- An example (golfing or not)
- How good is it in reality? Even when the assumption does not hold true ...
- How do we update an NBC when new data stream in?
- What if one of the P(xi|C) is 0? Laplace estimator: add 1 to each count (see the sketch below)
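
A minimal sketch of a naïve Bayes classifier for nominal attributes with the Laplace estimator (add 1 to each count so that no estimated P(xi|C) is 0). The tiny dataset is illustrative, not the slide's golf example.

from collections import Counter, defaultdict

def train_nbc(instances, attributes, class_attr):
    """Collect class counts and per-class attribute-value counts."""
    class_counts = Counter(inst[class_attr] for inst in instances)
    value_counts = defaultdict(Counter)    # (class, attribute) -> Counter of values
    for inst in instances:
        for attr in attributes:
            value_counts[(inst[class_attr], attr)][inst[attr]] += 1
    values = {attr: set(inst[attr] for inst in instances) for attr in attributes}
    return class_counts, value_counts, values

def predict(model, attributes, query):
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total                # P(C)
        for attr in attributes:
            count = value_counts[(c, attr)][query[attr]]
            # Laplace estimator: add 1 to the count and the number of possible
            # values to the denominator.
            score *= (count + 1) / (n_c + len(values[attr]))
        scores[c] = score                  # proportional to P(C | x)
    return max(scores, key=scores.get)

data = [
    {"Outlook": "sunny", "Wind": "weak",   "Play": "no"},
    {"Outlook": "sunny", "Wind": "strong", "Play": "no"},
    {"Outlook": "rain",  "Wind": "weak",   "Play": "yes"},
    {"Outlook": "rain",  "Wind": "strong", "Play": "no"},
    {"Outlook": "rain",  "Wind": "weak",   "Play": "yes"},
]
model = train_nbc(data, ["Outlook", "Wind"], "Play")
# Predicts "yes" for a rainy, weak-wind day on this toy data.
print(predict(model, ["Outlook", "Wind"], {"Outlook": "rain", "Wind": "weak"}))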

"No Free Lunch"
- If the goal is to obtain good generalization performance, there are no context-independent or usage-independent reasons to favor one learning or classification method over another (http://en.wikipedia.org/wiki/No-Free-Lunch_theorems)
- What does this indicate? Is it easy to choose a good classifier for your application?
- Again, there is no off-the-shelf solution for a reasonably challenging application
- Source: Pattern Classification, 2nd Edition

Ensemble methods
- Motivation: achieve stability of classification
- Model generation: bagging (bootstrap aggregating), boosting
- Model combination: majority voting, meta-learning, stacking (using different types of classifiers); see the sketch below
- Examples (classify-ensemble.ppt)
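
A minimal bagging sketch in Python: models are generated from bootstrap samples of the training data and combined by majority voting. The MajorityClassModel base learner and the toy data are made up for illustration; any learner with the same constructor/predict interface can be plugged in, including a decision tree or an NBC from the earlier slides.

import random
from collections import Counter

class MajorityClassModel:
    """A deliberately weak base learner: always predicts its sample's majority class."""
    def __init__(self, sample):
        self.label = Counter(label for _, label in sample).most_common(1)[0][0]
    def predict(self, x):
        return self.label

def bagging(train_model, data, n_models=10, seed=0):
    """Model generation: train one model per bootstrap sample
    (sampling with replacement, same size as the original data)."""
    rng = random.Random(seed)
    return [train_model([rng.choice(data) for _ in data]) for _ in range(n_models)]

def majority_vote(models, x):
    """Model combination: each model casts one vote; return the winning class."""
    return Counter(m.predict(x) for m in models).most_common(1)[0][0]

data = [((0,), "A"), ((1,), "A"), ((2,), "B"), ((3,), "B"), ((4,), "B")]
models = bagging(MajorityClassModel, data, n_models=5)
print(majority_vote(models, (2,)))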

AdaBoost.M1 (from the Weka book)

Model generation:
- Assign equal weight to each training instance
- For t iterations:
  - Apply the learning algorithm to the weighted dataset and store the resulting model
  - Compute the model's error e on the weighted dataset
  - If e = 0 or e > 0.5: terminate model generation
  - For each instance in the dataset: if classified correctly by the model, multiply the instance's weight by e/(1-e)
  - Normalize the weights of all instances

Classification:
- Assign weight = 0 to all classes
- For each of the t models (or fewer): for the class this model predicts, add -log(e/(1-e)) to this class's weight
- Return the class with the highest weight
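
A direct translation of the pseudocode into Python, as a sketch. The weighted_stump weak learner (a one-feature, weight-aware majority rule) and the toy data are made up for illustration; any learner that can handle instance weights would do.

import math
from collections import defaultdict

def weighted_stump(data, weights):
    """Illustrative weak learner: for each value of the single nominal feature,
    predict the class with the largest total weight."""
    totals = defaultdict(lambda: defaultdict(float))
    for (x, y), w in zip(data, weights):
        totals[x][y] += w
    rule = {x: max(ys, key=ys.get) for x, ys in totals.items()}
    return lambda x: rule.get(x)

def adaboost_m1(data, learn, t=10):
    """Model generation, following the pseudocode above."""
    n = len(data)
    weights = [1.0 / n] * n
    models = []                              # list of (model, error) pairs
    for _ in range(t):
        model = learn(data, weights)
        error = sum(w for (x, y), w in zip(data, weights) if model(x) != y)
        if error == 0 or error > 0.5:
            break
        for i, (x, y) in enumerate(data):
            if model(x) == y:                # shrink the weights of correctly classified instances
                weights[i] *= error / (1 - error)
        total = sum(weights)                 # normalize the weights
        weights = [w / total for w in weights]
        models.append((model, error))
    return models

def adaboost_classify(models, x):
    """Classification: each model votes for its predicted class with weight -log(e / (1 - e))."""
    votes = defaultdict(float)
    for model, error in models:
        votes[model(x)] += -math.log(error / (1 - error))
    return max(votes, key=votes.get) if votes else None

data = [("sunny", "no"), ("sunny", "no"), ("rain", "yes"), ("rain", "yes"), ("rain", "no")]
models = adaboost_m1(data, weighted_stump, t=5)
print(adaboost_classify(models, "rain"))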

Using many different classifiers
- We have learned some basic and often-used classifiers; there are many more out there: regression, discriminant analysis, neural networks, support vector machines
- Pick the most suitable one for an application
- Where do we find all these classifiers? Don't reinvent a wheel that is not as round
- We will likely come back to classification and discuss support vector machines, as requested

Assignment 3
- Pick one of your favorite software packages (feel free to use any at your disposal, as we discussed in class)
- Use the mushroom dataset found at the UC Irvine Machine Learning Repository
- Run a decision tree induction algorithm and report the following:
  - The error measured by resubstitution
  - The error measured by 10-fold cross-validation
  - The confusion matrix for each of the two error measures
- Summarize and report your observations and conjectures, if any
- Submit a hardcopy report on Wednesday 10/4/06
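
One possible way to produce these numbers, sketched with Python, pandas, and scikit-learn (WEKA or any of the packages on the next slide works just as well). It assumes the UCI file has been downloaded locally as agaricus-lepiota.data, with the class (e/p) in column 0 and the 22 nominal attributes in the remaining columns.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, confusion_matrix

df = pd.read_csv("agaricus-lepiota.data", header=None)
y = df[0]                                   # class: edible (e) or poisonous (p)
X = pd.get_dummies(df.drop(columns=[0]))    # one-hot encode the nominal attributes

tree = DecisionTreeClassifier(random_state=0)

# Resubstitution error: train and test on the same data.
tree.fit(X, y)
resub_pred = tree.predict(X)
print("resubstitution error:", 1 - accuracy_score(y, resub_pred))
print(confusion_matrix(y, resub_pred))

# 10-fold cross-validation error.
cv_pred = cross_val_predict(tree, X, y, cv=10)
print("10-fold CV error:", 1 - accuracy_score(y, cv_pred))
print(confusion_matrix(y, cv_pred))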

Some software for demos or for teaching
- C4.5, at the Rulequest site: http://www.rulequest.com/download.html
- Free demo versions of Magnum Opus (for association rule mining) can be downloaded from the Rulequest site
- Alphaminer (you will probably like it): http://www.eti.hku.hk/alphaminer/
- WEKA: http://www.cs.waikato.ac.nz/ml/weka/

Classification via neural networks
[Figure: a perceptron; the weighted inputs are summed and passed through a squashing function.]

What can a perceptron do?
- A neuron as a computing device
- It can separate linearly separable points
- Nice things about a perceptron: distributed representation, local learning, weight adjusting

Linear threshold unit
- Basic concepts: projection, thresholding
- [Figure: input vectors whose projection onto W reaches the threshold evoke output 1; values from the slide: W = [.11 .6], L = [.7 .7], threshold .5]
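
In code, the unit fires exactly when the projection of the input onto W reaches the threshold. A tiny sketch with the numbers recovered from the slide (whether the boundary case uses >= or > is a convention):

def threshold_unit(w, x, theta):
    """Output 1 if the projection (dot product) of x onto w reaches the threshold."""
    projection = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if projection >= theta else 0

W, L, theta = [0.11, 0.6], [0.7, 0.7], 0.5
print(threshold_unit(W, L, theta))   # the projection is 0.497, just below the 0.5 threshold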

E.g. 1: solution region for the AND problem
- Find a weight vector that satisfies all the constraints
- AND problem:
  x1 x2 | output
   0  0 |   0
   0  1 |   0
   1  0 |   0
   1  1 |   1

E.g. 2: solution region for the XOR problem?
  x1 x2 | output
   0  0 |   0
   0  1 |   1
   1  0 |   1
   1  1 |   0

Learning by error reduction
Perceptron learning algorithm:
- If the activation level of the output unit is 1 when it should be 0, reduce the weight on the link to the ith input unit by r*Li, where Li is the ith input value and r is a learning rate
- If the activation level of the output unit is 0 when it should be 1, increase the weight on the link to the ith input unit by r*Li
- Otherwise, do nothing
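
The same rule written out in Python and trained on the AND data from the earlier slide. The learning rate, the number of epochs, and the extra constant-1 input (so the threshold is learned as a weight) are choices made for this sketch, not part of the slide.

def train_perceptron(examples, r=0.1, epochs=20):
    """examples: list of (inputs, target) pairs with 0/1 targets.
    A constant 1 is appended to each input so the last weight plays the role of a threshold."""
    w = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(epochs):
        for inputs, target in examples:
            x = list(inputs) + [1.0]
            output = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if output == 1 and target == 0:      # fired when it should not: reduce weights by r*Li
                w = [wi - r * xi for wi, xi in zip(w, x)]
            elif output == 0 and target == 1:    # silent when it should fire: increase weights by r*Li
                w = [wi + r * xi for wi, xi in zip(w, x)]
            # otherwise, do nothing
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND)
for x, target in AND:
    output = 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) > 0 else 0
    print(x, output, target)   # the learned weights reproduce the AND truth table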

Multi-layer perceptrons
- Using the chain rule, we can back-propagate the errors through a multi-layer perceptron
- [Figure: a network with an input layer, a hidden layer, and an output layer]
- Differences between decision trees and neural networks: speed, accuracy, comprehensibility
- Which one to use? There are many successful applications of both approaches
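
A small numeric sketch of back-propagation: a one-hidden-layer network with sigmoid units, trained by gradient descent on the XOR data from the earlier slide, which a single perceptron cannot separate. The layer size, learning rate, and number of epochs are arbitrary choices for this sketch, and whether it reaches a perfect fit depends on the random initialization.

import math
import random

random.seed(0)
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

n_hidden = 4
# Each weight vector has one extra entry for a bias input fixed at 1.
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
w_out = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
r = 0.5                                     # learning rate

for epoch in range(5000):
    for (x1, x2), target in XOR:
        x = (x1, x2, 1.0)                   # inputs plus bias
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
        ho = h + [1.0]                      # hidden activations plus bias
        out = sigmoid(sum(w * hi for w, hi in zip(w_out, ho)))
        # Backward pass: the chain rule pushes the output error back through each layer.
        delta_out = (out - target) * out * (1 - out)
        delta_hidden = [delta_out * w_out[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
        w_out = [w - r * delta_out * hi for w, hi in zip(w_out, ho)]
        w_hidden = [[w - r * delta_hidden[j] * xi for w, xi in zip(w_hidden[j], x)]
                    for j in range(n_hidden)]

for (x1, x2), target in XOR:
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, (x1, x2, 1.0)))) for ws in w_hidden] + [1.0]
    print((x1, x2), round(sigmoid(sum(w * hi for w, hi in zip(w_out, h))), 2), target)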