A task of induction to find patterns

Classification: A task of induction to find patterns
11/9/2018 CSE 572, CBS572: Data Mining by H. Liu

Outline
Data and its format
Problem of classification
Learning a classifier
Different approaches
Key issues

Data and its format
Data: attribute-value pairs, with or without class
Data type: continuous/discrete, nominal
Data format: flat
If not flat, what should we do?

Sample data

Induction from databases
Inferring knowledge from data
The task of deduction
  Infer information that is a logical consequence of querying a database
  Who conducted this class before?
  Which courses are attended by Mary?
Deductive databases: extending the RDBMS
  RDBMS: relational database management system
  RDBMSs offer simple operators for the deduction of information, such as join
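
As a small illustration of deduction, questions like the ones above can be answered with ordinary relational queries. The sketch below uses Python's built-in sqlite3 module; the table names (teaches, attends), the columns, and every row are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teaches (instructor TEXT, course TEXT, term TEXT);
    CREATE TABLE attends (student TEXT, course TEXT, term TEXT);
    INSERT INTO teaches VALUES ('Smith', 'Data Mining', 'F17'),
                               ('Jones', 'Data Mining', 'F18'),
                               ('Jones', 'Databases',   'F18');
    INSERT INTO attends VALUES ('Mary', 'Data Mining', 'F18'),
                               ('Mary', 'Databases',   'F18'),
                               ('John', 'Databases',   'F18');
""")

# "Which courses are attended by Mary?" is a simple selection.
mary_courses = [row[0] for row in conn.execute(
    "SELECT course FROM attends WHERE student = ? ORDER BY course", ("Mary",))]

# "Who teaches Mary?" needs a join. The answer is a logical consequence
# of the stored facts, not a new generalization about the data.
mary_instructors = [row[0] for row in conn.execute(
    """SELECT DISTINCT t.instructor
       FROM attends AS a JOIN teaches AS t
         ON a.course = t.course AND a.term = t.term
       WHERE a.student = ?
       ORDER BY t.instructor""", ("Mary",))]

print(mary_courses)
print(mary_instructors)
```

Both answers are already implied by the stored rows. Induction, by contrast, would try to state something general that no single row contains.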

Classification
It is one type of induction: data with class labels
Example: if weather is rainy then no golf
Induction is different from deduction, and a DBMS does not support induction
The result of induction is higher-level information or knowledge: general statements about data
There are many approaches; we focus on three here
Other approaches
  Instance-based learning
  Other neural networks
  Concept learning (Version space, Focus, Aq11, …)
  Genetic algorithms
  Reinforcement learning

Different approaches
There exist many techniques
  Decision trees
  Neural networks
  K-nearest neighbours
  Naïve Bayesian classifiers
  Support Vector Machines
  Ensemble methods
  Semi-supervised
  and many more ...

A decision tree
[Figure: a decision tree with Outlook at the root (branches sunny, overcast, rain), internal nodes Humidity (high, normal) and Wind (strong, weak), and Yes/No leaves]
Issues
  How to build such a tree from the data?
  What are the criteria for performance measurement?
    Correctness
    Conciseness
  What are the key components?
    Test
    Stopping criterion
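
To make the structure concrete, here is a minimal sketch of such a tree as nested Python dicts, with a function that follows one path to a leaf. The leaf labels are an assumption (the usual textbook golf tree), since the figure's leaves are not fully recoverable from the transcript; the names TREE, classify and count_leaves are hypothetical.

```python
# An internal node maps one attribute name to {value: subtree};
# a leaf is just the class label (a string).
# Leaf labels are assumed, following the usual textbook golf tree.
TREE = {
    "outlook": {
        "sunny": {"humidity": {"high": "No", "normal": "Yes"}},
        "overcast": "Yes",
        "rain": {"wind": {"strong": "No", "weak": "Yes"}},
    }
}

def classify(tree, instance):
    """Follow the single root-to-leaf path selected by the instance's values."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[instance[attribute]]
    return tree

def count_leaves(tree):
    """Number of leaves, a simple proxy for conciseness."""
    if not isinstance(tree, dict):
        return 1
    (branches,) = tree.values()
    return sum(count_leaves(subtree) for subtree in branches.values())

day = {"outlook": "sunny", "humidity": "normal", "wind": "strong"}
print(classify(TREE, day))
print(count_leaves(TREE))
```

The test at each node is an attribute lookup, and the leaf count gives one rough way to compare the conciseness of two candidate trees.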

Inducing a decision tree
There are many possible trees
  Let’s try it on the golfing data
How to find the most compact one that is consistent with the data (i.e., accurate)?
  Why the most compact? Occam’s razor principle
Issue of efficiency w.r.t. optimality
  How to find an optimal tree?
Is there any need for a quick review of basic probability theory?

Information gain and entropy
Entropy: a measure of the uncertainty (impurity) of the class labels at a node
Information gain: the difference between the entropy of the node before splitting and the weighted entropy of its children after splitting
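
In the usual notation, the two quantities can be written as follows. The symbols are assumptions chosen here, since the slide gives the definitions only in words: S is the set of instances at a node, p_c the fraction of S in class c, A an attribute, and S_v the subset of S where A takes value v.

```latex
H(S) = -\sum_{c} p_c \log_2 p_c
\qquad
IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```

A pure node has H(S) = 0, and a two-class node split evenly has H(S) = 1 bit.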

Building a compact tree
The key to building a decision tree is deciding which attribute to branch on. The heuristic is to choose the attribute with the maximum information gain (IG). Another way to see it: this choice reduces uncertainty as much as possible.
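
The heuristic can be sketched in a few lines of Python. The rows below are the widely used 14-instance weather/golf table from textbooks; that it matches the sample data of this lecture is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

ATTRS = ["outlook", "temperature", "humidity", "wind"]
# Textbook weather data (assumed to match the lecture's sample data).
# Columns: outlook, temperature, humidity, wind, class.
DATA = """
sunny hot high weak No
sunny hot high strong No
overcast hot high weak Yes
rain mild high weak Yes
rain cool normal weak Yes
rain cool normal strong No
overcast cool normal strong Yes
sunny mild high weak No
sunny cool normal weak Yes
rain mild normal weak Yes
sunny mild normal strong Yes
overcast mild high strong Yes
overcast hot normal weak Yes
rain mild high strong No
"""
ROWS = [line.split() for line in DATA.strip().splitlines()]

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, col):
    """Entropy before the split minus the weighted entropy after it."""
    before = entropy([r[-1] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after

gains = {a: information_gain(ROWS, i) for i, a in enumerate(ATTRS)}
best = max(gains, key=gains.get)
for a in ATTRS:
    print(f"{a:12s} IG = {gains[a]:.3f}")
print("split on:", best)
```

Applying the same selection recursively to each branch's subset of rows is the core loop of an ID3-style tree learner.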

Learning a decision tree
Should Outlook be chosen first? If not, which one should?
[Figure: a candidate tree with Outlook at the root (sunny, overcast, rain), Humidity (high, normal) and Wind (strong, weak) below it, and Yes/No leaves]

Issues of Decision Trees
Number of values of an attribute
  Your solution?
When to stop
Data fragmentation problem
  Any solution?
Mixed data types
Scalability

Rules and Tree stumps
Generating rules from decision trees
  One path is a rule
  We can do better. Why?
Tree stumps and 1R
  For each attribute value, determine a default class (# of values = # of rules)
  Calculate the # of errors for each rule
  Find the total # of errors for that attribute’s rule set
  For n attributes, there are n rule sets
  Choose the rule set that has the least # of errors
Let’s go back to our example data and learn a 1R rule
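
The 1R steps above can be sketched as follows. The rows are the textbook 14-instance weather/golf table, assumed here to match the lecture's example data; one_r is a hypothetical name.

```python
from collections import Counter, defaultdict

ATTRS = ["outlook", "temperature", "humidity", "wind"]
# Textbook weather data (assumed to match the lecture's example data).
DATA = """
sunny hot high weak No
sunny hot high strong No
overcast hot high weak Yes
rain mild high weak Yes
rain cool normal weak Yes
rain cool normal strong No
overcast cool normal strong Yes
sunny mild high weak No
sunny cool normal weak Yes
rain mild normal weak Yes
sunny mild normal strong Yes
overcast mild high strong Yes
overcast hot normal weak Yes
rain mild high strong No
"""
ROWS = [line.split() for line in DATA.strip().splitlines()]

def one_r(rows, attrs):
    """Return (attribute, rules, total errors) for the best 1R rule set."""
    best = None
    for col, attr in enumerate(attrs):
        # Class counts for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[col]][row[-1]] += 1
        # Default class per value = its majority class; one rule per value.
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # Errors = instances outside the majority class of their value.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:  # ties keep the earlier attribute
            best = (attr, rules, errors)
    return best

attr, rules, errors = one_r(ROWS, ATTRS)
print(attr, rules, f"{errors}/{len(ROWS)} errors")
```

On this table Outlook and Humidity tie on total errors; the sketch simply keeps the first one it meets, which is one of several reasonable tie-breaking choices.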

K-Nearest Neighbor
One of the most intuitive classification algorithms
An unseen instance’s class is determined by its nearest neighbor
The problem is that it is sensitive to noise
Instead of using one neighbor, we can use k neighbors

K-NN
New problems
  How large should k be?
  Lazy learning: does it learn?
  Large storage
A toy example (noise, majority)
How good is k-NN?
How to compare
  Speed
  Accuracy
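
A toy example of the noise problem and the majority vote, as a minimal sketch. The 2-D points, labels, and the name knn_predict are hypothetical; Euclidean distance is an assumed choice of metric.

```python
from collections import Counter
from math import dist

# Hypothetical 2-D toy data: two clusters plus one mislabeled (noisy) point.
TRAIN = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.8, 1.3), "A"),
    ((3.0, 3.0), "B"), ((3.2, 2.9), "B"), ((2.8, 3.3), "B"),
    ((1.1, 1.1), "B"),  # noise: a "B" label inside the "A" cluster
]

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to the query."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

query = (1.1, 1.05)
print(knn_predict(TRAIN, query, k=1))  # decided by the single noisy neighbor
print(knn_predict(TRAIN, query, k=3))  # the majority outvotes the noise
```

There is no training step at all: the whole training set is stored and every distance is computed at query time, which is where the storage and speed questions come from.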

Naïve Bayes Classifier
This is a direct application of Bayes’ rule
  P(C|X) = P(X|C)P(C)/P(X)
  X: a vector of x1, x2, …, xn
That’s the best classifier we can build
But there are problems
  There are only a limited number of instances
  How to estimate P(x|C)?
  Your suggestions?

NBC (2)
Assume conditional independence between the xi’s
We have P(C|x) ≈ P(x1|C) … P(xi|C) … P(xn|C) P(C)
  What’s missing? Is it really correct? Why?
An example (golfing or not)
How good is it in reality?
  Even when the assumption does not hold …
How to update an NBC when new data stream in?
What if one of the P(xi|C) is 0?
  Laplace estimator: adding 1 to each count
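
A minimal sketch of an NBC with the Laplace estimator for the golfing example. The rows are the textbook 14-instance weather/golf table, assumed to match the lecture's data; train_nbc and predict_nbc are hypothetical names.

```python
from collections import Counter, defaultdict

# Textbook weather data (assumed to match the lecture's example data).
# Columns: outlook, temperature, humidity, wind, class.
DATA = """
sunny hot high weak No
sunny hot high strong No
overcast hot high weak Yes
rain mild high weak Yes
rain cool normal weak Yes
rain cool normal strong No
overcast cool normal strong Yes
sunny mild high weak No
sunny cool normal weak Yes
rain mild normal weak Yes
sunny mild normal strong Yes
overcast mild high strong Yes
overcast hot normal weak Yes
rain mild high strong No
"""
ROWS = [line.split() for line in DATA.strip().splitlines()]

def train_nbc(rows):
    """Collect the counts an NBC needs; new data only adds to these counts."""
    class_counts = Counter(r[-1] for r in rows)
    value_counts = defaultdict(Counter)  # (column, class) -> value counts
    values = defaultdict(set)            # column -> distinct values seen
    for r in rows:
        for col, v in enumerate(r[:-1]):
            value_counts[(col, r[-1])][v] += 1
            values[col].add(v)
    return class_counts, value_counts, values

def predict_nbc(model, x):
    """Return normalized class scores for the attribute tuple x."""
    class_counts, value_counts, values = model
    n = sum(class_counts.values())
    scores = {}
    for c, nc in class_counts.items():
        score = nc / n  # P(C)
        for col, v in enumerate(x):
            # Laplace estimator: add 1 to every count so no factor is 0.
            score *= (value_counts[(col, c)][v] + 1) / (nc + len(values[col]))
        scores[c] = score
    total = sum(scores.values())  # normalizing constant
    return {c: s / total for c, s in scores.items()}

model = train_nbc(ROWS)
posterior = predict_nbc(model, ("sunny", "cool", "high", "strong"))
print(posterior)
```

Because the model is nothing but counts, updating it when new data stream in amounts to incrementing the same counters.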

“No Free Lunch”
If the goal is to obtain good generalization performance, there are no context-independent or usage-independent reasons to favor one learning or classification method over another.
What does it indicate?
Or is it easy to choose a good classifier for your application?
Again, there is no off-the-shelf solution for a reasonably challenging application.
Source: Pattern Classification, 2nd Edition

Ensemble Methods
Motivation
  Achieve the stability of classification
Model generation
  Bagging (Bootstrap Aggregating)
  Boosting
Model combination
  Majority voting
  Meta learning
  Stacking (using different types of classifiers)
Examples (classify-ensemble.ppt)

AdaBoost.M1 (from the Weka Book)
Model generation
  Assign equal weight to each training instance
  For t iterations:
    Apply learning algorithm to weighted dataset, store resulting model
    Compute model’s error e on weighted dataset
    If e = 0 or e > 0.5:
      Terminate model generation
    For each instance in dataset:
      If classified correctly by model:
        Multiply instance’s weight by e/(1-e)
    Normalize weight of all instances
Classification
  Assign weight = 0 to all classes
  For each of the t models (or fewer):
    For the class this model predicts, add -log(e/(1-e)) to this class’s weight
  Return class with highest weight
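
The pseudocode above can be sketched in Python with one-threshold decision stumps as the weak learner. The 1-D data set, the stump learner, and all function names are hypothetical choices for illustration; keeping a lone model when the loop terminates on the first iteration is a simplification added here, not part of the slide.

```python
from math import log

# Hypothetical 1-D toy data; no single threshold separates the classes.
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = ["+", "+", "+", "-", "-", "-", "+", "+", "+", "-"]

def stump_predict(stump, x):
    threshold, left, right = stump
    return left if x < threshold else right

def learn_stump(xs, ys, weights):
    """Weak learner: the one-threshold rule with the lowest weighted error."""
    thresholds = [min(xs) - 0.5] + [x + 0.5 for x in sorted(set(xs))]
    best_err, best_stump = None, None
    for threshold in thresholds:
        for left, right in (("+", "-"), ("-", "+")):
            stump = (threshold, left, right)
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(stump, x) != y)
            if best_err is None or err < best_err:
                best_err, best_stump = err, stump
    return best_stump, best_err

def adaboost_m1(xs, ys, rounds):
    weights = [1 / len(xs)] * len(xs)      # equal weight per instance
    models = []                            # (vote weight, stump) pairs
    for _ in range(rounds):
        stump, e = learn_stump(xs, ys, weights)
        if e == 0 or e > 0.5:              # termination test from the slide
            if not models:                 # simplification: keep a lone model
                models.append((1.0, stump))
            break
        models.append((-log(e / (1 - e)), stump))
        # Shrink the weights of correctly classified instances, then normalize.
        weights = [w * e / (1 - e) if stump_predict(stump, x) == y else w
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return models

def classify(models, x):
    votes = {}
    for vote, stump in models:
        label = stump_predict(stump, x)
        votes[label] = votes.get(label, 0.0) + vote
    return max(votes, key=votes.get)

models = adaboost_m1(X, Y, rounds=3)
predictions = [classify(models, x) for x in X]
print(predictions == Y)
```

On this toy data the best single stump misclassifies three of the ten instances, while the weighted vote of three stumps fits all ten training instances.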

Using many different classifiers
We have learned some basic and often-used classifiers
There are many more out there
  Regression
  Discriminant analysis
  Neural networks
  Support vector machines
Pick the most suitable one for an application
Where to find all these classifiers?
Don’t reinvent the wheel that is not as round
We will likely come back to classification and discuss support vector machines as requested

Assignment 3
Questions about classification and evaluation (deadline 2/14, Wednesday)
Manually create a decision tree for the golfing data (D)
Manually create an NBC for D
How would you create 1-NN for D? Discuss your thoughts.
Run your decision tree algorithm (if you prefer not to implement your own, you can use an available one) on D using 10-fold cross validation (or leave-one-out for this particular D) and 5x2-fold cross validation
Discuss the differences between the above two evaluation methods
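
For the evaluation part, the splitting schemes can be sketched as index lists. This is a minimal sketch, not a required implementation: the function names are hypothetical, n = 14 assumes D is the usual 14-instance golfing table, and "5x2" is read as five repetitions of 2-fold cross validation with different shuffles.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint test folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def train_test_splits(n, k, seed=0):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

n = 14  # assumed size of D
ten_fold = list(train_test_splits(n, 10))
leave_one_out = list(train_test_splits(n, n))  # k = n gives leave-one-out
five_by_two = [split
               for rep in range(5)
               for split in train_test_splits(n, 2, seed=rep)]

print(len(ten_fold), len(leave_one_out), len(five_by_two))
```

With only 14 instances, 10-fold leaves one or two instances in each test fold, which is why leave-one-out is suggested for this particular D.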

Some software for demo or for teaching
C4.5 at the Rulequest site
The free demo versions of Magnum Opus (for association rule mining) can be downloaded from the Rulequest site
Alphaminer (you probably will like it)
WEKA

Classification via Neural Networks
[Figure: a perceptron, with a summation unit (Σ) followed by a squash function]

What can a perceptron do?
Neuron as a computing device
To separate linearly separable points
Nice things about a perceptron
  Distributed representation
  Local learning
  Weight adjusting

Linear threshold unit
Basic concepts: projection, thresholding
W vectors evoke 1
W = [.11 .6]   L = [.7 .7]   .5
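
The two basic concepts can be sketched as a dot product followed by a comparison. Reading the .5 on the slide as the threshold, and firing when the activation is at least the threshold, are both assumptions; linear_threshold_unit is a hypothetical name.

```python
def linear_threshold_unit(weights, inputs, threshold):
    """Project the input onto the weight vector, then threshold the result."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    output = 1 if activation >= threshold else 0
    return output, activation

W = [0.11, 0.6]  # weight vector from the slide
L = [0.7, 0.7]   # input vector from the slide
output, activation = linear_threshold_unit(W, L, threshold=0.5)  # .5 assumed to be the threshold
print(round(activation, 3), output)
```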

E.g. 1: solution region for AND problem
Find a weight vector that satisfies all the constraints
AND problem
x1  x2  output
0   0   0
0   1   0
1   0   0
1   1   1
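
One way to see that the solution region is not empty is to scan a coarse grid of weights and thresholds and keep every combination that satisfies all four constraints. This is a sketch under assumptions: the unit is taken to fire when w1*x1 + w2*x2 > theta, and the grid range and step are arbitrary.

```python
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def fires(w, theta, x):
    """Assumed firing rule: output 1 when the weighted sum exceeds theta."""
    return 1 if w[0] * x[0] + w[1] * x[1] > theta else 0

def satisfies(w, theta, table):
    """True when the unit reproduces every row of the truth table."""
    return all(fires(w, theta, x) == target for x, target in table)

grid = [i / 4 for i in range(-8, 9)]  # -2.0, -1.75, ..., 2.0
solutions = [(w1, w2, theta)
             for w1 in grid for w2 in grid for theta in grid
             if satisfies((w1, w2), theta, AND)]

print(len(solutions) > 0)
print((1.0, 1.0, 1.5) in solutions)
```

Each row of the table is one linear constraint on (w1, w2, theta), and the solution region is the intersection of the four half-spaces.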

E.g. 2: Solution region for XOR problem?
x1  x2  output
0   0   0
0   1   1
1   0   1
1   1   0
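
A short derivation shows why no single linear threshold unit can satisfy these four constraints. It assumes the unit outputs 1 exactly when w_1 x_1 + w_2 x_2 > \theta; the symbols w_1, w_2 and \theta are notation chosen here.

```latex
\begin{aligned}
(0,0) \mapsto 0 &: \quad 0 \le \theta \\
(0,1) \mapsto 1 &: \quad w_2 > \theta \\
(1,0) \mapsto 1 &: \quad w_1 > \theta \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 \le \theta \\[4pt]
\text{Adding the middle two: } & \quad w_1 + w_2 > 2\theta \ge \theta
\quad \text{(since } \theta \ge 0\text{)},
\end{aligned}
```

which contradicts the last constraint, so the solution region for XOR is empty.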

Learning by error reduction
Perceptron learning algorithm
  If the activation level of the output unit is 1 when it should be 0, reduce the weight on the link to the ith input unit by r*Li, where Li is the ith input value and r is a learning rate
  If the activation level of the output unit is 0 when it should be 1, increase the weight on the link to the ith input unit by r*Li
  Otherwise, do nothing
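
The update rule can be sketched as follows and tried on the AND problem. Handling the threshold as an extra weight on a constant input of 1, starting from zero weights, and using r = 1 are assumptions of this sketch; the function names are hypothetical.

```python
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def output(weights, inputs):
    # A constant input of 1 is appended so the last weight acts as a bias
    # (an assumption; the slide does not say how the threshold is handled).
    total = sum(w * x for w, x in zip(weights, inputs + (1,)))
    return 1 if total > 0 else 0

def train_perceptron(examples, rate=1, max_epochs=100):
    weights = [0, 0, 0]
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            if output(weights, inputs) == target:
                continue  # otherwise, do nothing
            mistakes += 1
            # Output 0 but should be 1: increase by r*Li; the reverse: reduce.
            sign = 1 if target == 1 else -1
            weights = [w + sign * rate * x
                       for w, x in zip(weights, inputs + (1,))]
        if mistakes == 0:
            break  # a full pass without errors: all constraints satisfied
    return weights

w_and = train_perceptron(AND)
w_xor = train_perceptron(XOR)
print(all(output(w_and, x) == t for x, t in AND))
print(all(output(w_xor, x) == t for x, t in XOR))
```

On AND the loop stops after a few passes. On XOR it runs until max_epochs without ever classifying all four rows correctly, because no separating weight vector exists.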

Multi-layer perceptrons
Using the chain rule, we can back-propagate the errors for a multi-layer perceptron.
[Figure: a network with an input layer, a hidden layer, and an output layer]
Differences between DT and NN
  Speed
  Accuracy
  Comprehensibility
Which one to use
  Many successful applications of both approaches
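
A minimal sketch of one back-propagation pass through a tiny network (2 inputs, 2 hidden units, 1 output). Sigmoid units, a squared-error loss, the absence of bias terms, and the particular weights and training pair are all assumptions made for illustration.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def forward(params, x):
    """2 inputs -> 2 hidden sigmoid units -> 1 sigmoid output (no biases)."""
    w_hidden, w_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, out

def loss(params, x, target):
    return 0.5 * (forward(params, x)[1] - target) ** 2

def backprop(params, x, target):
    """Gradients of the squared error, layer by layer via the chain rule."""
    w_hidden, w_out = params
    hidden, out = forward(params, x)
    # Output layer: dE/dz_out = (out - target) * sigmoid'(z_out).
    delta_out = (out - target) * out * (1 - out)
    grad_out = [delta_out * h for h in hidden]
    # Hidden layer: the output error is propagated back through w_out.
    delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(w_out, hidden)]
    grad_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_hidden, grad_out

# Arbitrary illustrative weights and one training pair.
params = ([[0.5, -0.4], [0.3, 0.8]], [0.7, -0.6])
x, target = (1.0, 0.5), 1.0
grad_hidden, grad_out = backprop(params, x, target)

# One gradient-descent step should not increase the error on this pair.
rate = 0.5
new_params = (
    [[w - rate * g for w, g in zip(row, grow)]
     for row, grow in zip(params[0], grad_hidden)],
    [w - rate * g for w, g in zip(params[1], grad_out)],
)
print(loss(new_params, x, target) < loss(params, x, target))
```

The hidden-layer gradients reuse delta_out, so the error signal is computed once at the output and passed backwards rather than recomputed for every weight.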
