1 Interacting with Data: Materials from a Course at Princeton University -- Hu Yan.

Presentation transcript:

1 Interacting with Data: Materials from a Course at Princeton University -- Hu Yan

2 Outline
- Introduction to this course
- Introduction to Classification
- The Nearest Neighbor Algorithm
- Decision Tree Algorithm
- Conclusion and future talks

3 What is this course about?
- This course is about data!
  - How to get the most out of data and convert data into knowledge, information, or predictions.
- Examples of datasets
  - Credit cards: every purchase you make is tracked and used to detect fraud, for marketing purposes, and to make predictions.
  - Security cameras: used for tracking (for example, to enforce fines) or for finding criminals via facial recognition software.
  - Articles: articles are indexed in multiple databases, which makes it possible to organize them by topic or even track the evolution of topics over time.
- There are all kinds of data
  - Text, images, transaction records, etc.

4 Tasks
- Make predictions or classifications
  - Classify customers by whether or not they will switch companies
- Cluster or organize data
  - Cluster articles by topic
  - Different from classification: the classes are not known ahead of time
- Find “simple” descriptions of complex objects
  - Find a simple description of faces
- Identify what is typical and what is an outlier
  - Identify purchases that are typical or unusual for a given customer

5 Perspective
- Related fields
  - Pattern recognition (from the 60s) is primarily concerned with images
  - Machine learning (from the 80s) was a natural outgrowth of Artificial Intelligence (AI)
  - Data mining (from the 90s) arose to deal with vast amounts of data and to discover “interesting patterns”
- This course is largely a mixture of statistics, machine learning, and data mining
- Look at interacting with data:
  - Classification, clustering, regression, and dimensionality reduction

6 Outline
- Introduction to this course
- Introduction to Classification
- The Nearest Neighbor Algorithm
- Decision Tree Algorithm
- Conclusion and future talks

7 Introduction to Classification
- Classifying objects from a data set based on a certain characteristic.
  - Binary classification: positive or negative.
- Classification learning algorithm
  - Input: labeled data sets
  - Output: a classifier (predicts the label of unclassified input examples)

8 Example
Classification criterion: any integer greater than 196 or less than 47 is labeled negative; all other integers are labeled positive.

9 Example
A decimal integer is positive if the second and sixth most significant bits in its binary representation are set; it is negative otherwise.
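
The two example criteria above can be written as small labeling functions. This is only an illustrative sketch: the function names are hypothetical, and for the bit rule it assumes non-negative integers and that "most significant bit" is counted from the leading 1 of the unpadded binary representation, which the slide does not specify.

```python
def label_by_range(n: int) -> str:
    """Slide 8 rule: negative if n > 196 or n < 47, positive otherwise."""
    return "negative" if n > 196 or n < 47 else "positive"


def label_by_bits(n: int) -> str:
    """Slide 9 rule. Assumption: n >= 0 and bit 1 is the leading set bit."""
    bits = bin(n)[2:]  # binary digits, most significant first, no padding
    second_set = len(bits) >= 2 and bits[1] == "1"
    sixth_set = len(bits) >= 6 and bits[5] == "1"
    return "positive" if second_set and sixth_set else "negative"


print(label_by_range(100), label_by_range(200), label_by_range(46))
print(label_by_bits(0b110001), label_by_bits(0b100001))
```

A learning algorithm never sees these rules directly; it only sees labeled integers and has to produce a classifier that agrees with them.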

10 Outline
- Introduction to this course
- Introduction to Classification
- The Nearest Neighbor Algorithm
- Decision Tree Algorithm
- Conclusion and future talks

11 The Nearest Neighbor Algorithm
- Training
  - There are $m$ training examples.
  - Each training example is of the form $(x_i, y_i)$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{v_1, \dots, v_s\}$.
  - Store all the training examples.
- Testing
  - Given a test point $x$, predict $y_i$, where $x_i$ is the training example closest to $x$.
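
A minimal sketch of the training and testing steps above. "Training" is just keeping the list of examples; the Euclidean distance, the function name, and the toy points are assumptions made for illustration, since the slide only says "closest".

```python
import math


def nearest_neighbor_predict(train, x):
    """Return the label y_i of the stored example (x_i, y_i) closest to x."""
    # Assumption: "closest" means smallest Euclidean distance.
    _, closest_label = min(train, key=lambda ex: math.dist(ex[0], x))
    return closest_label


# "Training" only stores the examples (toy data, invented for this sketch).
train = [((1.0, 1.0), "+"), ((2.0, 1.5), "+"), ((6.0, 5.0), "-"), ((7.0, 6.5), "-")]

print(nearest_neighbor_predict(train, (1.5, 1.0)))
print(nearest_neighbor_predict(train, (6.5, 6.0)))
```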

12 The Nearest Neighbor Algorithm
- Is a kind of instance-based learning method, also referred to as a “lazy” learning method.
  - Simply store the training examples and delay processing until a new instance must be classified.
  - By contrast, some (eager) methods construct a general, explicit description of the target function when the training examples are provided.
  - Advantage: instead of estimating the target function once for the entire space, estimate it locally and differently for each new instance.
  - Disadvantage: the cost of classifying new instances can be high.

13 k-Nearest Neighbor Algorithm

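14 k-Nearest Neighbor Algorithm
[Figure: positive and negative training examples in a two-dimensional space, with a query point $x_q$.]
- 1-nearest neighbor classifies $x_q$ as positive (+)
- 5-nearest neighbor classifies $x_q$ as negative (-)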
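
The contrast between 1-nearest and 5-nearest neighbor can be reproduced with a small majority-vote sketch. The points below are invented so that the single closest example is positive while most of the five closest are negative; they are not the data from the original figure, and Euclidean distance is again an assumption.

```python
import math
from collections import Counter


def knn_predict(train, x_q, k):
    """Majority vote among the k training examples closest to x_q."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], x_q))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Invented points: one very close "+", three slightly farther "-".
train = [
    ((0.5, 0.0), "+"),
    ((1.0, 0.0), "-"), ((0.0, 1.0), "-"), ((-1.0, 0.0), "-"),
    ((0.0, -1.2), "+"),
    ((5.0, 5.0), "+"),
]
x_q = (0.0, 0.0)

print(knn_predict(train, x_q, k=1))
print(knn_predict(train, x_q, k=5))
```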

15 k-Nearest Neighbor Algorithm
- Never forms an explicit general hypothesis $\hat{f}$ regarding the target function $f$
- Simply computes the classification of each new query instance as needed
- What is the implicit general function?

16 Distance-weighted k-Nearest Neighbor Algorithm
- Obvious refinement
  - Weight the contribution of each of the k neighbors according to its distance to the query point.
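
One way to sketch this refinement is to let each neighbor vote with weight 1/d^2, where d is its distance to the query point. The inverse-square weighting is an assumption (the slide does not name a weighting function), as are the function name and the toy points.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, x_q, k):
    """Vote among the k nearest neighbors, each weighted by 1 / distance^2."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], x_q))[:k]
    scores = defaultdict(float)
    for x_i, y_i in neighbors:
        d = math.dist(x_i, x_q)
        if d == 0.0:
            return y_i  # query coincides with a stored example: use its label
        scores[y_i] += 1.0 / d ** 2
    return max(scores, key=scores.get)


# Invented points: a plain 5-NN majority vote would say "-" (three of five),
# but the very close "+" example dominates once votes are distance-weighted.
train = [
    ((0.5, 0.0), "+"),
    ((1.0, 0.0), "-"), ((0.0, 1.0), "-"), ((-1.0, 0.0), "-"),
    ((0.0, -1.2), "+"),
    ((5.0, 5.0), "+"),
]

print(weighted_knn_predict(train, (0.0, 0.0), k=5))
```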

17 Curse of dimensionality
- Imagine instances described by 20 attributes, of which only 2 are relevant to the target function
- Curse of dimensionality: nearest neighbor is easily misled in high-dimensional spaces
- One approach
  - Stretch the $j$th axis by weight $z_j$, where $z_1, \dots, z_n$ are chosen to minimize prediction error
  - Use cross-validation to automatically choose the weights $z_1, \dots, z_n$
  - Note that setting $z_j$ to zero eliminates this dimension altogether
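
A rough sketch of the axis-stretching idea, under several assumptions: the weights are picked from a small hand-written candidate list, the error is measured by leave-one-out cross-validation of 1-nearest neighbor, and the toy data have one relevant attribute and one noisy attribute. None of these choices come from the slides.

```python
import math


def stretched_distance(a, b, z):
    """Euclidean distance after stretching axis j by weight z[j]."""
    return math.sqrt(sum((z_j * (a_j - b_j)) ** 2 for a_j, b_j, z_j in zip(a, b, z)))


def loo_error(train, z):
    """Leave-one-out error rate of 1-nearest neighbor under weights z."""
    errors = 0
    for i, (x_i, y_i) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        _, y_hat = min(rest, key=lambda ex: stretched_distance(ex[0], x_i, z))
        errors += y_hat != y_i
    return errors / len(train)


# Invented data: the first attribute determines the class, the second is noise.
train = [((0.0, 9.0), "-"), ((0.2, 1.0), "-"), ((1.0, 9.2), "+"), ((1.2, 1.1), "+")]

candidates = [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical weight choices
best = min(candidates, key=lambda z: loo_error(train, z))
print(best, loo_error(train, best))
```

Here the selected weights set the second weight to zero, which removes the noisy dimension from the distance computation.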

18 Outline
- Introduction to this course
- Introduction to Classification
- The Nearest Neighbor Algorithm
- Decision Tree Algorithm
  - Decision tree representation
  - ID3 learning algorithm
  - Entropy, Information gain
  - Overfitting
- Conclusion and future talks

19 Decision tree for PlayTennis

20 Decision tree representation
- Instances are represented by attribute-value pairs
- Each internal node tests an attribute
- Each branch corresponds to an attribute value
- Each leaf node assigns a classification
- In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances.

21 Building a Decision tree
- ID3 (1986), C4.5 (1993)
- A top-down, greedy search through the space of possible decision trees.
- Main loop:
  - A ← the best decision attribute for the next node;
  - Assign A as the decision attribute for the node;
  - For each value of A, create a new branch of the node;
  - Sort the training examples to the leaf nodes;
  - If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes.
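
The main loop above can be sketched recursively. This is a simplified illustration rather than the full ID3 or C4.5 algorithm: "best attribute" is taken to mean highest information gain, the tree is a nested dict, ties are broken by attribute order, and the data are the five example days (D1 to D5) listed on slide 32 of this transcript.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(examples, attr, target):
    """Reduction in entropy of the target from splitting on attr."""
    remainder = 0.0
    for value in {ex[attr] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy([ex[target] for ex in examples]) - remainder


def id3(examples, attributes, target):
    """Return a label (leaf) or {attribute: {value: subtree}}."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:  # perfectly classified: stop
        return labels[0]
    if not attributes:  # nothing left to test: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted({ex[best] for ex in examples}):  # one branch per value
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree


rows = [
    ("Cool", "High", "Weak", "No"), ("Cool", "Normal", "Weak", "No"),
    ("Hot", "High", "Strong", "No"), ("Hot", "Normal", "Weak", "Yes"),
    ("Cool", "Normal", "Strong", "Yes"),
]
examples = [dict(zip(("Temp", "Humidity", "Wind", "Play"), r)) for r in rows]
tree = id3(examples, ["Temp", "Humidity", "Wind"], "Play")
print(tree)
```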

22 Entropy
- $S$ is a sample of training examples
- $p_+$ is the proportion of positive examples in $S$
- $p_-$ is the proportion of negative examples in $S$
- Entropy measures the impurity of $S$: $Entropy(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-$
- $Entropy([9+, 5-]) = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) = 0.940$
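
The worked value above can be checked numerically. A minimal sketch for the two-class case, where the function name is hypothetical and 0 * log2(0) is treated as 0 by convention:

```python
import math


def binary_entropy(pos: int, neg: int) -> float:
    """Entropy in bits of a sample with pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # skip empty classes: 0 * log2(0) is taken to be 0
            p = count / total
            result -= p * math.log2(p)
    return result


print(f"{binary_entropy(9, 5):.3f}")  # 0.940
```

The entropy is 0 when all examples belong to one class and 1 when the two classes are equally frequent.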

23 Entropy function
Entropy(S) = expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (under the optimal, shortest-length code)

24 Information Gain

25 Information Gain
- S is a collection of training example days described by attributes including Wind, which has the values Weak and Strong.
- S contains 14 examples, [9+, 5-].
- 6 of the positive and 2 of the negative examples have Wind = Weak, and the remainder have Wind = Strong.
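
With these counts, the gain for Wind can be computed directly. The sketch assumes the standard definition Gain(S, A) = Entropy(S) - sum over values v of (|S_v| / |S|) * Entropy(S_v); the helper names are hypothetical.

```python
import math


def binary_entropy(pos: int, neg: int) -> float:
    """Entropy in bits of a sample with pos positive and neg negative examples."""
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)


def gain(parent, partitions):
    """Information gain of splitting parent (pos, neg) into (pos, neg) partitions."""
    total = sum(parent)
    remainder = sum((p + n) / total * binary_entropy(p, n) for p, n in partitions)
    return binary_entropy(*parent) - remainder


# S = [9+, 5-]; Wind = Weak gives [6+, 2-]; Wind = Strong gets the remainder [3+, 3-].
wind_gain = gain((9, 5), [(6, 2), (3, 3)])
print(f"{wind_gain:.3f}")  # 0.048
```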

26 Training Examples

27 Which Attribute Is the Best Classifier
- Information gain is the measure used by ID3 to select the best attribute at each step in growing the tree.
- Example: the information gain of two attributes, Humidity and Wind, is computed to determine which is better for classifying the training examples.

28 An Illustrative Example Gain(S,Outlook) = Gain(S,Humidity) = Gain(S,Wind) = Gain(S,Temp) = Which attributes should be tested here?

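29 Selecting the Next Attribute
- $S_{sunny}$ = {D1, D2, D8, D9, D11}
- $Gain(S_{sunny}, Humidity) = 0.970$
- $Gain(S_{sunny}, Temp) =$
- $Gain(S_{sunny}, Wind) =$
[Figure: partially built tree. Example days grouped by branch: {1,2,8,9,11}, {3,7,12,13}, {4,5,6,10,14}; the first group is split further into {9,11} and {1,2,8}.]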

30 Hypothesis Space Search by ID3
ID3 searches through the space of possible decision trees, from simple to increasingly complex, guided by the information gain Gain(S, A).

31 Hypothesis Space Search by ID3
- ID3 searches a complete hypothesis space, but it searches that space incompletely.
- Outputs a single hypothesis.
- No backtracking, so it converges to a locally optimal solution (which may not be globally optimal).
- Uses statistical properties of the examples, which makes it robust to noisy data.
- Inductive bias: preference for short trees and for those with high-information-gain attributes near the root.

32
Day  Temp  Humidity  Wind    Play
D1   Cool  High      Weak    No
D2   Cool  Normal    Weak    No
D3   Hot   High      Strong  No
D4   Hot   Normal    Weak    Yes
D5   Cool  Normal    Strong  Yes
[Figure: two candidate decision trees for these examples, built from tests on Humidity, Temp, and Wind.]
Class counts shown in the figure: [2+,3-] [1+,2-] [1+,1-] [2+,3-] [2+,1-] [1+,1-]
Entropy(S) = -(2/5) log2(2/5) - (3/5) log2(3/5) =
Gain(S, humid) =
Gain(S, wind) =
Gain(S, Temp) = 0.805

33 Overfitting in Decision Tree Learning
- Consider the error of hypothesis $h$ over
  - the training data: $error_{train}(h)$
  - the entire distribution $D$ of data: $error_D(h)$
- Hypothesis $h \in H$ overfits the training data if there is an alternative hypothesis $h' \in H$ such that $error_{train}(h) < error_{train}(h')$ and $error_D(h) > error_D(h')$

34 Overfitting in Decision Tree Learning

35 Avoiding Overfitting
- How can we avoid overfitting?
  - Stop growing the tree before it reaches the point where it perfectly classifies the training data
  - Grow the full tree, then post-prune (widely used)
- How to select the best tree during pruning?
  - Split the data into a training set and a validation set
  - Build the decision tree over the training data
  - Measure performance over the separate validation data set
- Two ways of pruning:
  - Reduced-error pruning
  - Rule post-pruning

36 Reduced Error Pruning
1. Split the data into a training set and a validation set
2. Build the tree over the training data
3. For each decision node:
  - Evaluate the impact on the validation set of pruning that node
  - Remove the one that most improves validation set accuracy
- Pruning a decision node means:
  - removing the subtree rooted at that node, making it a leaf node
  - assigning it the most common classification of the training examples affiliated with that node
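
A compact sketch of step 3, assuming the tree has already been built and that each decision node remembers the labels of the training examples that reached it. The Node class, the helper names, and the tiny hand-built tree and validation set are all hypothetical. The loop repeats until no single pruning strictly improves validation accuracy, which is one reasonable reading of the procedure rather than the only one.

```python
from collections import Counter


class Node:
    def __init__(self, attr=None, children=None, label=None, train_labels=()):
        self.attr = attr                        # attribute tested (None for a leaf)
        self.children = children or {}          # attribute value -> child Node
        self.label = label                      # class label when this is a leaf
        self.train_labels = list(train_labels)  # training labels reaching this node


def majority(node):
    return Counter(node.train_labels).most_common(1)[0][0]


def classify(node, example):
    while node.attr is not None:
        child = node.children.get(example[node.attr])
        if child is None:  # unseen attribute value: fall back to the majority label
            return majority(node)
        node = child
    return node.label


def accuracy(root, data, target):
    return sum(classify(root, ex) == ex[target] for ex in data) / len(data)


def decision_nodes(node):
    if node.attr is not None:
        yield node
        for child in node.children.values():
            yield from decision_nodes(child)


def reduced_error_prune(root, validation, target):
    while True:
        base = accuracy(root, validation, target)
        best_gain, best_node = 0.0, None
        for node in list(decision_nodes(root)):
            saved = (node.attr, node.children, node.label)
            node.attr, node.children, node.label = None, {}, majority(node)  # trial prune
            improvement = accuracy(root, validation, target) - base
            node.attr, node.children, node.label = saved                     # undo
            if improvement > best_gain:
                best_gain, best_node = improvement, node
        if best_node is None:  # no pruning helps any more
            return root
        best_node.attr, best_node.children, best_node.label = None, {}, majority(best_node)


wind = Node("Wind", {"Weak": Node(label="No"), "Strong": Node(label="Yes")},
            train_labels=["No", "No", "Yes"])
root = Node("Outlook", {"Sunny": wind, "Rain": Node(label="Yes")},
            train_labels=["No", "No", "Yes", "Yes", "Yes"])
validation = [
    {"Outlook": "Sunny", "Wind": "Strong", "Play": "No"},
    {"Outlook": "Sunny", "Wind": "Weak", "Play": "No"},
    {"Outlook": "Rain", "Wind": "Weak", "Play": "Yes"},
]
before = accuracy(root, validation, "Play")
reduced_error_prune(root, validation, "Play")
print(before, accuracy(root, validation, "Play"))
```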

37 Effect of Reduced-Error Pruning

38 Rule Post Pruning
1. Convert the tree to an equivalent set of rules (if-then expressions)
2. Prune (generalize) each rule by removing any preconditions whose removal improves its estimated accuracy
3. Sort the pruned rules by their estimated accuracy (the sorted rules can then be used to classify subsequent instances)
IF (Outlook = Sunny) and (Humidity = High) THEN PlayTennis = No
IF (Outlook = Sunny) and (Humidity = Normal) THEN PlayTennis = Yes
….
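
Step 1 can be sketched as one rule per root-to-leaf path. The nested-dict tree encoding and the function names are assumptions made for illustration, and the tree below contains only the Sunny branch because that is the only part spelled out in the rules above.

```python
def tree_to_rules(tree, conditions=()):
    """Yield (conditions, label) for every root-to-leaf path of a nested-dict tree."""
    if not isinstance(tree, dict):  # leaf: the accumulated tests form one rule
        yield conditions, tree
        return
    (attribute, branches), = tree.items()  # each internal node tests one attribute
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + ((attribute, value),))


def format_rule(conditions, label, target="PlayTennis"):
    tests = " and ".join(f"({attr} = {value})" for attr, value in conditions)
    return f"IF {tests} THEN {target} = {label}"


# Only the Sunny branch, as given in the example rules.
tree = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}}}}

rules = [format_rule(conds, label) for conds, label in tree_to_rules(tree)]
for rule in rules:
    print(rule)
```

Steps 2 and 3 would then operate on each (conditions, label) pair independently, which is what makes rule-level pruning more flexible than pruning whole subtrees.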

39 Conclusion
- Interacting with data
  - How to get the most out of data and convert data into knowledge, information, or predictions
  - Classification, clustering, regression, and dimensionality reduction
- Classification
  - Categorize objects into particular classes based on their attributes
- The Nearest Neighbor Algorithm
- Decision Tree Algorithm

40 Contents
- Classification
  - K-nearest-neighbor algorithm, Decision trees
  - Computational learning theory
  - Boosting, Support vector machines
- Clustering
  - K-means clustering, Agglomerative clustering
- Graphical Models (a marriage of probability theory and graph theory)
  - Naive Bayes classification, EM (Expectation-Maximization) algorithm
- Regression (predict a real-valued quantity based on observed data)
  - Linear regression, Logistic regression
- Dimensionality Reduction (reduce the representation of data)
  - PCA (Principal Components Analysis), Factor analysis
- Advanced Topics and Applications

41 Thank you !