Machine Learning in Practice Lecture 23 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute
In the home stretch….
Announcements
Questions?
Quiz (second to last!)
Homework (last!)
Discretization and Time Series Transformations
Data Cleansing
About the Quiz….
[Figure: the data split into five parts (1–5); each fold holds out a different part as the test set]
For fold i in {1..5}:
Test set is test_i
Select one of the remaining sets to be validation_i
Concatenate the remaining sets into part_train_i
For each algorithm in {X1-1, X1-2, X2-1, X2-2}: train the algorithm on part_train_i and test on validation_i
Now concatenate all but test_i into train_i
Now train the best algorithm on train_i and test on test_i to get the performance for fold i
Average the performance over the 5 folds
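The selection-within-cross-validation procedure above can be sketched in Python. Everything concrete here is my assumption, not part of the quiz: scikit-learn as the library, a synthetic dataset, and four decision trees of different depths standing in for the candidates X1-1 through X2-2.

```python
# Sketch of the quiz procedure: inside each outer fold, pick the best of
# four candidate models on a validation set, then retrain the winner on
# all non-test data and score it on the held-out fold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Four hypothetical candidates standing in for X1-1, X1-2, X2-1, X2-2
candidates = {name: DecisionTreeClassifier(max_depth=d, random_state=0)
              for name, d in [("X1-1", 1), ("X1-2", 2), ("X2-1", 4), ("X2-2", 8)]}

outer = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, test_idx in outer.split(X):
    # Carve validation_i out of the four non-test parts; the rest is part_train_i
    part_train, val = train_test_split(train_idx, test_size=0.25, random_state=0)
    # Select the best candidate by validation accuracy
    best = max(candidates,
               key=lambda n: candidates[n].fit(X[part_train], y[part_train])
                                          .score(X[val], y[val]))
    # Retrain the winner on everything except test_i, score on test_i
    model = candidates[best].fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

print(np.mean(fold_scores))  # average performance over the 5 folds
```

The key point the quiz is testing: the test fold never influences which algorithm is selected, only the final performance estimate.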
Discretization and Time Series Transforms
Discretization
Connection between discretization and clustering: finding natural breaks in your data
Connection between discretization and feature selection: you can think of each interval as a feature or a feature value
Discretizing before classification limits options for breaks
If you attempt to discretize and it fails to find a split that would have been useful, it has the effect of eliminating a feature
Discretization and Feature Selection
Adding breaks is like creating new attribute values
Each attribute value is potentially a new binary attribute
Inserting boundaries is like a forward-selection approach to attribute selection
Discretization and Feature Selection
Removing boundaries is like a backwards-elimination approach to attribute selection
Discretization
Discretization sometimes improves performance even if you don’t strictly need nominal attributes
Breaks in good places bias the classifier toward learning a good model
Decision tree learners do discretization locally when they are selecting an attribute to branch on
There are advantages and disadvantages to local discretization
Layers
Think of building a model in layers
You can build a complex shape by combining lots of simple shapes
We’ll come back to this idea when we talk about ensemble methods in the next lecture!
You could build a complex model all at once, or you could build it in a series of simple stages: discretization, feature selection, model building
Unsupervised Discretization
Equal intervals (equal-interval binning)
E.g., for temperature: breaks every 10 degrees
E.g., for weight: breaks every 5 pounds
Equal frequencies (equal-frequency binning)
E.g., groupings of about 10 instances
E.g., groupings of about 100 instances
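The two unsupervised strategies can be contrasted in a few lines of numpy; the temperature values below are invented for illustration.

```python
# Equal-interval binning places breaks at evenly spaced values; equal-frequency
# binning places breaks at quantiles so each bin holds roughly the same count.
import numpy as np

temps = np.array([64, 65, 68, 69, 70, 71, 72, 72, 75, 80, 81, 83, 85, 96])

# Equal-interval: 3 bins of equal width across the observed range
width_edges = np.linspace(temps.min(), temps.max(), num=4)
width_bins = np.digitize(temps, width_edges[1:-1])

# Equal-frequency: 3 bins with breaks at the 1/3 and 2/3 quantiles
freq_edges = np.quantile(temps, [0, 1 / 3, 2 / 3, 1])
freq_bins = np.digitize(temps, freq_edges[1:-1])

print(np.bincount(width_bins, minlength=3))  # bin sizes can be very uneven
print(np.bincount(freq_bins, minlength=3))   # bin sizes roughly equal
```

Note how the outlier at 96 stretches the equal-width bins but barely affects the quantile-based breaks.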
Supervised Discretization
Supervised splitting: find the best split point by generating all possible splits and using attribute selection to pick one
Keep splitting until you don’t get value anymore
It’s a little like building a decision tree and then throwing the tree away, but keeping the grouping of instances at the leaf nodes
Entropy based: rank splits using information gain
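The entropy-based idea, selecting a single split, can be sketched as follows; the values and class labels are made up for the example.

```python
# Toy entropy-based split selection: try every candidate cut point and keep
# the one with the highest information gain.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_split(values, labels):
    order = np.argsort(values)
    values, labels = values[order], labels[order]
    base = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(values)):
        if values[i] == values[i - 1]:
            continue  # no boundary between equal values
        cut = (values[i] + values[i - 1]) / 2
        left, right = labels[:i], labels[i:]
        # Weighted entropy remaining after the split
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(labels)
        if base - remainder > best_gain:
            best_gain, best_cut = base - remainder, cut
    return best_cut, best_gain

vals = np.array([64, 65, 68, 69, 70, 71, 72, 75, 80, 85])
labs = np.array(["no", "no", "no", "yes", "yes", "yes", "yes", "yes", "yes", "yes"])
print(best_split(vals, labs))  # cut at 68.5 cleanly separates the classes
```

Applied recursively to each resulting interval, with a stopping criterion, this is the "build a tree and keep only the leaf groupings" procedure described above.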
Built-In Supervised Discretization
NaiveBayes can be used with or without supervised discretization
The SpeakerID data set has numeric attributes that are not normally distributed
Without discretization: kappa = .16
With discretization: kappa = .34
Doing Discretization in Weka
Note: there is also an unsupervised discretization filter
attributeIndices: which attributes you want to discretize
Target class is set inside the classifier
Doing Discretization in Weka
The last two options are for the stopping criterion
It is not clear how it evaluates the goodness of each split; this is not well documented
Example for Time Series Transforms
Amount of CO2 in a room is related to how many people were in the room N minutes ago
Let’s say you take a measurement every N/2 minutes
Before you apply a numeric prediction model to predict CO2 from number of people, first copy number of people forward 2 instances
[Figure: a sequence of instances, each with NumPeople and AmountCO2 attributes; NumPeople is copied forward two instances, leaving a missing value (?) where no earlier instance exists]
Time Series Transforms
Fill in with the delta, or fill in with a previous value
instanceRange: you specify how many instances backward or forward to look (negative means backwards)
fillWithMissing: the default is to ignore the first and last instances; if true, missing is used as the value for those attributes
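Outside Weka, the same "copy an attribute forward two instances" transform is a one-liner in pandas; the column names and values below are invented for illustration, and the lag direction mirrors (but is not an exact replica of) the Weka filter's instanceRange option.

```python
# Shift NumPeople forward 2 instances so each row's CO2 lines up with the
# occupancy measured two readings (N minutes) earlier.
import pandas as pd

df = pd.DataFrame({
    "NumPeople": [3, 5, 8, 8, 2],
    "AmountCO2": [400, 410, 450, 520, 560],
})

# Look 2 instances back (like a negative instanceRange in Weka terms)
df["NumPeople_lag2"] = df["NumPeople"].shift(2)

# With fillWithMissing-style behavior the first rows hold NaN;
# the alternative is to drop them before training
df_clean = df.dropna()
print(df_clean)
```

The first two rows have no instance two steps back, which is exactly the boundary case the fillWithMissing option exists to handle.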
Data Cleansing
Data Cleansing: Removing Outliers
Noticing outliers is easier when you look at the overall distribution of your data
Especially when using human judgment: you know what doesn’t look right
It’s harder to tell automatically whether the problem is that your data doesn’t fit the model or that you have outliers
Eliminating Noise with Decision Tree Learning
Train a tree
Eliminate misclassified examples
Train on the clean subset of the data
You will get a simpler tree that generalizes better
You can do this iteratively
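The train/eliminate/retrain loop above can be sketched with scikit-learn; the dataset, the injected label noise, the depth limit, and the fixed three iterations are all assumptions for the sake of the illustration, not part of the lecture's recipe.

```python
# Iterative noise elimination: train a (depth-limited) tree, drop the
# training examples it misclassifies, and retrain on the cleaner subset.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data with ~10% of labels flipped to simulate noise
X, y = make_classification(n_samples=300, flip_y=0.1, random_state=1)

X_clean, y_clean = X, y
for _ in range(3):  # "you can do this iteratively"
    tree = DecisionTreeClassifier(max_depth=3, random_state=1)
    tree.fit(X_clean, y_clean)
    keep = tree.predict(X_clean) == y_clean  # eliminate misclassified examples
    X_clean, y_clean = X_clean[keep], y_clean[keep]

print(len(y), "->", len(y_clean))
```

The depth limit matters: an unpruned tree can memorize the noise and misclassify nothing, in which case the loop removes nothing at all.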
Data Cleansing: Removing Outliers
One way of identifying outliers is to look for examples that several algorithms misclassify
Algorithms moving down different optimization paths are unlikely to get trapped in the same local minimum
You can compensate for outliers by adjusting the learning algorithm
E.g., using absolute distance rather than squared distance for a regression problem
This doesn’t remove outliers, but it reduces their effect
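Why absolute distance helps can be seen in the simplest possible regression, fitting a single constant; the numbers below are made up to make the contrast obvious.

```python
# Under squared loss the best constant predictor is the mean, which an
# outlier drags far from the bulk of the data; under absolute loss it is
# the median, which barely moves.
import numpy as np

y = np.array([2.0, 2.1, 1.9, 2.0, 2.2, 50.0])  # last value is an outlier

mean_fit = y.mean()        # minimizes the sum of squared errors
median_fit = np.median(y)  # minimizes the sum of absolute errors

print(mean_fit, median_fit)  # mean pulled above 10; median stays near 2
```

The same intuition carries over to full regression models: squared loss penalizes a single large residual quadratically, so one outlier can dominate the fit.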
Take Home Message
Discretization is related to feature selection and clustering
Similar alternative search strategies
Think about learning a model in stages
Getting back to the idea of natural breaks in your data
It is difficult to tell with only one model whether a data point is noisy or the model is overly simplistic