Machine Learning in Practice Lecture 23
Carolyn Penstein Rosé, Language Technologies Institute / Human-Computer Interaction Institute
In the home stretch…
Announcements
Questions?
Quiz (second to last!)
Homework (last!)
Discretization and Time Series Transformations
Data Cleansing
About the Quiz…
(Figure: the data divided into five sets, numbered 1 to 5, with Fold 1 shown.)
For fold i in {1..5}:
  Test set is test_i
  Select one of the remaining sets to be validation_i
  Concatenate the remaining sets into part_train_i
  For each algorithm in {X1-1, X1-2, X2-1, X2-2}:
    Train the algorithm on part_train_i and test on validation_i
  Now concatenate all but test_i into train_i
  Now train the best algorithm (the one that did best on validation_i) on train_i and test on test_i to get the performance for fold i
Average the performance over the 5 folds
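The quiz procedure is nested cross-validation: the inner validation split chooses an algorithm, and the outer test fold measures the chosen one. Below is a minimal pure-Python sketch, assuming 1-D numeric data and using k-nearest-neighbour classifiers with k in {1, 3, 5, 7} as hypothetical stand-ins for the four algorithms X1-1, X1-2, X2-1, X2-2. The function names are illustrative, not from any library.

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training points closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of `test` that k-NN trained on `train` labels correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def nested_cv(data, candidates, n_folds=5, seed=0):
    """Outer folds estimate performance; an inner validation set picks a candidate per fold."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    scores = []
    for i in range(n_folds):
        test_i = folds[i]
        others = [j for j in range(n_folds) if j != i]
        v = rng.choice(others)
        validation_i = folds[v]
        part_train_i = [p for j in others if j != v for p in folds[j]]
        # Model selection sees only part_train_i and validation_i, never test_i.
        best = max(candidates, key=lambda k: accuracy(part_train_i, validation_i, k))
        train_i = [p for j in others for p in folds[j]]  # all but test_i
        scores.append(accuracy(train_i, test_i, best))
    return sum(scores) / n_folds

# Hypothetical toy data: label 0 below 25, label 1 from 25 up.
data = [(x, 0 if x < 25 else 1) for x in range(50)]
print(nested_cv(data, candidates=[1, 3, 5, 7]))
```

The key design point is that the test fold is held out from the choice of algorithm; choosing the algorithm by its test_i score would make the averaged estimate optimistic.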
Discretization and Time Series Transforms
Discretization
Connection between discretization and clustering
  Finding natural breaks in your data
Connection between discretization and feature selection
  You can think of each interval as a feature or a feature value
  Discretizing before classification limits the classifier's options for where to place breaks
  If discretization fails to find a split that would have been useful, the effect is the same as eliminating a feature
Discretization and Feature Selection
Adding breaks is like creating new attribute values
Each attribute value is potentially a new binary attribute
Inserting boundaries is like a forward selection approach to attribute selection
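To make the binary-attribute view concrete, here is a small sketch (the function name `interval_indicators` is hypothetical) that encodes a numeric value as one 0/1 attribute per interval. Inserting one more break adds one more attribute, which is why adding boundaries resembles forward selection.

```python
from bisect import bisect_right

def interval_indicators(value, breaks):
    """One 0/1 attribute per interval; exactly one is 1 (the interval holding value)."""
    breaks = sorted(breaks)
    index = bisect_right(breaks, value)  # values equal to a break fall in the upper interval
    return [1 if i == index else 0 for i in range(len(breaks) + 1)]

print(interval_indicators(72, [65, 80]))      # [0, 1, 0]
print(interval_indicators(72, [65, 70, 80]))  # [0, 0, 1, 0]: one more break, one more attribute
```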
Discretization and Feature Selection
Removing boundaries is like a backwards elimination approach to attribute selection
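A sketch of the backward direction, under assumptions of our own: start with a candidate boundary between every pair of adjacent values and repeatedly delete the boundary whose removal raises the weighted class entropy the least, stopping when every remaining removal would cost more than a small tolerance. The entropy score and the names `weighted_entropy` and `remove_boundaries` are illustrative choices, not a specific published method.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def weighted_entropy(values, labels, breaks):
    """Class entropy of each interval, weighted by the interval's share of instances."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[bisect_right(breaks, v)].append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def remove_boundaries(values, labels, breaks, tolerance=1e-9):
    """Backward elimination: drop boundaries while doing so costs (almost) nothing."""
    breaks = sorted(breaks)
    current = weighted_entropy(values, labels, breaks)
    while breaks:
        # Cost of deleting boundary i = increase in weighted entropy.
        costs = [(weighted_entropy(values, labels, breaks[:i] + breaks[i + 1:]) - current, i)
                 for i in range(len(breaks))]
        cost, i = min(costs)
        if cost > tolerance:
            break  # every remaining boundary separates different classes
        current += cost
        del breaks[i]
    return breaks

values = list(range(1, 9))
labels = ["a"] * 4 + ["b"] * 4  # hypothetical class labels
all_breaks = [v + 0.5 for v in range(1, 8)]
print(remove_boundaries(values, labels, all_breaks))  # [4.5]
```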
Discretization
Discretization sometimes improves performance even if you don't strictly need nominal attributes
  Breaks in good places bias the classifier toward learning a good model
Decision tree learners do discretization locally when they select an attribute to branch on
  There are advantages and disadvantages to local discretization
Layers
Think of building a model in layers
  You can build a complex shape by combining lots of simple shapes
  We'll come back to this idea when we talk about ensemble methods in the next lecture!
You could build a complex model all at once
Or you could build a complex model in a series of simple stages: discretization, feature selection, model building
Unsupervised Discretization
Equal intervals (equal interval binning)
  E.g., for temperature: breaks every 10 degrees
  E.g., for weight: breaks every 5 pounds
Equal frequencies (equal frequency binning)
  E.g., groupings of about 10 instances
  E.g., groupings of about 100 instances
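Both unsupervised schemes can be sketched in a few lines of Python. The function names and the temperature readings are hypothetical; equal-interval breaks are placed at multiples of the width, and equal-frequency breaks sit midway between neighbouring sorted values.

```python
import math
from bisect import bisect_right

def equal_interval_breaks(values, width):
    """Breaks at each multiple of `width` strictly inside the data range."""
    lo, hi = min(values), max(values)
    first = math.floor(lo / width) + 1
    last = math.ceil(hi / width) - 1
    return [k * width for k in range(first, last + 1)]

def equal_frequency_breaks(values, per_bin):
    """Breaks after every `per_bin` sorted values, placed midway between neighbours."""
    ordered = sorted(values)
    return [(ordered[i - 1] + ordered[i]) / 2 for i in range(per_bin, len(ordered), per_bin)]

def bin_index(value, breaks):
    """Interval number of `value` (values equal to a break go to the upper interval)."""
    return bisect_right(breaks, value)

temperatures = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]  # hypothetical readings
print(equal_interval_breaks(temperatures, 10))  # [70, 80]
print(equal_frequency_breaks(temperatures, 4))  # [69.5, 77.5]
```

Neither function looks at the class, which is what makes them unsupervised. With many tied values, equal-frequency groups only come out "about" the requested size, as the slide says.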
Supervised Discretization
Supervised splitting: find the best split point by generating all possible splits and using an attribute selection measure to pick one
Keep splitting until additional splits no longer add value
It's a little like building a decision tree and then throwing the tree away, but keeping the grouping of instances at the leaf nodes
Entropy based: rank splits using information gain
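A minimal sketch of entropy-based supervised splitting in Python. Assumption: the stopping rule is a plain minimum-gain threshold (`min_gain`, a hypothetical parameter); this sketch does not reproduce any particular tool's stopping criterion, and all names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_split(pairs):
    """Best (information gain, cut point) over all cuts of sorted (value, label) pairs."""
    labels = [y for _, y in pairs]
    base, n = entropy(labels), len(pairs)
    best_gain, best_cut = 0.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between identical values
        left, right = labels[:i], labels[i:]
        gain = base - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        if gain > best_gain:
            best_gain, best_cut = gain, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_gain, best_cut

def supervised_breaks(values, labels, min_gain=0.1):
    """Recursively split on the best cut until the gain falls below min_gain."""
    def split(pairs):
        gain, cut = best_split(pairs)
        if cut is None or gain < min_gain:
            return []
        left = [p for p in pairs if p[0] <= cut]
        right = [p for p in pairs if p[0] > cut]
        return split(left) + [cut] + split(right)
    return split(sorted(zip(values, labels)))

print(supervised_breaks(range(1, 10), list("aaabbbccc")))  # [3.5, 6.5]
print(supervised_breaks(range(1, 10), list("aaaaaaaaa")))  # []
```

The second call shows the "eliminating a feature" effect: when no split gains anything, the attribute ends up as a single interval and carries no information.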
Built-In Supervised Discretization
NaiveBayes can be used with or without supervised discretization
The SpeakerID data set has numeric attributes that are not normally distributed
  Without discretization: kappa = .16
  With discretization: kappa = .34
Doing Discretization in Weka
Note: there is also an unsupervised discretization filter
attributeIndices: which attributes you want to discretize
The target class is set inside the classifier
Doing Discretization in Weka
The last two options are for the stopping criterion
It is not clear how the filter evaluates the goodness of each split
Not well documented
Example for Time Series Transforms
The amount of CO2 in a room is related to how many people were in the room N minutes ago
Let's say you take a measurement every N/2 minutes
Before you apply a numeric prediction model to predict CO2 from number of people, first copy number of people forward 2 instances (2 × N/2 = N minutes)
(Figure: instances labeled 1, 2, 3, 4 and ?, each with NumPeople and AmountCO2 attributes.)
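A sketch of why the shift matters, using made-up readings and a least-squares line as a stand-in numeric prediction model. The names `copy_forward` and `fit_line` are hypothetical. The synthetic CO2 values depend exactly on the head count two readings earlier, so the fit only recovers that relationship after the copy.

```python
def copy_forward(column, k):
    """Shift a column forward by k instances; the first k instances get None (missing)."""
    return [None] * k + column[:len(column) - k]

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x, skipping instances where x is missing."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None]
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical readings taken every N/2 minutes: CO2 depends on the head count two
# readings (N minutes) earlier; the first two CO2 values are a baseline.
num_people = [2, 5, 3, 8, 6, 1, 4, 7]
co2 = [400, 400] + [400 + 20 * p for p in num_people[:-2]]

print(fit_line(num_people, co2))                   # unshifted: slope far from 20
print(fit_line(copy_forward(num_people, 2), co2))  # shifted: about (400.0, 20.0)
```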
Time Series Transforms
Fill in with the delta, or fill in with a previous value
instanceRange: you specify how many instances backward or forward to look (negative means backward)
fillWithMissing: by default, instances at the beginning or end that have no instance to look at (e.g., the first instance when looking one back) are ignored; if true, they are kept and missing is used as the value for the attributes
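The two behaviours can be mimicked in plain Python. This is not Weka's implementation: the function `time_series_transform` and its parameters `instance_range`, `delta`, and `fill_with_missing` are hypothetical names chosen to mirror the options above, rows are dicts, and `None` stands for a missing value.

```python
def time_series_transform(rows, attr, instance_range, delta=False, fill_with_missing=False):
    """Replace rows[i][attr] with the value instance_range steps away (negative = earlier),
    or with current minus that value when delta=True."""
    out = []
    for i, row in enumerate(rows):
        j = i + instance_range
        if 0 <= j < len(rows):
            other = rows[j][attr]
            value = row[attr] - other if delta else other
        elif fill_with_missing:
            value = None  # keep the instance, mark the attribute as missing
        else:
            continue  # drop instances that have no instance to look at
        new_row = dict(row)
        new_row[attr] = value
        out.append(new_row)
    return out

readings = [{"people": 2, "co2": 400}, {"people": 5, "co2": 440}, {"people": 3, "co2": 500}]
print(time_series_transform(readings, "people", -1))
print(time_series_transform(readings, "people", -1, delta=True, fill_with_missing=True))
```

The first call drops the first reading, since there is no earlier head count to copy; the second keeps it with `None` and replaces the other head counts with their change from the previous reading.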
Data Cleansing
Data Cleansing: Removing Outliers
Noticing outliers is easier when you look at the overall distribution of your data
  Especially when using human judgment: you know what doesn't look right
It's harder to tell automatically whether the problem is that your data doesn't fit the model or that you have outliers
Eliminating Noise with Decision Tree Learning
Train a tree
Eliminate misclassified examples
Train on the clean subset of the data
You will often get a simpler tree that generalizes better
You can do this iteratively
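The filtering loop can be sketched as follows. To stay self-contained, the sketch swaps in a decision stump (a one-level tree on a single numeric attribute with 0/1 labels) for a full decision tree learner; `train_stump`, `filter_noise`, and `max_rounds` are hypothetical names, and the data is made up with one deliberately flipped label.

```python
def train_stump(data):
    """One-level decision tree on (x, label) pairs with 0/1 labels."""
    xs = sorted({x for x, _ in data})
    thresholds = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None
    for t in thresholds:
        for left_label in (0, 1):
            errors = sum((left_label if x <= t else 1 - left_label) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, t, left_label)
    _, t, left_label = best
    return lambda x: left_label if x <= t else 1 - left_label

def filter_noise(data, train, max_rounds=5):
    """Repeatedly train, drop training examples the model misclassifies, and retrain."""
    for _ in range(max_rounds):
        model = train(data)
        clean = [(x, y) for x, y in data if model(x) == y]
        if len(clean) == len(data):
            break  # nothing left to remove
        data = clean
    return train(data), data

data = [(x, 0 if x < 5 else 1) for x in range(10)]
data[2] = (2, 1)  # hypothetical label noise
model, clean = filter_noise(data, train_stump)
print(len(clean), model(2))  # 9 0
```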
Data Cleansing: Removing Outliers
One way of identifying outliers is to look for examples that several algorithms misclassify
  Algorithms moving down different optimization paths are unlikely to get trapped in the same local minima
You can compensate for outliers by adjusting the learning algorithm
  E.g., using absolute distance rather than squared distance for a regression problem
  This doesn't remove outliers, but it reduces their effect
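Both ideas can be sketched briefly. For the first, two hypothetical stand-in learners (1-nearest-neighbour and nearest class mean, on 1-D data) each predict every example with that example held out, and an example is flagged when at least `min_votes` learners get it wrong. For the second, a standard fact: the constant that minimizes squared error is the mean, and the one that minimizes absolute error is a median, which an outlier barely moves. All names and data are illustrative.

```python
from statistics import mean, median

def train_1nn(data):
    """1-nearest-neighbour classifier on (x, label) pairs."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def train_nearest_mean(data):
    """Predict the class whose mean x is closest."""
    means = {c: mean(x for x, y in data if y == c) for c in {y for _, y in data}}
    return lambda x: min(means, key=lambda c: abs(means[c] - x))

def flag_by_consensus(data, learners, min_votes):
    """Flag examples that at least `min_votes` learners misclassify when held out."""
    flagged = []
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # leave the example out while training
        votes = sum(train(rest)(x) != y for train in learners)
        if votes >= min_votes:
            flagged.append((x, y))
    return flagged

data = [(x, 0 if x < 5 else 1) for x in range(10)]
data[2] = (2, 1)  # hypothetical mislabelled example
print(flag_by_consensus(data, [train_1nn, train_nearest_mean], min_votes=2))  # [(2, 1)]

# Squared vs absolute distance for a single constant prediction.
targets = [9, 10, 10, 11, 12, 100]  # hypothetical regression targets, one outlier
print(mean(targets), median(targets))  # the mean is dragged toward 100; the median is 10.5
```

Requiring agreement between learners is what separates a likely outlier from a point that one particular model simply cannot fit: here each learner alone misclassifies one or two other boundary points, but only the flipped label fools both.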
Take Home Message
Discretization is related to feature selection and clustering
  Similar alternative search strategies
Think about learning a model in stages
  Getting back to the idea of natural breaks in your data
With only one model, it is difficult to tell whether a data point is noisy or the model is overly simplistic