1
Machine Learning in Practice
Lecture 25
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
2
Plan for the day
Announcements
Questions?
Final quiz today!!!!
Next Lecture: Homework 9 due, midterm 2 review
Strategies for Efficient Experimentation
Call center application example
Active learning and semi-supervised learning
3
Weka Helpful Hint of the Day
6
Strategies for Efficient Experimentation
7
Adversarial Learning
People constantly “gaming the system”: behavior needs to adjust over time
Examples: automatic essay grading, spam detection, computer network security, Computer Assisted Passenger Prescreening
Solution: incremental learning
Issue: eventually you have a huge amount of data
8
Personalization
Similar issues: massive amounts of data collected over time
Example types of data: click stream data, web pages visited
Why keep collecting data? Interests change over time; personal situation changes over time
Connection between learning models and summarization
9
Incremental Learning
Problem with cross validation: if spam changes over time, more recent data is more relevant than old data
Instance based learning is a good solution
If you can’t do incremental learning, you’ll need a quick learning scheme that allows you to periodically retrain (see the sketch below)
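For concreteness, a minimal sketch of the incremental idea, assuming scikit-learn and synthetic count data rather than Weka or a real spam corpus; it illustrates the general technique, not the course tools:

```python
# Illustrative sketch (not from the lecture): incremental Naive Bayes with
# scikit-learn's partial_fit, so new batches of (e.g. spam) data can be folded
# in without retraining on the full history.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

classes = np.array([0, 1])          # 0 = ham, 1 = spam
model = MultinomialNB()

def arriving_batches():
    """Stand-in for a stream of feature vectors; replace with real data."""
    rng = np.random.default_rng(0)
    for _ in range(5):
        X = rng.integers(0, 3, size=(100, 20))   # e.g. word counts
        y = rng.integers(0, 2, size=100)
        yield X, y

for X_batch, y_batch in arriving_batches():
    # classes must be passed on the first call so the model knows all labels
    model.partial_fit(X_batch, y_batch, classes=classes)
```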
10
Learning from Massive Data Sets
Critical issues: space and time
One strategy is to carefully select an algorithm that can handle a large amount of data
If space is an issue:
Naïve Bayes is a good choice – only one instance needs to be kept in main memory at a time
Instance based learning: sophisticated caching and indexing mechanisms can make this workable
11
Learning from Massive Data Sets
If time is an issue:
Again, Naïve Bayes is a good choice – training time is linear in both the number of instances and the number of attributes
Decision trees are linear with respect to the number of attributes
Instance based learning can be parallelized
Bagging and stacking (but not boosting) can easily be parallelized (see the sketch below)
Understanding check: Why do you think that is so?
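As a hedged illustration of why bagging parallelizes so easily (each member model is trained on an independent bootstrap sample, so no model has to wait for another), a scikit-learn sketch with synthetic data:

```python
# Illustrative sketch: bagging trains its member models independently, so the
# work can be spread across cores; n_jobs=-1 uses all available processors.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=5000, n_features=40, random_state=0)

bagger = BaggingClassifier(
    n_estimators=50,   # 50 independent bootstrap models (decision trees by default)
    n_jobs=-1,         # train them in parallel
    random_state=0,
)
bagger.fit(X, y)
```

Boosting cannot be parallelized this way because each round depends on the errors of the previous one.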
12
Other options…
Sub-sampling: just work with a carefully selected subset of the data
Note how performance varies with the amount of training data (see the learning curve sketch below)
Tur, G., Hakkani-Tür, D., & Schapire, R. E. (2005). Combining active and semi-supervised learning for spoken language understanding. Speech Communication, 45, 171–186.
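A learning curve makes “how performance varies with the amount of training data” concrete. The sketch below is my own illustration with scikit-learn and synthetic data, not the experiments from the Tur et al. paper:

```python
# Illustrative sketch: report accuracy as a function of training-set size to
# judge whether sub-sampling would cost much performance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=3000, n_features=30, random_state=0)

sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% ... 100% of the training folds
    cv=5,
)
for n, score in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n:5d} training examples -> mean CV accuracy {score:.3f}")
```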
13
Strategies
Find computers you can leave things running on
Plan your strategy ahead of time
Keep good notes so you don’t have to rerun experiments
Make the most out of time-consuming tasks so you can avoid doing them frequently (e.g., error analysis, feature extraction from large texts)
14
Strategies
Do quick tests to get “in the ballpark”:
Use part of your data
Only use 5-fold cross validation
Don’t tune
Test subsets of features – throw out ones that don’t have any predictive value (see the sketch below)
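A hedged sketch of such a quick test, assuming scikit-learn and synthetic data: subsample, keep only features with apparent predictive value, and run a plain 5-fold cross validation with default parameters:

```python
# Illustrative sketch of a quick "in the ballpark" test: work on a random
# subsample, drop features with little predictive value, no parameter tuning.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20000, n_features=100, random_state=0)

# Use only a quarter of the data for the quick test.
X_small, _, y_small, _ = train_test_split(X, y, train_size=0.25, random_state=0)

pipeline = make_pipeline(
    SelectKBest(f_classif, k=20),      # keep the 20 most predictive features
    DecisionTreeClassifier(random_state=0),
)
scores = cross_val_score(pipeline, X_small, y_small, cv=5)
print(f"Quick 5-fold estimate: {scores.mean():.3f} +/- {scores.std():.3f}")
```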
15
Strategies
Sample broadly and shallowly at first: general feature space design, linear versus non-linear, what class of algorithm?
Push to see why some algorithms work better on your data than others
Eliminate parameters that it doesn’t make sense to tune
You can use CVParameterSelection to check whether performance actually improves when you tune a parameter (see the sketch below)
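CVParameterSelection is Weka’s meta-classifier for this check. Outside Weka, the same question (“does tuning this parameter actually help?”) can be sketched with scikit-learn’s GridSearchCV; the classifier, grid, and data below are illustrative assumptions, not the Weka tool:

```python
# Illustrative sketch (scikit-learn analogue of Weka's CVParameterSelection):
# compare a default classifier against one whose parameter is tuned by an
# inner cross validation, to see whether tuning is worth the extra runtime.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

default_model = LogisticRegression(max_iter=1000)
tuned_model = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.01, 0.1, 1, 10, 100]},
    cv=5,
)

default_score = cross_val_score(default_model, X, y, cv=5).mean()
tuned_score = cross_val_score(tuned_model, X, y, cv=5).mean()   # nested CV
print(f"default C: {default_score:.3f}   tuned C: {tuned_score:.3f}")
```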
16
Call Center Routing
17
Tur et al. paper
Call routing: a replacement for typical routing by touch tone interfaces
System prompt: “How may I help you?”
User: “I would like to buy a laptop”
Requires speech recognition and language understanding
Requires a lot of hand labeling
Tur, G., Hakkani-Tür, D., & Schapire, R. E. (2005). Combining active and semi-supervised learning for spoken language understanding. Speech Communication, 45, 171–186.
18
Tur et al. paper
Goal is to reduce labeling effort by being strategic about which examples are labeled
Evaluated in terms of the reduction in the number of labeled examples needed to achieve the same performance as random selection
19
Tur et al. paper
Goal: limit the amount of data that needs to be labeled by hand
Active learning: reduce human labeling by being strategic about which examples to label – on each iteration, select the examples the algorithm is least confident about (see the sketch below)
Note: since you get confidence scores back from TagHelper tools, you can do something like active learning with it
Semi-supervised learning: you can make use of unlabeled data by adding very confidently classified examples to your set of labeled data
Note that TagHelper tools also has a form of semi-supervised learning built in – look at the Advanced tab (self-training)
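A minimal sketch of uncertainty-based selection, assuming scikit-learn, a synthetic data set, and a plain probabilistic classifier; it illustrates the general idea rather than TagHelper’s or the Tur et al. system’s implementation:

```python
# Illustrative sketch of uncertainty sampling: ask a human to label the
# unlabeled examples the current model is least confident about.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:100] = True                      # pretend only 100 examples are labeled

model = LogisticRegression(max_iter=1000)
model.fit(X[labeled], y[labeled])

proba = model.predict_proba(X[~labeled])
confidence = proba.max(axis=1)            # confidence in the predicted class
query_idx = np.argsort(confidence)[:20]   # 20 least confident examples
print("Indices (into the unlabeled pool) to send to the annotator:", query_idx)
```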
20
Active Learning: Committee Based Methods
Related idea to bagging
Similar to bagging: you have multiple versions of your labeled data and you train a model on each version
You apply the models to the unlabeled data
The examples with the highest “vote entropy”, or disagreement among the committee, are the ones with the least certainty – those are the ones you should have someone label (see the sketch below)
Problem: hard to distinguish outliers from informative examples
Another problem: the class priors in the selectively labeled data end up different from those in the initial set
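A hedged sketch of committee-based selection with vote entropy, again assuming scikit-learn and synthetic data; the committee size and query budget are arbitrary choices for illustration:

```python
# Illustrative sketch of committee-based selection: train several models on
# bootstrap copies of the labeled data and rank unlabeled examples by vote
# entropy (how much the committee disagrees).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_lab, y_lab, X_unlab = X[:200], y[:200], X[200:]

rng = np.random.default_rng(0)
committee = []
for _ in range(7):                                   # 7 committee members
    idx = rng.integers(0, len(y_lab), len(y_lab))    # bootstrap resample
    committee.append(DecisionTreeClassifier().fit(X_lab[idx], y_lab[idx]))

votes = np.stack([m.predict(X_unlab) for m in committee])            # members x examples
vote_shares = np.stack([(votes == c).mean(axis=0) for c in (0, 1)])  # per-class vote share
entropy = -(vote_shares * np.log(vote_shares + 1e-12)).sum(axis=0)   # vote entropy
query_idx = np.argsort(entropy)[::-1][:20]           # most disagreement first
print("Ask a human to label (indices into the unlabeled pool):", query_idx)
```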
23
NOTE about Usefulness
If only a very small amount of labeled data is available, then adding in even very confident predictions will add noise
If a huge amount of labeled data is available, adding in predictions on unlabeled data won’t be that useful
Take home message: semi-supervised learning is only useful if you have a “medium” amount of labeled data
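For concreteness, a self-training sketch under the same assumptions as the earlier examples (scikit-learn, synthetic data, an arbitrary 0.95 confidence cutoff); it shows the general mechanism, not TagHelper’s Advanced-tab implementation:

```python
# Illustrative sketch of self-training: fold very confident predictions on
# unlabeled data back into the labeled set and retrain.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=3000, n_features=20, random_state=1)
X_lab, y_lab, X_unlab = X[:300], y[:300], X[300:]

model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
for _ in range(3):                                   # a few self-training rounds
    proba = model.predict_proba(X_unlab)
    confident = proba.max(axis=1) > 0.95             # very confident predictions only
    X_lab = np.vstack([X_lab, X_unlab[confident]])
    y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
    X_unlab = X_unlab[~confident]
    model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
```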
24
Modified Semi-Supervised Learning
26
Combining Active and Semi-Supervised Learning
28
Project Advice
29
What Makes a Good Paper
A good example is the Arguello & Rosé 2006 paper in the Course Documents folder on Blackboard!
Argue why the problem is important, what your big result is, and what you learned about your application area
Ideally there should be something clever about your approach
Explain prior work, where previous approaches “went wrong”, and where your approach fits
Best to have a baseline from previous work – you might need a broad definition of prior work (there is *always* relevant prior work)
You need to show respect for and awareness of what came before you and where you fit into the larger community
Here prior work will be in multiple areas: your own application area, core machine learning work, and possibly computational linguistics work
30
More about What Makes a Good Paper
Summarize your approach
Here you may give a lengthy analysis of your data, how you decided to code it, how you determined that your coding was reliable, and why your coding was meaningful
Describe your experimentation process – I need to be able to see that you are aware of and used proper methodology
What was your final best approach?
31
More about What Makes a Good Paper
Justify your evaluation metrics, corpus, gold standard, and baselines
You need to say enough to give the reader confidence that you know what you’re doing and that your evaluation is valid
If possible, evaluate both the baseline approach and your own on the same data set used in the baseline’s original publication, and possibly on an additional one
32
More about What Makes a Good Paper
Present your results
Make sure you evaluate your work both in terms of typical metrics against a baseline and in a task-specific manner: which errors affect users more? How would your classifier be used?
Discuss your error analysis
What did you learn about your domain from doing this? What are your current directions (based on your error analysis)?
33
What Numbers Should You Report?
Evaluation methodology is a matter of taste
You should make your evaluation comparable to other recent published results
Evaluation standards change over time!!
How you do your evaluation is a big part of what determines the quality of your work
High quality work explains the reasons for its evaluation methodology
34
Debates Over Evaluation Metrics
F-measure hides trade-offs between precision and recall
F-measure, precision, and recall obscure agreement by chance – less of an issue if everyone is comparing on the same data set
F-measure ignores the false alarm rate
False alarm rate: out of all the false alarms you could have raised, how many you actually raised
In some cases there are not many errors of commission you can make, so F-measure makes the classifier look better than it is (see the sketch below)
F-measure might be overly harsh for some tasks – in topic segmentation, it punishes errors that are close just as much as errors that are way off
Graphs show more of the trade-offs between classifiers – especially useful when classifiers can be tuned
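A small worked sketch with made-up confusion-matrix counts (not numbers from the lecture) showing how F-measure and the false alarm rate can tell different stories when there are few chances for errors of commission:

```python
# Illustrative sketch with hypothetical counts: precision/recall/F-measure
# versus the false alarm rate (false positives out of all actual negatives).
tp, fp, fn, tn = 90, 8, 10, 12   # made-up confusion-matrix counts

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)
false_alarm_rate = fp / (fp + tn)     # of the negatives, how many were falsely flagged

print(f"precision {precision:.2f}  recall {recall:.2f}  F {f_measure:.2f}")
print(f"false alarm rate {false_alarm_rate:.2f}")   # 0.40: worse than F suggests
```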
35
Reporting Results
Remember to use the measures that are standard for the problem you are working on
Don’t just give a dump from the Weka interface – summarize the comparison in a table
Discuss the significance of the difference between approaches
*Always* have a baseline for comparison
Report the significance of the difference (this shows that you tested on enough data) – see the sketch below
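One common way to report significance is a paired t-test over matched cross-validation folds. The sketch below is an illustration with scikit-learn, SciPy, and synthetic data; the Naïve Bayes baseline and logistic regression “proposed” model are assumptions purely for demonstration (and the usual caveat applies that fold scores are not fully independent):

```python
# Illustrative sketch: paired t-test over the same 10 cross-validation folds
# for a baseline and a proposed approach.
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

baseline_scores = cross_val_score(GaussianNB(), X, y, cv=10)
proposed_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)

t_stat, p_value = ttest_rel(proposed_scores, baseline_scores)
print(f"baseline {baseline_scores.mean():.3f}  proposed {proposed_scores.mean():.3f}")
print(f"paired t-test p-value: {p_value:.3f}")
```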
36
Last Minute Project Tips
Don’t forget that process is more important than product!
Talk about what you learned from doing this