1
Machine Learning in Practice
Lecture 25
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
2
Plan for the day
Announcements
Questions?
Final quiz today!!!!
Next Lecture: Homework 9 due, midterm 2 review
Strategies for Efficient Experimentation
Call center application example
Active learning and semi-supervised learning
3
Weka Helpful Hint of the Day
6
Strategies for Efficient Experimentation
7
Adversarial Learning
People constantly “gaming the system”: behavior needs to adjust over time
Examples: automatic essay grading, spam detection, computer network security, Computer Assisted Passenger Prescreening
Solution: incremental learning
Issue: eventually you have a huge amount of data
8
Personalization
Similar issues: massive amounts of data collected over time
Example types of data: click stream data, web pages visited
Why keep collecting data? Interests change over time; personal situation changes over time
Connection between learning models and summarization
9
Incremental Learning
Problem with cross validation: if spam changes over time, more recent data is more relevant than old data
Instance based learning is a good solution
If you can’t do incremental learning, you’ll need a quick learning scheme that allows you to periodically retrain (see the sketch below)
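For concreteness, a minimal sketch of the incremental idea, assuming scikit-learn and synthetic count data rather than Weka or a real spam corpus; it illustrates the general technique, not the course tools:

```python
# Illustrative sketch (not from the lecture): incremental Naive Bayes with
# scikit-learn's partial_fit, so new batches of (e.g. spam) data can be folded
# in without retraining on the full history.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

classes = np.array([0, 1])          # 0 = ham, 1 = spam
model = MultinomialNB()

def arriving_batches():
    """Stand-in for a stream of feature vectors; replace with real data."""
    rng = np.random.default_rng(0)
    for _ in range(5):
        X = rng.integers(0, 3, size=(100, 20))   # e.g. word counts
        y = rng.integers(0, 2, size=100)
        yield X, y

for X_batch, y_batch in arriving_batches():
    # classes must be passed on the first call so the model knows all labels
    model.partial_fit(X_batch, y_batch, classes=classes)
```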
10
Learning from Massive Data Sets
Critical issues: space and time
One strategy is to carefully select an algorithm that can handle a large amount of data
If space is an issue:
Naïve Bayes is a good choice – only one instance needs to be kept in main memory at a time
Instance based learning: sophisticated caching and indexing mechanisms can make this workable
11
Learning from Massive Data Sets
If time is an issue:
Again, Naïve Bayes is a good choice – training time is linear in both the number of instances and the number of attributes
Decision trees are linear with respect to the number of attributes
Instance based learning can be parallelized
Bagging and stacking (but not boosting) can easily be parallelized (see the sketch below)
Understanding check: Why do you think that is so?
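As a hedged illustration of why bagging parallelizes so easily (each member model is trained on an independent bootstrap sample, so no model has to wait for another), a scikit-learn sketch with synthetic data:

```python
# Illustrative sketch: bagging trains its member models independently, so the
# work can be spread across cores; n_jobs=-1 uses all available processors.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=5000, n_features=40, random_state=0)

bagger = BaggingClassifier(
    n_estimators=50,   # 50 independent bootstrap models (decision trees by default)
    n_jobs=-1,         # train them in parallel
    random_state=0,
)
bagger.fit(X, y)
```

Boosting cannot be parallelized this way because each round depends on the errors of the previous one.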
12
Other options…
Sub-sampling: just work with a carefully selected subset of the data
Note how performance varies with the amount of training data (see the learning curve sketch below)
Tur, G., Hakkani-Tür, D., & Schapire, R. E. (2005). Combining active and semi-supervised learning for spoken language understanding. Speech Communication, 45, 171–186.
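A learning curve makes “how performance varies with the amount of training data” concrete. The sketch below is my own illustration with scikit-learn and synthetic data, not the experiments from the Tur et al. paper:

```python
# Illustrative sketch: report accuracy as a function of training-set size to
# judge whether sub-sampling would cost much performance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=3000, n_features=30, random_state=0)

sizes, train_scores, test_scores = learning_curve(
    GaussianNB(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% ... 100% of the training folds
    cv=5,
)
for n, score in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n:5d} training examples -> mean CV accuracy {score:.3f}")
```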
13
Strategies
Find computers you can leave things running on
Plan your strategy ahead of time
Keep good notes so you don’t have to rerun experiments
Make the most out of time-consuming tasks so you can avoid doing them frequently (e.g., error analysis, feature extraction from large texts)
14
Strategies
Do quick tests to get “in the ballpark”:
Use part of your data
Only use 5-fold cross validation
Don’t tune
Test subsets of features – throw out ones that don’t have any predictive value (see the sketch below)
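A hedged sketch of such a quick test, assuming scikit-learn and synthetic data: subsample, keep only features with apparent predictive value, and run a plain 5-fold cross validation with default parameters:

```python
# Illustrative sketch of a quick "in the ballpark" test: work on a random
# subsample, drop features with little predictive value, no parameter tuning.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20000, n_features=100, random_state=0)

# Use only a quarter of the data for the quick test.
X_small, _, y_small, _ = train_test_split(X, y, train_size=0.25, random_state=0)

pipeline = make_pipeline(
    SelectKBest(f_classif, k=20),      # keep the 20 most predictive features
    DecisionTreeClassifier(random_state=0),
)
scores = cross_val_score(pipeline, X_small, y_small, cv=5)
print(f"Quick 5-fold estimate: {scores.mean():.3f} +/- {scores.std():.3f}")
```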
15
Strategies
Sample broadly and shallowly at first: general feature space design, linear versus non-linear, what class of algorithm?
Push to see why some algorithms work better on your data than others
Eliminate parameters that it doesn’t make sense to tune
You can use CVParameterSelection to check whether performance actually improves when you tune a parameter (see the sketch below)
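CVParameterSelection is Weka’s meta-classifier for this check. Outside Weka, the same question (“does tuning this parameter actually help?”) can be sketched with scikit-learn’s GridSearchCV; the classifier, grid, and data below are illustrative assumptions, not the Weka tool:

```python
# Illustrative sketch (scikit-learn analogue of Weka's CVParameterSelection):
# compare a default classifier against one whose parameter is tuned by an
# inner cross validation, to see whether tuning is worth the extra runtime.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

default_model = LogisticRegression(max_iter=1000)
tuned_model = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.01, 0.1, 1, 10, 100]},
    cv=5,
)

default_score = cross_val_score(default_model, X, y, cv=5).mean()
tuned_score = cross_val_score(tuned_model, X, y, cv=5).mean()   # nested CV
print(f"default C: {default_score:.3f}   tuned C: {tuned_score:.3f}")
```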
16
Call Center Routing
17
Tur et al. paper
Call routing: a replacement for typical routing by touch tone interfaces
System prompt: “How may I help you?”
User: “I would like to buy a laptop”
Requires speech recognition and language understanding
Requires a lot of hand labeling
Tur, G., Hakkani-Tür, D., & Schapire, R. E. (2005). Combining active and semi-supervised learning for spoken language understanding. Speech Communication, 45, 171–186.
18
Tur et al. paper
Goal is to reduce labeling effort by being strategic about which examples are labeled
Evaluated in terms of the reduction in the number of labeled examples needed to achieve the same performance as random selection
19
Tur et al. paper
Goal: limit the amount of data that needs to be labeled by hand
Active learning: reduce human labeling by being strategic about which examples to label – on each iteration, select the examples the algorithm is least confident about (see the sketch below)
Note: since you get confidence scores back from TagHelper tools, you can do something like active learning with it
Semi-supervised learning: you can make use of unlabeled data by adding very confidently classified examples to your set of labeled data
Note that TagHelper tools also has a form of semi-supervised learning built in – look at the Advanced tab (self-training)
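A minimal sketch of uncertainty-based selection, assuming scikit-learn, a synthetic data set, and a plain probabilistic classifier; it illustrates the general idea rather than TagHelper’s or the Tur et al. system’s implementation:

```python
# Illustrative sketch of uncertainty sampling: ask a human to label the
# unlabeled examples the current model is least confident about.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:100] = True                      # pretend only 100 examples are labeled

model = LogisticRegression(max_iter=1000)
model.fit(X[labeled], y[labeled])

proba = model.predict_proba(X[~labeled])
confidence = proba.max(axis=1)            # confidence in the predicted class
query_idx = np.argsort(confidence)[:20]   # 20 least confident examples
print("Indices (into the unlabeled pool) to send to the annotator:", query_idx)
```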
20
Active Learning: Committee Based Methods
Related idea to bagging
Similar to bagging: you have multiple versions of your labeled data and you train a model on each version
You apply the models to the unlabeled data
The examples with the highest “vote entropy”, or disagreement among the committee, are the ones with the least certainty – those are the ones you should have someone label (see the sketch below)
Problem: hard to distinguish outliers from informative examples
Another problem: the class priors in the selectively labeled data end up different from those in the initial set
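A hedged sketch of committee-based selection with vote entropy, again assuming scikit-learn and synthetic data; the committee size and query budget are arbitrary choices for illustration:

```python
# Illustrative sketch of committee-based selection: train several models on
# bootstrap copies of the labeled data and rank unlabeled examples by vote
# entropy (how much the committee disagrees).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_lab, y_lab, X_unlab = X[:200], y[:200], X[200:]

rng = np.random.default_rng(0)
committee = []
for _ in range(7):                                   # 7 committee members
    idx = rng.integers(0, len(y_lab), len(y_lab))    # bootstrap resample
    committee.append(DecisionTreeClassifier().fit(X_lab[idx], y_lab[idx]))

votes = np.stack([m.predict(X_unlab) for m in committee])            # members x examples
vote_shares = np.stack([(votes == c).mean(axis=0) for c in (0, 1)])  # per-class vote share
entropy = -(vote_shares * np.log(vote_shares + 1e-12)).sum(axis=0)   # vote entropy
query_idx = np.argsort(entropy)[::-1][:20]           # most disagreement first
print("Ask a human to label (indices into the unlabeled pool):", query_idx)
```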
23
NOTE about Usefulness
If only a very small amount of labeled data is available, then adding in even very confident predictions will add noise
If a huge amount of labeled data is available, adding in predictions on unlabeled data won’t be that useful
Take home message: semi-supervised learning is only useful if you have a “medium” amount of labeled data
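For concreteness, a self-training sketch under the same assumptions as the earlier examples (scikit-learn, synthetic data, an arbitrary 0.95 confidence cutoff); it shows the general mechanism, not TagHelper’s Advanced-tab implementation:

```python
# Illustrative sketch of self-training: fold very confident predictions on
# unlabeled data back into the labeled set and retrain.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=3000, n_features=20, random_state=1)
X_lab, y_lab, X_unlab = X[:300], y[:300], X[300:]

model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
for _ in range(3):                                   # a few self-training rounds
    proba = model.predict_proba(X_unlab)
    confident = proba.max(axis=1) > 0.95             # very confident predictions only
    X_lab = np.vstack([X_lab, X_unlab[confident]])
    y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
    X_unlab = X_unlab[~confident]
    model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
```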
24
Modified Semi-Supervised Learning
26
Combining Active and Semi-Supervised Learning
28
Project Advice
29
What Makes a Good Paper
A good example is the Arguello & Rosé 2006 paper in the Course Documents folder on Blackboard!
Argue why the problem is important, what your big result is, and what you learned about your application area
Ideally there should be something clever about your approach
Explain prior work, where previous approaches “went wrong”, and where your approach fits
Best to have a baseline from previous work – you might need a broad definition of prior work (there is *always* relevant prior work)
You need to show respect for and awareness of what came before you and where you fit into the larger community
Here prior work will be in multiple areas: your own application area, core machine learning work, and possibly computational linguistics work
30
More about What Makes a Good Paper
Summarize your approach
Here you may give a lengthy analysis of your data, how you decided to code it, how you determined that your coding was reliable, and why your coding was meaningful
Describe your experimentation process – I need to be able to see that you are aware of and used proper methodology
What was your final best approach?
31
More about What Makes a Good Paper
Justify your evaluation metrics, corpus, gold standard, and baselines
You need to say enough to give the reader confidence that you know what you’re doing and that your evaluation is valid
If possible, evaluate both the baseline approach and your own on the same data set used in the baseline’s original publication, and possibly on an additional one
32
More about What Makes a Good Paper
Present your results
Make sure you evaluate your work both in terms of typical metrics against a baseline and in a task-specific manner: which errors affect users more? How would your classifier be used?
Discuss your error analysis
What did you learn about your domain from doing this? What are your current directions (based on your error analysis)?
33
What Numbers Should You Report?
Evaluation methodology is a matter of taste
You should make your evaluation comparable to other recent published results
Evaluation standards change over time!!
How you do your evaluation is a big part of what determines the quality of your work
High quality work explains the reasons for its evaluation methodology
34
Debates Over Evaluation Metrics
F-measure hides trade-offs between precision and recall
F-measure, precision, and recall obscure agreement by chance – less of an issue if everyone is comparing on the same data set
F-measure ignores the false alarm rate
False alarm rate: out of all the false alarms you could have raised, how many you actually raised
In some cases there are not many errors of commission you can make, so F-measure makes the classifier look better than it is (see the sketch below)
F-measure might be overly harsh for some tasks – in topic segmentation, it punishes errors that are close just as much as errors that are way off
Graphs show more of the trade-offs between classifiers – especially useful when classifiers can be tuned
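A small worked sketch with made-up confusion-matrix counts (not numbers from the lecture) showing how F-measure and the false alarm rate can tell different stories when there are few chances for errors of commission:

```python
# Illustrative sketch with hypothetical counts: precision/recall/F-measure
# versus the false alarm rate (false positives out of all actual negatives).
tp, fp, fn, tn = 90, 8, 10, 12   # made-up confusion-matrix counts

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)
false_alarm_rate = fp / (fp + tn)     # of the negatives, how many were falsely flagged

print(f"precision {precision:.2f}  recall {recall:.2f}  F {f_measure:.2f}")
print(f"false alarm rate {false_alarm_rate:.2f}")   # 0.40: worse than F suggests
```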
35
Reporting Results
Remember to use the measures that are standard for the problem you are working on
Don’t just give a dump from the Weka interface – summarize the comparison in a table
Discuss the significance of the difference between approaches
*Always* have a baseline for comparison
Report the significance of the difference (this shows that you tested on enough data) – see the sketch below
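One common way to report significance is a paired t-test over matched cross-validation folds. The sketch below is an illustration with scikit-learn, SciPy, and synthetic data; the Naïve Bayes baseline and logistic regression “proposed” model are assumptions purely for demonstration (and the usual caveat applies that fold scores are not fully independent):

```python
# Illustrative sketch: paired t-test over the same 10 cross-validation folds
# for a baseline and a proposed approach.
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

baseline_scores = cross_val_score(GaussianNB(), X, y, cv=10)
proposed_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)

t_stat, p_value = ttest_rel(proposed_scores, baseline_scores)
print(f"baseline {baseline_scores.mean():.3f}  proposed {proposed_scores.mean():.3f}")
print(f"paired t-test p-value: {p_value:.3f}")
```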
36
Last Minute Project Tips
Don’t forget that process is more important than product!
Talk about what you learned from doing this