Published by Alberta Watts. Modified over 8 years ago.
1
Machine Learning Tutorial
Amit Gruber
The Hebrew University of Jerusalem
2
Example: Spam Filter
Spam message: an unwanted email message
– Dozens or even hundreds per day
Goal: automatically distinguish between spam and non-spam email messages
3
Spam message 1
4
Spam message 2
5
Spam message 3
6
Spam message 4
7
How to Distinguish?
Message contents?
– Automatic semantic analysis is yet to be solved
Message sender?
– What about unfamiliar senders or fake senders?
Collection of keywords?
Message length?
Mail server?
Time of delivery?
8
How to Distinguish?
It’s hard to define an explicit set of rules that distinguishes spam from non-spam.
Learn the concept of “spam” from examples!
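To make "learn from examples" concrete, here is a minimal sketch of a multinomial naive Bayes spam classifier. This is one common choice that the slides do not name. The four training messages are hypothetical, and splitting on whitespace is a simplifying assumption; real filters learn from many thousands of labeled messages.

```python
import math
from collections import Counter

# Hypothetical toy training set of (message, label) pairs.
TRAIN = [
    ("win money now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
]

def train(examples):
    """Count word occurrences per class and messages per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    """Return the label with the highest log-posterior (add-one smoothed naive Bayes)."""
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total)  # log prior
        n_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one smoothing keeps unseen words from zeroing out the probability.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(classify("win cheap money", *model))       # spam
print(classify("team meeting tomorrow", *model))  # ham
```

No rule about "spam" is written by hand here; the decision comes entirely from word statistics in the labeled examples.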
9
Example: Gender Classification
10
The Power of Learning: A Real-Life Example
How long does it take you to get to work?
– First approach: analyze your route (distance, traffic lights, traffic, etc.). This can be quite complicated...
– Second approach: how long does it usually take? Despite some variance, this works remarkably well! It requires “training” for different times and may fail in special cases.
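The second approach amounts to estimating travel time from past observations. Here is a minimal sketch that assumes a hypothetical log of (departure hour, minutes to work) pairs. "Training" is just averaging the observed times per hour. An hour that never appears in the log plays the role of a special case where the estimate is unavailable.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log of past trips: (departure hour, minutes to work).
TRIPS = [(7, 25), (7, 28), (7, 22), (8, 41), (8, 38), (8, 45), (9, 30), (9, 27)]

def fit(trips):
    """'Training': average the observed travel time for each departure hour."""
    by_hour = defaultdict(list)
    for hour, minutes in trips:
        by_hour[hour].append(minutes)
    return {hour: mean(times) for hour, times in by_hour.items()}

def predict(model, hour):
    """Return the learned estimate, or None for an hour never observed."""
    return model.get(hour)

model = fit(TRIPS)
print(predict(model, 7))   # 25
print(predict(model, 8))   # about 41.3
print(predict(model, 18))  # None: no training data for the evening
```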
11
Machine Translation
12
Collaborative Filtering
Collaborative filtering: predicting a user's ratings based on the ratings of other users
Examples:
– Movie ratings
– Product recommendation
Is this only of theoretical interest?
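A simple way to predict a missing rating from other users' ratings is user-based filtering. Each other user's rating is weighted by how similar their past ratings are to the target user's. This sketch uses plain cosine similarity over co-rated movies. The users, titles and ratings are hypothetical. Mean-centering the ratings, or using matrix factorization, are common refinements that are not shown here.

```python
import math

# Hypothetical 1-5 star ratings; a missing key means "not rated".
RATINGS = {
    "alice": {"Matrix": 5, "Titanic": 1, "Alien": 4},
    "bob":   {"Matrix": 4, "Titanic": 2, "Alien": 5, "Notebook": 1},
    "carol": {"Matrix": 1, "Titanic": 5, "Notebook": 5},
}

def cosine(u, v):
    """Cosine similarity computed over the movies both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = math.sqrt(sum(u[m] ** 2 for m in common))
    norm_v = math.sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)

def predict(user, movie, ratings):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[movie]
        den += abs(sim)
    return num / den if den else None

print(round(predict("alice", "Notebook", RATINGS), 2))  # about 2.14
```

Alice's past ratings resemble Bob's more than Carol's, so the prediction leans toward Bob's low rating.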
13
Netflix Prize
Over 100 million ratings from 480 thousand customers on 17,000 movie titles (very sparse: only a fraction 0.0123 of all customer-title pairs carry a rating)
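The sparsity figure is the fraction of the customer-by-title rating matrix that is actually filled. Here is a quick check using the rounded counts from the slide:

```python
ratings = 100_000_000   # "over 100 million", rounded as on the slide
customers = 480_000
titles = 17_000

# Filled entries divided by all possible customer-title pairs.
density = ratings / (customers * titles)
print(round(density, 4))  # 0.0123
```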
14
Recommendation system
15
Machine Learning Applications
Search engines
Collaborative filtering (Netflix, Amazon)
Face, speech and pattern recognition
Machine translation
Natural language processing
Medical diagnosis and treatment
Bioinformatics
Computer games
Many more!
16
Generalization: Train vs. Test
The central assumption we make is that the train set and the new examples are “similar”.
Formally, the assumption is that all samples are drawn from the same distribution.
Is this assumption realistic?
17
Train vs. Test: Might Fail to Generalize
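Here is a toy illustration of this failure mode using hypothetical 1-D data. A threshold learned on the train set is perfect on test data drawn from the same distribution. Once the test data is shifted, accuracy drops to chance. The midpoint-of-class-means rule is an assumed stand-in for any learned classifier.

```python
def fit_threshold(xs, ys):
    """Pick the midpoint between the two class means as the decision threshold."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2

def accuracy(threshold, xs, ys):
    """Fraction of points where 'x > threshold' matches label 1."""
    return sum((x > threshold) == bool(y) for x, y in zip(xs, ys)) / len(ys)

# Hypothetical train set: class 0 around 2, class 1 around 6.
train_x = [1, 2, 3, 5, 6, 7]
train_y = [0, 0, 0, 1, 1, 1]
t = fit_threshold(train_x, train_y)  # 4.0

# Test set from the same distribution.
same_x, same_y = [1.5, 2.5, 5.5, 6.5], [0, 0, 1, 1]
# The same test points after a shift of +3 (distribution mismatch).
shifted_x = [x + 3 for x in same_x]

print(accuracy(t, same_x, same_y))     # 1.0
print(accuracy(t, shifted_x, same_y))  # 0.5
```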
18
Acquiring a Good Train Set
Have a huge train set
– Train data might be available on the web
– Use humans to collect data
– Collect the results (or aggregations thereof) of user actions
Unsupervised methods require only raw data: no need for labels!
19
Machine Learning Strategies
Discriminative approach
– Feature selection: find the features that carry the most information for separation
Generative approach
– Model the data using a generative process
– Estimate the parameters of the model
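Mutual information is one way to score how much information a feature carries about the label. This sketch assumes binary keyword features on hypothetical spam/non-spam messages and ranks two candidate features. A feature that is independent of the label scores zero.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """I(F; Y) in bits between a discrete feature and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    pf = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (f, y), count in joint.items():
        p_fy = count / n
        mi += p_fy * math.log2(p_fy / ((pf[f] / n) * (py[y] / n)))
    return mi

# Hypothetical messages: does each one contain a given keyword? (1 = yes)
labels      = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = spam
has_free    = [1, 1, 1, 0, 0, 0, 0, 0]
has_meeting = [0, 1, 0, 1, 0, 1, 0, 1]

scores = {
    "free": mutual_information(has_free, labels),
    "meeting": mutual_information(has_meeting, labels),
}
print(max(scores, key=scores.get))  # free
```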
20
Supervised vs. Unsupervised
Supervised machine learning
– Classification (learning)
– Collecting a large, representative train set might not be simple
Unsupervised machine learning
– Clustering (the number of clusters may be known or unknown)
– Usually plenty of train data is available
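For clustering with a known number of clusters, k-means (Lloyd's algorithm) is a standard example, although the slides do not name a specific method. Here is a minimal sketch on hypothetical 2-D points. For simplicity it uses the first k points as the initial centers; practical implementations use random or k-means++ seeding.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's algorithm on 2-D points; the first k points are the initial centers."""
    centers = list(points[:k])
    clusters = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (keep it if the cluster is empty).
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: assignments no longer change
            break
        centers = new_centers
    return centers, clusters

points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
print(centers)  # one center near (0.33, 0.33), the other near (10.33, 10.33)
```

No labels are used anywhere: the grouping emerges from the raw points alone.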
21
Discriminative Learning
Data representation and feature selection: what is relevant for classification?
– Gender classification: hair, ears, make-up, beard, moustache, etc.
Linear separation
– SVM, Fisher LDA, Perceptron and more
– Different criteria for separation: which will generalize well?
– Non-linear separation
22
Linear Separation
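The Perceptron, one of the linear separators listed earlier, fits in a few lines. This sketch assumes labels in {+1, -1} and stops after a full pass with no mistakes. The AND data set is a hypothetical, linearly separable example.

```python
def perceptron(samples, labels, epochs=100):
    """Classic Perceptron: returns weights w and bias b for labels in {+1, -1}."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical linearly separable data: the AND function.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [-1, -1, -1, 1]
w, b = perceptron(X, Y)
print([predict(w, b, x) for x in X])  # [-1, -1, -1, 1]
```

The Perceptron stops at any separating line. SVM and Fisher LDA instead use different criteria (margin, class-scatter ratio) to choose among separating lines.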
23
Nonlinear Separation (Kernel Trick)
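The kernel trick replaces inner products with a kernel function, so that a linear method effectively operates in a higher-dimensional feature space. This sketch applies the trick to the Perceptron (the dual, or kernel, Perceptron) rather than to an SVM. The degree-2 polynomial kernel is an assumed choice. With this kernel, the Perceptron separates XOR, which no straight line in the original plane can.

```python
def poly_kernel(x, z, degree=2):
    """(1 + <x, z>)^degree: an inner product in a higher-dimensional feature space."""
    return (1 + sum(a * b for a, b in zip(x, z))) ** degree

def kernel_perceptron(samples, labels, kernel, epochs=100):
    """Dual Perceptron: alpha[i] counts the mistakes made on sample i."""
    alpha = [0] * len(samples)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(samples, labels)):
            score = sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, samples, labels))
            if y * score <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def predict(alpha, samples, labels, kernel, x):
    score = sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, samples, labels))
    return 1 if score > 0 else -1

# XOR: not linearly separable in the original 2-D space.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [-1, 1, 1, -1]
alpha = kernel_perceptron(X, Y, poly_kernel)
print([predict(alpha, X, Y, poly_kernel, x) for x in X])  # [-1, 1, 1, -1]
```

The feature map is never built explicitly; only kernel values between pairs of samples are computed.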
24
Generative Approach
Model the observations using a generative process
The generative process induces a distribution over the observations
Learn the parameters of that process from the data
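Here is a minimal generative sketch. It assumes that each class generates a single measurement from its own Gaussian. Learning the parameters is then maximum-likelihood estimation of a mean and a variance per class. A new point goes to the class whose fitted distribution assigns it the higher density. The data are hypothetical, and the equal class sizes let the priors cancel.

```python
import math

def fit_gaussian(xs):
    """Maximum-likelihood estimates of a 1-D Gaussian's mean and variance."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)  # the MLE divides by n, not n - 1
    return mu, var

def log_density(x, mu, var):
    """Log of the Gaussian density N(x; mu, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

# Hypothetical 1-D measurements for two classes of equal size.
data = {"a": [2.0, 3.0, 4.0], "b": [20.0, 25.0, 30.0]}
params = {label: fit_gaussian(xs) for label, xs in data.items()}

x_new = 6.0
print(max(params, key=lambda c: log_density(x_new, *params[c])))  # a
```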
25
Statistical Approach – A Real-Life Example
You’re stuck in traffic. Which lane is faster?
The complicated approach:
– Consider the traffic, trucks, merging lanes, etc.
The statistical (Bayesian) approach:
– Which lane is usually faster? (prior)
– What are you seeing? (evidence)
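Combining a prior with evidence is Bayes' rule: the posterior is proportional to the likelihood times the prior. This sketch uses made-up probabilities. The prior favors the left lane. The evidence (the right lane is currently moving more) is assumed to be three times as likely if the right lane really is faster, and that flips the conclusion.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses: P(h | e) ∝ P(e | h) * P(h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical numbers, for illustration only.
prior = {"left faster": 0.7, "right faster": 0.3}       # which lane is usually faster
likelihood = {"left faster": 0.2, "right faster": 0.6}  # P(right lane seen moving more | h)

post = posterior(prior, likelihood)
print(post)  # left about 0.44, right about 0.56
```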
26
Summary
Machine learning: learn a concept from examples
For good generalization, the train data has to faithfully represent the test data
Many potential applications
Already in use, and works remarkably well