1
First Lecture of Machine Learning Hung-yi Lee
2
Learning to say “yes/no” Binary Classification
3
Learning to say yes/no: Binary Classification
Spam filtering: is an e-mail spam or not?
Recommendation systems: recommend the product to the customer or not?
Malware detection: is the software malicious or not?
Stock prediction: will the future value of a stock increase or not with respect to its current value?
4
Example Application: Spam filtering (http://spam-filter-review.toptenreviews.com/)
[Diagram: an incoming e-mail is classified as either Spam or Not spam]
5
Example Application: Spam filtering How to estimate P(yes|x)? What does the function f look like?
6
Example Application: Spam filtering
To estimate P(yes|x), collect examples first.
[Figure: three labelled e-mails: x1 "Earn ... free ... free ..." is Yes (Spam), x2 "Talk ... Meeting ..." is No (Not Spam), x3 "Win ... free ..." is Yes (Spam)]
Some words frequently appear in spam, e.g., "free". Use the frequency of "free" to decide if an e-mail is spam.
Estimate P(yes | x_free = k), where x_free is the number of "free" in e-mail x.
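A minimal sketch of this counting estimate in Python (the toy counts and labels below are invented for illustration, not the examples on the slide):

```python
# Hypothetical training data: (number of "free" in the e-mail, is it spam?)
training = [(2, True), (1, True), (0, False), (1, False), (0, False)]

def estimate_p_spam(k, data):
    """Empirical estimate of P(yes | x_free = k) by simple counting."""
    labels = [spam for count, spam in data if count == k]
    if not labels:          # no training e-mail with exactly k "free"
        return None         # exactly the problem raised on the next slide
    return sum(labels) / len(labels)

print(estimate_p_spam(1, training))   # 0.5 on this toy data
print(estimate_p_spam(3, training))   # None: 3 "free" never observed
```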
7
Regression
[Plot: frequency of "free" (x_free) in an e-mail x on the horizontal axis, p(yes|x_free) on the vertical axis; e.g., p(yes | x_free = 0) = 0.1 and p(yes | x_free = 1) = 0.4]
Problem: What if one day you receive an e-mail with 3 "free"? In the training data, there is no e-mail containing 3 "free".
8
Regression
f(x_free) = w * x_free + b, where f(x_free) is an estimate of p(yes|x_free). Store w and b.
[Plot: the regression line fitted to p(yes|x_free) against the frequency of "free" (x_free) in an e-mail x]
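A sketch of this regression step, assuming the per-count probability estimates from the previous slide are already available (the numbers below are illustrative only):

```python
import numpy as np

# Estimated P(yes | x_free) for each observed count of "free" (illustrative values)
x_free = np.array([0, 1, 2])
p_yes  = np.array([0.1, 0.4, 0.7])

# Least-squares fit of f(x_free) = w * x_free + b; store w and b
w, b = np.polyfit(x_free, p_yes, deg=1)
print(w, b)
print(w * 3 + b)   # now an e-mail with 3 "free" still gets a prediction
```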
9
Regression
f(x_free) = w * x_free + b
Problem: What if one day you receive an e-mail with 6 "free"? The output of f is not between 0 and 1.
[Plot: extrapolating the regression line to large x_free gives values above 1 on the p(yes|x_free) axis]
10
Logit
[Two plots against x_free: in the first, the vertical axis is the probability to be spam, p = p(yes|x_free), which is always between 0 and 1; in the second, the vertical axis is logit(p) = ln(p / (1 - p)), which is unbounded]
11
Logit
[Same two plots: p(yes|x_free) against x_free, and logit(p) against x_free]
f'(x_free) = w' * x_free + b', where f'(x_free) is an estimate of logit(p).
12
Logit
f'(x_free) = w' * x_free + b', where f'(x_free) is an estimate of logit(p). Store w' and b'.
[Plot: logit(p) against x_free; for a new e-mail, evaluate f'(x_free) and convert it back to a probability; if that probability is greater than 0.5 (i.e., logit(p) > 0), answer "yes"]
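A sketch of the logit version, fitting the line to logit(p) = ln(p / (1 - p)) instead of to p and mapping the result back to a probability when classifying (the values are illustrative):

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x_free = np.array([0, 1, 2])
p_yes  = np.array([0.1, 0.4, 0.7])        # illustrative estimates

# f'(x_free) = w' * x_free + b' approximates logit(p); store w' and b'
w, b = np.polyfit(x_free, logit(p_yes), deg=1)

def classify(k):
    p = sigmoid(w * k + b)                # back to a value between 0 and 1
    return "yes (spam)" if p > 0.5 else "no"

print(classify(6))   # the estimate stays a valid probability even for large counts
```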
13
Multiple Variables
Consider two words, "free" and "hello": compute p(yes | x_free, x_hello).
[Plot: the estimated probability p over the (x_free, x_hello) plane]
14
Multiple Variables
Consider two words, "free" and "hello": compute p(yes | x_free, x_hello) and fit a regression over the two variables.
[Plot: the fitted regression surface over (x_free, x_hello)]
15
Multiple Variables
Of course, we can consider all words {t_1, t_2, ..., t_N} in a dictionary; the linear score z is to approximate logit(p).
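The slide presumably takes z to be the usual weighted sum of the word counts; writing it out (with x_{t_i} the count of word t_i in the e-mail):

```latex
z = w_1 x_{t_1} + w_2 x_{t_2} + \dots + w_N x_{t_N} + b
  \;\approx\; \operatorname{logit}(p) = \ln\frac{p}{1-p}
```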
16
Logistic Regression
For a single training e-mail x (e.g., t_1 appears 3 times, t_2 appears 0 times, ..., t_N appears 1 time), the probability to be spam p is always exactly 1 or 0.
If p = 1 or 0, then logit(p) = ln(p / (1 - p)) = +infinity or -infinity, so we cannot do regression on the logit directly; instead we only approximate it, which leads to logistic regression.
17
Logistic Regression Sigmoid Function
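The sigmoid function referred to here is the inverse of the logit, so applying it to the linear score z gives back a probability:

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\sigma(\operatorname{logit}(p)) = p, \qquad
P(\text{yes} \mid x) \approx \sigma(z)
```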
18
Logistic Regression
For a spam e-mail x1 (Yes), the output should be close to 1; for a non-spam e-mail x2 (No), the output should be close to 0.
19
Logistic Regression
[Diagram: the feature values are weighted, summed together with a bias term, and passed through the sigmoid; an output near 1 means Yes, near 0 means No]
This is a neuron in a neural network.
20
More than saying “yes/no” Multiclass Classification
21
More than saying “yes/no”
Handwritten digit classification. This is multiclass classification.
22
More than saying “yes/no”
Handwritten digit classification. Simplify the question: is an image a “2” or not?
The feature of an image describes the characteristics of the input object; each pixel corresponds to one dimension of the feature.
23
More than saying “yes/no”
Handwritten digit classification. Simplify the question: is an image a “2” or not?
[Diagram: the image's feature vector is fed to a binary classifier that answers “2” or not]
24
More than saying “yes/no”
Handwritten digit classification: train one binary classifier per digit (“1” or not, “2” or not, “3” or not, ...), each producing an output y_1, y_2, y_3, ...
If y_2 is the max, then the image is “2”.
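A sketch of this one-classifier-per-digit scheme (the weights below are random placeholders; in the lecture's setting each row of W and entry of b would come from training the corresponding "digit d or not" classifier):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

num_pixels, num_digits = 28 * 28, 10            # one dimension per pixel, digits 0-9

rng = np.random.default_rng(0)
W = rng.normal(size=(num_digits, num_pixels))   # placeholder weights
b = rng.normal(size=num_digits)                 # placeholder biases

def classify_digit(x):
    """x: flattened image; returns the digit whose binary classifier is most confident."""
    y = sigmoid(W @ x + b)       # y[d] estimates P(image is digit d | x)
    return int(np.argmax(y))     # if y_2 is the max, answer "2", and so on

x = rng.random(num_pixels)       # a fake image, just to show the call
print(classify_digit(x))
```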
25
This is not good enough …
26
Limitation of Logistic Regression

x1  x2  | Output
0   0   | No
0   1   | Yes
1   0   | Yes
1   1   | No
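This truth table is the XOR pattern, and no single linear boundary can separate the Yes points from the No points. A quick illustration (scikit-learn's LogisticRegression is used here only for convenience; it is not mentioned in the lecture):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# The XOR truth table from the slide
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                 # 1 = Yes, 0 = No

clf = LogisticRegression().fit(X, y)
print(clf.predict(X))                      # cannot reproduce [0, 1, 1, 0]
print(clf.score(X, y))                     # at most 0.75: some point is always wrong
```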
27
So we need neural networks ...
[Diagram: the input is passed through Layer 1, Layer 2, ..., Layer L to produce the outputs y_1, y_2, ..., y_M]
Deep means many layers.
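One way to see why layers help: a tiny two-layer network with hand-picked weights (the specific numbers are my own illustration, not from the slides) already computes the XOR table that a single logistic-regression neuron cannot:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Layer 1: hidden neuron 1 acts like OR(x1, x2), hidden neuron 2 like AND(x1, x2).
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
b1 = np.array([-10.0, -30.0])
# Layer 2: the output neuron fires when OR is on but AND is off, i.e. XOR.
W2 = np.array([20.0, -20.0])
b2 = -10.0

def network(x):
    h = sigmoid(W1 @ x + b1)         # layer 1
    return sigmoid(W2 @ h + b2)      # layer 2 (output)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(float(network(np.array(x, dtype=float)))))   # 0, 1, 1, 0
```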
28
Thank you for listening!
29
Appendix
30
More references
http://www.ccs.neu.edu/home/vip/teach/MLcourse/2_GD_REG_pton_NN/lecture_notes/logistic_regression_loss_function/logistic_regression_loss.pdf
http://mathgotchas.blogspot.tw/2011/10/why-is-error-function-minimized-in.html
https://cs.nyu.edu/~yann/talks/lecun-20071207-nonconvex.pdf
http://www.cs.columbia.edu/~blei/fogm/lectures/glms.pdf
http://grzegorz.chrupala.me/papers/ml4nlp/linear-classifiers.pdf