CS460/626: Natural Language Processing / Speech, NLP and the Web (Lecture 16 – Linear and Logistic Regression). Pushpak Bhattacharyya, CSE Dept., IIT Bombay.


1 CS460/626: Natural Language Processing / Speech, NLP and the Web (Lecture 16 – Linear and Logistic Regression)
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
14th Feb, 2011

2 Least Square Method: fitting a line (following Manning and Schütze, Foundations of Statistical NLP, 1999)
Given a set of N points (x1,y1), (x2,y2), …, (xN,yN), find a line f(x) = mx + b that best fits the data.
m and b are the parameters to be found; together they form the weight vector W.
The line that best fits the data is the one that minimizes the sum of squares of the distances from the points to the line:
SS(m,b) = Σ_{i=1..N} (y_i − (m·x_i + b))²

3 Values of m and b
Partial differentiation of SS(m,b) w.r.t. b and m, setting each derivative to zero, yields respectively
b = ȳ − m·x̄
m = (Σ_i x_i y_i − N·x̄·ȳ) / (Σ_i x_i² − N·x̄²)
where x̄ and ȳ are the means of the x and y values.
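
A minimal Python sketch of the closed-form fit above; the data points are made up for illustration:

```python
import numpy as np

# Hypothetical data points (x_i, y_i); any N points would do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

N = len(x)
x_mean, y_mean = x.mean(), y.mean()

# Closed-form least-squares estimates for the line f(x) = m*x + b
m = (np.sum(x * y) - N * x_mean * y_mean) / (np.sum(x ** 2) - N * x_mean ** 2)
b = y_mean - m * x_mean

print(f"m = {m:.3f}, b = {b:.3f}")
```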

4 Example (Manning and Schütze, FSNLP, 1999)

5 Implication of the "line" fitting
(Figure: points 1, 2, 3, 4 with their projections A, B, C, D onto the fitted line, and a reference point O on the line.)
1, 2, 3, 4 are the points; A, B, C, D are their projections on the fitted line.
Suppose 1, 2 form one class and 3, 4 another class.
Of course, it is easy to set up a hyperplane that separates 1 and 2 from 3 and 4; that would be classification in 2 dimensions.
But suppose we form another attribute of these points, viz., the distances of their projections on the line from O.
Then the points can be classified by a threshold on these distances.
This is effectively classification in the reduced dimension (1 dimension), as sketched below.
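
A small sketch of the idea with made-up 2-D points: project each point onto the direction of a fitted line and classify by thresholding the resulting 1-D coordinate.

```python
import numpy as np

# Hypothetical 2-D points: the first two in one class, the last two in the other.
points = np.array([[1.0, 1.1], [1.5, 1.6], [4.0, 4.2], [4.5, 4.4]])

# Fit a line y = m*x + b through all points (ordinary least squares).
m, b = np.polyfit(points[:, 0], points[:, 1], 1)

# Unit vector along the line; take O as the point (0, b) on the line.
direction = np.array([1.0, m]) / np.hypot(1.0, m)
origin = np.array([0.0, b])

# Signed distance of each projection from O along the line (the new 1-D attribute).
proj = (points - origin) @ direction

threshold = proj.mean()          # any threshold between the two groups works here
labels = (proj > threshold).astype(int)
print(proj, labels)              # the last two points fall above the threshold
```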

6 When the dimensionality is more than 2
Let X be the matrix of input vectors: M x N (M input vectors with N features each).
y_j = w_0 + w_1·x_j1 + w_2·x_j2 + w_3·x_j3 + … + w_n·x_jn
Find the weight vector W. It can be shown that the least-squares solution is
W = (X^T X)^(−1) X^T y
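
A minimal numpy sketch of that closed-form solution; the design matrix and targets are made up, and a column of ones is prepended so that the first weight plays the role of the intercept w_0:

```python
import numpy as np

# Hypothetical data: M = 5 input vectors with N = 3 features, plus real-valued targets y.
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.0, 1.5],
              [3.0, 0.5, 2.0],
              [4.0, 2.5, 0.0],
              [5.0, 1.0, 1.0]])
y = np.array([3.1, 4.0, 5.2, 6.8, 7.9])

# Prepend a column of ones so W[0] acts as the intercept w_0.
X1 = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equation: W = (X^T X)^(-1) X^T y  (solve() is preferred over an explicit inverse).
W = np.linalg.solve(X1.T @ X1, X1.T @ y)
print(W)
```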

7 The multivariate data
 f1    f2    f3    f4    f5    …    fn      y
 x11   x12   x13   x14   x15   …    x1n     y1
 x21   x22   x23   x24   x25   …    x2n     y2
 x31   x32   x33   x34   x35   …    x3n     y3
 x41   x42   x43   x44   x45   …    x4n     y4
 …
 xm1   xm2   xm3   xm4   xm5   …    xmn     ym

8 Logistic Regression
Linear regression: predicting a real-valued outcome.
Classification: the output takes a value from a small set of discrete values.
Simplest classification: two classes (0/1, or true/false).
Predict the class, and also give the probability of belonging to the class.

9 Linear to logistic regression
P(y=true|x) = Σ_{i=0..n} w_i·f_i = w·f
But this is not a legal probability value! Its value ranges from −∞ to +∞.
Instead, predict the ratio of the probability of being in the class to the probability of not being in the class.
Odds ratio: if an event has probability 0.75 of occurring and probability 0.25 of not occurring, we say the odds of it occurring are 0.75/0.25 = 3.

10 Odds Ratio (following Jurafsky and Martin, Speech and Language Processing, 2009)
The ratio of probabilities can lie between 0 and ∞, but the RHS (w·f) lies between −∞ and +∞. So introduce the log:
ln( p(y=true|x) / (1 − p(y=true|x)) ) = w·f
From this we get the expression for p(y=true|x).
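
Filling in the algebra the slide refers to (the standard derivation: exponentiate both sides of the log-odds equation and solve for the probability):

```latex
\ln\frac{p(y=\text{true}\mid x)}{1-p(y=\text{true}\mid x)} = w\cdot f
\;\Rightarrow\;
\frac{p(y=\text{true}\mid x)}{1-p(y=\text{true}\mid x)} = e^{w\cdot f}
\;\Rightarrow\;
p(y=\text{true}\mid x) = \frac{e^{w\cdot f}}{1+e^{w\cdot f}} = \frac{1}{1+e^{-w\cdot f}}
```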

11 Logistic function for p(y=true|x)
p(y=true|x) = 1 / (1 + e^(−w·f))
This form of p() is called the logistic function.
It maps values from −∞ to +∞ into the interval between 0 and 1.

12 Classification using logistic regression
For an example to belong to the true class we require p(y=true|x) > p(y=false|x), i.e. p(y=true|x) > 0.5.
This gives e^(w·f) / (1 + e^(w·f)) > 0.5, which simplifies to w·f > 0.
In other words, the decision rule is equivalent to placing a hyperplane (w·f = 0) to separate the two classes, as in the sketch below.
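
A small sketch of this decision rule with made-up weights and a feature vector; the probability-over-0.5 test and the w·f > 0 test agree:

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping (-inf, +inf) to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weight vector w and feature vector f (w[0] is paired with a constant feature 1).
w = np.array([-0.5, 1.2, -0.3])
f = np.array([1.0, 0.8, 0.4])

score = w @ f                       # w . f
p_true = sigmoid(score)             # p(y = true | x)

# The two tests are equivalent: p(y=true|x) > 0.5  <=>  w . f > 0
print(p_true > 0.5, score > 0)      # prints the same boolean twice
```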

13 Learning in logistic regression
In linear regression we learned the weights by minimizing the sum of squared errors (SSE).
In logistic regression we use maximum likelihood estimation: choose the weights such that the conditional probability p(y|x) of the observed labels is maximized.

14 Steps of learning w
For a particular observed pair <x, y>, choose w to maximize p(y|x).
Substituting the values of the Ps: this probability is p(y=true|x) when y is true and 1 − p(y=true|x) when y is false.
This can be converted to a single expression, p(y=true|x)^y · (1 − p(y=true|x))^(1−y), with y coded as 1/0.
For all pairs (x_i, y_i), maximize the product Π_i p(y_i|x_i).
Working with the log, this becomes maximizing Σ_i [ y_i log p(y_i=true|x_i) + (1 − y_i) log (1 − p(y_i=true|x_i)) ], which is solved iteratively; a small sketch follows.
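
A minimal sketch of that maximum-likelihood learning loop, here done with batch gradient ascent on the log likelihood over made-up data; the slides do not prescribe a particular optimizer, so this is only one illustrative choice:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical training data: feature vectors (first column is a constant 1 for w_0) and 0/1 labels.
X = np.array([[1.0, 0.2, 0.1],
              [1.0, 0.4, 0.3],
              [1.0, 1.8, 1.6],
              [1.0, 2.1, 1.9]])
y = np.array([0, 0, 1, 1])

w = np.zeros(X.shape[1])
lr = 0.5

for _ in range(1000):
    p = sigmoid(X @ w)              # p(y=true|x) for every example
    gradient = X.T @ (y - p)        # gradient of the log likelihood w.r.t. w
    w += lr * gradient              # ascend: increase the log likelihood

print(w, sigmoid(X @ w))            # learned weights and fitted probabilities
```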

