1 Probability theory LING 570 Fei Xia Week 2: 10/01/07
2 Misc. Patas account and dropbox Course website, “Collect it”, and GoPost. Mailing list –Received message on Thursday? Questions about hw1?
3 Outline Quiz #1 Unix commands Linguistics Elementary Probability theory: M&S 2.1
4 Quiz #1 Five areas: weight (average) Programming: 4.0 (3.74) –Try Perl or Python Unix commands: 1.2 (0.99) Probability: 2.0 (1.09) Regular expression: 2.0 (1.62) Linguistics knowledge: 0.8 (0.71)
5 Results : : 8 < 8.0: 8
6 Unix commands ls (list), cp (copy), rm (remove) more, less, cat cd, mkdir, rmdir, pwd chmod: to change file permission tar, gzip: to tar/zip files ssh, sftp: to log on or ftp files man: to learn a command
7 Unix commands (cont) Compilers and interpreters: javac, gcc, g++, perl, … ps, top, which Pipe: cat input_file | eng_tokenizer.sh | make_voc.sh > output_file sort, uniq, awk, grep grep "the" voc | awk '{print $2}' | sort | uniq -c | sort -nr
8 Examples Set the permission of foo.pl so it is readable and executable by the user and the group. Permission bits are rwx (user) rwx (group) rwx (other): r-x r-x --- => chmod 550 foo.pl Move a file, foo.pl, from your home dir to /tmp: mv ~/foo.pl /tmp
9 Linguistics: POS tags Open class: Noun, verb, adjective, adverb –Auxiliary verb/modal: can, will, might,.. –Temporal noun: tomorrow –Adverb: adj+ly, always, still, not, … Closed class: Preposition, conjunction, determiner, pron, –Conjunction: CC (and), SC (if, although) –Complementizer: that,
10 Linguistics: syntactic structure Two kinds: –Phrase structure (a.k.a. parse tree): –Dependency structure Examples: –John said that he would call Mary tomorrow
12 Probability Theory
13 Basic concepts Sample space, event, event space Random variable and random vector Conditional probability, joint probability, marginal probability (prior)
14 Sample space, event, event space Sample space (Ω): the set of all possible outcomes. –Ex: toss a coin three times: {HHH, HHT, HTH, HTT, …} Event: an event is a subset of Ω. –Ex: an event is {HHT, HTH, THH} Event space (2^Ω): the set of all possible events.
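The three definitions above can be made concrete by enumerating them. The following is a minimal sketch for the three-toss example; the variable names (`omega`, `two_heads`, `event_space_size`) are illustrative choices, not from the slides.

```python
from itertools import product

# Sample space for three coin tosses: every length-3 string over {H, T}.
omega = ["".join(tosses) for tosses in product("HT", repeat=3)]

# An event is any subset of omega; this one is "exactly two heads".
two_heads = {outcome for outcome in omega if outcome.count("H") == 2}

# The event space 2^Ω holds every subset of omega, so its size is 2 ** |Ω|.
event_space_size = 2 ** len(omega)

print(omega)
print(sorted(two_heads))
print(event_space_size)
```

With 8 outcomes in Ω, the event space already has 256 members, which is why events are usually described by a property rather than listed.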
15 Probability function A probability function (a.k.a. a probability distribution) distributes a probability mass of 1 throughout the sample space Ω. It is a function P: 2^Ω → [0,1] such that: P(Ω) = 1 For any disjoint sets A_j ∈ 2^Ω, P(∪_j A_j) = Σ_j P(A_j) –Ex: P({HHT, HTH, HTT}) = P({HHT}) + P({HTH}) + P({HTT})
16 The coin example The prob of getting a head is 0.1 for one toss. What is the prob of getting two heads out of three tosses? P(“Getting two heads”) = P({HHT, HTH, THH}) = P(HHT) + P(HTH) + P(THH) = 0.1*0.1*0.9 + 0.1*0.9*0.1 + 0.9*0.1*0.1 = 3*0.1*0.1*0.9 = 0.027
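The calculation above can be checked by summing over the outcomes in the event. This sketch assumes the three tosses are independent, which the slide's multiplication implies but does not state; `outcome_prob` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

p_head = Fraction(1, 10)  # P(head) for one toss, as given on the slide

def outcome_prob(outcome, p):
    """Probability of one toss sequence, assuming independent tosses."""
    prob = Fraction(1)
    for toss in outcome:
        prob *= p if toss == "H" else 1 - p
    return prob

omega = ["".join(tosses) for tosses in product("HT", repeat=3)]

# P(exactly two heads) is the sum over the disjoint outcomes HHT, HTH, THH.
p_two_heads = sum(outcome_prob(o, p_head) for o in omega if o.count("H") == 2)

print(p_two_heads)         # 27/1000
print(float(p_two_heads))  # 0.027
```

Exact fractions are used so the result can be compared with 3*0.1*0.1*0.9 without floating-point rounding.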
17 Random variable The outcome of an experiment need not be a number. We often want to represent outcomes as numbers. A random variable X is a function: Ω → R. –Ex: the number of heads with three tosses: X(HHT)=2, X(HTH)=2, X(HTT)=1, …
18 The coin example (cont) X = the number of heads with three tosses P(X=2) = P({HHT, HTH, THH}) = P({HHT}) + P({HTH}) + P({THH})
19 Two types of random variables Discrete: X takes on only a countable number of possible values. –Ex: Toss a coin three times. X is the number of heads that are noted. Continuous: X takes on an uncountable number of possible values. –Ex: X is the speed of a car
20 Common trick #1: Maximum likelihood estimation An example: toss a coin 3 times and get two heads. What is the probability p of getting a head with one toss? Maximum likelihood (ML): θ* = arg max_θ P(data | θ) In the example, –P(X=2) = 3 * p * p * (1-p) e.g., the prob is 3/8 when p=1/2, and is 12/27 when p=2/3; 3/8 < 12/27, and p=2/3 is the value that maximizes the likelihood.
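The arg max can be approximated numerically by trying many candidate values of p. This is a minimal sketch using a grid search; the grid resolution (steps of 1/300) is an arbitrary assumption chosen so that 2/3 lies on the grid, and it is not how the slide derives the answer.

```python
from fractions import Fraction

def likelihood(p):
    """P(X=2 | p): probability of two heads in three tosses when P(head) = p."""
    return 3 * p * p * (1 - p)

# Candidate values for p: 0, 1/300, 2/300, ..., 1 (assumed grid).
grid = [Fraction(k, 300) for k in range(301)]

# Maximum likelihood estimate: the candidate with the highest likelihood.
p_star = max(grid, key=likelihood)

print(likelihood(Fraction(1, 2)))  # 3/8
print(p_star, likelihood(p_star))  # 2/3 4/9
```

The result matches the relative frequency of heads in the data (2 out of 3), and 4/9 is the same value as the 12/27 on the slide.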
21 Random vector A random vector is a finite-dimensional vector of random variables: X = [X_1, …, X_n]. P(x) = P(x_1, x_2, …, x_n) = P(X_1=x_1, …, X_n=x_n) Ex: P(w_1, …, w_n, t_1, …, t_n)
22 Notation X, Y, X i, Y i are random variables. x, y, x i are values. P(X=x) is written as P(x) P(X=x | Y=y) is written as P(x | y).
23 Three types of probability Joint prob: P(x,y)= prob of X=x and Y=y happening together Conditional prob: P(x | y) = prob of X=x given a specific value of Y=y Marginal prob: P(x) = prob of X=x for all possible values of Y.
24 An example There are two coins. Choose a coin and then toss it. Do that 10 times. Coin 1 is chosen 4 times: one head and three tails. Coin 2 is chosen six times: four heads and two tails. Let’s calculate the probabilities.
25 Probabilities P(C=1) = 4/10, P(C=2) = 6/10 P(X=h) = 5/10, P(X=t) = 5/10 P(X=h | C=1) = ¼, P(X=h | C=2) = 4/6 P(X=t | C=1) = ¾, P(X=t | C=2) = 2/6 P(X=h, C=1) = 1/10, P(X=h, C=2) = 4/10 P(X=t, C=1) = 3/10, P(X=t, C=2) = 2/10
26 Relation between different types of probabilities P(X=h, C=1) = P(C=1) * P(X=h | C=1) = 4/10 * ¼ = 1/10 P(X=h) = P(X=h, C=1) + P(X=h, C=2) = 1/10 + 4/10 = 5/10
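The relations above can be reproduced from the raw counts of the two-coin example. This is a minimal sketch; the function names (`p_coin`, `p_side`, `p_side_given_coin`) are illustrative and not from the slides.

```python
from fractions import Fraction

# Counts from the two-coin example: (coin, side) -> number of tosses.
counts = {(1, "h"): 1, (1, "t"): 3, (2, "h"): 4, (2, "t"): 2}
total = sum(counts.values())  # 10 tosses in all

# Joint probability P(X=x, C=c) as a relative frequency.
joint = {pair: Fraction(n, total) for pair, n in counts.items()}

def p_coin(c):
    """Marginal P(C=c): sum the joint over both sides."""
    return sum(p for (coin, _), p in joint.items() if coin == c)

def p_side(x):
    """Marginal P(X=x): sum the joint over both coins."""
    return sum(p for (_, side), p in joint.items() if side == x)

def p_side_given_coin(x, c):
    """Conditional P(X=x | C=c) = P(X=x, C=c) / P(C=c)."""
    return joint[(c, x)] / p_coin(c)

# Joint = marginal * conditional, as on the slide: 4/10 * 1/4 = 1/10.
print(joint[(1, "h")] == p_coin(1) * p_side_given_coin("h", 1))  # True
# Marginal = sum of joints: 1/10 + 4/10 = 5/10.
print(p_side("h"))  # 1/2
```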
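27 Common trick #2: Chain rule P(A, B) = P(A) P(B | A) = P(B) P(A | B) P(A_1, …, A_n) = P(A_1) P(A_2 | A_1) … P(A_n | A_1, …, A_{n-1})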
28 Common trick #3: joint prob → marginal prob P(x) = Σ_y P(x, y)
29 Common trick #4: Bayes’ rule P(y | x) = P(x | y) P(y) / P(x)
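Bayes' rule in its standard form, P(y | x) = P(x | y) P(y) / P(x), can be tried on the two-coin example: given that a toss came up heads, how likely is it that coin 1 was used? The question itself is an added illustration; the input probabilities are the ones computed in the earlier example.

```python
from fractions import Fraction

# Values from the two-coin example.
p_c1 = Fraction(4, 10)         # P(C=1)
p_c2 = Fraction(6, 10)         # P(C=2)
p_h_given_c1 = Fraction(1, 4)  # P(X=h | C=1)
p_h_given_c2 = Fraction(4, 6)  # P(X=h | C=2)

# Denominator: marginal P(X=h), obtained by summing the joint over both coins.
p_h = p_h_given_c1 * p_c1 + p_h_given_c2 * p_c2

# Bayes' rule: P(C=1 | X=h) = P(X=h | C=1) P(C=1) / P(X=h).
p_c1_given_h = p_h_given_c1 * p_c1 / p_h

print(p_h)           # 1/2
print(p_c1_given_h)  # 1/5
```

The answer agrees with the counts: of the five heads observed, one came from coin 1.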
30 Independent random variables Two random variables X and Y are independent iff the value of X has no influence on the value of Y and vice versa: P(X,Y) = P(X) P(Y); P(Y|X) = P(Y); P(X|Y) = P(X). Our previous example: P(X, C) != P(X) P(C)
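The claim that X and C are not independent in the two-coin example can be checked directly against the definition. This is a minimal sketch; the joint table is built from the counts in that example, and the variable names are illustrative.

```python
from fractions import Fraction

# Joint distribution P(C=c, X=x) from the two-coin example (10 tosses).
joint = {
    (1, "h"): Fraction(1, 10), (1, "t"): Fraction(3, 10),
    (2, "h"): Fraction(4, 10), (2, "t"): Fraction(2, 10),
}

coins = {c for c, _ in joint}
sides = {x for _, x in joint}

# Marginals obtained by summing the joint over the other variable.
p_c = {c: sum(joint[(c, x)] for x in sides) for c in coins}
p_x = {x: sum(joint[(c, x)] for c in coins) for x in sides}

# Independence requires P(c, x) = P(c) P(x) for every pair of values.
independent = all(joint[(c, x)] == p_c[c] * p_x[x] for c in coins for x in sides)

print(joint[(1, "h")], p_c[1] * p_x["h"])  # 1/10 1/5
print(independent)                         # False
```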
31 Conditional independence Once we know C, the value of A does not affect the value of B and vice versa: P(A,B | C) = P(A|C) P(B|C); P(A|B,C) = P(A|C); P(B|A,C) = P(B|C)
32 Independence and conditional independence If A and B are independent, are they conditionally independent? Example: –Burglar, Earthquake –Alarm
33 Common trick #5: Independence assumption
34 An example P(w_1 w_2 … w_n) = P(w_1) P(w_2 | w_1) P(w_3 | w_1 w_2) … P(w_n | w_1 … w_{n-1}) ≈ P(w_1) P(w_2 | w_1) … P(w_n | w_{n-1}) Why do we make independence assumptions that we know are not true?
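The approximation above conditions each word only on the previous word (a bigram model). The following is a minimal sketch that estimates the factors by relative frequency; the toy corpus is hypothetical, invented purely for illustration, and the function names are not from the slides.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy corpus, for illustration only.
corpus = "the cat sat the cat ran the dog sat".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_unigram(w):
    """Relative-frequency estimate of P(w)."""
    return Fraction(unigrams[w], len(corpus))

def p_bigram(w, prev):
    """Relative-frequency estimate of P(w | prev).

    Raises ZeroDivisionError if prev never occurs as the first word of a bigram.
    """
    history = sum(n for (first, _), n in bigrams.items() if first == prev)
    return Fraction(bigrams[(prev, w)], history)

def sentence_prob(words):
    """P(w_1 ... w_n) under the approximation P(w_1) * product of P(w_i | w_{i-1})."""
    prob = p_unigram(words[0])
    for prev, w in zip(words, words[1:]):
        prob *= p_bigram(w, prev)
    return prob

print(sentence_prob(["the", "cat", "sat"]))  # 1/3 * 2/3 * 1/2 = 1/9
print(sentence_prob(["the", "dog", "ran"]))  # 0, since "dog ran" was never seen
```

Each factor now depends on counts of word pairs rather than of whole histories, so far fewer parameters have to be estimated than with the full chain rule. The second call also shows the cost: an unseen pair drives the whole product to zero.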
35 Summary of elementary probability theory Basic concepts: sample space, event space, random variable, random vector Joint / conditional /marginal probability Independence and conditional independence Five common tricks: –Max likelihood estimation –Chain rule –Calculating marginal probability from joint probability –Bayes’ rule –Independence assumption
37 Next time J&M Chapt 2 –Formal language and formal grammar –Regular expression Hw1 is due at 3pm on Wed.