1 CRFs for SPLODD William W. Cohen Sep 8, 2011

2 Announcements
No office hours for William tomorrow; instead I'm getting a free lunch at Google.
Wiki assignments:
Assignment 0 (your homepage): if you haven't done this and are planning on staying in the course, see an instructor.
Assignment 1: due 9/30.
I added a place to post proposed projects. Start now if you have an idea!

3 Projects
Added a new section to the class wiki for project proposal ideas.
Deadlines:
short project proposal: Monday 9/12 (one sentence might be enough)
join or recruit a team: Monday 9/19
identify a dataset and baseline: Monday 9/26
complete proposal: Thursday 10/6
5-minute talk: Tuesday 10/24

4 Conditional Random Fields

5 Inference for MXPOST
[trellis figure: B/I/O states at each position of "When will prof Cohen post the notes …", with arcs between states at adjacent positions]
More accurately: find the total flow to each node; the weights are now on the arcs from state to state. The flow out of a node is always fixed:
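(The expression that followed was lost in extraction; under the standard MEMM setup it is the local normalization constraint:)

For every position i and previous state y': Σy Pr(Yi = y | Yi-1 = y', x) = 1

Each node pushes out exactly one unit of flow, no matter how well or poorly the observation fits.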

6 Implications of the MEMM model
Does this do what we want?
Q: does Y[i-1] depend on X[i+1]? ("A node is conditionally independent of its non-descendants given its parents.")
Q: what is Y[0] for the sentence "Qbbzzt of America Inc announced layoffs today in …"?

7 Label Bias Problem
Consider this MEMM, and enough training data to model it perfectly:
Pr(0123|rib) = 1
Pr(0453|rob) = 1
But then:
Pr(0123|rob) = Pr(1|0,r)/Z1 * Pr(2|1,o)/Z2 * Pr(3|2,b)/Z3 = 0.5 * 1 * 1
Pr(0453|rib) = Pr(4|0,r)/Z1' * Pr(5|4,i)/Z2' * Pr(3|5,b)/Z3' = 0.5 * 1 * 1
Because every interior state has only one outgoing arc, local normalization forces each of those transition probabilities to 1 regardless of the observation, so the model cannot prefer the correct path.
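A small demonstration of the effect in code (illustrative only; the state graph follows the slide's rib/rob example, and the smoothing value is an assumption made for the demo):

# MEMM transition structure from the slide's example. State 0 branches on
# 'r'; every other state has exactly one outgoing arc, so local
# normalization forces its probability to 1 whatever the observation is.
trans = {
    0: [(1, {'r'}), (4, {'r'})],  # both arcs trained on 'r': each gets 0.5
    1: [(2, {'i'})],
    2: [(3, {'b'})],
    4: [(5, {'o'})],
    5: [(3, {'b'})],
}

def path_prob(path, word):
    """Probability of a state path given a word, with each step locally
    normalized over the outgoing arcs (the MEMM way)."""
    p = 1.0
    for (s, s_next), ch in zip(zip(path, path[1:]), word):
        arcs = trans[s]
        # unnormalized score: 1 if the arc was trained on this character,
        # else a tiny smoothing value so unseen observations stay possible
        scores = [1.0 if ch in seen else 1e-6 for _, seen in arcs]
        idx = [nxt for nxt, _ in arcs].index(s_next)
        p *= scores[idx] / sum(scores)
    return p

print(path_prob([0, 1, 2, 3], "rib"))  # 0.5
print(path_prob([0, 1, 2, 3], "rob"))  # also ~0.5: the 'o' is ignored
print(path_prob([0, 4, 5, 3], "rob"))  # 0.5: rib and rob are indistinguishable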

8 How important is label bias?
It could be avoided in this case by changing the structure.
Our models are always wrong; is this "wrongness" a problem?
See Klein & Manning's paper (next Thursday's lecture) for more on this….

9 Another view of label bias [Sha & Pereira]
So what’s the alternative?

10 Inference for MXPOST
[recap of slide 5: trellis of B/I/O states over "When will prof Cohen post the notes …"]
More accurately: find the total flow to each node; the weights are now on the arcs from state to state. The flow out of a node is always fixed.

11 Another max-flow scheme
[trellis figure: B/I/O states over "When will prof Cohen post the notes …", arcs between states at adjacent positions]
More accurately: find the total flow to each node; the weights are now on the arcs from state to state. The flow out of a node is always fixed.

12 Another max-flow scheme: MRFs
[trellis figure: B/I/O states over "When will prof Cohen post the notes …"]
Goal is to learn how to weight the edges in the graph:
weight(yi, yi+1) = 2*[(yi = B or yi = I) and isCap(xi)] + 1*[yi = B and isFirstName(xi)] - 5*[yi+1 ≠ B and isLower(xi) and isUpper(xi+1)]
(A code sketch of this weighting follows.)
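That edge weight is just a weighted sum of indicator features. A minimal sketch in code (the predicate helpers and the name lexicon are stand-ins invented for the demo; the weights 2, 1, -5 are from the slide):

# Edge weight as a weighted sum of indicator features.
FIRST_NAMES = {"william", "cohen"}  # hypothetical lexicon for the demo

def is_cap(tok):        return tok[:1].isupper()
def is_lower(tok):      return tok.islower()
def is_upper(tok):      return tok[:1].isupper()
def is_first_name(tok): return tok.lower() in FIRST_NAMES

def edge_weight(y_i, y_next, x_i, x_next):
    w = 0.0
    if y_i in ("B", "I") and is_cap(x_i):
        w += 2.0   # capitalized tokens tend to start/continue names
    if y_i == "B" and is_first_name(x_i):
        w += 1.0   # known first names are good B tokens
    if y_next != "B" and is_lower(x_i) and is_upper(x_next):
        w -= 5.0   # a capitalized word after a lowercase one should
                   # usually start a new name, not continue the old label
    return w

print(edge_weight("B", "I", "Cohen", "post"))  # 2.0 + 1.0 = 3.0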

13 Another max-flow scheme: MRFs
[trellis figure: B/I/O states over "When will prof Cohen post the notes …"]
Find the total flow to each node; the weights are now on edges from state to state. The goal is to learn how to weight the edges in the graph, given features from the examples.

14 CRFs vs MEMMs
CRFs: sequence classification f: x → y is done by:
converting x, Y to an MRF with potentials φ(Y1,Y2), φ(Y2,Y3), …
using "flow" computations on the MRF to compute the best y|x (inference)
Learning is tuning this inference process.
MEMMs: sequence classification f: x → y is reduced to many cases of ordinary classification, f: xi → yi (local models like Pr(Y|x2,y1), Pr(Y|x4,y3), Pr(Y|x5,y5)), combined with Viterbi or beam search.
[figures: a linear-chain MRF over y1 … y6 with observations x1 … x6, vs. the MEMM chain of local classifiers]

15 CRFs vs MEMMs
CRFs: sequence classification f: x → y is done by:
converting x, Y to an MRF
using "flow" computations on the MRF to compute the best y|x (inference)
Learning is tuning this inference process, and learning involves inference: you need to run forward-backward on each example in the inner loop of your learner (the Lafferty et al. 2001 paper took >1000 iterations). A sketch of this loop follows.
MEMMs: sequence classification f: x → y is reduced to many cases of ordinary classification, f: xi → yi, combined with Viterbi or beam search. Learning is independent of inference (as in HMMs with fully-labeled data).
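A schematic of that training loop (a sketch under assumptions, not Lafferty et al.'s implementation: expectations are computed by brute force over all label sequences, which is what forward-backward computes efficiently, and feature_fn is an assumed helper mapping (x, y) to a feature vector):

import itertools
import numpy as np

def expected_feature_counts(w, x, labels, feature_fn):
    # Brute-force expectation over all label sequences; fine for tiny
    # demos, and exactly the quantity forward-backward computes efficiently.
    seqs = list(itertools.product(labels, repeat=len(x)))
    feats = np.array([feature_fn(x, y) for y in seqs])
    scores = feats @ w
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs @ feats

def train_crf(examples, labels, feature_fn, n_features, iters=50, lr=0.1):
    w = np.zeros(n_features)
    for _ in range(iters):
        grad = np.zeros(n_features)
        for x, y in examples:
            grad += feature_fn(x, y)        # empirical feature counts
            # inference runs here, inside the learner's inner loop:
            grad -= expected_feature_counts(w, x, labels, feature_fn)
        w += lr * grad                      # ascend conditional log-likelihood
    return w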

16 The math: Review of maxent
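(The equations on this slide were lost in extraction; the standard maxent / logistic regression model being reviewed has the form:)

Pr(y|x) = exp(Σj λj fj(x,y)) / Z(x), where Z(x) = Σy' exp(Σj λj fj(x,y'))

i.e., a log-linear score over features fj with learned weights λj, normalized over the possible labels y' for this one input.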

17 Review of maxent/MEMM/CMMs
We know how to compute this.
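("This" refers to an equation lost in extraction; the standard maxent/MEMM/CMM factorization it points at is:)

Pr(y1 … yn | x) = Πi Pr(yi | yi-1, x)

where each local factor Pr(yi | yi-1, x) is a maxent model as on the previous slide, so we know how to compute each factor.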

18 Details on CMMs

19 From CMMs to CRFs
Recall why we're unhappy: we don't want local normalization.
New model: normalize once, globally over the whole sequence (reconstructed below). How do we compute this?
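(The formula itself did not survive extraction; the standard linear-chain CRF it introduces, from Lafferty et al. 2001, is:)

Pr(y|x) = exp(Σi Σj λj fj(yi-1, yi, x, i)) / Z(x), with Z(x) = Σy' exp(Σi Σj λj fj(y'i-1, y'i, x, i))

There is one normalizer Z(x) per input sequence, summed over all label sequences y', instead of one per position; this global normalization is what removes the label bias.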

20 What's the new model look like?
What's independent? If fi is HMM-like and depends only on (xj, yj) or (yj, yj-1):
[graphical model: chain y1 – y2 – y3, each yj also linked to its own xj]

21 What's the new model look like?
What's independent now??
[graphical model: chain y1 – y2 – y3, with every yj linked to the entire input x]

22 CRF learning – from Sha & Pereira

23 CRF learning – from Sha & Pereira

24 CRF learning – from Sha & Pereira
Something like forward-backward. Idea: define a matrix of y,y' "affinities" at stage i:
Mi[y,y'] = "unnormalized probability" of a transition from y to y' at stage i
Mi * Mi+1 = "unnormalized probability" of any path through stages i and i+1
(A sketch of the resulting computation follows.)
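In Sha & Pereira's matrix formulation, the normalizer Z(x) falls out of the product of these per-stage matrices. A minimal sketch (the start/stop index convention is an assumption of the demo):

import numpy as np

def partition_function(Ms):
    """Z(x) as the (start, stop) entry of the product M1 * M2 * ... * Mn.
    Ms: list of (K x K) arrays of unnormalized y -> y' affinities, with
    index 0 taken as the start state and index K-1 as the stop state."""
    prod = np.eye(Ms[0].shape[0])
    for M in Ms:
        prod = prod @ M      # the product sums over all intermediate paths
    return prod[0, -1]       # total unnormalized flow from start to stop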

25 [figure-only slide: the linear-chain graphical model over y1, y2, y3 and x, carried over from the previous slides]

26 Forward backward ideas
[trellis figure: name/nonName states at three positions, with edge affinities labeled c, g, b, f, d, h]
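The forward-backward recursions on such a trellis, sketched in code (a sketch under the same matrix convention as above, not Sha & Pereira's code):

import numpy as np

def forward_backward(Ms):
    """alpha[i][y] = total affinity of all paths from the start to state y
    at stage i; beta[i][y] = total affinity from state y at stage i to the
    stop. Elementwise alpha * beta gives per-state path totals, which is
    what CRF training needs for expected feature counts."""
    K, n = Ms[0].shape[0], len(Ms)
    alpha = np.zeros((n + 1, K)); alpha[0, 0] = 1.0   # start state = index 0
    for i, M in enumerate(Ms):
        alpha[i + 1] = alpha[i] @ M
    beta = np.zeros((n + 1, K)); beta[n, -1] = 1.0    # stop state = last index
    for i in range(n - 1, -1, -1):
        beta[i] = Ms[i] @ beta[i + 1]
    return alpha, beta   # note: alpha[n, -1] == beta[0, 0] == Z(x)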

27 CRF learning – from Sha & Pereira

28 Sha & Pereira results
CRF beats MEMM (McNemar's test); MEMM probably beats the voted perceptron.

29 Sha & Pereira results
[table lost in extraction: training times in minutes, 375k examples]

