1 CRFs for SPLODD William W. Cohen Sep 8, 2011

2 Announcements
No office hours for William tomorrow; instead I'm getting a free lunch at Google.
Wiki assignments:
Assignment 0 (your homepage): if you haven't done this and are planning on staying in the course, see an instructor.
Assignment 1: due 9/30.
I added a place to post proposed projects. Start now if you have an idea!

3 Projects
Added a new section to the class wiki for project proposal ideas.
Deadlines:
short project proposal: Monday 9/12 (one sentence might be enough)
join or recruit a team: Monday 9/19
identify a dataset and baseline: Monday 9/26
complete proposal: Thursday 10/6
5-minute talk: Tuesday 10/24

4 Conditional Random Fields

5 Inference for MXPOST
[trellis figure: B/I/O states at each position of "When will prof Cohen post the notes …", with arcs between states at adjacent positions]
More accurately: find the total flow to each node; the weights are now on the arcs from state to state. The flow out of a node is always fixed:
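(The expression that followed was lost in extraction; under the standard MEMM setup it is the local normalization constraint:)

For every position i and previous state y': Σy Pr(Yi = y | Yi-1 = y', x) = 1

Each node pushes out exactly one unit of flow, no matter how well or poorly the observation fits.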

6 Implications of the MEMM model
Does this do what we want?
Q: does Y[i-1] depend on X[i+1]? ("A node is conditionally independent of its non-descendants given its parents.")
Q: what is Y[0] for the sentence "Qbbzzt of America Inc announced layoffs today in …"?

7 Label Bias Problem
Consider this MEMM, and enough training data to model it perfectly:
Pr(0123|rib) = 1
Pr(0453|rob) = 1
But then:
Pr(0123|rob) = Pr(1|0,r)/Z1 * Pr(2|1,o)/Z2 * Pr(3|2,b)/Z3 = 0.5 * 1 * 1
Pr(0453|rib) = Pr(4|0,r)/Z1' * Pr(5|4,i)/Z2' * Pr(3|5,b)/Z3' = 0.5 * 1 * 1
Because every interior state has only one outgoing arc, local normalization forces each of those transition probabilities to 1 regardless of the observation, so the model cannot prefer the correct path.
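A small demonstration of the effect in code (illustrative only; the state graph follows the slide's rib/rob example, and the smoothing value is an assumption made for the demo):

# MEMM transition structure from the slide's example. State 0 branches on
# 'r'; every other state has exactly one outgoing arc, so local
# normalization forces its probability to 1 whatever the observation is.
trans = {
    0: [(1, {'r'}), (4, {'r'})],  # both arcs trained on 'r': each gets 0.5
    1: [(2, {'i'})],
    2: [(3, {'b'})],
    4: [(5, {'o'})],
    5: [(3, {'b'})],
}

def path_prob(path, word):
    """Probability of a state path given a word, with each step locally
    normalized over the outgoing arcs (the MEMM way)."""
    p = 1.0
    for (s, s_next), ch in zip(zip(path, path[1:]), word):
        arcs = trans[s]
        # unnormalized score: 1 if the arc was trained on this character,
        # else a tiny smoothing value so unseen observations stay possible
        scores = [1.0 if ch in seen else 1e-6 for _, seen in arcs]
        idx = [nxt for nxt, _ in arcs].index(s_next)
        p *= scores[idx] / sum(scores)
    return p

print(path_prob([0, 1, 2, 3], "rib"))  # 0.5
print(path_prob([0, 1, 2, 3], "rob"))  # also ~0.5: the 'o' is ignored
print(path_prob([0, 4, 5, 3], "rob"))  # 0.5: rib and rob are indistinguishable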

8 How important is label bias?
It could be avoided in this case by changing the structure.
Our models are always wrong; is this "wrongness" a problem?
See Klein & Manning's paper (next Thursday's lecture) for more on this….

9 Another view of label bias [Sha & Pereira]
So what’s the alternative?

10 Inference for MXPOST
[recap of slide 5: trellis of B/I/O states over "When will prof Cohen post the notes …"]
More accurately: find the total flow to each node; the weights are now on the arcs from state to state. The flow out of a node is always fixed.

11 Another max-flow scheme
[trellis figure: B/I/O states over "When will prof Cohen post the notes …", arcs between states at adjacent positions]
More accurately: find the total flow to each node; the weights are now on the arcs from state to state. The flow out of a node is always fixed.

12 Another max-flow scheme: MRFs
[trellis figure: B/I/O states over "When will prof Cohen post the notes …"]
Goal is to learn how to weight the edges in the graph:
weight(yi, yi+1) = 2*[(yi = B or yi = I) and isCap(xi)] + 1*[yi = B and isFirstName(xi)] - 5*[yi+1 ≠ B and isLower(xi) and isUpper(xi+1)]
(A code sketch of this weighting follows.)
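That edge weight is just a weighted sum of indicator features. A minimal sketch in code (the predicate helpers and the name lexicon are stand-ins invented for the demo; the weights 2, 1, -5 are from the slide):

# Edge weight as a weighted sum of indicator features.
FIRST_NAMES = {"william", "cohen"}  # hypothetical lexicon for the demo

def is_cap(tok):        return tok[:1].isupper()
def is_lower(tok):      return tok.islower()
def is_upper(tok):      return tok[:1].isupper()
def is_first_name(tok): return tok.lower() in FIRST_NAMES

def edge_weight(y_i, y_next, x_i, x_next):
    w = 0.0
    if y_i in ("B", "I") and is_cap(x_i):
        w += 2.0   # capitalized tokens tend to start/continue names
    if y_i == "B" and is_first_name(x_i):
        w += 1.0   # known first names are good B tokens
    if y_next != "B" and is_lower(x_i) and is_upper(x_next):
        w -= 5.0   # a capitalized word after a lowercase one should
                   # usually start a new name, not continue the old label
    return w

print(edge_weight("B", "I", "Cohen", "post"))  # 2.0 + 1.0 = 3.0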

13 Another max-flow scheme: MRFs
[trellis figure: B/I/O states over "When will prof Cohen post the notes …"]
Find the total flow to each node; the weights are now on edges from state to state. The goal is to learn how to weight the edges in the graph, given features from the examples.

14 CRFs vs MEMMs
CRFs: sequence classification f: x → y is done by:
converting x, Y to an MRF with potentials φ(Y1,Y2), φ(Y2,Y3), …
using "flow" computations on the MRF to compute the best y|x (inference)
Learning is tuning this inference process.
MEMMs: sequence classification f: x → y is reduced to many cases of ordinary classification, f: xi → yi (local models like Pr(Y|x2,y1), Pr(Y|x4,y3), Pr(Y|x5,y5)), combined with Viterbi or beam search.
[figures: a linear-chain MRF over y1 … y6 with observations x1 … x6, vs. the MEMM chain of local classifiers]

15 CRFs vs MEMMs
CRFs: sequence classification f: x → y is done by:
converting x, Y to an MRF
using "flow" computations on the MRF to compute the best y|x (inference)
Learning is tuning this inference process, and learning involves inference: you need to run forward-backward on each example in the inner loop of your learner (the Lafferty et al. 2001 paper took >1000 iterations). A sketch of this loop follows.
MEMMs: sequence classification f: x → y is reduced to many cases of ordinary classification, f: xi → yi, combined with Viterbi or beam search. Learning is independent of inference (as in HMMs with fully-labeled data).
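A schematic of that training loop (a sketch under assumptions, not Lafferty et al.'s implementation: expectations are computed by brute force over all label sequences, which is what forward-backward computes efficiently, and feature_fn is an assumed helper mapping (x, y) to a feature vector):

import itertools
import numpy as np

def expected_feature_counts(w, x, labels, feature_fn):
    # Brute-force expectation over all label sequences; fine for tiny
    # demos, and exactly the quantity forward-backward computes efficiently.
    seqs = list(itertools.product(labels, repeat=len(x)))
    feats = np.array([feature_fn(x, y) for y in seqs])
    scores = feats @ w
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return probs @ feats

def train_crf(examples, labels, feature_fn, n_features, iters=50, lr=0.1):
    w = np.zeros(n_features)
    for _ in range(iters):
        grad = np.zeros(n_features)
        for x, y in examples:
            grad += feature_fn(x, y)        # empirical feature counts
            # inference runs here, inside the learner's inner loop:
            grad -= expected_feature_counts(w, x, labels, feature_fn)
        w += lr * grad                      # ascend conditional log-likelihood
    return w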

16 The math: Review of maxent
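(The equations on this slide were lost in extraction; the standard maxent / logistic regression model being reviewed has the form:)

Pr(y|x) = exp(Σj λj fj(x,y)) / Z(x), where Z(x) = Σy' exp(Σj λj fj(x,y'))

i.e., a log-linear score over features fj with learned weights λj, normalized over the possible labels y' for this one input.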

17 Review of maxent/MEMM/CMMs
We know how to compute this.
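("This" refers to an equation lost in extraction; the standard maxent/MEMM/CMM factorization it points at is:)

Pr(y1 … yn | x) = Πi Pr(yi | yi-1, x)

where each local factor Pr(yi | yi-1, x) is a maxent model as on the previous slide, so we know how to compute each factor.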

18 Details on CMMs

19 From CMMs to CRFs
Recall why we're unhappy: we don't want local normalization.
New model: normalize once, globally over the whole sequence (reconstructed below). How do we compute this?
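(The formula itself did not survive extraction; the standard linear-chain CRF it introduces, from Lafferty et al. 2001, is:)

Pr(y|x) = exp(Σi Σj λj fj(yi-1, yi, x, i)) / Z(x), with Z(x) = Σy' exp(Σi Σj λj fj(y'i-1, y'i, x, i))

There is one normalizer Z(x) per input sequence, summed over all label sequences y', instead of one per position; this global normalization is what removes the label bias.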

20 What's the new model look like?
What's independent? If fi is HMM-like and depends only on (xj, yj) or (yj, yj-1):
[graphical model: chain y1 – y2 – y3, each yj also linked to its own xj]

21 What's the new model look like?
What's independent now??
[graphical model: chain y1 – y2 – y3, with every yj linked to the entire input x]

22 CRF learning – from Sha & Pereira

23 CRF learning – from Sha & Pereira

24 CRF learning – from Sha & Pereira
Something like forward-backward. Idea: define a matrix of y,y' "affinities" at stage i:
Mi[y,y'] = "unnormalized probability" of a transition from y to y' at stage i
Mi * Mi+1 = "unnormalized probability" of any path through stages i and i+1
(A sketch of the resulting computation follows.)
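In Sha & Pereira's matrix formulation, the normalizer Z(x) falls out of the product of these per-stage matrices. A minimal sketch (the start/stop index convention is an assumption of the demo):

import numpy as np

def partition_function(Ms):
    """Z(x) as the (start, stop) entry of the product M1 * M2 * ... * Mn.
    Ms: list of (K x K) arrays of unnormalized y -> y' affinities, with
    index 0 taken as the start state and index K-1 as the stop state."""
    prod = np.eye(Ms[0].shape[0])
    for M in Ms:
        prod = prod @ M      # the product sums over all intermediate paths
    return prod[0, -1]       # total unnormalized flow from start to stop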

25 [figure-only slide: the linear-chain graphical model over y1, y2, y3 and x, carried over from the previous slides]

26 Forward backward ideas
[trellis figure: name/nonName states at three positions, with edge affinities labeled c, g, b, f, d, h]
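The forward-backward recursions on such a trellis, sketched in code (a sketch under the same matrix convention as above, not Sha & Pereira's code):

import numpy as np

def forward_backward(Ms):
    """alpha[i][y] = total affinity of all paths from the start to state y
    at stage i; beta[i][y] = total affinity from state y at stage i to the
    stop. Elementwise alpha * beta gives per-state path totals, which is
    what CRF training needs for expected feature counts."""
    K, n = Ms[0].shape[0], len(Ms)
    alpha = np.zeros((n + 1, K)); alpha[0, 0] = 1.0   # start state = index 0
    for i, M in enumerate(Ms):
        alpha[i + 1] = alpha[i] @ M
    beta = np.zeros((n + 1, K)); beta[n, -1] = 1.0    # stop state = last index
    for i in range(n - 1, -1, -1):
        beta[i] = Ms[i] @ beta[i + 1]
    return alpha, beta   # note: alpha[n, -1] == beta[0, 0] == Z(x)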

27 CRF learning – from Sha & Pereira

28 Sha & Pereira results
CRF beats MEMM (McNemar's test); MEMM probably beats the voted perceptron.

29 Sha & Pereira results
[table lost in extraction: training times in minutes, 375k examples]

