Natural Language Processing
Earley’s Algorithm and Dependencies
Survey Feedback
- Expanded office hours: Tuesday evenings, Friday afternoons
- More detail in the lectures
- Piazza
- Quiz & midterm policy: you don’t get them back
- Grading policy
Earley’s Algorithm
Grammar for Examples
S → NP VP    S → VP
NP → N    NP → DT N    NP → NP PP    NP → PNP
VP → V NP    VP → VP PP
PP → P NP
DT → a    DT → the
P → through    P → with
PNP → Swabha    PNP → Chicago
V → book    V → books
N → book    N → books    N → flight
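For later examples, here is one way to encode this toy grammar in Python (a sketch; nonterminal rules and lexical rules are split so preterminals can be scanned directly against words):

GRAMMAR = [  # nonterminal rules: (lhs, rhs)
    ("S", ["NP", "VP"]), ("S", ["VP"]),
    ("NP", ["N"]), ("NP", ["DT", "N"]), ("NP", ["NP", "PP"]), ("NP", ["PNP"]),
    ("VP", ["V", "NP"]), ("VP", ["VP", "PP"]),
    ("PP", ["P", "NP"]),
]
LEXICON = {  # lexical rules: word -> possible preterminals
    "a": ["DT"], "the": ["DT"],
    "through": ["P"], "with": ["P"],
    "Swabha": ["PNP"], "Chicago": ["PNP"],
    "book": ["V", "N"], "books": ["V", "N"],
    "flight": ["N"],
}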
Earley’s Algorithm
More “top-down” than CKY, but still dynamic programming.
The Earley chart starts with the item ROOT → • S [0,0].
Goal: ROOT → S • [0,n]
Example sentence: book the flight through Chicago
Earley’s Algorithm: Predict
Given A → α • X β [i, j] and the rule X → γ, create X → • γ [j, j].
Example: from ROOT → • S [0,0] and the rule S → VP, predict S → • VP [0,0].
Chart so far:
ROOT → • S [0,0]
S → • VP [0,0]
S → • NP VP [0,0]
...
VP → • V NP [0,0]
NP → • DT N [0,0]
Sentence: book the flight through Chicago
Earley’s Algorithm: Scan
Given A → α • T β [i, j], where T is a preterminal with the rule T → w_{j+1}, create T → w_{j+1} • [j, j+1].
Example: from VP → • V NP [0,0] and the rule V → book, scan creates V → book • [0,1].
Chart so far:
ROOT → • S [0,0]
S → • VP [0,0]
S → • NP VP [0,0]
...
VP → • V NP [0,0]
NP → • DT N [0,0]
V → book • [0,1]
Sentence: book the flight through Chicago
Earley’s Algorithm: Complete
Given A → α • X β [i, j] and X → γ • [j, k], create A → α X • β [i, k].
Example: from VP → • V NP [0,0] and V → book • [0,1], complete creates VP → V • NP [0,1].
Chart so far:
ROOT → • S [0,0]
S → • VP [0,0]
S → • NP VP [0,0]
...
VP → • V NP [0,0]
NP → • DT N [0,0]
V → book • [0,1]
VP → V • NP [0,1]
Sentence: book the flight through Chicago
(A runnable sketch combining predict, scan, and complete follows.)
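The three operations fit together into a recognizer. Below is a minimal Python sketch (not the lecture’s reference implementation) using the GRAMMAR and LEXICON encoding sketched earlier; an item is a tuple (lhs, body, dot, start), and chart[j] holds items ending at position j:

def earley_recognize(words, grammar, lexicon):
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    chart[0].add(("ROOT", ("S",), 0, 0))  # ROOT -> . S [0,0]
    for j in range(n + 1):
        agenda = list(chart[j])
        while agenda:
            lhs, body, dot, i = agenda.pop()
            if dot == len(body):
                # COMPLETE: attach the finished item to items in chart[i]
                # that are waiting for an lhs starting at position i.
                for l2, b2, d2, i2 in list(chart[i]):
                    if d2 < len(b2) and b2[d2] == lhs:
                        new = (l2, b2, d2 + 1, i2)
                        if new not in chart[j]:
                            chart[j].add(new)
                            agenda.append(new)
            else:
                x = body[dot]
                # PREDICT: start every rule for the nonterminal after the dot.
                for l2, rhs in grammar:
                    if l2 == x:
                        new = (x, tuple(rhs), 0, j)
                        if new not in chart[j]:
                            chart[j].add(new)
                            agenda.append(new)
                # SCAN: if x is a preterminal for the next word, consume it.
                if j < n and x in lexicon.get(words[j], ()):
                    chart[j + 1].add((x, (words[j],), 1, j))
    return ("ROOT", ("S",), 1, 0) in chart[n]  # goal: ROOT -> S . [0,n]

# earley_recognize("book the flight through Chicago".split(), GRAMMAR, LEXICON)
# returns True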
Thought Questions
- Runtime?
- Memory?
- Weighted version?
- Recovering trees?
Parsing as Search
Implementing Recognizers as Search

Agenda = { state0 }
while (Agenda not empty):
    s = pop a state from Agenda
    if s is a success state:
        return s            // valid parse tree
    else if s is not a failure state:
        generate new states from s
        push new states onto Agenda
return nil                  // no parse!
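The pseudocode above is agnostic about how the agenda is ordered. In a minimal Python sketch (is_success, is_failure, and expand are hypothetical callbacks standing in for a concrete grammar), popping from the right end gives depth-first search and popping from the left gives breadth-first search:

from collections import deque

def search_recognize(state0, is_success, is_failure, expand, dfs=True):
    # Generic agenda-driven recognizer; the agenda discipline decides
    # the search order (stack -> DFS, queue -> BFS).
    agenda = deque([state0])
    while agenda:
        s = agenda.pop() if dfs else agenda.popleft()
        if is_success(s):
            return s                    # valid parse
        if not is_failure(s):
            agenda.extend(expand(s))    # push successor states
    return None                         # no parse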
Agenda-Based Probabilistic Parsing

Agenda = { (item, value) : initial updates from equations }
// items take the form [X, i, j]; values are reals
while (Agenda not empty):
    u = pop an update from Agenda
    if u.item is the goal:
        return u.value      // valid parse
    else if u.value > Chart[u.item]:
        store Chart[u.item] ← u.value
        if u.item combines with other Chart items:
            generate new updates from u and items stored in Chart
            push new updates onto Agenda
return nil                  // no parse!

“States” on the agenda are (possible) updates to the chart.
Best-first: order the updates by their values. Guarantee: the first time you pop any item, you have its final value!
Extension: order updates by their value times a heuristic h, where h introduces more information about which states we like. Under some conditions this is faster and still optimal; even when not optimal, performance is sometimes good.
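A minimal best-first sketch of the loop above using Python’s heapq (initial_updates, goal, and combine are hypothetical stand-ins for the deduction rules; items are assumed to be comparable tuples like ("X", i, j)):

import heapq

def best_first_parse(initial_updates, goal, combine):
    # Agenda of (negated value, item): heapq pops the smallest entry,
    # so negation gives highest-value-first ("best-first") ordering.
    chart = {}
    agenda = [(-v, item) for item, v in initial_updates]
    heapq.heapify(agenda)
    while agenda:
        neg_v, item = heapq.heappop(agenda)
        v = -neg_v
        if item == goal:
            return v  # first pop of the goal already has its final value
        if v > chart.get(item, float("-inf")):
            chart[item] = v
            # combine the newly finalized item with stored chart entries
            for new_item, new_v in combine(item, v, chart):
                heapq.heappush(agenda, (-new_v, new_item))
    return None  # no parse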
Catalog of CF Parsing Algorithms
- Recognition/Boolean vs. parsing/probabilistic
- Chomsky normal form/CKY vs. general/Earley’s
- Exhaustive vs. agenda-driven
Dependency Parsing
Treebank Tree
[Figure: phrase-structure tree for “The luxury auto maker last year sold 1,214 cars in the U.S.”]

Headed Tree
[Figure: the same tree with a head child marked for each constituent.]

Lexicalized Tree
[Figure: the same tree with each nonterminal annotated with its head word, e.g., S_sold, VP_sold, PP_in, NP_maker, NP_year, NP_cars, NP_U.S.]
Dependency Tree
[Figure: dependency tree for the same sentence; each word points to its head.]
Methods for Dependency Parsing
1. Parse with a phrase-structure parser using headed/lexicalized rules
   - Reuse algorithms we know; leverage improvements in phrase-structure parsing
2. Maximum spanning tree algorithms
   - Words are nodes, edges are possible links (MSTParser)
3. Shift-reduce parsing
   - Read words in one at a time; decide to “shift” or “reduce” to incrementally build tree structures (MaltParser, Stanford NN Dependency Parser)
Maximum Spanning Tree
- Each dependency is an edge
- Assign each edge a goodness score (an ML problem)
- Dependencies must form a tree
- Find the highest-scoring tree (Chu-Liu-Edmonds algorithm)
(Figure: Graham Neubig)
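As an illustration (not from the lecture), the MST formulation can be sketched with networkx, which ships a Chu-Liu-Edmonds implementation; the score function here is a stand-in for the learned edge scorer:

import networkx as nx

def mst_parse(words, score):
    # Node 0 is ROOT; nodes 1..n are the words. Every possible
    # head -> dependent link becomes a weighted directed edge.
    n = len(words)
    G = nx.DiGraph()
    for dep in range(1, n + 1):
        for head in range(0, n + 1):
            if head != dep:
                G.add_edge(head, dep, weight=score(head, dep))
    # Chu-Liu-Edmonds: highest-scoring arborescence. Since no edge
    # enters node 0, the result is necessarily rooted at ROOT.
    tree = nx.maximum_spanning_arborescence(G)
    return sorted((h, d) for h, d in tree.edges())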
Shift-Reduce Parsing
Two data structures:
- Buffer: words still being read in
- Stack: partially built dependency trees
At each point, choose one of:
- Shift: move a word from the buffer to the stack
- Reduce-left: combine the top two items on the stack, making the top word the head
- Reduce-right: combine the top two items on the stack, making the second word the head
Parsing as classification: a classifier says “shift”, “reduce-left”, or “reduce-right”.
Shift-Reduce Parsing
[Figure: stack and buffer before and after a transition. Credit: Graham Neubig]
Parsing as Classification
Given a state (stack and buffer): what action is best?
Better classification → better parsing. A sketch of a feature function appears below.
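For concreteness, a toy feature function might look like this (feature templates are illustrative, not from the lecture; stack and buffer items are assumed to be (index, word, pos) triples, matching the algorithm on the next slide):

def make_feats(stack, buffer):
    # Feature templates over the top of the stack and front of the buffer.
    feats = []
    if stack:
        feats += ["s1.word=" + stack[-1][1], "s1.pos=" + stack[-1][2]]
    if len(stack) >= 2:
        feats += ["s2.word=" + stack[-2][1], "s2.pos=" + stack[-2][2]]
    if buffer:
        feats += ["b1.word=" + buffer[0][1], "b1.pos=" + buffer[0][2]]
    return feats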
Shift-Reduce Algorithm

def shift_reduce(buffer, predict, weights):
    # buffer: list of (index, word, pos) triples; word indices start at 1
    # predict: the trained classifier; returns "shift", "reduce_left",
    #          or "reduce_right" (it must not pick an invalid action)
    heads = {}                          # dependent index -> head index
    stack = [(0, "ROOT", "ROOT")]
    buffer = list(buffer)
    while buffer or len(stack) > 1:
        feats = make_feats(stack, buffer)
        action = predict(feats, weights)
        if action == "shift":
            stack.append(buffer.pop(0))
        elif action == "reduce_left":
            # top word becomes the head of the second item
            heads[stack[-2][0]] = stack[-1][0]
            del stack[-2]
        else:  # action == "reduce_right"
            # second word becomes the head of the top item
            heads[stack[-1][0]] = stack[-2][0]
            del stack[-1]
    return heads
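A quick smoke test with a stand-in policy instead of a trained classifier (shift while the buffer is nonempty, then always reduce-right); this only exercises the mechanics, it is not a real parsing model:

def dummy_predict(feats, weights):
    # Hypothetical stand-in for a trained classifier.
    return "shift" if any(f.startswith("b1") for f in feats) else "reduce_right"

sentence = [(1, "book", "V"), (2, "the", "DT"), (3, "flight", "N")]
print(shift_reduce(sentence, dummy_predict, weights=None))
# {3: 2, 2: 1, 1: 0}: each word attached to the one before it, rooted at 0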