Natural Language Processing

Presentation on theme: "Natural Language Processing" — Presentation transcript:

1 Natural Language Processing
Earley’s Algorithm and Dependencies

2 Survey Feedback
- Expanded office hours: Tuesday evenings, Friday afternoons
- More detail in the lectures
- Piazza
- Quiz & midterm policy: you don’t get them back
- Grading policy

3 Earley’s Algorithm

4 Grammar for Examples
NP -> N    NP -> DT N    NP -> NP PP    NP -> PNP
PP -> P NP
S -> NP VP    S -> VP
VP -> V NP    VP -> VP PP
DT -> a    DT -> the
P -> through    P -> with
PNP -> Swabha    PNP -> Chicago
V -> book    V -> books
N -> book    N -> books    N -> flight

5 Earley’s Algorithm
More “top-down” than CKY. Still dynamic programming.
The Earley chart starts with ROOT → • S [0,0]; the goal is ROOT → S • [0,n].
Sentence: book the flight through Chicago

6 Earley’s Algorithm: Predict
Given V → α • X β [i, j] and the rule X → γ, create X → • γ [j, j].
Example: from ROOT → • S [0,0] and the rule S → VP, predict S → • VP [0,0].
Chart so far: ROOT → • S [0,0], S → • VP [0,0], S → • NP VP [0,0], ..., VP → • V NP [0,0], NP → • DT N [0,0]
Sentence: book the flight through Chicago

7 Earley’s Algorithm: Scan
Given V → α • T β [i, j] and the rule T → w_{j+1} (the next input word), create T → w_{j+1} • [j, j+1].
Example: from VP → • V NP [0,0] and the rule V → book, scan to create V → book • [0,1].
Chart so far: ROOT → • S [0,0], S → • VP [0,0], S → • NP VP [0,0], ..., VP → • V NP [0,0], NP → • DT N [0,0], V → book • [0,1]
Sentence: book the flight through Chicago

8 Earley’s Algorithm: Complete
Given V → α • X β [i, j] and X → γ • [j, k], create V → α X • β [i, k].
Example: from VP → • V NP [0,0] and V → book • [0,1], complete to create VP → V • NP [0,1].
Chart so far: ROOT → • S [0,0], S → • VP [0,0], S → • NP VP [0,0], ..., VP → • V NP [0,0], NP → • DT N [0,0], V → book • [0,1], VP → V • NP [0,1]
Sentence: book the flight through Chicago

9 (figure-only slide; no transcript text)

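Putting predict, scan, and complete together gives a full recognizer. The following is a rough Python sketch, not code from the slides: the grammar is assumed to be a dict from each nonterminal to its right-hand sides, the lexicon a dict from preterminals to word sets, and chart items are tuples (lhs, rhs, dot, start); the lexical step is folded into scan rather than creating separate preterminal items as on the slides.

def earley_recognize(words, grammar, lexicon, start="ROOT"):
    # chart[j] holds items (lhs, rhs, dot, i): rule lhs -> rhs with the dot
    # after position `dot`, spanning words i..j.
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    for gamma in grammar[start]:
        chart[0].add((start, gamma, 0, 0))                 # ROOT -> . S  [0,0]

    for j in range(n + 1):
        agenda = list(chart[j])
        while agenda:
            lhs, rhs, dot, i = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:                         # PREDICT: nxt -> . gamma [j,j]
                    for gamma in grammar[nxt]:
                        item = (nxt, gamma, 0, j)
                        if item not in chart[j]:
                            chart[j].add(item); agenda.append(item)
                if j < n and words[j] in lexicon.get(nxt, ()):   # SCAN the next word
                    chart[j + 1].add((lhs, rhs, dot + 1, i))
            else:                                          # COMPLETE: lhs finished over [i,j]
                for (l2, r2, d2, i2) in list(chart[i]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        item = (l2, r2, d2 + 1, i2)
                        if item not in chart[j]:
                            chart[j].add(item); agenda.append(item)

    return any(l == start and d == len(r) and i == 0       # goal: ROOT -> S . [0,n]
               for (l, r, d, i) in chart[n])

A small usage example with a subset of the slide-4 grammar:

grammar = {"ROOT": [("S",)], "S": [("VP",)], "VP": [("V", "NP")],
           "NP": [("DT", "N"), ("NP", "PP"), ("PNP",)], "PP": [("P", "NP")]}
lexicon = {"V": {"book"}, "DT": {"the"}, "N": {"flight"},
           "P": {"through"}, "PNP": {"Chicago"}}
print(earley_recognize("book the flight through Chicago".split(), grammar, lexicon))  # True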

11 Thought Questions Runtime? Memory? Weighted version? Recovering trees?

12 Parsing as Search

13 Implementing Recognizers as Search
Agenda = { state0 }
while (Agenda not empty):
    s = pop a state from Agenda
    if s is a success-state:
        return s                        // valid parse tree
    else if s is not a failure-state:
        generate new states from s
        push new states onto Agenda
return nil                              // no parse!

14 Agenda-Based Probabilistic Parsing
Agenda = { (item, value) : initial updates from equations }
    // items take the form [X, i, j]; values are reals
while (Agenda not empty):
    u = pop an update from Agenda
    if u.item is goal:
        return u.value                  // valid parse tree
    else if u.value > Chart[u.item]:
        store Chart[u.item] ← u.value
        if u.item combines with other Chart items:
            generate new updates from u and items stored in Chart
            push new updates onto Agenda
return nil                              // no parse!

“States” on the agenda are (possible) updates to the chart.
BEST FIRST: order the updates by their values.
Guarantee: the first time you pop any state, you have its final value!
Extension: order by update times h, where h introduces more information about which states we like. Under some conditions, this is faster and still optimal. Even when not optimal, performance is sometimes good.
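To make the best-first ordering concrete, the agenda can be kept in a priority queue. The fragment below is an illustrative Python sketch only; the update-generating combine() function is hypothetical, since it depends on the grammar and the scoring equations.

import heapq

def agenda_parse(initial_updates, is_goal, combine):
    """Best-first agenda: always pop the highest-valued update.

    initial_updates: iterable of (item, value) pairs
    is_goal(item):   True for the goal item, e.g. [ROOT, 0, n]
    combine(item, chart): yields new (item, value) updates   (hypothetical)
    """
    chart = {}
    agenda = [(-value, item) for item, value in initial_updates]
    heapq.heapify(agenda)                          # heapq is a min-heap, so values are negated
    while agenda:
        neg, item = heapq.heappop(agenda)
        value = -neg
        if is_goal(item):
            return value                           # first pop of the goal already has its final value
        if value > chart.get(item, float("-inf")):
            chart[item] = value                    # store the value in the chart
            for new_item, new_value in combine(item, chart):
                heapq.heappush(agenda, (-new_value, new_item))
    return None                                    # no parse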

15 Catalog of CF Parsing Algorithms
- Recognition/Boolean vs. parsing/probabilistic
- Chomsky normal form/CKY vs. general/Earley’s
- Exhaustive vs. agenda

16 Dependency Parsing

17 Treebank Tree
(figure: phrase-structure tree with S, VP, PP, and NP nodes over the POS tags DT NN NN NN JJ NN VBD CD NNS IN DT NNP, for the sentence “The luxury auto maker last year sold 1,214 cars in the U.S.”)

18 Headed Tree
(figure: the same phrase-structure tree with the head child of each constituent marked, over “The luxury auto maker last year sold 1,214 cars in the U.S.”)

19 Lexicalized Tree
(figure: the same tree with each nonterminal annotated with its lexical head: S_sold, VP_sold, PP_in, NP_maker, NP_year, NP_cars, NP_U.S., over “The luxury auto maker last year sold 1,214 cars in the U.S.”)

20 Dependency Tree

21 Methods for Dependency Parsing
- Parse with a phrase-structure parser with headed / lexicalized rules
  - Reuse algorithms we know
  - Leverage improvements in phrase-structure parsing
- Maximum spanning tree algorithms
  - Words are nodes, edges are possible links
  - MSTParser
- Shift-reduce parsing
  - Read words in one at a time, decide to “shift” or “reduce” to incrementally build tree structures
  - MaltParser, Stanford NN Dependency Parser

22 Maximum Spanning Tree
- Each dependency is an edge
- Assign each edge a goodness score (ML problem)
- Dependencies must form a tree
- Find the highest-scoring tree (Chu-Liu-Edmonds algorithm; see the sketch below)
(Figure: Graham Neubig)
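As a rough illustration (not the course's code), the edge scores can be loaded into a directed graph and the highest-scoring arborescence extracted with an off-the-shelf Chu-Liu-Edmonds implementation. This sketch assumes networkx is installed and uses made-up scores for a toy sentence.

import networkx as nx

# Hypothetical edge scores: score[(head, dependent)] from some learned model.
score = {
    ("ROOT", "book"): 10.0, ("book", "flight"): 8.0, ("flight", "the"): 7.0,
    ("book", "the"): 2.0, ("ROOT", "flight"): 3.0, ("flight", "book"): 1.0,
}

G = nx.DiGraph()
for (head, dep), s in score.items():
    G.add_edge(head, dep, weight=s)

# Chu-Liu-Edmonds: highest-scoring arborescence, i.e. every word gets exactly one head.
tree = nx.maximum_spanning_arborescence(G, attr="weight")
for head, dep in sorted(tree.edges()):
    print(f"{dep} <- {head}")
# book <- ROOT, flight <- book, the <- flight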

23 Shift-Reduce Parsing
Two data structures:
- Buffer: words that are being read in
- Stack: partially built dependency trees
At each point, choose:
- Shift: move the next word from the buffer onto the stack
- Reduce-left: combine the top two items on the stack, making the top word the head of the tree
- Reduce-right: combine the top two items on the stack, making the second word the head of the tree
Parsing as classification: a classifier says “shift”, “reduce-left”, or “reduce-right”

24 Shift-Reduce Parsing
(figure: stack and buffer contents before and after a parsing step; Figure: Graham Neubig)

25 Parsing as Classification
Given a state (stack and buffer): what action is best?
Better classification -> better parsing
(figure: example stack and buffer configuration)

26 Shift-Reduce Algorithm
ShiftReduce(buffer):
    heads = empty list
    stack = [ (0, “ROOT”, “ROOT”) ]
    while |buffer| > 0 or |stack| > 1:
        feats  = MakeFeats(stack, buffer)
        action = Predict(feats, weights)
        if action = shift:
            stack.push(buffer.read())
        elif action = reduce_left:
            heads[stack[-2]] = stack[-1]
            stack.remove(-2)
        else:  # action = reduce_right
            heads[stack[-1]] = stack[-2]
            stack.remove(-1)
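For concreteness, the same loop can be rendered as runnable Python with the learned classifier (MakeFeats, Predict, weights) replaced by a hypothetical oracle that simply supplies the next action; heads maps each word's index to the index of its head.

def shift_reduce(words, oracle):
    """words: list of tokens; oracle(stack, buffer) -> "shift" / "reduce_left" / "reduce_right"."""
    buffer = list(enumerate(words, start=1))       # [(1, w1), (2, w2), ...]
    stack = [(0, "ROOT")]
    heads = {}                                     # dependent index -> head index
    while buffer or len(stack) > 1:
        action = oracle(stack, buffer)
        if action == "shift":
            stack.append(buffer.pop(0))
        elif action == "reduce_left":              # top word becomes the head
            heads[stack[-2][0]] = stack[-1][0]
            del stack[-2]
        else:                                      # reduce_right: second word is the head
            heads[stack[-1][0]] = stack[-2][0]
            del stack[-1]
    return heads

actions = iter(["shift", "shift", "shift", "reduce_left", "reduce_right", "reduce_right"])
print(shift_reduce("book the flight".split(),
                   lambda stack, buffer: next(actions)))
# {2: 3, 3: 1, 1: 0}   i.e. the <- flight, flight <- book, book <- ROOT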

