1 Data-Driven Dependency Parsing
2 Background: Natural Language Parsing
Syntactic analysis: string to (tree) structure
He likes fish
[Parse tree: (S (NP (Prn He)) (VP (V likes) (NP (N fish))))]
3 He likes fish
[Parse tree: (S (NP (Prn He)) (VP (V likes) (NP (N fish))))]
4 He likes fish
[Parse tree as on slide 3]
Useful in:
Natural language understanding: NL interfaces, conversational agents
Language technology applications: machine translation, question answering, information extraction
Scientific study of language: syntax, language processing models
5 He likes fish
[Parse tree as on slide 3]
Not enough coverage, too much ambiguity
6 [Treebank parse trees:
(S (NP (Prn He)) (VP (V likes) (NP (N fish))))
(S (NP (Det The) (N boy)) (VP (V runs) (AdvP (Adv fast))))
(S (NP (N Dogs)) (VP (V run) (AdvP (Adv fast))))
(S (NP (N Dogs)) (VP (V run)))]
Charniak (1996); Collins (1996); Charniak (1997)
7 [Same treebank parse trees as on slide 6]
8 The boy ate the cheese sandwich
[Parse tree: (S (NP (Det The) (N boy)) (VP (V ate) (NP (Det the) (N cheese) (N sandwich))))]
9 The boy ate the cheese sandwich
[Parse tree as on slide 8, with the head-dependent pair ate → boy highlighted]
10 The boy ate the cheese sandwich
[Dependency tree: ate →SUBJ→ boy, boy →DET→ The, ate →OBJ→ sandwich, sandwich →MOD→ cheese, sandwich →DET→ the. Each arc points from a HEAD to a DEPENDENT and carries a LABEL.]
11 Background: Linear Classification with the Perceptron
Classification: given an input x, predict an output y
Example: x is a document, y ∈ {Sports, Politics, Science}
x is represented as a feature vector f(x)
Example: x = "Wednesday night, when the Lakers play the Mavericks at American Airlines Center, they get to see first hand …"
f(x): games:5, Lakers:4, said:3, rebounds:3, democrat:0, republican:0, science:0
y = Sports
To score a class, just add up the feature weights given in a weight vector w
12 Multiclass Perceptron
Learn a vector of feature weights w_c for each class c:
  w_c = 0 for every class c
  for N iterations:
    for each training example (x_i, y_i):
      z_i = argmax_z w_z · f(x_i)
      if z_i ≠ y_i:
        w_{z_i} = w_{z_i} − f(x_i)
        w_{y_i} = w_{y_i} + f(x_i)
Try to classify each example. If a mistake is made, update the weights.
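A minimal runnable sketch of slides 11-12; the toy feature counts and class names below are made up for illustration:

```python
from collections import defaultdict

def train_perceptron(examples, classes, n_iters=10):
    # examples: list of (feature_dict, gold_class) pairs.
    # One sparse weight vector per class, all starting at zero.
    weights = {c: defaultdict(float) for c in classes}

    def score(c, feats):
        # "Just add up the feature weights": dot product of w_c and f(x).
        return sum(weights[c][f] * v for f, v in feats.items())

    for _ in range(n_iters):
        for feats, gold in examples:
            # Predict the highest-scoring class z_i.
            pred = max(classes, key=lambda c: score(c, feats))
            if pred != gold:
                # Mistake: demote the predicted class, promote the gold one.
                for f, v in feats.items():
                    weights[pred][f] -= v
                    weights[gold][f] += v
    return weights

# Toy usage with made-up feature counts (cf. slide 11):
examples = [({"Lakers": 4, "games": 5, "rebounds": 3}, "Sports"),
            ({"democrat": 2, "republican": 3}, "Politics")]
w = train_perceptron(examples, ["Sports", "Politics", "Science"])
```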
13 Shift-Reduce Dependency Parsing
Two main data structures:
Stack S (initially empty)
Queue Q (initialized to contain each word of the input sentence)
Two types of actions:
Shift: removes a word from Q and pushes it onto S
Reduce: pops two items from S and pushes a new item onto S; the new item is a tree that contains the two popped items
This can be applied either to dependencies (Nivre, 2004) or to constituents (Sagae & Lavie, 2005)
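These two actions translate almost directly into code. A minimal dependency-flavored sketch: the `Node` type is a hypothetical helper (the slides leave the tree representation unspecified), and the head-on-the-right reading of `reduce_right` is an assumption based on the Reduce-Right-SUBJ step on slides 19-22:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    deps: list = field(default_factory=list)  # (label, dependent Node) pairs

def shift(stack, queue):
    # Shift: remove a word from Q, push it onto S.
    stack.append(queue.popleft())

def reduce_right(stack, label):
    # Reduce: pop two items, push a new item containing both.
    # Here the top (rightmost) item becomes the head, matching the
    # Reduce-Right-SUBJ step on slides 19-22.
    head, dep = stack.pop(), stack.pop()
    head.deps.append((label, dep))
    stack.append(head)

def reduce_left(stack, label):
    # Mirror image: the lower (leftmost) item becomes the head.
    dep, head = stack.pop(), stack.pop()
    head.deps.append((label, dep))
    stack.append(head)

# Hand-driven run on "He likes fish":
queue = deque(Node(w) for w in "He likes fish".split())
stack = []
shift(stack, queue); shift(stack, queue)  # S = [He, likes]
reduce_right(stack, "SUBJ")               # likes --SUBJ--> He
shift(stack, queue)                       # S = [likes, fish]
reduce_left(stack, "OBJ")                 # likes --OBJ--> fish
assert len(stack) == 1 and not queue      # one tree rooted at "likes"
```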
14 Shift
[Figure: before SHIFT, the stack holds the subtree "Under a proposal …" (with a PMOD arc) and the input string begins "to expand IRAs a …". A shift action removes the next token ("to") from the input list and pushes this new item onto the stack.]
15 Reduce
[Figure: before REDUCE, the top two stack items are "to" and "expand", and the input begins "IRAs a $2000 …". A REDUCE-RIGHT-VMOD action pops these two items and pushes a new item: a single subtree joining them with a VMOD arc.]
16 STACK | QUEUE
[Figure: parsing "He likes fish". The target dependency tree has a SUBJ arc from likes to He and an OBJ arc from likes to fish; the animation shows the stack and queue contents as the parser chooses each Parser Action.]
17 Choosing Parser Actions
No grammar, no action table
Learn to associate stack/queue configurations with appropriate parser actions
Classifier:
Treated as a black box
Perceptron, SVM, maximum entropy, memory-based learning, etc.
Features: top two items on the stack, next input token, context, lookahead, …
Classes: parser actions
18 STACK: He likes | QUEUE: fish
Features:
stack(0) = likes    stack(0).POS = VBZ
stack(1) = He       stack(1).POS = PRP
stack(2) = 0        stack(2).POS = 0
queue(0) = fish     queue(0).POS = NN
queue(1) = 0        queue(1).POS = 0
queue(2) = 0        queue(2).POS = 0
19 [Same configuration and features as slide 18; the classifier outputs the next action.]
Class: Reduce-Right-SUBJ
20 [Animation: He and likes are popped from the stack (queue: fish); same features and class as slide 19.]
21 [Animation continued: the popped items are being combined into a single subtree.]
22 [Animation continued: the new subtree, with a SUBJ arc from likes to He, is pushed back onto the stack; the queue still holds fish.]
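Putting slides 17-22 together, the greedy parser is just a loop that extracts features from the current stack/queue configuration and asks the classifier for the next action. A hedged sketch, reusing `Node`, `shift`, `reduce_right`, and `reduce_left` from the sketch after slide 13; `classify` stands in for any of the black-box classifiers on slide 17:

```python
from collections import deque
# (Node, shift, reduce_right, reduce_left as in the sketch after slide 13.)

def extract_features(stack, queue):
    # Top three stack items and next three queue tokens, as in the
    # feature list on slide 18 (POS features omitted for brevity;
    # 0 marks an empty position, as on the slides).
    feats = {}
    for i in range(3):
        s = stack[-(i + 1)].word if i < len(stack) else "0"
        q = queue[i].word if i < len(queue) else "0"
        feats[f"stack({i})={s}"] = 1.0
        feats[f"queue({i})={q}"] = 1.0
    return feats

def parse(words, classify):
    # classify(features) -> an action string such as "SHIFT" or
    # "REDUCE-RIGHT-SUBJ".
    stack, queue = [], deque(Node(w) for w in words)
    while queue or len(stack) > 1:
        action = classify(extract_features(stack, queue))
        if action == "SHIFT" and queue:
            shift(stack, queue)
        elif action.startswith("REDUCE-RIGHT-") and len(stack) > 1:
            reduce_right(stack, action.rsplit("-", 1)[1])
        elif action.startswith("REDUCE-LEFT-") and len(stack) > 1:
            reduce_left(stack, action.rsplit("-", 1)[1])
        else:
            break  # action invalid in this state; bail out in this sketch
    return stack[0] if stack else None  # root of the dependency tree
```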
23 Accurate Parsing with Greedy Search
Experiments: WSJ Penn Treebank (1M words of WSJ text)
Accuracy: ~90% (unlabeled dependency links)
Other languages (CoNLL 2006 and 2007 shared tasks):
Arabic, Basque, Chinese, Czech, Japanese, Greek, Hungarian, Turkish, …
about 75% to 92%
Good accuracy, fast (linear time), easy to implement!
24 Maximum Spanning Tree Parsing (McDonald et al., 2005)
A dependency tree is a graph (obviously): words are vertices, dependency links are edges
Imagine instead a fully connected weighted graph, where each weight is the score for the dependency link
Each score is independent of the other dependencies: an edge-factored model
Find the maximum spanning tree; the score for the tree is the sum of the scores of its individual dependencies
How are edge weights determined?
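Because the model is edge-factored, a tree's score is just the sum of its edge scores. A tiny illustrative sketch, where `heads` and `score` are assumed inputs (`heads[d]` = head index of word d, with 0 for the root; `score[h][d]` = weight of edge h → d) and all the weights are made up:

```python
def tree_score(heads, score):
    # Sum the weight of every dependency edge in the tree.
    return sum(score[h][d] for d, h in heads.items())

# "I ate a sandwich" (1=I, 2=ate, 3=a, 4=sandwich), made-up weights:
heads = {1: 2, 2: 0, 3: 4, 4: 2}   # ate heads I and sandwich; sandwich heads a
score = [[0] * 5 for _ in range(5)]
score[2][1], score[0][2], score[4][3], score[2][4] = 12, 8, 9, 7
print(tree_score(heads, score))    # 36
```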
25 I ate a sandwich
[Figure: vertices 0 (root), 1 (I), 2 (ate), 3 (a), 4 (sandwich)]
26 I ate a sandwich
[Figure: the same vertices, now fully connected, with a score on every candidate dependency edge]
27 I ate a sandwich
[Figure: the same weighted graph with its maximum spanning tree highlighted: root → ate, ate → I, ate → sandwich, sandwich → a]
28 Structured Classification
x is a sentence, G is a dependency tree, f(G) is a vector of features for the entire tree
Features:
h(ate):d(sandwich)    hPOS(VBD):dPOS(NN)
h(ate):d(I)           hPOS(VBD):dPOS(PRP)
h(sandwich):d(a)      hPOS(NN):dPOS(DT)
hPOS(VBD), hPOS(NN), dPOS(NN), dPOS(DT), dPOS(NN), dPOS(PRP)
h(ate), h(sandwich), d(sandwich)
… (many more)
To assign edge weights, we learn a feature weight vector w
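For one candidate edge, the h()/d() features above can be generated mechanically from the head and dependent tokens. A minimal sketch, assuming tokens are represented as (word, POS) pairs:

```python
def edge_features(head, dep):
    # head, dep: (word, POS) pairs for one candidate dependency.
    (hw, hp), (dw, dp) = head, dep
    return {
        f"h({hw}):d({dw})": 1.0,        # word pair, e.g. h(ate):d(sandwich)
        f"hPOS({hp}):dPOS({dp})": 1.0,  # POS pair, e.g. hPOS(VBD):dPOS(NN)
        f"h({hw})": 1.0, f"d({dw})": 1.0,
        f"hPOS({hp})": 1.0, f"dPOS({dp})": 1.0,
    }

def edge_weight(w, head, dep):
    # The edge weight used by MST parsing: w · f(edge).
    return sum(w.get(name, 0.0) * v
               for name, v in edge_features(head, dep).items())
```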
29 Structured Perceptron
Learn a vector of feature weights w:
  w = 0
  for N iterations:
    for each training example (x_i, G_i):
      G′_i = argmax_{G′ ∈ GEN(x_i)} w · f(G′)
      if G′_i ≠ G_i:
        w = w + f(G_i) − f(G′_i)
The same as before, but to find the argmax we use MST, since each G is a tree (which also contains the corresponding input x). If G′_i is not the right tree, update the feature vector.
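A sketch of this training loop, under two assumptions not spelled out on the slides: `mst(x, w)` is a black box returning the argmax tree under the current weights (e.g. via Chu-Liu/Edmonds), and `tree_features(x, tree)` sums `edge_features` over the tree's edges:

```python
from collections import defaultdict

def train_structured(examples, mst, tree_features, n_iters=10):
    # examples: list of (sentence, gold_tree) pairs.
    w = defaultdict(float)
    for _ in range(n_iters):
        for x, gold in examples:
            pred = mst(x, w)               # argmax over GEN(x) via MST
            if pred != gold:
                # w = w + f(G_i) - f(G'_i): promote gold-tree features,
                # demote predicted-tree features.
                for name, v in tree_features(x, gold).items():
                    w[name] += v
                for name, v in tree_features(x, pred).items():
                    w[name] -= v
    return w
```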
30 Question: Are there trees that an MST parser can find, but a Shift-Reduce parser* can’t? (*shift-reduce parser as described in slides 13-19)
31 Accurate Parsing with Edge-Factored Models
The maximum spanning tree algorithm for directed trees (Chu & Liu, 1965; Edmonds, 1967) runs in quadratic time
Finds the best out of exponentially many trees: exact inference!
Edge-factored: each dependency link is considered independently of the others
Compare to shift-reduce parsing: greedy inference, but a rich set of features that includes partially built trees
McDonald and Nivre (2007) show that shift-reduce and MST parsing reach similar accuracy but have different strengths
32 Parser Ensembles
By using different types of classifiers and algorithms, we get several different parsers
Ensemble idea: combine the output of several parsers to obtain a single, more accurate result
[Figure: example sentence "I like cheese"]
33 Parser Ensembles with Maximum Spanning Trees (Sagae and Lavie, 2006)
First, build a graph:
Create a node for each word in the input sentence (plus one extra "root" node)
Each dependency proposed by any of the parsers is a weighted edge
If multiple parsers propose the same dependency, add weight to the corresponding edge
Then, simply find the MST:
Maximizes the votes
The structure is guaranteed to be a dependency tree
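Building the vote graph takes only a few lines: count how many parsers propose each (head, dependent) pair, then run MST on the resulting weights. A sketch, where each parser's output is given as a heads dict and `mst_from_weights` is an assumed black-box MST routine (e.g. a Chu-Liu/Edmonds implementation):

```python
from collections import Counter

def ensemble_parse(parser_outputs, n_words, mst_from_weights):
    # parser_outputs: one heads dict per parser, heads[d] = head of word d.
    votes = Counter()
    for heads in parser_outputs:
        for d, h in heads.items():
            votes[(h, d)] += 1          # one vote per proposed dependency
    # Dense weight table for the MST routine (index 0 = root node).
    score = [[0] * (n_words + 1) for _ in range(n_words + 1)]
    for (h, d), v in votes.items():
        score[h][d] = v
    # The MST of the vote graph maximizes total votes and is
    # guaranteed to be a well-formed dependency tree.
    return mst_from_weights(score)

# e.g. three parsers on "I ate a sandwich" (my_mst is hypothetical):
# ensemble_parse([{1: 2, 2: 0, 3: 4, 4: 2},
#                 {1: 2, 2: 0, 3: 2, 4: 2},
#                 {1: 2, 2: 0, 3: 4, 4: 2}], 4, my_mst)
```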
34 I ate a sandwich
[Figure: vertices 0 (root), 1 (I), 2 (ate), 3 (a), 4 (sandwich); the vote graph starts with no edges]
35 I ate a sandwich
[Figure: the same vertices as the ensemble's vote graph is built up]
36 I ate a sandwich
[Figure: edges proposed by Parser A, Parser B, and Parser C added to the vote graph]
37 I ate a sandwich
[Figure: edge weights accumulated from the parsers' votes]
38 I ate a sandwich
[Figure: the maximum spanning tree of the vote graph, the ensemble's final output]
39 MST Parser Ensembles Are Very Accurate
Highest accuracy in the CoNLL 2007 shared task on multilingual dependency parsing (a parser bake-off with 22 teams)
Nilsson et al. (2007); Sagae and Tsujii (2007)
The improvement depends on the selection of parsers for the ensemble
With four parsers with accuracy between 89% and 91%, ensemble accuracy = 92.7%