
Slide 1: Data-Driven Dependency Parsing

Slide 2: Background: Natural Language Parsing. Syntactic analysis maps a string to a (tree) structure. [Figure: constituent parse tree for "He likes fish" — S → NP VP; NP → prn "He"; VP → V "likes" NP; NP → N "fish".]

Slide 3: [Figure repeated: constituent parse tree for "He likes fish".]

Slide 4: [Figure: parse tree for "He likes fish".] Parsing is useful in natural language understanding (NL interfaces, conversational agents), in language technology applications (machine translation, question answering, information extraction), and in the scientific study of language (syntax, language processing models).

Slide 5: [Figure: parse tree for "He likes fish".] The problem: not enough coverage, too much ambiguity.

Slide 6: [Figure: several example trees shown alongside "He likes fish" — "The boy runs fast", "Dogs run fast", "Dogs run".] Charniak (1996); Collins (1996); Charniak (1997).

Slide 7: [Figure repeated from slide 6, without the citations.]

Slide 8: [Figure: constituent tree for "The boy ate the cheese sandwich" — S → NP VP; NP → Det "The" N "boy"; VP → V "ate" NP; NP → Det "the" N "cheese" N "sandwich".]

Slide 9: [Figure: the same tree, now highlighting the head-dependent pair "ate" → "boy".]

Slide 10: [Figure: dependency tree for "The boy ate the cheese sandwich" — ate → boy (SUBJ), boy → The (DET), ate → sandwich (OBJ), sandwich → cheese (MOD), sandwich → the (DET). Each arc links a HEAD to a DEPENDENT and carries a LABEL.]
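As a concrete illustration (not from the slides), a dependency tree can be stored as a list of (head, dependent, label) triples over word indices — a minimal Python sketch:

    # Store the dependency tree from slide 10 as (head, dependent, label)
    # triples; indices point into the sentence (0-based).
    sentence = ["The", "boy", "ate", "the", "cheese", "sandwich"]

    arcs = [
        (2, 1, "SUBJ"),  # ate -> boy
        (1, 0, "DET"),   # boy -> The
        (2, 5, "OBJ"),   # ate -> sandwich
        (5, 4, "MOD"),   # sandwich -> cheese
        (5, 3, "DET"),   # sandwich -> the
    ]

    for head, dep, label in arcs:
        print(f"{sentence[head]} --{label}--> {sentence[dep]}")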

Slide 11: Background: Linear Classification with the Perceptron. Classification: given an input x, predict an output y. Example: x is a document, y ∈ {Sports, Politics, Science}. x is represented as a feature vector f(x) — e.g., for a document beginning "Wednesday night, when the Lakers play the Mavericks at American Airlines Center, they get to see first hand …", f(x) might contain games:5, Lakers:4, said:3, rebounds:3, democrat:0, republican:0, science:0, and the correct class y is Sports. To classify, just add up the feature weights given in a weight vector w.
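A minimal sketch of this scoring scheme (the feature names and weights below are invented for illustration):

    from collections import Counter

    def f(x):
        # Bag-of-words feature vector: counts of each token in document x.
        return Counter(x.lower().split())

    def score(w_c, fx):
        # "Just add feature weights": dot product of class weights and f(x).
        return sum(w_c.get(feat, 0.0) * val for feat, val in fx.items())

    def predict(w, x):
        # Predict the class whose weight vector scores f(x) highest.
        fx = f(x)
        return max(w, key=lambda c: score(w[c], fx))

    # Hypothetical weight vectors for the three classes:
    w = {
        "Sports":   {"lakers": 2.0, "games": 1.5, "rebounds": 1.0},
        "Politics": {"democrat": 2.0, "republican": 2.0},
        "Science":  {"science": 2.0},
    }
    print(predict(w, "the Lakers play the Mavericks"))  # -> Sports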

Slide 12: Multiclass Perceptron. Learn a vector of feature weights w_c for each class c:

    w_c = 0 for every class c
    for N iterations:
        for each training example (x_i, y_i):
            z_i = argmax_z w_z · f(x_i)
            if z_i ≠ y_i:
                w_{z_i} = w_{z_i} − f(x_i)
                w_{y_i} = w_{y_i} + f(x_i)

Try to classify each example; if a mistake is made, update the weights.
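A direct Python transcription of this pseudocode (a sketch; the feature function and the toy example set are placeholders):

    from collections import Counter, defaultdict

    def train_perceptron(examples, classes, f, n_iters=10):
        # Multiclass perceptron: on a mistake, subtract f(x) from the
        # wrongly predicted class and add it to the true class.
        w = {c: defaultdict(float) for c in classes}
        for _ in range(n_iters):
            for x, y in examples:
                fx = f(x)
                z = max(classes,
                        key=lambda c: sum(w[c][k] * v for k, v in fx.items()))
                if z != y:
                    for k, v in fx.items():
                        w[z][k] -= v
                        w[y][k] += v
        return w

    examples = [("Lakers win in overtime", "Sports"),
                ("senate passes the bill", "Politics")]
    f = lambda x: Counter(x.lower().split())
    w = train_perceptron(examples, ["Sports", "Politics"], f)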

Slide 13: Shift-Reduce Dependency Parsing. Two main data structures: a stack S (initially empty) and a queue Q (initialized to contain each word of the input sentence). Two types of actions: Shift removes a word from Q and pushes it onto S; Reduce pops two items from S and pushes a new item onto S, where the new item is a tree containing the two popped items. This can be applied to either dependencies (Nivre, 2004) or constituents (Sagae & Lavie, 2005). A minimal sketch follows.
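One possible encoding of the two actions in Python (a sketch; taking "reduce-right" to mean the right-hand item becomes the head is an assumption, inferred from the trace on slides 19–22):

    from collections import deque

    def make_item(word):
        # A one-word tree: the word plus its (label, child) dependents.
        return {"word": word, "children": []}

    def shift(stack, queue):
        # SHIFT: remove the next word from the queue, push it onto the stack.
        stack.append(make_item(queue.popleft()))

    def reduce_(stack, head, label):
        # REDUCE: pop the top two stack items; one becomes the head of the
        # other (head="left" or "right"); push the combined tree back.
        right, left = stack.pop(), stack.pop()
        h, d = (left, right) if head == "left" else (right, left)
        h["children"].append((label, d))
        stack.append(h)

    # Parsing "He likes fish" with a hard-coded action sequence (later
    # slides show how a classifier chooses these actions):
    stack, queue = [], deque(["He", "likes", "fish"])
    shift(stack, queue)                # stack: [He]
    shift(stack, queue)                # stack: [He, likes]
    reduce_(stack, "right", "SUBJ")    # stack: [likes -SUBJ-> He]
    shift(stack, queue)                # stack: [likes, fish]
    reduce_(stack, "left", "OBJ")      # likes heads both He and fish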

Slide 14: Shift. [Figure: stack and input string before and after a SHIFT on "Under a proposal … to expand IRAs a …": the shift action removes the next token ("expand") from the input list and pushes this new item onto the stack; a PMOD arc already links words inside the stacked item.]

Slide 15: Reduce. [Figure: stack and input before and after a REDUCE-RIGHT-VMOD on "Under a proposal … to expand" (input continues "IRAs a $2000 …"): the reduce action pops the top two stack items ("to", carrying its PMOD arc, and "expand") and pushes a new item combining them under a VMOD arc.]

Slide 16: [Animated trace: parsing "He likes fish". The STACK and QUEUE are shown after each parser action, with SUBJ and OBJ arcs added as the tree is built.]

Slide 17: Choosing Parser Actions. No grammar, no action table: instead, learn to associate stack/queue configurations with appropriate parser actions. The classifier is treated as a black box (perceptron, SVM, maximum entropy, memory-based learning, etc.). Features: the top two items on the stack, the next input token, context, lookahead, … Classes: the parser actions.

Slide 18: [Figure: STACK = [He, likes], QUEUE = [fish].] Features: stack(0) = likes, stack(0).POS = VBZ; stack(1) = He, stack(1).POS = PRP; stack(2) = 0, stack(2).POS = 0; queue(0) = fish, queue(0).POS = NN; queue(1) = 0, queue(1).POS = 0; queue(2) = 0, queue(2).POS = 0.

Slide 19: [Figure: the same configuration.] The same features, now paired with the target class: Reduce-Right-SUBJ.

Slides 20–22: [Animation: the Reduce-Right-SUBJ action is applied — "He" and "likes" are popped from the stack and the new item likes →SUBJ He is pushed, with "fish" still in the queue.]
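The feature extraction of slides 18–19 can be sketched as a function over the current configuration (assuming stack/queue items carry their word and POS tag; the dict layout is hypothetical):

    def extract_features(stack, queue):
        # Words and POS tags of the top three stack items and the next
        # three queue tokens; 0 marks an empty position, as on slide 18.
        feats = {}
        for i in range(3):
            s = stack[-(i + 1)] if i < len(stack) else None
            q = queue[i] if i < len(queue) else None
            feats[f"stack({i})"] = s["word"] if s else 0
            feats[f"stack({i}).POS"] = s["pos"] if s else 0
            feats[f"queue({i})"] = q["word"] if q else 0
            feats[f"queue({i}).POS"] = q["pos"] if q else 0
        return feats

    stack = [{"word": "He", "pos": "PRP"}, {"word": "likes", "pos": "VBZ"}]
    queue = [{"word": "fish", "pos": "NN"}]
    print(extract_features(stack, queue))
    # These features go to the multiclass perceptron from slide 12; the
    # predicted class (e.g. Reduce-Right-SUBJ) is the next parser action.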

Slide 23: Accurate Parsing with Greedy Search. Experiments on the WSJ Penn Treebank (~1M words of WSJ text): accuracy around 90% (unlabeled dependency links). Other languages (CoNLL 2006 and 2007 shared tasks — Arabic, Basque, Chinese, Czech, Japanese, Greek, Hungarian, Turkish, …): about 75% to 92%. Good accuracy, fast (linear time), and easy to implement!

Slide 24: Maximum Spanning Tree Parsing (McDonald et al., 2005). A dependency tree is a graph (obviously): words are vertices, dependency links are edges. Imagine instead a fully connected weighted graph, where each weight is the score for the corresponding dependency link and each score is independent of the other dependencies (an edge-factored model). Then find the maximum spanning tree: the score for a tree is the sum of the scores of its individual dependencies. How are the edge weights determined?

Slide 25: [Figure: the sentence "I ate a sandwich" as graph nodes 0 (root), 1 (I), 2 (ate), 3 (a), 4 (sandwich).]

Slide 26: [Figure: the same nodes, now fully connected by weighted directed edges (weights such as −11, 2, 12, −8, 8, …).]

Slide 27: [Figure: the maximum spanning tree selected from the weighted graph — the highest-scoring dependency tree for "I ate a sandwich".]
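With edge scores in hand, the decoding step can be sketched using networkx's implementation of the Chu-Liu/Edmonds algorithm (the weights below are invented, not the slide's; node 0 is the artificial root):

    import networkx as nx

    # Hypothetical edge scores for "I ate a sandwich":
    scores = {
        (0, 2): 12,  (2, 1): 8,  (2, 4): 9,  (4, 3): 7,  # the correct tree
        (0, 1): -11, (0, 3): -8, (0, 4): 2,  (1, 2): 1,
        (2, 3): 0,   (3, 4): 5,  (4, 2): -3, (1, 3): -2,
    }

    G = nx.DiGraph()
    for (head, dep), w in scores.items():
        G.add_edge(head, dep, weight=w)

    # The maximum spanning arborescence is the highest-scoring dependency
    # tree under the edge-factored model.
    tree = nx.maximum_spanning_arborescence(G)
    print(sorted(tree.edges()))  # [(0, 2), (2, 1), (2, 4), (4, 3)]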

Slide 28: Structured Classification. Now x is a sentence, G is a dependency tree, and f(G) is a vector of features for the entire tree. Features include, for each arc, head/dependent word and POS pairs — h(ate):d(sandwich), hPOS(VBD):dPOS(NN), h(ate):d(I), hPOS(VBD):dPOS(PRP), h(sandwich):d(a), hPOS(NN):dPOS(DT) — as well as single-side features such as hPOS(VBD), hPOS(NN), dPOS(NN), dPOS(DT), dPOS(PRP), h(ate), h(sandwich), d(sandwich), … (many more). To assign edge weights, we learn a feature weight vector w.
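A sketch of edge-factored feature extraction and tree scoring (feature-name spelling follows the slide; the helper names are mine):

    def tree_features(words, tags, arcs):
        # f(G): one bundle of head/dependent word and POS features per arc.
        feats = {}
        for head, dep in arcs:
            for name in (f"h({words[head]}):d({words[dep]})",
                         f"hPOS({tags[head]}):dPOS({tags[dep]})",
                         f"hPOS({tags[head]})", f"dPOS({tags[dep]})",
                         f"h({words[head]})", f"d({words[dep]})"):
                feats[name] = feats.get(name, 0) + 1
        return feats

    def tree_score(w, feats):
        # w . f(G): because every feature is tied to a single edge, this
        # total decomposes into a sum of independent edge scores.
        return sum(w.get(k, 0.0) * v for k, v in feats.items())

    words = ["<root>", "I", "ate", "a", "sandwich"]
    tags  = ["ROOT", "PRP", "VBD", "DT", "NN"]
    arcs  = [(0, 2), (2, 1), (2, 4), (4, 3)]
    print(tree_features(words, tags, arcs))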

Slide 29: Structured Perceptron. Learn a vector of feature weights w:

    w = 0
    for N iterations:
        for each training example (x_i, G_i):
            G'_i = argmax_{G' ∈ GEN(x_i)} w · f(G')
            if G'_i ≠ G_i:
                w = w + f(G_i) − f(G'_i)

The same as before, but to find the argmax we use MST, since each G is a tree (which also contains the corresponding input x). If G'_i is not the right tree, update the feature vector.
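A Python transcription of this loop (a sketch: `decode` stands in for an MST decoder like the one above, and `feats` for the tree feature function; both names are placeholders):

    def train_structured_perceptron(data, feats, decode, n_iters=10):
        # Structured perceptron: decode(w, x) returns the highest-scoring
        # tree for sentence x under weights w (argmax over GEN(x) via MST);
        # feats(x, G) returns the feature vector of tree G.
        w = {}
        for _ in range(n_iters):
            for x, G_gold in data:
                G_pred = decode(w, x)
                if G_pred != G_gold:
                    # Add gold-tree features, subtract predicted-tree features.
                    for k, v in feats(x, G_gold).items():
                        w[k] = w.get(k, 0.0) + v
                    for k, v in feats(x, G_pred).items():
                        w[k] = w.get(k, 0.0) - v
        return w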

Slide 30: Question: are there trees that an MST parser can find, but a shift-reduce parser* can't? (*the shift-reduce parser as described on slides 13–19)

Slide 31: Accurate Parsing with Edge-Factored Models. The maximum spanning tree algorithm for directed trees (Chu & Liu, 1965; Edmonds, 1967) runs in quadratic time and finds the best of exponentially many trees — exact inference! But the model is edge-factored: each dependency link is considered independently of the others. Compare with shift-reduce parsing: greedy inference, but a rich feature set that includes partially built trees. McDonald and Nivre (2007) show that shift-reduce and MST parsing reach similar accuracy but have different strengths.

Slide 32: Parser Ensembles. By using different types of classifiers and algorithms, we get several different parsers. Ensemble idea: combine the output of several parsers to obtain a single, more accurate result. [Figure: alternative dependency analyses of "I like cheese".]

Slide 33: Parser Ensembles with Maximum Spanning Trees (Sagae and Lavie, 2006). First, build a graph: create a node for each word in the input sentence (plus one extra "root" node); each dependency proposed by any of the parsers is a weighted edge, and if multiple parsers propose the same dependency, add weight to the corresponding edge. Then simply find the MST: it maximizes the votes, and the structure is guaranteed to be a dependency tree. A sketch follows.
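A sketch of the ensemble step (the parser outputs are hypothetical; networkx again supplies the MST):

    import networkx as nx
    from collections import Counter

    def ensemble_parse(parser_outputs):
        # Each output is a list of (head, dep) arcs, with 0 the root node.
        # Every vote adds 1 to the edge's weight; the maximum spanning
        # tree then picks the arcs that maximize the total vote count.
        votes = Counter(arc for arcs in parser_outputs for arc in arcs)
        G = nx.DiGraph()
        for (head, dep), n in votes.items():
            G.add_edge(head, dep, weight=n)
        return sorted(nx.maximum_spanning_arborescence(G).edges())

    # Three hypothetical parses of "I ate a sandwich" (nodes as on slide 25):
    a = [(0, 2), (2, 1), (2, 4), (4, 3)]
    b = [(0, 2), (2, 1), (2, 4), (4, 3)]
    c = [(0, 2), (2, 1), (2, 3), (3, 4)]
    print(ensemble_parse([a, b, c]))  # -> [(0, 2), (2, 1), (2, 4), (4, 3)]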

Slides 34–38: [Animation: for "I ate a sandwich" (nodes 0–4 as before), parsers A, B, and C each propose a set of dependency edges; the proposals are overlaid as weighted edges in a single graph, and the maximum spanning tree of that graph is the ensemble parse.]

Slide 39: MST Parser Ensembles Are Very Accurate. Highest accuracy in the CoNLL 2007 shared task on multilingual dependency parsing (a parser bake-off with 22 teams): Nilsson et al. (2007); Sagae and Tsujii (2007). The improvement depends on the selection of parsers for the ensemble: with four parsers with accuracies between 89% and 91%, the ensemble reached 92.7%.

