CS546: Machine Learning and Natural Language
Preparation for the Term Project:
- Dependency Parsing
- Dependency Representation for Semantic Role Labeling
Slides for dependency parsing are based on Joakim Nivre and Sandra Kuebler's slides from the ACL 06 tutorial.
Outline
– Dependency Parsing: formalism; dependency parsing algorithms
– Semantic Role Labeling: dependency formalism; basic approach for the first part of the term project
– Pipeline for the first assignment
Formalization by Lucien Tesniere [Tesniere, 1959]
The idea was known long before (e.g., Panini, India, >2000 years ago)
Studied extensively in the Prague School approach to syntax (in the US, research focused more on the constituent formalism)
(Figure slides: example dependency structures vs. constituent structure)
Constituent vs Dependency
There are advantages to dependency structures:
– for free (or semi-free) word order languages
– easier to convert to a predicate-argument structure
– ...
But there are drawbacks too...
You can try to convert one representation into the other – but, in general, these formalisms are not equivalent.
Dependency structures for NLP tasks
Most approaches have focused on constituent tree-based features, but this is now changing:
– Machine translation (e.g., Menezes & Quirk, 07)
– Summarization and sentence compression (e.g., Filippova & Strube, 08)
– Opinion mining (e.g., Lerman et al., 08)
– Information extraction, question answering (e.g., Bouma et al., 06)
All of these conditions will be violated by the semantic dependency graphs we will consider later.
You can think of projectivity as (related to) planarity.
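To make the planarity intuition concrete, here is a minimal sketch (not from the slides): it checks projectivity by testing whether any two arcs cross when drawn above the sentence. The representation of arcs as (head, dependent) position pairs is an assumption.

```python
def is_projective(arcs):
    """Return True if no two dependency arcs cross.

    arcs: iterable of (head, dependent) token positions (integers);
    0 can be used for an artificial root. Two arcs cross when exactly
    one endpoint of one arc lies strictly inside the span of the other
    -- the "planarity-like" property of projective structures.
    """
    spans = [tuple(sorted(a)) for a in arcs]
    for i, (l1, r1) in enumerate(spans):
        for l2, r2 in spans[i + 1:]:
            # arcs that share a word never cross
            if l2 in (l1, r1) or r2 in (l1, r1):
                continue
            if (l1 < l2 < r1) != (l1 < r2 < r1):
                return False
    return True


# A chain like root->2, 2->1, 2->3 is projective;
# adding an arc 5->3 underneath the arc 2->4 creates a crossing.
print(is_projective([(0, 2), (2, 1), (2, 3)]))          # True
print(is_projective([(0, 2), (2, 1), (2, 4), (5, 3)]))  # False
```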
Algorithms
Global inference algorithms:
– graph-based approaches
– transition-based approaches
We will not consider:
– rule-based systems
– constraint satisfaction
Converting to Constituent Formalism
Idea: convert dependency structures to constituent structures
– easy for projective dependency structures
Apply algorithms for constituent parsing to them
– e.g., CKY – if some of you attend the class by Julia Hockenmaier on parsing, it was/will be covered there
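As a rough illustration of the conversion idea (a minimal sketch, not any specific treebank scheme): turn each head word into a constituent spanning its whole subtree, which yields a well-nested bracketing exactly when the dependency tree is projective. The list-of-heads input format and the single-root assumption are mine.

```python
def dep_to_brackets(words, heads):
    """Convert a projective dependency tree into a nested bracketing.

    words: tokens of the sentence
    heads: heads[i] is the head position of word i+1; 0 = artificial root
    Each head word becomes one constituent covering its subtree.
    """
    children = {i: [] for i in range(len(words) + 1)}
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)

    def build(node):
        if not children[node]:
            return words[node - 1]
        parts = [words[node - 1] if c == node else build(c)
                 for c in sorted(children[node] + [node])]
        return "(" + " ".join(parts) + ")"

    return build(children[0][0])  # assumes a single root


words = ["economic", "news", "had", "little", "effect"]
heads = [2, 3, 0, 5, 3]
print(dep_to_brackets(words, heads))
# ((economic news) had (little effect))
```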
Converting to Constituent Formalism
Different independence assumptions lead to different statistical models
– both accuracy and parsing time (dynamic programming) vary
Strong independence assumption
Features f(i, j) can include dependence on any words in the sentence, i.e., f(i, j, sent)
But the score still decomposes over the edges in the graph
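A small sketch of this arc-factored setup (the feature names and weights are placeholders I made up): each edge is scored independently, its features may look at the whole sentence, and the tree score is just the sum of edge scores.

```python
def edge_features(head, dep, sent):
    """Features for a single edge; they may inspect any word in sent,
    but only ever describe this one (head, dep) pair."""
    return {
        "hw=" + sent[head]: 1.0,
        "dw=" + sent[dep]: 1.0,
        "hw_dw=" + sent[head] + "_" + sent[dep]: 1.0,
        "dist=" + str(head - dep): 1.0,
    }


def edge_score(head, dep, sent, weights):
    return sum(weights.get(f, 0.0) * v
               for f, v in edge_features(head, dep, sent).items())


def tree_score(tree, sent, weights):
    """Tree score = sum of edge scores: the strong independence
    assumption -- no feature ever looks at two edges at once."""
    return sum(edge_score(h, d, sent, weights) for h, d in tree)


sent = ["<ROOT>", "John", "saw", "Mary"]
tree = [(0, 2), (2, 1), (2, 3)]
print(tree_score(tree, sent, {"hw_dw=saw_Mary": 2.0}))  # 2.0
```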
Online Learning (Structured Perceptron)
Joint feature representation:
– we will talk about it more later
Algorithm: in the decoding step we run the MST or Eisner's algorithm
Features are over edges only
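A sketch of the structured perceptron with edge-only features (the decode argument stands in for the MST or Eisner decoder, and edge_features is the kind of per-edge feature function from the previous sketch):

```python
from collections import defaultdict

def train_structured_perceptron(data, decode, edge_features, epochs=10):
    """data: list of (sent, gold_tree) pairs, trees as lists of (head, dep).
    decode(sent, weights) -> highest-scoring tree under the current weights
    (in practice Chu-Liu/Edmonds MST or Eisner's algorithm)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for sent, gold in data:
            pred = decode(sent, weights)
            if set(pred) != set(gold):
                # w += Phi(sent, gold_tree) - Phi(sent, predicted_tree),
                # where Phi sums the edge features over a tree's edges
                for h, d in gold:
                    for f, v in edge_features(h, d, sent).items():
                        weights[f] += v
                for h, d in pred:
                    for f, v in edge_features(h, d, sent).items():
                        weights[f] -= v
    return weights
```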
Parsing Algorithms
Here, when we say parsing algorithm (= derivation order), we often mean a mapping:
– given a tree, map it to the sequence of actions which creates this tree
Tree T is equivalent to this sequence of actions: d_1, ..., d_n
Therefore, P(T) = P(d_1, ..., d_n) = P(d_1) P(d_2 | d_1) ... P(d_n | d_{n-1}, ..., d_1)
Ambiguous: sometimes "parsing algorithm" refers to the decoding algorithm used to find the most likely sequence
You can use classifiers here and search for the most likely sequence (recall Maryam's talk)
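The chain-rule factorization above, written out in code (the action_model is a stand-in for whatever classifier scores the next action given the history):

```python
import math

def derivation_log_prob(actions, action_model):
    """log P(T) = log P(d_1, ..., d_n)
               = sum_i log P(d_i | d_1, ..., d_{i-1}).

    actions: the sequence d_1, ..., d_n that builds tree T
    action_model(history) -> dict mapping each possible next action
    to its probability (e.g. a trained classifier).
    """
    log_p, history = 0.0, []
    for d in actions:
        log_p += math.log(action_model(history)[d])
        history.append(d)
    return log_p
```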
Most algorithms are restricted to projective structures, but not all
It can handle only projective structures
How to learn in this case?
Your training examples are {(d_j; d_1, ..., d_{j-1})} – collections of parsing contexts
You want to predict the correct actions: P(d_n | d_{n-1}, ..., d_1)
How do we define a feature representation of (d_{n-1}, ..., d_1)?
You can think of (d_{n-1}, ..., d_1) instead in terms of:
– the partial tree corresponding to them
– the current contents of the queue (Q) and stack (S)
– the most important features are the top of S and the front of Q (only between them can you potentially create links)
(Inference: you can do it greedily or with beam search)
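A sketch of that idea in code: features are read off the current configuration (stack S, queue Q, partial arcs), with the top of S and the front of Q as the main ingredients, and greedy inference simply picks the classifier's best action at every step. The classifier, transition system, and feature names are placeholders.

```python
def config_features(stack, queue, arcs, words, tags):
    """Features of a parser configuration; the most informative ones
    describe the top of the stack (s0) and the front of the queue (q0),
    the only two positions between which an arc can be created next."""
    feats = {}
    s0 = stack[-1] if stack else None
    q0 = queue[0] if queue else None
    if s0 is not None:
        feats["s0w=" + words[s0]] = 1.0
        feats["s0t=" + tags[s0]] = 1.0
    if q0 is not None:
        feats["q0w=" + words[q0]] = 1.0
        feats["q0t=" + tags[q0]] = 1.0
    if s0 is not None and q0 is not None:
        feats["s0t_q0t=" + tags[s0] + "_" + tags[q0]] = 1.0
    feats["n_arcs=" + str(len(arcs))] = 1.0
    return feats


def greedy_parse(words, tags, classifier, legal_actions, apply_action):
    """Greedy inference: always take the classifier's best legal action.
    Beam search would instead keep the k best partial action sequences."""
    stack, queue, arcs = [0], list(range(1, len(words))), []
    while queue or len(stack) > 1:
        feats = config_features(stack, queue, arcs, words, tags)
        action = classifier.best(feats, legal_actions(stack, queue))
        stack, queue, arcs = apply_action(action, stack, queue, arcs)
    return arcs
```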
Results: Transition-Based vs Graph-Based
CoNLL-2006 Shared Task, average over 12 languages (Labeled Attachment Score):
– McDonald et al. (MST): 80.27
– Nivre et al. (Transitions): 80.19
The results are essentially the same.
A lot of research in both directions,
– e.g., latent variable models for transition-based parsing (Titov and Henderson, 07) – the best single-model system in CoNLL-2007 (third overall)
Non-Projective Parsing
– Graph-based algorithms (McDonald)
– Post-processing of projective algorithms (Hall and Novak, 05)
– Transition-based algorithms which handle non-projectivity (Attardi, 06; Titov et al., 08; Nivre et al., 08)
– Pseudo-projective parsing: removing non-projective (crossing) links and encoding them in labels (Nivre and Nilsson, 05)
First Phase of Term Project
– The goal is to construct joint syntax-SRL (Semantic Role Labeling) dependency structures
– Similar to the CoNLL-2008 and 2009 Shared Tasks
– The 2nd phase will focus on SRL
– For now we need to create the entire pipeline:
  Tagger: SVM tagger
  Pseudo-projective transformations: tool by Nilsson & Nivre
  Dependency parser: MaltParser by Nivre et al.
  Implement a basic classifier for SRL (see next slide)
– Due after Spring Break – I'll send the description by email
First Phase of Term Project
(Figure: syntactic structure and semantic structure)
Properties of the semantic (SRL) structure:
– multiple heads (parents)
– need to annotate predicates with senses (predicates are the potential parents in the graph) – not indicated in the figure
– it is not the most standard formalism for SRL
SRL Pipeline
– 1st stage: for every word, decide whether it is a predicate (binary classification)
– 2nd stage: for all the words which are predicates, predict their sense
– 3rd stage: for every pair of words, decide whether:
  word A is an argument of word B,
  word B is an argument of word A, or
  there is no SRL relation between them
  (constraint: only predicates can be parents)
– 4th stage: label all the relations
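A minimal sketch of these four stages (the feature functions and the four classifiers are placeholders you would train separately; stage 3 is simplified here by exploiting the constraint that only predicates can be parents, so only predicate-word pairs are scored):

```python
def srl_pipeline(words, word_feats, pair_feats,
                 pred_clf, sense_clf, arg_clf, label_clf):
    # Stage 1: one binary decision per word -- is it a predicate?
    predicates = [i for i in range(len(words))
                  if pred_clf.predict(word_feats(i)) == "PRED"]

    # Stage 2: predict a sense for every identified predicate
    senses = {p: sense_clf.predict(word_feats(p)) for p in predicates}

    # Stage 3: for each candidate (predicate, word) pair, decide whether
    # the word is an argument of the predicate or there is no relation
    arcs = [(p, a) for p in predicates for a in range(len(words))
            if a != p and arg_clf.predict(pair_feats(p, a)) == "ARG"]

    # Stage 4: label every predicate-argument relation (e.g. A0, A1, AM-TMP)
    labeled = [(p, a, label_clf.predict(pair_feats(p, a))) for p, a in arcs]
    return predicates, senses, labeled
```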
SRL Pipeline
– Use any features:
  hint: dependency parse features are going to be very useful
  see the CoNLL-2008 shared task papers for which features were useful
– Use any learning algorithm:
  you can use a package (e.g., SNoW)
  or implement one yourself (e.g., the averaged perceptron is easy)
– Do not use any SRL tools
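If you go the implement-it-yourself route, a minimal averaged perceptron looks roughly like this (naive averaging, which is simple but slow; a real implementation would use update timestamps):

```python
from collections import defaultdict

class AveragedPerceptron:
    """Multiclass averaged perceptron over sparse feature dicts."""

    def __init__(self, classes):
        self.classes = list(classes)
        self.w = defaultdict(float)       # keyed by (feature, class)
        self.totals = defaultdict(float)  # running sums for averaging
        self.n_examples = 0

    def score(self, feats, c):
        return sum(v * self.w[(f, c)] for f, v in feats.items())

    def predict(self, feats):
        return max(self.classes, key=lambda c: self.score(feats, c))

    def update(self, feats, gold):
        pred = self.predict(feats)
        if pred != gold:
            for f, v in feats.items():
                self.w[(f, gold)] += v
                self.w[(f, pred)] -= v
        # accumulate the current weights after every training example
        for key, value in self.w.items():
            self.totals[key] += value
        self.n_examples += 1

    def average(self):
        """Replace the weights by their average over all examples seen."""
        if self.n_examples:
            self.w = defaultdict(float, {k: v / self.n_examples
                                         for k, v in self.totals.items()})
```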
Next Lectures
– I will be away for 2 weeks
– Next week (Mar 9 – Mar 15):
  Wednesday: Alex Klementiev on weak supervision
  Friday: Kevin Small on active learning + student presentation by Ryan
– 2nd week (Mar 16 – Mar 22): work on the project
– The 1st phase will be due around April 1 (exact dates later)