Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures ACL4 Project NCLT Seminar Presentation, 7th June 2006 Conor Cafferkey
Project Overview Open research problem: ● Integrating syntactic parsing and semantic role labeling (SRL) Approach ● Retraining a history-based generative lexicalized parser (Bikel, 2002) ● Semantically-enriched training corpus (Penn Treebank + PropBank-derived semantic role annotations)
Treebank Syntactic Bracketing Style
Semantic Roles ● Relationship that a syntactic constituent has with a predicate ● Predicate-argument relations ● PropBank (Palmer et al., 2005)
PropBank Predicate-Argument Relations Frameset: hate.01 ARG0: experiencer ARG1: target
PropBank Argument Types ● ARG0 - ARG5: arguments associated with a verb predicate, defined in the PropBank Frames scheme. ● ARGM-XXX: adjunct-like arguments of various sorts, where XXX is the type of the adjunct. Types include locative (LOC), temporal (TMP), manner (MNR), etc. ● ARGA: causative agents. ● rel: the verb of the proposition.
Current Approaches ● Semantic role labeling (SRL) task: – Identify, given a verb: ● which nodes of the syntactic tree are arguments of that verb, and ● what semantic role each such argument plays with regard to the verb.
Current Approaches ● “Pipelined” approach ● Parsing → Pruning → ML techniques → Post-processing ● CoNLL-2005 (Carreras and Màrquez, 2005) – SVM, Random Fields, Random Forests, … – Various lexical parameters
An Integrated Approach to Semantic Parsing ● Integrate syntactic and semantic parsing ● Retrain parser using semantically-enriched corpus (Treebank + PropBank-derived semantic roles) ● Parser itself performs semantic role labeling (SRL)
Project Components ● “Off-the-shelf”: – Parser (Bikel, 2002) emulating Collins’ (1999) model 2 – Penn Treebank Release 2 (Marcus et al., 1993) – PropBank 1.0 (Palmer et al., 2005) ● Written for project (mainly in Python): – Scripts to annotate Treebank with PropBank data – Script to generate new head-finding rules for Bikel’s parser – SRL evaluation scripts – Utility scripts (pre-processing, etc.)
Appending Semantic Roles to Treebank Syntactic Category Labels wsj/15/wsj_1568.mrg 16 2 gold hate.01 vn--a 0:1-ARG0 2:0-rel 3:1-ARG1
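The annotation line above can be split into its fields with a few lines of Python. This is a minimal sketch, not the project's actual annotation script: the class and field names are illustrative, the inflection field is kept as an opaque string, and only simple `terminal:height-LABEL` pointers of the kind shown on this slide are handled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Proposition:
    """One PropBank annotation line (field names are illustrative)."""
    filename: str
    sentence: int      # sentence index within the Treebank file
    terminal: int      # terminal index of the predicate
    tagger: str        # e.g. "gold"
    roleset: str       # e.g. "hate.01"
    inflection: str    # kept as an opaque string in this sketch
    args: List[Tuple[int, int, str]]  # (terminal, height, label)


def parse_prop_line(line: str) -> Proposition:
    fields = line.split()
    filename, sentence, terminal, tagger, roleset, inflection = fields[:6]
    args = []
    for field in fields[6:]:
        # "0:1-ARG0" -> pointer "0:1" and label "ARG0";
        # splitting once keeps labels such as "ARGM-TMP" intact.
        pointer, label = field.split("-", 1)
        term, height = pointer.split(":")
        args.append((int(term), int(height), label))
    return Proposition(filename, int(sentence), int(terminal),
                       tagger, roleset, inflection, args)


prop = parse_prop_line(
    "wsj/15/wsj_1568.mrg 16 2 gold hate.01 vn--a 0:1-ARG0 2:0-rel 3:1-ARG1"
)
for terminal, height, label in prop.args:
    print(prop.roleset, label, terminal, height)
```

Each argument is located by a terminal number and a height in the tree, which is what allows the role label to be attached to a specific Treebank node.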
Syntactic Bracketing Evaluation Parseval measures (Black et al., 1992)
Syntactic Bracketing Evaluation ● F-score, the harmonic mean of precision $P$ and recall $R$: $F = \frac{2PR}{P + R}$
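As a rough illustration of the bracketing measures, the sketch below computes labeled precision, recall and their harmonic mean for one sentence. It assumes each bracket is represented as a `(label, start, end)` tuple, which is a simplification; the function name is made up for this example, and the evalb program applies further conventions that are not modeled here.

```python
from collections import Counter


def bracket_scores(gold, test):
    """Labeled precision, recall and F-score for one sentence.

    gold, test: lists of (label, start, end) brackets.
    """
    gold_counts, test_counts = Counter(gold), Counter(test)
    # Multiset intersection: a test bracket matches at most one gold bracket.
    matched = sum((gold_counts & test_counts).values())
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


gold = [("S", 0, 4), ("NP", 0, 1), ("VP", 2, 4), ("NP", 3, 4)]
test = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 3, 4)]
print(bracket_scores(gold, test))
```

In this toy pair the parser misplaces the left edge of the VP, so three of the four brackets match on each side.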
Baseline Syntactic Bracketing Performance Parsing Section 00, trained with sections of Penn Treebank (1918 sentences) Parse Time: 114:41
Semantically-Augmented Treebanks ● N: augment node labels with ARGNs only ● N-C: augment node label with conflated ARGNs only ● M: augment node labels with ARGMs only ● M-C: augment node labels with conflated ARGMs only ● NMR: augment node labels with ARGNs, ARGMs and rels
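The five augmentation schemes can be sketched as a single labeling function. Two points here are assumptions rather than details taken from the slides: the role is appended to the syntactic category with a hyphen, and "conflated" is read as collapsing all numbered arguments into one `ARGN` suffix and all adjunct types into one `ARGM` suffix. The function name is illustrative.

```python
import re

CORE = re.compile(r"^ARG[0-5]$")   # numbered arguments (ARGNs)
ADJUNCT = re.compile(r"^ARGM-")    # adjunct-like arguments (ARGMs)
SCHEMES = {"N", "N-C", "M", "M-C", "NMR"}


def augment(category, role, scheme):
    """Return the node label for `category` under one augmentation scheme.

    The hyphen delimiter and the conflated suffixes "ARGN"/"ARGM"
    are assumptions made for this sketch.
    """
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    if role is None:
        return category
    is_core = bool(CORE.match(role))
    is_adjunct = bool(ADJUNCT.match(role))
    if scheme == "N" and is_core:
        suffix = role
    elif scheme == "N-C" and is_core:
        suffix = "ARGN"
    elif scheme == "M" and is_adjunct:
        suffix = role
    elif scheme == "M-C" and is_adjunct:
        suffix = "ARGM"
    elif scheme == "NMR" and (is_core or is_adjunct or role == "rel"):
        suffix = role
    else:
        return category  # this scheme does not annotate this role
    return f"{category}-{suffix}"


for scheme in sorted(SCHEMES):
    print(scheme, augment("NP", "ARG0", scheme), augment("PP", "ARGM-TMP", scheme))
```

The conflated variants keep the label inventory small, at the cost of no longer distinguishing individual argument numbers or adjunct types.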
Syntactic Bracketing Evaluation Parsing Section 00, trained with sections of Penn Treebank (1918 sentences)
Semantic Evaluation
● Evaluating by terminal number and height ● Evaluating by terminal span ● How strictly to evaluate?
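The two evaluation keys can be related by resolving a `terminal:height` pointer to a tree node and then reading off the terminal span that node covers. The sketch below assumes that height 0 denotes the preterminal (part-of-speech) node and that each further step moves to the parent. The tree is a hypothetical toy example, not the sentence from the Treebank, and the small bracket reader handles only plain `(LABEL ...)` input.

```python
class Node:
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        self.word = None  # set only on preterminal (POS tag) nodes


def parse_tree(text):
    """Read a plain bracketed tree such as "(S (NP (PRP They)) ...)"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read(parent):
        nonlocal pos
        pos += 1                          # skip "("
        node = Node(tokens[pos], parent)
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read(node))
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1                          # skip ")"
        return node

    return read(None)


def terminals(node):
    """Preterminal nodes under `node`, left to right."""
    if node.word is not None:
        return [node]
    return [t for child in node.children for t in terminals(child)]


def resolve(root, terminal, height):
    """Map a terminal:height pointer to (node, (start, end)) with end exclusive."""
    leaves = terminals(root)
    node = leaves[terminal]
    for _ in range(height):
        node = node.parent
    covered = terminals(node)
    start = leaves.index(covered[0])
    return node, (start, start + len(covered))


# Hypothetical toy tree, used only to illustrate the pointer arithmetic.
tree = parse_tree("(S (NP (PRP They)) (ADVP (RB really)) (VP (VBP hate) (NP (NNS delays))))")
for terminal, height, role in [(0, 1, "ARG0"), (2, 0, "rel"), (3, 1, "ARG1")]:
    node, span = resolve(tree, terminal, height)
    print(role, node.label, span)
```

Nodes in a unary chain share a span but differ in height, so matching on terminal span is the more lenient of the two criteria.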
Semantic Role Labeling Evaluation Parsing Section 00, trained with sections of Penn Treebank (1918 sentences)
Syntactic Nodes that Play Multiple Semantic Roles
Adding More Information ● Co-index the semantic role labels with governing predicate (verb) ● i.e. include the appropriate roleset name in each semantic label augmentation
Co-indexing the Semantic Augmentations
Adding More Information ● Data sparseness ● Time efficiency ● Need to make some sort of generalizations ● “Syntacto-semantic” verb classes ● VerbNet (Kipper et al., 2002)
Co-indexing with VerbNet classes
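The effect of co-indexing on the label inventory can be illustrated with a small sketch. The label format (hyphen-joined), the function name, and the roleset-to-class mapping are all hypothetical; the class names are placeholders, not real VerbNet class identifiers.

```python
def coindex(category, role, key):
    """Append a role and a co-indexing key to a category label.

    The hyphen-joined format is an assumption made for this sketch.
    """
    return f"{category}-{role}-{key}"


# Hypothetical mapping from rolesets to verb classes (placeholder class names).
ROLESET_TO_CLASS = {
    "hate.01": "class-A",
    "love.01": "class-A",
    "give.01": "class-B",
}

by_roleset = {coindex("NP", "ARG0", r) for r in ROLESET_TO_CLASS}
by_class = {coindex("NP", "ARG0", c) for c in ROLESET_TO_CLASS.values()}

print(len(by_roleset), sorted(by_roleset))
print(len(by_class), sorted(by_class))
```

Co-indexing with the roleset yields one label per roleset, whereas co-indexing with a verb class lets verbs in the same class share labels, which is the kind of generalization that eases data sparseness.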
Future Ideas ● Integrate the (non-co-indexed) output from the retrained parser into a pipelined SRL system ● Syntactic parsing informed by semantic roles? – Recoding the parser to take better advantage of the semantic roles – Reranking n-best parser outputs based on semantic roles
Summary ● Retrained a history-based generative lexicalized parser with semantically-enriched corpus – Corpus annotation – Generating head-finding rules ● Evaluated parser’s performance – Syntactic parsing (evalb) – Semantic parsing (SRL)
References ● Bikel, Daniel M. 2002. Design of a Multi-lingual, Parallel-processing Statistical Parsing Engine. In Proceedings of HLT2002, San Diego, California. ● Black, Ezra, Frederick Jelinek, John D. Lafferty, David M. Magerman, Robert L. Mercer and Salim Roukos. 1992. Towards History-based Grammars: Using Richer Models for Probabilistic Parsing. In Proceedings of the DARPA Speech and Natural Language Workshop, Harriman, New York. Morgan Kaufmann. ● Carreras, Xavier and Lluís Màrquez. 2005. Introduction to the CoNLL Shared Task: Semantic Role Labeling. In Proceedings of CoNLL-2005. ● Collins, Michael John. 1999. Head-driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia.
References ● Kipper, Karin, Hoa Trang Dang and Martha Palmer. 2002. Class-Based Construction of a Verb Lexicon. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, Austin, Texas. ● Marcus, Mitchell P., Beatrice Santorini and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2). ● Palmer, Martha, Daniel Gildea and Paul Kingsbury. 2005. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31(1). ● Yi, Szu-ting and Martha Palmer. 2005. The integration of syntactic parsing and semantic role labeling. In Proceedings of CoNLL-2005.