PROJECT UPDATE PRESENTATION CSI 5386, FALL 2012 Title: Exploring Natural Language Parsing Techniques and Evaluations PRESENTED BY: PEAR A BHUIYAN
OBJECTIVE OF THE PROJECT To learn parsing techniques and evaluation methods by studying research papers. Collect research papers and select three substantial papers that describe parsing techniques or evaluation methods. Study them to learn the parsing and evaluation methods they describe. Summarize them and write a composite report discussing their strengths and weaknesses.
INTRODUCTION Parsing is the task of recognizing an input string and assigning a structure to it. It is useful in applications such as grammar checking, as an intermediate stage of semantic analysis, and in information extraction. We have learnt some parsing algorithms in this course. The goal of this project is to learn other parsing techniques by searching for and studying research papers on parsing, and to write a review report on them. I have collected 9 research papers on parsing, and of them I have chosen 3 substantial papers to study for the composite review report. At this stage it is hard to summarize them, as more study is needed to grasp the contents; for now, we look at each paper individually.
PAPER-1 Title: Head-driven statistical models for natural language parsing. Author: Michael Collins, MIT Computer Science and Artificial Intelligence Laboratory. Published: © 2003 Association for Computational Linguistics. (49 pages) Three parsing models are introduced and evaluated on the Penn Wall Street Journal Treebank, and their strengths and weaknesses are discussed. The paper also examines the effects of various features on parsing accuracy and the relationship of the models to other work on statistical parsing.
PAPER-1: Model 1 This model extends probabilistic context-free grammars (PCFGs) to lexicalized grammars. A context-free grammar is defined as a 4-tuple (N, Σ, A, R), where N is a set of non-terminals, Σ is an alphabet, A is a start symbol in N, and R is a finite set of rules. In a probabilistic context-free grammar (PCFG), each rule has a probability. A PCFG can be lexicalized by associating a word w and a part-of-speech tag t with each non-terminal X in the tree.
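PAPER-1: Model 1 (Illustrative Sketch) As a rough illustration of lexicalization, the Python sketch below estimates probabilities for lexicalized rules by maximum likelihood over observed counts. This is only a naive sketch, not Collins's estimation method (which decomposes each rule into head and modifier events with back-off smoothing); all names here (LexicalizedPCFG, observe, probability) are invented for illustration.

    # Each non-terminal X is paired with its head word w and POS tag t, so a
    # rule like S -> NP VP becomes S(bought,VBD) -> NP(IBM,NNP) VP(bought,VBD).
    from collections import defaultdict

    class LexicalizedPCFG:
        def __init__(self):
            self.rule_counts = defaultdict(int)  # counts of lexicalized rules
            self.lhs_counts = defaultdict(int)   # counts of lexicalized LHS symbols

        def observe(self, lhs, rhs):
            # lhs is a (non-terminal, head_word, head_tag) triple; rhs is a
            # tuple of such triples, one per child.
            self.rule_counts[(lhs, rhs)] += 1
            self.lhs_counts[lhs] += 1

        def probability(self, lhs, rhs):
            # Maximum-likelihood estimate of P(rhs | lhs).
            if self.lhs_counts[lhs] == 0:
                return 0.0
            return self.rule_counts[(lhs, rhs)] / self.lhs_counts[lhs]

    grammar = LexicalizedPCFG()
    s, np, vp = ("S", "bought", "VBD"), ("NP", "IBM", "NNP"), ("VP", "bought", "VBD")
    grammar.observe(s, (np, vp))
    print(grammar.probability(s, (np, vp)))  # 1.0 from this single observation

Direct counts over (word, tag) pairs are extremely sparse, which is exactly why Collins decomposes rules and smooths; a naive estimate like this would be unusable on real treebank data.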
PAPER-1: Models 2 & 3 Model 2 extends the parser to distinguish between complements and adjuncts (e.g., temporal modifiers), which may improve parsing accuracy. Model 3 gives a probabilistic treatment of wh-movement.
PAPER-2 Title: Discriminative re-ranking for natural language parsing. Authors: Michael Collins and Terry Koo, MIT Computer Science and Artificial Intelligence Laboratory. Published in: © 2005 Association for Computational Linguistics. (46 pages)
PAPER-2 This article presents approaches to re-ranking the output of an existing probabilistic parser. The base parser defines an initial ranking of the candidate parse trees, and a second model uses additional features to improve that initial ranking. A method based on the boosting approach is introduced for re-ranking the parse trees.
PAPER-2 The algorithm can be viewed as a feature selection method. The boosting method is applied to parsing the Wall Street Journal (WSJ) treebank. The method combines the log-likelihood under a baseline model with evidence from an additional 500,000 features over parse trees. The new model achieved 89.75% F-measure, whereas the baseline model achieved 88.2%.
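PAPER-2 (Illustrative Sketch) To make the re-ranking idea concrete, the sketch below scores each candidate parse by its baseline log-probability plus a weighted sum of indicator features and returns the best-scoring one. This is only the general re-ranking scheme, not the paper's boosting algorithm (which selects features and updates weights iteratively); the feature names and weights are invented placeholders.

    def rerank(candidates, weights, w0=1.0):
        # candidates: list of (baseline_log_prob, feature_set) pairs.
        # Returns the index of the highest-scoring candidate parse.
        def score(candidate):
            log_prob, features = candidate
            return w0 * log_prob + sum(weights.get(f, 0.0) for f in features)
        return max(range(len(candidates)), key=lambda i: score(candidates[i]))

    # Hypothetical features; in the paper, feature weights are learned by boosting.
    weights = {"rule:S->NP_VP": 0.4, "bigram:VP_PP": -0.2}
    candidates = [
        (-12.3, {"rule:S->NP_VP"}),  # ranked second by the base parser
        (-11.9, {"bigram:VP_PP"}),   # ranked first by the base parser
    ]
    print(rerank(candidates, weights))  # 0: feature evidence overturns the base ranking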
PAPER-3 Title: Wide-Coverage Deep Statistical Parsing Using Automatic Dependency Structure Annotation. Authors: Aoife Cahill, Michael Burke, Ruth O'Donovan, Josef van Genabith, and Andy Way, Dublin City University; Stefan Riezler, Palo Alto Research Center. Published in: © 2008 Association for Computational Linguistics. (44 pages)
PAPER-3 Tree-based parser evaluation has some drawbacks: it does not provide enough information for NLP applications that need deep dependency relations, predicate-argument structure, etc., and there can be a number of alternative tree representations for the same input. Such problems motivated research on dependency-based parser evaluation.
PAPER-3 A number of researchers have conducted such experiments, using simple automatic methods to convert shallow parsers' output trees into dependencies. In this article, such experiments are revisited using sophisticated automatic LFG f-structure annotation methodologies. Various PCFG and history-based parsers are compared to find the baseline parsing system that fits best into this automatic dependency structure annotation technique. The experiments show that the combined system of syntactic parser and dependency structure annotation outperforms hand-crafted deep wide-coverage grammars.
PAPER-3 Four machine-learning-based shallow parsers and two hand-crafted wide-coverage deep probabilistic parsers were evaluated. The best system achieved an f-score of 82.73% against the PARC 700 Dependency Bank, whereas the f-score for the hand-crafted LFG grammar and XLE parsing system was 80.55% (a 2.18% improvement). The system achieved an f-score of 80.23% against the CBS 500 Dependency Bank, whereas the f-score for the hand-crafted RASP grammar and parsing system was 76.57% (a 3.66% improvement).
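PAPER-3 (Illustrative Sketch) For reference, the sketch below shows how a dependency-based f-score can be computed once gold and predicted parses are reduced to sets of dependency triples. The (relation, head, dependent) triple format is a simplification assumed here; the paper evaluates against richer LFG f-structure relations.

    def dependency_f_score(gold, predicted):
        # gold, predicted: sets of (relation, head, dependent) triples.
        correct = len(gold & predicted)
        precision = correct / len(predicted) if predicted else 0.0
        recall = correct / len(gold) if gold else 0.0
        if precision + recall == 0:
            return precision, recall, 0.0
        f = 2 * precision * recall / (precision + recall)
        return precision, recall, f

    gold = {("subj", "saw", "John"), ("obj", "saw", "Mary")}
    pred = {("subj", "saw", "John"), ("obj", "saw", "man")}
    print(dependency_f_score(gold, pred))  # (0.5, 0.5, 0.5)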
YET TO DO Study the papers thoroughly to understand the different parsing techniques and evaluation methods. Write the composite review report on the papers.