

GLARF-ULA: ULA08 Workshop March 19, 2007
GLARF-ULA: Working Towards Usability
Unified Linguistic Annotation Workshop
Adam Meyers
New York University
March 19, 2008

Outline
Introduction to the GLARF Approach
What is a standard anyway?
Improving & Distributing Easy to Use Parts
Participation in CONLL
Chinese GLARF

GLARF Approach to ULA
A Typed Feature Structure Representation
Produces a single-theory analysis
 –Not Reversible
GLARF System combines:
 –hand-annotation
 –automatically generated annotation
 –combination of manual/automatic annotation

Example Sentence
Meanwhile, they made three bids.
 –Offset of first character = 123
Meanwhile: ARG1 = previous S, ARG2 = current S
 –PDTB
made: ARG0 = they, ARG1 = three bids
 –PropBank
bids: ARG0 = they, Support = made
 –NomBank
(S (ADVP (RB Meanwhile))
   (, ,)
   (NP (PRP they))
   (VP (VBN made)
       (NP (CD three)
           (NNS bids)))
   (. .))
 –Penn Treebank
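The link between the Penn Treebank bracketing and the character offset on this slide can be sketched in a few lines of Python. This is only an illustration, not the GLARF tooling: the function names are hypothetical, the comma and period leaves are written as (, ,) and (. .), and the choice of end-exclusive offsets counted from the stated first-character offset of 123 is an assumption.

```python
import re

def ptb_leaves(bracketed):
    """Return (POS, word) pairs for the leaves of a Penn Treebank bracketing."""
    # A leaf is a parenthesised pair with no nested parentheses: (TAG word)
    return re.findall(r"\(([^\s()]+) ([^\s()]+)\)", bracketed)

def with_offsets(leaves, text, base=0):
    """Attach character offsets by scanning the raw text left to right.

    Offsets are end-exclusive and shifted by `base` (an assumed convention).
    """
    out, cursor = [], 0
    for pos, word in leaves:
        start = text.index(word, cursor)  # first occurrence at or after cursor
        end = start + len(word)
        out.append((word, pos, base + start, base + end))
        cursor = end
    return out

SENTENCE = "Meanwhile, they made three bids."
TREE = ("(S (ADVP (RB Meanwhile)) (, ,) (NP (PRP they)) "
        "(VP (VBN made) (NP (CD three) (NNS bids))) (. .))")

tokens = with_offsets(ptb_leaves(TREE), SENTENCE, base=123)
for tok in tokens:
    print(tok)
```

Anchoring every annotation layer to the same character offsets is one way to align PDTB, PropBank, NomBank and Treebank annotations that were produced independently.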

GLARF TFS
(S (ADV (ADVP (HEAD (ADVX (HEAD (RB Meanwhile 0))
                          (P-ARG1 (S (EC-TYPE PB) (INDEX 0+0)))
                          (P-ARG2 (S (EC-TYPE PB) (INDEX 0)))))
              (POINTER 0:1)))
   (PUNCTUATION (, , 1))
   (SBJ (NP (HEAD (PRP they 2))
            (INDEX 1)
            (POINTER 2:1)))
   (PRD (VP (HEAD (VX (HEAD (VBN made 3))
                      (P-ARG0 (NP (EC-TYPE PB) (INDEX 1)))
                      (P-ARG1 (NP (EC-TYPE PB) (INDEX 3)))
                      (INDEX 2)))
            (OBJ (NP (T-POS (CD three 4))
                     (HEAD (NX (HEAD (NNS bids 5))
                               (P-ARG0-Supp (NP (EC-TYPE PB) (INDEX 1)))
                               (Support (VX (EC-TYPE PB) (INDEX 2)))))
                     (INDEX 3)
                     (POINTER 4:1)))
            (POINTER 3:1)))
   (PUNCTUATION (. . 6))
   (POINTER 0:2)
   (TREE-NUM 1)
   (INDEX 0))
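One way to read the feature structure above is that a constituent marked (EC-TYPE PB) is a pointer whose INDEX names the phrase that fills the argument slot. The Python sketch below follows that reading on a shortened, re-bracketed excerpt (the ADV and PUNCTUATION branches and the POINTER features are left out). The parser and helper names are hypothetical and this is not the GLARF software.

```python
def parse(text):
    """Parse a bracketed feature structure into nested lists [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(i):
        node = []
        while i < len(tokens):
            if tokens[i] == "(":
                child, i = read(i + 1)
                node.append(child)
            elif tokens[i] == ")":
                return node, i + 1
            else:
                node.append(tokens[i])
                i += 1
        return node, i

    return read(0)[0][0]

def feature(node, name):
    """Value list of the feature `name` directly under node, or None."""
    for child in node[1:]:
        if isinstance(child, list) and child[0] == name:
            return child[1:]
    return None

def is_pointer(node):
    return feature(node, "EC-TYPE") is not None

def words(node):
    """(position, word) pairs for leaves shaped like (POS word position)."""
    if is_pointer(node):
        return []  # pointers contribute no words of their own
    if len(node) == 3 and all(isinstance(x, str) for x in node) and node[2].isdigit():
        return [(int(node[2]), node[1])]
    return [w for c in node[1:] if isinstance(c, list) for w in words(c)]

def index_table(node, table=None):
    """Map each INDEX value to the words of the phrase that carries it."""
    table = {} if table is None else table
    idx = feature(node, "INDEX")
    if idx and not is_pointer(node):
        table[idx[0]] = " ".join(w for _, w in sorted(words(node)))
    for child in node[1:]:
        if isinstance(child, list):
            index_table(child, table)
    return table

def relations(node, table, out=None):
    """Collect (predicate, role, filler) triples from P-ARG* and Support features."""
    out = [] if out is None else out
    head = feature(node, "HEAD")
    for child in node[1:]:
        if not isinstance(child, list):
            continue
        if child[0].startswith("P-ARG") or child[0] == "Support":
            target = feature(child[1], "INDEX")[0]
            pred = " ".join(w for _, w in words(["HEAD"] + head))
            out.append((pred, child[0], table.get(target, "<outside excerpt>")))
        relations(child, table, out)
    return out

EXCERPT = """
(S (SBJ (NP (HEAD (PRP they 2)) (INDEX 1)))
   (PRD (VP (HEAD (VX (HEAD (VBN made 3))
                      (P-ARG0 (NP (EC-TYPE PB) (INDEX 1)))
                      (P-ARG1 (NP (EC-TYPE PB) (INDEX 3)))
                      (INDEX 2)))
            (OBJ (NP (T-POS (CD three 4))
                     (HEAD (NX (HEAD (NNS bids 5))
                               (P-ARG0-Supp (NP (EC-TYPE PB) (INDEX 1)))
                               (Support (VX (EC-TYPE PB) (INDEX 2)))))
                     (INDEX 3)))))
   (INDEX 0))
"""

tree = parse(EXCERPT)
table = index_table(tree)
rels = relations(tree, table)
for rel in rels:
    print(rel)
```

On this excerpt the resolved triples match the PropBank and NomBank annotations listed on the Example Sentence slide: made takes they and three bids, while bids takes they with made as its support verb.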

What is a Standard Anyway?
Wide Usage (VHS/Betamax, cassette/8-track, Windows/MAC)
 –Quality, the first of its kind, etc.
 –Papers written by happy users
 –A Shared Task like CONLL
What need does GLARF-ULA fill?
 –Unified Detailed Linguistic Annotation
   German, Czech, Japanese, but not English
 –A la carte analyses with compatible encodings insufficient
 –Because it is desirable to have common tokenization, phrase boundaries, POS tags, etc.
   obvious to GALE participants (part of SRI team uses GLARF)
Working toward a standard, not necessarily GLARF
 –Make the “useful” pieces available
 –Contribute to the CONLL representation

Parts of GLARF-ULA that non-GLARF-users Want
Last Year’s ULA meeting
 –Tokenization splits around hyphens
   Based on NomBank and NE tags
 –Offset information
 –Possibly POS correction (if accurate)
CONLL
 –Tokenization splits around hyphens
   All real words (not just NomBank)
   NE tags
 –NP-internal relations
   apposition, relative, possessive, etc.
 –NE modification relations
   POST-HON, TITLE

CONLL Splitting at Hyphens/Slashes 1
Split tokens:
 –Assign POS tags
   Automatic results for sample of 179 tokens
   –153 correct (85.5%), 14 incorrect (7.8%), 12 unclear (6.7%)
 –Decimal token numbers
   (VP (NP (NNP New 6)
           (NNP York 7.1))
       (HYPH - 7.2)
       (VBN based 7.3))
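The decimal numbering in the example can be sketched as follows. This is a minimal illustration with a hypothetical function name. It splits at every hyphen or slash, whereas the deck restricts splitting to certain cases, and it assumes the separator itself receives a number, as the HYPH token does in the example.

```python
import re

def split_with_decimal_ids(tokens, start=1):
    """Number tokens from `start`; pieces of a token split at hyphens or
    slashes get decimal ids n.1, n.2, ... with the separators included."""
    numbered = []
    for n, token in enumerate(tokens, start):
        # The capturing group keeps the separator as its own piece.
        pieces = [p for p in re.split(r"([-/])", token) if p]
        if len(pieces) == 1:
            numbered.append((str(n), token))
        else:
            numbered.extend((f"{n}.{k}", p) for k, p in enumerate(pieces, 1))
    return numbered

result = split_with_decimal_ids(["New", "York-based"], start=6)
print(result)
```

Decimal ids leave the numbering of all unsplit tokens unchanged, so annotations keyed to the original token numbers still line up after splitting.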

CONLL Splitting at Hyphens/Slashes 2
Split Segments iff:
 –COMLEX words, numbers, prefixes (from a list)
 –Required by BBN NE tags (we made a gazetteer)
Relations from GLARF
 –Conjunction cases: Japan-U.S. agreement
 –Everything else (distinguish HMOD/HEAD)
   GLARF distinguishes them further
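A rough sketch of the split condition, under one possible reading of the slide: a token is split only when every segment is a dictionary word, a number, a listed prefix, or a name in the gazetteer. The three small sets below are toy stand-ins, not COMLEX, the actual prefix list, or the BBN-derived gazetteer, and the function names are hypothetical.

```python
import re

# Toy stand-ins for the real resources (assumptions for illustration only).
LEXICON = {"new", "york", "based", "state", "owned", "and", "or"}
PREFIXES = {"non", "anti", "pre", "co"}
GAZETTEER = {"japan", "u.s."}

def known(segment):
    """True if the segment is a listed word, prefix, name, or a number."""
    s = segment.lower()
    is_number = re.fullmatch(r"\d+(\.\d+)?", s) is not None
    return s in LEXICON or s in PREFIXES or s in GAZETTEER or is_number

def should_split(token):
    """Split at hyphens/slashes only if there are several segments, all known."""
    segments = [s for s in re.split(r"[-/]", token) if s]
    return len(segments) > 1 and all(known(s) for s in segments)

for tok in ["York-based", "Japan-U.S.", "1/2", "e-mail", "bids"]:
    print(tok, should_split(tok))
```

With these toy lists, York-based, Japan-U.S. and 1/2 are split, while e-mail stays whole because neither of its segments is listed.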

NP-internal Relations
NP internal relations used for CONLL
 –Title: Mr. John Smith
 –Post-Hon: John Smith Jr. III, Inc., Ph.D., etc.
 –APPOsite: John Smith, president of the U.S.
 –SUFFIX: John 's
 –Near 100% accuracy for small sample
   45 correct, 2 unclear
All NP GLARF Roles
 –RELATIVE, COMP, A-POS, T-POS, Q-POS, etc.
 –224 correct (83.9%), 32 wrong (12%), 11 unclear (4.1%)

Automatic GLARF for ULA-OANC-1
Out of the Box with Charniak parser
 –Role Precision for 1st 5 sentences in Kaufman
 –NomBank: 8/10 (80%)
 –PropBank: 25/31 (81%)
 –PDTB: 7/11 (64%)
Tune Charniak results
Run/Tune on Treebank (and other hand data)
Process CONLL style
Use for LAW 2 WG task

Chinese TreeBank and PropBank
警方 正在 调查 此 事
police now investigate this matter
“The police are investigating this matter.”
[Figure: Chinese TreeBank tree for this sentence (phrase nodes IP, NP, ADVP, VP, DP; POS tags NN, AD, VV, DT, NN) with the PropBank labels predicate, Arg 0 and Arg 1]

Chinese GLARF
(IP (SBJ (NP (HEAD (NN 警方))
             (INDEX 1)))
    (PRD (VP (ADV (ADVP (HEAD (AD 正在))))
             (HEAD (VX (HEAD (VV 调查))
                       (P-ARG0 (NP (EC-TYPE PB) (INDEX 1)))
                       (P-ARG1 (NP (EC-TYPE PB) (INDEX 2)))))
             (OBJ (NP (T-POS (DP (HEAD (DT 此))))
                      (HEAD (NX (HEAD (NN 事))))
                      (INDEX 2))))))

Summary
Helped build a CONLL standard
 –Adopting the “useful” parts of GLARF
Interoperability
 –Automatic GLARF
 –Input Annotation (hand or automatic)
Extend to Chinese (and Japanese)

Future for GLARF-ULA
NE-like integration, e.g. TIMEX, Opinion
 –Structure-changing vs. match dependency head
 –NEs with markable Nom/PropBank structure
PDTB and NomBank overlap occasionally
 –For example, As a result, etc.
 –adjudication procedures needed
TimeML relations, NonOvert PDTB
More CONLL integration