Tasks Talk: ULA08 Workshop
A Talk about Tasks
Unified Linguistic Annotation Workshop
Adam Meyers, New York University
March 18, 2008


Outline
The Annotation Task and the LAW II Task
  – Our 120K ULA Corpus
      40K of OANC
      40K of Brown
      7K of LU Corpus
      33K of Parallel English
  – Selecting & Annotating Corpora
  – Sharing Annotated Corpora
The CONLL 2008 Task
  – A Step toward a Standardized ULA

ULA-OANC-1: 40K words
Part of Open American National Corpus (OANC)
Breakdown
  – Spoken 10K
  – Letters 10K
  – Slate 5K
  – Travel Guides 5K
  – 911 Report 5K
  – Textbook 5K
A Blueprint for an open “balanced” corpus
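
As a quick sanity check on these word counts, the following Python sketch tallies the ULA-OANC-1 breakdown from this slide and the four subcorpora listed in the outline. The dictionary and function names are illustrative only; the counts are in thousands of words, as given on the slides.

```python
# Word counts in thousands of words, copied from the slides.
ULA_OANC_1 = {
    "Spoken": 10,
    "Letters": 10,
    "Slate": 5,
    "Travel Guides": 5,
    "911 Report": 5,
    "Textbook": 5,
}

ULA_CORPUS = {
    "OANC": 40,
    "Brown": 40,
    "LU Corpus": 7,
    "Parallel English": 33,
}


def total_k(parts):
    """Return the total size of a corpus, in thousands of words."""
    return sum(parts.values())


def shares(parts):
    """Return each part's fraction of the corpus total."""
    total = total_k(parts)
    return {name: size / total for name, size in parts.items()}


if __name__ == "__main__":
    print("ULA-OANC-1 total (K):", total_k(ULA_OANC_1))
    print("ULA corpus total (K):", total_k(ULA_CORPUS))
    for name, share in shares(ULA_OANC_1).items():
        print(f"{name}: {share:.1%}")
```

The two totals should match the 40K and 120K figures quoted on the slides, and the shares show how the spoken and letter portions together make up half of ULA-OANC-1.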

Status of ULA-OANC-1
Available for Download to anyone
Annotated for the Penn Treebank
About 20% has FrameNet Annotation
Part of the LAW II Working Group Task
  – Hand and Automatic Annotation
      Automatic Charniak and GLARF annotation (including NomBank, PropBank and sort-of PDTB)
  – Shared by the Community
  – Some Interest in Translating the Corpus

40K of the Brown Corpus
Not Selected Yet
All Treebank’d
We need to choose this ASAP
  – Includes CONLL test data

Other Corpora
Language Understanding Corpus
  – About 7K of English
  – Includes some Arabic
  – Will be distributed by the LDC
      Includes some Public Domain Data
      Includes some licensed data
33K of Parallel English
  – Mitch, what is the status?
  – Should we choose something else?

The Bottom Line
Annotating 120K
  – Easier than annotating 3 subcorpora
Corpus Selection has Stalled Corpus Annotation
We have 1 more year to get this right
We have the opportunity to get other annotators to annotate our corpora.

The CONLL Task
2 Levels/Tiers
  – Syntactic Dependencies based on the Penn Treebank
  – Semantic Dependencies based on NomBank/PropBank
Similar to
  – Chomsky-style Linguistics: D-structure/S-structure
  – LFG: C-structure/F-structure
  – Prague Dependency Framework
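
One way to picture the two tiers is a token record that carries a single syntactic head plus any number of semantic arcs. The Python sketch below is a minimal illustration under stated assumptions: the `Token` class, the example sentence, and all relation and role labels are hypothetical and are not taken from the task definition.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    index: int   # 1-based position in the sentence
    form: str    # surface form of the token
    head: int    # index of the syntactic head, 0 for the root
    deprel: str  # syntactic relation label (illustrative)
    # Semantic tier: predicate token index -> role label (illustrative).
    sem_roles: dict = field(default_factory=dict)


# Hypothetical analysis of "Mary wanted to leave".
sentence = [
    Token(1, "Mary", head=2, deprel="SBJ", sem_roles={2: "A0", 4: "A0"}),
    Token(2, "wanted", head=0, deprel="ROOT"),
    Token(3, "to", head=2, deprel="OBJ", sem_roles={2: "A1"}),
    Token(4, "leave", head=3, deprel="VC"),
]


def syntactic_arcs(tokens):
    """Return (head, dependent, label) triples for the syntactic tier."""
    return [(t.head, t.index, t.deprel) for t in tokens]


def semantic_arcs(tokens):
    """Return (predicate, argument, role) triples for the semantic tier."""
    return [
        (pred, t.index, role)
        for t in tokens
        for pred, role in sorted(t.sem_roles.items())
    ]


if __name__ == "__main__":
    print(syntactic_arcs(sentence))
    print(semantic_arcs(sentence))
```

In this sketch the syntactic tier is a tree (each token has exactly one head), while the semantic tier is a graph: "Mary" is an argument of both predicates. That difference is the main reason for keeping the two levels separate.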

CONLL and the ULA
CONLL uses GLARF-ULA
  – BBN named entities
  – SPLITTING tokens at hyphens and slashes
  – GLARF NP-internal relations: POST-HON, TITLE, APPOSITION, SUFFIX
What about next year and future years?
  – PDTB exists for the main CONLL corpus (WSJ)
  – What about the other ULA corpora and annotation?
  – Chinese GLARF?
      Suppose we use the Chinese Treebank for the Parallel Data
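
The token splitting mentioned above can be sketched in a few lines of Python. The slide says only that tokens are split at hyphens and slashes; keeping each separator as its own token is an assumption made here for illustration, and the function names are hypothetical.

```python
import re

# The capturing group makes re.split keep the separators in its output.
_SEPARATORS = re.compile(r"([-/])")


def split_token(token):
    """Split one token at hyphens and slashes.

    Assumption: each separator is kept as a token of its own.
    Empty pieces (from leading, trailing or doubled separators) are dropped.
    """
    return [piece for piece in _SEPARATORS.split(token) if piece]


def split_tokens(tokens):
    """Apply split_token to every token and flatten the result."""
    out = []
    for tok in tokens:
        out.extend(split_token(tok))
    return out


if __name__ == "__main__":
    print(split_tokens(["New", "York-based", "and/or", "corpus"]))
```

Splitting this way changes token indices, so any dependency arcs defined over the original tokens would need to be re-indexed afterwards.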

A Progression of CONLL
[…]: PropBank
[…]: Dependencies in Multiple Languages
2008: Syntactic & Semantic Dependencies for English
2009: Slight elaboration of the 2008 task?
  – More semantic roles? E.g., PDTB?
  – More languages?
2010: What’s the next step?

Could a ULA be a CONLL Task?
Unified Detailed Linguistic Analyses
  – German, Czech, Japanese, but not English
English Annotation
  – Possibly more detailed, but a la carte
  – Everyone has their own framework
      Penn Treebank, PropBank, NomBank, TimeML, PDTB, Opinion Annotation, etc.
CONLL 2010 or 2012:
  – A ULA?
      Single-Theory (aggressively merged?)
      A la Carte, but compatible formats?