Olga Pustylnikov, Alexander Mehler (Bielefeld University): A Unified Database of Dependency Treebanks. Integrating, Quantifying & Evaluating Dependency Data.



SFB 673 Motivation
- Exploring similarities among languages by means of syntactic treebanks
- We collected a database covering 11 languages
- Treebanks have been developed separately by different research projects
- Quantitative investigations on these treebanks -> the need for unification

Motivation
[Figure: the sentence "John loves Mary" encoded in several treebank formats; labels: corpus, structure, annotation]
Tabular: 1 John n 2 | 2 loves v 0 | 3 Mary n 2
XML (fragment): <W DOM="2" ID="3">Mary
Bracketed: (loves v ( (John n) (Mary n) )
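The tabular and bracketed encodings of "John loves Mary" on this slide carry the same dependency structure, which is why they can be unified. Below is a minimal Python sketch of that correspondence. It assumes each tabular row reads as (token id, word form, POS, head id), with head id 0 marking the root. The function name to_bracketed is hypothetical and is not part of any treebank tool.

```python
from collections import defaultdict

# Rows as shown on the slide: (token id, word form, POS, head id); 0 = root.
ROWS = [(1, "John", "n", 2), (2, "loves", "v", 0), (3, "Mary", "n", 2)]


def to_bracketed(rows):
    """Render head-indexed rows as a nested bracket string."""
    children = defaultdict(list)  # head id -> ids of its dependents
    tokens = {}                   # token id -> (form, pos)
    for tok_id, form, pos, head in rows:
        tokens[tok_id] = (form, pos)
        children[head].append(tok_id)

    def render(tok_id):
        form, pos = tokens[tok_id]
        deps = children.get(tok_id, [])
        if not deps:
            return f"({form} {pos})"
        inner = " ".join(render(d) for d in deps)
        return f"({form} {pos} ({inner}))"

    # Every token whose head is 0 is a root of the sentence.
    return " ".join(render(r) for r in children[0])


print(to_bracketed(ROWS))
```

Both views are derived from the same head/dependent pairs, so a unified format only needs to store those pairs once.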

Motivation: Demands on the unified format of treebanks
(+) generic: able to represent as many treebanks as possible
(+) extensible to new treebanks
(+) complete: preserving all corpus-specific information
(+) transferable to other kinds of corpora
(-) complex: exhibiting minimal complexity
-> graph representations

Motivation: GXL (Holt et al., 2006)
- The Graph eXchange Language (GXL) is a graph model representing corpora in terms of graphs
- GXL can be applied to any kind of corpus (see e.g. Mehler and Gleim (2005), Ferrer i Cancho et al. (2007), Pustylnikov and Mehler (2008))
[Diagram labels: XML, GXL, WIKI, Multimodal Data, Treebanks, TOOLS; Treebanks, eGXL]

Agenda
1. eGXL
2. Data
3. Complexity Evaluation
4. Application
5. Conclusion

eGXL
[Diagram labels: Sentences, Types, IDREF, … level data model]


The eGXL Types-graph
- The Types-graph contains treebank-specific attributes (e.g. POS, morphological attributes, etc.) -> nodes
- Each instance of an attribute is given a unique identifier
[Listing callouts: … a unique identifier; the value of the attribute]

The eGXL Sentences-graph
[Diagram: dependency tree over the Swedish example tokens vill, Detta, bestämt, jag, bemöta, with the following callouts]
- each token of a treebank
- word form
- an IDREF to the POS-node of the Types-graph
- a (syntactic) relation
- from (e.g. a head verb)
- to (e.g. a dependent argument)

The eGXL Sentences-graph
node: each token of a treebank
id: a unique identifier
form: word form
pos: an IDREF to the POS-node of the Types-graph
rel: a (syntactic) relation
relend: a relation anchor
in: from (e.g. a head verb)
out: to (e.g. a dependent argument)
[Example tokens in the diagram: vill, Detta, bestämt, jag, bemöta]
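To make the element roles above concrete, here is a minimal Python sketch that builds a small two-graph document with the standard library. The names node, id, form, pos, rel, relend, in and out come from the slide. Everything else is an assumption made for illustration: the gxl and graph wrappers, the value attribute on Types-graph nodes, the target and direction attributes on relend, and the choice to store form and pos as XML attributes. The actual eGXL schema may nest these differently. The sentence is the "John loves Mary" example from the Motivation slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: (token id, word form, POS) and (head id, dependent id).
TOKENS = [("w1", "John", "n"), ("w2", "loves", "v"), ("w3", "Mary", "n")]
DEPENDENCIES = [("w2", "w1"), ("w2", "w3")]


def build_egxl_sketch(tokens, dependencies):
    """Build an illustrative Types-graph and Sentences-graph (layout assumed)."""
    root = ET.Element("gxl")

    # Types-graph: one node per distinct POS value, each with a unique id.
    types = ET.SubElement(root, "graph", id="Types")
    pos_ids = {}
    for _, _, pos in tokens:
        if pos not in pos_ids:
            pos_ids[pos] = f"pos_{pos}"
            ET.SubElement(types, "node", id=pos_ids[pos], value=pos)

    # Sentences-graph: one node per token; pos points at a Types-graph node.
    sentences = ET.SubElement(root, "graph", id="Sentences")
    for tok_id, form, pos in tokens:
        ET.SubElement(sentences, "node", id=tok_id, form=form, pos=pos_ids[pos])

    # One rel per dependency; its relends anchor the head ("in", i.e. from)
    # and the dependent ("out", i.e. to), following the slide's labels.
    for head, dependent in dependencies:
        rel = ET.SubElement(sentences, "rel")
        ET.SubElement(rel, "relend", target=head, direction="in")
        ET.SubElement(rel, "relend", target=dependent, direction="out")
    return root


doc = build_egxl_sketch(TOKENS, DEPENDENCIES)
print(ET.tostring(doc, encoding="unicode"))
```

Keeping POS values in a separate Types-graph means each tag is stored once and tokens only carry a reference to it, which matches the IDREF link described on the slide.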

eGXL


Dependency Treebanks: 7 different formats

Input vs. Output Formats: examples from Dutch, Swedish, and Italian treebanks

Unification is possible… … due to the separation of the core from the secondary parts …
[Diagram labels: diversity, commonality]

The TreebankWiki


Complexity of eGXL
Logical Scaling Factor (LSF): the number of logical elements (e.g. XML elements) required to represent a treebank unit (e.g. a word form, POS, etc.)
[Chart labels: node, rel; eGXL vs. other]
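One plausible reading of the LSF definition is a plain count of the XML elements in the fragment that encodes a single unit. The Python sketch below follows that reading only. The slides do not say how the reported values were computed, and both example fragments are hypothetical rather than taken from eGXL or any real treebank.

```python
import xml.etree.ElementTree as ET


def logical_elements(fragment: str) -> int:
    """Count the XML elements (root included) in a fragment encoding one unit."""
    return sum(1 for _ in ET.fromstring(fragment).iter())


# Two hypothetical encodings of the same token.
compact = '<node id="w1" form="John" pos="pos_n"/>'
verbose = '<w id="w1"><form>John</form><pos><tag>n</tag></pos></w>'

print(logical_elements(compact))
print(logical_elements(verbose))
```

Under this reading, a format that stores the word form and POS as attributes of one element needs fewer logical elements per token than one that wraps each value in its own element.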


DTDB


Conclusions
- A database covering 11 languages
- eGXL: a generic XML graph model adapted to syntactic treebanks
- Use of treebanks within a single application (Ariadne)

Thank you for your attention!