Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28.

Presentation transcript:

Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia,

NEDERBOOMS
Exploitation of Dutch treebanks for research in linguistics
CLARIN-VL Project
Centre for Computational Linguistics (CCL)
Dutch Grammar and Language Use (NGTG)
Goals:
– User-friendly tools
– Access to large data files

NEDERBOOMS How can we combine the data-oriented approach of treebank mining with the knowledge-oriented method of theoretical and descriptive linguistics?

QUERYING LASSY
Existing search tools:
– dtsearch (Kloosterman 2007)
– Dact (de Kok 2010)
Both are stand-alone tools.
Query language: XPath, the standard query language for XML trees


QUERYING LASSY
Some examples:
– “look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies”
  [XPath expression not preserved in the transcript]
– “look for verb clusters in which a separable verb particle occurs between two verb forms, e.g. Hij zegt dat ze spoedig zullen kennis maken”
  [XPath expression not preserved in the transcript]
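To make the first example concrete, here is a minimal Python sketch of what such a query has to express. Everything in it is an assumption for illustration: the toy parse is hand-written rather than taken from LASSY, the attribute names (`cat`, `rel`, `pt`, `lemma`, `word`) follow the Alpino/LASSY XML layout as commonly documented, and the XPath string is one plausible formulation, not necessarily the one shown on the slide. The standard-library `xml.etree.ElementTree` does not support `and` inside predicates, so the same conditions are checked in plain Python.

```python
import xml.etree.ElementTree as ET

# Hand-written toy parse in an Alpino/LASSY-like layout (not corpus data).
TOY_PARSE = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="np" rel="obj1">
      <node rel="mod" pt="adj" lemma="politiek" word="politieke"/>
      <node rel="hd" pt="n" lemma="discussie" word="discussies"/>
    </node>
    <node cat="np" rel="su">
      <node rel="det" pt="lid" lemma="de" word="de"/>
      <node rel="hd" pt="n" lemma="minister" word="minister"/>
    </node>
  </node>
</alpino_ds>
"""

# One plausible hand-written XPath for the first example (assumption; shown
# for reference only and not executed, since ElementTree lacks `and`).
XPATH = ('//node[@cat="np" and node[@rel="hd" and @pt="n"] '
         'and node[@rel="mod" and @pt="adj" and @lemma="politiek"]]')


def has_child(node, **attrs):
    """True if some direct child <node> carries all the given attribute values."""
    return any(all(child.get(k) == v for k, v in attrs.items())
               for child in node.findall("node"))


def find_political_nps(root):
    """NP nodes with a nominal head and an adjectival modifier 'politiek'."""
    return [np for np in root.iter("node")
            if np.get("cat") == "np"
            and has_child(np, rel="hd", pt="n")
            and has_child(np, rel="mod", pt="adj", lemma="politiek")]


def words(node):
    """Surface string of a node: the words of all leaves below it, in order."""
    return " ".join(n.get("word") for n in node.iter("node") if n.get("word"))


root = ET.fromstring(TOY_PARSE)
matches = [words(np) for np in find_political_nps(root)]
print(matches)
```

Even for this simple pattern the user has to know the node labels and dependency relations of the annotation scheme, which is the usability problem the following slides address.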


QUERYING LASSY
XPath
– Not user-friendly
– Knowledge of Alpino grammar necessary
= problematic for non-technical linguists
Verify theory through data with corpus or treebank examples
– Time consuming, requires some effort
How to make interaction between computational linguistics and theoretical linguistics possible?

GrETEL
Greedy Extraction of Trees for Empirical Linguistics
– Search tool based on example sentences
– Input = natural language
– No explicit knowledge of a formal query language or of the Alpino grammar required
– Bridges the gap between descriptive and computational linguistics
– Available online (optimised for Mozilla Firefox)

GrETEL - online

GrETEL - input
Green versus red word order in Dutch
– green: past participle – auxiliary
  De NAVO stelt dat ze er alles aan gedaan heeft
– red: auxiliary – past participle
  De NAVO stelt dat ze er alles aan heeft gedaan
“NATO claims that it has done everything in its power” (deredactie.be)

GrETEL - input

>> parsed with Alpino

GrETEL - annotation

>> info added to Alpino parse

GrETEL – query info
input example >> Alpino parse >> query tree >> XPath query >> treebanks

GrETEL – query tree

>> subtree extraction
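The subtree extraction step can be pictured with a small Python sketch. It is an illustration under stated assumptions, not GrETEL's actual algorithm: the toy parse is hand-written and simplified, the `begin` attribute is assumed to hold the word position, and the function name `extract_subtree` is hypothetical. The idea shown is simply to keep the selected words plus every node that dominates them, and to prune the rest.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written parse of "ze alles gedaan heeft" (assumption:
# Alpino-like attributes; `begin` is the position of the word).
PARSE = """
<node cat="ssub" rel="body">
  <node rel="su" pt="vnw" word="ze" begin="0"/>
  <node cat="ppart" rel="vc">
    <node rel="obj1" pt="vnw" word="alles" begin="1"/>
    <node rel="hd" pt="ww" word="gedaan" begin="2"/>
  </node>
  <node rel="hd" pt="ww" word="heeft" begin="3"/>
</node>
"""


def extract_subtree(node, selected):
    """Return a pruned copy of `node`, or None if it covers no selected word."""
    children = node.findall("node")
    if not children:
        # Leaf node, i.e. a word: keep it only if the user selected it.
        if int(node.get("begin")) in selected:
            return ET.Element("node", dict(node.attrib))
        return None
    # Phrase node: keep it only if at least one child survives pruning.
    kept = [sub for sub in (extract_subtree(c, selected) for c in children)
            if sub is not None]
    if not kept:
        return None
    pruned = ET.Element("node", dict(node.attrib))
    pruned.extend(kept)
    return pruned


# The user marks the participle and the auxiliary as relevant.
query_tree = extract_subtree(ET.fromstring(PARSE), selected={2, 3})
kept_words = [n.get("word") for n in query_tree.iter("node") if n.get("word")]
print(kept_words)
print(ET.tostring(query_tree, encoding="unicode"))
```

In this toy case the subject and the object are dropped, while the clause node and the participle phrase are retained because they dominate the two selected verbs.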

query tree >> XPath generator >> XPath expression
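A generator of this kind can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the tool's implementation: the query tree is a hand-written toy, the set of attributes copied into the query (`ATTRS`) is chosen for illustration, and the function names are hypothetical. Each node becomes a `node[...]` step whose predicate joins its own attribute tests and the tests for its children with `and`.

```python
import xml.etree.ElementTree as ET

# Hand-written toy query tree (assumption: Alpino-like attribute names).
QUERY_TREE = """
<node cat="np">
  <node rel="mod" pt="adj" lemma="politiek"/>
  <node rel="hd" pt="n"/>
</node>
"""

# Attributes that are turned into XPath conditions (illustrative choice).
ATTRS = ("cat", "rel", "pt", "lemma")


def node_test(node):
    """Build the XPath step for one node, recursing into its children."""
    conds = [f'@{a}="{node.get(a)}"' for a in ATTRS if node.get(a) is not None]
    conds += [node_test(child) for child in node.findall("node")]
    # A node without any condition is matched by the bare step `node`.
    return f"node[{' and '.join(conds)}]" if conds else "node"


def to_xpath(query_tree):
    """XPath expression matching the query tree anywhere in a parse."""
    return "//" + node_test(query_tree)


xpath = to_xpath(ET.fromstring(QUERY_TREE))
print(xpath)
```

Because the children are expressed as unordered predicates, the generated expression does not constrain word order; a tool that needs order-sensitive queries would have to add position conditions on top of this.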

GrETEL – treebanks

LASSY Small in PostgreSQL database >> no local installation required

GrETEL – results

>> (adapted) query

>> quantitative information

GrETEL – results

>> tree viewer

>> list of results

Thanks for your attention! Questions?