A centralized approach to language resources Piek Vossen S&T Forum on Multilingualism, Luxembourg, June 6th 2005.

Overview
- What has been achieved?
- What has not been achieved?
- What are the major challenges?

What has been achieved?
- Research and technology development:
  - Lexical representations
  - Large-scale and medium-scale lexical acquisition:
    - Machine Readable Dictionaries
    - Corpora
  - Acquilex, Multilex, Parole, Simple, EuroWordNet, BalkaNet, MEANING, etc.
- Standardization:
  - early initiatives: EAGLES, ISLE
  - best practices and descriptions
- Medium-scale shallow resources for a number of languages, e.g. Parole lexicons and wordnets for about 15 languages
- Small-scale deep resources for a few languages, i.e. Acquilex, Simple

What has not been achieved (1)?
- Evaluation and benchmarking:
  - No well-defined and commonly accepted criteria
  - No benchmark data to validate language resources
- Insufficient coverage:
  - 100K entries and 200K concepts per language are needed for realistic applications; only half has been achieved
  - Many European languages still do not have the basic resources
- Insufficiently rich in data coverage:
  - Language coverage: mainly English
  - Size: e.g. Simple, FrameNet: 10,000 concepts

What has not been achieved (2)?
- Most resources are developed in a distributed way, i.e. a common project but national groups with different approaches:
  - Insufficient conceptual overlap and matching across languages:
    - very low intersection of concepts (all Wordnets: about 10,000 concepts)
    - diverging interpretations and definitions of relations and concepts
  - Insufficient overlap and consensus in the representation of lexical knowledge
- Not enough progress in integrating and merging different types of resources:
  - Ontological resources (Semantic Web)
  - Lexical semantic resources (Wordnets)
  - Morpho-syntactic & semantic (Simple, Acquilex)
  - Morpho-syntactic (Parole)
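The "intersection of concepts" mentioned on this slide can be illustrated with a minimal sketch. It assumes each wordnet is reduced to the set of language-neutral concept identifiers it links to; the identifiers, the language codes and the function names `shared_concepts` and `overlap_ratio` are hypothetical and do not come from the presentation or from any real wordnet.

```python
# Hypothetical toy data: each wordnet is represented only by the set of
# language-neutral concept identifiers it has been linked to.
wordnets = {
    "nl": {"c1", "c2", "c3", "c5"},
    "es": {"c1", "c3", "c4"},
    "it": {"c1", "c3", "c5", "c6"},
}


def shared_concepts(resources):
    """Return the concepts that every resource covers."""
    if not resources:
        return set()
    return set.intersection(*resources.values())


def overlap_ratio(resources):
    """Shared concepts divided by all concepts seen in any resource."""
    if not resources:
        return 0.0
    union = set.union(*resources.values())
    return len(shared_concepts(resources)) / len(union) if union else 0.0


if __name__ == "__main__":
    print(sorted(shared_concepts(wordnets)))
    print(round(overlap_ratio(wordnets), 2))
```

In this toy data only two of the six concepts are shared by all three languages, which is the kind of low intersection the slide describes for independently developed resources.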

What has not been achieved (3)?
- Integration in real applications:
  - Evidence of added value, i.e. scientific proof that language technology and resources help -> more deep-thought applications
- More acceptance by the general public (show cases):
  - The positive effects of language technology should be visible to the general public
  - Be aware of the language myth! The negative effects and limitations should be clear too
- More awareness among the general public of the limitations:
  - make people realize how bad the current systems are (precision and recall)
  - explain the undemocratic limitations of the current Internet

What is the major challenge (1)?
- Critical issues:
  - Languages that are not well supported:
    - lower economic value
    - fewer speakers
  - Divergence of resources and lack of semantic and conceptual intersection
  - Integration of semantic-conceptual knowledge (more language neutral and sharable) with morpho-syntactic knowledge (language-specific)

What is the major challenge (2)?
- Centralized development of a semantic conceptual backbone:
  - Maximizes sharing and re-use of lexical knowledge and tools across languages
  - Maximizes the intersection of concepts and thus the interlinking of languages
  - Stimulates the standardization of lexical knowledge representation
  - Enables the early development of impressive Europe-wide applications in the short term:
    - Good show cases (information retrieval or dialogues in all European languages)
    - Application-based evaluation and benchmarking
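How a shared backbone interlinks languages can be sketched as a small data structure. This is only an illustration under stated assumptions: the class `SemanticBackbone`, its methods, the concept identifiers and the example word links are all hypothetical and are not the design proposed in the presentation.

```python
from collections import defaultdict


class SemanticBackbone:
    """Toy central concept index that words from any language link to."""

    def __init__(self):
        # concept id -> language code -> set of words
        self._words = defaultdict(lambda: defaultdict(set))
        # (language code, word) -> set of concept ids
        self._concepts = defaultdict(set)

    def link(self, concept, lang, word):
        """Attach a word of one language to a shared concept."""
        self._words[concept][lang].add(word)
        self._concepts[(lang, word)].add(concept)

    def equivalents(self, word, source_lang, target_lang):
        """Find target-language words that share a concept with `word`."""
        result = set()
        for concept in self._concepts.get((source_lang, word), ()):
            result |= self._words[concept].get(target_lang, set())
        return result


# Hypothetical links; concept ids such as "bank#financial" are made up.
backbone = SemanticBackbone()
backbone.link("violin#1", "en", "violin")
backbone.link("violin#1", "nl", "viool")
backbone.link("violin#1", "es", "violín")
backbone.link("bank#financial", "en", "bank")
backbone.link("bank#financial", "nl", "bank")
backbone.link("bank#river", "en", "bank")
backbone.link("bank#river", "nl", "oever")

if __name__ == "__main__":
    print(sorted(backbone.equivalents("bank", "en", "nl")))
    print(sorted(backbone.equivalents("viool", "nl", "es")))
```

Because every language links to the same concept identifiers, the Dutch and Spanish entries become connected without any direct Dutch-Spanish resource, which is the sharing effect the slide argues for.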

What is the major challenge (3)?
- Interlinking and developing morpho-syntactic lexicons on top of the semantic backbone:
  - Captures the valuable non-sharable, idiosyncratic properties of languages (also has cultural value)
  - Enables long-term high-quality applications such as Machine Translation
  - Should be corpus-based, but is also necessary for developing large-scale comparable corpora
  - Can be achieved gradually (phase by phase) with intermediate results

[Figure: diagram with the labels Semantic Backbone, Semantic Web, Wordnets, Corpora and Morpho-syntactic Lexicons; one side marked "Sharable / Language neutral", the other "Non-Sharable / Language specific"; example words: bank, violin, violist, play]