CLARIN and FLaReNet: new European Initiatives for Language Resources and Language Technologies. Nicoletta Calzolari, Istituto di Linguistica Computazionale del CNR, Pisa, Italy. Boulder, March 2008.


1 CLARIN and FLaReNet: new European Initiatives for Language Resources and Language Technologies. Nicoletta Calzolari, Istituto di Linguistica Computazionale del CNR, Pisa, Italy (glottolo@ilc.cnr.it). Boulder, March 2008.

2 Today, many signs of vitality & success for LRs: in spoken, written and multimodal areas … and in new emerging areas; statistical approaches…; different dimensions & layers: content (ontologies), emotion, time, …; LRs for evaluation and for training …; LREC (> 900 submissions), many LRs at COLING and even at ACL!; ELRA (self-sustaining) & LDC; LRE (new journal: N. Ide & NC); ISO-TC37-SC4/WG4 (international standards for LRs); AFNLP…; ESFRI: CLARIN (also a political & strategic role); new calls or initiatives in the EU, US and Asia on LRs, interoperability, cooperation, …

3 BUT … an important point. In the ’90s there was a global vision of the field & its main components: standards, creation of LRs, distribution (ELRA, LDC), and then automatic acquisition … towards the infrastructure of LRs & LT. While today there is an ever-increasing set of initiatives for new LRs, basic robust technologies, models??, algorithms, … We have an LR community culture, BUT it is somewhat scattered, opportunistic, without much coherence.

4 Today … the wealth of data & of basic technologies is such that we should reflect again on the field as a whole & ask whether standards, creation of LRs, automatic acquisition and distribution are still “the” important components, or how they have changed/must change … Which new challenges lie on the way towards a new & more mature infrastructure of LRs & LTs? Dynamic LRs; sharing; collaborative creation & management. Content interoperability could be at the basis of a new paradigm for LRs & LT & of a new infrastructure?

5 ISO LMF (Lexical Markup Framework). A structural skeleton, with the basic hierarchy of information in a lexical entry, plus various extensions; the LMF specifications comply with UML modeling principles; an XML DTD allows implementation. It also builds on: EAGLES/ISLE; NEDO (Asian languages); NICT Language Grid Service Ontology. The field is mature. (From Monica Monachini)
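To make the "structural skeleton" more concrete, here is a minimal Python sketch that builds an LMF-style entry with the standard library's `xml.etree.ElementTree`. The element names (`LexicalResource`, `Lexicon`, `LexicalEntry`, `Lemma`, `Sense`) and the `<feat att=… val=…/>` attribute-value convention loosely follow the LMF core package as we understand it. The output is not validated against the official DTD, and the helper functions and the sense id `strada_1` are hypothetical.

```python
import xml.etree.ElementTree as ET


def feat(parent: ET.Element, att: str, val: str) -> ET.Element:
    """Attach an attribute-value pair as a generic <feat> child (LMF-style)."""
    return ET.SubElement(parent, "feat", att=att, val=val)


def build_entry(lemma: str, pos: str, sense_ids: list[str]) -> ET.Element:
    """Build a minimal LexicalEntry: part of speech, Lemma, one Sense per id."""
    entry = ET.Element("LexicalEntry")
    feat(entry, "partOfSpeech", pos)
    lemma_el = ET.SubElement(entry, "Lemma")
    feat(lemma_el, "writtenForm", lemma)
    for sid in sense_ids:
        ET.SubElement(entry, "Sense", id=sid)
    return entry


def build_resource(language: str, entries: list[ET.Element]) -> ET.Element:
    """Wrap entries in a Lexicon, inside a top-level LexicalResource."""
    resource = ET.Element("LexicalResource")
    lexicon = ET.SubElement(resource, "Lexicon")
    feat(lexicon, "language", language)
    lexicon.extend(entries)
    return resource


# Hypothetical example entry (Italian "strada", with an invented sense id).
resource = build_resource("it", [build_entry("strada", "noun", ["strada_1"])])
xml_text = ET.tostring(resource, encoding="unicode")
print(xml_text)
```

Keeping attribute-value information in generic `feat` elements, rather than in a fixed set of XML attributes, is one way LMF-style markup lets extensions add new data categories without changing the skeleton.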

6 Mapping exercise. XML-based abstract lexicon interchange format. Major best practices: OLIF; PAROLE/SIMPLE; LC-Star; WordNet/EuroWordNet; FrameNet; BDef, a formal database of lexicographic definitions derived from the Explanatory Dictionary of Contemporary French; … others on the way … Entries from existing lexicons have been mapped to LMF to prove that the model is able to represent many best practices and achieve unification. (From Monica Monachini)

7 Lexical Web & content interoperability. ‘Standards’ as a critical step for semantic mark-up in the Semantic Web. [Diagram: ComLex, NomLex, SIMPLE, WordNets, FrameNet, Lex_x and Lex_y connected through LMF, with intelligent agents.] Standards for interoperability: enough?

8 We need tools to make this vision operational & concrete. New prototype: “LeXFlow” (http://xmlgroup.iit.cnr.it:98/MILE/lexflow/demo.xhtml), a web-based collaborative environment for the semi-automatic management/integration of lexical resources, enabling interoperability of distributed lexical resources accessed by different types of agents. From Language Resources to Language Services.

9 Architecture for cooperative integration of lexicons. [Diagram with application, coordination and data layers: Italian SIMPLE, Italian WordNet and Chinese WordNet as data; an ILI Mapper and a Relation Mapper; a MultiWordnet Relation Calculator and a Simple-Wordnet Relation Calculator, each behind a web service interface; agents in roles 1 to 4.]

10 [Figure: example of linking the Italian and Chinese wordnets through the Interlingual Index (ILI). Italian synsets: passaggio, strada, via (N#1290); parte, tratto (N#12348); carreggiata (N#21225); curvatura, svolta, curva (N#20944). Chinese synsets: che_dao 車道 (N#3245327); tong_dao 通道 (N#03092396); dao_lu, dao, lu 道路, 道, 路 (N#03243979); wan 彎 (N#9992072). ILI 1.5 to ILI 1.6 mappings: road, route (ILI1.5-3001757-n → ILI1.6-3243979-n); passage (ILI1.5-2857000-n → ILI1.6-3092396-n); roadway (ILI1.5-3002522-n → ILI1.6-3245327-n); bend, crook, turn (ILI1.5-8488101-n → ILI1.6-9992072-n); stretch (ILI1.5-5691718-n → ILI1.6-???). Relations shown: synonymy; hyperonymy (iperonimia/HYP; 上位(泛稱)詞_為/HYP); hyponymy (iponimia/HPO; 下位(特指)詞_為/HPO); meronymy (MPT; 部件_部份詞_為/MPT). Annotations: “A new proposed mero relation”, “Reinforcement & validity”, “Derived”.]
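The figure's labels (“Derived”, “Reinforcement & validity”, “A new proposed mero relation”) suggest a cross-lingual relation check through the ILI. The following is a minimal Python sketch of that idea under our own assumptions, not LeXFlow's actual code or interface. Each wordnet is modeled as a set of (synset, relation, synset) triples plus a dictionary linking its synsets to ILI 1.6 ids. A relation found in both wordnets after projection onto the ILI counts as reinforced. A relation found only in wordnet A is proposed for wordnet B, pending human validation. The function names, the triple layout, the relation direction and the Italian/Chinese toy links are all hypothetical.

```python
from typing import Dict, Set, Tuple

# (source synset, relation label, target synset); labels borrowed from the
# slide, direction convention (part, MPT, whole) is our assumption.
Relation = Tuple[str, str, str]


def to_ili(relations: Set[Relation], ili_of: Dict[str, str]) -> Set[Relation]:
    """Rewrite local synset ids as ILI ids; skip links with an unmapped end."""
    return {
        (ili_of[src], rel, ili_of[tgt])
        for src, rel, tgt in relations
        if src in ili_of and tgt in ili_of
    }


def localize(relations: Set[Relation], ili_of: Dict[str, str]) -> Set[Relation]:
    """Map ILI-level relations back to local ids (assumes a one-to-one mapping)."""
    back = {ili: local for local, ili in ili_of.items()}
    return {
        (back[src], rel, back[tgt])
        for src, rel, tgt in relations
        if src in back and tgt in back
    }


def compare(rels_a, ili_a, rels_b, ili_b):
    """Return (reinforced, proposed_for_b), both expressed as ILI-level triples."""
    a = to_ili(rels_a, ili_a)
    b = to_ili(rels_b, ili_b)
    reinforced = a & b        # present in both wordnets: mutual support
    proposed_for_b = a - b    # only in A: candidate links for B, to be validated
    return reinforced, proposed_for_b


# Toy data: the synset links and relations below are illustrative only.
ili_zh = {"zh:dao_lu": "ILI1.6-3243979-n", "zh:che_dao": "ILI1.6-3245327-n",
          "zh:wan": "ILI1.6-9992072-n"}
ili_it = {"it:strada": "ILI1.6-3243979-n", "it:carreggiata": "ILI1.6-3245327-n",
          "it:curva": "ILI1.6-9992072-n"}
rels_zh = {("zh:che_dao", "MPT", "zh:dao_lu"), ("zh:wan", "MPT", "zh:dao_lu")}
rels_it = {("it:curva", "MPT", "it:strada")}

reinforced, proposed = compare(rels_zh, ili_zh, rels_it, ili_it)
print("reinforced:", reinforced)
print("proposed for Italian:", localize(proposed, ili_it))
```

Comparing at the ILI level keeps the check language-independent. The inverse mapping in `localize` only works if each ILI id is linked to a single local synset, which real wordnets do not guarantee.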

11 LexFlow: an architecture for making distributed wordnets interoperable. It lends itself to different applications in LR processing: enrichment of existing lexical resources; creation of new resources; validation of existing resources. It can provide a platform for the cooperative & collective creation & management of LRs, by providing a web-based environment for the collaboration & interaction of distributed agents and resources. Prototype of a web application supporting the GlobalWordNet Grid initiative, i.e. a shared multilingual knowledge base for cross-lingual processing based on distributed resources over the Grid. New project: KYOTO.

12 Some steps for a “new generation” of LRs: from huge efforts in building static, large-scale, general-purpose LRs to non-static LRs rapidly built on demand, tailored to specific user needs; from closed, locally developed and centralized resources to LRs residing in distributed places, accessible on the web, choreographed by agents acting over them; from Language Resources to Language Services.

13 UIMA at ILC. Create an infrastructure to allow: distributed access to resources; creation of shared resources; use of methods to access NLP technologies. Integrate available software via web services. Standardise resources so that they can be accessed from other research centres.

14 Distributed Language Services. A long-term scenario implying content interoperability standards, supra-national cooperation and the development of architectures enabling accessibility. Create new resources on the basis of existing ones; exchange and integrate information across repositories; compose new services on demand; collaborative & collective/social development and validation, cross-resource integration and exchange of information. (Language Grid; Wiki)

15 Many dimensions around the notion of language: cultural issues (language … and cultural identity; language … and the Humanities); economic and social issues (applications, services); technical issues; interdisciplinarity & multidisciplinarity; political issues, e.g. a commonly agreed list of minimal requirements for “national” LRs (BLARK); multilingualism. We need bodies for a broad research agenda & strategic actions for LT & LRs (written/spoken/multimodal) based on all these dimensions. We need to put together the technical, organisational, strategic, economic and political issues of LRs. Finally, two new European infrastructural & networking initiatives.

16 Which communities? Language Resources, Language Technologies, Standardisation, Grid, Semantic Web, ontologists, ICT, …; Humanities, Social Sciences, Digital Libraries, Cultural Heritage, …; many application domains (eCulture, eGovernment, eHealth, …). At the core: multilinguality. Focus on cooperation. Technologies exist, but the infrastructure that puts them together and sustains them is still missing. [Diagram labels: CLARIN (research infrastructure) as the enabling infrastructure; FLaReNet (network).]

17 CLARIN: Common Language Resources and Technologies Infrastructure for the Humanities & Social Sciences (ESFRI Research Infrastructures). A large-scale pan-European collaborative effort (31+ countries) to make LRs & LTs available & readily usable to scholars of the humanities & social sciences (& all disciplines). We need to overcome the present fragmented situation by harmonising structural and terminological differences. The basis is a Grid-type infrastructure and Semantic Web technology. The benefits of computer-enhanced language processing become available only when a critical mass of coordinated effort is invested in building an enabling infrastructure, which can provide services such as the provision of tools & resources as well as training & counseling across a wide span of domains. The infrastructure will be based on a number of resource, service and expertise centres.

18 CLARIN Mission. Create a comprehensive and free-to-use distributed archive of LRs & LTs covering not only the languages of all member states, but also other languages studied and used in Europe. Since the tools & resources will be interoperable across languages & domains, contribute to preserving and supporting the multilingual & multicultural European heritage. An operational open infrastructure of web services will introduce a new paradigm of distributed collaborative development, allowing many contributors to add all kinds of new services based on existing ones, thus ensuring reusability and allowing scaling up to suit individual needs.

19 How can we tackle these challenges? J. Taylor: “eScience is about global collaboration in key areas of science and the next generation of infrastructures that will enable it”. We need to build new types of platforms: to allow researchers to easily combine existing resources into new ones to tackle the big challenges; to increase the productivity of all interested researchers, since currently too much time is wasted on preparatory work. (From P. Wittenburg)

20 eScience Vision. CLARIN establishes such a new generation of extended infrastructure. Thus CLARIN is not about creating and building new language resources and technology, but about making them available and accessible as services in a stable and persistent infrastructure, to allow tackling the great challenges. CLARIN: http://www.clarin.eu; Grid project: http://www.mpi.nl/dam-lr; ISO TC37/SC4: http://www.tc37sc4.org; standards project: http://lirics.loria.fr/. (From P. Wittenburg)

21 We still have a long path ahead … In an e-Contentplus call for a “Thematic Network on Language Resources”: FLaReNet. To provide common recommendations (to the EC) for future actions; to give priorities. We need ‘visions’ & also a “new project”, in a global context, in cooperation with CLARIN & also with non-EU members.

22 Which communities? Language Resources, Language Technologies, Standardisation, ontologists, content, EC, funding agencies, …; Humanities, Social Sciences, Digital Libraries, Cultural Heritage, …; many application domains (eCulture, eGovernment, eHealth, intelligence, domotics, content industry, …). At the core: multilinguality. Focus on cooperation. LRs & LTs exist, but a global vision, policy and strategy is still missing. [Diagram labels: FLaReNet (network) as an EU forum; CLARIN (research infrastructure).]

23 FLaReNet: Fostering Language Resources Network. A European forum to facilitate interaction among LR stakeholders. The network structure considers that LRs present various dimensions and must be approached from many perspectives: technical, but also organisational, economic, legal and political. It also addresses multicultural and multilingual aspects, essential when facing access to and use of digital content in today’s Europe.

24 A layered structure, with leading experts & groups (national and European institutions, SMEs, large companies) for all relevant LR areas (about 40 partners), in collaboration with CLARIN, to ensure coherence of LR-related efforts in Europe. FLaReNet will consolidate existing knowledge, presenting it analytically and visibly, and will contribute to structuring the area of LRs of the future by discussing new strategies to: convert existing and experimental technologies related to LRs into useful economic and societal benefits; integrate so far partial solutions into broader infrastructures; consolidate areas mature enough for recommendation of best practices; anticipate the needs of new types of LRs. Organised in Thematic Working Groups.

25 Thematic Areas: the Chart for the area of LRs in its different dimensions; methods and models for LR building, reuse, interlinking and maintenance; harmonisation of formats and standards; definition of evaluation protocols and evaluation procedures; methods for the automatic construction and processing of LRs. To build together: an evolving roadmap; a blueprint of actions and infrastructures.

26 Objectives & expected results (ambitious!): the largest network of LR and HLT players, with diverse approaches, efforts and technologies; enable progress toward community consensus; give an extended picture of LRs & recast their definition in the light of recent scientific, methodological, technological and social developments; consolidate methods & approaches, common practices, frameworks and architectures; a “roadmap” identifying areas where consensus has been achieved or is emerging vs. areas where additional discussion and testing is required, together with an indication of priorities; recommendations in the form of a plan of coherent actions for the EU and national organizations; a European model for the LRs of the next years.

27 Outcomes of FLaReNet. The outcomes will be of a directive nature, helping the EC and national funding agencies to identify priority areas of LRs of major interest for the public that need public funding to be developed or improved. A blueprint of actions will constitute input to policy development at both EU and national level, for identifying new language policies that support linguistic diversity in Europe, in combination with strengthening the language product market, e.g. for new products & innovative services, especially for less technologically advanced languages.

28 These initiatives, … together, call for international cooperation also outside Europe and will be relevant for setting up a global, worldwide Forum of Language Resources and Language Technologies.

