Language Technology for the Humanities: why and how?
Steven Krauwer, Utrecht University, CLARIN ERIC Executive Director
Hung LST Day, 18-03-2013

2 Overview
• Why?
• How?
• CLARIN in a nutshell
• The dream
• The vision
• Phasing
• CLARIN ERIC
• The nightmare
• The challenge
• Why join?
• Concluding remarks

3 Why (1)
• Wealth of digital language data, spread all over Europe in archives, repositories and libraries
• Reflects human behaviour, communication, knowledge, culture, etc.
• Rich source of data, information and knowledge for Humanities and Social Sciences (HSS) scholars (historians, philosophers, social scientists, …)
• In addition, the results of 30 years of European HLT efforts
• In brief: a great opportunity for HSS to innovate and to become world leaders, especially because of our multilinguality
BUT…

4 Why (2)
BUT…
• How do HSS scholars know what data exists?
• How can they get access to data from all over Europe?
• How do they know what tools exist to retrieve, explore and exploit these data?
• How do they know how to decompose their HSS research questions into sub-questions that can be answered by digital methods?
OUR ANSWER:
• CLARIN: the Common Language Resources and Technology Infrastructure for the Humanities and Social Sciences

5 How: CLARIN in a nutshell
• Common Language Resources and Technology Infrastructure (http://www.clarin.eu)
• Basic idea:
  • a European federation of digital repositories with language data and tools (text, speech, multimodal, gesture …)
  • with access to language and speech technology tools through web services to retrieve, manipulate, enhance, explore and exploit data
  • with uniform single sign-on access to archives and tools
  • target audience: humanities and social sciences scholars
  • to cover all EU and associated countries
  • and all languages relevant to the target audience

6 The CLARIN dream
• give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)
• give me all negative articles about Islam or about soccer in the Slovenski Narod daily newspaper (1868-1943)
• find European TV news interviews that involve speakers with a Hungarian accent
• summarize all articles in European newspapers of August 2012 about OCR, in Portuguese
• show me the pronoun systems of the languages of Nepal

7 The vision: the role of language
• Language is at the heart of many disciplines in the Humanities and Social Sciences (HSS), e.g.
  • as an object of study
  • as a means of human communication
  • as a means of human expression
  • as a record of our history
  • as part of one’s cultural identity
  • as a carrier of knowledge and information
• CLARIN wants to support them all
• Language and speech technology are part of this (e.g. in the form of computational linguistics or speech science): essential, but just a part!

8 The vision: what CLARIN wants to offer
• CLARIN makes it possible for researchers to find resources (metadata search) and to refer to them in a persistent way (persistent identifiers)
• CLARIN allows for content search in and across collections
• CLARIN offers access to web services and workflows to perform complex linguistic and content operations and visualisations
• CLARIN covers both historical and contemporary language material in all modalities
• CLARIN serves both expert and non-expert users
• CLARIN offers access to depositing and long-term preservation services
• Ultimate goal: advancing HSS in order to gain a better understanding of our society at a European scale

9 Phasing of CLARIN
• Does CLARIN exist? Yes and no.
• 2008-2011: CLARIN Preparatory Phase Project, 26 countries, EC funded
  Goal: designing the infrastructure technically and organisationally, and lining up the players
• 2012-2015: Construction Phase, jointly funded by the participating countries, no EC funding
  Goal: building the European infrastructure
• 2015-…: Exploitation Phase, jointly funded by the participating countries, no EC funding
  Goal: making and keeping it running, populating it, and ensuring that it follows new trends in technology and research, covering all EU and associated countries

10 CLARIN ERIC
• CLARIN ERIC is the governance and coordination body, but will not run or fund operational data services
• An ERIC is a new type of intergovernmental legal entity, created by the EC: essentially a consortium of countries, with no fixed end point
• CLARIN ERIC member countries pay a modest annual fee
• Each country will set up a national CLARIN consortium, which will provide data and linguistic services and create data and tools
• It is up to the countries to decide how to shape and fund their CLARIN consortia and how to relate them to other activities at the national level (e.g. research programmes, digitisation programmes, etc.)
• CLARIN ERIC was established by the EC on 29 February 2012, with 9 founding members: AT, BG, CZ, DE, DK, EE, NL, PL and DLU (Dutch Language Union)
• More are in the pipeline, with Norway (NO) joining at this moment, but we need all European countries!

11 What is so nice about ERICs?
• They are legal entities, not projects, which helps to make them more sustainable
• Members are governments, committing themselves for longer periods of time (min. 5 years)
• CLARIN ERIC is a sign of recognition by governments and the EC of the importance of sharing language resources
• Closeness to funding agencies may help to enforce the use of standards and the sharing of data in the projects they fund
• Good starting point for international collaboration, as third countries can join or make collaboration agreements (e.g. through agencies or data centres)
• ERICs may submit proposals for EC funding
But: the bulk of the funding depends on the funding mechanisms and cycles in the participating countries, NOT on the EC

12 The CLARIN nightmare
• give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)
• give me all negative articles about Islam or about soccer in the Slovenski Narod daily newspaper (1868-1943)
• find European TV news interviews that involve speakers with a Hungarian accent
• summarize all articles in European newspapers of August 2012 about OCR, in Portuguese
• show me the pronoun systems of the languages of Nepal

13 The CLARIN nightmare, example 1
• give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)
  • “All” means from all countries and all archives, not just some archives in some (now 10) CLARIN ERIC member countries
  • If contemporary documents exist in digital form at all, they are probably images: how do we get access to the content? Is OCR doable?
  • Can we rely on standardized metadata to find them?
  • Are our topic detection technologies good enough?
  • Many of the documents may be in Latin: can we handle that, and what about other languages, e.g. Hungarian?
  • How would a non-technical scholar know how to formulate this query?

14 The CLARIN challenge
• Do HSS scholars realize at all that they should be interested in these things?
  • Some do, most don’t; we should make an effort to show them the potential benefits of adopting these new methods
  • Showcases and visualisation tools are indispensable
  • Distinguish between the lost generation and the future generation
• Are the tools offered by language and speech technology direct answers to the problems of HSS scholars as they see them?
  • Major technological efforts are needed, but technologists have a strong tendency to offer more and better gearboxes to people who are just waiting for a bus with comfortable seats (and a gearbox)
  • Technologies that work for modern versions of big languages may not work for older versions, or may not even exist for digitally less favoured languages
  • Using and adapting existing tools for specific HSS questions may always require intervention by technologically skilled people

15 What would it take to join?
Only countries can be ERIC members, not individual research institutions. Countries that join CLARIN ERIC would have to
• recognize the ERIC as a legal entity (done for EU countries)
• commit themselves for at least 5 years
• pay an annual membership fee (ranging from 12,000 to 200,000 euro depending on GDP; for HU ca. 12,000 euro)
• set up and fund a national CLARIN consortium (universities, data archives, etc.) to provide access to their data, and to create new data and tools according to their national research priorities
• identify (and fund) at least one existing data centre as the national hub that is linked to the rest of CLARIN
• commit themselves to sharing resources and adopting CLARIN standards in nationally funded projects

16 The benefits of joining
• Access to the CLARIN infrastructure, i.e. to all CLARIN language resources and technology services for scholars in the humanities and social sciences (HSS)
• Access to expertise from all over Europe via the CLARIN knowledge sharing infrastructure
• Embedding in the mainstream European HSS research community, with access to the same data
• Better visibility of their research results, their resources, their language and their cultural heritage in the European research community
• Open doors for cross-lingual and cross-cultural research
• Embedding in the European Research Area
• Opportunities to participate in EU projects initiated by CLARIN ERIC

17 What if Hungary does not join?
The bright side:
• No need to pay an annual 12,000 euro membership fee
• No need to agree on and comply with standards intended to facilitate the exchange of data
• No obligation to share and preserve digital results from publicly funded projects after their completion
• No need to set up a national consortium to coordinate infrastructure building and the creation of data and tools at the national level
• No need to collaborate with European partners to make tools and resources interoperable at the European level
• Researchers whose horizon lies within Hungary wouldn’t even notice!

18 What if Hungary does not join?
The less bright side for Hungarian researchers:
• They would have to make their own individual arrangements to get access to data and services outside Hungary
• Not having access to the same data and tools might create obstacles to cross-national collaboration
• Their data and tools might be less visible in the European research community, and their results might not be reproducible and therefore not recognized
• Hungary was one of the leading players in the CLARIN project and risks gradually falling behind
The less bright side for CLARIN:
• We would have to do without the excellent human and linguistic resources we know the Hungarian research community has to offer
• We would have no alternative way to cover the Hungarian language and to provide the HSS research community with access to its data collections

19 What makes CLARIN interesting in comparison with other research infrastructures (RIs)?
• No cash contribution other than the annual fee, which pays for governance and coordination; beyond that, no cross-border funding
• Fee fixed for 5 years with a 2% annual increase: no surprises
• Commitment to investing at the national level, but no major capital investment required and no fixed prescribed amounts
• The selection of data and tools to be created follows from each country's own research priorities and economic situation, and is not decided centrally
• HSS scholars have no digital tradition: a unique opportunity to innovate research
• HSS scholars tend to work in isolation: a unique opportunity to become part of the mainstream European research community
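To make the "2% annual increase, no surprises" point concrete, the short calculation below projects a five-year fee schedule. It is a minimal sketch under stated assumptions: the increase is assumed to compound annually, and the ca. 12,000 euro figure quoted for Hungary is assumed to be the first-year fee. The function name `fee_schedule` is hypothetical and not part of any CLARIN tool or of the CLARIN ERIC statutes.

```python
def fee_schedule(first_year_fee: float, annual_increase: float = 0.02, years: int = 5) -> list[float]:
    """Return the fee for each year, ASSUMING the increase compounds annually.

    Illustrative only: the compounding rule and the starting amount are
    assumptions, not the official CLARIN ERIC fee formula.
    """
    # Year 1 pays the base fee; each later year is 2% more than the previous one.
    return [round(first_year_fee * (1 + annual_increase) ** year, 2) for year in range(years)]


# Hypothetical example: a first-year fee of ca. 12,000 euro (the figure quoted for HU)
schedule = fee_schedule(12_000)
for year, fee in enumerate(schedule, start=1):
    print(f"Year {year}: {fee:,.2f} euro")
print(f"Total over 5 years: {sum(schedule):,.2f} euro")
```

Under these assumptions the fee in year 5 is just under 13,000 euro, and the five-year total is about 62,450 euro, i.e. 12,000 × (1.02⁵ − 1) / 0.02.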

20 Concluding remarks
• CLARIN has a lot to offer the Hungarian research community in terms of access to data, tools and expertise, and participation in CLARIN will move Hungarian research forward towards full participation in the Digital Age
• Hungary has a lot to offer CLARIN, as demonstrated by its successful participation in the CLARIN Preparatory Phase and in sister initiatives such as META / CESAR
• In times of crisis it is hard for funding bodies to set priorities among competing research infrastructure initiatives, but it should be kept in mind that
  • in financial terms CLARIN is a low-cost-entry research infrastructure with no financial risks
  • with its language, Hungary has a unique selling point in Europe!

