Darja Fišer DARIAH UZH Zurich, 18 December 2017


Presentation transcript:

Veni, vidi, CLARIN! Darja Fišer DARIAH Day @ UZH Zurich, 18 December 2017 CC-BY 4.0

Overview
- Intro to CLARIN
- CLARIN data architecture
- CLARIN for data science

Intro to CLARIN

CLARIN in seven bullets
- CLARIN is the Common Language Resources and Technology Infrastructure
- ESFRI ERIC status since 2012, Landmark since 2016
- that provides easy and sustainable access for scholars in the humanities and social sciences and beyond
- to digital language data (in written, spoken, video or multimodal form)
- and advanced tools to discover, explore, exploit, annotate, analyse or combine them, wherever they are located
- through a single sign-on environment
- and that serves as an ecosystem for knowledge sharing.

CLARIN ERIC in members and centres
A consortium of:
- 19 members: AT, BG, CZ, DE, DK, DLU, EE, FI, GR, HU, IT, LT, LV, NL, NO, PL, PT, SE, SI
- 2 observers: FR, UK
- >40 centres

What CLARIN Centres offer
Repository
- a library of linguistic data and tools
- search for data and tools and easily use them online or download them
- deposit your data and be sure that it is safely stored and that everyone can find it and cite it correctly
Federated single sign-on
- log in once with your existing institutional credentials
- get access to protected resources
Metadata
- describe the content, provenance and formats of linguistic data and tools
- facilitate preservation and dissemination of linguistic data and tools
Persistent Identifier (PID or handle)
- a special URL that provides a permanent link to linguistic data and tools
- will resolve correctly even if the data is moved in some distant future
- should be used as the URL in citations
Licensing
- Public
- Academic
- Restricted
Preservation (Data Seal of Approval)
- committed to long-term care of items in the repository
- ensure the archived data can be found, understood and used in the future
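The advice to cite the persistent identifier rather than a repository page address can be illustrated with a small sketch. It assumes the PID is a Handle System identifier that resolves through the public proxy at hdl.handle.net. The handle `12345/example-corpus`, the helper names and the citation layout are invented for illustration and are not taken from the slides or from any CLARIN centre.

```python
from urllib.parse import quote

# Public Handle System proxy; resolving the URL redirects to the
# current location of the resource, even after the data has moved.
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(handle: str) -> str:
    """Turn a handle such as 'hdl:12345/example-corpus' into an HTTPS URL."""
    h = handle.strip()
    # Accept the common ways a handle is written and strip them off.
    for lead in ("hdl:", "http://hdl.handle.net/", "https://hdl.handle.net/"):
        if h.lower().startswith(lead):
            h = h[len(lead):]
            break
    prefix, sep, suffix = h.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {handle!r}")
    return HANDLE_PROXY + quote(prefix, safe=".") + "/" + quote(suffix, safe="/")


def cite(author: str, year: int, title: str, handle: str) -> str:
    """Build a simple citation string that ends in the persistent URL."""
    return f"{author} ({year}). {title}. {handle_to_url(handle)}"


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    print(cite("Doe, J.", 2017, "Example corpus 1.0", "hdl:12345/example-corpus"))
```

Because the citation stores only the handle, it stays valid when the repository reorganises its pages; only the record behind the handle has to be updated.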

CLARIN data types and user communities
Data types
- Newspaper archives
- Literary texts
- Parliamentary records
- Historical letters
- Broadcast archives
- Oral History data
- Social Media data
- …
User communities
- Digital humanities
- Linguistics and Philology
- Translation and Lexicography
- Literary Studies
- History
- Political and Social Sciences
- Media Studies
- Culture, Folklore, Anthropology
- Speech therapy
- Teachers
- General Public

CLARIN data architecture

Repositories * slides by Dieter Van Uytvanck



Content search


CLARIN for data science

CLARIN and data science (1)
- Text and speech as social and cultural data
- Contribution to the development of new methodological frameworks for the integrated processing of multiple datatypes, and multidisciplinary research agendas
- Europe’s multilinguality as a basis for comparative research of societal and cultural phenomena that are reflected in language use:
  - Migration patterns
  - Intellectual history
  - Language variation across period and region
  - Dynamics in mental health conditions
  - Parliamentary discourse

Parliamentary records
- great potential for reuse and re-purposing within many fields of study in the humanities and social sciences (and beyond): suited for both close reading and distant reading
  - Humanities: history, language change, discourse analysis …
  - Social sciences: social and cultural dynamics, political sciences, economics ...
- considered a rich data type
  - apart from linguistic content, rich in metadata (speaker, party affiliation, age, sex, education, origin, duration of speech)
  - apart from linguistic content, rich in extralinguistic clues (interruptions, voting results)
- made easily available under the Freedom of Information acts in over 100 countries all around the world to enable informed participation by the public and improve effective functioning of democratic systems
- but also
  - often presenting itself as messy or noisy data
  - calling for links with data in other modalities than text and speech
  - created under specific circumstances that need to be well understood before strong conclusions can be drawn
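To make the "rich in metadata" point concrete, here is a minimal sketch of how one speech and its metadata might be represented and aggregated for distant reading. The `Speech` class, its field names and the sample records are assumptions made up for illustration; they do not describe the format of any actual CLARIN corpus.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Speech:
    """One parliamentary speech with a few of the metadata fields named above."""
    speaker: str
    party: str
    sex: str
    duration_s: int          # duration of the speech in seconds
    text: str
    interruptions: int = 0   # extralinguistic clue


def tokens_per_party(speeches: list[Speech]) -> Counter:
    """Count whitespace-separated tokens per party (a crude tokenisation)."""
    counts: Counter = Counter()
    for speech in speeches:
        counts[speech.party] += len(speech.text.split())
    return counts


# Invented sample data, for illustration only.
SAMPLE = [
    Speech("Speaker A", "Party X", "F", 95, "We must fund public libraries", 1),
    Speech("Speaker B", "Party Y", "M", 60, "The budget does not allow it"),
    Speech("Speaker C", "Party X", "M", 30, "Libraries matter"),
]

if __name__ == "__main__":
    for party, n_tokens in tokens_per_party(SAMPLE).most_common():
        print(party, n_tokens)
```

The same grouping could be done on any other metadata field (sex, age, origin), which is what makes this data type attractive for comparative studies; real corpora would need proper tokenisation and handling of the noise mentioned above.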

Corpora of parliamentary records
Coverage
- corpora exist for 18 countries
Size (in tokens)
- largest: UK (1.6 billion)
- smallest: Portuguese (1 million)
Periods covered by the corpora
- mostly the second half of the 20th century and the 21st century; Dutch and British corpora from the early 19th century
Availability
- For download (7): at, cz [CPM], dk, de [sample only], no [ToN], pt, lv
- For on-line searching (7):
  - Finnish (KORP)
  - CzechParl (SketchEngine)
  - Latvian (noSketchEngine)
  - Bulgarian (CLaRK)
  - Hungarian (HNC, registration required)
  - Proceedings of Norwegian Parliamentary Debates (Corpuscle)
- Both for download and on-line searching (5):
  - Dutch (Political Mashup)
  - Estonian (Keeleveeb)
  - Swedish (KORP)
  - Slovenian (noSketchEngine)
  - Polish (NKJP)
Full overview available here

CLARIN’s Parliamentary data for many disciplines
Perspective of curators and researchers:
- Historical perspective: the specifics of the diachronic perspective; time dynamics per topic, etc.
- Political science perspective: political activity of parties and politicians; the role of the various public political bodies; policy comparison; language differences as indicators of differing political views, etc.
- Sociological perspective: conflicts in parliament; attitudes of politicians to critical issues; trending topics; patterns of language use reflecting societal dynamics; models of parliamentary communication, control, commissions, etc.
- Psychological and language perspective: language portraits of politicians; semantic differences of political terms; gestures; behavior in parliament, etc.
Developers' perspective:
- Design of parliamentary speech corpora: annotations, visualization, etc.
- Text analytics, semantic processing and linking of parliamentary data
- Searches and information extraction from parliamentary corpora
- Multilinguality issues in parliamentary data

ParlaCLARIN @ LREC 2018
Background
- Need for better harmonization, interoperability and comparability of the resources and tools relevant for the study of parliamentary discussions and decisions, not only in Europe but worldwide
Aim
- Bring together researchers interested in compiling, annotating, structuring, linking and visualising parliamentary records that are suitable for research in a wide range of disciplines in the Humanities and Social Sciences
Paper submission deadline
- 10 January 2018
More info
- https://www.clarin.eu/ParlaCLARIN

Veni, vidi, CLARIN! darja.fiser@ff.uni-lj.si