CLARIN Common Language Resources and Technology Infrastructure Daan Broeder & Dieter van Uytvanck Max-Planck Institute for Psycholinguistics TF-EMC2 Meeting,

Presentation transcript:

CLARIN Common Language Resources and Technology Infrastructure Daan Broeder & Dieter van Uytvanck Max-Planck Institute for Psycholinguistics TF-EMC2 Meeting, Dec

What is CLARIN
The CLARIN project is a large-scale pan-European collaborative effort to coordinate language resources and technology and make them available and readily usable for Language & SSH (Social Sciences & Humanities) researchers.
– Resources: lexica, text corpora, multimedia/multimodal recordings, …
– Technology: applications and (web) services such as parsers, tokenizers, speech recognizers and segmenters, …

The problem
– The existence and location of resources are known only to insiders
– Archives are mostly unconnected islands
– Every archive has its own standards for storage and access
– Resources normally have to be downloaded before they can be processed
– Social sciences and humanities researchers are not language or speech technologists
– They are often not aware of the potential benefits of using language and speech technology
– Available tools are hard to use for non-specialists

CLARIN overview
– CLARIN is an EU infrastructure project with 4.2 ME (million euro) funding for a 3-year preparatory phase
– Additional funding from national governments (at this moment at least 7 ME + 9 ME)
– The CLARIN consortium now has 32 partners from 26 EU countries
– The CLARIN community has 146 member organizations in 32 countries (mostly NLP organizations)
– CLARIN is based on earlier initiatives with many participants: LangWeb, EARL, TELRI, LIRICS and, more recently, DAM-LR

CLARIN Organization
WP1  Management & Coordination, Steven Krauwer, OTS U. Utrecht, NL
WP2  Technical Infrastructure, Peter Wittenburg, MPI Nijmegen, NL
WP3  Humanities Overview, Tamas Varadi, Hung. Ac. Sc., Hun
DLO  DARIAH Liaison, Martin Wynne, Oxford Univ., UK
WP5  Language Resources & Technology, Erhard Hinrichs, U. Tuebingen, Ger
WP6  Dissemination, Dan Cristea, U. Iasi, Rom
WP7  IPR and Business Models, Kimmo Koskenniemi, Univ. Helsinki, Fin
WP8  Organizational Agreements, Bente Maegaard, Univ. Copenhagen, Dk

Time plan
Preparatory Phase
– Limited set of federated centers (10+)
– Showcases, demonstrators
– WP8: investigate embedding in national funding schemes for the construction phase & maintenance
Construction Phase
– No substantial European funding
– Depends on national project commitments
Maintenance Phase (2020? - …)

CLARIN “Holy Grail” User Scenario
A researcher authenticates himself with his own organization and creates a “virtual” collection of resources from different repositories. He does this on the basis of browsing a catalogue, searching through metadata, or searching in resource content. He is then able to use a workflow specification tool and process this virtual collection with possibly a mix of home-grown and remote service components. Resulting data can be added to the original repositories with proper access rights, and the “virtual” collection specification can be stored for future reference.
For our domain this is very ambitious and challenging, but even a partial realization is worthwhile!
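The “virtual” collection in this scenario can be pictured as a stored list of references to resources that stay in their home repositories. The sketch below illustrates only that idea; the class names, the example identifiers and the JSON layout are hypothetical and are not a CLARIN format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class ResourceRef:
    """Reference to one resource that remains in its home repository."""
    repository: str   # hypothetical repository name
    identifier: str   # hypothetical persistent identifier


@dataclass
class VirtualCollection:
    """A named list of references; no resource data is copied."""
    name: str
    owner: str
    members: list = field(default_factory=list)

    def add(self, ref: ResourceRef) -> None:
        # Skip duplicates so the same resource is listed only once.
        if ref not in self.members:
            self.members.append(ref)

    def repositories(self) -> set:
        # Repositories that a workflow would have to contact.
        return {ref.repository for ref in self.members}

    def to_json(self) -> str:
        # Store the specification for future reference.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "VirtualCollection":
        data = json.loads(text)
        members = [ResourceRef(**m) for m in data["members"]]
        return cls(data["name"], data["owner"], members)
```

Keeping only references means access rights can still be enforced by each repository when a workflow later resolves the members.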

DAM-LR EU project ( )
Small EU project on archive integration with 4 partners from corpus/computational linguistics and endangered language documentation
– Resource discovery: sharing a single metadata set for searching & browsing
– Authentication & Authorization: single user identity, single sign-on by using Shibboleth
– Referencing and citing “archived resources” using a single persistent identifier system

AAI & Federation issues
Experiences from DAM-LR with respect to AAI:
– The standard eduPerson attribute set is probably sufficient (but CCs …)
– Shibboleth is nice when using web applications, but applications need access too!
– Shibboleth is efficient when dealing with groups, e.g. staff, student, …, but our domain also has to deal with individuals -> store user IDs in authorization records
– DAM-LR was a federation of both IdPs & SPs; CLARIN aims at a much larger potential user group whose home organizations do not want to run a CLARIN-specific IdP -> use the national IDFs
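The point about groups versus individuals can be sketched as an authorization record that lists both allowed affiliations and allowed user IDs. This is a minimal illustration, not the DAM-LR implementation: the record layout and function name are assumptions, and only the attribute names eduPersonAffiliation and eduPersonPrincipalName come from the eduPerson schema.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorizationRecord:
    """Access rule for one archived resource (hypothetical layout)."""
    resource: str
    allowed_affiliations: set = field(default_factory=set)  # groups, e.g. "staff"
    allowed_users: set = field(default_factory=set)          # individual user IDs


def is_authorized(record: AuthorizationRecord, attributes: dict) -> bool:
    """Decide access from the attributes released by the user's IdP.

    `attributes` maps eduPerson attribute names to lists of values.
    """
    # Individual grants: match on the user ID stored in the record.
    user_ids = attributes.get("eduPersonPrincipalName", [])
    if any(uid in record.allowed_users for uid in user_ids):
        return True
    # Group grants: match on affiliation such as staff or student.
    affiliations = attributes.get("eduPersonAffiliation", [])
    return any(a in record.allowed_affiliations for a in affiliations)
```

A record that names individual users only works if the IdP releases a stable identifier, which is why the availability of attributes matters for the federation agreements.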

CLARIN Federation Infrastructure I
– CLARIN wants to be an LR&T “service federation”
– Simplified and unified rules for licensing and access
– Agreements with national identity federations
– Must make sure all necessary attributes are available
– Cater also for AA of non-web applications and web services
– Interaction with GRID AAI
[Diagram labels: national Identity Federations, eJournal Service Providers, LRT Service Providers, Trust Agreement(s)]

Applications need Authentication too
User scenario: copying resources from different repositories to the local machine
– The application speaks only HTTP with basic authentication
– It does not understand the form-based authentication employed by the Shibboleth IdP
– The application is also not able to profit from the SSO across archives
Possible solution: use certificates for authentication
– Obtained via SLCS
– But can the authentication handshake be mimicked by software?
[Diagram labels: user, application (IMDI copier), IdP, archiveA and archiveB, each behind Shib. apache]
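The mismatch described here can be illustrated by how a simple client might classify the server's reply: a Basic challenge is something it can answer itself, whereas a redirect to an HTML login page is not. The function below is a hedged sketch; the function name, the return labels and the heuristic for spotting a login page are assumptions, not part of any Shibboleth API.

```python
def classify_auth_response(status: int, headers: dict) -> str:
    """Classify a server reply from the point of view of a simple client.

    Returns "ok", "basic-challenge", "form-login" or "other".
    """
    # HTTP header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    if status == 401 and h.get("www-authenticate", "").lower().startswith("basic"):
        # The client can answer this itself with an Authorization header.
        return "basic-challenge"
    if status in (301, 302, 303, 307) and "location" in h:
        # Typical of web single sign-on: the browser is sent to a login page.
        return "form-login"
    if status == 200 and h.get("content-type", "").lower().startswith("text/html"):
        # Crude heuristic: an HTML page where data was expected is
        # treated as a login form.
        return "form-login"
    if 200 <= status < 300:
        return "ok"
    return "other"
```

A client such as the copier tool can react to "basic-challenge" on its own, but "form-login" requires a browser-like interaction, which is the gap the certificate-based solution tries to close.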

Searching through annotations
– Parsers “normalize” the structural format
– Searching through the content of just one archive is no problem: there is just one SP that needs to check whether the user has access to the annotations
[Diagram labels: CHAT, EAF, Shoebox, MPI Archive, DB/SE, Search service, Auth DB, IdP]

Federative search scenario
Searching through annotations
– Parsers “normalize” the structural format
– The web portal app would like to act on behalf of the user and access the search services
[Diagram labels: CHAT, EAF, Shoebox, MPI Archive, Archive B, DB/SE, Search service (one per archive), Specialized web portal, Auth DB (one per archive), IdP]
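The federative scenario can be sketched as a portal that forwards the query to each archive's search service, with every archive checking access against its own authorization data. The classes and names below are hypothetical. The user ID is passed as a plain string, which deliberately glosses over the open question on this slide: how the portal proves that it acts on behalf of the user.

```python
from dataclasses import dataclass, field


@dataclass
class SearchService:
    """Search service of one archive with its own authorization data."""
    name: str
    annotations: dict = field(default_factory=dict)  # resource -> annotation text
    readers: dict = field(default_factory=dict)      # resource -> set of user IDs

    def search(self, user_id: str, term: str) -> list:
        # Each archive checks access itself before returning a hit.
        hits = []
        for resource, text in sorted(self.annotations.items()):
            allowed = user_id in self.readers.get(resource, set())
            if allowed and term.lower() in text.lower():
                hits.append((self.name, resource))
        return hits


def federated_search(services: list, user_id: str, term: str) -> list:
    """Portal side: pass the user's identity along and merge the hits."""
    results = []
    for service in services:
        results.extend(service.search(user_id, term))
    return results
```

In this sketch the access decision stays with each archive, so the portal never sees annotations the user may not read.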

Licenses & Codes of Conduct 1
– Each SP requires a signed CC and takes care of this, but only for its own domain
– This can break the SSO if the user is required to sign the same CC several times
– CLARIN will harmonize the CCs and licenses to a limited number
[Diagram labels: user, browser, IdP, SPa, SPb, CC DB]

Licenses & Codes of Conduct 2
– Store the CC DB info in the user attributes at the IdP
– But how does it get there? Special app?
– Not every IdP will/can run this
[Diagram labels: user, browser, IdP, SPa, SPb, CC DB]

Licenses & Codes of Conduct 3
– Create a special CC service
– This is part of the SPF, independent of the IDFs
[Diagram labels: user, browser, IdP, SPa, SPb, CC DB, CC service]
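The third option, a shared CC service, can be sketched as one central record of signatures that every SP consults instead of keeping its own CC DB. This is only an illustration under stated assumptions: the class names, the method names and the identifier "clarin-cc-1" are hypothetical.

```python
class CodeOfConductService:
    """Central record of which user accepted which code of conduct (CC)."""

    def __init__(self) -> None:
        self._signed = {}  # user ID -> set of CC identifiers

    def sign(self, user_id: str, cc_id: str) -> None:
        self._signed.setdefault(user_id, set()).add(cc_id)

    def has_signed(self, user_id: str, cc_id: str) -> bool:
        return cc_id in self._signed.get(user_id, set())


class ServiceProvider:
    """An SP that asks the shared CC service instead of keeping its own CC DB."""

    def __init__(self, name: str, required_cc: str,
                 cc_service: CodeOfConductService) -> None:
        self.name = name
        self.required_cc = required_cc
        self.cc_service = cc_service

    def access(self, user_id: str) -> str:
        # Grant access only when the shared service has a signature on file.
        if self.cc_service.has_signed(user_id, self.required_cc):
            return "granted"
        return "sign-required"
```

Because both SPs query the same service, a user who signs once at SPa is not asked again at SPb, which keeps the single sign-on experience intact.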

The End
Thank you for your attention
More info: