DASISH Digital Services Infrastructure for Social Sciences and Humanities Daan Broeder TLA - MPI for Psycholinguistics / DASISH & CLARIN EGI Forum Garching,

Presentation transcript:

DASISH Digital Services Infrastructure for Social Sciences and Humanities Daan Broeder TLA - MPI for Psycholinguistics / DASISH & CLARIN EGI Forum Garching, March

DASISH Origin
FP7 Capacities Work Programme: Infrastructures
INFRA : Implementation of common solutions for a cluster of ESFRI infrastructures in the field of "Social Sciences and Humanities".
A project under this topic should implement harmonised solutions for the ESFRI infrastructures in the field of Social Sciences and Humanities on issues like, for example, metadata frameworks, registries, single-sign-on systems and permanent identifiers.

DASISH consortium I
- 18 partners from 10 countries (EU + Norway)
- 5 ESFRI infrastructures: CESSDA, CLARIN, DARIAH, ESS and SHARE
- DASISH budget: 6 M€ -> 700 PMs
- Started January 2012
- Duration: 36 months
- Builds on already existing collaborations
  - Partners are part of multiple research infrastructures
  - ESFRI projects merging (proposals), e.g. CLARIN + DARIAH -> CLARIAH

Council of European Social Science Data Archives
An umbrella organisation for social science data archives across Europe. Since the 1970s the members have worked together to improve access to data. CESSDA research and development projects and Expert Seminars enhance exchange of data and technologies among data organisations. 20 CESSDA member organisations serve some 30,000+ social science and humanities researchers and students each year.
- Documents, recordings, statistical data & surveys: demographics, health, economy, education, politics, …
- Developers of the DDI metadata standard
- Multiple data centers

Common Language Resources and Technology Infrastructure
CLARIN, with 193 member institutions:
- is a large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily usable
- is committed to establishing an integrated and interoperable research infrastructure of language resources and technology
- aims at lifting the current fragmentation, offering a stable, persistent, accessible and extendable infrastructure and thereby enabling eHumanities
- Language Resources: text & multi-media corpora and lexica, …
- Language Technology: parsers, tokenizers, speech recognizers, …
- Multiple CLARIN Centers (±30)
- ERIC status since Feb 2012

Digital Research Infrastructure for the Arts and Humanities
The mission of DARIAH is to enhance and support digitally-enabled research across the humanities and arts. DARIAH aims to develop and maintain an infrastructure in support of ICT-based research practices. It has 14 partners and 5 associate partners.
DARIAH is working with communities of practice to:
- Explore and apply ICT-based methods and tools to enable new research questions to be asked and old questions to be posed in new ways
- Improve research opportunities and outcomes through linking distributed digital source materials of many kinds
- Exchange knowledge, expertise, methodologies and practices across domains and disciplines
Characteristics:
- Wide variety of data types for all the SSH disciplines
- Services: statistics, visualization (maps), NLP, …
- Virtual Competence Centres

The European Social Survey
An academically-driven social survey designed to chart and explain the interaction between Europe's changing institutions and the attitudes, beliefs and behaviour patterns of its diverse populations.
- Survey oriented
- Single data centre, multiple competence centres

The Survey of Health, Ageing and Retirement in Europe
A multidisciplinary and cross-national panel database of micro data on health, socio-economic status and social and family networks of more than 45,000 individuals aged 50 or over. The survey’s third wave of data collection, SHARELIFE, collects detailed retrospective life-histories in thirteen countries in
- Survey oriented
- Single data center
- ERIC status since March 2011

Consortium II
Partners: SND, DANS, UEssex, FSD, NSD, GESIS, CITY, KCL, UGOE, OEAW, MPIPL, UCPH, UIB, UPF, NUIM, MPISOC, UNIVE, CentERdata, UT
Infrastructures: CESSDA, DARIAH, CLARIN, ESS, SHARE

DASISH Work packages
(WP number, name, lead partner, person-months)
- WP1 Management (UGOT, 44 PM)
- WP2 Architecture and Quality assessment (DANS, 83 PM): liaise with other e-infra initiatives; get requirements for a reference architecture; assess results
- WP3 Data Quality (NSD, 200 PM): improve EU-wide survey quality: terminology, translation, vocabulary normalizations
- WP4 Archiving (NSD, 67 PM): state of preservation in SSH; assessment of deposit services; recommendations, negotiations; deposit service convergence
- WP5 Data Access and Enrichment (MPI-PL, 171 PM): federated identity; PID requirements; metadata quality improvement; joint metadata domain; workflow use cases; annotation framework
- WP6 Legal and Ethical Issues (MPI-SOC, 68 PM): identify legal and ethical issues wrt. current and new SSH data types resulting from the integration, linking and archiving; legal & ethical VCC
- WP7 Education and Training (UGOE, 56 PM): training modules, workshop program
- WP8 Dissemination (UCPH, 34 PM): communication strategy and means

DASISH Mission
DASISH provides and/or brokers solutions for a number of common issues of the five ESFRI projects in social sciences and humanities.
DASISH identifies four major areas:
- data quality (surveys): ESS & SHARE
- data archiving
- data access
- legal and ethical issues
General procedure: Inventory -> Analysis -> Brokering -> Implementation -> Education -> Outreach

DASISH Challenges
- Need to create a common infrastructure, not just strengthen community-specific ones
- Traditions vary considerably
  - Between the social sciences on one side and the humanities on the other, but also within the humanities
  - Some collaborations/communities have a rich history
  - Others are fairly new (as an infrastructure)
  - Organizational models and complexity vary and affect preferences for solutions
  - Differences wrt. understanding IT issues
- Language and terminology vary even more, as past discussions have taught us

Data Quality
- Highly domain specific
- Can be shared by communities such as CESSDA, SHARE and ESS, but probably not outside DASISH
- DASISH WP3 plans
  - Questionnaire Design Documentation Databank
  - Translation Tool and Databank
  - Question Databank
  - (Survey) Fieldwork monitoring system

Data Archiving
- Is a generic service possible?
- Archiving policies differ, and probably necessarily so
  - Allowed archivable formats, retention time?, required reliability, security assurances, …
- Any common system should accommodate multiple policies
- Existing solutions: national data centers
  - But are they available to all?
  - Are they sufficiently flexible?
- The EUDAT data management infrastructure is in progress and can be (part of) a generic solution

Access - Data Discovery
- Metadata quality improvement
  - Controlled set of schemas -> schema registry
  - Controlled vocabularies -> vocabulary services
  - Explicit semantics for schemas -> semantic/concept registries
- Single metadata catalogue
  - Metadata interoperability -> semantic/concept registry
  - Well-defined metadata harvesting infrastructure -> OAI-PMH + metadata provider registry
  - Granularity is an issue!
Solutions
- CESSDA and CLARIN already have catalogs; others also exist
- CLARIN claims a general framework for MD interoperability
- EUDAT is working on a shared catalogue
- Technology is available, e.g. the US DataONE project using Mercury
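The harvesting step named on this slide can be sketched in a few lines. The following is a minimal illustration, not part of the original presentation: it builds an OAI-PMH ListRecords request and extracts record identifiers and the resumption token from a response. The base URL, the sample response and the function names are assumptions made for the example; only the protocol arguments (verb, metadataPrefix, resumptionToken) and the XML namespace come from the OAI-PMH specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a (hypothetical) metadata provider."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Invented sample response, trimmed to the elements the parser looks at.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:rec1</identifier></header></record>
    <record><header><identifier>oai:example.org:rec2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester for a joint catalogue would loop over the providers listed in the metadata provider registry and keep requesting pages until no resumption token is returned.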

Access - AAI
- Goal: single sign-on and single user identity
- Many users -> maintaining a separate user store is prohibitive
- Limited complexity for the user, e.g. no certificate handling by users
Solutions
- Federated Identity Management offered by the national IDFs and use of SAML2 should be sufficient, provided that
  - EU GEANT/eduGAIN inter-federation works
  - A proper set of user attributes is released
- There are legal and practical issues concerning the attribute release policies of the user’s institute
- Current attribute release policies don’t scale well
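The attribute release problem can be made concrete with a small sketch. It assumes that a service receives the user attributes of a SAML2 assertion already parsed into a dictionary, and checks them against the set it needs. The required set shown here is an invented example policy, not a DASISH or CLARIN requirement; the attribute names themselves are taken from the eduPerson schema.

```python
# Hypothetical policy: attributes this service needs from the identity provider.
REQUIRED_ATTRIBUTES = {"eduPersonPrincipalName", "mail"}


def missing_attributes(released, required=REQUIRED_ATTRIBUTES):
    """Return the sorted names of required attributes that were not released.

    `released` maps attribute names to values, as parsed from an assertion.
    An attribute released with an empty value counts as missing.
    """
    return sorted(name for name in required if not released.get(name))


def can_authorize(released, required=REQUIRED_ATTRIBUTES):
    """True if the identity provider released everything the service needs."""
    return not missing_attributes(released, required)


if __name__ == "__main__":
    # Invented example: the home institute releases only an affiliation.
    released = {"eduPersonScopedAffiliation": "member@example.org"}
    print(missing_attributes(released))
    print(can_authorize(released))
```

A check like this only detects the problem. Whether the missing attributes are released at all is decided by the policy of the user's home institute, which is the scaling issue the slide points at.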

Access - PIDs
- Persistent Identifiers for data
- Could be generic, but tradition splits SSH institutes between URNs and Handle/DOI
- Support for identifying parts of objects
- Added functionality in cooperation with other e-infra projects such as EUDAT
- Important extra functionality can be associated with the PID framework, e.g. a checksum for verifiability
Solutions
- EPIC, DataCite (hdl)
- Persid (URN)
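As an illustration of the checksum idea, the sketch below stores a SHA-256 digest alongside a PID and uses it to verify a retrieved copy of the data. The record layout, the field names and the example identifier are assumptions for this sketch; they do not reflect the actual record format of EPIC, DataCite or any URN resolver.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PidRecord:
    """Hypothetical PID record: identifier, current location, content checksum."""
    pid: str        # placeholder identifier, not a registered Handle or URN
    location: str   # URL where the object can currently be retrieved
    checksum: str   # hex-encoded SHA-256 digest of the object's bytes


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify(record: PidRecord, data: bytes) -> bool:
    """Check that retrieved bytes match the checksum stored with the PID."""
    return sha256_hex(data) == record.checksum


if __name__ == "__main__":
    original = b"example recording bytes"
    record = PidRecord(
        pid="hdl:00000/example-1",
        location="https://example.org/objects/1",
        checksum=sha256_hex(original),
    )
    print(verify(record, original))               # unchanged copy
    print(verify(record, original + b" edited"))  # altered copy
```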

Access - linking data
- Annotation framework
- Create relations between (parts of) on-line data objects
- Need special data-type-specific visualization when linking to parts of data objects
- Need a registry to store and access these relations
Solutions
- Some exist, but they only allow linking to complete objects via URLs, or else markings on a browser screen dump, e.g. “Evernote”
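One way to picture the registry of relations is the following minimal data model. It is a sketch under stated assumptions, not the DASISH annotation framework: every class and field name is hypothetical, objects are referenced by placeholder PIDs, and a part of an object is addressed by an optional fragment string (the example uses the temporal form "t=10,20" known from the W3C Media Fragments syntax).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Target:
    """A data object, or a part of it when a fragment is given."""
    pid: str
    fragment: Optional[str] = None  # e.g. "t=10,20" for seconds 10 to 20


@dataclass(frozen=True)
class Relation:
    """A labelled link between two (parts of) data objects."""
    source: Target
    target: Target
    label: str


class RelationRegistry:
    """In-memory stand-in for a registry that stores and serves relations."""

    def __init__(self) -> None:
        self._relations: List[Relation] = []

    def add(self, relation: Relation) -> None:
        self._relations.append(relation)

    def about(self, pid: str) -> List[Relation]:
        """All relations in which the object with this PID takes part."""
        return [r for r in self._relations
                if pid in (r.source.pid, r.target.pid)]


if __name__ == "__main__":
    registry = RelationRegistry()
    registry.add(Relation(
        source=Target("hdl:00000/transcript-7", "char=120,180"),
        target=Target("hdl:00000/recording-7", "t=10,20"),
        label="transcribes",
    ))
    print(registry.about("hdl:00000/recording-7"))
```

Keeping the fragment separate from the PID means the same registry can serve links to whole objects and to parts, while the data-type-specific interpretation of the fragment is left to the viewer.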

Computing
- Only a few applications in SSH need intensive computing
- This will change in the future as more automatic feature extraction is done on media recordings
- Some SSH ESFRI projects are developing SOA with complex workflows to process data
- Facilities for cheap, flexible deployment of services are asked for
  - Organizational & management problem
  - Not a computational resources problem

DASISH Context
[Diagram: layered services and the communities using them]
- Network services: GEANT
- Federated Identity Management
- Data preservation: EUDAT replication & preservation
- PID services: EPIC / Persid
- SSH communities wide: DASISH common SSH metadata catalog
- Community specific: CLARIN LT web service infrastructure
- Communities shown: CLARIN, DARIAH, CESSDA, LifeWatch, DASISH

Thank you for your attention

Annotation Framework
[Diagram: item1 … itemZ]

Possible use cases I
These have not yet been fleshed out. Some ideas from the CLARIN side:
- Social scientists have recordings that are of interest to linguists
  - Locate these using appropriate metadata and process them with LT tools to analyze gesture
  - The analysis results should be (after evaluation) deposited into an archive with proper references to the primary data
  - The analysis data should again be registered with proper metadata for reuse
- Use of demographic data for corpus building and use
  - Give a linguist building a balanced speech corpus access to demographic data
  - How many of the speakers need to be older than 65 for the corpus to be a representative sample?
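The question about speakers older than 65 comes down to proportional allocation: each age group gets a share of the corpus equal to its share of the population. The sketch below uses the largest-remainder method to turn shares into whole speaker counts. The age groups and weights are invented for illustration and are not real demographic data.

```python
def allocate(total, weights):
    """Split `total` speakers over groups in proportion to integer weights.

    Uses the largest-remainder method so the counts add up to `total`.
    """
    weight_sum = sum(weights.values())
    # Whole-number part of each group's proportional share.
    quotas = {g: total * w // weight_sum for g, w in weights.items()}
    # Remainders decide which groups receive the leftover seats.
    remainders = {g: total * w % weight_sum for g, w in weights.items()}
    leftover = total - sum(quotas.values())
    by_remainder = sorted(weights, key=lambda g: remainders[g], reverse=True)
    for group in by_remainder[:leftover]:
        quotas[group] += 1
    return quotas


if __name__ == "__main__":
    # Invented population shares in percent; not real statistics.
    population = {"15-39": 35, "40-64": 45, "65+": 20}
    print(allocate(50, population))
```

With these made-up shares, a 50-speaker corpus would need 10 speakers aged 65 or over. A real corpus design would take the shares from the demographic data the use case asks for.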

Possible use cases II
- Combine maps of linguistic dialects or linguistic micro-variation with migration statistics
  - Looking at variation both in place and time should be interesting
- Make metadata on medieval texts & literature available on the web and interlink it with manuscripts and transcripts available from cultural heritage institutions
  - Have the possibility to add enhancements and comments from the research community
- Give historians of science and ideas access to language technology to analyze historical texts
  - Allows following the appearance and spread of new concepts and inventions
  - The Dutch CCKC project used this to analyze the correspondence between scientists in the 17th century