Dirk Roorda, coordinator infrastructure the gap between Humanities and GRID.

Slides:



Advertisements
Similar presentations
GMD German National Research Center for Information Technology Darmstadt University of Technology Perspectives and Priorities for Digital Libraries Research.
Advertisements

Smart Qualitative Data: Methods and Community Tools for Data Mark-Up SQUAD Libby Bishop Online Qualitative Data Resources: Best Practice in Metadata Creation.
An Introduction to Repositories Thornton Staples Director of Community Strategy and Alliances Director of the Fedora Project.
1 e-Arts and Humanities Scoping an e-Science Agenda Sheila Anderson Arts and Humanities Data Service King’s College London.
Computational Paradigms in the Humanities – eHumanities and their role and impact in transdisciplinary research Gerhard Budin University of Vienna.
SDH-NEERI 2010 Panel on Integrating Infrastructures: Moderator: Peter Farago (ESFRI-SSH/FORS) CLARIN: Steven Krauwer Open Aire: Wolfram Horstmann TEL/Europeana:
The role of Parthenos for CLARIN ERIC Steven Krauwer CLARIN ERIC Executive Director 1.
Per Møldrup-Dalum State and University Library SCAPE Information Day State and University Library, Denmark, SCAPE Scalable Preservation Environments.
1 Web: Steve Brewer: Web: EGI Science Gateways Initiative.
Neil Witheridge APAN29 Sydney February 2010 ARCS Authorisation Services Neil Witheridge Manager, ARCS Authorisation Services APAN29, Sydney, February 2010.
The Grid System Design Liu Xiangrui Beijing Institute of Technology.
21/06/09C:\Users\ehttp://dariah.eu ah\Desktop\new_slides\dariah_slides_template_blue.odppage 1 Heiko Tjalsma, Andreas.
CLARIN work packages. Conference Place yyyy-mm-dd
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
Exploring ‘Workspaces’ Tom Visser, SARA compute and networking services, Amsterdam Garching Workshop 21 st September 2010.
Enabling e-Research in Combustion Research Community T.V Pham 1, P.M. Dew 1, L.M.S. Lau 1 and M.J. Pilling 2 1 School of Computing 2 School of Chemistry.
Facilitating Access and Reuse of Research Materials: the Case of The European Library Nuno Freire The European Library RESAW Seminar December 2013.
26/05/2005 Research Infrastructures - 'eInfrastructure: Grid initiatives‘ FP INFRASTRUCTURES-71 DIMMI Project a DI gital M ulti M edia I nfrastructure.
Mike Hildreth DASPOS Update Mike Hildreth representing the DASPOS project 1.
1 e-Arts and Humanities Scoping an e-Science Agenda Sheila Anderson Arts and Humanities Data Service Arts and Humanities e-Science Support Centre King’s.
INTRODUCTION TO GRID & CLOUD COMPUTING U. Jhashuva 1 Asst. Professor Dept. of CSE.
EGI-InSPIRE RI EGI Compute and Data Services for Open Access in H2020 Tiziana Ferrari Technical Director, EGI.eu
CLARIN ERIC Franciska de Jong Oxford April 2016
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
EGI-InSPIRE RI An Introduction to European Grid Infrastructure (EGI) March An Introduction to the European Grid Infrastructure.
Ian Bird, CERN WLCG Project Leader Amsterdam, 24 th January 2012.
Genie Pal A Versatile Intelligent Assistant To Help Both Work And Personal life.
Using iRODS with the EnginFrame Grid Portal into the GRIDA3 project Francesco Locunto Marco Piras Matteo Vocale.
E-Business Infrastructure PRESENTED BY IKA NOVITA DEWI, MCS.
Virtual Laboratory Amsterdam L.O. (Bob) Hertzberger Computer Architecture and Parallel Systems Group Department of Computer Science Universiteit van Amsterdam.
Accessing the VI-SEEM infrastructure
Why KM is Important KM enhances mission command, facilitates the exchange of knowledge, supports doctrine development, fosters leaders’ development, supports.
What is it ? …all via a single, proven Platform-as-a-Service.
GISELA & CHAIN Workshop Digital Cultural Heritage Network
Clouds , Grids and Clusters
Pasquale Pagano CNR, Italy
Making meaningful connections
The Open Grid Service Architecture (OGSA) Standard for Grid Computing
Defining and tracking requirements for New Communities
Bridges and Clouds Sergiu Sanielevici, PSC Director of User Support for Scientific Applications October 12, 2017 © 2017 Pittsburgh Supercomputing Center.
CLARIN Federated Identity Vision
Knowledge Management Systems
WP3 Preservation Roadmap
Grid Computing.
THE STEPS TO MANAGE THE GRID
Antonella Fresa Technical Coordinator
MOTIVATOR FOR MODERN EDUCATION Assoc. Prof. Diana Popova, Ph.D.
Physical Architecture Layer Design
University of Technology
C2CAMP (A Working Title)
European Network of e-Lexicography
Grade 6 Outdoor School Program Curriculum Map
“Without a collective memory, we are nothing and can achieve nothing
EOSC services architecture
Mapping - Linking - Planning - Documenting
WP 5 Shared Data Access & Enrichment
Smart Learning concepts to enhance SMART Universities in Africa
Common Solutions to Common Problems
Integrating social science data in Europe
Brian Matthews STFC EOSCpilot Brian Matthews STFC
GISELA & CHAIN Workshop Digital Cultural Heritage Network
Mapping - Linking - Planning - Documenting
ROLE OF «electronic virtual enhanced research-engaged student teams» WEB PORTAL IN SOLUTION OF PROBLEM OF COLLABORATION INTERNATIONAL TEAMS INSIDE ONE.
February
Virtual Competency Centre 1: e-Infrastructure General VCC meeting, 2/3 April 2012, Utrecht, The Netherlands Karlheinz Moerth (Co-head of VCC 1, Austria)
Brokering as a Core Element of EarthCube’s Cyberinfrastructure
DARIAH – Competence Centre in a nutshell
EOSC-hub Contribution to the EOSC WGs
OU BATTLECARD: Oracle Identity Management Training
OU BATTLECARD: Oracle WebCenter Training
Presentation transcript:

Dirk Roorda, coordinator infrastructure the gap between Humanities and GRID

Prologue GRID: The rationalist’s paradise

SSH and ICT an accelerating story in 3 gears so far...

first gear: online acces to sources... for people

second gear: collaborate in annotating and enriched publications

third gear: resources subject to intensive computing

ESFRI EEF report (April 2010) CESSDA, CLARIN, DARIAH ESS, SHARE Common requirements single sign on unified virtual organisations persistent storage (and identifiers) webservices and workflows

represents the linguistic community relatively strong in computing many exploitable language tools research benefits from high performance computing

initial expectations of GRID enabling technology for workflows sustainable tools sustainable data..., and all of this will be available on the internet using a service oriented architecture based on secure grid technologies

Turning point There is no off-the-shelf grid/federation technology; much relies on the availability of specialists who know about the details. Much adaptation and configuration work has to be done, which requires a deep understanding of the components.... it is the interaction and integration that requires lots of efforts.

Current point of view interested in data grid for webservices: consider cloud unclear what will turn out to be a stable technology for the CLARIN RI

humanities, arts and cultural heritage mission explore and apply ICT-based methods link digital source materials of many kinds exchange expertise across disciplines key concepts virtual infrastructure virtual competence center VCC1 e-Infrastructure ; VCC3 Scholarly Content

DARIAH on the GRID? no mention on high performance computing scarcely mentions grid programmatic access to data: yes emphasis on preservation of data But then there is TextGRID

First Periodic Report 2009 announces infrastructure experiment showing large computational needs for humanities indexing unstructured resources in an on-demand and flexible manner conducted on D4Science

Umbrella for social science archives since 1970 Upgrade in ESFRI context (FP7) developing: the data portal to allow seamless access to data holdings across Europe common authentication and access middleware tools investigating the potential of grid technologies improving data harmonisation tools

Use cases contain data collection applying harmonisation collect results and analyse them further in a collaborative environment under severe access restrictions develop new surveys comparable to existing ones

In February 2009 Yes GRID can help but many approaches of which maybe none enables all scenarios at the same time

May 2009 SWOT analysis GRID solves several partial problems current GRID technology not geared to the core of CESSDA’s problems do realistic case studies on GRID, Cloud, traditional client server

requirements functional enable data intensive scholarship better support for existing hypotheses new research questions enriched publications enable collaboration around sources incremental enrichment (annotations) connecting sources (linked data) connecting people (collaboratories)

requirements “non-functional” respect access rights sustainability of data and tools ease of use and programming performance

characteristics data from irrepeatable processes data in many small chunks complex interrelationships emphasis on human interpretation natural language processing symbolic processing linking to background knowledge

human interpretation only part of the dataprocessing can be automated higher level patterns require higher performance computing researchers need interaction with data and processes store intermediate results build virtual collections

(virtual) use cases a use case spells out a concrete task from the perspective of an end user the humanities present few use cases where HPC is essential with “significant” amounts of data which can be translated into batch jobs so: virtually no concrete use cases!

virtual use cases a virtual use case spells out a generic task from an infrastructural perspective still meaningful to the end user virtual use cases pave the way for real use cases bridge big gaps between end users of “foreign” infrastructure

a specific virtual use case smart indexes on unique resources, e.g. speech sounds in audio sources names in hand-written sources artefacts in images motifs in literary texts task create a comprehensive index of patterns in all accessible collections share that index for follow-up research

smart indexes: sources accessible collections <= sources are readily accessible programmatically with minimal latency with persistent, shareable addresses access rights are implemented identities can be passed on to programs sources can be gathered in virtual collections

smart indexes: processing a smart spider visits a virtual collection on behalf of a researcher with his credentials every addressable element is subjected to a smart pattern analyzer this lends itself to parallellism smart pattern detection <= high performance occurrences are added to the index in the form

smart indexes: results the resulting index is stored in the researcher’s workspace the researcher may publish his index for programmatic use addressed with a new persistent id new smart virtual collections are possible third parties write subsequent applications portals visualisations

conclusions research projects in the humanities: “small” projects, repeatedly accessing shared sources many people annotating the same data computation becoming more intensive – gradually many computing steps to be combined in workflows “small” data, but diverse, and deeply linked need for sustaining sources and results

conclusions what is the best supporting technology? if it is GRID then do not stick to concrete use cases they do not mobilise the SSH comunity start supporting virtual use cases because then you connect technology with a community and lots of concrete use cases will follow

Epilogue nothing is simple and rational except what we ourselves have invented...