TOWARDS AN ARCHITECTURE FOR NATIONAL DATA SERVICES Ian Foster Director, Computation Institute Argonne National Laboratory & The University of

Slides:



Advertisements
Similar presentations
Data Publishing Service Indiana University Stacy Kowalczyk April 9, 2010.
Advertisements

DuraSpace: Digital Information All Ways, Always Pretoria, South Africa May 14 th, 2009.
ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
Data Grid: Storage Resource Broker Mike Smorul. SRB Overview Developed at San Diego Supercomputing Center. Provides the abstraction mechanisms needed.
IPlant Data Commons. iPlant Data Commons leverages all elements of our CI to enhance data management, discoverability, and reuse.
Collaboration on Large Datasets using Globus Rachana Ananthakrishnan University of Chicago.
Hydra Partners Meeting March 2012 Bill Branan DuraCloud Technical Lead.
Robust Tools for Archiving and Preserving Digital Data Joseph JaJa, Mike Smorul, and Mike McGann Institute for Advanced Computer Studies Department of.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
International Workshop APAN 24, Current State of Grid Computing Researches and Applications in Vietnam Nguyen Thanh Thuy 1, Nguyen Kim Khanh 1,
Cyberinfrastructure for Rapid Prototyping Capability Tomasz Haupt, Anand Kalyanasundaram, Igor Zhuk, Vamsi Goli Mississippi State University GeoResouces.
Mike Smorul Saurabh Channan Digital Preservation and Archiving at the Institute for Advanced Computer Studies University of Maryland, College Park.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation Mike Smorul, Joseph JaJa, Yang Wang, and Fritz McCall.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
Data-PASS Shared Catalog Micah Altman & Jonathan Crabtree 1 Micah Altman Harvard University Archival Director, Henry A. Murray Research Archive Associate.
Building Data-intensive Pipelines Ravi K Madduri Argonne National Lab University of Chicago.
Grid Information Systems. Two grid information problems Two problems  Monitoring  Discovery We can use similar techniques for both.
DoSon Grid Computing School, Current State of Grid Computing Researches and Applications in Vietnam Vu Duc Thi 1, Nguyen Thanh Thuy 2, Tran Van.
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2
ARGONNE  CHICAGO Ian Foster Discussion Points l Maintaining the right balance between research and development l Maintaining focus vs. accepting broader.
Making Connections: SHARE and the Open Science Framework Jeffrey Open Repositories 2015.
The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Dataset Caitlin Minteer & Kelly Clynes.
Introduction to Apache OODT Yang Li Mar 9, What is OODT Object Oriented Data Technology Science data management Archiving Systems that span scientific.
Virtual Data Grid Architecture Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny.
Use & Access 26 March Use “Proof of Concept” Model for General Libraries & IS faculty Model for General Libraries & IS faculty Test bed for DSpace.
Centre for eResearch The University of Auckland Research Data–Preserve, Share, Reuse, Publish, or Perish Mark Gahegan Director, Centre for eResearch 24.
Web: Minimal Metadata for Data Services Through DIALOGUE Neil Chue Hong AHM2007.
University of Illinois at Urbana-Champaign BeeSpace Navigator v4.0 and Gene Summarizer beespace.uiuc.edu `
SEAD Virtual Archive :: A Thin Layer for Scientific Discovery and Long-Term Preservation Inna Kouper April #dlbbspring2013.
Ph No: Mob: , plot No-27, NGGO's Colony, Pattabhi reddy gardens, Visakhapatnam-07 Oracle.
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop Discovery Environment Overview.
DataONE: Preserving Data and Enabling Data-Intensive Biological and Environmental Research Bob Cook Environmental Sciences Division Oak Ridge National.
Information Infrastructure Evolution ARIIC is working towards – a distributed electronic research environment that allows researchers to share, annotate,
Cooperative experiments in VL-e: from scientific workflows to knowledge sharing Z.Zhao (1) V. Guevara( 1) A. Wibisono(1) A. Belloum(1) M. Bubak(1,2) B.
US LHC OSG Technology Roadmap May 4-5th, 2005 Welcome. Thank you to Deirdre for the arrangements.
Replica Management Kelly Clynes. Agenda Grid Computing Globus Toolkit What is Replica Management Replica Management in Globus Replica Management Catalog.
Millman—Nov 04—1 An Update on Digital Libraries David Millman Director of Research & Development Academic Information Systems Columbia University
Globus online Software-as-a-Service for Research Data Management Steve Tuecke Deputy Director, Computation Institute University of Chicago & Argonne National.
April 14, 2005MIT Libraries Visiting Committee Libraries Strategic Plan Theme III Work to shape the future MacKenzie Smith Associate Director for Technology.
CombeDay Making Data Openly Available Simon Coles.
A Basic Introduction By Scott Phillips 2005/8/7. Agenda What is DSpace and what does it do? The DSpace Information Model Components & Features of DSpace.
Research data management using Globus ESIP Summer Meeting 2015 Rachana Ananthakrishnan University of Chicago
Globus and ESGF Rachana Ananthakrishnan University of Chicago
Globus Publish Lighting Talk Ben Blaiszik, Kyle Chard
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
Partnerships in Innovation: Serving a Networked Nation Grid Technologies: Foundations for Preservation Environments Portals for managing user interactions.
Globus.org/genomics Globus Galaxies Science Gateways as a Service Ravi K Madduri, University of Chicago and Argonne National Laboratory
Open Science Framework Jeffrey Spies University of Virginia.
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2
© Geodise Project, University of Southampton, Workflow Support for Advanced Grid-Enabled Computing Fenglian Xu *, M.
Brian Nosek University of Virginia -- Center for Open Science -- Improving Openness.
TeraGrid Capability Discovery John-Paul “JP” Navarro TeraGrid Area Co-Director for Software Integration University of Chicago/Argonne National Laboratory.
Enabling Digital Earth by focussing on ‘accessibility’ rather than ‘delivery’. Ryan Fraser CSIRO.
Ian Foster Ben Blaiszik Kyle Chard, Rachana Ananthakrishnan, Steven Tuecke, UChicago Michael Ondrejcek,
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI EGI Services for Distributed e-Infrastructure Access Tiziana Ferrari on behalf.
International Planetary Data Alliance Registry Project Update September 16, 2011.
IPDA Registry Definitions Project Dan Crichton Pedro Osuna Alain Sarkissian.
Developing our Metadata: Technical Considerations & Approach Ray Plante NIST 4/14/16 NMI Registry Workshop BIPM, Paris 1 …don’t worry ;-) or How we concentrate.
Enhancements to Galaxy for delivering on NIH Commons
Tools and Services Workshop
Joslynn Lee – Data Science Educator
An Overview of Data-PASS Shared Catalog
Software infrastructure for a National Research Platform
Joseph JaJa, Mike Smorul, and Sangchul Song
Research Data Alliance (RDA) 9th WG/IG Collaboration Meeting: Repository Platforms for Research Data (RPRD) Interest Group 13nd June 2018 Co-Chairs:
Repository Platforms for Research Data Interest Group: Requirements, Gaps, Capabilities, and Progress Robert R. Downs1, 1 NASA.
Social media for global scientific community – Mendeley project
Dataverse for citing and sharing research data
Presentation transcript:

TOWARDS AN ARCHITECTURE FOR NATIONAL DATA SERVICES Ian Foster Director, Computation Institute Argonne National Laboratory & The University of ianfoster.org

Architecture?

Principle 1: Reduce data friction Make simple things easy Make hard things possible For example, cloud-hosted software-as-a-service For example, publish- then-filter

Principle 2: Small pieces, loosely joined Storage systems Content management Analysis systems Registries Identity management Data movers … and many more … REST interfaces Open Simple Composable Extensible Versioned

Principle 3: Insist on stories For example: – “I need to store/backup/archive my data” – “I need to transfer/mirror my data” – “I need to share my data” – “I need to publish my data” – “I need to discover published data” – “I need to analyze my data” Good stories are detailed, urgent, popular

We have much to build on and/or integrate with Agave Brown Dog DataCite DATAone Dataverse Earth System Grid Globus Globus Connect InCommon iPlant ORCID SEAD XSEDE Zenodo Many more Many many more!

Globus cloud-hosted software-as-a-service for: Data transfer, sync, and sharing Identity and group management Data publication and discovery Globus Connect software to integrate resources and institutions Globus demonstration

What does it mean to publish? Data is: Identified Described Curated Verifiable Accessible Preserved

I can: Search Browse Access the data What does it mean to discover?

Data publication and discovery Metadata Access Control License Storage Curation Workflow Curation Workflow Policies Collection Metadata Data Metadata Data Metadata Data Dataset Community

Takeaway messages Three principles: – Reduce data friction – Small pieces, loosely joined – Insist on stories We have strong components to start with We’d like from you: – Stories to inform and prioritize – Volunteers to deploy and explore

The following are backup slides in case network failure prevents the live demonstration

Publish dashboard 14

Start a new submission 15

16 Describe submission: 1) Dublin Core

17 Describe submission: 2) Science metadata

Assemble the dataset 18

19 Transfer files to submission endpoint

20 Check dataset is assembled correctly

Submission now in curation workflow 21

Search published datasets 22

Search across collections

Discover a published dataset 24

Select a published dataset 25

View downloaded dataset 26