Proposed DataONE TeraGrid Joint Initiative John Cobb, TeraGrid, and DataONE Presentation to TeraGrid Quarterly Management Meeting August 31, 2010 Seattle, WA

Presentation transcript:

Proposed DataONE TeraGrid Joint Initiative John Cobb, TeraGrid, and DataONE Presentation to TeraGrid Quarterly Management Meeting August 31, 2010 Seattle, WA

DataONE objectives
Develop a distributed cyberinfrastructure architecture to enable the long-term preservation of digital data: support the data life cycle
Engage the scientific community to move forward concepts of
  – Digital data archives of scholarly data
  – Best practices for digital data preservation
  – Journal publishers' efforts toward digital data repositories (e.g. Dryad)
Enable new science via data synthesis
Develop a long-term sustainability strategy – decades long
  – Architecture
  – Technology future-proofing
  – Arrangements/MOUs
Focus on ecological, biological, environmental science areas.

What shapes DataONE?
Challenges associated with climate variability
Community needs good data
Good data
  – builds good science
  – makes possible wise management
  – enables sound decisions
Good data needs
  – good technical infrastructure
  – sound organization
  – community engagement (you)

Architecture to support the data lifecycle
[Diagram labels: UCSB Node, UNM Node, ORC Node]
The data lifecycle:
  1. Deposition/acquisition/ingest
  2. Curation and metadata management
  3. Protection, including privacy
  4. Discovery, access, use, and dissemination
  5. Interoperability, standards, and integration
  6. Evaluation, analysis, and visualization

DataONE – Building new global CI
Additional prospective member nodes under discussion

The Character of a member node
Source of data
Participant in a larger collective
(Usually) provides new and interesting data sets (watersheds, satellite remote observations, citizen science data collections, environmental observations, geographical diversity, specific diversity, discipline diversity)
Supports DataONE Member Node (MN) software stack
May contribute storage to support replicas of other member nodes
May differ in scale
  – My data
  – University library digital services arm
  – Associated data repositories for journals
  – DOI infrastructure
  – Project-specific data collections
  – Agency-specific programs for data management
  – National-scale cyberinfrastructure providers (e.g. TG)

The Metadata challenge
“the flood of increasingly heterogeneous data”
Data are heterogeneous
  – Syntax (format)
  – Schema (model)
  – Semantics (meaning)
Jones et al
DataONE Focus: Synthesize data sets with disparate metadata to provide new scientific insights
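The three kinds of heterogeneity named on this slide can be pictured with a small sketch. It is only an illustration under stated assumptions: the source names, field names, and the Fahrenheit/Celsius example are hypothetical and are not taken from DataONE. Two records that describe the same observation differ in syntax (string versus number), schema (field names), and semantics (units), and a per-source mapping brings them to one common form.

```python
# Hypothetical example: harmonize two records into one common schema.

def fahrenheit_to_celsius(value):
    # Semantic difference: the same quantity expressed in another unit.
    return (float(value) - 32.0) * 5.0 / 9.0


# Per-source mapping: source field -> (common field, converter).
# All names here are made up for illustration.
MAPPINGS = {
    "source_a": {
        "site": ("site_id", str),
        "temp_c": ("temperature_c", float),
    },
    "source_b": {
        "station": ("site_id", str),
        "tempF": ("temperature_c", fahrenheit_to_celsius),
    },
}


def harmonize(source, record):
    """Rename fields and convert values according to the source's mapping."""
    mapping = MAPPINGS[source]
    return {
        common: convert(record[field])
        for field, (common, convert) in mapping.items()
    }


record_a = {"site": "17", "temp_c": "20.0"}   # strings, Celsius
record_b = {"station": 17, "tempF": 68}       # numbers, Fahrenheit

harmonized_a = harmonize("source_a", record_a)
harmonized_b = harmonize("source_b", record_b)
```

After harmonization both records share the same field names, value types, and units, so they can be compared or combined directly.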

DataONE Member Node Operations
Minimal set of operations to enable a distributed archive
  – Minimal to enable wide deployment in heterogeneous environment
  – Does not include some operations that are Coordinating node only
That set = {C,R,U,D}
  – Create
  – Replicate
  – Update
  – Delete
Implementation
  – Pilot now (operational and operational)
  – Eval. of Pilot started
  – V.1 deploy planned next yr.
Deployed platforms
  – Python
  – R
  – Mercury
  – …
Note the meaning of “platform”
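As a rough picture of the {C,R,U,D} set listed on this slide, the sketch below models a member node as an in-memory store with the four operations. It is an assumption-laden toy, not the DataONE MN software stack: the class name, method signatures, and identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Hypothetical in-memory stand-in for a member node."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> content bytes

    def create(self, pid, data):
        # Create: store a new object; refuse to overwrite an existing one.
        if pid in self.objects:
            raise KeyError(f"{pid} already exists on {self.name}")
        self.objects[pid] = data

    def replicate(self, pid, target):
        # Replicate: copy the current content of pid onto another node.
        target.objects[pid] = self.objects[pid]

    def update(self, pid, data):
        # Update: replace the content of an object this node already holds.
        if pid not in self.objects:
            raise KeyError(f"{pid} not found on {self.name}")
        self.objects[pid] = data

    def delete(self, pid):
        # Delete: remove the object from this node only.
        del self.objects[pid]


mn_a = MemberNode("mn-a")
mn_b = MemberNode("mn-b")
mn_a.create("example-pid-1", b"version 1")
mn_a.replicate("example-pid-1", mn_b)
mn_a.update("example-pid-1", b"version 2")
```

In this toy, a replica keeps the content it was given at replication time; how updates propagate to existing replicas is a design decision the slide does not specify.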

Coordinating Nodes
Contains full metadata catalog of member node data collections
Directs certain operations
  – Replication direction
  – Location tracking
  – Ingestion
      – Assisted by deployed platforms. Ex. Mercury leads to automatic ingest capability for NASA DAAC (MODIS data)
CN locations also have MN instances. Provides some “free energy” for replication
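The coordinating-node role described on this slide (a metadata catalog, location tracking, and direction of replication) could be sketched roughly as follows. This is a minimal illustration under assumptions: the class, the `target_copies` parameter, and the node names are hypothetical, and the slide does not say what replication policy DataONE uses.

```python
from collections import defaultdict


class CoordinatingNode:
    """Hypothetical catalog that tracks where copies live and plans new ones."""

    def __init__(self, target_copies=2):
        self.target_copies = target_copies   # assumed policy: copies wanted per object
        self.metadata = {}                   # pid -> metadata record
        self.locations = defaultdict(set)    # pid -> names of nodes holding a copy

    def register(self, pid, metadata, node_name):
        # Catalog the metadata and record the node that holds the object.
        self.metadata[pid] = metadata
        self.locations[pid].add(node_name)

    def plan_replication(self, pid, candidate_nodes):
        # Pick nodes without a copy, in the order given, until the target is met.
        holders = self.locations[pid]
        needed = max(0, self.target_copies - len(holders))
        spare = [n for n in candidate_nodes if n not in holders]
        return spare[:needed]

    def confirm_replica(self, pid, node_name):
        # Location tracking: note that a new copy now exists.
        self.locations[pid].add(node_name)


cn = CoordinatingNode(target_copies=3)
cn.register("example-pid-1", {"title": "example data set"}, "mn-a")
plan = cn.plan_replication("example-pid-1", ["mn-a", "mn-b", "mn-c", "mn-d"])
for node_name in plan:
    cn.confirm_replica("example-pid-1", node_name)
```

The coordinating node here only decides and records; the actual copying would be carried out by the member nodes themselves.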

Service layer model of data/knowledge services (Analogy with OSI)
Platters
Controllers
Hardware redundancy
I/O Bandwidth provisioning
Connections
File systems
AAAA
Federated Identity
Wide area data distribution
  – Block level
  – Xnodes
  – File level
Metadata generation (Automatically?)
Metadata harmonization
Replication, decoherent, survivable copies
Workflow mediated data operations
Semantics and ontology

Natural TG and DataONE interaction
TG emphasizes left column
DataONE emphasizes right column, for areas of interest.
DataONE MN collective resembles part of old TeraGrid collections mission
DataONE includes a large community engagement component, with the hope of generating sufficient interest for collected communities to sustain interest (cf. the well-attended data best practices tutorial at the 2010 Ecological Society of America meeting)

Proposed interaction
For DataONE: TeraGrid RPs (XD SPs) as Member nodes
For TeraGrid: DataONE as a data-oriented Science Gateway
Requirements:
  – For DataONE:
      Participate in TG activities
        – Sci Gwy efforts
        – Some of TG’s distributed data efforts
        – Some of TG outreach
      Request data allocations
  – TeraGrid RPs:
      Deploy DataONE MN services
      Make MN services available as REST services (advertised SW IIS)
  – Both:
      Interact
      Investigate “new opportunities”

What about XD?
TeraGrid is “Pre-XD”
Does XD have a data archive mission?
  – yes (as far as I know now)
  – All things Digital, but eXtreme
“The goal of this solicitation is to encourage innovation in the design and implementation of an effective, efficient, increasingly virtualized approach to the provision of high-end digital services – extreme digital services - while ensuring that the infrastructure continues to deliver high-quality access for the many researchers and educators that use it in their work.”
Conclusion: work with current TeraGrid and plan to manage a smooth transition to XD (DataONE will need to be capable of this pivot if it hopes to have decades-long stewardship)
Go ahead and get started now

Sustainability
DataONE is called to create an environment for “decades long” sustainability – technically and economically
No project has more than a 5 year horizon (not even NASA archives)
DataNets must “figure this out”
Solution: plan to manage change
Recognize the underlying forces. Science wants data preservation “someone will provide”
(More detail needed here)

What is the Value add?
Helps TG and DataONE meet their respective goals
  – Providing cyberinfrastructure for NSF funded research
  – Providing curation and life cycle support for digital data archives
Diminishes DataONE’s need to provision large amounts of low-level data resources internally – partner instead of re-invent
Re-iterates TeraGrid/XD mission to provide tier 2 (and tier 1) resources for storage

Next steps/action items
Commission a combined TG+D1 WG
  – Goals
      Develop TG RPs as DataONE member nodes
  – Action Items
      DataONE All hands meeting Nov. 2-5 Tamaya, NM
      Initiate DataONE SGW
      Initial TG allocation
      Deploy pilot MN stack on TG resources
      Demonstrate CN orchestrated replication to TG MNs – exercise the CRUD services
  – Composition
      TeraGrid
        – Chris Jordan, TG AD for Data
        – Nancy Wilkins-Diehr, TG AD for SGW
        – Dan Katz, TG Dir. of Science
        – Others?
      DataONE
        – Dave Vieglais, DataONE AD for CI
        – John Cobb, Dist Storage WG lead
        – Bruce Wilson, DataONE core cyberinfrastructure team (CCIT)
        – Others

Where are future opportunities?
MN replication can be viewed as data placement. Thus DataONE can be a data staging method for large scale computations on TG/XD
Metadata harmonization can imply moderate to large regular computations (“daily farm fresh” data-sets may require daily data/computation workflows)
“Noodle out” how to support NSF data management plan requirement, perhaps together
Ability to integrate with MREs as a ready data management solution
Ability to integrate with similar simulation efforts (much more data intensive)

Discussion/Questions?

Post discussion action items
Smaller team continue discussions (Cobb, Jordan, Katz, Wilkins-Diehr, Vieglais, Wilson, Jones)
Bundle pilot MN SW for TG MN deployment
Identify MN listening ports for services
Initiate Security WG
Initiate Gateway project
Define RPs willing to deploy these services
DataONE to write TG allocation request
  – Gateway services
  – Replicated Data Service
Continue larger discussion, particularly as larger needs come down the line
Explore mutual line of business opportunities
Separately: continue to investigate economic sustainability of large scale storage needs