ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.



Data Science Centers - Conceptual View

A Data Science Center provides complementary data science services, with staff who have expertise in many areas of data science and who partner with domain scientists.

Core integrated services:
- Federated identity
- Data replication and inter-site data & metadata access
- Data publication and digital curation services
- Inter-site workflows
- Other critical replicated services

Federated services catalog:
- Core common services
- Site-specific services
- Common service provisioning API

Together these provide the ability to construct complex multi-site data analysis environments from composable and customizable services.
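The split between "core common services" and "site-specific services" in the federated catalog can be sketched as a simple merge of per-site catalogs. This is a minimal illustration, not the deck's actual catalog API; the site names and service names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SiteCatalog:
    """Catalog of services advertised by one site (hypothetical model)."""
    site: str
    services: set = field(default_factory=set)

def federated_catalog(sites):
    """Merge per-site catalogs: services offered at every site are the
    'core common services'; the remainder are site-specific."""
    common = set.intersection(*(s.services for s in sites))
    specific = {s.site: s.services - common for s in sites}
    return common, specific

# Example with three hypothetical sites
sites = [
    SiteCatalog("ORNL", {"federated-identity", "data-replication", "hpc-burst"}),
    SiteCatalog("LBNL", {"federated-identity", "data-replication", "jupyter"}),
    SiteCatalog("BNL",  {"federated-identity", "data-replication", "tape-archive"}),
]
common, specific = federated_catalog(sites)
print(sorted(common))    # core services available everywhere
print(specific["LBNL"])  # site-specific extras
```

A real catalog would add service metadata (endpoints, versions, provisioning API details) per entry; the set intersection captures only the core/site-specific split.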

Our Infrastructure is Services

A rich environment of common services that can be flexibly composed to meet specific requirements of science domains across DOE SC:
- Data Services: Metadata Harvesting & Management; Indexing, Discovery & Dissemination; Semantic Analysis; Platform Instantiation Interface; Workflow Composition & Execution Manager; Data Transfer Tools
- Simulation Services: Simulation Frameworks; Scalable Debuggers; Scientific Libraries; MPI; ADIOS
- Analytic Services: Data Mining; Data Fusion; Visualization Environments; Visual Analytics Interface
- System Software & Middleware Services: MapReduce; HIVE; Key-Value Stores; Graph Databases; SQL Databases; Human-Computer Interaction; Workflow Composition; Security; Message Queues
- Infrastructure Services: HPC; Compute Utility; Parallel File Systems; Archival Storage; Object Storage; Network Storage; Advanced Networking; SDN

Data Science Center Demonstration Overview

Multi-lab collaboration to demonstrate an integrated data science capability based on existing infrastructure:
- Federated identity & consistent security/cyber policies
- Data replication and ease of data access across sites
- Advanced analysis systems
- Persistent services & data publication services

Pilot integrated services: federated identity, data replication, data publication, and inter-site workflows.

Phase 1:
- Site-specific service workflows
- Deployment of integrated services

Phase 2:
- Inter-site composition of services
- Prototype federated services catalog

Guiding Principles
- Infrastructure services should be API-driven to a high degree, to allow composition of services
- Aim for commonality and consistency, but allow for uniqueness (federated versus tightly integrated)
- Allow domains to create and customize data analysis environments from services at various levels of the infrastructure, based on their level of sophistication and existing services
- Provide a higher level of availability and redundancy than exists today

Which Services to Demo?

We will focus on core services that we expect to be useful across a broad set of science domains and use cases. We will sample from a couple of the domain demos and identify high-level services that we could demonstrate in a coordinated fashion across the five sites.

Potential Services of Relevance
- Single sign-on
- Replicated data storage
- Data publishing / curation
- Data capture service from facilities
- Tiered storage (nearline and archival)
- Provenance (data and/or workflow provenance)
- Message queues (supporting distributed workflow systems)
- Anycast networking
- Network traffic isolation for performance and security
- Application as a service?
- Admin- and/or user-controlled service provisioning:
  - Anycast-aware load balancer (perhaps with web services behind it)
  - Highly scalable MongoDB or MySQL

Ultimately, the services we demo will be guided by the science domain demonstrations.
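The role of message queues in supporting distributed workflow systems can be sketched with a local stand-in: a queue decouples the step that produces work (e.g., data capture at a facility) from the step that consumes it (e.g., analysis at another site). The step names are illustrative, not from the demo.

```python
import queue
import threading

# A queue decoupling a workflow's "ingest" step from its "analysis" step.
tasks = queue.Queue()
results = []

def analysis_worker():
    """Consume datasets from the queue until a None sentinel arrives."""
    while True:
        dataset = tasks.get()
        if dataset is None:
            break
        results.append(f"analyzed:{dataset}")
        tasks.task_done()

t = threading.Thread(target=analysis_worker)
t.start()

# The ingest side publishes work without knowing who consumes it
for name in ["run-001", "run-002", "run-003"]:
    tasks.put(name)
tasks.put(None)  # signal shutdown
t.join()
print(results)
```

In the multi-site setting the queue would be a persistent, network-accessible service, but the decoupling pattern is the same: producers and consumers only share the queue, not each other's schedules or locations.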

Core Services Stack for Demo

A rich environment of common services that can be flexibly composed to meet specific requirements of science domains across DOE SC:
- Data Services: Provenance & Publishing; Data Capture; Web/Visual Interface
- Simulation Services: Simulation Execution
- Analytic Services: Analysis Execution; Data Integration
- System Software & Middleware Services: Key-Value Stores; SQL Databases; Human-Computer Interaction; Message Queues
- Infrastructure Services: HPC; Compute Utility; File Systems; Archival Storage; Advanced Networking; SDN
- Cross-cutting: Data Transfer; Single Sign-on/Security; Workflow Composition; User-Provisionable Services
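The cross-cutting single sign-on layer can be illustrated with a toy token scheme: any site that holds a shared secret can verify a token minted by any other site. Real federated identity would use a standard such as SAML or OpenID Connect rather than a shared HMAC key; the secret, user, and token format here are entirely hypothetical.

```python
import hashlib
import hmac
import time

# Toy stand-in for federated identity: a shared secret lets every site
# verify tokens minted elsewhere. NOT a production design.
SHARED_SECRET = b"demo-secret"

def mint_token(user, issued_at):
    """Mint a token binding a user name to an issue timestamp."""
    msg = f"{user}:{issued_at}".encode()
    sig = hmac.new(SHARED_SECRET, msg, hashlib.sha256).hexdigest()
    return f"{user}:{issued_at}:{sig}"

def verify_token(token):
    """Check the token's signature at any participating site."""
    user, issued_at, sig = token.rsplit(":", 2)
    expected = hmac.new(SHARED_SECRET, f"{user}:{issued_at}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

tok = mint_token("alice", int(time.time()))
print(verify_token(tok))  # True at any site holding the secret
```

The property being demonstrated is the one the slides rely on: sign on once, and the resulting credential is honored across sites without re-authentication.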