Supercomputing 2006: Scientific Data Management Center


Scientific Data Management Center
Lead Institution: LBNL; PI: Arie Shoshani
Laboratories: ANL, ORNL, LBNL, LLNL, PNNL
Universities: NCSU, NWU, SDSC, UCD, U of Utah
Established 5 years ago (SciDAC-1); successfully re-competed for the next 5 years (SciDAC-2).
Featured in the Fall 2006 issue of SciDAC Review magazine (SciDAC Review, Issue 2, Fall 2006).
Illustration: A. Tovey

SDM Infrastructure
Uses a three-layer organization of technologies: Scientific Process Automation (SPA), Data Mining and Analysis (DMA), and Storage Efficient Access (SEA), layered above the operating system and hardware (e.g., Cray XT3, IBM Blue Gene/L).
Goal: reduce data management overhead.
Integrated approach: provide a scientific workflow capability, support data mining and analysis tools, and accelerate storage of and access to data.
Benefits scientists by hiding the underlying parallel and indexing technology and by permitting assembly of modules using a workflow description tool.

Automating Scientific Workflow in SPA
Enables mining of diverse biology databases. Scientific discovery is a multi-step process, and the SPA-Kepler workflow system automates and manages this process. Tasks that once required hours or days can now be completed in minutes, allowing biologists to spend the saved time on science.
Contact: Terence Critchlow, LLNL
Illustration: A. Tovey
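The following is a minimal, hypothetical Python sketch of the kind of multi-step pipeline a workflow system automates: a query step, an analysis step, and a reporting step chained together. It is a conceptual stand-in only and does not use the SPA-Kepler actor model or its API; all step names and data are made up.

```python
# Conceptual sketch only: a linear pipeline of analysis steps, standing in for
# the kind of multi-step biology workflow that SPA-Kepler automates.
# Step names and data sources are hypothetical.

def fetch_sequences(accession_ids):
    # In a real workflow this step would query remote biology databases.
    return [f"SEQ-{acc}" for acc in accession_ids]

def run_analysis(sequences):
    # Placeholder for a compute step (e.g., alignment or motif search).
    return {seq: len(seq) for seq in sequences}

def summarize(results):
    # Placeholder for a reporting step.
    return "\n".join(f"{k}: score {v}" for k, v in results.items())

def run_pipeline(accession_ids, steps=(fetch_sequences, run_analysis, summarize)):
    # A workflow engine would add scheduling, provenance, and fault handling;
    # here the steps are simply applied in order.
    data = accession_ids
    for step in steps:
        data = step(data)
    return data

if __name__ == "__main__":
    print(run_pipeline(["P12345", "Q67890"]))
```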

Feature Identification in Fusion Plasma
The plot of orbits in a cross-section of a fusion experiment shows different types of orbits, including circle-like "quasi-periodic orbits" and "island orbits." Because both experiments and simulations report individual crossing points rather than a continuous curve, identifying the orbit topology is challenging.
Approach: a combination of PCA and ICA technology is used to identify the key parameters relevant to the presence of edge harmonic oscillations in a tokamak.
Contact: Chandrika Kamath, LLNL
Illustration: A. Tovey. Source: CS Cheng, NYU
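As a rough illustration of the PCA-then-ICA pattern named above, here is a short Python sketch using scikit-learn on synthetic signals. It is not the center's analysis code; the data, component counts, and variable names are illustrative assumptions.

```python
# Conceptual sketch: reduce dimensionality with PCA, then separate independent
# components with ICA, the general pattern mentioned on the slide.
# The synthetic data and parameter choices below are illustrative only.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
# Pretend each row is one time sample of many diagnostic signals.
n_samples, n_signals = 2000, 50
latent = np.column_stack([
    np.sin(np.linspace(0, 40, n_samples)),           # oscillatory source
    np.sign(np.sin(np.linspace(0, 15, n_samples))),  # square-wave source
])
mixing = rng.normal(size=(2, n_signals))
X = latent @ mixing + 0.05 * rng.normal(size=(n_samples, n_signals))

# Step 1: PCA keeps the directions that carry most of the variance.
X_reduced = PCA(n_components=5).fit_transform(X)

# Step 2: ICA looks for statistically independent components within them,
# one way to isolate parameters tied to a particular oscillation.
components = FastICA(n_components=2, random_state=0).fit_transform(X_reduced)
print(components.shape)  # (2000, 2)
```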

Searching and Indexing with FastBit: Gleaning Insights about Combustion Simulation
Searching for regions that satisfy particular criteria is a challenge; FastBit efficiently finds regions of interest, enabling the finding and tracking of combustion flame fronts.
About FastBit:
  Extremely fast search of large databases
  Outperforms commercial software
  Used by various applications: combustion, STAR, astrophysics visualization
Collaborators: SNL (Drs. J. Chen, W. Doyle), NCSU (Dr. T. Echekki)
Contact: John Wu, LBNL
Illustration: A. Tovey
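FastBit is based on compressed bitmap indexes. The sketch below shows the bare idea of answering a range query with per-bin bitmaps in NumPy; it is a simplified, uncompressed toy and does not use FastBit's actual API, and the temperature field and thresholds are hypothetical.

```python
# Conceptual sketch of bitmap indexing, the technique behind FastBit:
# build one bitmap per value bin, then answer range queries with bitwise ops.
# This toy version is equality-encoded and uncompressed, unlike FastBit itself.
import numpy as np

rng = np.random.default_rng(1)
temperature = rng.uniform(300.0, 2500.0, size=1_000_000)  # hypothetical field

# Build the index: one boolean bitmap per bin.
bin_edges = np.linspace(300.0, 2500.0, 23)
bin_ids = np.digitize(temperature, bin_edges)
bitmaps = {b: (bin_ids == b) for b in np.unique(bin_ids)}

def query_range(lo, hi):
    """Return a boolean mask for lo <= temperature < hi using the bitmaps."""
    lo_bin, hi_bin = np.digitize([lo, hi], bin_edges)
    mask = np.zeros_like(temperature, dtype=bool)
    for b in range(lo_bin, hi_bin + 1):
        if b in bitmaps:
            mask |= bitmaps[b]  # candidate cells from whole bins
    # Re-check the exact predicate (a real index would only need to do this
    # for the partially covered edge bins).
    mask &= (temperature >= lo) & (temperature < hi)
    return mask

flame_front = query_range(1800.0, 2200.0)  # e.g., cells near a flame front
print(flame_front.sum(), "cells selected")
```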

Searches in the STAR Experiment: Finding One Event Out of a Million
FastBit and the Storage Resource Manager (SRM) have enabled efficient mining of the hundreds of terabytes of data generated by the STAR detector. STAR scientists now extract data of interest within fifteen minutes; accomplishing the same task previously took weeks.
Contact: John Wu, LBNL
Illustration: A. Tovey

Parallel Statistical Computing with pR
Goal: provide a scalable, high-performance statistical data analysis framework that helps scientists perform interactive analyses of produced data to extract knowledge.
pR is able to use existing high-level (i.e., R) code, requires minimal effort for parallelizing, offers an identical interface, and provides efficient and scalable performance.
Contact: Nagiza Samatova, ORNL
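pR itself is an R-based framework; as a loose analogy in Python, the sketch below shows the underlying idea of keeping one familiar interface while swapping a serial backend for a parallel one. The function names and the column-mean example are hypothetical and are not pR's implementation.

```python
# Conceptual analogy only: pR keeps the familiar R interface while running the
# computation in parallel. This sketch mirrors that idea with a serial and a
# multiprocessing implementation behind the same function signature.
import numpy as np
from multiprocessing import Pool

def col_means_serial(matrix):
    return matrix.mean(axis=0)

def _chunk_mean(chunk):
    # Partial result for one block of rows: its column means and row count.
    return chunk.mean(axis=0), chunk.shape[0]

def col_means_parallel(matrix, workers=4):
    # Same inputs and outputs as the serial version; only the backend changes.
    chunks = np.array_split(matrix, workers, axis=0)
    with Pool(workers) as pool:
        partials = pool.map(_chunk_mean, chunks)
    means = np.array([m for m, _ in partials])
    weights = np.array([n for _, n in partials], dtype=float)
    return np.average(means, axis=0, weights=weights)

if __name__ == "__main__":
    data = np.random.default_rng(2).normal(size=(100_000, 8))
    assert np.allclose(col_means_serial(data), col_means_parallel(data))
```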

Parallel Input/Output: Scaling Computational Science
Orchestration of data transfers and speedy analyses depends on efficient systems for storage, access, and movement of data among modules.
Multi-layer parallel I/O design: supports the Parallel netCDF library, built on top of an MPI-IO implementation called ROMIO, which is in turn built on top of an Abstract Device Interface for I/O (ADIO) system used to access a parallel storage system.
Benefits to scientists: brings performance, productivity, and portability; improves performance by an order of magnitude.
Contact: Rob Ross, ANL
Illustration: A. Tovey
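A minimal sketch of programming against the MPI-IO layer of this stack, using mpi4py and a collective write; it assumes an MPI environment, and the file name and array size are illustrative. This is not the SDM center's code.

```python
# Minimal sketch of the MPI-IO layer in the stack described above, using
# mpi4py. Run under an MPI launcher, e.g. `mpiexec -n 4 python write_demo.py`.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank owns a contiguous slice of a global 1D array.
local_n = 1024
local_data = np.full(local_n, rank, dtype=np.float64)

fh = MPI.File.Open(comm, "demo_output.dat",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
offset = rank * local_n * local_data.itemsize

# Collective write: all ranks participate, letting ROMIO optimize the access.
fh.Write_at_all(offset, local_data)
fh.Close()
```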

Speeding Data Transfer with PnetCDF
The rate of data transfer using HDF5 decreases when a particular problem is divided among more processors. In contrast, the parallel version of netCDF improves, due to the low-overhead nature of PnetCDF and its tight coupling to MPI-IO.
PnetCDF enables high-performance parallel I/O to netCDF datasets and achieves up to a 10-fold performance improvement over HDF5.
[Diagram: with serial netCDF, processes P0-P3 send their data through inter-process communication to a single writer before it reaches the parallel file system; with Parallel netCDF, P0-P3 write directly to the parallel file system.]
Contact: Rob Ross, ANL
Illustration: A. Tovey