Scientific Data Management Center

Presentation transcript:

1 Scientific Data Management Center

Principal Investigator
  LBNL: Arie Shoshani

Co-Principal Investigators
  DOE Laboratories:
    ANL: Rob Ross
    LBNL: Doron Rotem
    LLNL: Chandrika Kamath
    ORNL: Nagiza Samatova
    PNNL: Terence Critchlow, Jarek Nieplocha
  Universities:
    NCSU: Mladen Vouk
    NWU: Alok Choudhary
    UCD: Bertram Ludaescher
    SDSC: Ilkay Altintas
    U. Utah: Steve Parker

2 Scientific Data Management Center

Lead Institution: LBNL; PI: Arie Shoshani
Laboratories: ANL, ORNL, LBNL, LLNL, PNNL
Universities: NCSU, NWU, SDSC, UCD, U. Utah

- Established five years ago (SciDAC-1)
- Successfully re-competed for the next five years (SciDAC-2)
- Featured in the Fall 2006 issue of SciDAC Review magazine (SciDAC Review, Issue 2, Fall 2006; illustration: A. Tovey)

3 SDM Infrastructure

Uses a three-layer organization of technologies, sitting above the operating system and hardware (e.g., Cray XT3, IBM Blue Gene/L):
- Scientific Process Automation (SPA)
- Data Mining and Analysis (DMA)
- Storage Efficient Access (SEA)

Integrated approach:
- Provide a scientific workflow capability
- Support data mining and analysis tools
- Accelerate storage of and access to data

Benefits scientists by:
- Hiding the underlying parallel and indexing technology
- Permitting assembly of modules using a workflow description tool

Goal: reduce data management overhead.

4 Automating scientific workflows in SPA lets scientists focus on science, not process

Scientific discovery is a multi-step process; the SPA-Kepler workflow system automates and manages that process.
- Tasks that once required hours or days can now be completed in minutes, letting biologists spend the time saved on science.
- Dashboards provide improved interfaces.
- Execution monitoring (provenance) provides near-real-time status.
A minimal workflow-style sketch appears below.

Contact: Terence Critchlow, LLNL
Illustration: A. Tovey
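Kepler itself is an actor-oriented workflow system with a graphical interface; the Python sketch below is only a conceptual stand-in for the pattern such a workflow captures: chaining analysis steps and recording provenance. The step functions are hypothetical, not SPA/Kepler components.

```python
# Conceptual stand-in for a workflow: chain steps and record provenance.
# The step functions are hypothetical, not SPA/Kepler components.
import time

def fetch_sequences(ids):
    return [f">seq{i}" for i in ids]        # e.g., pull records from a database

def align(seqs):
    return sorted(seqs)                     # e.g., run an alignment tool

def run_workflow(steps, data):
    provenance = []                         # near-real-time execution monitoring
    for step in steps:
        start = time.time()
        data = step(data)                   # each step consumes the previous output
        provenance.append((step.__name__, time.time() - start))
    return data, provenance

result, log = run_workflow([fetch_sequences, align], [3, 1, 2])
print(result, log)
```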

5 Data analysis for fusion plasma

A plot of orbits in a cross-section of a fusion experiment shows different types of orbits, including circle-like “quasi-periodic orbits” and “island orbits.” Characterizing the topology of these orbits is challenging, as experimental and simulation data come as sets of points rather than continuous curves. We are successfully applying data mining techniques to this problem.

Feature selection techniques were used to identify the key parameters relevant to the presence of edge harmonic oscillations in the DIII-D tokamak. (Chart: classification error versus number of features for PCA, filter, distance, chi-square, and stump-boosting feature-selection methods.) A generic feature-ranking sketch appears below.

Contact: Chandrika Kamath, LLNL
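The sketch below shows the flavor of one such feature-selection step, ranking candidate parameters by a chi-square score against a binary label. It uses synthetic data and scikit-learn's generic chi2 scorer, not the LLNL team's actual tools or data.

```python
# Generic chi-square feature ranking on synthetic data; illustrative only,
# not the LLNL analysis code for DIII-D.
import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.default_rng(0)
X = rng.random((200, 8))                                   # 200 shots x 8 candidate parameters
y = (X[:, 2] + 0.1 * rng.random(200) > 0.6).astype(int)   # toy "oscillation present" label

scores, _ = chi2(X, y)                                     # chi2 expects non-negative features
ranking = np.argsort(scores)[::-1]
print("parameters ranked by relevance:", ranking)
```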

6 Searching and indexing with FastBit

Gleaning insights from combustion simulations: searching for regions that satisfy particular criteria is a challenge, and FastBit efficiently finds such regions of interest, for example when finding and tracking combustion flame fronts.

About FastBit:
- Extremely fast search of large databases
- Outperforms commercial software
- Used by various applications: combustion, STAR, astrophysics visualization
A conceptual bitmap-index sketch appears below.

Collaborators: SNL: J. Chen, W. Doyle; NCSU: T. Echekki
Contact: John Wu, LBNL
Illustration: A. Tovey
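FastBit is a C++ library built on compressed bitmap indexes; the NumPy sketch below only illustrates the underlying idea, answering a compound range query with bitwise operations on precomputed bitmaps. The variables and thresholds are made up.

```python
# Conceptual sketch of bitmap indexing, the technique behind FastBit
# (this is not the FastBit API). Variables and thresholds are made up.
import numpy as np

temperature = np.array([300, 1500, 2200, 900, 2600, 1800])
hydroxyl    = np.array([0.1, 0.4, 0.9, 0.2, 0.8, 0.7])

# Build one bitmap per condition (FastBit precomputes bitmaps per bin).
hot  = temperature > 2000          # bitmap: cells hotter than 2000 K
rich = hydroxyl > 0.5              # bitmap: cells with high OH concentration

# A compound query such as "temperature > 2000 AND OH > 0.5" reduces to a
# bitwise AND of the precomputed bitmaps, avoiding a scan of the raw data.
flame_front_cells = np.flatnonzero(hot & rich)
print(flame_front_cells)           # indices of cells in the region of interest
```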

7 Data analysis based on dynamic histograms using FastBit

Conditional histograms are common in data analysis, and FastBit indexing makes them fast enough for real-time anomaly detection. Example: finding the number of malicious network connections in a particular time window. A histogram of the number of connections to port 5554 of machines in the LBNL IP address space (two horizontal axes for the address, vertical axis for time) shows two sets of scans as two distinct sheets. A small conditional-histogram sketch appears below.

Contact: John Wu, LBNL
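The sketch below builds a conditional histogram of the kind described on the slide, using NumPy rather than FastBit: restrict connection records to port 5554, then bin them over destination address and time. The column names and data are synthetic.

```python
# Conditional histogram over synthetic connection records; FastBit would
# answer the port condition from a bitmap index instead of a full scan.
import numpy as np

n = 10_000
rng = np.random.default_rng(1)
port   = rng.choice([22, 80, 443, 5554], size=n)
octet3 = rng.integers(0, 256, size=n)      # third byte of destination address
octet4 = rng.integers(0, 256, size=n)      # fourth byte of destination address
t      = rng.integers(0, 3600, size=n)     # arrival time in seconds

mask = port == 5554                        # the "conditional" part of the histogram
counts, edges = np.histogramdd(
    np.column_stack([octet3[mask], octet4[mask], t[mask]]),
    bins=(16, 16, 12))                     # address x address x time bins
print(counts.shape, counts.sum())          # scans appear as dense "sheets" in this cube
```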

8 Parallel statistical computing with pR

Goal: provide a scalable, high-performance statistical data analysis framework that helps scientists perform interactive analyses of the data they produce and extract knowledge from it.
- Able to use existing high-level (i.e., R) code
- Requires minimal effort to parallelize
- Offers an identical interface
- Provides efficient and scalable performance
A sketch of the underlying data-parallel pattern appears below.

Contact: Nagiza Samatova, ORNL
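pR itself parallelizes R analyses (e.g., through RScaLAPACK); the mpi4py sketch below only illustrates the data-parallel pattern such a framework relies on: partition the data across processes, compute local summaries, and reduce them into a global statistic.

```python
# Data-parallel pattern behind frameworks like pR (illustrative only):
# each process holds a slice of the data and partial results are combined.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = np.random.default_rng(rank).random((1000, 4))   # this rank's slice of the data

local_sum = local.sum(axis=0)                           # local summary
local_n   = local.shape[0]

global_sum = comm.allreduce(local_sum, op=MPI.SUM)      # combine partial sums
global_n   = comm.allreduce(local_n, op=MPI.SUM)

if rank == 0:
    print("global column means:", global_sum / global_n)
```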

9 Parallel input/output: scaling computational science

Orchestrating data transfers and speedy analyses depends on efficient systems for storing, accessing, and moving data among modules.

Multi-layer parallel I/O design: the Parallel netCDF library is built on top of the MPI-IO implementation ROMIO, which in turn is built on the Abstract Device Interface for I/O (ADIO) used to access the parallel storage system.

Benefits to scientists:
- Brings performance, productivity, and portability
- Improves performance by an order of magnitude
A minimal MPI-IO sketch appears below.

Contact: Rob Ross, ANL
Illustration: A. Tovey
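For a feel of the MPI-IO layer that ROMIO implements and PnetCDF builds on, the sketch below has each process write its own block of a shared file with a collective call. File name and block size are arbitrary.

```python
# Minimal MPI-IO sketch: every rank writes its block of a shared file
# using a collective call, which lets ROMIO optimize the access pattern.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = np.full(1024, rank, dtype='d')             # this rank's slab of the array
offset = rank * local.nbytes                       # contiguous block decomposition

fh = MPI.File.Open(comm, 'slab.dat',
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
fh.Write_at_all(offset, local)                     # collective write
fh.Close()
```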

10 Speeding data transfer with PnetCDF

(Diagram: with serial netCDF, processes P0-P3 must move data through inter-process communication before it reaches the parallel file system; with Parallel netCDF, P0-P3 write to the parallel file system directly.)

The rate of data transfer using HDF5 decreases when a particular problem is divided among more processors. In contrast, the parallel version of netCDF improves, because of the low-overhead nature of PnetCDF and its tight coupling to MPI-IO.
- Enables high-performance parallel I/O to netCDF data sets
- Achieves up to a 10-fold performance improvement over HDF5
A hedged parallel-write sketch appears below.

Contact: Rob Ross, ANL
Illustration: A. Tovey
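The sketch below shows what a slab-decomposed parallel write to a shared netCDF file can look like from Python, assuming a netCDF4-python build with parallel I/O enabled (which can use PnetCDF underneath for classic-format files). File name, variable name, and sizes are illustrative.

```python
# Hedged sketch of a parallel netCDF write: each rank writes its own slab
# of one variable. Assumes netCDF4-python built with parallel I/O support.
from mpi4py import MPI
from netCDF4 import Dataset
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
nlocal = 1024

nc = Dataset('field.nc', 'w', parallel=True, comm=comm, info=MPI.Info())
nc.createDimension('x', nlocal * size)
var = nc.createVariable('density', 'f8', ('x',))
var.set_collective(True)                       # collective I/O, as with PnetCDF
var[rank * nlocal:(rank + 1) * nlocal] = np.full(nlocal, rank, dtype='f8')
nc.close()
```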

11 Contacts

Arie Shoshani (PI), Lawrence Berkeley National Laboratory
Nagiza Samatova, Data Mining and Analysis area leader, Oak Ridge National Laboratory
Terence Critchlow, Scientific Process Automation area leader, Pacific Northwest National Laboratory
Rob Ross, Storage Efficient Access area leader, Argonne National Laboratory