Presented by Scientific Data Management Center Nagiza F. Samatova Network and Cluster Computing Computer Sciences and Mathematics Division.

Slides:



Advertisements
Similar presentations
1 The SciDAC Scientific Data Management Center: Infrastructure and Results Arie Shoshani Lawrence Berkeley National Laboratory SC 2004 November, 2004.
Advertisements

University of Chicago Department of Energy The Parallel and Grid I/O Perspective MPI, MPI-IO, NetCDF, and HDF5 are in common use Multi TB datasets also.
ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
Priority Research Direction (I/O Models, Abstractions and Software) Key challenges What will you do to address the challenges? – Develop newer I/O models.
Systems Biology Existing and future genome sequencing projects and the follow-on structural and functional analysis of complete genomes will produce an.
23/04/2008VLVnT08, Toulon, FR, April 2008, M. Stavrianakou, NESTOR-NOA 1 First thoughts for KM3Net on-shore data storage and distribution Facilities VLV.
Grid Collector: Enabling File-Transparent Object Access For Analysis Wei-Ming Zhang Kent State University John Wu, Alex Sim, Junmin Gu and Arie Shoshani.
Astrophysics, Biology, Climate, Combustion, Fusion, Nanoscience Working Group on Simulation-Driven Applications 10 CS, 10 Sim, 1 VR.
1 BrainWave Biosolutions Limited Accelerating Life Science Research through Technology.
Connecting HPIO Capabilities with Domain Specific Needs Rob Ross MCS Division Argonne National Laboratory
Arie Shoshani The Scientific Data Management Center Arie Shoshani (PI) Lawrence Berkeley National Laboratory DOE Laboratories.
UNIVERSITY of MARYLAND GLOBAL LAND COVER FACILITY High Performance Computing in Support of Geospatial Information Discovery and Mining Joseph JaJa Institute.
Computer Science in UNEDF George Fann, Oak Ridge National Laboratory Rusty Lusk, Argonne National Laboratory Jorge Moré, Argonne National Laboratory Esmond.
Slide 1 Auburn University Computer Science and Software Engineering Scientific Computing in Computer Science and Software Engineering Kai H. Chang Professor.
Event Metadata Records as a Testbed for Scalable Data Mining David Malon, Peter van Gemmeren (Argonne National Laboratory) At a data rate of 200 hertz,
K E Y : SW Service Use Big Data Information Flow SW Tools and Algorithms Transfer Application Provider Visualization Access Analytics Curation Collection.
Scientific Data Management (SDM)
U.S. Department of Energy Office of Science Advanced Scientific Computing Research Program CASC, May 3, ADVANCED SCIENTIFIC COMPUTING RESEARCH An.
SDM meeting, July 10-11, 2001Area 3 Report Data mining and discovery of access patterns 3a.i) Adaptive file caching in a distributed system (LBNL) 3b.i)
Seaborg Cerise Wuthrich CMPS Seaborg  Manufactured by IBM  Distributed Memory Parallel Supercomputer  Based on IBM’s SP RS/6000 Architecture.
Active Monitoring in GRID environments using Mobile Agent technology Orazio Tomarchio Andrea Calvagna Dipartimento di Ingegneria Informatica e delle Telecomunicazioni.
1 Scientific Data Management Center DOE Laboratories: ANL: Rob Ross LBNL:Doron Rotem LLNL:Chandrika Kamath ORNL: Nagiza Samatova.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
November 13, 2006 Performance Engineering Research Institute 1 Scientific Discovery through Advanced Computation Performance Engineering.
Physics Steven Gottlieb, NCSA/Indiana University Lattice QCD: focus on one area I understand well. A central aim of calculations using lattice QCD is to.
1 Arie Shoshani, LBNL SDM center Scientific Data Management Center(SDM-ISIC) Arie Shoshani Computing Sciences Directorate Lawrence Berkeley National Laboratory.
Presented by On the Path to Petascale: Top Challenges to Scientific Discovery Scott A. Klasky NCCS Scientific Computing End-to-End Task Lead.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
Accelerating Scientific Exploration Using Workflow Automation Systems Terence Critchlow (LLNL) Ilkay Altintas (SDSC) Scott Klasky(ORNL) Mladen Vouk (NCSU)
SciDAC All Hands Meeting, March 2-3, 2005 Northwestern University PIs:Alok Choudhary, Wei-keng Liao Graduate Students:Avery Ching, Kenin Coloma, Jianwei.
The Future of the iPlant Cyberinfrastructure: Coming Attractions.
Crystal Ball Panel ORNL Heterogeneous Distributed Computing Research Al Geist ORNL March 6, 2003 SOS 7.
Opportunities in Parallel I/O for Scientific Data Management Rajeev Thakur and Rob Ross Mathematics and Computer Science Division Argonne National Laboratory.
Service - Oriented Middleware for Distributed Data Mining on the Grid ,劉妘鑏 Antonio C., Domenico T., and Paolo T. Journal of Parallel and Distributed.
Pascucci-1 Valerio Pascucci Director, CEDMAV Professor, SCI Institute & School of Computing Laboratory Fellow, PNNL Massive Data Management, Analysis,
Presented by An Overview of the Common Component Architecture (CCA) The CCA Forum and the Center for Technology for Advanced Scientific Component Software.
Kamath 1 Scientific Data Mining Chandrika Kamath October 7, 2008 Lawrence Livermore National Laboratory.
SDM Center’s Data Mining & Analysis SDM Center Parallel Statistical Analysis with RScaLAPACK Parallel, Remote & Interactive Visual Analysis with ASPECT.
1 Arie Shoshani, LBNL SDM center Scientific Data Management Center (Integrated Software Infrastructure Center – ISIC) Arie Shoshani All Hands Meeting March.
Presented by Scientific Annotation Middleware Software infrastructure to support rich scientific records and the processes that produce them Jens Schwidder.
1 Scientific Data Management Center(ISIC) contains extensive publication list.
E. WES BETHEL (LBNL), CHRIS JOHNSON (UTAH), KEN JOY (UC DAVIS), SEAN AHERN (ORNL), VALERIO PASCUCCI (LLNL), JONATHAN COHEN (LLNL), MARK DUCHAINEAU.
Your name here SPA: Successes, Status, and Future Directions Terence Critchlow And many, many, others Scientific Process Automation PNNL.
K E Y : SW Service Use Big Data Information Flow SW Tools and Algorithms Transfer Transformation Provider Visualization Access Analytics Curation Collection.
Scalable Systems Software for Terascale Computer Centers Coordinator: Al Geist Participating Organizations ORNL ANL LBNL.
Presented by Jens Schwidder Tara D. Gibson James D. Myers Computing & Computational Sciences Directorate Oak Ridge National Laboratory Scientific Annotation.
May 6, 2002Earth System Grid - Williams The Earth System Grid Presented by Dean N. Williams PI’s: Ian Foster (ANL); Don Middleton (NCAR); and Dean Williams.
VAPoR: A Discovery Environment for Terascale Scientific Data Sets Alan Norton & John Clyne National Center for Atmospheric Research Scientific Computing.
I/O for Structured-Grid AMR Phil Colella Lawrence Berkeley National Laboratory Coordinating PI, APDEC CET.
Digital Libraries1 David Rashty. Digital Libraries2 “A library is an arsenal of liberty” Anonymous.
J.-N. Leboeuf V.K. Decyk R.E. Waltz J. Candy W. Dorland Z. Lin S. Parker Y. Chen W.M. Nevins B.I. Cohen A.M. Dimits D. Shumaker W.W. Lee S. Ethier J. Lewandowski.
Software Engineering Chapter: Computer Aided Software Engineering 1 Chapter : Computer Aided Software Engineering.
1 OFFICE OF ADVANCED SCIENTIFIC COMPUTING RESEARCH The NERSC Center --From A DOE Program Manager’s Perspective-- A Presentation to the NERSC Users Group.
Presented by Scientific Data Management Center Nagiza F. Samatova Oak Ridge National Laboratory Arie Shoshani (PI) Lawrence Berkeley National Laboratory.
DOE Network PI Meeting 2005 Runtime Data Management for Data-Intensive Scientific Applications Xiaosong Ma NC State University Joint Faculty: Oak Ridge.
2/22/2001Greenbook 2001/OASCR1 Greenbook/OASCR Activities Focus on technology to enable SCIENCE to be conducted, i.e. Software tools Software libraries.
Supercomputing 2006 Scientific Data Management Center Lead Institution: LBNL; PI: Arie Shoshani Laboratories: ANL, ORNL, LBNL, LLNL, PNNL Universities:
SDM Center High-Performance Parallel I/O Libraries (PI) Alok Choudhary, (Co-I) Wei-Keng Liao Northwestern University In Collaboration with the SEA Group.
SDM Center Parallel I/O Storage Efficient Access Team.
K E Y : DATA SW Service Use Big Data Information Flow SW Tools and Algorithms Transfer Hardware (Storage, Networking, etc.) Big Data Framework Scalable.
SDM Center Techniques for feature identification in scientific data Chandrika Kamath (LLNL) with Erick Cantú-Paz, Imola Fodor, Cyrus Harrison, Nicole Love,
Climate-SDM (1) Climate analysis use case –Described by: Marcia Branstetter Use case description –Data obtained from ESG –Using a sequence steps in analysis,
Fermi National Accelerator Laboratory & Thomas Jefferson National Accelerator Facility SciDAC LQCD Software The Department of Energy (DOE) Office of Science.
Design and Planning Tools John Grosh Lawrence Livermore National Laboratory April 2016.
Presented by SciDAC-2 Petascale Data Storage Institute Philip C. Roth Computer Science and Mathematics Future Technologies Group.
University of Chicago Department of Energy Applications In Hand:  FLASH (HDF-5)  ENZO (MPI-IO)  STAR Likely  Climate – Bill G to contact (Michalakas.
VisIt Project Overview
DOE 2000 PI Retreat Breakout C-1
Scientific Data Management contains extensive publication list
SDM workshop Strawman report History and Progress and Goal.
Presentation transcript:

Presented by Scientific Data Management Center Nagiza F. Samatova Network and Cluster Computing Computer Sciences and Mathematics Division

2 Samatova_SDM_0611 Scientific Data Management Center SciDAC Review, Issue 2, Fall 2006 Illustration: A. Tovey Lead Institution: LBNL PI: Arie Shoshani Laboratories: ANL, ORNL, LBNL, LLNL, PNNL Universities: NCSU, NWU, SDSC, UCD, U. Utah Established 5 years ago (SciDAC-1) Successfully re-competed for next 5 years (SciDAC-2) Featured in Fall 2006 issue of SciDAC Review magazine

3 Samatova_SDM_0611 SDM Infrastructure Uses three-layer organization of technologies Scientific Process Automation (SPA) Data Mining and Analysis (DMA) Storage Efficient Access (SEA) Operating system Hardware (e.g., Cray XT3, IBM Blue/Gene L) Operating system Hardware (e.g., Cray XT3, IBM Blue/Gene L) Integrated approach: To provide a scientific workflow capability To support data mining and analysis tools To accelerate storage and access to data Benefits scientists by Hiding underlying parallel and indexing technology Permitting assembly of modules using workflow description tool Goal: Reduce data management overhead

4 Samatova_SDM_0611 Automating scientific workflow in SPA enables mining of diverse biology databases Scientific discovery is a multi-step process. SPA-Kepler workflow system automates and manages this process. Illustration: A. Tovey Tasks that required hours or days can now be completed in minutes, allowing biologists to spend their time saved on science Contact: Terence Critchlow, LLNL

5 Samatova_SDM_0611 Data analysis for fusion plasma Plot of orbits in cross-section of a fusion experiment shows different types of orbits, including circle-like “quasi-periodic orbits” and “island orbits.” Characterizing the topology of orbits is challenging, as experimental and simulation data are in the form of points rather than a continuous curve. We are successfully applying data mining techniques to this problem. Feature selection techniques used to identify key parameters relevant to the presence of edge harmonic oscillations in the DIII-D tokomak. Contact: Chandrika Kamath, LLNL Error Features PCA Filter Distance ChiSquare Stump Boosting

6 Samatova_SDM_0611 Searching and indexing with FastBit Gleaning insights about combustion simulation About FastBit: Extremely fast search of large databases Outperforms commercial software Used by various applications: combustion, STAR, astrophysics visualization Collaborators: SNL: J. Chen, W. Doyle NCSU: T. Echekki Illustration: A. Tovey Finding & tracking of combustion flame fronts Contact: John Wu, LBNL Searching for regions that satisfy particular criteria is a challenge. FastBit efficiently finds regions of interest.

7 Samatova_SDM_0611 Searches in STAR experiment Finding one event out of a million Illustration: A. Tovey STAR scientists extract data of interest within 15 minutes. Accomplishing the same task before consumed weeks. FastBit and Storage Resource Manager enable efficient mining of hundreds of terabytes of data generated by STAR detector Contact: John Wu, LBNL

8 Samatova_SDM_0611 Parallel statistical computing with pR Goal: Provide scalable high-performance statistical data analysis framework to help scientists perform interactive analyses of produced data to extract knowledge Able to use existing high-level (i.e., R) code Requires minimal effort for parallelizing Offers identical interface Provides efficient and scalable performance Contact: Nagiza Samatova, ORNL

9 Samatova_SDM_0611 Parallel input/output Scaling computational science Multi-layer parallel I/O design: Supports Parallel-netCDF library built on top of MPI-IO implementation called ROMIO, built in turn on top of Abstract Device Interface for I/O system, used to access parallel storage system. Benefits to scientists: Brings performance, productivity, and portability Improves performance by order of magnitude Orchestration of data transfers and speedy analyses depends on efficient systems for storage, access, and movement of data among modules. Illustration: A. Tovey Contact: Rob Ross, ANL

10 Samatova_SDM_0611 Speeding data transfer with PnetCDF P0 P1 P2 P3 netCDF Parallel File System Parallel netCDF P0 P1 P2 P3 Parallel File System Illustration: A. Tovey Rate of data transfer using HDF5 decreases when a particular problem is divided among more processors. In contrast, parallel version of netCDF improves because of low- overhead nature of PnetCDF and its tight coupling to MPI-IO. Enables high performance parallel I/O to netCDF data sets Achieves up to 10-fold performance improvement over HDF5 Contact: Rob Ross, ANL Inter-process communication

11 Samatova_SDM_0611 Contact Nagiza Samatova Network and Cluster Computing Computer Science and Mathematics Division (865) Samatova_SDM_0611