Presentation is loading. Please wait.

Presentation is loading. Please wait.

Supercomputing 2006 Scientific Data Management Center Lead Institution: LBNL; PI: Arie Shoshani Laboratories: ANL, ORNL, LBNL, LLNL, PNNL Universities:

Similar presentations


Presentation on theme: "Supercomputing 2006 Scientific Data Management Center Lead Institution: LBNL; PI: Arie Shoshani Laboratories: ANL, ORNL, LBNL, LLNL, PNNL Universities:"— Presentation transcript:

1 Supercomputing 2006 Scientific Data Management Center Lead Institution: LBNL; PI: Arie Shoshani Laboratories: ANL, ORNL, LBNL, LLNL, PNNL Universities: NCSU, NWU, SDSC, UCD, Uof Utah Established 5 years ago (SciDAC-1) Successfully re-competed for the next 5 years (SciDAC-2) Featured in Fall 2006 issue of SciDAC Review magazine SciDAC Review, Issue 2, Fall 2006Illustration: A. Tovey

2 Supercomputing 2006 SDM Infrastructure Uses three-layer organization of technologies Goal: Reduce data management overhead Integrated Approach: To provide a scientific workflow capability To support data mining and analysis tools To accelerates storage and access to data Scientific Process Automation (SPA) Data Mining and Analysis (DMA) Storage Efficient Access (SEA) Operating System Hardware (e.g. Cray XT3, IBM Blue/Gene L) Benefits Scientists by: Hiding underlying parallel and indexing technology Permitting assembly of modules using workflow description tool

3 Supercomputing 2006 Automating Scientific Workflow in SPA Enables mining of diverse biology databases Scientific discovery is a multi-step process. SPA-Kepler workflow system automates and manages this process. Illustration: A. Tovey Tasks that required hours or days can now be completed in minutes allowing the biologists to spend their saved time on science. Contact: Terence Critchlow, LLNL (critchlow1@llnl.gov)critchlow1@llnl.gov

4 Supercomputing 2006 Feature Identification in Fusion Plasma Because both experiments and simulations report individual crossing points rather than a continuous curve, identifying the orbit topology is challenging. Illustration: A. Tovey. Source: CS Cheng, NYU The plot of orbits in a cross-section of a fusion experiment shows different types of orbits, including circle-like “quasi-periodic orbits” and “island orbits.” Contact: Chandrika Kamath, LLNL (kamath2@llnl.gov)kamath2@llnl.gov A combination of PCA and ICA technology used to identify the key parameters that are relevant to the presence of edge harmonic oscillations in a Tokomak. Approach:

5 Supercomputing 2006 Searching and Indexing with FastBit Gleaning Insights about Combustion Simulation Searching for regions that satisfy particular criteria is a challenge. FastBit efficiently finds regions of interest. Contact: John Wu, LBNL (kwu@lbl.gov)kwu@lbl.gov About FastBit: Extremely fast search of large databases Outperforms commercial software Used by various applications: combustion, STAR, astrophysics visualization. Illustration: A. Tovey Finding & tracking of combustion flame fronts SNL: Drs. J. Chen, W. Doyle NCSU: Dr. T. Echekki Collaborators:

6 Supercomputing 2006 Searches in STAR Experiment Finding One Event Out of A Million FastBit and Storage Resource Manager (SRM) have enabled efficient mining of the hundreds of terabytes of data generated by the STAR detector. Illustration: A. Tovey STAR scientists extract data of interest within fifteen minutes. Accomplishing the same task before consumed weeks. Contact: John Wu, LBNL (kwu@lbl.gov)kwu@lbl.gov

7 Supercomputing 2006 Parallel Statistical Computing with pR Contact: Nagiza Samatova, ORNL (samatovan@ornl.gov)samatovan@ornl.gov Able to use existing high- level (i.e. R) code Requires minimal effort for parallelizing Offers identical interface Provides efficient and scalable performance Goal: to provide a scalable high performance statistical data analysis framework to help scientists perform interactive analyses of produced data to extract knowledge.

8 Supercomputing 2006 Parallel Input/Output Scaling Computational Science Illustration: A. Tovey Orchestration of data transfers and speedy analyses depend on efficient systems for storage, access, and movement of data among modules. Supports Parallel-netCDF library built on top of an MPI-IO implementation called ROMIO, which is in turn built on top of an Abstract Device Interface for I/O (ADIO) system, used to access a parallel storage system. Contact: Rob Ross, ANL (rross@anl.gov)@anl.gov Multi-layer parallel I/O design: Benefits to scientists: Brings performance, productivity, and portability Improves performance by an order of magnitude.

9 Supercomputing 2006 Speeding Data Transfer with PnetCDF Illustration: A. Tovey Contact: Rob Ross, ANL (rross@anl.gov)@anl.gov The rate of data transfer using HDF5 decreases when a particular problem is divided among more processors. In contrast the parallel version of netCDF improves due to the low-overhead nature of PnetCDF and its tight coupling to MPI-IO. P0 P1 P2 P3 netCDF Parallel File System Parallel netCDF P0 P1 P2 P3 Parallel File System Inter-process communication Enables high performance parallel I/O to netCDF datasets Achieves up to 10 fold performance improvement over HDF5


Download ppt "Supercomputing 2006 Scientific Data Management Center Lead Institution: LBNL; PI: Arie Shoshani Laboratories: ANL, ORNL, LBNL, LLNL, PNNL Universities:"

Similar presentations


Ads by Google