Presentation is loading. Please wait.

Presentation is loading. Please wait.

Presented by Scientific Data Management Center Nagiza F. Samatova Network and Cluster Computing Computer Sciences and Mathematics Division.

Similar presentations


Presentation on theme: "Presented by Scientific Data Management Center Nagiza F. Samatova Network and Cluster Computing Computer Sciences and Mathematics Division."— Presentation transcript:

1 Presented by Scientific Data Management Center Nagiza F. Samatova Network and Cluster Computing Computer Sciences and Mathematics Division

2 2 Samatova_SDM_0611 Scientific Data Management Center SciDAC Review, Issue 2, Fall 2006 Illustration: A. Tovey Lead Institution: LBNL PI: Arie Shoshani Laboratories: ANL, ORNL, LBNL, LLNL, PNNL Universities: NCSU, NWU, SDSC, UCD, U. Utah Established 5 years ago (SciDAC-1) Successfully re-competed for next 5 years (SciDAC-2) Featured in Fall 2006 issue of SciDAC Review magazine

3 3 Samatova_SDM_0611 SDM Infrastructure Uses three-layer organization of technologies Scientific Process Automation (SPA) Data Mining and Analysis (DMA) Storage Efficient Access (SEA) Operating system Hardware (e.g., Cray XT3, IBM Blue/Gene L) Operating system Hardware (e.g., Cray XT3, IBM Blue/Gene L) Integrated approach: To provide a scientific workflow capability To support data mining and analysis tools To accelerate storage and access to data Benefits scientists by Hiding underlying parallel and indexing technology Permitting assembly of modules using workflow description tool Goal: Reduce data management overhead

4 4 Samatova_SDM_0611 Automating scientific workflow in SPA enables mining of diverse biology databases Scientific discovery is a multi-step process. SPA-Kepler workflow system automates and manages this process. Illustration: A. Tovey Tasks that required hours or days can now be completed in minutes, allowing biologists to spend their time saved on science Contact: Terence Critchlow, LLNL (critchlow1@llnl.gov)

5 5 Samatova_SDM_0611 Data analysis for fusion plasma Plot of orbits in cross-section of a fusion experiment shows different types of orbits, including circle-like “quasi-periodic orbits” and “island orbits.” Characterizing the topology of orbits is challenging, as experimental and simulation data are in the form of points rather than a continuous curve. We are successfully applying data mining techniques to this problem. Feature selection techniques used to identify key parameters relevant to the presence of edge harmonic oscillations in the DIII-D tokomak. Contact: Chandrika Kamath, LLNL (kamath2@llnl.gov) Error 0.16 0.18 0.2 0.2 2 0.2 4 0.2 6 0.2 8 0.3 0 510 15 20 2530 35 Features PCA Filter Distance ChiSquare Stump Boosting

6 6 Samatova_SDM_0611 Searching and indexing with FastBit Gleaning insights about combustion simulation About FastBit: Extremely fast search of large databases Outperforms commercial software Used by various applications: combustion, STAR, astrophysics visualization Collaborators: SNL: J. Chen, W. Doyle NCSU: T. Echekki Illustration: A. Tovey Finding & tracking of combustion flame fronts Contact: John Wu, LBNL (kwu@lbl.gov) Searching for regions that satisfy particular criteria is a challenge. FastBit efficiently finds regions of interest.

7 7 Samatova_SDM_0611 Searches in STAR experiment Finding one event out of a million Illustration: A. Tovey STAR scientists extract data of interest within 15 minutes. Accomplishing the same task before consumed weeks. FastBit and Storage Resource Manager enable efficient mining of hundreds of terabytes of data generated by STAR detector Contact: John Wu, LBNL (kwu@lbl.gov)

8 8 Samatova_SDM_0611 Parallel statistical computing with pR Goal: Provide scalable high-performance statistical data analysis framework to help scientists perform interactive analyses of produced data to extract knowledge Able to use existing high-level (i.e., R) code Requires minimal effort for parallelizing Offers identical interface Provides efficient and scalable performance Contact: Nagiza Samatova, ORNL (samatovan@ornl.gov)

9 9 Samatova_SDM_0611 Parallel input/output Scaling computational science Multi-layer parallel I/O design: Supports Parallel-netCDF library built on top of MPI-IO implementation called ROMIO, built in turn on top of Abstract Device Interface for I/O system, used to access parallel storage system. Benefits to scientists: Brings performance, productivity, and portability Improves performance by order of magnitude Orchestration of data transfers and speedy analyses depends on efficient systems for storage, access, and movement of data among modules. Illustration: A. Tovey Contact: Rob Ross, ANL (rross@mcs.anl.gov)

10 10 Samatova_SDM_0611 Speeding data transfer with PnetCDF P0 P1 P2 P3 netCDF Parallel File System Parallel netCDF P0 P1 P2 P3 Parallel File System Illustration: A. Tovey Rate of data transfer using HDF5 decreases when a particular problem is divided among more processors. In contrast, parallel version of netCDF improves because of low- overhead nature of PnetCDF and its tight coupling to MPI-IO. Enables high performance parallel I/O to netCDF data sets Achieves up to 10-fold performance improvement over HDF5 Contact: Rob Ross, ANL (rross@mcs.anl.gov) Inter-process communication

11 11 Samatova_SDM_0611 Contact Nagiza Samatova Network and Cluster Computing Computer Science and Mathematics Division (865) 241-4351 samatovan@ornl.gov 11 Samatova_SDM_0611


Download ppt "Presented by Scientific Data Management Center Nagiza F. Samatova Network and Cluster Computing Computer Sciences and Mathematics Division."

Similar presentations


Ads by Google