Scientific Data Management Center (SDM-ISIC)
Arie Shoshani, Computing Sciences Directorate, Lawrence Berkeley National Laboratory


1. Scientific Data Management Center (SDM-ISIC)
Arie Shoshani, LBNL SDM center
Computing Sciences Directorate, Lawrence Berkeley National Laboratory
http://sdm.lbl.gov/sdmcenter

2. Participants
Center Director: Arie Shoshani
DOE Laboratories:
  ANL: Bill Gropp (coordinating PI), Rob Ross
  LBNL: Ekow Otoo, Arie Shoshani (coordinating PI)
  LLNL: Terence Critchlow (coordinating PI)
  ORNL: Randy Burris, Thomas Potok (coordinating PI)
Universities:
  Georgia Institute of Technology: Ling Liu, Calton Pu (coordinating PI)
  North Carolina State University: Mladen Vouk (coordinating PI)
  Northwestern University: Alok Choudhary (coordinating PI), Wei-Keng Liao
  UC San Diego (Supercomputer Center): Amarnath Gupta, Reagan Moore (coordinating PI)

3. Original Goals and Framework
Coordinated framework for the unification, development, deployment, and reuse of scientific data management software
Framework:
  4 areas:
    Very large databases
    Distributed databases
    Heterogeneous databases
    Data mining (+ agent technology)
  4 tier levels:
    Storage level
    File level
    Dataset level
    Federated data level

4. Master Diagram
Areas:
  1) Storage and retrieval of very large datasets
  2) Access optimization of distributed data
  3) Data mining and discovery of access patterns
  4) Distributed, heterogeneous data access
  5) Agent technology
Tier levels:
  a) Storage Level
  b) File Level
  c) Dataset Level
  d) Dataset Federation Level
Projects shown in the diagram:
  Parallel I/O: improving parallel access from clusters (ANL, NWU)
  MPI I/O: implementation based on file-level hints (ANL, NWU)
  Multi-tier metadata system for querying heterogeneous data sources (LLNL, Georgia Tech)
  Knowledge-based federation of heterogeneous databases (SDSC)
  Low-level API for grid I/O (ANL)
  Optimization of low-level data storage, retrieval and transport (ORNL)
  [Grid Enabling Technology]
  Analysis of application-level query patterns (LLNL, NWU)
  Optimizing shared access to tertiary storage (LBNL, ORNL)
  High-dimensional indexing techniques (LBNL)
  Enabling communication among tools and data (ORNL, NCSU)
  Multi-agent high-dimensional cluster analysis (ORNL)
  Adaptive file caching in a distributed system (LBNL)
  Dimension reduction and sampling (LLNL, LBNL)

5. Scientific Data Management ISIC (diagram)
Diagram elements: Scientific Simulations & Experiments; Tapes and Disks (Petabytes, Terabytes); Data Manipulation; Scientific Analysis & Discovery
Data Manipulation:
  Getting files from tape archive
  Extracting subsets of data from files
  Reformatting data
  Getting data from heterogeneous, distributed systems
  Moving data over the network
Current: Data Manipulation ~80% of time, Scientific Analysis & Discovery ~20% of time
Goal (using SDM-ISIC technology): Data Manipulation ~20% of time, Scientific Analysis & Discovery ~80% of time
Applications:
  Climate Modeling
  Astrophysics
  Genomics and Proteomics
  High Energy Physics
SDM-ISIC Technology:
  Optimizing shared access from mass storage systems
  Metadata and knowledge-based federations
  API for Grid I/O
  High-dimensional cluster analysis
  High-dimensional indexing
  Adaptive file caching
  Agents
  …
DOE Labs: ANL, LBNL, LLNL, ORNL
Universities: GTech, NCSU, NWU, SDSC
Goals: optimize and simplify
  access to very large datasets
  access to distributed data
  access to heterogeneous data
  data mining of very large datasets

6. Benefits to Applications
Efficiency
  Example: by removing I/O bottlenecks (matching storage structures to the application)
Effectiveness
  Example: by making access to data from tertiary storage or various sites on the data grid "transparent", more effective data exploration is possible
New algorithms
  Example: by developing a more effective high-dimensional clustering technique for large datasets, discovery of new correlations is possible
Enabling ad-hoc exploration of data
  Example: by enabling a "run and render" capability to visualize simulation output while the code is running, it is possible to monitor and steer a long-running simulation

7. Current Projects
1) High-Dimensional Clustering
Target applications: Astrophysics, Climate Modeling
Institutions: LLNL, ORNL
Scientific problem targeted: to understand the mechanism(s) behind core-collapse supernovae, it is crucial to explore and quantify:
  The correlations between the neutrino flux and stellar core convection
  The correlations between convection and spatial dimensionality
  The correlations between convection and rotation
  Contact: Anthony Mezzacappa, ORNL
Scientific problem targeted: separating volcano and ENSO (El Niño Southern Oscillation) signals from the rest of the climate data to study variability in temperature
  Contact: Ben Santer, PCMDI, LLNL

8. Current Projects
2) Efficient Parallel I/O to Disk Storage
Target application: Astrophysics
Institutions: ANL, NWU, LLNL
Scientific problem targeted: astrophysics simulation code (FLASH). Early production runs spent as much as half of their time writing checkpoint and visualization data
  Contact: Mike Zingale, U of Chicago
Scientific problem targeted: improving parallel I/O efficiency for tiled displays, a popular medium for collaborative viewing of high-resolution visualizations of astrophysics data
  Contact: Mike Papka, ANL
Scientific problem targeted: query pattern analysis for astrophysics star data, devising a disk layout for the data such that overall data access time across multiple applications and users is reduced
  Contact: LLNL
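
One common ingredient of writing checkpoint data in parallel is to give every process its own disjoint byte range in a single shared file, so that all processes can write at once without coordinating on each write. The sketch below shows only that offset calculation, simulated in one Python process. It is not the FLASH code or the project's implementation; the function names and block contents are hypothetical.

```python
from itertools import accumulate


def shared_file_offsets(block_sizes):
    """Return one (offset, size) pair per process for a shared file.

    block_sizes[i] is the number of bytes process i wants to write.
    Offsets are an exclusive prefix sum of the sizes, so the byte
    ranges are contiguous and never overlap.
    """
    offsets = [0] + list(accumulate(block_sizes))[:-1]
    return list(zip(offsets, block_sizes))


def write_checkpoint(image, blocks):
    """Simulate every process writing its block into one file image."""
    layout = shared_file_offsets([len(b) for b in blocks])
    for (offset, size), block in zip(layout, blocks):
        image[offset:offset + size] = block
    return layout


# Three hypothetical processes with differently sized blocks.
blocks = [b"aaaa", b"bb", b"cccccc"]
image = bytearray(sum(len(b) for b in blocks))
layout = write_checkpoint(image, blocks)
```

In a real parallel code each process would compute only its own offset, typically with a prefix-sum collective, and hand the write to the parallel I/O layer.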

9. Current Projects
3) Providing transparent access to grid data
Target application: High Energy Physics
Institutions: LBNL, ORNL
Scientific problem targeted: given a logical request (expressed on event attributes), get relevant data from grid sites and tertiary storage to application code without human intervention
  Contact: Doug Olson, LBNL
  Contact: Stephen Gowdy, SLAC
  Contact: Jackie Chan, Sandia Livermore (combustion)
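
The idea of a logical request can be illustrated with a small sketch: the application states a condition on event attributes, and a resolver works out which files hold matching events and where each file currently lives. Everything here is an assumption made for illustration (the `Event` fields, the file names, the location labels, and the `plan_request` function); the slides do not describe the actual interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: int
    energy: float    # hypothetical event attribute
    n_tracks: int    # hypothetical event attribute
    file_name: str   # file that stores this event


# Hypothetical replica catalog: file name -> where a copy currently lives.
REPLICAS = {
    "run1.dat": "disk",
    "run2.dat": "tape",
    "run3.dat": "remote-site",
}


def plan_request(events, predicate, replicas):
    """Resolve a logical request into file lists grouped by location.

    predicate is a function over event attributes; the caller never
    names files or sites.
    """
    needed = sorted({e.file_name for e in events if predicate(e)})
    plan = {}
    for name in needed:
        location = replicas.get(name, "unknown")
        plan.setdefault(location, []).append(name)
    return plan


events = [
    Event(1, 12.5, 3, "run1.dat"),
    Event(2, 48.0, 7, "run2.dat"),
    Event(3, 51.2, 9, "run2.dat"),
    Event(4, 60.1, 2, "run3.dat"),
    Event(5, 8.3, 1, "run3.dat"),
]

# Logical request: all events with energy above 40.
plan = plan_request(events, lambda e: e.energy > 40, REPLICAS)
```

The resulting plan separates files that must be staged from tape from files that must be transferred from another site, which is the step that would otherwise need human intervention.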

10. Current Projects
4) Heterogeneous Data Federation
Target application: Biology
Institutions: LLNL, SDSC, GTU, NCSU, ORNL
Scientific problem targeted: to develop our infrastructure in support of cancer researchers at LLNL, who expect to use it to help identify genes that respond to low doses of radiation. This problem is difficult because the information required by the scientists is spread across many independent, web-based data sources, each using its own interfaces and data formats
  Contact: Matt Coleman, LLNL
