1 Arie Shoshani, LBNL SDM center Scientific Data Management Center (Integrated Software Infrastructure Center – ISIC) Arie Shoshani All Hands Meeting March.


1 Arie Shoshani, LBNL SDM center
Scientific Data Management Center (Integrated Software Infrastructure Center – ISIC)
Arie Shoshani
All Hands Meeting, March 26-27, 2002
http://sdm.lbl.gov/sdmcenter (http://sdmcenter.lbl.gov)


3 Original Goals and Framework
- Coordinated framework for the unification, development, deployment, and reuse of scientific data management software
- Framework
  - 4 areas (+ “glue”): very large, distributed, heterogeneous, data mining (+ agent technology)
  - 4 tier levels: storage, file, dataset, federated data

4 Task Diagram
Areas:
- 1) Storage and retrieval of very large datasets
- 2) Access optimization of distributed data
- 3) Data mining and discovery of access patterns
- 4) Distributed, heterogeneous data access
- 5) Agent technology
Levels:
- a) Storage Level
- b) File Level
- c) Dataset Level
- d) Dataset Federation Level
Tasks:
- Parallel I/O: improving parallel access from clusters (ANL, NWU)
- MPI I/O: implementation based on file-level hints (ANL, NWU)
- Multi-tier metadata system for querying heterogeneous data sources (LLNL, Georgia Tech)
- Knowledge-based federation of heterogeneous databases (SDSC)
- Low level API for grid I/O (ANL)
- Optimization of low-level data storage, retrieval and transport (ORNL) [Grid Enabling Technology]
- Analysis of application-level query patterns (LLNL, NWU)
- Optimizing shared access to tertiary storage (LBNL, ORNL)
- High-dimensional indexing techniques (LBNL)
- Enabling communication among tools and data (ORNL, NCSU)
- Multi-agent high-dimensional cluster analysis (ORNL)
- Adaptive file caching in a distributed system (LBNL)
- Dimension reduction and sampling (LLNL, LBNL)

5 Diagram: Scientific Simulations & experiments, Scientific Data Management ISIC, Scientific Analysis & Discovery
- Storage: Tapes (Petabytes), Disks (Terabytes)
- Data manipulation:
  - Getting files from tape archive
  - Extracting subset of data from files
  - Reformatting data
  - Getting data from heterogeneous, distributed systems
  - Moving data over the network
- Current: data manipulation ~80% time, scientific analysis & discovery ~20% time
- Goal, using SDM-ISIC technology: data manipulation ~20% time, scientific analysis & discovery ~80% time
- Application areas: Climate Modeling, Astrophysics, Genomics and Proteomics, High Energy Physics
- SDM-ISIC technology:
  - Optimizing shared access from mass storage systems
  - Metadata and knowledge-based federations
  - API for Grid I/O
  - High-dimensional cluster analysis
  - High-dimensional indexing
  - Adaptive file caching
  - Agents
  - …
- DOE Labs: ANL, LBNL, LLNL, ORNL
- Universities: GTech, NCSU, NWU, SDSC
- Goals: optimize and simplify
  - access to very large datasets
  - access to distributed data
  - access of heterogeneous data
  - data mining of very large datasets

6 Benefits to Applications
- Efficiency
  - Example: by removing I/O bottlenecks – matching storage structures to the application
- Effectiveness
  - Example: by making access to data from tertiary storage or various sites on the data grid “transparent”, more effective data exploration is possible
- New algorithms
  - Example: by developing a more effective high-dimensional clustering technique for large datasets, discovery of new correlations is possible
- Enabling ad-hoc exploration of data
  - Example: by enabling a “run and render” capability to visualize simulation output while the code is running, it is possible to monitor and steer a long-running simulation

7 How to execute plan?
- Executive Committee
  - Made of area leaders
- Organize into projects
  - Led by area leaders
  - Common theme
  - Multiple tasks combine into common goal
  - All tasks covered (some in more than one project)
  - Initially focus on one primary application area (more is better)
  - Focus on one (or more) application scientist contacts
  - Focus on specific scenarios that represent real needs
- Conference calls
  - Every Monday
  - Cycle on Project P1-P4
  - Open to all (Arie & Ekow attend all)
- Quarterly reports
- Half-yearly all-hands

8 Organization of Projects: P1, P2, P3, P4
Areas:
- 1) Storage and retrieval of very large datasets
- 2) Access optimization of distributed data
- 3) Data mining and discovery of access patterns
- 4) Distributed, heterogeneous data access
- 5) Agent technology
Levels:
- a) Storage Level
- b) File Level
- c) Dataset Level
- d) Dataset Federation Level
Tasks:
- Parallel I/O: improving parallel access from clusters (ANL, NWU)
- MPI I/O: implementation based on file-level hints (ANL, NWU)
- Multi-tier metadata system for querying heterogeneous data sources (LLNL, Georgia Tech)
- Knowledge-based federation of heterogeneous databases (SDSC)
- Low level API for grid I/O (ANL)
- Optimization of low-level data storage, retrieval and transport (ORNL) [Grid Enabling Technology]
- Analysis of application-level query patterns (LLNL, NWU)
- Optimizing shared access to tertiary storage (LBNL, ORNL)
- High-dimensional indexing techniques (LBNL)
- Enabling communication among tools and data (ORNL, NCSU)
- Multi-agent high-dimensional cluster analysis (ORNL)
- Adaptive file caching in a distributed system (LBNL)
- Dimension reduction and sampling (LLNL, LBNL)

9 Projects and Primary Application Areas
Organized ourselves into 4 projects
- (P1) Heterogeneous Data Integration (biology)
  - LLNL, SDSC, GATECH, NCSU, ORNL
- (P2) Data Mining and Access Pattern Discovery (Climate, Astrophysics)
  - LLNL, ORNL, LBNL
- (P3) Efficient Access from Large Datasets (HENP, Combustion)
  - LBNL, ORNL
- (P4) Parallel Disk Access & Grid-IO (Astrophysics, Climate)
  - ANL, NWU, LLNL

10 Projects and Primary Application Areas
Organized ourselves into 4 projects
- (P1) Heterogeneous Data Integration (biology)
  - LLNL – Terence
  - SDSC – Amarnath, Bertram, Ilkay
  - GATECH – Ling, Calton + students
  - NCSU – Mladen + students
  - ORNL – Tom
- (P2) Data Mining and Access Pattern Discovery (Climate, Astrophysics)
  - LLNL – Chandrika, Ghaleb, Imola
  - ORNL – Nagiza, George, Tom
  - LBNL – Ekow

11 Projects and Primary Application Areas
Organized ourselves into 4 projects
- (P3) Efficient Access from Large Datasets (HENP, Combustion)
  - LBNL – John, Ekow, Arie + postdoc
  - ORNL – Randy, Dan
- (P4) Parallel Disk Access & Grid-IO (Astrophysics, Climate)
  - ANL – Bill, Rob, Rajiv
  - NWU – Alok, Wei-keng + students
  - LLNL – Ghaleb
- Area leader at large
  - Tom

12 Focus on real needs
Selected specific short term goals & scenarios
- (P1) Heterogeneous Data Integration (biology)
  - Microarray analysis workflow scenario
- (P2) Data Mining and Access Pattern Discovery (Climate, Astrophysics)
  - “Run and Render” scenario for Astrophysics
  - Dimensionality reduction for Climate model
- (P3) Efficient Access from Large Datasets (HENP)
  - STAR analysis framework
- (P4) Parallel Disk Access & Grid-IO (Astrophysics, Climate)
  - FLASH codes for Astrophysics
  - NetCDF using MPI-IO for Climate Modeling & Fusion

13 Application Scientists Contacts
Close collaboration with individuals
- Matt Coleman – LLNL (Biology)
- Tony Mezzacappa – ORNL (Astrophysics)
- Ben Santer – LLNL, John Drake – ORNL (Climate)
- Doug Olson – LBNL, Wei-Ming Zhang – Kent (HENP)
- Wendy Koegler – Sandia L. (Combustion)
- Mike Papka – ANL (Astrophysics Vis)
- Mike Zingale – U of Chicago (Astrophysics)
- John Michalakes – NCAR (Climate)

14 Organization of Meeting
- First day
  - Applications perspective on data management needs
    - Explain why the need
    - Say what hurts the most
  - Technical details of current work and existing software
    - By project
    - Talks led by Area Leaders
- Second day
  - Discuss and develop plans – 4 breakout sessions
    - Specific technical goals in next half year
    - SDM-ISIC people involved
    - Application people involved
    - Estimated schedule
    - Longer term projections (2-3 years)
    - Identify potential new applications – future focus
  - Planning
    - Conference calls – reporting
    - Intellectual property
    - CVS repositories
    - Future all-hands, September

15 Agenda – Morning, Day 1, March 26
8:00 Introduction and opening remarks (Arie Shoshani)
8:15 Comments by DOE Program Manager (John Van Rosendale)
8:30 Astrophysics Perspective (Tony Mezzacappa, ORNL)
9:15 Climate Perspective (John Drake, ORNL)
10:00–10:15 Break
10:15 HEP Perspective (Doug Olson, LBNL)
11:00 Biology Perspective (Dave Nelson, LLNL)
11:45 Putting software into production (Randy Burris, ORNL)
12:00 Lunch

16 Agenda – Afternoon
1:00 PM (P1) Heterogeneous Data Access. Area Leader: Terence Critchlow
- Supporting Heterogeneous Data Access in Genomics. Presenter: Terence Critchlow
- Context-sensitive Service Composition for Support of Scientific Workflows. Presenter: Mladen A. Vouk
- XWRAPComposer: A Wrapper Generation System for Integrating Bioinformatics Data Sources. Presenter: Ling Liu
- Constructing Workflows by Integrating Interactive Information Sources. Presenters: Amarnath Gupta & Ilkay Altintas
2:00 PM (P2) Data Mining and Access Pattern Discovery. Area Leader: Nagiza Samatova
- ASPECT: Adaptable Simulation Product Exploration and Control Toolkit. Presenter: Nagiza Samatova
- Dimension Reduction and Sampling. Presenter: Imola Fodor
- Discovery of Access Patterns to Scientific Simulation Data. Presenter: Ghaleb Abdulla
3:30 PM (P3) Efficient Access from Large Datasets. Area Leader: Arie Shoshani
- Supporting Ad-hoc Data Exploration for Large Scientific Databases. Presenter: Arie Shoshani
- Efficient Bitmap Indexing Techniques for Very Large Datasets. Presenter: John Wu
- Shared Disk File Caching Taking into Account Delays in Space Reservations, Transfer, and Processing. Presenter: Ekow Otoo
- Optimizing Shared Access to Tertiary Storage. Presenter: Randy Burris
4:30 PM (P4) Parallel Disk Access & Grid-IO. Area Leaders: Bill Gropp and Alok Choudhary
- Parallel and Grid I/O Infrastructure. Presenter: Rob Ross
- Enabling High Performance Application I/O. Presenter: Wei-keng Liao
5:30 PM Comments from application people (1 hour, free-form discussion)

17 Agenda – Day 2
8:00 Welcome and logistics
8:30 Recap and planning
9:30 Project breakout meetings (2 hours)
- Specific technical goals in next half year
- SDM-ISIC people involved
- Application people involved
- Estimated schedule
- Longer term projections (2-3 years)
- Identify potential new applications – future focus
Lunch
1:00 Project breakout meetings (2 hours)
3:00 Summary of meetings (2 hours, 30 min per project)
5:00 Conclusion and planning

