Scientific Data Management (SDM)


1 Scientific Data Management (SDM)
Center For Enabling Technologies (CET) Lead Institution: LBNL Coordinating PI: Arie Shoshani

2 Scientific Data Management Center
Participating Institutions
Center PI: Arie Shoshani (LBNL)
DOE Laboratories co-PIs:
  Bill Gropp, Rob Ross* (ANL)
  Arie Shoshani, Doron Rotem (LBNL)
  Terence Critchlow*, Chandrika Kamath (LLNL)
  Nagiza Samatova* (ORNL)
Universities co-PIs:
  Mladen Vouk (North Carolina State)
  Alok Choudhary (Northwestern)
  Bertram Ludaescher, Ilkay Altintas (UC Davis + SDSC)
  Steve Parker (U of Utah)
* Area Leaders

3 A Typical SDM Scenario
Flow Tier (Control Flow Layer):
  Task A: Generate Time-Steps
  Task B: Move TS
  Task C: Analyze TS
  Task D: Visualize TS
Work Tier:
  Applications & Software Tools Layer: Simulation Program, Data Mover, Post Processing, Parallel R, Terascale Browser
  I/O System Layer: HDF5 Libraries, Parallel NetCDF, PVFS, Sabul, SRM
  Storage & Network Resources Layer

4 Technology Details by Layer
Scientific Process Automation (SPA) Layer:
  WorkFlow Management Tools
  Web Wrapping Tools
Data Mining & Analysis (DMA) Layer:
  ASPECT: integration Framework
  Data Analysis tools (PCA, ICA)
  Parallel R Statistical Analysis
  Efficient indexing (Bitmap Index)
  Efficient Parallel Visualization (pVTK)
Storage Efficient Access (SEA) Layer:
  Storage Resource Manager (To HPSS)
  Parallel NetCDF Software Layer
  ROMIO MPI-IO System
  Parallel Virtual File System
Hardware, OS, and MSS (HPSS)

5 Example Data Flow in TSI
Logistical Network Courtesy: John Blondin

6 Using the Scientific Workflow Tool (Kepler) Emphasizing Dataflow (SDSC, NCSU, LLNL)
Automate data generation, transfer and visualization of a large-scale simulation at ORNL

7 FastBit
A compressed bitmap indexing technology for efficient searching of read-only data

8 FastBit Overview
FastBit is designed to search multi-dimensional data.
Conceptually, the data is in table format: rows → objects, columns → attributes.
FastBit uses vertical (column-oriented) organization for the data, which is efficient for analysis of read-only data.
  Vertical organization examples: Kdb (1998), C-store (2005)
FastBit uses compressed bitmap indices to speed up searches.
  They are proven in analysis to be optimal for single-attribute queries.
  They are superior to other optimal indices because they are also efficient for multi-attribute queries.

9 Basic Bitmap Index
[Figure: a column of data values next to its six bitmaps b0 to b5, one bitmap per distinct value (=0, =1, =2, =3, =4, =5); each bitmap has a 1 in the rows that hold its value]
Compact: one bit per distinct value per object
Easy to build: faster than common B-trees
Efficient to query: only bitwise logical operations
  A < 2 → b0 OR b1
  2 < A < 5 → b3 OR b4
Efficient for multi-dimensional queries: use bitwise operations to combine the partial results
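The two range queries on this slide can be tried out with a minimal, uncompressed sketch. This is not FastBit's implementation or API: the column values, the second attribute B, and the helper names (build_bitmap_index, rows_of) are all made up for illustration, and each bitmap is simply a Python integer whose bit i stands for row i.

```python
from typing import Dict, List


def build_bitmap_index(values: List[int]) -> Dict[int, int]:
    """Build one bitmap per distinct value; bit i is set when values[i] == value."""
    index: Dict[int, int] = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_of(bitmap: int) -> List[int]:
    """Return the row numbers whose bit is set in the bitmap."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


# Hypothetical column A with values in 0..5 (not the data from the slide).
A = [0, 1, 5, 3, 1, 2, 0, 4, 1]
index_a = build_bitmap_index(A)

# A < 2  ->  b0 OR b1
less_than_2 = index_a[0] | index_a[1]
# 2 < A < 5  ->  b3 OR b4
between_2_and_5 = index_a[3] | index_a[4]

# Hypothetical second attribute B, used to show a multi-attribute query.
B = [2, 0, 1, 2, 2, 0, 1, 2, 0]
index_b = build_bitmap_index(B)

# (A < 2) AND (B == 2): combine the partial results with a bitwise AND.
both = less_than_2 & index_b[2]

print(rows_of(less_than_2))      # [0, 1, 4, 6, 8]
print(rows_of(between_2_and_5))  # [3, 7]
print(rows_of(both))             # [0, 4]
```

The sketch leaves out compression, which is the part that keeps real bitmap indices small when an attribute has many distinct values.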

10 Grid Collector Features
Key features of the Grid Collector:
  Providing transparent object access
  Selecting objects based on their attribute values
  Improving analysis system’s throughput
  Enabling interactive distributed data analysis
How the pieces work together (see figure): a logical request is resolved with the bitmap index, HRM brings the needed files back into cache, and the event iterator delivers the events to the analysis code.

11 Grid Collector Speeds up Analyses
Test machine: 2.8 GHz Xeon, 27 MB/s read speed
When searching for rare events (say, selecting one event out of 1000), using GC is 20 to 50 times faster.
Using GC to read 1/2 of the events gives a speedup > 1.5; reading 1/10 of the events gives a speedup > 2.

12 FastBit-Based Multi-Attribute Region Finding is Theoretically Optimal
Time required to identify regions in a 3D Supernova simulation (LBNL)
Flame Front discovery (range conditions for multiple measures) in a combustion simulation (Sandia)
On 3D data with over 110 million points, region finding takes less than 2 seconds
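The slide does not describe the region-finding algorithm itself. As a rough illustration of the task only, the sketch below selects grid cells that satisfy range conditions on two measures and then groups neighbouring selected cells into regions with a breadth-first search. It uses a plain scan rather than a bitmap index, works on a tiny 2D grid instead of 3D data, and every name and number in it (find_regions, temp, fuel, the thresholds) is hypothetical.

```python
from collections import deque
from typing import List, Tuple

Cell = Tuple[int, int]


def find_regions(temp: List[List[float]], fuel: List[List[float]],
                 temp_min: float, fuel_max: float) -> List[List[Cell]]:
    """Group cells with temp >= temp_min and fuel <= fuel_max into 4-connected regions."""
    rows, cols = len(temp), len(temp[0])
    # Step 1: evaluate the multi-attribute range condition for every cell.
    selected = [[temp[r][c] >= temp_min and fuel[r][c] <= fuel_max
                 for c in range(cols)] for r in range(rows)]
    # Step 2: label connected groups of selected cells.
    seen = [[False] * cols for _ in range(rows)]
    regions: List[List[Cell]] = []
    for r in range(rows):
        for c in range(cols):
            if not selected[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            region: List[Cell] = []
            while queue:
                cr, cc = queue.popleft()
                region.append((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and selected[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append(sorted(region))
    return regions


# Made-up 3x3 grids for two measures.
temp = [[900.0, 950.0, 100.0],
        [100.0, 100.0, 100.0],
        [100.0, 920.0, 930.0]]
fuel = [[0.1, 0.2, 0.9],
        [0.9, 0.9, 0.9],
        [0.9, 0.1, 0.8]]

regions = find_regions(temp, fuel, temp_min=900.0, fuel_max=0.5)
print(regions)  # [[(0, 0), (0, 1)], [(2, 1)]]
```

In an index-based version, step 1 would presumably be answered by combining bitmaps rather than by scanning every cell.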

13 Objects On-Demand: from Files to Object Management
A Scientific Application Partnership (SAP)
Lead Institution: BNL
Coordinating PI: Jerome Lauret

14 Participating Institutions
BNL: Jerome Lauret
LBNL: John Wu
SLAC: Andy Hanushevsky
Technologies: FastBit, SRM (DRM, HRM), xrootd

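15 Client sees all servers as xrootd data servers
Xrootd: Single Level Switch
[Figure: a Client, a Redirector (Head Node), and a cluster of Data Servers A, B, and C]
Messages shown in the figure:
  Client to Redirector: open file X
  Redirector to Data Servers: Who has file X?
  Data Server C to Redirector: I have
  Redirector to Client: go to C
  Client to Data Server C: open file X
Redirectors cache the file location, so a 2nd open of X is answered with "go to C".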
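The redirect-and-cache behaviour on this slide can be mimicked with a small in-memory sketch. It is a toy model, not xrootd code or its protocol: the DataServer and Redirector classes, the client_open helper, and the queries_sent counter are hypothetical, and the redirector polls the servers one by one where the figure suggests a single "Who has file X?" query to the cluster.

```python
from typing import Dict, List, Optional, Set


class DataServer:
    """Toy data server that knows which files it holds."""

    def __init__(self, name: str, files: Set[str]) -> None:
        self.name = name
        self.files = files

    def has_file(self, path: str) -> bool:
        return path in self.files

    def open(self, path: str) -> str:
        if path not in self.files:
            raise FileNotFoundError(path)
        return f"{path} served by {self.name}"


class Redirector:
    """Toy head node: locates a file once, then answers from its location cache."""

    def __init__(self, servers: List[DataServer]) -> None:
        self.servers = servers
        self.location_cache: Dict[str, DataServer] = {}
        self.queries_sent = 0  # counts "Who has file X?" questions

    def locate(self, path: str) -> Optional[DataServer]:
        cached = self.location_cache.get(path)
        if cached is not None:
            return cached  # 2nd open: "go to C" without asking the cluster
        for server in self.servers:
            self.queries_sent += 1
            if server.has_file(path):
                self.location_cache[path] = server
                return server
        return None


def client_open(redirector: Redirector, path: str) -> str:
    """Ask the redirector where the file is, then open it on that server."""
    server = redirector.locate(path)
    if server is None:
        raise FileNotFoundError(path)
    return server.open(path)


cluster = [DataServer("A", set()), DataServer("B", set()), DataServer("C", {"X"})]
redirector = Redirector(cluster)

first = client_open(redirector, "X")
queries_after_first = redirector.queries_sent
second = client_open(redirector, "X")

print(first)                    # X served by C
print(queries_after_first)      # 3
print(redirector.queries_sent)  # 3 (the second open used the cache)
```

From the client's side, both opens look the same: it always asks the head node and is sent to a data server, which is the point of the slide's caption.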

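16 Client sees all servers as xrootd data servers
Xrootd: Single Level Switch
[Figure: a Client, a Redirector (Head Node), and a cluster of Data Servers A, B, and C, each with a DRM; an HRM connects the cluster to the MSS (archive)]
Messages shown in the figure:
  Client to Redirector: open file X
  Redirector to Data Servers: Who has file X?
  Data Server C to Redirector: I have
  Redirector to Client: go to C
  Client to Data Server C: open file X
Redirectors cache the file location, so a 2nd open of X is answered with "go to C".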

17 Objects on-demand xrootd
How the pieces work together (see figure): a logical request is resolved with the bitmap index, HRM brings the needed files back into cache, and the event iterator delivers the events to the analysis code.

18 Storage Resource Management (SRM)
Center For Enabling Technologies (CET) Lead Institution: LBNL Coordinating PI: Alex Sim

19 Participating Institutions
BNL: Jerome Lauret
FNAL: Don Petravick, Timur Perelmutov
TJNAF: Andy Kowalski
LBNL: Alex Sim, Arie Shoshani
UCSD: Abhishek Singh Rana
U. of Wisc: Miron Livny

20 Proposed work
Development of new functional features as part of the SRM collaboration (coordinated by LBNL)
  Authorization
  Monitoring
  Performance estimation
Development of new versions of SRMs by participating institutions
  Disk systems and HPSS (LBNL)
  dCache (FNAL)
  Jasmine (TJNAF)

21 New Aspects
Development of monitoring components for bandwidth and networking availability (U. Wisc, FNAL)
  Better control of SRM behavior
  Performance estimation
Integration of Lambda station interface into the SRM middleware (FNAL)
Development of an authorization framework (UCSD)
  To enforce access privileges used by SRMs for policy declaration by VO and Sites

22 SRM Collaboration
Continued support of SRMs in experiments and projects
  ATLAS (BNL, FNAL)
  CLAS (TJNAF)
  CMS (FNAL)
  CPES (LBNL)
  ESG (LBNL)
  Lattice QCD (TJNAF, FNAL)
  Phenix (BNL)
  STAR (BNL, LBNL)
Coordination with other centers and institutes (including LCG, RAL, EGEE)
  Goal: joint specification of SRM through regular meetings, joint documents, and GGF participation

