D. Olson, SDM-ISIC Mtg, 26 Mar 2002. Scientific Data Management: An Incomplete Experimental HENP Perspective. D. Olson, LBNL, 26 March 2002, SDM-ISIC Meeting.

1 Scientific Data Management: An Incomplete Experimental HENP Perspective
D. Olson, LBNL
26 March 2002
SDM-ISIC Meeting, Gatlinburg

2 Particle Physics Data Grid
www.ppdg.net
PIs: Mount, Livny, Newman
Coordinators: Pordes, Olson

3 Contents
Quick overview of HENP data
  - Generic data flow
  - Sizes, timescales
  - Average physicist view
What’s hard
  - Making technology work in production
  - A clear view for average physicist
  - Analysis of large datasets
  - Other things as well
Today, many issues wrapped in hopes for “Data Grid”

4 Experimental HENP event data
Basic character of data is “event”
  - May be few particles

5 BaBar event
http://www.slac.stanford.edu/BFROOT/

6 Experimental HENP event data
Basic character of data is “event”
  - May be few particles
  - May be MANY particles

7 STAR event, Au + Au
www.star.bnl.gov

8 Experimental HENP event data
Basic character of data is “event”
  - May be few tracks
  - May be MANY tracks
Detector characteristics, beam types, and triggers affect the type of events recorded
Physics analysis is a statistical analysis of many (1000’s, M’s, B’s, T’s) independent events

9 Generic data flow in HENP
“Skims”, “microDST production”, …
Filtering chosen to make this a convenient size

10 A collaboration of people
$100M, 10 yr, 100 people
Free?, 10 yr, 20 people
Free?, 1 yr, 10 people, 5x/yr
Free?, 1 mo, 1 person, 50x/yr
(“Typical” example today; LHC is larger)

11 Example: CMS Tiers

12 List of major accelerator-based HENP experiments
Experiment    Location     # physicists  Time scale
BaBar         SLAC         800           1999 - 2010
STAR          BNL / RHIC   450           2000 - 2010
PHENIX        BNL / RHIC   450           2000 - 2010
Jlab/CLAS     JLAB         200           2000 - 2010
CDF           FNAL         800           1995 - 2010
D0            FNAL         800           1995 - 2010
ATLAS         CERN         2000          2006 - 2016
CMS           CERN         2000          2006 - 2016
ALICE         CERN         1200          2007 - 2017
Jlab Hall D   JLAB         200           2008 - 2018

13 Size / frequency of basic activities
Size (TB) / Frequency (/yr)
Item                   Typical today                   LHC era (>5 yr)
Raw data               100 TB / yr                     1,000 TB / yr
Event reconstruction   3 / yr                          2 / yr
DST data               1 > DST/raw > 0.1               0.1 > DST/raw > 0.02
microDST production    0.1 > microDST/DST > 0.001      ?
Physics analysis       10 - 100 * #physicists / year   ?
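The ratios in this table can be turned into rough data volumes. The following is a minimal sketch, assuming the "typical today" column and reading each inequality as a range of possible ratios; the function name `derived_range_tb` is illustrative and does not come from the talk.

```python
def derived_range_tb(parent_low_tb, parent_high_tb, ratio_low, ratio_high):
    """Return the (low, high) size in TB of a derived dataset, given the
    size range of its parent and the range of the derived/parent ratio."""
    return parent_low_tb * ratio_low, parent_high_tb * ratio_high


RAW_TB_PER_YEAR = 100.0  # "typical today" raw data volume from the table

# DST/raw lies between 0.1 and 1 for a typical experiment today.
dst_low, dst_high = derived_range_tb(RAW_TB_PER_YEAR, RAW_TB_PER_YEAR, 0.1, 1.0)

# microDST/DST lies between 0.001 and 0.1.
micro_low, micro_high = derived_range_tb(dst_low, dst_high, 0.001, 0.1)

print(f"DST for one year of raw data:      {dst_low:g} to {dst_high:g} TB")
print(f"microDST for one year of raw data: {micro_low:g} to {micro_high:g} TB")
```

Under these assumptions a DST is 10 to 100 TB and a microDST is roughly 0.01 to 10 TB per year of raw data, which spans from a single disk to a sizeable disk array.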

14 Average physicist view
Mythology, culture, terminology vary a lot from one experiment to another.
BaBar
  - Object view or primary event store (Objectivity)
  - Event collection objects give primary access points to data
      Event collection has list of references to all event components of interest
      With 100,000 collections, how to organize them?
  - Ntuples & PAW for final data format, analysis tool
STAR (first year data, getting started)
  - A “production, trigger” is all reconstructed events for a trigger type with a certain version of code (P00hg, central)
  - Access point is list of directory paths below which all data are stored on disk
  - WZ will be setting up STACS
  - ROOT for data format and analysis tool
…
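The two access styles on this slide can be contrasted as data structures. This is a sketch only: the class names, fields, and the `*.root` file pattern are illustrative assumptions, not BaBar or STAR software.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator, List


@dataclass(frozen=True)
class EventRef:
    """Hypothetical reference to the stored components of one event."""
    run: int
    event: int


@dataclass
class EventCollection:
    """Collection-style access point: a named list of event references."""
    name: str
    refs: List[EventRef] = field(default_factory=list)


@dataclass
class Production:
    """Directory-style access point: a (production, trigger) pair plus the
    directory paths below which all of its files are stored."""
    production: str
    trigger: str
    roots: List[Path] = field(default_factory=list)

    def files(self, pattern: str = "*.root") -> Iterator[Path]:
        """Yield every file matching `pattern` below the listed directories."""
        for root in self.roots:
            yield from sorted(root.rglob(pattern))
```

In the first style the physicist names events directly; in the second the physicist names a place on disk and has to discover what is there.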

15 What’s hard I, living with technology
Typical computer center today
  - A couple STK Powderhorn tape silos, HPSS or home-grown MSS
  - 1000 Linux processors
  - Assortment of 100/1000 Mbps network
  - 50 TB disk (1000 spindles)
  - Network s/w for I/O (NFS, Objy AMS, RFIO, …)
  - AFS for distributed collaboration
Can make large RAID filesystems w/ network access
  - Faults can affect many nodes
      stale NFS file handles
      AFS faults affect nodes across country, work
  - Large RAID is $$$
Desire to reduce effect of faults
  - Fewer faults
  - More tolerance
…

16 What’s hard II, A clear view for average physicist
What’s going on in this box?

17 What’s hard II, A clear view for average physicist
What data is available?
  - “data” means
      List of files? (like STAR)
      Collection object w/ pointers to all events? (like BaBar)
  - “available” means
      On disk? Where?
      Exists?
      Does it really have the filters and calibrations I need?
      Is it the “official” version of the data?
…
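The questions on this slide amount to a set of predicates over a dataset catalog. A minimal sketch, assuming a hypothetical in-memory catalog; the record fields, dataset names, and paths are invented for illustration and do not describe any experiment's actual catalog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    name: str
    exists: bool
    disk_location: Optional[str]  # None means not on disk (for example, tape only)
    filters: frozenset
    calibration: str
    official: bool


def find_available(catalog, needed_filters, calibration, official_only=True):
    """Return the records that exist, are on disk, carry all needed filters,
    use the requested calibration, and (optionally) are the official version."""
    needed = frozenset(needed_filters)
    return [
        r for r in catalog
        if r.exists
        and r.disk_location is not None
        and needed <= r.filters
        and r.calibration == calibration
        and (r.official or not official_only)
    ]


# Hypothetical catalog entries.
catalog = [
    DatasetRecord("central-v1", True, "/data/a", frozenset({"central"}), "cal-1", True),
    DatasetRecord("central-v2", True, None, frozenset({"central"}), "cal-1", True),
    DatasetRecord("central-test", True, "/data/b", frozenset({"central"}), "cal-1", False),
]

for record in find_available(catalog, {"central"}, "cal-1"):
    print(record.name, record.disk_location)
```

Only `central-v1` passes every check here: `central-v2` is not on disk and `central-test` is not the official version. Each predicate corresponds to one question a physicist currently has to answer by hand.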

18 What’s hard III, Analysis of large datasets
Dataset does not fit on disk, or requires parallel processing, or is large enough operation that chance of fault is high
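Why a large operation makes a fault likely can be seen with a short calculation. This sketch assumes that faults in different tasks are independent and equally likely, which is a simplification; the task count and per-task fault probability are illustrative numbers, not measurements from the talk.

```python
def prob_any_fault(n_tasks, p_fault_per_task):
    """Probability that at least one of n independent tasks hits a fault."""
    return 1.0 - (1.0 - p_fault_per_task) ** n_tasks


# Illustrative numbers only: an analysis split into 1000 tasks,
# each with a 0.1% chance of hitting a fault.
p = prob_any_fault(1000, 0.001)
print(f"Chance that at least one task faults: {p:.1%}")
```

With these numbers the chance of at least one fault is about 63%, even though each individual task is very reliable.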

19 What’s hard III, Analysis of large datasets
Dataset does not fit on disk
  - Needs access s/w to couple w/ processing
      SAM, STACS
  - Does performance meet demand?

20 SAM (Sequential data Access via Meta-data)
http://d0db.fnal.gov/sam/

21 STACS
http://sdm.lbl.gov/projectindividual.php?ProjectID=STACS

22 What’s hard III, Analysis of large datasets
Dataset does not fit on disk
  - Needs access s/w to couple w/ processing
      SAM, STACS
  - Does performance meet demand?
Needs parallel processing (not very hard)
  - Cannot do analysis on private/personal machine
  - Schedule access to shared resource (CPU and disk)
Operation for a single analysis is large enough that faults occur
  - Need exception handling
  - Need workflow management to complete failed tasks or, at least, accurately report status
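The last two points, exception handling and workflow management that completes failed tasks or reports status accurately, can be sketched in a few lines. This is an illustrative toy, not the design of SAM, STACS, or any grid workflow system; the function name `run_with_retries` and the report layout are assumptions.

```python
def run_with_retries(tasks, max_attempts=3):
    """Run each named task, retrying on failure, and return a status report.

    `tasks` maps a task name to a zero-argument callable. The report maps
    each name to its final status, the number of attempts made, and the
    last error seen (None on success).
    """
    report = {}
    for name, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            try:
                task()
            except Exception as exc:  # broad on purpose: any fault counts
                report[name] = {"status": "failed", "attempts": attempt,
                                "error": repr(exc)}
            else:
                report[name] = {"status": "done", "attempts": attempt,
                                "error": None}
                break  # stop retrying once the task succeeds
    return report
```

A task that fails on every attempt ends up in the report as `failed` with its last error, so the physicist at least learns which part of the analysis is incomplete instead of discovering it later from missing output.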

23 Example shared nothing cluster
http://www.ihep.ac.cn/~chep01/paper/4-026.pdf

24 PPDG

25 Summary
Faulty technology sets boundary conditions
  - Fault tolerance will expand boundaries of capabilities
Data management is coupled with processing
  - Visualization (access w/o processing) is minor in HENP
  - Need access to data when & where it is needed for processing
Working on data grid as context for data management
PPDG has SDM ISIC as one of the technology base projects

