Data Analysis for SLAC Physics
Richard P. Mount
CHEP 2000, Padova, February 10, 2000
Some Hardware History
– 1994: Still IBM mainframe dominated; AIX farm growing (plus SLD VAXes)
– 1996: Tried to move SLD to the AIX Unix farm
– 1997: The rise of Sun -- farm plus SMP
– 1998: Sun E10000 plus farm plus 'datamovers'; remove IBM mainframe
– 1999: Bigger E10000, 300 Ultra 5s, more datamovers
– 2000: E10000, 700+ farm machines, tens of datamovers etc. (plus SLD VAXes)
Some non-Hardware History
Historical approaches:
– Offline computing for SLAC experiments was not included explicitly in the cost of constructing or operating the experiments;
– SLAC Computing Services (SCS) was responsible for running systems (only);
– Physics groups were responsible for software tools.
Some things have changed...
BaBar Data Analysis
– 6 STK Powderhorn silos with 20 'Eagle' drives
– Tapes managed by HPSS
– Data access mainly via Objectivity
STK Powderhorn Silo
HPSS: High Performance Storage System (Andy Hanushevsky/SLAC)
Components: Bitfile Server, Name Server, Storage Servers, Physical Volume Library, Physical Volume Repositories, Storage System Manager, Migration/Purge Server, Metadata Manager, Log Daemon, Log Client, Startup Daemon, Encina/SFS, DCE; linked by a Control Network and a Data Network.
HPSS at SLAC (Andy Hanushevsky/SLAC)
(Deployment diagram: the same HPSS components listed above, Bitfile Server through DCE, with separate control and data networks, as installed at SLAC.)
Objectivity DB in BaBar (Andy Hanushevsky/SLAC)
Datamover architecture: the oofs interface layered on top of the file system interface.
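The flattened diagram above layers the oofs interface over an ordinary file-system interface on each Datamover, with HPSS behind it. Below is a rough sketch of that layering; the class, paths and the staging step are illustrative assumptions, not the real OOFS/AMS code running on the datamovers.

```python
# Illustrative sketch only: the real oofs layer lives inside the
# Objectivity AMS on the datamovers; every name here is invented.
import os
import shutil

class OofsLikeLayer:
    """Serve a database file through the plain file-system interface,
    staging it from mass storage on a cache miss."""

    def __init__(self, disk_cache_dir, hpss_staging_dir):
        self.disk_cache_dir = disk_cache_dir      # local disk on the datamover
        self.hpss_staging_dir = hpss_staging_dir  # stand-in for an HPSS transfer

    def open(self, db_name, mode="rb"):
        local_path = os.path.join(self.disk_cache_dir, db_name)
        if not os.path.exists(local_path):
            # Placeholder for a real HPSS stage-in (client tool or API call).
            os.makedirs(self.disk_cache_dir, exist_ok=True)
            shutil.copy(os.path.join(self.hpss_staging_dir, db_name), local_path)
        # Below the oofs layer, access is the ordinary file-system interface.
        return open(local_path, mode)
```

As I read the diagram, the point of the layering is that Objectivity clients only ever see files; whether a database is on datamover disk or still on tape is hidden beneath the oofs interface.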
Principal Data Flows
(Diagram: federated databases at IR2 (conditions, configuration, ambient data), at OPR / Prompt Reconstruction (events, conditions, configuration) and for Analysis (events, conditions, configuration), plus HPSS (events, conditions, etc.); IR2 is at the interaction region, OPR and Analysis are in the Computer Center.)
Database 'Sweeps'
(Same diagram, with database sweeps between the federations: one sweep runs daily, the other twice a week.)
OPR to Analysis 'Sweep'
1) Flush OPR databases (tag, collection, ...) to HPSS
2) 'diff' the Analysis and OPR federation catalogs
3) Stage in (some) missing Analysis databases from HPSS
4) Attach new databases to the Analysis federation
About 200 GBytes are moved per sweep; a further 1 TByte per sweep is left in HPSS but attached to the Analysis federation. Currently takes about 6 hours; an achievable target is < 30 minutes. Note that it takes at least 3 hours to stage in 1 TB using 10 tape drives. (A sketch of these steps, with the staging arithmetic, follows below.)
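A minimal, self-contained sketch of the four sweep steps, with federation catalogs modelled as plain sets of database names; every function and database name here is illustrative, not BaBar's actual tooling. The final lines spell out the staging arithmetic quoted on the slide, assuming the 'Eagle' drives deliver roughly 10 MB/s each (an assumption on my part).

```python
# Sketch of the OPR -> Analysis sweep; all names are invented.

def sweep(opr_catalog, analysis_catalog, hot_databases,
          flush_to_hpss, stage_from_hpss, attach):
    # 1) Flush OPR databases (tag, collection, ...) to HPSS.
    for db in sorted(opr_catalog):
        flush_to_hpss(db)

    # 2) "diff" the Analysis and OPR federation catalogs.
    missing = opr_catalog - analysis_catalog

    # 3) Stage in (some of) the missing databases from HPSS;
    #    the rest stay tape-resident but become known to the federation.
    for db in sorted(missing & hot_databases):
        stage_from_hpss(db)

    # 4) Attach the new databases to the Analysis federation.
    for db in sorted(missing):
        attach(db)
    return missing

# Toy run:
sweep(
    opr_catalog={"tagDB_0042", "collDB_0042", "eventDB_0042"},
    analysis_catalog={"tagDB_0041"},
    hot_databases={"tagDB_0042", "collDB_0042"},
    flush_to_hpss=lambda db: print("flush", db, "to HPSS"),
    stage_from_hpss=lambda db: print("stage", db, "to Analysis disk"),
    attach=lambda db: print("attach", db, "to Analysis federation"),
)

# Staging arithmetic from the slide (the ~10 MB/s drive rate is an assumption):
drives, mb_per_s = 10, 10
print(1_000_000 / (drives * mb_per_s) / 3600, "hours to stage 1 TB")  # ~2.8 hours
```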
BaBar Offline Systems: August 1999
Datamovers, end 1999
– Datamove4: OPR
– Datamove5: OPR
– Datamove1: Reconstruction (real + MC)
– Datamove6: Reconstruction (real + MC)
– Datamove3: Export
– Datamove2: RAW, REC managed stage-in
– Datamove9: RAW, REC anarchistic stage-in
– Shire (E10k): Physics Analysis (6 disk arrays)
– Datamove7: Testbed
– Datamove8: Testbed
Most are 4-processor Sun SMPs with two disk arrays (0.5 or 0.8 TB each).
SLAC-BaBar Data Analysis System: 50 simultaneous / 400 total physicists, 300 TBytes per year
Problems, August-October 1999: Complex Systems, Lots of Data
– OPR could not keep up with the data: blamed on Objectivity (partially true)
– Data analysis painfully slow: blamed on Objectivity (partially true)
– Linking BaBar code took forever: blamed on SCS, Sun, AFS, NFS and even BaBar
– The Sun E10000 had low reliability and throughput: blamed on AFS (reliability) and Objectivity (throughput)...
BaBar Reconstruction Production: Performance Problems with Early Database Implementation
Fixing the "OPR Objectivity Problem": BaBar Prompt Reconstruction Throughput (Test System)
Fixing Physics Analysis "Objectivity Problems": Ongoing Work
– Applying fixes found in the OPR testbed
– Use of the Analysis systems and BaBar physicists as an Analysis testbed
– Extensive instrumentation is essential
– A current challenge:
  – Can we de-randomize disk access (by tens of physicists and hundreds of jobs)?
  – Partial relief is now available by making real copies of popular collections (see the sketch below)
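One way to read the "de-randomize disk access" challenge: with tens of physicists and hundreds of jobs each walking a different collection, a datamover sees a random mix of reads across many database files. The sketch below batches pending reads by file and sorts them by offset so each file is swept sequentially; it is purely illustrative and not how the AMS actually schedules I/O.

```python
# Illustration only: group pending read requests by database file and
# sort by offset, so each file is read in one pass instead of random seeks.
from collections import defaultdict

def derandomize(requests):
    """requests: iterable of (db_file, offset, nbytes) from many jobs."""
    by_file = defaultdict(list)
    for db_file, offset, nbytes in requests:
        by_file[db_file].append((offset, nbytes))
    schedule = []
    for db_file in sorted(by_file):                      # one file at a time
        schedule += [(db_file, off, n) for off, n in sorted(by_file[db_file])]
    return schedule

pending = [("eventDB_7", 4096, 8192), ("eventDB_3", 0, 8192),
           ("eventDB_7", 0, 8192), ("eventDB_3", 65536, 8192)]
for req in derandomize(pending):
    print(req)
```

The "real copies of popular collections" on the slide is the complementary relief: presumably concurrent readers then hit different disk arrays instead of competing for the same one.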
Extensive (but still insufficient) Instrumentation
– 2 days of traffic on one Datamove machine
– 6 weeks of traffic on one Tapemove machine
Kanga, the BaBar "Objectivity-Free" Root-I/O-based Alternative
– Aimed at final stages of data analysis
– Easy for universities to install
– Supports the BaBar analysis framework
– A very successful validation of the insulating power of the BaBar transient-persistent interface (sketched below)
– Nearly working
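The slide credits the transient-persistent interface for making an "Objectivity-free" store possible. Here is a minimal sketch of that idea: analysis code handles only a transient event class, while the persistence backend (Objectivity or Root I/O in BaBar's case) sits behind a small abstract interface. All class names are invented for illustration.

```python
# Sketch of a transient/persistent split; class names are invented here.
from abc import ABC, abstractmethod

class TransientEvent:
    """What analysis code manipulates; knows nothing about storage."""
    def __init__(self, run, n_tracks):
        self.run, self.n_tracks = run, n_tracks

class PersistentStore(ABC):
    """The insulating interface: Objectivity or Root I/O would sit behind it."""
    @abstractmethod
    def write(self, event): ...
    @abstractmethod
    def read_all(self): ...

class InMemoryStore(PersistentStore):
    """Toy backend standing in for a Kanga-like (Root I/O) implementation."""
    def __init__(self):
        self._rows = []
    def write(self, event):
        self._rows.append((event.run, event.n_tracks))
    def read_all(self):
        return [TransientEvent(run, n) for run, n in self._rows]

store = InMemoryStore()            # swap the backend without touching analysis code
store.write(TransientEvent(run=10101, n_tracks=7))
print([(e.run, e.n_tracks) for e in store.read_all()])
```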
Exporting the Data
– CCIN2P3 (France)
  – Plan to mirror (almost) all BaBar data
  – Currently have "Fast" (DST) data only (~3 TB)
  – Typical delay is one month
  – Using Objectivity
– CASPUR (Italy)
  – Plan only to store "Fast" data (but it's too big)
  – Data are at CASPUR but not yet available
  – Prefer Kanga
– RAL (UK)
  – Plan only to store "Fast" data
  – Using Objectivity
Particle Physics Data Grid
Universities, DoE Accelerator Labs, DoE Computer Science
Particle Physics: a Network-Hungry Collaborative Application
– Petabytes of compressed experimental data;
– Nationwide and worldwide university-dominated collaborations analyze the data;
– Close DoE-NSF collaboration on construction and operation of most experiments;
– The PPDG lays the foundation for lifting the network constraint from particle-physics research.
Short-Term Targets:
– High-speed site-to-site replication of newly acquired particle-physics data (> 100 MBytes/s);
– Multi-site cached file access to thousands of ~10 GByte files.
High-Speed Site-to-Site File Replication Service
Multi-Site Cached File Access
PPDG Resources
Network Testbeds:
– ESNET links at up to 622 Mbits/s (e.g. LBNL-ANL)
– Other testbed links at up to 2.5 Gbits/s (e.g. Caltech-SLAC via NTON)
Data and Hardware:
– Tens of terabytes of disk-resident particle-physics data (plus hundreds of terabytes of tape-resident data) at accelerator labs;
– Dedicated terabyte university disk cache;
– Gigabit LANs at most sites.
Middleware Developed by Collaborators:
– Many components needed to meet the short-term targets (e.g. Globus, SRB, MCAT, Condor, OOFS, Netlogger, STACS, mass storage management) have already been developed by collaborators.
Existing Achievements of Collaborators:
– WAN transfer at 57 MBytes/s;
– Single-site database access at 175 MBytes/s.
(A toy sketch of multi-site cached file access follows below.)
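A hedged sketch of the "multi-site cached file access" target: a site-local cache serves a ~10 GByte file from disk if present and otherwise replicates it from a remote site. At the >100 MBytes/s replication target quoted earlier, one such file arrives in roughly 100 seconds, as the back-of-envelope lines show. The class, directories and the copy standing in for a WAN transfer are my assumptions, not PPDG middleware.

```python
# Illustrative multi-site cached file access; not PPDG middleware.
import os
import shutil

class SiteCache:
    """Serve a file from local site disk, replicating it from a remote
    site on a miss (the 'remote site' here is just another directory)."""

    def __init__(self, cache_dir, remote_replica_dir):
        self.cache_dir = cache_dir
        self.remote_replica_dir = remote_replica_dir

    def fetch(self, filename):
        local = os.path.join(self.cache_dir, filename)
        if not os.path.exists(local):                    # cache miss
            os.makedirs(self.cache_dir, exist_ok=True)
            remote = os.path.join(self.remote_replica_dir, filename)
            shutil.copy(remote, local)                   # stand-in for a WAN transfer
        return local

# Back-of-envelope for the short-term targets:
file_gb, rate_mb_per_s = 10, 100
print(file_gb * 1000 / rate_mb_per_s, "seconds per ~10 GB file at 100 MB/s")  # ~100 s
```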
Picture Show
Sun A3500 disk arrays used by BaBar (about 20 TB)
NFS File Servers: Network Appliance F760 et al. (~3 TB)
BaBar Datamovers (AMS Servers) and Tapemovers
More BaBar Servers: Build, Objy Catalog, Objy Journal, Objy Test...
Sun Ultra 5 Batch Farm
Sun Netra T1 Farm Machines (440 MHz UltraSPARC, one rack unit high)
Sun Netra T1 Farm: now installing 450 machines; about to order another 260
Linux Farm
Core Network Switches and Routers
Cisco 12000 External Router
– One OC48 (2.4 Gbps) interface (OC12 interfaces to be added)
– Four Gigabit Ethernets
– "Grid-Testbed Ready"
Money and People
BaBar Offline Computing at SLAC: Costs other than Personnel
(Does not include "per physicist" costs such as desktop support, help desk, telephone, general site network; does not include tapes.)
BaBar Offline Computing at SLAC: Costs other than Personnel
(Does not include "per physicist" costs such as desktop support, help desk, telephone, general site network.)
BaBar Computing at SLAC: Personnel (SCS)
BaBar Computing at SLAC: Personnel for Applications and Production Support (some guesses)
BaBar Computing Personnel: The Whole Story? (Many guesses)
Issues
Complexity
– BaBar (and CDF, D0, RHIC, LHC) is driven to systems with ~1000 boxes performing tens of functions
– How to deliver reliable throughput with hundreds of users?
  – Instrument heavily
  – Build huge test systems
  – "Is this a physics experiment or a computer science experiment?"
Objectivity
Current technical problems:
– Too few object IDs (fix in ~1 year?)
– Lockserver bottleneck (inelegant workarounds possible; more elegant fixes possible, e.g. read-only databases)
– Endian translation problem (e.g. lousy Linux performance on Solaris-written databases; illustrated below)
Non-technical problems:
– Will the (VL)ODBMS market take off?
– If so, will Objectivity Inc. prosper?
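On the endian point: Solaris/SPARC is big-endian and Linux/x86 is little-endian, so database pages written on Solaris must be byte-swapped value by value when read on Linux, which is one plausible reading of the "lousy Linux performance" remark. A tiny illustration, using numpy as my own choice of tool, not BaBar code:

```python
# Endian translation illustrated; not BaBar code.
import numpy as np

as_written = np.arange(1_000_000, dtype=np.float64).astype(">f8")  # big-endian layout
as_needed = as_written.astype("<f8")    # reader on x86 must byte-swap every value
assert np.array_equal(as_written, as_needed)  # same numbers, different byte order
```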
Personnel versus Equipment
Should SLAC be spending more on people and buying cheaper stuff? We buy:
– Disks at 5 x rock bottom
– Tape drives at 5 x rock bottom
– Farm CPU at 2-3 x rock bottom
– Small SMP CPU at 2-3 x farms
– Large SMP CPU at 5-10 x farms
– Network stuff at "near monopoly" pricing
All at (or slightly after) the very last moment. I am uneasily happy with all these choices.
Personnel Issues
Is the SLAC equipment/personnel ratio a good model? SLAC-SCS staff are:
– smart
– motivated
– having fun
– (unofficially) on call 24 x 7
– in need of reinforcements
BaBar Computing Coordinator
– The search is now on
– An exciting challenge
– Strong SLAC backing
– Contact me with your suggestions and enquiries (richard.mount@stanford.edu)