Slide 1: Data Analysis for SLAC Physics
Richard P. Mount, SLAC
CHEP 2000, Padova, February 10, 2000

Slide 2: Some Hardware History
1994: Still IBM-mainframe dominated; AIX farm growing (plus SLD VAXes)
1996: Tried to move SLD to AIX Unix farm
1997: The rise of Sun -- farm plus SMP
1998: Sun E10000 plus farm plus 'datamovers'; remove IBM mainframe
1999: Bigger E10000, 300 Ultra 5s, more datamovers
2000: E10000, 700+ farm machines, tens of datamovers, etc. (plus SLD VAXes)

Slide 3: (figure only; no text)

Slide 4: Some non-Hardware History
Historical approaches:
– Offline computing for SLAC experiments was not included explicitly in the cost of constructing or operating the experiments;
– SLAC Computing Services (SCS) was responsible for running systems (only);
– Physics groups were responsible for software tools.
Some things have changed...

Slide 5: BaBar Data Analysis
6 STK Powderhorn silos with 20 'Eagle' drives
Tapes managed by HPSS
Data access mainly via Objectivity

Slide 6: STK Powderhorn Silo (photo)

Slide 7: (figure only; no text)

Slide 8: (figure only; no text)

Slide 9: HPSS: High Performance Storage System (Andy Hanushevsky/SLAC)
Architecture diagram; components shown, linked by a control network and a data network:
– Bitfile Server
– Name Server
– Storage Servers
– Physical Volume Library
– Physical Volume Repositories
– Storage System Manager
– Migration/Purge Server
– Metadata Manager
– Log Daemon
– Log Client
– Startup Daemon
– Encina/SFS
– DCE

Slide 10: HPSS at SLAC (Andy Hanushevsky/SLAC)
Diagram of the SLAC HPSS deployment, showing the same components (and control/data networks) as the previous slide.

Slide 11: Objectivity DB in BaBar (Andy Hanushevsky/SLAC)
Diagram; labels shown: oofs interface, file system interface, Datamover.

Slide 12: Principal Data Flows
Diagram of data flows between IR2 and the Computer Center: the IR2 federation (conditions, configuration, ambient), the OPR (Prompt Reconstruction) federation (events, conditions, configuration), the Analysis federation (events, conditions, configuration), and HPSS (events, conditions, etc.).

Slide 13: Database "Sweeps"
The same data-flow diagram, annotated with sweep frequencies: one sweep runs daily and another twice a week.

Slide 14: OPR to Analysis "Sweep"
1) Flush OPR databases (tag, collection...) to HPSS
2) "diff" the Analysis and OPR federation catalogs
3) Stage in (some) missing Analysis databases from HPSS
4) Attach the new databases to the Analysis federation
– 200 GB moved per sweep
– 1 TB per sweep left in HPSS but attached to the Analysis federation
– Currently takes about 6 hours; achievable target of < 30 minutes
– Note that it takes at least 3 hours to stage in 1 TB using 10 tape drives
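The four sweep steps amount to a catalog-diff-and-stage loop. Below is a minimal Python sketch of that logic only; every helper name (flush_opr_to_hpss, hpss_stage_in, attach_to_federation) is a placeholder stub, not the actual BaBar/Objectivity or HPSS tooling.

```python
# Minimal sketch of the OPR-to-Analysis sweep described above.
# The helper functions are placeholder stubs; the real sweep used
# BaBar/Objectivity and HPSS tools not shown here.

def read_catalog(path):
    """Return the set of database names listed in a federation catalog file."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def flush_opr_to_hpss():                      # placeholder stub
    print("flushing OPR tag/collection databases to HPSS")

def hpss_stage_in(db):                        # placeholder stub
    print(f"staging in {db} from HPSS")

def attach_to_federation(federation, db):     # placeholder stub
    print(f"attaching {db} to the {federation} federation")

def sweep(opr_catalog, analysis_catalog, wanted):
    flush_opr_to_hpss()                       # 1) flush OPR databases to HPSS

    opr_dbs = read_catalog(opr_catalog)       # 2) "diff" the two federation catalogs
    analysis_dbs = read_catalog(analysis_catalog)
    missing = opr_dbs - analysis_dbs

    for db in missing & wanted:               # 3) stage in (some) missing databases
        hpss_stage_in(db)                     #    (~200 GB copied to disk per sweep)

    for db in missing:                        # 4) attach all new databases, disk- or
        attach_to_federation("Analysis", db)  #    tape-resident (~1 TB per sweep)
```

The 3-hour staging floor quoted on the slide is consistent with roughly 10 MB/s per tape drive: 1 TB spread over 10 drives is about 100 GB per drive, and 10^5 MB at 10 MB/s takes 10^4 s, close to 3 hours.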

Slide 15: BaBar Offline Systems: August 1999 (diagram)

Slide 16: Datamovers, end 1999
Datamove4: OPR
Datamove5: OPR
Datamove1: Reconstruction (real+MC)
Datamove6: Reconstruction (real+MC)
Datamove3: Export
Datamove2: RAW, REC managed stage-in
Datamove9: RAW, REC anarchistic stage-in
Shire (E10k): Physics analysis (6 disk arrays)
Datamove7: Testbed
Datamove8: Testbed
Most are 4-processor Sun SMPs with two disk arrays (0.5 or 0.8 TB each).

Slide 17: SLAC-BaBar Data Analysis System
50 simultaneous physicists out of 400 total; 300 TB per year.
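For scale (editorial arithmetic, not stated on the slide), 300 TB per year corresponds to a sustained average of roughly 10 MB/s flowing into the system:

\[
\frac{300 \times 10^{12}\ \text{bytes/year}}{3.15 \times 10^{7}\ \text{s/year}} \approx 9.5\ \text{MB/s}
\]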

Slide 18: Problems, August-October 1999: Complex Systems, Lots of Data
OPR could not keep up with the data
– blamed on Objectivity (partially true)
Data analysis painfully slow
– blamed on Objectivity (partially true)
Linking BaBar code took forever
– blamed on SCS, Sun, AFS, NFS and even BaBar
Sun E10000 had low reliability and throughput
– blamed on AFS (reliability), Objectivity (throughput)...

Slide 19: BaBar Reconstruction Production: Performance Problems with Early Database Implementation (plot)

Slide 20: Fixing the "OPR Objectivity Problem"
BaBar Prompt Reconstruction throughput, test system (plot)

Slide 21: Fixing Physics Analysis "Objectivity Problems": Ongoing Work
Applying fixes found in the OPR testbed
Using the Analysis systems and BaBar physicists as an Analysis testbed
Extensive instrumentation essential
A current challenge (see the sketch below):
– Can we de-randomize disk access (by tens of physicists and hundreds of jobs)?
– Partial relief now available by making real copies of popular collections
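One way to picture the "de-randomization" question is as a request-reordering problem: if a server batches the outstanding reads from many jobs and services them in file/offset order, the disk arrays see long sequential runs instead of random seeks. The Python sketch below only illustrates that idea with assumed names (Request, derandomize); it is not how the BaBar datamovers were implemented.

```python
# Illustrative sketch (not the BaBar implementation) of "de-randomizing"
# disk access: pending reads from many jobs are batched and reordered so
# each database file is read sequentially rather than in arrival order.

from collections import namedtuple

Request = namedtuple("Request", ["job_id", "db_file", "offset", "length"])

def derandomize(requests):
    """Group requests by database file and sort by offset within each file."""
    return sorted(requests, key=lambda r: (r.db_file, r.offset))

# Example: three jobs interleave reads across two database files.
pending = [
    Request("job42", "events_0007.db", 8_000_000, 64_000),
    Request("job17", "events_0003.db",   500_000, 64_000),
    Request("job42", "events_0003.db", 2_000_000, 64_000),
    Request("job99", "events_0007.db",   100_000, 64_000),
]

for r in derandomize(pending):
    # Issue the reads in file/offset order: mostly sequential I/O per file.
    print(f"{r.db_file} @ {r.offset:>10,d}  ({r.job_id})")
```

Making real copies of popular collections, as the slide notes, attacks the same problem from the other side: multiple physical copies let concurrent readers each stream from their own disks.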

Slide 22: Extensive (but still insufficient) Instrumentation
Plots: 2 days' traffic on one Datamove machine; 6 weeks' traffic on one Tapemove machine.

Slide 23: Kanga, the BaBar "Objectivity-Free", ROOT-I/O-based Alternative
Aimed at final stages of data analysis
Easy for universities to install
Supports the BaBar analysis framework
Very successful validation of the insulating power of the BaBar transient-persistent interface
Nearly working

Slide 24: Exporting the Data
CCIN2P3 (France)
– Plan to mirror (almost) all BaBar data
– Currently have "Fast" (DST) data only (~3 TB)
– Typical delay is one month
– Using Objectivity
CASPUR (Italy)
– Plan only to store "Fast" data (but it's too big)
– Data are at CASPUR but not yet available
– Prefer Kanga
RAL (UK)
– Plan only to store "Fast" data
– Using Objectivity

Slide 25: Particle Physics Data Grid
Universities, DoE accelerator labs, DoE computer science
Particle physics: a network-hungry collaborative application
– Petabytes of compressed experimental data;
– Nationwide and worldwide university-dominated collaborations analyze the data;
– Close DoE-NSF collaboration on construction and operation of most experiments;
– The PPDG lays the foundation for lifting the network constraint from particle-physics research.
Short-term targets:
– High-speed site-to-site replication of newly acquired particle-physics data (> 100 Mbytes/s);
– Multi-site cached file access to thousands of ~10 Gbyte files (see the sketch below).
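The "multi-site cached file access" target can be read as: serve a file from the local site cache when present, otherwise fetch a replica from another site and keep it for later readers. The sketch below captures only that logic; the catalog, site names, and the plain file copy are assumptions standing in for PPDG middleware (Globus, SRB/MCAT, etc.), not its actual interfaces.

```python
# Minimal sketch of multi-site cached file access, under assumed names.
# A real system would use grid replica catalogs and WAN transfer tools,
# not a local shutil.copy.

import shutil
from pathlib import Path

REPLICA_CATALOG = {
    # logical file name -> sites known to hold a copy (illustrative)
    "run1234/events_0007.db": ["SLAC", "CCIN2P3"],
}

SITE_ROOTS = {"SLAC": Path("/slac/data"), "CCIN2P3": Path("/in2p3/data")}

def cached_open(lfn, local_site, cache_root=Path("/cache")):
    """Return a local path for 'lfn', fetching and caching it on a miss."""
    local = cache_root / lfn
    if local.exists():
        return local                          # cache hit: serve locally
    for site in REPLICA_CATALOG.get(lfn, []):
        if site == local_site:
            continue                          # skip ourselves
        remote = SITE_ROOTS[site] / lfn       # stand-in for a WAN transfer source
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(remote, local)            # stand-in for the actual grid transfer
        return local
    raise FileNotFoundError(lfn)

# Usage (hypothetical paths): cached_open("run1234/events_0007.db", local_site="RAL")
```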

Slide 26: High-Speed Site-to-Site File Replication Service / Multi-Site Cached File Access (diagrams)

Slide 27: PPDG Resources
Network testbeds:
– ESnet links at up to 622 Mbits/s (e.g. LBNL-ANL)
– Other testbed links at up to 2.5 Gbits/s (e.g. Caltech-SLAC via NTON)
Data and hardware:
– Tens of terabytes of disk-resident particle-physics data (plus hundreds of terabytes of tape-resident data) at accelerator labs;
– Dedicated terabyte university disk cache;
– Gigabit LANs at most sites.
Middleware developed by collaborators:
– Many components needed to meet the short-term targets (e.g. Globus, SRB, MCAT, Condor, OOFS, Netlogger, STACS, mass storage management) already developed by collaborators.
Existing achievements of collaborators:
– WAN transfer at 57 Mbytes/s;
– Single-site database access at 175 Mbytes/s.

Slide 28: Picture Show

Slide 29: (photo; no caption)

Slide 30: Sun A3500 disk arrays used by BaBar (about 20 TB) (photo)

Slide 31: NFS file servers: Network Appliance F760 et al., ~3 TB (photo)

Slide 32: BaBar Datamovers (AMS servers) and Tapemovers (photo)

Slide 33: More BaBar servers: build, Objy catalog, Objy journal, Objy test... (photo)

Slide 34: Sun Ultra 5 batch farm (photo)

Slide 35: Sun Netra T1 farm machines (440 MHz UltraSPARC, one rack unit high) (photo)

Slide 36: Sun Netra T1 farm
Now installing 450 machines; about to order another 260.

Slide 37: Linux farm (photo)

Slide 38: Core network switches and routers (photo)

Slide 39: Cisco External Router
One OC48 (2.4 Gbps) interface (OC12 interfaces to be added)
Four Gigabit Ethernet interfaces
"Grid-Testbed Ready"

Slide 40: (photo; no caption)

Slide 41: Money and People

Slide 42: BaBar Offline Computing at SLAC: Costs other than Personnel (chart)
(does not include "per physicist" costs such as desktop support, help desk, telephone, general site network; does not include tapes)

Slide 43: BaBar Offline Computing at SLAC: Costs other than Personnel (chart)
(does not include "per physicist" costs such as desktop support, help desk, telephone, general site network)

Slide 44: BaBar Computing at SLAC: Personnel (SCS) (chart)

Slide 45: BaBar Computing at SLAC: Personnel for Applications and Production Support (chart; some guesses)

Slide 46: BaBar Computing Personnel: The Whole Story? (chart; many guesses)

Slide 47: Issues

Slide 48: Complexity
BaBar (and CDF, D0, RHIC, LHC) is driven to systems with ~1000 boxes performing tens of functions
How to deliver reliable throughput with hundreds of users?
– Instrument heavily
– Build huge test systems
– "Is this a physics experiment or a computer science experiment?"

Slide 49: Objectivity
Current technical problems:
– Too few object IDs (fix in ~1 year?)
– Lockserver bottleneck (inelegant workarounds possible; more elegant fixes possible, e.g. read-only databases)
– Endian translation problem (e.g. lousy Linux performance on Solaris-written databases; see the illustration below)
Non-technical problems:
– Will the (VL)ODBMS market take off?
– If so, will Objectivity Inc. prosper?
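The endian issue on the last technical bullet is generic rather than Objectivity-specific: data written on big-endian Solaris/SPARC must be byte-swapped when read on little-endian Linux/x86, and doing that per value is pure overhead. A tiny, generic Python illustration of the effect (not Objectivity code):

```python
# Generic illustration of the endian-translation cost: values written
# big-endian (as on SPARC/Solaris) must be byte-swapped when read on a
# little-endian machine (as on x86/Linux).

import struct

# A SPARC/Solaris writer stores this 32-bit value big-endian:
big_endian_bytes = struct.pack(">I", 0x12345678)

# Interpreting the same bytes natively on x86 gives the wrong answer...
wrong = struct.unpack("<I", big_endian_bytes)[0]   # 0x78563412

# ...so every value must be swapped on read, which costs CPU on each access:
right = struct.unpack(">I", big_endian_bytes)[0]   # 0x12345678

print(hex(wrong), hex(right))
```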

Slide 50: Personnel versus Equipment
Should SLAC be spending more on people and buying cheaper stuff?
We buy:
– Disks at 5 x rock bottom
– Tape drives at 5 x rock bottom
– Farm CPU at 2-3 x rock bottom
– Small SMP CPU at 2-3 x farms
– Large SMP CPU at 5-10 x farms
– Network stuff at "near monopoly" pricing
All at (or slightly after) the very last moment
I am uneasily happy with all these choices

Slide 51: Personnel Issues
Is the SLAC equipment/personnel ratio a good model?
SLAC SCS staff are:
– smart
– motivated
– having fun
– (unofficially) on call 24 x 7
– in need of reinforcements

Slide 52: BaBar Computing Coordinator
The search is now on
An exciting challenge
Strong SLAC backing
Contact me with your suggestions and enquiries