Using an Object Oriented Database to Store BaBar's Terabytes

Presentation transcript:

Using an Object Oriented Database to Store BaBar's Terabytes
Tim Adye
Particle Physics Department, Rutherford Appleton Laboratory
CLRC Workshop on Advanced Data Storage and Management Techniques
25 February 2000

Outline
- The BaBar experiment at SLAC
- Data storage requirements
- Use of an Object Oriented Database
- Data organisation
  - SLAC
  - RAL
- Future experiments

BaBar
The BaBar experiment is based at the Stanford Linear Accelerator Center (SLAC) in California, and was designed and built by more than 500 physicists from 10 countries, including 9 UK universities and RAL.
It is looking for the subtle differences between the decay of the B0 meson and its antiparticle (the anti-B0).
If this "CP violation" is large enough, it could explain the cosmological matter-antimatter asymmetry.
We are looking for a subtle effect in a rare (and difficult to identify) decay, so we need to record the results of a large number of events.

SLAC, PEP-II, and BaBar

How much data?
- Since BaBar started operation in May, we have recorded 7 million events.
- 4 more years' running, with continually improving luminosity.
- Eventually we will record data at ~100 Hz, i.e. ~10^9 events/year.
- Each event uses 100-300 kB.
- We also need to generate 1-10 times that number of simulated events.
- Recorded so far: 5 TB.
- Expect to reach ~300 TB/year, i.e. 1-2 PB over the lifetime of the experiment.
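
As a rough cross-check of these figures (not BaBar code; the 10^7 live seconds per year is an assumed duty factor), the arithmetic works out as follows:

    // Rough data-volume estimate using the rates and sizes quoted above.
    // Purely illustrative; the 1e7 live seconds/year is an assumed duty factor.
    #include <cstdio>

    int main() {
        const double rate_hz       = 100.0;   // event logging rate
        const double event_size_kB = 300.0;   // upper end of 100-300 kB/event
        const double live_seconds  = 1.0e7;   // assumed live time per year

        const double events_per_year = rate_hz * live_seconds;                      // ~1e9
        const double tb_per_year     = events_per_year * event_size_kB * 1e3 / 1e12;

        std::printf("events/year : %.1e\n", events_per_year);
        std::printf("data volume : ~%.0f TB/year\n", tb_per_year);                  // ~300 TB/year
        return 0;
    }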

Why an OODBMS?
- BaBar has adopted C++ and OO techniques
  - The first large HEP experiment to do so wholesale.
  - An OO database has a more natural interface for C++ (and Java).
- We require a distributed database
  - Event processing and analysis take place on many processors.
  - 200-node farm at SLAC.
- Data structures will change over time
  - Cannot afford to reprocess everything.
  - Schema evolution.
- Objectivity chosen
  - Also the front runner at CERN.
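
To illustrate the "natural interface" point: with an object database, the same C++ classes describe objects in memory and on disk, persistent references replace hand-written I/O code, and schema evolution copes with classes that change. The sketch below is purely illustrative; Persistent, Ref and the class names are hypothetical stand-ins, not the Objectivity API or BaBar code.

    // Hypothetical sketch of persistent C++ objects; NOT the Objectivity API.
    #include <vector>

    struct Persistent {                       // stand-in base class a database might supply
        virtual ~Persistent() = default;
    };

    template <class T>
    struct Ref {                              // stand-in persistent reference
        T* operator->() const { return ptr; } // a real one would fetch the object on demand
        T* ptr = nullptr;
    };

    // The same class describes both the in-memory and the stored layout,
    // so no hand-written serialisation code is needed.
    struct TrackList : Persistent {
        std::vector<double> momenta;
    };

    struct Event : Persistent {
        int runNumber = 0;
        Ref<TrackList> tracks;                // cross-reference stored in the database
        // A member added later would be handled by schema evolution,
        // rather than by reprocessing every stored event.
    };

    int main() {
        Event ev;
        ev.runNumber = 42;
        // In a real object database ev would be created in a persistent store
        // and found again later by navigation or by query.
        return 0;
    }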

How do we organise the data?
- Traditional HEP analyses read every event and select the relevant ones, for which additional processing is done.
  - This can be done with a sequential file.
- Many different analyses are performed by BaBar physicists.
- In BaBar there is too much data.
  - It won't work if all the people read all the data all of the time, even if all of it were on disk.
- Organise the data into different levels of detail, stored in separate files
  - tag, "microDST", full reconstruction, raw data.
  - Objectivity keeps track of the cross-references.
- Only read more detailed information for selected events.
  - But different selections for different analyses (see the sketch below).
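
The access pattern this implies is: run a cheap selection over the small tag objects, and follow the cross-reference to the microDST (or deeper levels) only for the events that pass. A minimal sketch, with hypothetical class and function names rather than the real BaBar/Objectivity interfaces:

    // Hypothetical sketch of the access pattern described above; the class and
    // function names are illustrative, not the real BaBar/Objectivity interfaces.
    #include <vector>

    struct MicroDst { /* summary quantities for one event */ };

    struct Tag {
        // A few small, quickly testable quantities...
        int    nChargedTracks = 0;
        double totalEnergy    = 0.0;
        // ...plus a cross-reference to the more detailed record,
        // followed only if the event is selected.
        const MicroDst* micro = nullptr;
    };

    // Each analysis supplies its own selection on the tag quantities.
    bool mySelection(const Tag& t) {
        return t.nChargedTracks >= 4 && t.totalEnergy > 5.0 /* GeV */;
    }

    void analyse(const MicroDst&) { /* detailed processing for selected events */ }

    void run(const std::vector<Tag>& tags) {
        for (const Tag& t : tags) {
            if (!mySelection(t)) continue;   // cheap test on the small tag object
            analyse(*t.micro);               // detailed data read only for this event
        }
    }

    int main() {
        std::vector<Tag> tags;               // in practice these come from the tag database
        run(tags);
        return 0;
    }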

What happens at SLAC?
- Cannot store everything on disk
  - Maybe 5 TB, but not 1 PB.
  - Already buying ~1 TB of disk per month.
- Analysis requires frequent access to summary information.
  - Keep the tag and "microDST" on disk; the rest lives in the mass store (HPSS at SLAC).
- The main challenge is getting this to scale to hundreds of processes and processors reading and writing at the same time.
- The vendor seems to believe we can do it:
  "The Terabyte Wars are over. While other vendors quarrel about who can store 1 Terabyte in a database, the BaBar physics experiment at the Stanford Linear Accelerator Center (SLAC) has demonstrated putting 1 Terabyte of data PER DAY into an Objectivity Database."
  - Top news item on the Objectivity web site.
- But it took a lot of work...

Performance Scaling
- A lot of effort has gone into improving the speed of recording events.
- Ongoing work to obtain similar improvements in data access.

RAL as a Regional Centre
- Cannot do everything at SLAC
  - Even with all the measures to improve analysis efficiency at SLAC, it cannot support the entire collaboration.
  - The network connection from the UK is slow, sometimes very slow, and occasionally unreliable.
- Therefore we need to allow analysis outside SLAC.
  - "Regional Centres" in the UK, France, and Italy.
  - RAL is the UK Regional Centre.
- It is a major challenge to transfer the data from SLAC, and to reproduce the databases and analysis environment at RAL.

RAL Setup
- At RAL, we have just installed Sun analysis and data server machines with 5 TB of disk.
  - UK universities have 0.5-1 TB locally.
  - All part of an £800k JREI award.
- Import the microDST using DLT-IV tapes
  - ~70 GB/tape with compression.
- Interfaced to the Atlas Datastore (see John Gordon's talk).
  - Less-used parts of the federation can be archived and brought back to disk on demand (needs further automation; see the sketch below).
  - Also acts as a local backup.
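
The archive-and-recall step is what still needs automation. A minimal sketch of the sort of stage-on-demand helper one might write is shown below; the "recall-from-datastore" command and the function names are assumptions for illustration, not an existing RAL or BaBar tool.

    // Minimal sketch of a stage-on-demand helper for archived database files.
    // The "recall-from-datastore" command and the names are assumptions for
    // illustration, not an existing RAL or BaBar tool.
    #include <cstdlib>
    #include <filesystem>
    #include <iostream>
    #include <string>

    namespace fs = std::filesystem;

    // Make sure a database file is on local disk, recalling it from the
    // tape store if necessary. Returns true if the file is (now) available.
    bool ensureOnDisk(const std::string& dbFile) {
        if (fs::exists(dbFile)) return true;                        // already staged
        std::cout << "staging " << dbFile << " from the datastore...\n";
        const std::string cmd = "recall-from-datastore " + dbFile;  // hypothetical command
        return std::system(cmd.c_str()) == 0 && fs::exists(dbFile);
    }

    int main(int argc, char** argv) {
        for (int i = 1; i < argc; ++i) {
            if (!ensureOnDisk(argv[i])) {
                std::cerr << "could not stage " << argv[i] << "\n";
            }
        }
        return 0;
    }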

Other Experiments
- BaBar's requirements are modest compared with what is to come.
  - 2001, Tevatron Run II: ~1 PB/year.
  - 2005, LHC: many PB/year.
- Choice of HSM
  - HPSS is expensive; maybe we don't need all the bells and whistles.
  - But it is already in use at SLAC/CERN/...
  - Alternatives: EuroStore (EU/CERN/DESY/...), ENSTORE (Fermilab), CASTOR (CERN).
  - Which way should RAL go?
- Is Objectivity well suited to our use?
  - Develop our own? Espresso (CERN).
- BaBar is being watched closely...