The PHysics Analysis SERver Project (PHASER) CHEP 2000 Padova, Italy February 7-11, 2000 M. Bowen, G. Landsberg, and R. Partridge* Brown University.

What is the PHASER project?
- Effort to substantially increase the productivity of physicists analyzing multi-TB summary data sets
- Our immediate focus is on the DØ experiment
  - 600 million data events/year starting in early 2001
  - Summary data set expected to grow at a rate of 3 TB/year
- Concentrate on the event selection and "ntuple" creation stage
  - Transition in data handling from monolithic reconstruction processing to the much more chaotic processing of summary data by many physicists
  - I/O and CPU intensive due to the need to apply the latest calibration, particle ID, and event selection algorithms to several hundred million events

PHASER Architecture
- Physics Object Database (POD) stores meta-data used by most physics analyses for their initial event selection
- Physics Object and Particle ID tables in the POD store calibrated 4-vectors, object quality variables, and results of particle ID algorithms
- DVD storage of the full summary (μDST) data set and useful subsets of the larger DST and STA data sets

PHASER is PHast
- New calibrations and particle ID algorithms can be quickly incorporated
  - Only the changes need to be imported
  - Regenerating the large μDST data set will only be done infrequently
- Storage of up-to-date calibrations and particle ID algorithms avoids the need to re-apply these algorithms for each event selection pass
- Particle ID tables are small, making it possible to quickly eliminate events not having the desired set of physics objects
- Direct access to the full μDST sample on DVD allows a μDST subset to be quickly generated for advanced analyses developing new algorithms not yet in the database

The Physics Object Database (POD)
- Stores fully calibrated meta-data associated with the various physics objects
  - Leptons, photons, jets, missing ET, secondary vertices, triggers, etc.
  - For example, an electron object would have the energy, direction, and the various quantities used in the electron ID algorithms stored
- Each physics object is associated with a table in a relational database
- A primary key uniquely identifies each physics object and provides the information needed to correlate physics objects from a single event
  - Currently use Run, Event, Instance (where appropriate), and the row number from the ntuple used to load the database
  - Alternative: data source index, sequence number, and instance
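The table-per-object design with a composite primary key can be sketched as follows. This is a minimal illustration using Python's sqlite3, not the actual DØ schema (the production system used MS SQL Server); the column names are assumptions chosen to match the electron example above.

```python
import sqlite3

# One POD table per physics object type.  (Run, Event, Instance) is the
# composite primary key: Run and Event identify the event, Instance
# distinguishes multiple electrons within the same event, and the key is
# what lets objects from different tables be correlated for one event.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE electron (
        run      INTEGER NOT NULL,
        event    INTEGER NOT NULL,
        instance INTEGER NOT NULL,
        e        REAL,   -- calibrated energy
        eta      REAL,   -- direction
        phi      REAL,
        iso      REAL,   -- an electron-ID quality variable (illustrative)
        PRIMARY KEY (run, event, instance)
    )
""")

# A sample object: the first electron (instance 0) of run 85277, event 12.
# The run/event numbers here are invented for illustration.
conn.execute("INSERT INTO electron VALUES (85277, 12, 0, 45.2, 1.1, 0.3, 0.05)")
```

Selecting on `(run, event)` in any object table then yields rows that can be joined against other object tables for the same event, which is the correlation role the primary key plays above.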

Why use a Relational Database?
- Physics objects typically have a fixed set of attributes used for event selection and analysis
- Independence of the tables aids loading and updating the database
  - Data can be "bulk loaded" as long as the primary key is provided in the input data stream
- Several vendors with quite capable products, and a large commercial market
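The bulk-loading point can be sketched like this: because each input row already carries its primary key, a whole ntuple's worth of rows can be appended to one table in a single operation, without touching any other table. The sqlite3 `executemany` call and the sample rows below stand in for the real bulk loader and ntuple data, which are not described in detail in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE electron (
        run INTEGER, event INTEGER, instance INTEGER, e REAL,
        PRIMARY KEY (run, event, instance)
    )
""")

# Each input row supplies its own (run, event, instance) key, so the whole
# batch loads independently of every other table -- the independence noted
# above.  Rows are invented sample data.
rows = [
    (85277, 12, 0, 45.2),
    (85277, 12, 1, 38.9),
    (85277, 47, 0, 51.0),
]
conn.executemany("INSERT INTO electron VALUES (?, ?, ?, ?)", rows)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM electron").fetchone()[0])  # → 3
```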

Prototype POD
- Uses DØ Run 1 data
- 62 million events loaded into the database
- Entire "All-Stream" data set loaded
  - Data set used by almost all DØ physics analyses
  - Only files with special processing or trigger conditions excluded
- Column-wise ntuple format used for importing/exporting data

DØ Run 1 POD
- Including indexes, the Run 1 POD occupies ~100 GB
  - 58% physics object data
  - 18% indexes on object ET
  - 12% primary keys
  - 12% database overhead

POD Benchmarks
- Z → e+e− candidate event selection:
  - 7 seconds to identify ~6k events
- W → eν candidate event selection:
  - 18 seconds to identify ~86k events
- Both benchmark times make use of the particle ID tables
- Event selection times compare very favorably with the ~1000 CPU hours required to generate the ntuples used in this study

Benchmark Hardware/Software
- 450 MHz dual-processor Pentium II with 256 MB RAM
- Database stored on (6) 36 GB disks in a RAID 0 stripe set
- MS SQL Server running on Windows NT 4.0
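The shape of such a selection query can be illustrated with a toy version of the Z → e+e− benchmark: keep any event with at least two electron objects passing the particle ID and an ET threshold, consulting only the small electron table. The schema, cut values, and sample rows are all invented for illustration; the actual DØ selection criteria are not given in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE electron (
        run INTEGER, event INTEGER, instance INTEGER,
        et REAL, id_pass INTEGER,
        PRIMARY KEY (run, event, instance)
    )
""")
conn.executemany("INSERT INTO electron VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 0, 45.0, 1), (1, 1, 1, 38.0, 1),   # two good electrons -> Z candidate
    (1, 2, 0, 30.0, 1),                       # only one electron -> rejected
    (1, 3, 0, 28.0, 1), (1, 3, 1, 26.0, 0),   # second electron fails ID -> rejected
])

# Events with >= 2 electrons passing ID and ET > 25 are kept.  Because only
# the compact electron table is scanned, non-candidate events are eliminated
# without ever reading full event records -- the reason the benchmarks run
# in seconds rather than CPU-hours.
candidates = conn.execute("""
    SELECT run, event FROM electron
    WHERE id_pass = 1 AND et > 25.0
    GROUP BY run, event
    HAVING COUNT(*) >= 2
""").fetchall()
print(candidates)  # → [(1, 1)]
```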

DVD Storage
- Provides access to additional event information not included in the POD
- DVD-RAM has a number of unique capabilities
  - Less expensive than disk storage, doesn't require backup
  - Access to individual events is much faster than tape storage
- Current disk capacity is 2.6 GB, with 4.7 GB expected soon
- Commercial DVD libraries hold up to 600 DVD disks
  - 2.8 TB capacity using 4.7 GB DVD-RAM disks
  - Average disk load time of 4.5 s, <1 hour to cycle through 600 disks
  - Up to 6 DVD-RAM drives give a ~10 MB/s I/O rate
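The capacity and cycle-time figures quoted above follow directly from the per-disk numbers, which a quick calculation confirms:

```python
# Check of the DVD library figures: 600 disks at 4.7 GB each, with an
# average 4.5 s disk load time.
disks = 600
capacity_gb = 4.7
load_time_s = 4.5

total_tb = disks * capacity_gb / 1000
cycle_hours = disks * load_time_s / 3600

print(round(total_tb, 2))    # ≈ 2.82 TB, matching the ~2.8 TB quoted
print(cycle_hours)           # 0.75 hours, i.e. < 1 hour to cycle all 600 disks
```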

Web Interface
- Plan to develop a web-based user interface
- Interface modelled on the "3-tier" architecture widely used in commercial applications
- Physicists will enter event selection requirements using a Java applet
- The applet communicates the request to "Physics Intelligence" middleware running on the PHASER system (via CORBA)
  - Translate the request to SQL for event selection
  - Verify that the request can be accommodated within resource constraints
  - Produce the requested output files
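The "translate the request to SQL" step of the middleware might look something like the sketch below. The request format, function name, and generated SQL shape are all assumptions for illustration; the talk does not specify how the Physics Intelligence layer is implemented.

```python
# Hypothetical middleware step: turn a user's event-selection request into
# SQL.  A request maps each object table to a minimum object count and a
# set of lower-bound cuts on its columns.
def build_selection_sql(request):
    """request: {object_table: (min_count, {column: min_value})}"""
    subqueries = []
    for obj, (min_count, cuts) in request.items():
        where = " AND ".join(f"{col} > {val}" for col, val in cuts.items())
        subqueries.append(
            f"SELECT run, event FROM {obj} WHERE {where} "
            f"GROUP BY run, event HAVING COUNT(*) >= {min_count}"
        )
    # An event must satisfy every object requirement simultaneously.
    return " INTERSECT ".join(subqueries)

# e.g. "at least two electrons with ET above 25 GeV"
sql = build_selection_sql({"electron": (2, {"et": 25.0})})
print(sql)
```

A real implementation would also have to validate the request against the resource constraints mentioned above before running the generated query.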

PHASER Output
- Several output options:
  - List of run and event numbers satisfying the request
  - Ntuple created from POD information
  - μDST stream containing the requested events from the DVD library
- Output files will generally be small enough to transfer over the network
- Larger output files can be written to DVD and physically sent to the physicist for further analysis

Conclusions
- PHASER offers a way for experts, novices, and "dinosaurs" alike to quickly extract information about a particular class of events
- The feasibility of loading "Run 1"-size physics object info into a relational database has been demonstrated
- Significant improvements in event selection time have been observed for the W/Z benchmarks
- Expect these results will scale up to the Run 2 data load
- Database technology is also potentially useful for helping manage complex analyses and storing intermediate results