Summary of the First Database Survey J.N. Butler Oct. 11, 2001.



Goals of Survey
Get people to begin to think about the requirements in this area; nothing is final or “binding”.
Get some idea of the scope of needs, sizes of databases, and access patterns, to begin the discussion of which DBMSs to use.
Explore commonalities.

Arbitrary Categories
1. Detector construction tracking databases
2. Calibration databases
3. Configuration databases
4. Monitoring databases
5. Event-related databases
6. Analysis and Simulation support databases
7. Documentation and general information databases

Summary of Questions
Purpose of the database.
Source of the data: where is the information that populates the database generated?
Quantity of the data (e.g. number of channels, data per channel, frequency of the data for each channel, etc.).
Anticipated use of the data and access patterns.

Summary of Groups Responding
Pixel, EMCAL, RICH, Muon, Forward Silicon, Forward Straws, Data Acquisition System, Trigger.
The individual group responses and some contributed opinions are on the web page. Also on the web page are Word documents with the responses organized by category rather than by group.

Summary of Detector Construction Tracking Databases
All construction subprojects plan to have them.
They are all likely to be small, <~1 Gbyte bfi (before inflation).
They will be used to track components and subassemblies, to do trending of yields, selection of acceptable components, matching components to their environment, etc.
Basic output is probably a response on a terminal or on paper.
Query rate will not be high.
Personal comment: people will probably demand a GUI for data entry and queries.

Muon Response
For each tube (~75K channels): wire used; Delrin plug used; brass pin used; who strung the tube, when, and where; what the tension is, and who measured it, when, and where; what the efficiency is, and who measured it, when, and where.
In principle the tension and efficiency for a tube can be measured multiple times, but for most tubes this will be a once-off during
For each ASDQ chip we store: serial number; lot; position in card; delivery date.
For each ASDQ channel (~75K): 2 measures of input voltage (+/-); 2 measures of output voltage (+/-); 1 threshold; 5 measures of output width; 2 measures of calibration pulse; for a total of 12 values/channel.
For each plank (~2500 items) we store: its serial number (barcode); its length; who made it, where, and when.
There are links from an object to its parents.
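The record layout described in the Muon response can be sketched as a small relational schema. The following is only an illustration using Python's built-in sqlite3 module: the table and column names are hypothetical, the tube-to-plank parent link is an assumption (the response says only that objects link to their parents), and the inserted values are made up for the example.

```python
import sqlite3

# Hypothetical schema; names are illustrative, not taken from the survey.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plank (
    barcode    TEXT PRIMARY KEY,   -- serial number (barcode)
    length     REAL,
    made_by    TEXT,
    made_where TEXT,
    made_when  TEXT
);
CREATE TABLE tube (
    tube_id      INTEGER PRIMARY KEY,
    plank        TEXT REFERENCES plank(barcode),  -- assumed parent link
    strung_by    TEXT,
    strung_where TEXT,
    strung_when  TEXT
);
CREATE TABLE tube_measurement (    -- one row per measurement; may repeat
    tube_id        INTEGER REFERENCES tube(tube_id),
    quantity       TEXT CHECK (quantity IN ('tension', 'efficiency')),
    value          REAL,
    measured_by    TEXT,
    measured_where TEXT,
    measured_when  TEXT            -- ISO date, so text order is time order
);
""")

# Made-up sample rows, purely to exercise the schema.
conn.execute("INSERT INTO plank VALUES ('P0001', 1.0, 'A', 'site', '2001-10-01')")
conn.execute("INSERT INTO tube VALUES (1, 'P0001', 'B', 'site', '2001-10-02')")
conn.executemany(
    "INSERT INTO tube_measurement VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "tension", 150.0, "C", "site", "2001-10-03"),
        (1, "tension", 148.5, "C", "site", "2001-10-09"),
    ],
)


def latest_measurement(conn, tube_id, quantity):
    """Return the most recent value of a quantity for a tube, or None."""
    row = conn.execute(
        "SELECT value FROM tube_measurement "
        "WHERE tube_id = ? AND quantity = ? "
        "ORDER BY measured_when DESC LIMIT 1",
        (tube_id, quantity),
    ).fetchone()
    return row[0] if row else None


def parent_plank(conn, tube_id):
    """Follow the link from a tube to its (assumed) parent plank."""
    row = conn.execute(
        "SELECT plank FROM tube WHERE tube_id = ?", (tube_id,)
    ).fetchone()
    return row[0] if row else None
```

Keeping measurements in their own table is one way to allow the repeated tension and efficiency measurements the response mentions without changing the tube record itself.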

EMCAL Construction
Construction database with:
1) Purpose: tracking parts, such as the status of a crystal, PMT and electronics for each channel, position in the detector, and cable interconnections.
2) Source of data: crystal status from the quality control DB; PMT and electronics from some other DB or directly from the testing setup; some info may come from an operator.
3) Quantity of data: 100bytes x channels x several stages (testing, installation, going from, going to, etc.), total ~23 Mbytes.
4) Anticipated use of the data/access patterns: reports; later it may be used for repairs (which cable goes to which channel of electronics).

Summary of Calibration Databases
People tended to look upon this as meaning initial calibrations, some of which would be done on the bench, with cosmic rays, with test beam, or with initial beam exposure, and perhaps repeated at fixed, not too frequent intervals.
Forward silicon saw all calibration done through DAQ, which is consistent with their experience.
Trigger and DAQ need access to calibration databases for initialization of hardware and programs, but do not themselves have needs in this area.

Configuration Databases
Gets updated once a new calibration indicates there is a need to change threshold values.
Definite relationship with the Calibration Constants database, which is the source of the data needed to compute a new set of initialization constants.
Possible relationship with the Monitor Values database: the latter gets its reference values here, to check against monitored ones.
Possible relationship with the Detector Construction database: initialization might be done only on components declared installed by the latter database.
Size will depend on how often the configuration needs to change.
A change of configuration may need to be quick once it is determined to be necessary.
When to make a change, and how to know when it has occurred?

Monitoring Databases
The detector groups saw this as mostly monitoring temperatures, pressures, and flow rates, and monitoring gains, thresholds, pedestals, drift speeds, high voltage, current draw, and occupancy.
Trigger and DAQ were interested in data rates, trigger rates, event sizes, and physics indicators: numerous distributions that could be used to track performance from individual channels through fairly high-level physics quantities, like specific “golden B decay modes”.
Total database sizes are not large except for the last category.
Stability will determine recording frequency.

Event Related Databases
We plan for 40 billion events/year, at a rate of 200 Mbytes/beam-second. An event catalog would probably be pretty hard to search.
If we use file catalogs for the archived data, how many files might we have? For raw data, the answer comes to 20,000 files/year x 100 Gbytes / filesize (in Gbytes). Each file would hold 2 million events x filesize (in Gbytes) / 100 Gbytes.
So there would be of order 100,000 entries/year in a “files” database for the raw data if we used 20 Gbyte files.
The data on a file might not represent events taken in an approximately contiguous time, but might be scrambled. If they were contiguous, 100 Gbytes would represent ~8 minutes of running.
Assume three times this many files for physics analysis and simulation datasets.
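The file-count estimate above can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted on the slide; the function names are illustrative, and 1 Gbyte is assumed to equal 1000 Mbytes.

```python
EVENTS_PER_YEAR = 40e9          # 40 billion events/year (from the slide)
EVENTS_PER_100_GBYTES = 2e6     # 2 million events in a 100 Gbyte file
RATE_MBYTES_PER_BEAM_SEC = 200  # 200 Mbytes/beam-second


def files_per_year(file_size_gbytes):
    """Raw-data file catalog entries per year for a given file size."""
    events_per_file = EVENTS_PER_100_GBYTES * file_size_gbytes / 100
    return EVENTS_PER_YEAR / events_per_file


def minutes_of_running(file_size_gbytes):
    """Running time one file represents, if its events are contiguous."""
    # Assumes 1 Gbyte = 1000 Mbytes.
    seconds = file_size_gbytes * 1000 / RATE_MBYTES_PER_BEAM_SEC
    return seconds / 60


raw_100 = files_per_year(100)        # 100 Gbyte files
raw_20 = files_per_year(20)          # 20 Gbyte files
minutes_100 = minutes_of_running(100)
```

With these inputs, 100 Gbyte files give 20,000 files/year, 20 Gbyte files give 100,000 files/year, and a 100 Gbyte file corresponds to a little over 8 minutes of running, in agreement with the figures on the slide.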

Run State Database
Another type of event is the run transition. For reconstruction purposes (including L2/L3), we need a record of the detector status and configuration at begin-run time, at the time when we enable the trigger (start run), etc. We could also include periodic updates.
What is a RUN? Presumably it implies a period which has a stable, well-defined set of constants. Given that the events are only approximately ordered, how do we define such a period? How do we synchronize parameter changes?

Pathological Event Database
This database would be used to locate events saved in the storage systems. The data that is generated for this database will come from the trigger system, which will tag events that seem unusual and should be studied. Pathological events can provide indications of trigger-system failures that were not caught by the monitoring system, bugs in trigger algorithms, failures in detector components, or hints of unanticipated physics signals.

Analysis and Simulation Support Databases
Analysis and simulation programs clearly need access to many of the other databases.
There needs to be a run history database that they access.
There will be an electronic logbook, using a database for information storage, with shift information and probably a streamlined version for use by reconstruction and analysis programs.
Analysis History and Conditions database: it is very important to know exactly what program was run and with what inputs. For production jobs, we will probably follow the rule of having a status word on each event that records a “code” which defines the complete production code and environment (databases, etc.) used in the processing. For physics analysis, this is probably impractical. It may be that a standard use of the electronic logbook for this purpose can be adopted. This is not an easy problem.
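One way to picture the per-event production “code” is as an index into a registry of processing environments. The sketch below is an assumption about how such a rule might be realized, not a description of BTeV software; the class name, method names, and the fields of the environment description are all hypothetical.

```python
import json


class ProductionRegistry:
    """Maps a complete processing environment to a small integer code."""

    def __init__(self):
        self._code_by_key = {}   # canonical JSON description -> code
        self._environments = []  # code -> environment description

    def code_for(self, environment):
        """Return the code for an environment, registering it if new."""
        # Sorting keys makes the description independent of dict order.
        key = json.dumps(environment, sort_keys=True)
        if key not in self._code_by_key:
            self._code_by_key[key] = len(self._environments)
            self._environments.append(dict(environment))
        return self._code_by_key[key]

    def describe(self, code):
        """Recover the full environment recorded for a status-word code."""
        return self._environments[code]


registry = ProductionRegistry()

# Hypothetical environment descriptions, for illustration only.
pass1 = {"program": "reco", "version": "1.0", "calibration_tag": "A"}
pass2 = {"program": "reco", "version": "1.1", "calibration_tag": "A"}

code1 = registry.code_for(pass1)
code2 = registry.code_for(pass2)
```

Each event would then carry only the integer, and an analysis could call `describe` to find out exactly which program and database versions produced it.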

Summary of Document Databases
1. The purpose of the database is to have one central collection of all BTeV documentation. It will maintain a local copy of each document. Documents may be as simple as a picture of the detector. Among other things, we expect that presentations at our meetings will be entered into the database.
2. Most data will be entered by hand via a web interface by members of the collaboration.
3. If we are entering small “documents”, then I expect that we could easily have 40K or so. I believe that CDF now has 5K documents using a more traditional definition of what a document is. Information to be stored for each document includes title, author list, location, category list (each document is allowed to reside in multiple categories), submission date, revision date, revision number, and abstract or short description.
4. This will be used continually by collaborators and other physicists wishing to find information about the experiment. All data must be backed up. The documents themselves will not reside in the database, but in a structured directory tree which must be accessible by the web server (and which also must be backed up). We may want to mirror this at a couple of sites (e.g. Italy) for ease of access. However, there should not be more than a handful of mirror sites so that maintenance does not become a headache.
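The per-document information listed in item 3 maps naturally onto a single record type. The following is a minimal sketch, assuming a Python dataclass; the field names follow the list in the response, while the class name, the helper function, and the sample entries (including the paths) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class DocumentRecord:
    title: str
    authors: List[str]
    location: str          # path in the directory tree, not the file itself
    categories: List[str]  # a document may reside in multiple categories
    submission_date: date
    revision_date: date
    revision_number: int
    abstract: str = ""     # abstract or short description


def in_category(records, category):
    """Return the records filed under the given category."""
    return [r for r in records if category in r.categories]


# Made-up entries, for illustration only.
records = [
    DocumentRecord(
        title="Detector picture",
        authors=["A. Author"],
        location="docs/0001/picture.png",
        categories=["general", "detector"],
        submission_date=date(2001, 10, 1),
        revision_date=date(2001, 10, 1),
        revision_number=1,
    ),
    DocumentRecord(
        title="Meeting presentation",
        authors=["B. Author", "C. Author"],
        location="docs/0002/talk.pdf",
        categories=["meetings"],
        submission_date=date(2001, 10, 11),
        revision_date=date(2001, 10, 12),
        revision_number=2,
        abstract="Slides shown at a collaboration meeting.",
    ),
]

detector_docs = in_category(records, "detector")
```

Storing only the location keeps the database small, consistent with the plan to hold the documents themselves in a directory tree served by the web server.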

Conclusion and Next Steps
There will be many databases and most will be small in terms of disk size, 1 to 100 Gbytes (bfi).
Storage is rarely a consideration, but speed of access for typical queries, user interface, and application program interface are issues.
Classic database issues all apply.
Use patterns are not clear, but BTeV is committed to facilitating local analysis and distributed analysis, so this will be a design consideration.

Conclusion and Next Steps
Separation into categories is probably helpful, but monitoring and calibration probably need better delineation, and “static” vs “active” monitoring may be a useful distinction.
The “category” view of this survey needs to be completed.
We need to capture, as preliminary, whatever numerical information we have, although I think at this point it is not very precise.
We need to write a document that can trigger another, more complete round of work.
We need to think about what a “run” means and how parameters can be changed in an organized, trackable fashion.
We need to at least define “requirement categories” such as speed of access for typical transactions, requirements for backup, mirroring, distribution, access rules and security, etc.
We have a lot of work ahead, but I think this survey provided a good beginning and I want to thank everyone who contributed.

Arbitrary “Categories”
Detector construction tracking databases: These are used to track parts inventories, processing steps including the progress and location of subassemblies, quality assurance test results, etc.
Calibration databases: Calibration constants determined in test beam runs, cosmic ray runs, from pulser data, or from event data (e.g. in situ calibrations). These could include pedestals, gains, start times, velocities, etc.
Configuration databases: These are constants that must be downloaded to initialize and run the system. These could include hardware configurations such as physical and logical addresses of modules, and such quantities as high voltages, thresholds, pedestals, masks (e.g. to suppress bad channels), trigger masks and trigger configurations/definitions and cuts, alignment parameters, etc. Some of these may be static and others may be determined from the calibration, monitoring or analysis systems, and may change each run or with some other frequency.
Monitoring databases: These include monitoring of environmental conditions such as temperatures and barometric pressures, luminosity monitoring, trigger rates, data sizes, pedestals, gains, alignments, physics quantities, and physics signals, tracked over the whole time of the experiment.

Arbitrary “Categories”
Event-related databases: At present, BTeV does not plan to have an event database but does plan to have an extensive metadata database, or data catalog, to locate events on the various storage systems. This would catalog datasets associated with raw and reconstructed data and many datasets generated for physics analysis. Simulation data sets would presumably be handled in the same catalog. The metadata catalog might hold information on each event that could be used by itself to make some high-level selection of the data.
Analysis and Simulation support databases: These include the geometry database, various run databases and simulation sample databases, databases associated with the control room logbook, computer processing databases (what has been processed through each step of the analysis chain), etc.
Documentation and general information databases: These include catalogs of BTeV and external documentation needed by the group.