Analysis Tools at D0. PPDG Analysis Grid Computing Project, CS 11 Caltech Meeting. Lee Lueking, Fermilab Computing Division, December 19, 2002.

Presentation transcript:

Analysis Tools at D0. PPDG Analysis Grid Computing Project, CS 11 Caltech Meeting. Lee Lueking, Fermilab Computing Division, December 19, 2002

Lee Lueking - FNAL/CD
DZero Analysis Tools
• SAM dataset tools
  – Database Schema
  – Datasets and Query Dimensions
• MC RunJob
• Monte Carlo request system
  – Physics groups submit web-based requests for MC
  – Provides prioritization, work assignment, and tracking
• D0 Framework, D0Tools, D0 Run Time Environment (RTE)
• ROOT
  – Using ROOT non-intrusive I/O for the Dzero EDM (Event Data Model)
  – ROOT/SAM and ROOT/SAM-Grid (Philippe Canal, up next)
(SAM dataset tools and MC RunJob: more in this talk)

SAM Simplified Database Schema
[Diagram: schema entities]
• Files: ID, Name, Format, Size, # Events
• Events: ID, Event Number, Trigger L1, Trigger L2, Trigger L3, Off-line Filter, Thumbnail
• Volume
• Project
• Data Tier
• Physical Data Stream
• Trigger Configuration
• Creation & Processing Info
• Run
• Event-File Catalog
• Run Conditions: Luminosity, Calibration, Trigger DB, Alignment
• Group and User information
• Station Config. & Cache info
• File Storage Locations
• MC Process & Decay
The SAM schema has over 100 tables. Several other related tablespaces are also available.

Challenge: Transform the complex SAM schema into a form that is user friendly and avoids badly formed user SQL queries.
Solution: Transform the schema to look like one giant table.
[Figure: one giant table. Columns (dimension names): file, Run, Event, Date, Trigger, Apo, App vsn, … Rows (data files): file1, file2, file3, file4, file5, …, filen.]

Dimensions, Links, and Chains
Matthew Vranicar (Piocon Technologies), SAM team
• This transformation is done using what we call links and chains.
• A link is a description of how to relate two tables.
• A chain is a set of links that connects the desired dimension (a column in a table) to the datafile.
• These are stored in the database itself and loaded into the middle-tier server when it starts up.
• A grammar is provided for users to build complex queries employing dimensions.
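The link and chain idea can be illustrated with a small Python sketch. This is not the SAM implementation: the table names, column names, and the `Link`, `CHAINS`, and `build_query` helpers are all hypothetical, chosen only to show how a dimension name might be expanded into joins that lead back to a data-file table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Describes how to relate two tables: child.child_col = parent.parent_col."""
    child: str
    child_col: str
    parent: str
    parent_col: str


# Hypothetical chains. Each dimension maps to the column that holds its value
# and to the links needed to reach it, listed from the data-file table outward.
CHAINS = {
    "FILE_NAME": ("data_files.file_name", []),
    "RUN_NUMBER": (
        "runs.run_number",
        [Link("runs", "run_id", "data_files", "run_id")],
    ),
    "RUN_QUALITY": (
        "run_qualities.quality",
        [
            Link("runs", "run_id", "data_files", "run_id"),
            Link("run_qualities", "run_id", "runs", "run_id"),
        ],
    ),
}


def build_query(dimension: str, operator: str, placeholder: str = ":value") -> str:
    """Expand one dimension constraint into SQL by following its chain.

    The value is left as a bind placeholder so user input never becomes SQL text.
    """
    column, links = CHAINS[dimension.upper()]
    sql = ["SELECT data_files.file_name", "FROM data_files"]
    for link in links:
        sql.append(
            f"JOIN {link.child} ON {link.child}.{link.child_col} = "
            f"{link.parent}.{link.parent_col}"
        )
    sql.append(f"WHERE {column} {operator} {placeholder}")
    return "\n".join(sql)


if __name__ == "__main__":
    print(build_query("run_quality", "="))
```

The user only names a dimension and a constraint; the joins come from the stored chain, which is one way a "giant table" view could keep hand-written SQL out of user hands.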

Dimensions: Examples
• There are dozens of dimensions available.
• Additional dimensions are easily defined.
• Examples of dimensions defined:
  – APPL_NAME, APPL_NAME_ANALYZED, CONSUMED_DATE, CONSUMED_STATUS, CONSUMER, CONSUMER_GROUP, CONSUMER_ID, CREATE_DATE, DATASET_DEF_ID, DATASET_DEF_NAME, DATASET_ID, DATASET_VERSION, DATA_FILE_LOCATION_STATUS, DATA_TIER, DATA_TIER_ANALYZED, DELIVERED_STATUS, EVENT_NUMBER, FAMILY, FAMILY_ANALYZED, FILE_ANALYZED, FILE_NAME, FILE_PARTITION, FILE_STATUS, FULL_PATH, LOGICAL_DATASTREAM_NAME, PARAM_TYPE, RUN_ID, RUN_NUMBER, RUN_QUALITY, VERSION, VERSION_ANALYZED, WORK_GRP_NAME, etc.
• __SET__: special dimension allowing you to include an existing dataset definition.

Query Syntax and Grammar
• Constraint operators: =, !=, >, <, >=, <=, like, not like, in, not in, between, is null, is not null
• Set operators: and, or, minus (union and intersection to be added)
• Syntax: --dim="[(]name [conOper] value [setOper name [conOper] value][)]..."
• Command line examples:
  – sam define dataset --defname=dataset_definition_name --group=work_group_name --dim="(run_number data_tier digitized) minus physical_datastream_name electron+jet"
  – sam create dataset --defname=dataset_definition_name
Note: Through an SBIR, Matthew Vranicar (Piocon) and Randolf Herber (FNAL CD) are providing additional features: more reliability, using a tokenizer (flex) and a parser (bison) to check the grammar and handle parentheses. Security, user access control, and database resource management are also being added.
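A minimal sketch of how a --dim string following this syntax might be assembled and checked before it reaches the command line. The helper names `constraint` and `combine` are hypothetical, the run number is a placeholder value, and the comparison operators are assumed to be the usual six. The script only prints the command; it does not call `sam`.

```python
import shlex

# Operator lists as given on the slide (comparison operators assumed to be
# the usual six).
CONSTRAINT_OPERATORS = {
    "=", "!=", ">", "<", ">=", "<=", "like", "not like",
    "in", "not in", "between", "is null", "is not null",
}
SET_OPERATORS = {"and", "or", "minus"}


def constraint(name: str, value: str = "", operator: str = "") -> str:
    """Format one 'name [conOper] value' term, rejecting unknown operators."""
    if operator and operator not in CONSTRAINT_OPERATORS:
        raise ValueError(f"unknown constraint operator: {operator!r}")
    return " ".join(part for part in (name, operator, value) if part)


def combine(set_operator: str, left: str, right: str, group_left: bool = False) -> str:
    """Join two terms with a set operator, optionally parenthesizing the left one."""
    if set_operator not in SET_OPERATORS:
        raise ValueError(f"unknown set operator: {set_operator!r}")
    if group_left:
        left = f"({left})"
    return f"{left} {set_operator} {right}"


# Placeholder values, for illustration only.
selected = combine(
    "and",
    constraint("run_number", "150000", ">="),
    constraint("data_tier", "digitized"),
)
dim = combine(
    "minus",
    selected,
    constraint("physical_datastream_name", "electron+jet"),
    group_left=True,
)

command = [
    "sam", "define", "dataset",
    "--defname=dataset_definition_name",
    "--group=work_group_name",
    f"--dim={dim}",
]

if __name__ == "__main__":
    # Show the shell-quoted command instead of running it.
    print(shlex.join(command))
```

Validating operators up front mirrors, in a very small way, what the planned flex/bison tokenizer and parser would do for the full grammar.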

MC_RunJob: What is it?
Greg Graham (FNAL/CMS), David Evans (Lancaster University/DZero)
• Python-based work flow planner
• Metadata language interpreter
• Flexible and generic
[Diagram: processing line from Dataset 1 through packages (Pkg 1, Pkg 2, Pkg 3) to Dataset 4, with phase boundaries at intermediate datasets DS 2 and DS 3.]

MC_RunJob: Work Flow Planner
• Chain a set of inputs, processes, and outputs.
• Set up a local execution environment.
• Parallelize the job according to the local environment.
• Produce metadata to describe each processing step.
• Retrieve input data and deliver output data.
• Cluster-scale job management (adapting to work with SAM-Grid).
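The planning steps above can be sketched in a few lines of Python. This is an illustration of the general idea, not MC_RunJob code: the `Step` class, the `plan` function, and the dataset naming scheme are all hypothetical. Each job chains the steps so that one step's output is the next step's input, the events are split across the available workers, and every step gets a small metadata record.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One processing step; params are extra metadata keywords for that step."""
    name: str
    params: dict = field(default_factory=dict)


def plan(steps, first_input, total_events, n_workers):
    """Return one job per worker; each job is a list of per-step metadata records."""
    per_job = -(-total_events // n_workers)  # ceiling division
    jobs = []
    for worker in range(n_workers):
        first_event = worker * per_job
        count = min(per_job, total_events - first_event)
        if count <= 0:
            break  # fewer events than workers: stop creating jobs
        dataset = first_input
        chain = []
        for step in steps:
            output = f"{step.name}_{worker:03d}"
            chain.append({
                "step": step.name,
                "input": dataset,
                "output": output,
                "events": count,
                **step.params,
            })
            dataset = output  # this step's output feeds the next step
        jobs.append(chain)
    return jobs


if __name__ == "__main__":
    steps = [Step("gen"), Step("geant"), Step("digi", {"version": "test"})]
    for job in plan(steps, "mc_request", total_events=250, n_workers=3):
        for record in job:
            print(record)
```

Keeping the plan as plain data (lists of dictionaries) means the same description could be logged as metadata or handed to whatever batch system the local environment provides.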

MC_RunJob: Metadata Language Interpreter
• Define the metadata language: keywords
• Convert metadata into jobs: macro
• Generate metadata for processors
• Log the metadata for output, to declare/store into SAM
[Diagram labels: Metadata, Physics Result]

MC_RunJob: Use at DZero
• MC production at remote farms
• User MC production on central resources
• MC SAM storage using keywords
• Runs MC software: Gen => GEANT => Digi => Trigsim
• Runs MC reconstruction
• Runs MC/data analyzers: ROOT-tuple/tree makers
• Plans to use it more broadly for data analysis

Flexibility
• Object-oriented Python
  – The basic pieces are fairly generic.
  – Essentially one Python class and a list of metadata will allow running an executable in a SAM-friendly way. It can be made as complicated as desired.
  – SAM gets the metadata definition from RunJob, so when you add new features, telling SAM about them is as simple as typing "sam load keywords".
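To make the "one Python class and a list of metadata" point concrete, here is a hedged sketch of what such a wrapper could look like. The `Runner` class and its methods are hypothetical and are not the MC_RunJob API. The stand-in executable is the Python interpreter itself, so the example runs anywhere, and the metadata keywords are placeholders.

```python
import subprocess
import sys


class Runner:
    """Hypothetical wrapper: run one executable and keep its metadata with it."""

    def __init__(self, executable, metadata):
        self.executable = list(executable)  # command and fixed arguments
        self.metadata = dict(metadata)      # keyword -> value

    def arguments(self):
        # Turn each metadata keyword into a --key=value argument.
        return [f"--{key}={value}" for key, value in sorted(self.metadata.items())]

    def run(self):
        done = subprocess.run(
            self.executable + self.arguments(),
            capture_output=True,
            text=True,
            check=True,  # raise if the executable fails
        )
        # The returned record is what one might later log or declare to a catalog.
        record = dict(self.metadata)
        record["exit_code"] = done.returncode
        record["stdout"] = done.stdout.strip()
        return record


if __name__ == "__main__":
    # Stand-in executable: a Python one-liner that counts its arguments.
    count_args = [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)"]
    runner = Runner(count_args, {"events": 100, "process": "placeholder"})
    print(runner.run())
```

Because the metadata travels with the job description, adding a new keyword only changes the dictionary passed in; the wrapper itself stays the same.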

For Additional Information
• Dataset Dimensions: –
• MC_RunJob: –
• Monte Carlo Request System: –
• Other D0 stuff: –
• SAM/Root: –