Analysis Tools at D0
PPDG Analysis Grid Computing Project, CS 11 Caltech Meeting
Lee Lueking, Fermilab Computing Division
December 19, 2002

Slide 2: DZero Analysis Tools
• SAM dataset tools
  – Database schema
  – Datasets and query dimensions
• MC_RunJob
• Monte Carlo request system
  – Physics groups submit web-based requests for MC
  – Provides prioritization, work assignment, and tracking
• D0 Framework, D0Tools, D0 Run Time Environment (RTE)
• ROOT
  – Using ROOT non-intrusive I/O for the DZero EDM (Event Data Model)
  – ROOT/SAM and ROOT/SAM-Grid (Philippe Canal, up next)
More in this talk: SAM dataset tools and MC_RunJob.

Slide 3: SAM Simplified Database Schema
[Figure: simplified schema diagram. Entities shown:
  – Files (ID, Name, Format, Size, # Events)
  – Events (ID, Event Number, Trigger L1, Trigger L2, Trigger L3, Off-line Filter, Thumbnail)
  – Volume, Project, Data Tier, Physical Data Stream, Trigger Configuration, Creation & Processing Info, Run, Event-File Catalog
  – Run Conditions (Luminosity, Calibration, Trigger DB, Alignment)
  – Group and User information, Station Config. & Cache info, File Storage Locations, MC Process & Decay]
• The SAM schema has over 100 tables.
• Several other related tablespaces are also available.

Slide 4
• Challenge: transform the complex SAM schema into a form that is user-friendly and avoids badly formed user SQL queries.
• Solution: transform the schema so that it looks like one giant table.
[Figure: a table whose rows are data files (file1, file2, … filen) and whose columns are dimension names (Run, Event, Date, Trigger, Apo, App vsn, …); labels "Dimension Name" and "DataFile".]
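
A minimal sketch of the "one giant table" idea, using Python's built-in sqlite3 module. The three tables and their columns are hypothetical stand-ins for a tiny corner of the real SAM schema (which has over 100 tables); the point is only that a view can hide the joins so each dimension appears as a column next to the file name.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for a few SAM tables.
SCHEMA = """
CREATE TABLE data_tiers (tier_id INTEGER PRIMARY KEY, data_tier TEXT);
CREATE TABLE data_files (file_id INTEGER PRIMARY KEY, file_name TEXT,
                         tier_id INTEGER REFERENCES data_tiers(tier_id));
CREATE TABLE file_runs  (file_id INTEGER REFERENCES data_files(file_id),
                         run_number INTEGER);

-- The "one giant table": every dimension is a column beside the file name.
CREATE VIEW flat_files AS
SELECT f.file_name, t.data_tier, r.run_number
FROM data_files f
JOIN data_tiers t ON t.tier_id = f.tier_id
JOIN file_runs  r ON r.file_id = f.file_id;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO data_tiers VALUES (?, ?)",
                     [(1, "digitized"), (2, "reconstructed")])
    conn.executemany("INSERT INTO data_files VALUES (?, ?, ?)",
                     [(1, "file1", 1), (2, "file2", 2), (3, "file3", 1)])
    conn.executemany("INSERT INTO file_runs VALUES (?, ?)",
                     [(1, 100930), (2, 100930), (3, 100931)])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    # Users filter on dimensions without writing any joins themselves.
    rows = conn.execute(
        "SELECT file_name FROM flat_files "
        "WHERE run_number = ? AND data_tier = ? ORDER BY file_name",
        (100930, "digitized"),
    ).fetchall()
    print(rows)  # [('file1',)]
```

Only file1 is both from run 100930 and in the digitized tier, so it is the single match.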

Slide 5: Dimensions, Links, and Chains
Matthew Vranicar (Piocon Technologies), SAM team
• This transformation is done using what we call links and chains.
  – A link is a description of how to relate two tables.
  – A chain is a set of links that connects the desired dimension (a column in some table) to the data file.
• Links and chains are stored in the database itself and are loaded into the middle-tier server when it starts up.
• A grammar is provided for users to build complex queries from dimensions.
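
The link and chain idea can be sketched in Python as below. The `Link` and `Chain` classes, the table and column names, and the shape of the generated SQL are all hypothetical; the slides do not show how SAM actually stores links and chains or what SQL its middle tier emits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """How to relate two tables: left.left_col = right.right_col."""
    left: str
    left_col: str
    right: str
    right_col: str


@dataclass(frozen=True)
class Chain:
    """Links leading from the data-file table to the table holding a dimension."""
    dimension: str  # user-facing dimension name
    table: str      # table that holds the dimension column
    column: str     # the column itself
    links: tuple    # tuple of Link, starting at data_files


def chain_to_sql(chain: Chain) -> str:
    """Build a query listing file IDs whose dimension equals a bound value."""
    joins = [
        f"JOIN {lk.right} ON {lk.left}.{lk.left_col} = {lk.right}.{lk.right_col}"
        for lk in chain.links
    ]
    return " ".join(
        ["SELECT data_files.file_id FROM data_files", *joins,
         f"WHERE {chain.table}.{chain.column} = ?"]
    )


# Hypothetical chains; in this sketch they would be loaded once at server startup.
CHAINS = {
    "run_number": Chain(
        "run_number", "runs", "run_number",
        (Link("data_files", "file_id", "file_runs", "file_id"),
         Link("file_runs", "run_id", "runs", "run_id")),
    ),
    "data_tier": Chain(
        "data_tier", "data_tiers", "data_tier",
        (Link("data_files", "tier_id", "data_tiers", "tier_id"),),
    ),
}

if __name__ == "__main__":
    print(chain_to_sql(CHAINS["data_tier"]))
```

Keeping the chains as data rather than hard-coded SQL is what makes new dimensions cheap to add: a new dimension is just a new chain entry.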

Slide 6: Dimensions: Examples
• There are dozens of dimensions available, and additional dimensions are easily defined.
• Examples of defined dimensions:
  – APPL_NAME, APPL_NAME_ANALYZED, CONSUMED_DATE, CONSUMED_STATUS, CONSUMER, CONSUMER_GROUP, CONSUMER_ID, CREATE_DATE, DATASET_DEF_ID, DATASET_DEF_NAME, DATASET_ID, DATASET_VERSION, DATA_FILE_LOCATION_STATUS, DATA_TIER, DATA_TIER_ANALYZED, DELIVERED_STATUS, EVENT_NUMBER, FAMILY, FAMILY_ANALYZED, FILE_ANALYZED, FILE_NAME, FILE_PARTITION, FILE_STATUS, FULL_PATH, LOGICAL_DATASTREAM_NAME, PARAM_TYPE, RUN_ID, RUN_NUMBER, RUN_QUALITY, VERSION, VERSION_ANALYZED, WORK_GRP_NAME, etc.
• __SET__: a special dimension that lets you include an existing dataset definition.
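
One plausible way to support a dimension like __SET__ is textual expansion: replace each reference with the referenced definition in parentheses, recursively, and reject cycles. The sketch below assumes the form `__SET__ <definition_name>`; the stored definitions and the `expand_sets` helper are hypothetical, and SAM's real handling may differ.

```python
import re

# Hypothetical stored dataset definitions: name -> dimension query string.
DEFINITIONS = {
    "ejet_runs": "run_number 100930 and data_tier digitized",
    "ejet_clean": "__SET__ ejet_runs minus run_quality bad",
}

# Assumed syntax: "__SET__" followed by a definition name.
_SET_REF = re.compile(r"__SET__\s+([\w.\-]+)", re.IGNORECASE)


def expand_sets(query: str, definitions: dict,
                _seen: frozenset = frozenset()) -> str:
    """Replace each '__SET__ <name>' with its parenthesized, expanded definition."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in _seen:
            raise ValueError(f"circular __SET__ reference to {name!r}")
        if name not in definitions:
            raise KeyError(f"unknown dataset definition {name!r}")
        inner = expand_sets(definitions[name], definitions, _seen | {name})
        return f"({inner})"

    return _SET_REF.sub(substitute, query)


if __name__ == "__main__":
    print(expand_sets("__SET__ ejet_clean", DEFINITIONS))
```

Wrapping each expansion in parentheses keeps the included definition's set operations grouped, whatever operators surround the reference.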

Slide 7: Query Syntax and Grammar
• Constraint operators: =, !=, >, <, >=, <=, like, not like, in, not in, between, is null, is not null
• Set operators: and, or, minus (union and intersection to be added)
• Syntax: --dim="[(]name [conOper] value [setOper name [conOper] value][)]..."
• Command-line examples:
  – sam define dataset --defname=dataset_definition_name --group=work_group_name --dim="(run_number 100930 data_tier digitized) minus physical_datastream_name electron+jet"
  – sam create dataset --defname=dataset_definition_name
• Note: through an SBIR, Matthew Vranicar (Piocon) and Randolf Herber (FNAL CD) are providing additional features:
  – More reliability, by using a tokenizer (flex) and a parser (bison) to check the grammar and handle parentheses.
  – Security, user access control, and database resource management are also being added.
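
The flex/bison work splits checking into a tokenizer stage and a parser stage. As a rough illustration of those two stages (not the SBIR code), here is a small regex tokenizer and recursive-descent parser in Python. It covers only =, !=, >, <, >=, <=, like, and not like, plus the and/or/minus set operators and parentheses; in, not in, between, and the null tests are left out. Three behaviors are assumptions: an omitted constraint operator means "=", two adjacent constraints are joined by an implicit "and" (which is how the slide's example reads), and set operators apply left to right with no precedence.

```python
import re

# Multi-word operators come first so "not like" is not read as a name "not".
_TOKEN = re.compile(
    r"\s*(?:(?P<paren>[()])"
    r"|(?P<op>not\s+like\b|like\b|!=|>=|<=|=|>|<)"
    r"|(?P<word>[^\s()=<>!]+))",
    re.IGNORECASE,
)
SET_OPS = {"and", "or", "minus"}


class QueryError(ValueError):
    """Raised for malformed dimension queries."""


def tokenize(text: str) -> list:
    """Split a --dim string into (kind, value) tokens."""
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise QueryError(f"unexpected input: {text[pos:]!r}")
        kind = m.lastgroup
        value = m.group(kind)
        if kind == "op":
            value = " ".join(value.lower().split())  # normalize "NOT  LIKE"
        tokens.append((kind, value))
        pos = m.end()
    return tokens


class _Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self):
        tok = self.peek()
        if tok[0] is None:
            raise QueryError("unexpected end of query")
        self.i += 1
        return tok

    def expr(self):
        node = self.term()
        while True:
            kind, value = self.peek()
            if kind == "word" and value.lower() in SET_OPS:
                self.take()
                node = (value.lower(), node, self.term())
            elif kind == "word" or (kind, value) == ("paren", "("):
                node = ("and", node, self.term())  # assumed implicit "and"
            else:
                return node

    def term(self):
        kind, value = self.take()
        if (kind, value) == ("paren", "("):
            node = self.expr()
            if self.take() != ("paren", ")"):
                raise QueryError("expected ')'")
            return node
        if kind != "word":
            raise QueryError(f"expected a dimension name, got {value!r}")
        op = "="  # assumed default when the operator is omitted
        if self.peek()[0] == "op":
            op = self.take()[1]
        vkind, vvalue = self.take()
        if vkind != "word":
            raise QueryError(f"expected a value after {value!r}")
        return ("cond", value.lower(), op, vvalue)


def parse_dims(text: str):
    """Parse into nested tuples: ("cond", name, op, value) or (setop, left, right)."""
    parser = _Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek()[0] is not None:
        raise QueryError(f"unexpected token {parser.peek()[1]!r}")
    return tree
```

Applied to the slide's example, the parenthesized pair becomes an "and" node, and that node is the left operand of "minus".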

Slide 10: MC_RunJob: What Is It?
Greg Graham (FNAL/CMS), David Evans (Lancaster University/DZero)
• Python-based workflow planner
• Metadata language interpreter
• Flexible and generic
[Figure: a processing line with labels Dataset 1, Pkg 1, Pkg 2, Pkg 3, Phase Boundaries, DS 2, DS 3, Dataset 4.]
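
Reading the figure labels as packages chained along a processing line, with a new dataset emitted at each phase boundary (an interpretation, not something the slide spells out), the structure can be modeled roughly as below. `Package` and `run_line` are hypothetical names, not MC_RunJob's actual classes; each package is just a function that transforms a dataset record.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Package:
    """One step on a processing line (hypothetical stand-in for an MC_RunJob package)."""
    name: str
    action: Callable[[dict], dict]


def run_line(phases: list, dataset: dict) -> list:
    """Run each phase's packages in order; every phase boundary emits a dataset.

    Returns all datasets, starting with the untouched input dataset.
    """
    produced = [dataset]
    for phase in phases:
        current = dict(produced[-1])  # copy so earlier datasets stay unchanged
        for pkg in phase:
            current = pkg.action(current)
            # Record which packages produced this dataset.
            current["history"] = current.get("history", []) + [pkg.name]
        produced.append(current)  # phase boundary: a new dataset
    return produced


if __name__ == "__main__":
    def tag(stage):
        return lambda ds: {**ds, "stage": stage}

    phases = [
        [Package("Pkg 1", tag("generated")), Package("Pkg 2", tag("simulated"))],
        [Package("Pkg 3", tag("reconstructed"))],
    ]
    for ds in run_line(phases, {"name": "Dataset 1"}):
        print(ds)
```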

Slide 11: MC_RunJob: Workflow Planner
• Chain a set of inputs, processes, and outputs.
• Set up a local execution environment.
• Parallelize the job according to the local environment.
• Produce metadata to describe each processing step.
• Retrieve input data and deliver output data.
• Cluster-scale job management (being adapted to work with SAM-Grid).
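
As an illustration of the "parallelize according to the local environment" step, here is a sketch that splits a request for a number of events into contiguous per-job ranges, sized by default to the local CPU count. Treating `os.cpu_count()` as "the local environment" and the `split_request` helper itself are assumptions; the slides do not say how MC_RunJob decides the split.

```python
import os
from typing import Optional


def split_request(total_events: int, n_jobs: Optional[int] = None) -> list:
    """Split events [0, total_events) into at most n_jobs (first, count) chunks.

    Chunk sizes differ by at most one event. n_jobs defaults to the local CPU count.
    """
    if total_events < 0:
        raise ValueError("total_events must be non-negative")
    if total_events == 0:
        return []
    if n_jobs is None:
        n_jobs = os.cpu_count() or 1
    n_jobs = max(1, min(n_jobs, total_events))  # never create empty jobs
    base, extra = divmod(total_events, n_jobs)
    chunks, first = [], 0
    for j in range(n_jobs):
        count = base + (1 if j < extra else 0)  # spread the remainder
        chunks.append((first, count))
        first += count
    return chunks


if __name__ == "__main__":
    print(split_request(10, 3))  # [(0, 4), (4, 3), (7, 3)]
```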

Slide 12: MC_RunJob: Metadata Language Interpreter
• Define a metadata language: keywords
• Convert metadata into jobs: macros
• Generate metadata for processors
• Log the metadata for output, to be declared and stored in SAM
[Figure labels: Metadata, Physics Result]
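
A toy version of the interpreter idea: declared keywords with defaults, user metadata checked against them, a generated job macro, and a metadata log that could later be declared to SAM. The keyword names, the `set key value` macro format, and the `interpret` function are hypothetical and do not reproduce MC_RunJob's actual metadata language.

```python
import json

# Hypothetical keyword declarations: name -> default value (None means required).
KEYWORDS = {
    "generator": None,
    "num_events": None,
    "random_seed": 12345,
    "output_tier": "generated",
}


def interpret(metadata: dict) -> tuple:
    """Validate metadata against KEYWORDS; return (job macro, metadata log as JSON)."""
    unknown = set(metadata) - set(KEYWORDS)
    if unknown:
        raise KeyError(f"unknown keywords: {sorted(unknown)}")
    resolved = {**KEYWORDS, **metadata}  # user values override defaults
    missing = [key for key, value in resolved.items() if value is None]
    if missing:
        raise ValueError(f"missing required keywords: {missing}")
    macro = "\n".join(f"set {key} {value}" for key, value in resolved.items())
    log = json.dumps(resolved, sort_keys=True)
    return macro, log


if __name__ == "__main__":
    macro, log = interpret({"generator": "pythia", "num_events": 500})
    print(macro)
    print(log)
```

Generating the macro and the log from the same resolved keywords keeps the job and its recorded description consistent.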

Slide 13: MC_RunJob: Use at DZero
• MC production at remote farms
• User MC production on central resources
• MC storage in SAM using keywords
• Runs the MC software chain: Gen => GEANT => Digi => Trigsim
• Runs MC reconstruction
• Runs MC/data analyzers: ROOT-tuple/tree makers
• Plans to use it more broadly for data analysis

Slide 14: Flexibility
• Object-oriented Python
  – The basic machinery is fairly generic.
  – Essentially one Python class and a list of metadata are enough to run an executable in a SAM-friendly way; this can be made as elaborate as desired.
  – SAM gets the metadata definitions from RunJob, so when you add new features, telling SAM about them is as simple as typing “sam load keywords”.
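
To illustrate "one Python class and a list of metadata", here is a sketch of a hypothetical base class that turns declared metadata into a command line and returns a record describing the step. `Configurator`, `executable`, `keywords`, and `EchoConfigurator` are invented names; MC_RunJob's real class hierarchy is not shown in the slides, and declaring the record to SAM is left out.

```python
import shlex
import subprocess
import sys


class Configurator:
    """Hypothetical base class: subclasses name an executable and its metadata."""
    executable: tuple = ()
    keywords: tuple = ()

    def __init__(self, **metadata):
        missing = [k for k in self.keywords if k not in metadata]
        if missing:
            raise ValueError(f"{type(self).__name__} missing metadata: {missing}")
        self.metadata = {k: metadata[k] for k in self.keywords}

    def command(self) -> list:
        """Executable plus one --key=value option per declared keyword."""
        return list(self.executable) + [f"--{k}={v}" for k, v in self.metadata.items()]

    def run(self) -> dict:
        """Run the executable and return a metadata record describing the step."""
        result = subprocess.run(self.command(), capture_output=True,
                                text=True, check=True)
        return {
            "application": type(self).__name__,
            "command": shlex.join(self.command()),
            "metadata": dict(self.metadata),
            "stdout": result.stdout,
        }


class EchoConfigurator(Configurator):
    """Example subclass: a Python one-liner stands in for a real MC program."""
    executable = (sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))")
    keywords = ("num_events", "random_seed")


if __name__ == "__main__":
    print(EchoConfigurator(num_events=10, random_seed=7).run())
```

A new program needs only a subclass with its executable and keyword list; the base class handles validation, the command line, and the record.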

Slide 15: For Additional Information
• Dataset dimensions: http://d0db.fnal.gov/sam_project_editor
• MC_RunJob: http://www-clued0.fnal.gov/mc_runjob/mainframe.html
• Monte Carlo Request System: http://www-d0.fnal.gov/computing/mcprod/mcc.html
• Other D0 information: http://www-d0.fnal.gov/atwork
• SAM/ROOT: http://d0db.fnal.gov/sam/doc/userdocs/SamRoot.html

