The SAMGrid Data Handling System

The SAMGrid Data Handling System
Outline:
• What Is SAMGrid?
• Use Cases for SAMGrid in Run II Experiments
• Current Operational Load
• Stress Testing
• Future Project Plans
Wyatt Merritt (FNAL/CD/DØ) for the SAMGrid Team (talk created by Adam Lyon)

2 What is SAMGrid?
• Data handling system for the Run II experiments DØ and CDF
• SAMGrid manages file storage (replica catalogs)
  • Data files are stored in tape systems at FNAL and elsewhere (most are in ENSTORE at FNAL)
  • Files are cached around the world for fast access
• SAMGrid manages file delivery
  • Users at FNAL and at remote sites retrieve files from file storage; SAMGrid handles caching for efficiency
  • You don't need to know where the files are located
• SAMGrid manages file metadata cataloging
  • The SAMGrid DB holds metadata for each file, so you don't need to know file names to get data
• SAMGrid manages analysis bookkeeping
  • SAMGrid remembers which files you ran over, which files you processed successfully, which applications you ran, when you ran them and where …
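A minimal Python sketch of the caching idea, assuming a lookup order (local station cache, then other station caches, then tape) that the slide does not spell out. `ReplicaCatalog`, `locate`, and `deliver` are hypothetical names, not the SAMGrid client API, and the location strings are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalog:
    """Hypothetical replica catalog: file name -> locations that hold a copy."""

    replicas: dict[str, set[str]] = field(default_factory=dict)

    def add(self, file_name: str, location: str) -> None:
        self.replicas.setdefault(file_name, set()).add(location)

    def locate(self, file_name: str, local_cache: str, peer_caches: list[str]) -> str:
        """Choose a source: local station cache, then peer caches, then tape (assumed order)."""
        locations = self.replicas.get(file_name, set())
        if local_cache in locations:
            return local_cache
        for peer in peer_caches:
            if peer in locations:
                return peer
        tapes = sorted(loc for loc in locations if loc.startswith("tape:"))
        if tapes:
            return tapes[0]
        raise LookupError(f"no replica of {file_name}")

    def deliver(self, file_name: str, local_cache: str, peer_caches: list[str]) -> str:
        """Return the chosen source and record the new copy in the local cache."""
        source = self.locate(file_name, local_cache, peer_caches)
        self.add(file_name, local_cache)  # the (simulated) copy is now cached locally
        return source


catalog = ReplicaCatalog()
catalog.add("raw_0001", "tape:enstore")
first = catalog.deliver("raw_0001", "station:cab", ["station:remote"])   # comes from tape
second = catalog.deliver("raw_0001", "station:cab", ["station:remote"])  # local cache hit
```

Registering the copy in `deliver` is what turns a second request for the same file into a local cache hit, which is why the user never needs to know where the file lives.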

3 SAMGrid Terms and Concepts
• A project runs on a station and requests delivery of a dataset to one or more consumers on that station.
• Station: processing power + disk cache + (connection to tape storage) + network access to the SAMGrid catalog and other station caches
  • Example: a Linux analysis cluster at DØ
• Dataset: a metadata description that is resolved through a catalog query to a list of files. Datasets are named.
  • Examples (syntax not exact):
    • data_type physics and run_number and data_tier raw
    • request_id 5879 and data_tier thumbnail
• Consumer: a user application (one or many executable instances)
  • Examples: a script to copy files; a reconstruction job
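To make the dataset concept concrete, here is a Python sketch of a named dataset that is resolved to a file list by a catalog query. The catalog records, file names, and the `Dataset` class are hypothetical; real SAMGrid queries run against the SAMGrid DB and use a richer syntax.

```python
# Hypothetical catalog records; field names follow the slide's dataset examples.
CATALOG = [
    {"file_name": "mc_5879_001.tmb", "request_id": 5879, "data_tier": "thumbnail"},
    {"file_name": "mc_5879_001.raw", "request_id": 5879, "data_tier": "raw"},
    {"file_name": "mc_5880_001.tmb", "request_id": 5880, "data_tier": "thumbnail"},
]


class Dataset:
    """A named metadata description; its file list comes from a catalog query."""

    def __init__(self, name: str, **constraints):
        self.name = name
        self.constraints = constraints  # every field must equal its value ("and")

    def resolve(self, catalog: list[dict]) -> list[str]:
        """Run the query now and return the matching file names."""
        return sorted(
            rec["file_name"]
            for rec in catalog
            if all(rec.get(key) == value for key, value in self.constraints.items())
        )


mc_thumbs = Dataset("my_mc_thumbnails", request_id=5879, data_tier="thumbnail")
print(mc_thumbs.name, mc_thumbs.resolve(CATALOG))
```

The dataset stores only the description; the file list is produced when the query runs, so the same name can resolve to more files later.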

4 Physics Analysis Use Cases
I. Process Simulated Data
II. Process Unskimmed Collider Data
III. Process Skimmed Collider Data
IV. Process Missed/New Data
V. Analyze Sets of Individual Events ("Pick Events")
• There are further use cases, for simulated-data production and for reconstruction, that are not covered here

5 I. Process Simulated Data
• Look up a simulation request with the parameters of interest
  • e.g. Request 5874 used Pythia with $m_t = 174\ \mathrm{GeV}/c^2$
• Define a dataset (via command line or GUI):
  • request_id 5874 and data_tier thumbnail
• Submit a project to a SAMGrid station and submit executable instance(s) to the batch system (our tools make that easy)
• The consumer is started
• The station delivers files to the executable instance(s)
• The station records which files were delivered and consumed successfully and which had errors
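The delivery and bookkeeping steps can be sketched as follows. `run_project` is a hypothetical function, not station code: it models each consumer instance as a callable that returns `True` on success, and it hands files out round-robin, which is an assumption rather than documented station behavior.

```python
from collections import deque
from typing import Callable


def run_project(files: list[str], consumers: list[Callable[[str], bool]]) -> dict[str, str]:
    """Deliver each file to a consumer instance and record its outcome.

    Each consumer is a callable that processes one file and returns True on
    success. Files are handed out round-robin (an assumption).
    """
    status = {name: "undelivered" for name in files}
    pending = deque(files)
    turn = 0
    while pending and consumers:
        name = pending.popleft()
        consumer = consumers[turn % len(consumers)]
        turn += 1
        try:
            ok = consumer(name)
        except Exception:
            ok = False  # a crash while processing is recorded as an error
        status[name] = "consumed" if ok else "error"
    return status


def toy_reco(name: str) -> bool:
    """Hypothetical consumer that fails on files whose name ends in 'bad'."""
    return not name.endswith("bad")


result = run_project(["f1", "f2_bad", "f3"], [toy_reco, toy_reco])
```

Keeping a per-file status, rather than a single pass/fail for the job, is what later allows a new dataset of just the failed or undelivered files.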

6 II. Process Unskimmed Collider Data
• Define a dataset by describing the files of interest (not by listing file names), using the command line or GUI:
  • data_tier thumbnail and version p and run_type physics and run_qual_group MUO and run_quality GOOD
• Submit a project to a SAMGrid station and submit executable instance(s) to the batch system (our tools make that easy)
• The consumer is started
• The station delivers files to the executable instance(s)
• The station records which files were delivered and consumed successfully and which had errors
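The dataset above is a conjunction of `key value` clauses. Here is a sketch of parsing and applying that simplified form, assuming only `and` appears (the real query language also has `minus`, parentheses, and `__set__`, which this ignores). `parse_dimensions` and `matches` are hypothetical helpers, and the example string leaves out the `version` clause.

```python
def parse_dimensions(query: str) -> dict[str, str]:
    """Parse 'key value and key value ...' into {key: value} (simplified grammar)."""
    constraints = {}
    for clause in query.split(" and "):
        key, _, value = clause.strip().partition(" ")
        if not key or not value.strip():
            raise ValueError(f"malformed clause: {clause!r}")
        constraints[key] = value.strip()
    return constraints


def matches(metadata: dict[str, str], constraints: dict[str, str]) -> bool:
    """True if the file's metadata satisfies every clause."""
    return all(metadata.get(key) == value for key, value in constraints.items())


query = parse_dimensions(
    "data_tier thumbnail and run_type physics and run_qual_group MUO and run_quality GOOD"
)
```

Files are selected by what they are, not by name, so the same query keeps working as new runs are recorded.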

7 III. Process Skimmed Collider Data
• Someone (a physics group, the Common Skimming Group, or an individual) has produced skimmed files
• They created a dataset that describes these files
• You can:
  • Submit a project/jobs using their dataset name, OR
  • Create a new dataset based on theirs with additional constraints:
    __set__ DiElectronSkim and run_number
• Submission is the same as before
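Building on a named dataset amounts to reusing its definition and adding a constraint. In this sketch, definitions are stored as predicates so they are re-evaluated against the catalog on each use. The catalog, file names, and the `run_number >= 1002` cut are hypothetical stand-ins (the slide's run-number condition is not shown), and `define_from` and `resolve` are illustrative helpers.

```python
# Hypothetical catalog: file name -> metadata.
CATALOG = {
    "skim_ee_a.tmb": {"skim": "DiElectron", "run_number": 1001},
    "skim_ee_b.tmb": {"skim": "DiElectron", "run_number": 1002},
    "skim_mu_a.tmb": {"skim": "DiMuon", "run_number": 1002},
}

# Named dataset definitions are predicates on metadata, not file lists.
DEFINITIONS = {
    "DiElectronSkim": lambda md: md["skim"] == "DiElectron",
}


def define_from(base: str, extra):
    """'__set__ base and <extra>': a new definition layered on an existing one.

    The base is looked up by name at evaluation time, so the new dataset
    follows whatever the named dataset currently means.
    """
    return lambda md: DEFINITIONS[base](md) and extra(md)


def resolve(name: str) -> set[str]:
    """Files that satisfy the named definition right now."""
    predicate = DEFINITIONS[name]
    return {f for f, md in CATALOG.items() if predicate(md)}


DEFINITIONS["MyDiElectronRuns"] = define_from(
    "DiElectronSkim", lambda md: md["run_number"] >= 1002  # hypothetical run cut
)
```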

8 IV. Process Missed/New Data
• The set of files that satisfies the dataset query at a given time is a snapshot, and it is stored with the SAMGrid project information
• You can make new datasets containing:
  • Files that satisfy a dataset but are newer than the snapshot (added since the project ran)
  • Files that should have been processed by the original project but were not delivered or not consumed:
    __set__ myDataSet minus (project_name myProject and consumed_status consumed and consumer lyon)
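The snapshot bookkeeping reduces to set arithmetic. This sketch uses hypothetical file names and assumes that `__set__ myDataSet` re-resolves the dataset definition when the new dataset is evaluated, so it also sees files added after the snapshot; the `consumer lyon` condition is folded into `consumed`.

```python
# Hypothetical file names; the snapshot is frozen when the project starts.
snapshot = frozenset({"f1", "f2", "f3", "f4"})
consumed = {"f1", "f2", "f4"}             # f3 was not delivered, or failed
current = {"f1", "f2", "f3", "f4", "f5"}  # f5 was added to the catalog later

new_files = current - snapshot      # {"f5"}: newer than the snapshot
missed_files = snapshot - consumed  # {"f3"}: in the snapshot but never consumed

# "__set__ myDataSet minus (... consumed_status consumed ...)" subtracts the
# consumed files from the re-resolved dataset, so it catches both kinds at once.
to_process = current - consumed     # {"f3", "f5"}
assert to_process == new_files | missed_files
```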

9 V. Analyze Individual Events (Pick Events)
• Users want to analyze individual events in lower-level data tiers (e.g. raw), for:
  • Event displays
  • Special investigations
• The SAMGrid catalog keeps information on each raw event (run #, event #, trigger bits, file)
• Users run a pick-events tool that creates a dataset and submits an executable to extract and process these events
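A sketch of the pick-events lookup: the per-event catalog is modeled as a dictionary from `(run, event)` to a raw file name, and the requested events are grouped by file. `EVENT_INDEX` and `pick_events` are hypothetical, the run and event numbers are made up, and trigger bits are omitted.

```python
from collections import defaultdict

# Hypothetical per-event catalog entries: (run, event) -> raw file holding it.
EVENT_INDEX = {
    (170001, 12): "raw_170001_a",
    (170001, 40): "raw_170001_a",
    (170001, 98): "raw_170001_b",
    (170002, 5): "raw_170002_a",
}


def pick_events(wanted):
    """Group requested (run, event) pairs by the file that contains them.

    The keys of the result form the new dataset's file list; the values tell
    the extraction job which events to pull from each file.
    """
    by_file = defaultdict(list)
    missing = []
    for key in wanted:
        file_name = EVENT_INDEX.get(key)
        if file_name is None:
            missing.append(key)  # not in the catalog
        else:
            by_file[file_name].append(key)
    return dict(by_file), missing


files, missing = pick_events([(170001, 12), (170001, 40), (170002, 5), (999, 1)])
```

Grouping by file means each raw file is delivered once, however many of its events were requested.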

10 Current SAMGrid Production Configurations
• Large SMP
  • A 10 TB central cache with 128 attached processors
  • Mostly used for Pick Events
  • Local job submission (not Grid)
• Linux production farms
  • Reconstruction, Monte Carlo production, reprocessing
  • Various cache arrangements: distributed on worker nodes; NFS-mounted from a central head node; routed through a central head node and then distributed
  • Local or Grid (Condor) job submission
• Linux analysis clusters
  • Analysis jobs
  • Various cache arrangements: distributed on worker nodes; routed through a central head node and then distributed; cached on the central node only
  • Local job submission or remote submission (with experiment-specific tools, not Condor)

11 SAMGrid Statistics (DØ)
[Plot: files delivered by month, with the start of Run II marked]

12 SAMGrid Statistics - Usage Data
• 9000 projects! 233 different users!
• Data from January 6 until February 24 at DØ

13 SAMGrid Statistics - Usage Data
• ~500K files!
[Plot annotation: ~1%]

14 SAMGrid Statistics - Usage Data
[Plot by data tier: raw, thumbnails, …]
• 256 TB! 8.3 billion events!
• Data from January 6 until February 24 at DØ

15 SAMGrid Statistics - Operations Data

16 SAMGrid Statistics - Operations Data
• Time between "Request Next File" and "Open File"
• For the CAB and CABSRV1 analysis-farm stations:
  • 50% of ENSTORE (tape) transfers occur within 10 minutes
  • 75% within 20 minutes
  • 95% within 1 hour
• For CENTRAL-ANALYSIS (SMP station) and CLUED0 (desktop-cluster station):
  • 95% of ENSTORE (tape) transfers occur within 10 minutes
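Figures like these summarize the distribution of per-file wait times. Here is a sketch with two hypothetical helpers: `fraction_within` gives the share of waits under a limit (the form used on the slide), and `nearest_rank_percentile` goes the other way, from a percentage to a wait. The wait times are made up for illustration.

```python
import math


def fraction_within(waits: list[float], limit: float) -> float:
    """Share of waits that finished within `limit` (same units as `waits`)."""
    return sum(1 for w in waits if w <= limit) / len(waits)


def nearest_rank_percentile(waits: list[float], pct: float) -> float:
    """Smallest wait w such that at least pct% of waits are <= w (nearest-rank method)."""
    if not waits or not 0 < pct <= 100:
        raise ValueError("need at least one wait and 0 < pct <= 100")
    ordered = sorted(waits)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical "Request Next File" -> "Open File" waits, in minutes.
waits = [2, 4, 6, 8, 9, 12, 15, 18, 35, 70]
print(fraction_within(waits, 10), nearest_rank_percentile(waits, 95))
```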

17 SAMGrid Statistics - Operations Data

18 Stress Testing
• There are many station parameters to tune:
  • Maximum parallel transfers
  • Maximum concurrent ENSTORE requests
  • Configuration of cache disks
  • …
• We're moving away from d0mino to Linux
  • How robust are these Linux machines?
  • How many projects can they run?
  • How many concurrent file transfers can they handle?
• Running a test harness on a small cluster to explore the SAMGrid parameter space
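The effect of a "maximum parallel transfers" limit can be illustrated with a counting semaphore. This is a hypothetical simulation (`run_transfers`, with `time.sleep` standing in for a file copy), not the SAMGrid station implementation. It reports the peak number of transfers in flight, which never exceeds the limit.

```python
import threading
import time


def run_transfers(n_files: int, max_parallel: int, seconds_per_file: float = 0.01) -> int:
    """Simulate n_files transfers with at most max_parallel in flight; return the peak."""
    gate = threading.BoundedSemaphore(max_parallel)
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def transfer() -> None:
        nonlocal in_flight, peak
        with gate:  # blocks while max_parallel transfers are already running
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            time.sleep(seconds_per_file)  # stand-in for the actual copy
            with lock:
                in_flight -= 1

    threads = [threading.Thread(target=transfer) for _ in range(n_files)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


peak = run_transfers(20, max_parallel=5)
```

A test harness would sweep `max_parallel` (and similar limits) while timing the whole run, which is the kind of comparison the max transfers = 5 vs. 1 plots make.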

19 SAMGrid Stress Testing
[Plots comparing max transfers = 5 and max transfers = 1]

20 SAMGrid Stress Testing
[Plots comparing max transfers = 5 and max transfers = 1]

21 SAMGrid Stress Testing
[Plots comparing max transfers = 5 and max transfers = 1]

22 SAMGrid Projects
• Recently completed:
  • Python & C++ client API
  • Improved DB schema
  • Batch system adapters
  • 1st-generation monitoring
• Active & future:
  • DB-middleware improvements
  • 2nd-generation monitoring
  • Improved query language
  • Grid submission
  • Conversion to the SRM interface for cache functions

23 Summary
• SAMGrid has been used successfully at DØ for all data handling
• Over the past year, 44 stations consumed (remote = outside Fermilab):
  • 3.6 million files (0.46 million remote)
  • 40 billion events (3 billion remote)
  • 1.6 petabytes of data (137 TB remote)
• ~25 million MC events produced remotely with SAMGrid
• ~90 million events reprocessed remotely with SAMGrid
• SAMGrid will soon become the data handling system for CDF, and later MINOS
• SAMGrid deployment on grids is contributing improvements to Grid tools & interfaces
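The remote share of each quantity follows directly from the summary numbers. Here is a short Python check, assuming decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes):

```python
# Figures from the summary slide (past year at DØ); remote = outside Fermilab.
totals = {"files": 3.6e6, "events": 40e9, "bytes": 1.6e15}
remote = {"files": 0.46e6, "events": 3e9, "bytes": 137e12}

remote_share = {key: remote[key] / totals[key] for key in totals}
for key, share in remote_share.items():
    print(f"{key:>6}: {share:.1%} remote")
```

Roughly 13% of the files were consumed remotely, but only about 7.5% of the events and under 9% of the bytes.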

24 EXTRA SLIDES FOLLOW

25 ENSTORE Statistics
• 0.6 petabytes in tape storage!
• Only 5 files unrecoverable (5 GB total; 8 ppm loss)!
• One of them was a RAW file
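As a quick consistency check of the quoted loss rate, measured by data volume and assuming decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes):

```latex
\frac{5\ \mathrm{GB}}{0.6\ \mathrm{PB}}
  = \frac{5 \times 10^{9}\ \mathrm{B}}{6 \times 10^{14}\ \mathrm{B}}
  \approx 8.3 \times 10^{-6}
  \approx 8\ \mathrm{ppm}
```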

26 Some SAMGrid buzzwords
• Dataset definition
  • A set of requirements to obtain a particular set of files
  • e.g. data_tier thumbnail and run_number
  • Datasets can change over time: more files that satisfy the dataset may be added to SAMGrid
• Snapshot
  • The files that satisfy a dataset at a particular time (e.g. when you start an analysis job)
  • Snapshots are static
• Project
  • The running of an executable over files in SAMGrid
  • Consists of the dataset definition, the snapshot from that dataset definition, and application information
  • Bookkeeping data is kept: how many files you processed successfully, where your job ran, how long it took