Slide 1: The SAMGrid Data Handling System
Outline:
- What Is SAMGrid?
- Use Cases for SAMGrid in Run II Experiments
- Current Operational Load
- Stress Testing
- Future Project Plans
Wyatt Merritt (FNAL/CD/DØ) for the SAMGrid Team (talk created by Adam Lyon)
Slide 2: What is SAMGrid?
The data handling system for Run II DØ and CDF.
- SAMGrid manages file storage (replica catalogs): data files are stored in tape systems at FNAL and elsewhere (most use ENSTORE at FNAL), and files are cached around the world for fast access.
- SAMGrid manages file delivery: users at FNAL and remote sites retrieve files out of file storage; SAMGrid handles caching for efficiency, so you don't care about file locations.
- SAMGrid manages file metadata cataloging: the SAMGrid DB holds metadata for each file, so you don't need to know the file names to get data.
- SAMGrid manages analysis bookkeeping: SAMGrid remembers what files you ran over, which files you processed successfully, what applications you ran, when you ran them, and where.
Slide 3: SAMGrid Terms and Concepts
A project runs on a station and requests delivery of a dataset to one or more consumers on that station.
- Station: processing power + disk cache + (connection to tape storage) + network access to the SAMGrid catalog and other station caches. Example: a Linux analysis cluster at DØ.
- Dataset: a metadata description which is resolved through a catalog query to a list of files. Datasets are named. Examples (syntax not exact):
  data_type physics and run_number 78904 and data_tier raw
  request_id 5879 and data_tier thumbnail
- Consumer: a user application (one or many executable instances). Examples: a script to copy files; a reconstruction job.
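The dataset concept above (a metadata query resolved to a file list) can be sketched with a toy catalog. This is an illustration only; the `CATALOG` structure and `resolve_dataset` function are hypothetical stand-ins, not the real SAMGrid API or query syntax:

```python
# Toy sketch: resolving a dataset (a metadata query) to a list of file names.
# CATALOG and resolve_dataset are illustrative, not the SAMGrid implementation.

CATALOG = [
    {"name": "raw_001", "data_type": "physics", "run_number": 78904, "data_tier": "raw"},
    {"name": "raw_002", "data_type": "physics", "run_number": 78905, "data_tier": "raw"},
    {"name": "tmb_001", "request_id": 5879, "data_tier": "thumbnail"},
]

def resolve_dataset(constraints):
    """Return the names of files whose metadata matches every constraint."""
    return [f["name"] for f in CATALOG
            if all(f.get(k) == v for k, v in constraints.items())]

# "data_type physics and run_number 78904 and data_tier raw"
print(resolve_dataset({"data_type": "physics", "run_number": 78904, "data_tier": "raw"}))
# -> ['raw_001']
```

The key point the sketch captures is that users specify metadata constraints, never file names; the catalog query produces the file list.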
Slide 4: Physics Analysis Use Cases
I. Process Simulated Data
II. Process Unskimmed Collider Data
III. Process Skimmed Collider Data
IV. Process Missed/New Data
V. Analyze Sets of Individual Events ("Pick Events")
Note that there are more use cases, for production of simulated data and for reconstruction, not covered here.
Slide 5: I. Process Simulated Data
1. Look up the simulation request with the parameters of interest, e.g. request 5874 was generated using Pythia with m_t = 174 GeV/c^2.
2. Define a dataset (via command line or GUI): request_id 5874 and data_tier thumbnail
3. Submit the project to a SAMGrid station and submit executable instance(s) to the batch system (our tools make that easy).
4. The consumer is started.
5. The station delivers files to the executable instance(s).
6. The station marks which files were delivered and consumed successfully and which had errors.
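The delivery-and-bookkeeping loop in steps 4-6 can be sketched as follows. The `Project` class and its methods are hypothetical illustrations of the bookkeeping described above, not the real SAMGrid interfaces:

```python
# Minimal sketch of the project life cycle above: the station hands files to a
# consumer one at a time and records which were consumed OK and which failed.
# Class and method names are illustrative, not the SAMGrid API.

class Project:
    def __init__(self, dataset_files):
        self.to_deliver = list(dataset_files)
        self.consumed = []      # files processed successfully
        self.errors = []        # files that had errors

    def next_file(self):
        """Deliver the next file, or None when the dataset is exhausted."""
        return self.to_deliver.pop(0) if self.to_deliver else None

    def mark(self, filename, ok):
        """Record the consumption status of a delivered file."""
        (self.consumed if ok else self.errors).append(filename)

proj = Project(["tmb_001", "tmb_002", "tmb_003"])
while (f := proj.next_file()) is not None:
    proj.mark(f, ok=(f != "tmb_002"))   # pretend one file fails

print(proj.consumed, proj.errors)   # -> ['tmb_001', 'tmb_003'] ['tmb_002']
```

This per-file status record is what later enables the recovery datasets of use case IV.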
Slide 6: II. Process Unskimmed Collider Data
1. Define a dataset by describing the files of interest (not listing file names), using the command line or GUI:
   data_tier thumbnail and version p14.06.01 and run_type physics and run_qual_group MUO and run_quality GOOD
2. Submit the project to a SAMGrid station and submit executable instance(s) to the batch system (our tools make that easy).
3. The consumer is started.
4. The station delivers files to the executable instance(s).
5. The station marks which files were delivered and consumed successfully and which had errors.
Slide 7: III. Process Skimmed Collider Data
Someone (a physics group, the Common Skimming Group, or an individual) has produced skimmed files and created a dataset that describes them. You can:
- submit your project/jobs using their dataset name, OR
- create a new dataset based on theirs, adding additional constraints:
  __set__ DiElectronSkim and run_number 168339
Submission is the same as before.
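The refinement step above (an existing named dataset plus an extra constraint) can be sketched like this. The `NAMED_SETS` table and `refine` function are toy stand-ins, not the SAMGrid query engine:

```python
# Toy sketch of deriving a new dataset from an existing named one, as in
# "__set__ DiElectronSkim and run_number 168339". Names are illustrative.

NAMED_SETS = {
    "DiElectronSkim": [
        {"name": "skim_a", "run_number": 168339},
        {"name": "skim_b", "run_number": 168340},
    ],
}

def refine(set_name, **constraints):
    """Intersect a named set with additional metadata constraints."""
    return [f["name"] for f in NAMED_SETS[set_name]
            if all(f.get(k) == v for k, v in constraints.items())]

print(refine("DiElectronSkim", run_number=168339))   # -> ['skim_a']
```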
Slide 8: IV. Process Missed/New Data
The set of files that satisfies the dataset query at a given time is a snapshot, and it is remembered with the SAMGrid project information. One can make new datasets from:
- files that satisfy the dataset but are newer than the snapshot (new since the project ran);
- files that should have been processed by the original project but were not delivered or not consumed:
  __set__ myDataSet minus (project_name myProject and consumed_status consumed and consumer lyon)
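The "minus" recovery query above is a set difference between the dataset's files and the files the original project successfully consumed. A toy version (the file names and variables here are illustrative):

```python
# Sketch of the recovery dataset above: files in the dataset that were not
# successfully consumed by the original project. Purely illustrative data.

dataset_files = {"f1", "f2", "f3", "f4", "f5"}   # current dataset query result
consumed_ok   = {"f1", "f2", "f4"}               # from the project's bookkeeping

recovery = sorted(dataset_files - consumed_ok)   # set difference = "minus"
print(recovery)   # -> ['f3', 'f5']
```

The same difference applied against the old snapshot, rather than the consumed list, yields the "new since the project ran" dataset.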
Slide 9: V. Analyze Individual Events (Pick Events)
Users want to analyze individual events with lower-level data tiers (e.g. raw), for event displays or special investigations.
- The SAMGrid catalog keeps information on each raw event (run number, event number, trigger bits, file).
- Users run a pick-events tool that creates a dataset and submits an executable to extract and process those events.
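The catalog lookup behind pick events can be sketched as a map from (run, event) to the file containing that event; the pick-events dataset is then the distinct set of files needed. The `EVENT_INDEX` table and run/event numbers below are toy stand-ins for the real SAMGrid event catalog:

```python
# Toy sketch of pick events: map each requested (run, event) pair to the raw
# file holding it, and collect the distinct files to fetch. Illustrative only.

EVENT_INDEX = {
    (178069, 12345678): "raw_a",
    (178069, 12345912): "raw_a",
    (181933, 224466):   "raw_b",
}

def pick_events(events):
    """Return the distinct files needed to extract the requested events."""
    return sorted({EVENT_INDEX[ev] for ev in events if ev in EVENT_INDEX})

print(pick_events([(178069, 12345678), (181933, 224466)]))
# -> ['raw_a', 'raw_b']
```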
Slide 10: Current SAMGrid Production Configurations
- Large SMP: a 10 TB central cache with 128 attached processors; mostly used for pick events; local job submission (not Grid).
- Linux production farms: reconstruction, Monte Carlo production, reprocessing. Various cache arrangements: distributed on worker nodes, NFS-mounted from a central head node, or routed through a central head node and then distributed. Local or Grid (Condor) job submission.
- Linux analysis clusters: analysis jobs. Various cache arrangements: distributed on worker nodes, routed through a central head node and then distributed, or cached on the central node only. Local job submission or remote submission (with experiment-specific tools, not Condor).
Slide 11: SAMGrid Statistics (DØ)
[Chart: files delivered by month, 1999-2003; "Run II begins" marked.]
Slide 12: SAMGrid Statistics - Usage Data
9000 projects! 233 different users! Data from January 6 until February 24 at DØ.
Slide 13: SAMGrid Statistics - Usage Data
[Chart: ~500K files; annotation "~1%".]
Slide 14: SAMGrid Statistics - Usage Data
[Chart: raw, thumbnails + … by data tier; 256 TB! 8.3 billion events! Data from January 6 until February 24 at DØ.]
Slide 15: SAMGrid Statistics - Operations Data
[Chart.]
Slide 16: SAMGrid Statistics - Operations Data
Time between "request next file" and "open file":
- For the CAB and CABSRV1 analysis-farm stations: 50% of enstore (tape) transfers occur within 10 minutes, 75% within 20 minutes, 95% within 1 hour.
- For CENTRAL-ANALYSIS (SMP station) and CLUED0 (desktop cluster station): 95% of enstore (tape) transfers occur within 10 minutes.
Slide 17: SAMGrid Statistics - Operations Data
[Chart.]
Slide 18: Stress Testing
There are many station parameters to tune:
- maximum parallel transfers
- maximum concurrent enstore requests
- configuration of cache disks
- …
We're moving away from d0mino to Linux:
- How robust are these Linux machines?
- How many projects can they run?
- How many concurrent file transfers can they handle?
We are running a test harness on a small cluster to explore the SAMGrid parameter space.
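Exploring a parameter space like the one above typically means enumerating a grid over the tuning knobs and running the test harness at each point. A minimal sketch of the enumeration step; the parameter names and values are illustrative, not actual SAMGrid configuration keys:

```python
# Sketch of enumerating a station-parameter grid for stress testing.
# Parameter names/values are illustrative, not SAMGrid configuration keys.
from itertools import product

max_parallel_transfers   = [1, 5, 10]
max_concurrent_enstore   = [2, 4]

grid = list(product(max_parallel_transfers, max_concurrent_enstore))
for transfers, enstore in grid:
    # a real harness would configure the station and run a workload here
    pass

print(len(grid), grid[0])   # -> 6 (1, 2)
```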
Slide 19: SAMGrid Stress Testing
[Chart: max transfers = 5 vs. max transfers = 1.]

Slide 20: SAMGrid Stress Testing
[Chart: max transfers = 5 vs. max transfers = 1.]

Slide 21: SAMGrid Stress Testing
[Chart: max transfers = 5 vs. max transfers = 1.]
Slide 22: SAMGrid Projects
Recently completed:
- Python & C++ client API
- Improved DB schema
- Batch system adapters
- 1st-generation monitoring
Active & future:
- DB-middleware improvements
- 2nd-generation monitoring
- Improved query language
- Grid submission
- Conversion to an SRM interface for cache functions
Slide 23: Summary
SAMGrid has been successfully used at DØ for all data handling. Over the past year, 44 stations consumed (remote = outside Fermilab):
- 3.6 million files (0.46 million files remote)
- 40 billion events (3 billion events remote)
- 1.6 petabytes of data (137 TB of data remote)
- ~25 million MC events produced remotely with SAMGrid
- ~90 million events reprocessed remotely with SAMGrid
SAMGrid will soon become the data handling system for CDF and later MINOS. SAMGrid deployment on grids is contributing improvements to Grid tools & interfaces.
Slide 24: EXTRA SLIDES FOLLOW
Slide 25: ENSTORE Statistics
- 0.6 petabytes in tape storage!
- Only 5 files unrecoverable (5 GB total; 8 ppm loss)! One of them was a RAW file.
Slide 26: Some SAMGrid Buzzwords
- Dataset definition: a set of requirements to obtain a particular set of files, e.g. data_tier thumbnail and run_number 181933. Datasets can change over time: more files that satisfy the dataset may be added to SAMGrid.
- Snapshot: the files that satisfy a dataset at a particular time (e.g. when you start an analysis job). Snapshots are static.
- Project: the running of an executable over files in SAMGrid. A project consists of the dataset definition, the snapshot from that dataset definition, and application information. Bookkeeping data is kept: how many files you successfully processed, where your job ran, how long it took.
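The dataset/snapshot distinction above is the crux: a dataset definition is a query re-evaluated on demand, while a snapshot freezes its result at a moment in time. A toy sketch (the `catalog` list and `evaluate` function are illustrative, not the SAMGrid implementation):

```python
# Toy sketch: a dataset definition is a live query; a snapshot is its frozen
# result at project start. Catalog contents and names are illustrative only.

catalog = [{"name": "f1", "run_number": 181933, "data_tier": "thumbnail"}]

def evaluate(query):
    """Re-evaluate the dataset definition against the current catalog."""
    return [f["name"] for f in catalog
            if all(f.get(k) == v for k, v in query.items())]

query = {"data_tier": "thumbnail", "run_number": 181933}
snapshot = evaluate(query)   # frozen when the project starts

# a matching file arrives later: the dataset grows, the snapshot does not
catalog.append({"name": "f2", "run_number": 181933, "data_tier": "thumbnail"})
print(evaluate(query), snapshot)   # -> ['f1', 'f2'] ['f1']
```

This is why the missed/new-data use case works: comparing a fresh evaluation against the stored snapshot yields exactly the files that arrived after the project ran.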