1
PROOF Farm preparation for Atlas FDR-1
Wensheng Deng, Tadashi Maeno, Sergey Panitkin, Robert Petkus, Ofer Rind, Torre Wenaus, Shuwei Ye
BNL
2
Outline
- Introduction
- Atlas FDR-1
- Farm preparation for FDR-1
- PROOF tests
- Analyses
Sergey Panitkin
3
FDR: What is it? (S. Rajagopalan, FDR meeting for U.S.)
- Provides a realistic test of the computing model from online (SFO) to analysis at Tier-2s.
- Exercises the full software infrastructure (CondDB, TAGDB, trigger configuration, simulation with mis-alignments, etc.) using mixed events.
- Implements the calibration/alignment model.
- Implements the Data Quality monitoring.
Specifics (from D. Charlton, T/P week):
- Prepare a sample of mixed events that looks like raw data (bytestream).
- Stream the events from the SFO output at Point 1, including express and calibration streams.
- Copy to Tier 0 (and replicate to Tier-1s).
- Run calibration and DQ procedures on the express/calibration streams.
- Bulk processing after 24-48 hours, incorporating any new calibrations.
- Distribute ESD and AOD to Tier-1s (later to Tier-2s as well).
- Make TAGs and DPDs.
- Distributed analysis.
- Reprocess data after a certain time.
4
FDR-1 Time Line (S. Rajagopalan, FDR meeting for U.S.)
- January: sample preparation, mixing events.
- Week of Feb. 4: FDR-1 run.
  - Stream data through SFOs; transfer to T0; processing of ES and CS.
  - Bulk processing completed by the weekend, including ESD and AOD production.
  - Regular shifts: DQ monitoring, calibration and Tier-0 processing shifts.
  - Expert coverage at Tier-1 as well to ensure smooth data transfer.
- Week of February 11: AOD samples transferred to Tier-1s; DPD production at Tier-1.
- Week of February 18/25: all data samples should be available for subsequent analysis.
- At some later point: reprocessing at Tier-1s and re-production of DPDs.
- FDR-1 should complete before April and feed back into FDR-2.
5
PROOF farm preparation
The existing Atlas PROOF farm at BNL was expanded in anticipation of FDR-1.
10 new nodes, each with:
- 8 CPU cores
- 16 GB RAM
- 500 GB hard drive
- an additional 64 GB Solid State Disk (SSD) expected
- 1 Gb/s network
- standard Atlas software stack
- Ganglia monitoring
- latest version of ROOT (currently 5.18 as of Jan. 28, 2008)
Sergey Panitkin
6
Current Farm Configuration
"Old farm":
- 10 nodes, 4 GB RAM each
- 40 cores: 1.8 GHz Opterons
- 20 TB of HDD space (10 x 4 x 500 GB)
Extension:
- 10 nodes, 16 GB RAM each
- 80 cores: 2.0 GHz Kentsfields
- 5 TB of HDD space (10 x 500 GB)
- 640 GB of SSD space (10 x 64 GB)
Sergey Panitkin
7
Farm resource distribution issues
- The new "extension" machines are "CPU heavy": 8 cores, 1 HDD.
- Tests showed that 1 CPU core requires ~10 MB/s in a typical I/O-bound Atlas analysis.
- Tests showed that 1 SATA HDD can sustain ~20 MB/s, i.e. enough for ~2 cores.
- To provide adequate bandwidth for all 8 cores per box, we needed to augment the "extension" machines with SSDs.
- SSDs provide bandwidth capable of sustaining the 8-core load, but their volume is relatively small (64 GB per machine), so they can hold only a fraction of the expected FDR-1 data. Hence, SSD space must be actively managed.
- The exact data management scheme still needs to be worked out. The following slides summarize the current discussion of data management for this farm configuration.
Sergey Panitkin
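To make the bandwidth argument above concrete, here is a minimal back-of-the-envelope sketch using the per-core and per-device rates quoted on this slide and the next (illustrative numbers only, not new measurements):

```python
# I/O budget for one "extension" node, using the approximate rates from the slides.
CORES_PER_NODE = 8
MB_PER_SEC_PER_CORE = 10   # typical I/O-bound Atlas analysis, per slide
HDD_MB_PER_SEC = 20        # one SATA HDD, sustained
SSD_MB_PER_SEC = 120       # Mtron SSD, sustained read (next slide)

needed = CORES_PER_NODE * MB_PER_SEC_PER_CORE                 # 80 MB/s per node
cores_fed_by_hdd = HDD_MB_PER_SEC // MB_PER_SEC_PER_CORE      # ~2 cores
cores_fed_by_ssd = SSD_MB_PER_SEC // MB_PER_SEC_PER_CORE      # ~12 cores

print("Required per node : %d MB/s" % needed)
print("One HDD feeds     : ~%d cores" % cores_fed_by_hdd)
print("One SSD feeds     : ~%d cores (covers all 8)" % cores_fed_by_ssd)
```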
8
New Solid State Disks
- Model: Mtron MSP-SATA7035064
- Capacity: 64 GB
- Average access time: ~0.1 ms (typical HDD: ~10 ms)
- Sustained read: ~120 MB/s
- Sustained write: ~80 MB/s
- IOPS (sequential/random): 81,000 / 18,000
- Write endurance: >140 years at 50 GB written per day
- MTBF: 1,000,000 hours
- 7-bit Error Correction Code
Sergey Panitkin
9
Farm resource distribution
[Diagram: the "Old Farm" (40 cores, 20 TB HDD) alongside the Extension (80 cores, 5 TB HDD, 640 GB SSD); the storage areas are exposed as the Xrootd pools BNLXRDHDD1, BNLXRDHDD2, and BNLXRDSSD.]
Sergey Panitkin
10
Plans for FDR-1 and beyond
- Test data transfer from dCache:
  - direct transfer (xrdcp) via the Xrootd door on dCache
  - two-step transfer (dccp + xrdcp) through intermediate storage
- Integration with Atlas DDM: implement dq2 registration for dataset transfers.
- Gain experience with SSDs:
  - scalability tests with SSDs and regular HDDs
  - choice of optimal PROOF configuration for SSD nodes
- Data staging mechanism within the farm:
  - HDD-to-SSD data transfer
  - SSD space monitoring and management
  - analysis policies (free for all, analysis train, subscription, etc.)
- Test "fast Xrootd access", a new I/O mode for the Xrootd client.
- Test a geographically distributed Xrootd/PROOF federation with Wisconsin.
- Organize the local user community to analyze FDR data.
Sergey Panitkin
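As an illustration of the two transfer modes listed above, the sketch below shows how a staging script might invoke xrdcp and dccp. The door and redirector hostnames, the scratch path, and the function names are placeholders, not the actual BNL setup:

```python
import os
import subprocess

# Placeholder endpoints -- not the real BNL door/redirector addresses.
DCACHE_XROOTD_DOOR = "root://dcache-door.example.bnl.gov/"   # Xrootd door on dCache
DCACHE_DCAP_DOOR   = "dcap://dcache-dcap.example.bnl.gov/"   # dCap door on dCache
FARM_REDIRECTOR    = "root://xrootd-farm.example.bnl.gov/"   # Xrootd/PROOF farm
SCRATCH            = "/tmp/fdr1_staging"                     # intermediate storage

def direct_transfer(path):
    """Direct transfer: xrdcp straight from the dCache Xrootd door to the farm."""
    subprocess.check_call(["xrdcp", DCACHE_XROOTD_DOOR + path,
                           FARM_REDIRECTOR + path])

def two_step_transfer(path):
    """Fallback: dccp the file to local scratch, then xrdcp it onto the farm."""
    local = os.path.join(SCRATCH, os.path.basename(path))
    subprocess.check_call(["dccp", DCACHE_DCAP_DOOR + path, local])
    subprocess.check_call(["xrdcp", local, FARM_REDIRECTOR + path])
    os.remove(local)
```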
11
Data Flow I
- We expect that all the data (AODs, DPDs, TAGs, etc.) will first arrive at dCache.
- We assume that a certain subset of the data will be copied from dCache to the PROOF farm for analysis in ROOT.
- This movement is expected to be done with a set of custom scripts and is initiated by the Xrootd/PROOF farm manager.
  - The scripts will copy datasets using xrdcp via the Xrootd door on dCache; a fallback solution exists in case the Xrootd door on dCache is unstable.
  - Copied datasets will be registered in DQ2.
- On the Xrootd farm, datasets will be stored on HDD space (currently ~25 TB).
- Certain high-priority datasets will be copied to the SSD disks by the farm manager for analysis with PROOF.
  - Determination of the high-priority datasets will be based on physics analysis priorities (FDR coordinator, PWG, etc.).
  - The exact scheme for SSD "subscription" still needs to be worked out: subscription, on-demand loading, etc. Look at ALICE's approach.
Sergey Panitkin
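Because the SSD "subscription" scheme is explicitly still open, the following is only an illustrative sketch of one possible HDD-to-SSD staging policy. The mount points, the per-node capacity handling, and the simple evict-in-priority-order rule are assumptions, not the farm's actual mechanism:

```python
import os
import shutil

# Hypothetical local mount points on a farm node -- placeholders only.
HDD_AREA = "/data"            # bulk dataset storage
SSD_AREA = "/ssd"             # small, fast area (~64 GB per node)
SSD_CAPACITY = 64 * 1024**3   # bytes

def tree_size(path):
    """Total size in bytes of all files under a directory."""
    return sum(os.path.getsize(os.path.join(root, f))
               for root, _, files in os.walk(path) for f in files)

def stage_to_ssd(dataset, evict_order):
    """Copy a high-priority dataset from HDD to SSD, evicting lower-priority
    datasets (listed least important first in `evict_order`) until it fits."""
    src = os.path.join(HDD_AREA, dataset)
    need = tree_size(src)
    while tree_size(SSD_AREA) + need > SSD_CAPACITY and evict_order:
        victim = os.path.join(SSD_AREA, evict_order.pop(0))
        if os.path.isdir(victim):
            shutil.rmtree(victim)
    shutil.copytree(src, os.path.join(SSD_AREA, dataset))
```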
12
Integration with Atlas DDM
[Diagram: data reach BNL dCache from the T0 via Panda/DQ2-managed grid transfers; datasets are copied to the Xrootd/PROOF farm with xrdcp, with dq2 registration, into the /data (HDD) area, and an xrdcp/tentakel step feeds the /ssd area; an Atlas user locates the files for analysis with dq2_ls -fp -s BNLXRDHDD1 "my_dataset".]
Sergey Panitkin
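For illustration only, a user-side sketch of the last step in the diagram: list the physical files of a dataset replica at the farm site with dq2_ls (the flags are the ones shown above) and open one of them in ROOT over Xrootd. The redirector hostname is a placeholder, and the output parsing assumes one file path per line, which the real dq2_ls format may not match exactly:

```python
import subprocess
import ROOT  # PyROOT, part of the farm's ROOT installation

FARM_REDIRECTOR = "root://xrootd-farm.example.bnl.gov/"  # placeholder hostname

# List the physical file paths of a dataset replica on the farm site.
out = subprocess.check_output(
    ["dq2_ls", "-fp", "-s", "BNLXRDHDD1", "my_dataset"]).decode()
files = [line.strip() for line in out.splitlines() if line.strip()]

# Open the first file directly over Xrootd for an interactive look.
f = ROOT.TFile.Open(FARM_REDIRECTOR + files[0])
f.ls()
```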
13
FDR tests
- Batch analyses with Xrootd as the data server:
  - AOD analysis; compare speed with dCache (D. Adams, H. Ma).
  - Store (all?) TAGs on the farm; our previous tests showed that Athena analyses gain from TAGs stored on Xrootd.
- Use the PROOF farm for physics analysis:
  - Athena ROOT Access (ARA) analysis of AODs using PROOF; ARA was demonstrated to run on PROOF in January (Shuwei Ye).
  - Store (all?) FDR-1 DPDs on the farm; the FDR-1 DPDs made by H. Ma have already been copied to the farm.
  - DPD-based analyses: Stephanie Majewski plans to study the increase in sensitivity of an inclusive SUSY search using information from isolated tracks.
Sergey Panitkin
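A minimal sketch of what a PROOF-based ntuple analysis on the farm could look like from the user's side. The master hostname, file path, tree name, and selector are hypothetical; an ARA analysis would instead hand an AOD-aware selector to Process():

```python
import ROOT

# Connect to the PROOF master (placeholder hostname).
proof = ROOT.TProof.Open("proof-master.example.bnl.gov")

# Build a chain of files served by the farm's Xrootd (hypothetical path/tree).
chain = ROOT.TChain("CollectionTree")
chain.Add("root://xrootd-farm.example.bnl.gov//ssd/my_dataset/file1.root")
chain.SetProof()  # route Process() through the PROOF session

# MySelector.C would be a user-provided TSelector with the actual analysis code.
chain.Process("MySelector.C+")
```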
14
ROOT version mismatch issues
- All of the datasets for FDR-1 will be produced with release 13, which relies on ROOT v5.14.
- The PROOF farm currently uses the latest production version of ROOT, 5.18. This version has many improvements in functionality and stability compared to v5.14 and is recommended by the PROOF developers.
- Due to changes in the xrootd protocol, clients running ROOT v5.14 cannot work with xrootd/PROOF servers from v5.18.
- To run ARA analysis on PROOF, or to use the farm as an Xrootd SE for AOD/TAG analysis, the PROOF farm would need to be downgraded to v5.14. Such a downgrade would hurt ROOT-based analysis of AANTs and DnPDs.
- In principle we can run two farms in parallel:
  - the old farm with PROOF v5.14
  - the extension farm with PROOF v5.18
- The data management scheme described on the previous slides can be trivially applied to both farms.
- This is a temporary solution: Athena is expected to use ROOT v5.18 in the next release, which will largely remove the version mismatch problem.
Sergey Panitkin
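If the two farms do run in parallel, a client simply has to connect to the master whose ROOT series matches its own. A small sketch of that choice, with placeholder master hostnames (the version split follows the slide, nothing else here is prescribed):

```python
import ROOT

# gROOT.GetVersion() returns e.g. "5.18/00"; keep only the major.minor series.
series = ROOT.gROOT.GetVersion()[:4]

masters = {
    "5.14": "proof-old.example.bnl.gov",        # old farm, release-13 / ARA clients
    "5.18": "proof-extension.example.bnl.gov",  # extension farm, newer ROOT
}

proof = ROOT.TProof.Open(masters[series])
```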
15
Current status
Work in progress!
- File transfer from dCache is functional.
- A new LRC was created; files copied to Xrootd are registered in the LRC via a custom dq2_cr.
- Datasets can be found using DDM tools, e.g.:
  dq2-list-dataset-replicas user.HongMa.fdr08_run1.0003050.StreamEgamma.merge.AOD.o1_r6_t1.DPD_v130040_V5
  INCOMPLETE: BNLPANDA,BNLXRDHDD1
  COMPLETE:
- The list of files in a dataset on Xrootd can be obtained via dq2_ls.
- Several FDR-1 AOD datasets and one DPD dataset were transferred using this mechanism.
Issues:
- Still need better integration with DDM.
- Possible problem with large file transfers via the dCache door.
Sergey Panitkin