1
ATLAS on Grid3/OSG
R. Gardner
December 16, 2004
2
ATLAS Applications
Pythia Generation
Geant4 simulation
Pileup
Digitization
Reconstruction
3
ATLAS Users
DC2 production team: managed production, high priority, 7 users
User production: opportunistic production and reconstruction, 3 users, growing
4
ATLAS DC2 on Grid3
Production statistics on Grid3 (end of November 2004)
Overall “success” rate: 74%
Through September: 66%
During last 2 months: finished: failed: success rate: 78%
We have improved our results since September
Only 2-3 submit clients now (10-20 in September)

#  Job status  Capone Total
1  failed      33165
2  finished    90534
3  running     101
4  submitted   42
5
Job Success Rate on GRID3
Month      Passed  Failed  Success Rate
July       8799    6676    57%
August     17083   9448    64%
September  17283   7717    69%
October    26600   5186    84%
November   21869   5038    81%

Key factors in improved success rate:
Experienced team using common submit hosts
Quicker response to large-scale site/network/hardware failures

Can we improve more?
Some shifts >95% success, others <50%
Automatic throttle for failures? But we would still lose all running jobs
Do we care?

K. De
+ improvements to Capone/GCE
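The "automatic throttle for failures" question could be answered in many ways. The following is only a minimal sketch, assuming a sliding window over recent job outcomes at one site and a configurable failure threshold. The class name `FailureThrottle` and its parameters (`window`, `max_failure_rate`, `min_samples`) are hypothetical and are not part of Capone or GCE. As the slide points out, a throttle of this kind only holds back new submissions; jobs already running at a failing site would still be lost.

```python
from collections import deque


class FailureThrottle:
    """Hypothetical per-site throttle: pause submission when recent failures are too frequent."""

    def __init__(self, window=50, max_failure_rate=0.5, min_samples=10):
        # Keep only the most recent `window` outcomes (True = finished, False = failed).
        self.outcomes = deque(maxlen=window)
        self.max_failure_rate = max_failure_rate
        self.min_samples = min_samples

    def record(self, finished):
        """Record the outcome of one completed job."""
        self.outcomes.append(bool(finished))

    def failure_rate(self):
        """Fraction of failed jobs in the current window (0.0 if the window is empty)."""
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def allow_submission(self):
        """Allow new jobs unless enough samples exist and the failure rate is too high."""
        if len(self.outcomes) < self.min_samples:
            return True
        return self.failure_rate() <= self.max_failure_rate


# Illustrative use with made-up outcomes: one success followed by five failures.
throttle = FailureThrottle(window=20, max_failure_rate=0.5, min_samples=5)
for ok in [True, False, False, False, False, False]:
    throttle.record(ok)
print("submit more jobs?", throttle.allow_submission())
```

The `min_samples` guard is a design choice in this sketch: it keeps a site from being throttled on the basis of one or two early failures.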
6
#   CE Gatekeeper        Finished+Failed  Finished  Failed  Success Rate (%)
1   BU_ATLAS_Tier2       19395            16349     3046    84.29
2   UTA_dpcc             19214            14634     4580    76.16
3   UC_ATLAS_Tier2       13285            11196     2089    84.28
4   BNL_ATLAS            11261            8993      2268    79.86
5   IU_ATLAS_Tier2       10528            8403      2125    79.82
6   UM_ATLAS             9434             6054      3380    64.17
7   BNL_ATLAS_BAK        6061             4578      1483    75.53
8   UBuffalo_CCR         4654             3992      662     85.78
9   PDSF                 5075             3590      1485    70.74
10  FNAL_CMS             3857             2222      1635    57.61
11  CalTech_PG           3136             2178      958     69.45
12  UCSanDiego_PG        2828             2101      727     74.29
13  FNAL_CMS2            2157             1506      651     69.82
14  SMU_Physics_Cluster  1462             969       493     66.28
15  BU_AGT_Tier2         975              820       155     84.10
16  PSU_Grid3            769              583       186     75.81
17  OU_OSCER             843              575       268     68.21
18  UFlorida_PG          946              451       495     47.67
19  Rice_Grid3           569              370       199     65.03
20  UWMadison            803              363       440     45.21
21  UNM_HPC              502              347       155     69.12
22  OU_OSCER_LSF         412              251       161     60.92

ATLAS ProdDB
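The success-rate column is consistent with finished jobs divided by finished plus failed jobs. As a minimal sketch of that arithmetic, the snippet below recomputes the rate for three gatekeepers using counts copied from the ProdDB figures; the helper name `success_rate` is an assumption made for illustration.

```python
def success_rate(finished, failed):
    """Return the percentage of completed jobs that finished successfully."""
    total = finished + failed
    if total == 0:
        raise ValueError("no completed jobs")
    return 100.0 * finished / total


# (finished, failed) counts for a few gatekeepers, copied from the ProdDB figures.
sites = {
    "BU_ATLAS_Tier2": (16349, 3046),
    "UTA_dpcc": (14634, 4580),
    "UWMadison": (363, 440),
}

rates = {name: round(success_rate(*counts), 2) for name, counts in sites.items()}
for name, rate in rates.items():
    print(f"{name:<16} {rate:6.2f}%")
```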
7
Detailed Job Failures (un-normalized)
Category    Total, till Nov.  Total, till Sep.  Last 2 months
Submission  894               472               422
Execution   428
Post Run    10131             1147              8984
Stage-Out   10833             8037              2796
RLS         1065              989               76
Capone      3975              2725              1250
Windmill    564               57                507
Other       5225              5139              86
TOTAL       33165             19303             13862
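One way to read these counts is to ask what fraction of each category's failures occurred in the last two months. The sketch below does that under two assumptions: the helper name `recent_fraction` is hypothetical, and the Execution row is left out because only one of its values survives in the source. It also checks that the "till Sep." and "last 2 months" columns add up to the "till Nov." total for every row used.

```python
# (total till Nov, total till Sep, last 2 months), copied from the failure table.
# Execution is omitted: the source gives only a single value for that row.
failures = {
    "Submission": (894, 472, 422),
    "Post Run": (10131, 1147, 8984),
    "Stage-Out": (10833, 8037, 2796),
    "RLS": (1065, 989, 76),
    "Capone": (3975, 2725, 1250),
    "Windmill": (564, 57, 507),
    "Other": (5225, 5139, 86),
}


def recent_fraction(total, until_sep, last_two_months):
    """Fraction of a category's failures that occurred in the last two months."""
    if until_sep + last_two_months != total:
        raise ValueError("columns do not add up")
    return last_two_months / total


recent = {name: recent_fraction(*row) for name, row in failures.items()}

# Print categories from most to least concentrated in the recent period.
for name, frac in sorted(recent.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<12} {frac:6.1%}")
```

Sorting by this fraction makes the shift visible: Post Run and Windmill failures are concentrated in the last two months, while most Stage-Out, RLS, and Other failures predate October.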
8
Status of GRID3 Jobs
evgen simul digi pile-up Done %
dc B1_jets_180 100 100% 19998 11899 60% 14833 74%
dc A9_susy 400 11409 71% 7992 50%
dc J1_Pt_17_35 2
dc J2_Pt_35_70
dc J3_Pt_70_140
dc J4_Pt_140_280
dc J5_Pt_280_560
dc J6_Pt_560_1120
dc J7_Pt_1120_2240 1 200
dc J8_Pt_2240
dc B2_gamjet 4000 3990
dc B3_Bmumu 4300 86% 0%
dc B4_jets17 9606 96%

To do: extra A9 simulation, some digitization, and some B1 pile-up
Note: also waiting for some B3 and B4 input evgen files from LCG
K. De
9
ATLAS historical use ACDC archive
10
ATLAS Jobs by site ACDC archive
12
Grid3/OSG Resource Availability
ATLAS expects to run continuous production from now through 2005
This activity consists of:
Completion of DC2
Production for the Rome physics workshop in June
User production via Capone clients
Distributed analysis via ADA
Expect the trend towards resource saturation to continue as more users are equipped with job submission tools
13
Some OSG Issues
Managed storage is now the biggest problem facing continued DC2 production, for both access and space management
Authorization: role based, access rights, queue priorities; policy infrastructure, publication
Accounting service: user-level; what resources (CPU, storage) have been used over an arbitrary time period
Operations: extend operations protocol between BNL Tier1 and iGOC/OSG operations activity
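The accounting requirement (user-level CPU and storage use over an arbitrary time period) can be illustrated with a small aggregation. This is a sketch only: the `UsageRecord` fields, the `usage_by_user` function, and the sample records are all made up for illustration and do not describe any existing Grid3 or OSG accounting service.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UsageRecord:
    """Hypothetical per-job usage record."""
    user: str
    end_time: datetime
    cpu_hours: float
    storage_gb: float


def usage_by_user(records, start, end):
    """Sum CPU hours and storage per user for jobs that ended in [start, end)."""
    totals = defaultdict(lambda: {"cpu_hours": 0.0, "storage_gb": 0.0})
    for rec in records:
        if start <= rec.end_time < end:
            totals[rec.user]["cpu_hours"] += rec.cpu_hours
            totals[rec.user]["storage_gb"] += rec.storage_gb
    return dict(totals)


# Made-up sample records, not real usage data.
records = [
    UsageRecord("alice", datetime(2004, 11, 3), 12.0, 1.5),
    UsageRecord("alice", datetime(2004, 11, 20), 8.0, 0.5),
    UsageRecord("bob", datetime(2004, 11, 10), 30.0, 4.0),
    UsageRecord("bob", datetime(2004, 12, 2), 5.0, 1.0),
]

november = usage_by_user(records, datetime(2004, 11, 1), datetime(2004, 12, 1))
for user, used in sorted(november.items()):
    print(user, used["cpu_hours"], "CPU hours,", used["storage_gb"], "GB")
```

Passing `start` and `end` explicitly, rather than fixing a reporting interval, is what makes the time period arbitrary in this sketch.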