1
DØ MC and Data Processing on the Grid
Brad Abbott, University of Oklahoma, D0SAR, Sept 21, 2006
2
Computing at DØ Provide the necessary resources for primary processing of data, reprocessing, fixing, skimming, data analysis, MC production, data handling, data verification, etc. Provide these in a timely manner so that researchers can analyze data efficiently.
3
Challenges Collecting data at ~50 events/sec.
Processing time is ~70 GHz-sec/event, so ~900 CPUs on the DØ farm run 24/7 to keep up with the data. Need millions of Monte Carlo events. Store data to tape and allow easy access (SAM). Have the ability to reprocess and fix data in a timely manner. Provide computing resources to analyzers. A rough sizing check is sketched below.
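A minimal Python sketch of the arithmetic behind the ~900-CPU figure, assuming a ~3 GHz farm CPU and a ~75% data-taking duty factor (both are assumptions, not numbers from the slide):

    EVENT_RATE_HZ = 50.0   # events/sec (from the slide)
    COST_GHZ_SEC  = 70.0   # GHz-sec per event (from the slide)
    CPU_CLOCK_GHZ = 3.0    # assumed typical farm CPU in 2006
    DUTY_FACTOR   = 0.75   # assumed fraction of time data is flowing

    demand_thz  = EVENT_RATE_HZ * COST_GHZ_SEC * DUTY_FACTOR / 1000.0
    cpus_needed = demand_thz * 1000.0 / CPU_CLOCK_GHZ
    print(f"~{demand_thz:.1f} THz sustained -> ~{cpus_needed:.0f} CPUs")
    # -> ~2.6 THz sustained -> ~875 CPUs, close to the ~900 quoted

At a 100% duty factor the sustained demand would rise to the full 3.5 THz.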
4
Local Facilities 70 TB of project disk; two local clusters, CAB and CluedØ.
CAB: 2.2 THz of CPU (comparable to the FNAL production farm), 235 TB of SAM cache, more CPU/disk on order. CluedØ: an incredible resource, by the people for the people! 1+ THz of CPU; SAM cache of 70 TB (nodes), … TB (servers).
5
Usage What does a typical week look like? ANALYSIS STATIONS
Monday report, August 14, 2006, a typical week. [Table: data analyzed and events/projects consumed per station (clued0, two fnal-cabsrv stations, and the DØ total); the per-station values did not survive extraction.]
6
Analysis over time Events consumed by station since "the beginning of SAM time"; integrates to 300 billion events consumed. [Plot: cabsrv stations in blue and red, clued0 in grey.]
7
Statistics
8
Current Computing Status
Overall very good. Reconstruction is keeping up with data taking. Data handling is working well. Remote sites handle MC production, reprocessing, processing, and fixing. Significant analysis CPU is available.
9
Future challenges Larger data sets.
Luminosities > 200E30 cm⁻²s⁻¹. Increased sharing of manpower with the LHC and reduced manpower for DØ. Tight budgets. Need to use shared resources.
10
Need to plan on luminosities of 400E30 cm⁻²s⁻¹.
Events recorded at these luminosities take significantly longer to process than previously; computing resources need to deal with this.
11
DØ computing model Distributed computing, moving toward automated use of common tools on the grid. Scalable. Work with the LHC, not against it, to gain access to increased resources; this means conforming to grid standards. DØ is a running experiment and is taking data, so a prudent approach to computing is needed: SAMGrid.
12
SAMGrid SAM: data handling. Over 7 PB consumed last year,
up to 1 PB/month. JIM: job submission and monitoring. SAM + JIM = SAMGrid: 20 native execution sites, plus automated submission to other grids.
13
Progression on Remote Farms
Activities: MC → data reprocessing → processing → skimming* → analysis*. Facilities: dedicated farms → shared farms → OSG/LCG. Automation: expert → regional farmer → any user*. (*Not yet implemented)
14
Data Reprocessing on Grid
Reprocessing of data: 1 billion events (250 TB from raw). SAMGrid as default, using shared resources: 3.5 THz for 6 months, the largest such effort in HEP. Refixing: 1.4 billion events in 6 weeks, using SAMGrid with automated use of LCG and OSG. Finished on time; very successful. A throughput cross-check is sketched below.
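A quick Python cross-check of these campaign numbers, assuming 30-day months (an assumption), recovering the implied per-event cost and average input bandwidth:

    EVENTS  = 1.0e9           # events reprocessed (from the slide)
    CPU_THZ = 3.5             # aggregate CPU (from the slide)
    RAW_TB  = 250.0           # raw input volume (from the slide)
    SECONDS = 6 * 30 * 86400  # 6 months of wall time

    ghz_sec_per_event = CPU_THZ * 1e3 * SECONDS / EVENTS
    mb_per_sec        = RAW_TB * 1e6 / SECONDS  # 1 TB = 1e6 MB
    print(f"~{ghz_sec_per_event:.0f} GHz-sec/event, ~{mb_per_sec:.0f} MB/s input")
    # -> ~54 GHz-sec/event and ~16 MB/s average input rate, consistent with
    #    the ~70 GHz-sec/event quoted earlier once scheduling and I/O
    #    overheads are allowed for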
15
Processing on Grid Prefer not to do primary processing on the Grid.
Processing can be done at a few select sites that have been well certified (this has been demonstrated: the cable-swap data was processed at OU). Certification of the Grid at large is problematic, and dedicated sites avoid worries about fair-share, node availability, etc.
16
Cable swap data at OU First time that primary processing was performed at a remote site for DØ. Processed 9463 files (total of … GB; … events). Took ~3 months, partly because only ~70 of the available 270 CPUs could be used.
17
MC Production resources
All MC is produced offsite; MC requirements are less stringent, i.e. one can always make more. Native SAMGrid producers: CMS-FNAL, GridKa, LTU, LU, MSU, OU (2), SPRACE, TATA, Westgrid, Wuppertal, FZU. Non-SAMGrid: Lyon and Nikhef. LCG: 21 CEs (10 UK, 6 FR, 3 NL, 1 CZ, 1 DE). OSG: 8 CEs (UNL, IU, Purdue, SPGRID, OCHEP, TOPDAWG, UWM, CMS-FNAL).
18
Monte Carlo More than 250 million events produced,
up to 10 million events/week. Grid split: 59% via native SAMGrid, the remainder via LCG and OSG. By region: 80.4% Europe, 15.7% N. America, 3.5% S. America, 0.3% Asia.
19
Current plans Reprocessing of Run IIB data is needed: 300 million events.
Takes ~80 GHz-sec/event to process; expect to need ~2000 CPUs for 4 months to reprocess the data (a sizing sketch follows). Utilize OSG sites much more extensively. SAM v7 (one version of SAM). Plan to begin in November.
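A minimal sizing sketch for this plan in Python, assuming an effective ~1.5 GHz per grid slot, ~75% scheduling efficiency, and 30-day months (all three are assumptions, not slide numbers):

    EVENTS       = 300e6           # events to reprocess (from the slide)
    COST_GHZ_SEC = 80.0            # GHz-sec per event (from the slide)
    CAMPAIGN_SEC = 4 * 30 * 86400  # 4 months of wall time
    SLOT_GHZ     = 1.5             # assumed effective clock per grid slot
    EFFICIENCY   = 0.75            # assumed scheduling/I/O efficiency

    total_ghz_sec = EVENTS * COST_GHZ_SEC
    cpus = total_ghz_sec / (CAMPAIGN_SEC * SLOT_GHZ * EFFICIENCY)
    print(f"~{cpus:.0f} CPUs needed")
    # -> ~2058 CPUs, in line with the ~2000 quoted on the slide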
20
Current plans (cont) Overall priority is to reduce manpower needs in the mid and long term by ensuring that additional functionality is quickly developed, first in SAMGrid mode and then rapidly transferred to automated forwarding nodes. CAB running as part of FermiGrid. Moving full functionality to the forwarding mechanisms. Automated production of MC with OSG, with SAM shifters taking over responsibility for submitting jobs. Automated submission to use the full power of interoperability and grid resources.
21
OSG/LCG
22
Conclusions DØ computing model has been very successful.
MC production and data processing continue to move toward Grid resources. LCG has been used more heavily in the past, but OSG will soon be utilized more heavily. Remote computing is critical for the continued success of DØ.