- R-GMA and DØ
  - Iain Bertram, RAL, 13 May 2004
  - Thanks to Jeff Templon at Nikhef
- Background
  - DØ uses SAM as its data grid
  - All official MC production is carried out off-site
    – I.e. not at FNAL
    – Output is stored in SAM
  - A significant fraction of data reprocessing has been carried out off-site
    – Data are accessed from and stored in SAM
- DØ and EDG/LCG
  - The Nikhef group has implemented submission of DØ jobs on LCG
    – MC production
    – Data reconstruction
    – Notes from Jeff Templon
  - Caveat: Jeff is the expert, I am not! I may therefore have trouble answering questions (my technical experts are at the four corners of the globe…).
- Monitoring using R-GMA
  - From within the Python script:

        import socket
        import sys

        # jstabl and start_time are set up earlier in the script (not shown on the slide).
        worker_node = socket.getfqdn()
        # The site is everything after the first dot of the worker node's FQDN.
        site = worker_node[worker_node.find('.') + 1:]
        jstabl.set_val('site', site)
        jstabl.set_val('start_time', start_time)
        cmdline = ' '.join(sys.argv)
        jstabl.set_val('command', cmdline)
        jstabl.insert()

  - Under the hood: R-GMA (an EDG product)
  - The backend can easily be replaced, as long as the script requires nothing more than “set_val” and “insert”
  - R-GMA has an SQL-like structure
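The point that the backend is replaceable as long as only “set_val” and “insert” are needed can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class `SqliteJobTable`, the helpers `site_from_fqdn` and `publish_job_start`, and the choice of SQLite as a stand-in backend are hypothetical and are not part of R-GMA or of the DØ scripts.

```python
import socket
import sqlite3
import time


class SqliteJobTable:
    """Hypothetical stand-in backend exposing only set_val() and insert()."""

    def __init__(self, conn, name, columns):
        self.conn = conn
        self.name = name          # table name is trusted here; sketch only
        self.columns = list(columns)
        self.row = {}
        cols = ", ".join(f"{c} TEXT" for c in self.columns)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {name} ({cols})")

    def set_val(self, column, value):
        # Stage one column value for the next insert().
        if column not in self.columns:
            raise KeyError(column)
        self.row[column] = str(value)

    def insert(self):
        # Publish the staged values as one record, then clear the staging area.
        if not self.row:
            raise ValueError("no values set")
        cols = list(self.row)
        marks = ", ".join("?" for _ in cols)
        self.conn.execute(
            f"INSERT INTO {self.name} ({', '.join(cols)}) VALUES ({marks})",
            [self.row[c] for c in cols],
        )
        self.conn.commit()
        self.row = {}


def site_from_fqdn(fqdn):
    """Everything after the first dot; the whole name if there is no dot."""
    return fqdn[fqdn.find(".") + 1:]


def publish_job_start(table, argv, fqdn=None, start_time=None):
    """Same steps as the slide, written against any set_val/insert object."""
    worker_node = fqdn if fqdn is not None else socket.getfqdn()
    table.set_val("site", site_from_fqdn(worker_node))
    table.set_val("start_time", time.time() if start_time is None else start_time)
    table.set_val("command", " ".join(argv))
    table.insert()
```

Because `publish_job_start` touches only the two methods, any object offering `set_val` and `insert` could be passed in its place; a job script would typically call it with `sys.argv`.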
- J. Templon Comments
  - It was useful not to have to worry about the details of where the servers were; you just used commands such as "DEFINE TABLE", "INSERT" or "LATEST SELECT"
    – R-GMA looked like a giant distributed database
  - The SQL model worked well for what we wanted to do
  - The downside is that the archiver process (the thing that sucks in the published records from jobs and puts them in a database) is not ready for prime time
    – It never stays up for more than a few days at a time, and it often dies in a way that fools the babysitting script into thinking that it is still alive
    – This of course is deadly
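The failure mode described above, a process that looks alive but has stopped archiving, is the kind of thing a babysitting script can only catch by checking output freshness as well as process liveness. The following is a minimal sketch under that assumption; `ArchiverStatus`, `needs_restart` and the 600-second threshold are hypothetical and do not describe the actual Nikhef babysitting script.

```python
from dataclasses import dataclass


@dataclass
class ArchiverStatus:
    process_alive: bool        # does the archiver process still exist?
    last_record_age_s: float   # seconds since the newest archived record


def needs_restart(status, max_age_s=600.0):
    """Restart if the process is gone OR it has stopped producing records."""
    if not status.process_alive:
        return True
    # A process that exists but has archived nothing recently is treated
    # as dead, which is the case a pure liveness check would miss.
    return status.last_record_age_s > max_age_s
```

A check of this kind only makes sense when jobs publish records often enough that a quiet period really does indicate a stalled archiver; the threshold would need tuning to the actual job rate.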
- LCG/EDG Problems
  - Single storage machine => bottleneck
    – “WP5” SEs
    – Traffic jams
  - R-GMA not really stable until the end of December
    – Couldn’t submit jobs
    – Missed monitoring records
  - Software distribution reliable but extremely inefficient
  - Poor submission command throughput
- Plans
  - All MC and data production will be running on the SAM computational grid by summer
    – MC by June 1
    – Data reprocessing scheduled for later this year
    – FNAL DØ farm will move to SAM-grid
  - Plan to support interfaces to LCG for this processing
    – Runjob will interface directly to LCG
- Needs
  - Database proxy servers
    – Need access to trigger/calibration information
    – Oracle database
  - The DB proxy design is in principle generic, being based on CORBA (Common Object Request Broker Architecture), which wraps the SQL queries
  - A two-stage cache is used, RAM and disk, each with a configurable size; the caches we currently have configured are on the order of a couple of GB
  - Interface between SE and SAM?
    – Can store our files directly to SAM from an LCG site
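The two-stage cache idea can be sketched as a lookup that tries RAM first, then disk, and only then the database. This is an illustration under stated assumptions, not the DØ DB proxy: the class `TwoStageCache`, the `backend` callable standing in for the CORBA-wrapped Oracle query, least-recently-used eviction, and sizing by entry count rather than by bytes are all hypothetical simplifications.

```python
import hashlib
import os
import pickle
import tempfile
from collections import OrderedDict


class TwoStageCache:
    """Hypothetical RAM-then-disk cache in front of a slow query backend."""

    def __init__(self, backend, disk_dir=None, ram_entries=2, disk_entries=8):
        self.backend = backend            # callable: query string -> result
        self.disk_dir = disk_dir or tempfile.mkdtemp(prefix="dbproxy_cache_")
        self.ram_entries = ram_entries    # configurable stage-one size
        self.disk_entries = disk_entries  # configurable stage-two size
        self.ram = OrderedDict()          # query -> result, oldest first
        self.disk_index = OrderedDict()   # query -> file path, oldest first
        os.makedirs(self.disk_dir, exist_ok=True)

    def _put_ram(self, query, result):
        self.ram[query] = result
        self.ram.move_to_end(query)
        while len(self.ram) > self.ram_entries:
            self.ram.popitem(last=False)  # evict least recently used

    def _put_disk(self, query, result):
        digest = hashlib.sha1(query.encode("utf-8")).hexdigest()
        path = os.path.join(self.disk_dir, digest + ".pkl")
        with open(path, "wb") as fh:
            pickle.dump(result, fh)
        self.disk_index[query] = path
        self.disk_index.move_to_end(query)
        while len(self.disk_index) > self.disk_entries:
            _, old_path = self.disk_index.popitem(last=False)
            os.remove(old_path)

    def get(self, query):
        """Return (result, stage), where stage is 'ram', 'disk' or 'backend'."""
        if query in self.ram:
            self.ram.move_to_end(query)
            return self.ram[query], "ram"
        path = self.disk_index.get(query)
        if path is not None:
            with open(path, "rb") as fh:
                result = pickle.load(fh)
            self.disk_index.move_to_end(query)
            self._put_ram(query, result)  # promote back into RAM
            return result, "disk"
        result = self.backend(query)      # miss in both stages
        self._put_disk(query, result)
        self._put_ram(query, result)
        return result, "backend"
```

The returned stage label is there only to make the behaviour visible; a real proxy would size each stage in bytes and would also need a policy for invalidating entries when the database changes.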