Published by Ruth Weaver, modified over 9 years ago
1
Adapting SAM for CDF
Gabriele Garzoglio
Fermilab/CD/CCF/MAP
CHEP 2003
2
Overview
– SAM for CDF: why?
– Goals of the pilot project
– The path (and pitfalls...) to integration (summer 2002)
– Current status and future vision
3
History
– End of 2001. Pre-pilot project: the UK starts an evaluation of CDF's needs for distributed computing, looking at the Grid and SAM.
– April 2002. Pilot project: adaptation of SAM for CDF.
– August 2002. Pilot SAM deployment complete; work moves toward deploying SAM for the whole collaboration.
4
Why consider SAM?
Collaborating institutions can provide local computing resources to process data. As an example, the estimated sizes of the datasets UK institutes wish to access:

Physics                                  | Trigger set         | No. events | Secondary data size (GB)
B lifetime, B oscillations, CP violation | Central J/Psi       | 12 M       | 1,200
                                         | Displaced vertex    | 1,000 M    | 100,000
W/Z, Higgs, SUSY, B physics              | High-Pt leptons     | 2 M        | 200
                                         | Inclusive electrons | 50 M       | 5,000
                                         | Inclusive muons     | 14 M       | 1,400
SUSY, calibrations                       | High-Et photons     | 58 M       | 5,800
Higgs                                    | Z0 → b bbar         | 6 M        | 600

SAM was (and still is) being actively developed and integrated with Grid middleware (SAM-Grid).
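To put the table in perspective, the short Python sketch below totals the estimates and derives the average size per event. The numbers are copied from the table; the helper names (`DATASETS`, `totals`, `kb_per_event`) are illustrative only, and decimal units (1 GB = 10^6 kB) are assumed.

```python
# Dataset estimates copied from the table above.
# Each entry: (trigger set, events in millions, secondary data size in GB).
DATASETS = [
    ("Central J/Psi", 12, 1_200),
    ("Displaced vertex", 1_000, 100_000),
    ("High-Pt leptons", 2, 200),
    ("Inclusive electrons", 50, 5_000),
    ("Inclusive muons", 14, 1_400),
    ("High-Et photons", 58, 5_800),
    ("Z0 -> b bbar", 6, 600),
]


def totals(datasets):
    """Return (total events in millions, total secondary size in GB)."""
    events = sum(n for _, n, _ in datasets)
    size_gb = sum(s for _, _, s in datasets)
    return events, size_gb


def kb_per_event(events_millions, size_gb):
    """Average size per event in kB (decimal units: 1 GB = 10**6 kB)."""
    return size_gb * 10**6 / (events_millions * 10**6)


if __name__ == "__main__":
    events, size_gb = totals(DATASETS)
    print(f"total: {events:,} M events, {size_gb:,} GB")
    for name, n, s in DATASETS:
        print(f"{name}: {kb_per_event(n, s):.0f} kB/event")
```

Every row works out to 100 kB per event. The total is about 1.1 billion events and roughly 114 TB, of which the displaced-vertex sample alone accounts for 100 TB.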
5
What SAM provides
SAM is a data handling system (the project started in 1997) used in production by DZero (see Lee Lueking’s talk).
Main characteristics:
– data movement and caching
– metadata catalogue
– bookkeeping of analysis projects
– a set of tools for users and administrators
6
Highlights of the mapping between DFC and SAM
Problem: how do we map the CDF (DFC) and DZero (SAM) views of the data?
– DFC: files are organized in Datasets, which contain Filesets.
– SAM: provides virtual files plus metadata parameters (data streams, data tiers, applications, …).
– DFC has the concept of Books to implement group- and user-specific metadata and resource management.
The complete mapping between the DFC and SAM is described in CDF note 6169.
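The core of the problem is turning a hierarchy (Dataset → Fileset → file) into flat files carrying metadata. The Python sketch below illustrates that idea only; the class names and metadata keys (`dataset`, `fileset`, `data_tier`) are hypothetical, Books are not modelled, and the real mapping is the one in CDF note 6169.

```python
from dataclasses import dataclass, field


@dataclass
class DfcFileset:
    name: str
    files: list[str]


@dataclass
class DfcDataset:
    name: str
    filesets: list[DfcFileset] = field(default_factory=list)


@dataclass
class SamFile:
    file_name: str
    metadata: dict[str, str]


def to_sam_files(dataset: DfcDataset, data_tier: str) -> list[SamFile]:
    """Flatten the DFC hierarchy: every file becomes a SAM-style file whose
    metadata records the dataset and fileset it came from (hypothetical keys)."""
    return [
        SamFile(f, {"dataset": dataset.name, "fileset": fs.name, "data_tier": data_tier})
        for fs in dataset.filesets
        for f in fs.files
    ]


def select(files: list[SamFile], **criteria: str) -> list[str]:
    """A dataset definition viewed as a metadata query over SAM-style files."""
    return [
        sf.file_name
        for sf in files
        if all(sf.metadata.get(k) == v for k, v in criteria.items())
    ]
```

Once the hierarchy lives in metadata, a DFC Dataset or Fileset can be recovered as a query, e.g. `select(files, fileset="fs2")`, which is how SAM-style virtual datasets are usually expressed.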
7
Goals of the pilot phase (by summer 2002)
1. Supporting 5 “remote” groups doing data analysis
2. Enabling access to the datasets of interest: read access to secondary data, read/write access to higher-order data
3. Production-quality availability of the system: key machines maintained 24x7
4. Controllable, limited impact on the CDF Mass Storage System (Enstore)
8
Areas of work
– Designing and implementing an architecture
– Adding/adjusting features in SAM
– Designing/loading the CDF SAM database
– Enabling CDF clients to access SAM
– Installation/configuration
– Group coordination during development
– Support/shifter organization
9
Architecture
This architecture was designed with the pilot goals in mind.
Two mass storage systems (MSS):
– CDFen: primary/secondary (1-ary/2-ary) data via DCache
– STKen: higher-order data; 5TB tapes
10
Architecture
1 routing station for >2-ary data:
– 1TB disk cache
– 1GBs connectivity
1 FNAL analysis station:
– dual 1GHz Pentium
– 160 GB disk
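The two architecture slides can be read as a simple rule for where a file comes from. The Python sketch below encodes that reading; the `Tier` enum, the `delivery_path` function and in particular the hop order are assumptions made for illustration, not a description of how SAM actually routes files.

```python
from enum import Enum


class Tier(Enum):
    PRIMARY = 1
    SECONDARY = 2
    HIGHER = 3  # anything beyond secondary (">2-ary")


# MSS names taken from the architecture slides; the mapping is an assumption.
MSS_FOR_TIER = {
    Tier.PRIMARY: "CDFen",
    Tier.SECONDARY: "CDFen",
    Tier.HIGHER: "STKen",
}


def delivery_path(tier: Tier) -> list[str]:
    """Hops a file is assumed to take from tape to the FNAL analysis station."""
    if tier is Tier.HIGHER:
        # Higher-order data: STKen tape, staged on the routing station's 1TB disk cache.
        return [MSS_FOR_TIER[tier], "routing station", "FNAL analysis station"]
    # Primary/secondary data: CDFen tape, read through DCache.
    return [MSS_FOR_TIER[tier], "DCache", "FNAL analysis station"]
```

Keeping higher-order data on a separate MSS behind a dedicated routing station isolates the read/write traffic of the remote groups from the primary/secondary reads served by DCache.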
11
Architecture
SAM services run on a Sun machine (as at DZero), supporting the development, integration and production databases. The PCs were the offline CDF Oracle machines.
12
Feature adjustment
– Integration of SAM with DCache: SAM transports files to local caches from the weakly authenticated ftp door of DCache.
– Enabling the Enstore “discipline” module to limit SAM’s access to Enstore (see Don Petravick’s talk for more on the FNAL MSS).
– Enabling direct station-to-station file transfer in SAM: implementing better file routing.
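The routing preference behind station-to-station transfers can be illustrated with a small Python sketch: look in the local cache, then in a peer station's cache, and only then go to mass storage, with a cap on concurrent MSS reads. The `Station`/`Router` classes and the cap are hypothetical; the cap merely stands in for the kind of limit the Enstore discipline module enforces, whose actual policy is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Station:
    name: str
    cache: set[str] = field(default_factory=set)


@dataclass
class Router:
    """Hypothetical SAM-like router: local cache first, then a peer station's
    cache (station-to-station transfer), and only then mass storage, with a cap
    on concurrent MSS reads standing in for an Enstore 'discipline' limit."""
    stations: list[Station]
    max_mss_reads: int
    active_mss_reads: int = 0

    def route(self, file_name: str, dest: Station) -> str:
        if file_name in dest.cache:
            return "local"
        for peer in self.stations:
            if peer is not dest and file_name in peer.cache:
                dest.cache.add(file_name)  # assume the copy completes at once
                return f"peer:{peer.name}"
        if self.active_mss_reads >= self.max_mss_reads:
            return "queued"  # over the limit: the caller retries later
        self.active_mss_reads += 1
        dest.cache.add(file_name)
        return "mss"

    def mss_read_done(self) -> None:
        """Release one MSS read slot when a tape read finishes."""
        self.active_mss_reads -= 1
```

Serving a file from a peer cache costs no tape mount at all, which is why better routing and the MSS limit work toward the same pilot goal of a controllable impact on Enstore.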
13
Database
– Development, integration and production databases are used for schema evolution.
– Data are periodically read from the DFC and translated into the SAM schema: a Java-based program, Predator, runs every 3 hours to keep SAM up to date.
– Loading the database through the SAM interface served as a proof of principle but turned out to be too slow.
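Predator itself is a Java program; the Python sketch below only illustrates the periodic, incremental translation pattern it follows. The DFC field names, the `translate`/`sync_once` helpers and the dictionary standing in for the SAM catalogue are all hypothetical.

```python
import time
from typing import Callable, Iterable

SYNC_INTERVAL_S = 3 * 60 * 60  # Predator runs every 3 hours


def translate(dfc_row: dict) -> dict:
    """Map one DFC file record to SAM-style metadata (field names are hypothetical)."""
    return {
        "file_name": dfc_row["filename"],
        "file_size": dfc_row["size"],
        "dataset": dfc_row["dataset"],
    }


def sync_once(dfc_rows: Iterable[dict], sam_catalog: dict) -> int:
    """Add records the SAM catalogue does not have yet; return how many were added.
    Re-running on the same input adds nothing, so repeated cycles are safe."""
    added = 0
    for row in dfc_rows:
        record = translate(row)
        if record["file_name"] not in sam_catalog:
            sam_catalog[record["file_name"]] = record
            added += 1
    return added


def run_forever(fetch_dfc: Callable[[], Iterable[dict]], sam_catalog: dict) -> None:
    """Poll the DFC and translate new records, sleeping between cycles."""
    while True:
        sync_once(fetch_dfc(), sam_catalog)
        time.sleep(SYNC_INTERVAL_S)
```

Making each cycle idempotent means the job can simply be restarted after a failure; the next run picks up whatever the previous one missed.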
14
Integration with the CDF analysis framework: AC++
– AC++ does not use exception handling.
– SAM was developed for DZero, whose analysis framework supports exceptions; SAM communication is based on CORBA, and its IDL interfaces use exceptions.
– To manage the communication, the AC++ I/O modules fork a CDF Project Protocol Converter (CPPC).
– CPPC is a finite state machine that communicates with SAM via CORBA and with AC++ via pipes.
– CPPC was generalized to allow communication with multiple processes.
– The UI required a dataset definition plus a project name.
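The essence of CPPC is a state machine that sits between an exception-free client and an exception-raising service. The Python sketch below shows that pattern; the command names (`start`, `next`), the reply format, `FakeSam` and `SamError` are all hypothetical, and the real CPPC talks CORBA to SAM rather than calling a local object.

```python
import io
from enum import Enum, auto
from typing import TextIO


class State(Enum):
    IDLE = auto()
    RUNNING = auto()
    FINISHED = auto()


class SamError(Exception):
    """Stand-in for an exception raised by a CORBA call into SAM."""


class FakeSam:
    """Hypothetical SAM client used only to exercise the converter."""

    def __init__(self, files, known_definitions):
        self._files = list(files)
        self._known = set(known_definitions)

    def start_project(self, dataset_def, project):
        if dataset_def not in self._known:
            raise SamError(f"unknown dataset definition {dataset_def!r}")

    def next_file(self):
        return self._files.pop(0) if self._files else None


class Converter:
    """Finite state machine: line commands in, status lines out. Every
    SamError becomes an 'ERR ...' reply, so the caller never sees an exception."""

    def __init__(self, sam):
        self.sam = sam
        self.state = State.IDLE

    def handle(self, line: str) -> str:
        parts = line.split()
        if not parts:
            return "ERR empty command"
        cmd, args = parts[0], parts[1:]
        try:
            if cmd == "start" and self.state is State.IDLE and len(args) == 2:
                self.sam.start_project(args[0], args[1])  # dataset definition, project name
                self.state = State.RUNNING
                return "OK"
            if cmd == "next" and self.state is State.RUNNING:
                name = self.sam.next_file()
                if name is None:
                    self.state = State.FINISHED
                    return "EOF"
                return f"FILE {name}"
        except SamError as exc:
            return f"ERR {exc}"
        return f"ERR {cmd!r} not valid in state {self.state.name}"


def serve(conv: Converter, inp: TextIO, out: TextIO) -> None:
    """Write one reply line per request line, e.g. over a forked child's pipes."""
    for line in inp:
        out.write(conv.handle(line) + "\n")
        out.flush()


if __name__ == "__main__":
    conv = Converter(FakeSam(["a.root", "b.root"], {"jpsi_def"}))
    out = io.StringIO()
    serve(conv, io.StringIO("start jpsi_def proj1\nnext\nnext\nnext\n"), out)
    print(out.getvalue(), end="")
```

Because the framework only ever reads plain status lines, exception handling is confined to the converter process. In a setup like the one described, `serve` would run in the forked child with its input and output attached to the pipes.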
15
Installation and Configuration
– Version management turned out to be a sensitive issue.
– DZero maintained the current set of cooperating product versions by tagging them as “current” in the UPS/UPD repository.
– To keep its software upgrades independent, CDF needed a different way of tagging consistent sets of cooperating versions.
– The cooperating versions were hardcoded in the SAM installation script.
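Hardcoding the cooperating versions amounts to carrying a pinned version table with the installer instead of resolving whatever is tagged “current”. The Python sketch below illustrates that idea; the product names and UPS-style version strings are hypothetical, and the actual installation script is not reproduced here.

```python
# Hypothetical product names and UPS-style version strings, for illustration only.
COOPERATING_VERSIONS = {
    "sam_client": "v1_2_0",
    "sam_station": "v1_2_0",
    "sam_db_server": "v1_1_3",
}


def check_installed(installed: dict[str, str],
                    pinned: dict[str, str] = COOPERATING_VERSIONS) -> list[str]:
    """Compare installed versions against the pinned set; return a list of problems."""
    problems = []
    for product, wanted in pinned.items():
        have = installed.get(product)
        if have is None:
            problems.append(f"{product}: missing, need {wanted}")
        elif have != wanted:
            problems.append(f"{product}: have {have}, need {wanted}")
    return problems
```

With the table owned by the CDF installer, another experiment re-tagging a product as “current” no longer changes what CDF installs; upgrading becomes an explicit edit to the pinned set.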
16
Current status
– Today, CDF uses SAM for physics at Oxford and Karlsruhe.
– SAM is currently being tested on the FNAL CAF (see Frank Wuerthwein’s talk).
– The next step is a tighter integration of SAM with DCache.
17
Future vision: the SAM-Grid
– The integration of SAM with Grid middleware to enable Job handling and Information Monitoring (JIM) was demonstrated at SC2002 in November (see Igor Terekhov’s talk on JIM, Fedor Ratnikov’s talks on JIM and DCAF, and Stefan Stonjek’s talk on SAM-Grid at SC2002).
– SAM was integrated with the CDF DCAF and deployed at ~10 sites around the world.
– The production version of SAM-Grid is going to be deployed for DZero in April; we look forward to a deployment for CDF.