Published by Noel Quinn. Modified over 9 years ago.
2
4 March 2004, GridPP 9th Collaboration Meeting
SAMGrid: JIM and CDF Development
CDF Accepts the Need for the Grid
– Requirements
How to Meet the Need
– Status of SAMGrid for CDF
Rick St. Denis, University of Glasgow
3
Director's review, International Finance Committee: 50% of computing outside FNAL
Maximize physics output at low luminosity
– L3 output rate: 80 -> 360 Hz by 2006
Spokespersons' Requirements for CDF
CDFGrid supported by FNAL PAC
CDF needs the Grid
4
Scale of CDF Requirements
Year   THz    % offsite   CPU speed   # duals
FY04   3.7    25%         3 GHz       150
FY05   9.0    50%         5 GHz       +360
FY06   16.5   50%         8 GHz       +220
6-7 sites, 100 duals each, by 2006 + 700 @ FNAL
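The numbers in the requirements table can be cross-checked with a few lines of arithmetic. The sketch below rests on an assumed reading of the table that the slide does not state: each "dual" is a node with two CPUs at the listed clock speed, and the "# duals" column counts offsite nodes added in that year. Under that reading the installed offsite capacity matches the required offsite share for FY04 and FY05 and comes within about 3% for FY06, and the 730 duals in total fit the "6-7 sites, 100 duals each" figure.

```python
# Rows: (fiscal year, total THz, offsite fraction, CPU speed in GHz, duals added).
# Values are copied from the slide; the interpretation of "# duals" is an assumption.
ROWS = [
    ("FY04", 3.7, 0.25, 3.0, 150),
    ("FY05", 9.0, 0.50, 5.0, 360),
    ("FY06", 16.5, 0.50, 8.0, 220),
]


def offsite_capacity(rows):
    """Return (year, required offsite THz, cumulative installed THz, cumulative duals)."""
    results = []
    installed_thz = 0.0
    total_duals = 0
    for year, total_thz, offsite_fraction, ghz, new_duals in rows:
        required = total_thz * offsite_fraction
        # Assumed: one dual contributes 2 CPUs x clock speed; 1000 GHz = 1 THz.
        installed_thz += new_duals * 2 * ghz / 1000.0
        total_duals += new_duals
        results.append((year, required, installed_thz, total_duals))
    return results


if __name__ == "__main__":
    for year, required, installed, duals in offsite_capacity(ROWS):
        print(f"{year}: need {required:.2f} THz offsite, "
              f"{installed:.2f} THz installed in {duals} duals")
```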
5
CDF Computing Model
Develop analysis on desktop
– Access to all CDF data from anywhere
Large-scale processing on batch clusters
– Submission from anywhere
– Interactive tools: ls, top, head/tail/cat
– Output to scratch space or desktop
Implemented now with CAF
6
Use Cases for Summer 2004
User-level MC production
– All CDF users have access
– No data on site -> SAM write
User-level data access
– All users have access
– Selected samples on site: full SAM support
SAM essential for Summer 2004
7
Medium Term Vision
Many Sites
Fully transparent submission to all of CDF resources: 75% FNAL, 25% outside
Fully transparent input and output of data
8
Summer 04 Functionality
User selects submission site, stating which dataset they will use
System checks that they are allowed to do this (privileges)
User accesses data with SAM/dCache
User registers output with SAM
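The first two steps of the Summer 04 flow (pick a site and dataset, then have the system check privileges) can be sketched as a small validation function. Everything here is hypothetical and for illustration only: the `Site` and `User` classes, the group-based privilege model and the function name are not taken from CAF, SAM or JIM. A dataset of `None` stands for the MC production case, where no input data is needed on site.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical description of a submission site."""
    name: str
    datasets: set        # names of datasets held at this site
    allowed_groups: set  # groups permitted to submit here


@dataclass
class User:
    """Hypothetical user record."""
    name: str
    groups: set


def check_submission(user, site, dataset=None):
    """Return (accepted, reason) for a job the user wants to run at the site."""
    # Privilege check: the user must share at least one group with the site.
    if not (user.groups & site.allowed_groups):
        return False, "no privileges at site"
    # Data check: a named dataset must be among the samples held on site.
    if dataset is not None and dataset not in site.datasets:
        return False, "dataset not available at site"
    return True, "accepted"


if __name__ == "__main__":
    site = Site("example-site", {"example_dataset"}, {"cdf"})
    print(check_submission(User("alice", {"cdf"}), site, "example_dataset"))
```

Returning a reason alongside the verdict keeps the sketch usable from both a GUI and a command-line client, which is how the talk describes users reaching the system.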
9
October 04
To extend beyond 25% outside computing, JIM is essential: JIM test for CDF in June 04, production in October 04
HOWEVER: it already seems that the 25% resources are not sufficient for the production passes: will want JIM earlier.
10
CDFGrid from a User Perspective
[Diagram. Labels: AC++; CAF Gui/CLI; Grid; Toronto, Korea, Italy, Taiwan, FermiCAF, UK]
Only Fermilab uses SAM
Outside Lab: Grid uses SAM
11
CDF Grid Strategy
25% of CDF computing from external resources.
All CDF computing on CDF Grid by April 15:
Utilize resources fully controlled by CDF: Kerberos/fbsng: dCAF + SAM
October 15, 2004: JIM to capture shared resources
June 2005: 50% of computing resources external
12
Simple JIM
[Diagram. Labels: Desktop Anywhere; Condor Submitter @ regional centers; SAM DB; Condor Matchmaker @ FNAL; Globus GK; CAF Submitter; SAM Station @ each site; WN; Private LAN; dCache]
June 2004 testing
June 2005 required
13
Detailed JIM
[Architecture diagram. Labels: Site; Resource Selector; Info Collector; Info Gatherer; Match Making; User Interface; Submission; Global Job Queue; Grid Client; Submission; User Interface; Global DH Services; SAM Naming Server; SAM Log Server; Resource Optimizer; SAM DB Server; RC; MetaData Catalog; Bookkeeping Service; SAM Stager(s); SAM Station (+other servs); Data Handling; Worker Nodes; Grid Gateway; Local Job Handler (CAF, D0MC, BS,...); JIM Advertise; Local Job Handling; Cluster; AAA; Dist.FS; Info Manager; XML DB server; Site Conf.; Glob/Loc JID map...; Info Providers; MDS; MSS; Cache; Site; Web Serv; Grid Monitoring; User Tools]
Flow of: job, data, meta-data
14
Meeting the Needs
Progress in SAM
JIM Status
RunJob
CDFGridWorkshop: “Nerd’s Paradise”
Strict project management and process to respond to operational issues
15
Progress in SAM
Dbserver, the database server between applications and Oracle, was upgraded to use a common schema for CDF and D0.
All CDF data files are in SAM
SAM is in beta testing on the CDF CAF (1200 CPUs): passed 20 TB/day delivery
Minos uses SAM for its data handling
Steve Mrenna (Phenomenology) is depositing ALPGEN files in SAM for common CDF/D0 use.
16
JIM Deployment Issues
Focus: 200 jobs each getting 200 files generated 120000 requests simultaneously to the DBServer!
– Sensible sam: reliability went to 60%. Now add retries.
Training users
D0 has D0Tools: big script; determines where user is and copies files: harder to get into a sandbox; CAF conditions users!
Distribution and compatibility: this has made great strides with SAM, now time for JIM
Communication with the expert!
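The "now add retries" remedy for a database server swamped by simultaneous requests is commonly paired with exponential backoff and random jitter, so that failed clients do not all retry at the same instant. The sketch below shows that general technique only; it is not the retry logic actually added to SAM, and `TransientError`, `call_with_retries` and all parameter names are hypothetical.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a request that the server dropped or refused under load."""


def call_with_retries(request, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call request() until it succeeds or max_attempts calls have failed.

    After failed attempt n, wait a random time between 0 and
    min(max_delay, base_delay * 2**(n - 1)) seconds before trying again.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * ceiling)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_request():
        # Fails twice, then succeeds, to exercise the retry path.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TransientError("server busy")
        return "file location"

    # A no-op sleep keeps the demonstration instant.
    print(call_with_retries(flaky_request, sleep=lambda seconds: None))
```

The `sleep` and `rng` parameters are injectable so the behaviour can be checked without real waiting; production callers would leave them at their defaults.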
17
RunJob
Dedicated farms at FNAL will go away and RunJob will be used for production processing of data
CDF will use RunJob for MC production
Dave Evans worked for CDF for 2 months: has made CDFRunJob based on RunJob (Shakar), a tool common to CMS.
Morag will work on this.
18
Florida workshop: 11 installations in about 2 hours. Integrated with dCAF in 2 cases in 2 days.
3 in Asia, 4 in Europe
6 sites committed to summer 2004 usage of their facilities for all of CDF (mostly MC)
SAM installation now: initsam cdf
Follow-up on April 1.
Each site has a local user support person to reduce load on the core development team.
Generally: security ate 80% of the effort! Now 20%!
20
Florida Workshop: After 2 Days
21
2 TB/day: Karlsruhe
22
CDF dCache on CAF
All CDF on CAF reads 20 TB/day
25
dCache and SAM
dCache shapes traffic into disk: if a SAM cache is large, need to use dCache instead of NFS mounts
dCache gives the user what is requested. 1 TB gets the same priority as 1 GB: CDF users must send email requesting data to be staged.
SAM examines consumption rate before staging next files
– No email needed.
SAM uses dCache for its caching at FNAL. This needs further work with SRM
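The idea of looking at a job's consumption rate before staging more files can be illustrated with a simple prefetch rule: stage only enough files to keep the consumer busy for a fixed look-ahead window. This is a sketch of that general idea under stated assumptions, not SAM's actual staging algorithm; the function name, the parameters and the 10-minute window are all hypothetical.

```python
import math


def files_to_stage(consumed_files, elapsed_seconds, staged_unread, remaining,
                   lookahead_seconds=600.0, min_prefetch=1):
    """Return how many more files to stage for one consumer.

    consumed_files / elapsed_seconds is the observed consumption rate.
    staged_unread counts files already on disk that the job has not read yet.
    remaining counts files in the dataset that have not been staged.
    """
    if remaining <= 0:
        return 0
    if elapsed_seconds > 0:
        # Files the consumer is expected to read during the look-ahead window.
        expected = math.ceil(consumed_files * lookahead_seconds / elapsed_seconds)
    else:
        expected = 0
    wanted = max(min_prefetch, expected)
    # Never stage more than is left, and never a negative number.
    return max(0, min(remaining, wanted - staged_unread))


if __name__ == "__main__":
    # A fast consumer gets a deep prefetch; an idle one gets nothing new.
    print(files_to_stage(120, 600.0, staged_unread=20, remaining=500))
    print(files_to_stage(0, 600.0, staged_unread=5, remaining=500))
```

A rule of this kind lets a slow 1 TB request and a fast 1 GB request each receive disk space in proportion to what they actually read, which is the contrast the slide draws with requesting staging by email.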
26
SAMGrid Management
SAM Management Team
SAM Operations and Projects
SAM Design
SAM Project Leaders
SAM Technical Leaders
27
SAMGrid Development Process
[Flow diagram. Labels: Issue Raised; SAMGrid Operations/Projects; SAMGrid Design; SAMGrid Management Team; Grid Deliverables; Subproject; Chaired by Technical Managers; Chaired by Project Leaders]
28
Subproject Organization
Each subproject has a subproject leader (SPL) responsible for making a plan and reporting progress.
Each subproject has one of the technical leaders evaluating it against an assessment template.
No deliverable requires more than 3 months of work to deliver.
29
SubProject Assessment Template
1. Background Documents
2. Project Definition/Mission Statement
3. Deliverables and timetable
4. Inter-project deliverables
5. Project status
6. Challenges and Critical Path Items
7. Lessons Learned
8. Project specific comments, alternate views
30
SAMGrid Assigned SubProjects
Categories: Housekeeping; MC / Reconstruction; Infrastructure; User analysis Apps
Subprojects: JIM:D0Tools; Common API; Database Server Rewrite; Database Servers to Linux; Metadata Query with configurable Params; Work Flow Package; MCRequest; H Stream for CDF; JIM:MCD0; Test Harness; Retire CDF Replica Catalog; Caching; Configuration Management
31
Status of Assessments
Subprojects defined
Interviews conducted on about ½
Assessment reports being written
32
Conclusions
CDF has embraced the need for the Grid to achieve its physics mission
Progress in deployment, robustness testing has SAM in CDF
JIM is rapidly solving its problems
… with the help of a review and management process