1
xgrid@MIT: An innovative campus grid prototype
Adam Kocoloski and Mike Miller, Massachusetts Institute of Technology, STAR Collaboration
2
Outline
- Motivation for (another) campus grid
- What is unique about
- Introduction to Apple’s xgrid system
- Integrating xgrid into SUMS
- Results and deliverables
- Future plans
3
Typical user analyses
- Run ‘n’ identical processes with different seeds
- Analyze ‘n’ files in ‘m’ processes (m<n)

[Figure: SUMS job flow. User input (job description test.xml) and policy feed a dispatcher; query/wildcard resolution expands the input into files such as /star/data09/reco/productionCentral/FullFie..., which are split into per-process file lists and scripts (sched _0.list / .csh, sched _1.list / .csh, sched _2.list / .csh).]

Job description (test.xml):
```xml
<?xml version="1.0" encoding="utf-8" ?>
<job maxFilesPerProcess="500">
  <command>root4star -q -b rootMacros/numberOfEventsList.C\(\"$FILELIST\"\)</command>
  <stdout URL="file:/star/u/xxx/scheduler/out/$JOBID.out" />
  <input URL="catalog:star.bnl.gov?production=P02gd,filetype=daq_reco_mudst" preferStorage="local" nFiles="all"/>
  <output fromScratch="*.root" toURL="file:/star/u/xxx/scheduler/out/" />
</job>
```
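A minimal sketch of the “n files in m processes” split, assuming each process is simply filled up to the job's maxFilesPerProcess limit in input order. The function names and the `<prefix>_<k>.list` naming are hypothetical simplifications; the real SUMS dispatcher also applies policy and storage preferences before writing its lists.

```python
import math
from pathlib import Path


def split_into_processes(files, max_files_per_process=500):
    """Split n input files into m = ceil(n / max_files_per_process) chunks.

    Assumption: each process is filled up to the limit in input order;
    the real SUMS dispatcher may group or balance files differently.
    """
    if max_files_per_process < 1:
        raise ValueError("max_files_per_process must be at least 1")
    m = math.ceil(len(files) / max_files_per_process)
    return [
        files[k * max_files_per_process:(k + 1) * max_files_per_process]
        for k in range(m)
    ]


def write_file_lists(chunks, outdir, prefix="sched"):
    """Write one hypothetical <prefix>_<k>.list file per process."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for k, chunk in enumerate(chunks):
        path = outdir / f"{prefix}_{k}.list"
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

With 1200 input files and a limit of 500, this yields m = 3 processes holding 500, 500 and 200 files.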
4
How it all began…
Wouldn’t it be great to harvest wasted cycles? But…
- Not another batch system
- No scripting
- No overhead for setup
- No admin access to others’ machines
- No overhead for maintenance
Someone said: try that “xgrid” button
Apple xgrid: a new HE(N)P grid platform?
- Single vendor, proprietary, built into OS X 10.4
- No requirements on clients and agents
- Uniquely scalable
- OSG interface
Standard STAR-GRID interface
- No need to learn xgrid syntax
- No scripting needed
- Business as usual
A unique campus grid
5
xgrid@MIT layout
[Figure: layout diagram with users, SUMS, a dedicated controller, dedicated agents, harvested agents and @home agents.]
SUMS, the STAR Unified Meta-Scheduler: “A front end around evolving technologies for user analysis and data production.” (J. Lauret, CHEP 2004)
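To make the controller/agent split concrete, here is a toy model of a controller that queues tasks and hands each one to the next idle agent, whether dedicated, harvested or @home. This is a hypothetical illustration of the layout only; it is not Apple's xgrid scheduling algorithm, and the class and method names are invented for the sketch.

```python
from collections import deque


class ToyController:
    """Hypothetical model: queue tasks, give each to the next idle agent."""

    def __init__(self):
        self.idle_agents = deque()
        self.pending = deque()
        self.assignments = []  # (task, agent) pairs in dispatch order

    def register_agent(self, name):
        # Stand-in for "once authenticated, agents are auto-detected".
        self.idle_agents.append(name)
        self._dispatch()

    def submit(self, task):
        self.pending.append(task)
        self._dispatch()

    def task_finished(self, agent):
        # The agent becomes idle again and can take the next pending task.
        self.idle_agents.append(agent)
        self._dispatch()

    def _dispatch(self):
        # First-come, first-served pairing of pending tasks with idle agents.
        while self.idle_agents and self.pending:
            agent = self.idle_agents.popleft()
            task = self.pending.popleft()
            self.assignments.append((task, agent))
```

Tasks submitted before any agent joins simply wait; each newly registered or newly freed agent picks up the oldest pending task.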
6
Apple’s xgrid: the promises
Instant configuration
- A “clickable” setup
- Anywhere (that port 4111 is open), even behind the BNL firewall
Single sign-on capable (Kerberos KDC)
- Separate authentication for clients and agents
Scalable and stable
- No hard limit on number of agents
- Prompt controller auto-recovery from crash
Management made easy
- Once authenticated, agents are auto-detected by the controller
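Since the setup works “anywhere that port 4111 is open”, a quick first check when adding a client or agent is whether the controller port is reachable at all. A minimal sketch, assuming plain TCP reachability is what matters; it does not speak the xgrid protocol or test authentication, and the function name is hypothetical.

```python
import socket

XGRID_PORT = 4111  # controller port quoted on the slide


def port_is_open(host, port=XGRID_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only tests reachability; it does not speak the xgrid protocol
    or check authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

For example, `port_is_open("controller.example.edu")` (a placeholder hostname) returning False points to a firewall or a controller that is not running, before any xgrid-level debugging.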
7
Integrating xgrid into SUMS
% star-submit test.xml
% star-submit-template -template tpltJob.xml -entities ptCut=5,year=2006
SUMS steps, with xgrid as a back end:
- Abstract the user task request into XML
- Auto-generate scripts
- Test/choose queues
- Create sandbox
- Submit tasks (jobs)
- Resubmit failed jobs
- Retrieve results
- Clean jobs from queue
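The star-submit-template call fills a job template from a comma-separated `-entities` list. A minimal sketch of that expansion step, assuming the template refers to each value as an XML-style `&name;` reference; the parsing and substitution here are illustrative and not taken from the SUMS source.

```python
import re

# Predefined XML entities are left untouched rather than treated as user entities.
XML_BUILTINS = {"amp", "lt", "gt", "quot", "apos"}


def parse_entities(spec):
    """Parse 'ptCut=5,year=2006' into {'ptCut': '5', 'year': '2006'}."""
    entities = {}
    for pair in spec.split(","):
        name, sep, value = pair.partition("=")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"bad entity definition: {pair!r}")
        entities[name] = value.strip()
    return entities


def expand_template(template, entities):
    """Replace each &name; reference with its value from `entities`."""
    def substitute(match):
        name = match.group(1)
        if name in XML_BUILTINS:
            return match.group(0)
        if name not in entities:
            raise KeyError(f"undefined entity: {name}")
        return entities[name]

    return re.sub(r"&(\w+);", substitute, template)
```

Under this assumption, `-entities ptCut=5,year=2006` turns every `&ptCut;` in the template into `5` and every `&year;` into `2006`, while an undefined name fails loudly instead of submitting a half-filled job.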
8
Growing the grid
Challenges:
- No control of apps installed on agents!
- Jobs run as user ‘nobody’
- No static linking under OS X
- Admin must provide NFS/AFP-accessible libraries
- Grid build recipe
- Controller cannot retrieve 10GB text file transfer
Established:
- XgridDispatcher integrated into standard SUMS development
- Dedicated controller
- Dedicated NFS/AFP fileservers
- 1 dedicated agent
- 4 harvested desktop agents
- Handful of laptops (PPC and Intel!)
- 19 CPUs for 42 GHz
- Tiny fraction of available MIT machines
Stability:
- Only 1 (predictable) admin intervention in 3 months!
- Once added, agents truly are auto-detected
- <2 FTE hours/week required
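As a rough reading of the “19 CPUs for 42 GHz” figure, the aggregate clock corresponds to an average of about 2.2 GHz per CPU. This is only an average over a mixed PPC/Intel pool, not a per-machine specification:

```latex
\bar{f} = \frac{42\ \text{GHz}}{19\ \text{CPUs}} \approx 2.2\ \text{GHz per CPU}
```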
9
Performance and deliverables
Performance
- 33-44 GHz range
- Currently a handful of users
- But a 1/2-day job becomes 1/2 hour
- Smooth processing of O(100 GB) datasets, impossible on a laptop
- Weekend meetings
Deliverables
- Subset of 2006 offline calibration for the STAR calorimeter
- A suite of simulation studies
- Extensive analyses for an impending publication
- Data analysis for hep-ex/ , submitted to PRL
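Reading “1/2 day” as 12 wall-clock hours (an assumption; half a working day would give a smaller factor), the quoted turnaround corresponds to a speedup of about 24:

```latex
S = \frac{T_{\text{before}}}{T_{\text{grid}}} = \frac{12\ \text{h}}{0.5\ \text{h}} = 24
```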
10
Future plans
Next 2 months
- Extended testing of prototype
- Already integrated Intel architecture
- Adapt single-task XML submission
- Integrate MIT-IST Apple cluster
- Testing by string and HEP theorists, neutrino group
- Demonstrate OSG capability
- Port STAR libs to OS X
- Validate simultaneous SUMS submission to PDSF, BNL, etc.
By next year
- Require single sign-on user authentication
- Dedicated, scalable backbone
- ×10 in dedicated CPU
- Xsan + Xserve RAID (5 TB)
- Harvest MIT
- Website under development
- Launch
- Capable of <cpu> = 300 GHz in next 12 months
- One of the largest Apple campus grids
- Largest HE(N)P Apple grid?
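Comparing the 300 GHz target with the 42 GHz established on the current prototype gives the implied growth in aggregate clock, roughly a factor of 7 (the ×10 in dedicated CPU is a separate target):

```latex
\frac{300\ \text{GHz}}{42\ \text{GHz}} \approx 7.1
```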
11
Conclusions
Marriage of two technologies
- Apple xgrid
- SUMS user interface
- For MIT-STAR users: an essentially free grid, business as usual
Prototype
- All from existing machines
- No FTE administrator!
- Stable, immediately useful
Unique OSG growth potential
- Dedicated, I/O-capable backbone
- Harvested component
12
Backup