A Regional Analysis Center at the University of Florida
Jorge L. Rodriguez, University of Florida
September 27, 2004
jorge@phys.ufl.edu
Outline
Facility's Function and Motivation
Physical Infrastructure
Facility Systems Administration
Future Plans
Facility's Function & Motivation
Operational support for various organizations:
– Experimental High Energy Physics
  CMS (Compact Muon Solenoid @ the LHC)
  CDF (Collider Detector at Fermilab @ FNAL)
  CLEO (e+e- collider experiment @ Cornell)
– Computer Science
  GriPhyN
  iVDGL
– "Friends and Neighbors" US Grid computing projects
High Energy Physics Activities
US-CMS Tier2 Center
– Monte Carlo (MC) production
  Small number of expert, dedicated users
  Primary consumer of computing resources
– Support for the US-CMS regional analysis community
  Larger number (~40) of less expert users
  Large consumer of disk resources
CLEO & CDF analysis and MC simulation
– Local CDF activities in the recent past
– Expect a ramp-up of local CLEO-C activities
Computer Science Research
Grid computing research projects
– iVDGL & Grid3 production cluster(s)
  Grid3 production site
  Provides resources to 100s of users via grid "logins"
– GriPhyN & iVDGL Grid development
  Middleware infrastructure development
– grid3dev and other testbeds
  Middleware and Grid application development
  GridCat: http://www.ivdgl.org/gridcat
  Sphinx: http://www.griphyn.org/sphinx
  CAVES: http://caves.phys.ufl.edu
Need to test on real cluster environments
The Primary Challenge
All of this needs to be supported with minimal staffing, "cheap" hardware, and moderate expertise.
Physical Infrastructure
Physical Infrastructure
Server Room
– Dedicated floor space from the department
– Our feed into the campus backbone
  FLR upgrade to 10 Gbps by 2005
Hardware: currently 75 servers
– Servers: mix of dual PIII & P4 Xeon
– LAN: mix of FastE and GigE
– Total of 9.0 TB of storage, 5.4 TB on dCache
  4U dual-Xeon fileservers w/ dual 3ware RAID controllers …
  Sun Enterprise with FC and RAID enclosures
– More storage and servers on order
  New dedicated analysis farm
  Additional 4 TB dual Opteron system
  10 GigE ready (S2io cards)
Video Conferencing
Two Polycom-equipped conference rooms
– Polycom ViewStations, H.323 and H.262
– Windows XP PCs
Access Grid
– Broadcast and participate in lectures from our large conference room
Visitor's Offices and Services
Visitor Work Spaces
– 6 Windows & Linux desktops
– Expanding visitor workspace
  Workspaces, printers, LCD projector …
Espresso Machine
– Cuban coffee
– Lighthearted conversation
Facility Systems Administration: Design Overview
Facility Design Considerations
Centralization of systems management
– A single repository of server, cluster, and meta-cluster configuration and description
  Significantly simplifies maintenance and upgrades
  Allows for easy resource deployment and reassignment
– Organization is a very, very good thing!
Support multiple versions of the RedHat dist.
– Keep up with HEP experiments' expectations
– Currently we support RH7.3 and RHEL 3
– The future is Scientific Linux (based on RHEL)?
Cluster Management Technology
We are pretty much a ROCKS shop!
– ROCKS is an open-source cluster distribution
– Its main function is to deploy an OS on a cluster
– ROCKS is layered on top of the RedHat dist.
– ROCKS is extensible
ROCKS provides us with the framework and tools necessary to meet our design requirements simply and efficiently.
http://www.rocksclusters.org
ROCKS: the 5 Minute Tour
ROCKS builds on top of RH kickstart technology
– A brief kickstart digression
A kickstart is a single ASCII script
– Lists individual RPMs and/or groups of RPMs to be installed
– Provides staged installation sections: %pre, %main, %post …
Anaconda, the installer
– Parses and processes kickstart commands and installs the system
– This is in fact what you interact with when you install RedHat on your desktop
ROCKS: the 5 Minute Tour cont.
ROCKS enhances kickstart by
– Providing machinery to push installations to servers
  DHCP: node identity assigned to a specific MAC address
  https: protocol used to exchange data (kickstarts, images, RPMs)
  cgi script: generates the kickstart on the fly
– Providing a kickstart management system
  The kickstart generator parses user-defined XML spec files and combines them with node-specific information stored in a MySQL database
  The complete system description is packaged in components grouped into logical object modules (a node-file sketch follows below)
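To make the XML spec files concrete, here is a minimal sketch of a ROCKS-style node file. It is illustrative only: the file name, package name, and post command are assumptions, not part of the uflorida configuration. The point is that each <package> element contributes an RPM to the generated kickstart, and the <post> body ends up in its %post section.

    <?xml version="1.0" standalone="no"?>
    <!-- monitoring.xml: hypothetical node file; package and command names are assumptions -->
    <kickstart>
      <description>
        Install and enable basic monitoring on a node.
      </description>
      <!-- each package element becomes an entry in the generated kickstart package list -->
      <package>ganglia-gmond</package>
      <!-- the post body is appended to the generated %post section -->
      <post>
    chkconfig gmond on
      </post>
    </kickstart>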
ROCKS: the 5 Minute Tour cont.
The standard ROCKS graph describes a single cluster with service nodes (a minimal graph file is sketched below).
Note: use of "rolls"
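For readers who have not seen a graph file, a minimal sketch follows; the appliance and node-file names are generic placeholders and are not taken from the standard ROCKS graph. Each edge attaches the node file named in "to" to every appliance that reaches it through "from"; the frontend walks this graph to assemble the kickstart it serves to each appliance.

    <?xml version="1.0" standalone="no"?>
    <!-- hypothetical graph fragment; appliance and node-file names are placeholders -->
    <graph>
      <!-- anything built from "client" also pulls in the ssh and monitoring node files -->
      <edge from="client" to="ssh"/>
      <edge from="client" to="monitoring"/>
      <!-- a compute appliance is a client plus its own compute-specific configuration -->
      <edge from="compute" to="client"/>
      <edge from="compute" to="compute-config"/>
    </graph>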
ROCKS' and MySQL/XML frontend
[Diagram: the uflorida-frontend's XML graphs define per-cluster gatekeeper and compute appliances (gatekeeper-pg/compute-pg, gatekeeper-dgt/compute-dgt, gatekeeper-grid01/compute-grid01), instantiated as the physical servers ufloridaPG, ufloridaDGT, ufgrid01 and the compute nodes grinux01–grinux40 and grinuxN–grinuxM]
XML graphs: appliances are like the classes
Physical servers are the objects
ROCKS' MySQL database holds the global variables, MAC addresses, node names, distribution, membership, appliances …
Facility Systems Administration: Implementation
ROCKS at UFlorida
Current installation is based on ROCKS 3.2.0
Basic graph architecture modified to meet our requirements
– Single frontend manages multiple clusters and service nodes
– Support for dual distributions: RH7.3 & RHEL 3
– Direct interaction with the ROCKS MySQL database
– Extensive modifications to the XML trees
  The XML tree is based on an older ROCKS version, 2.3.2
  Our own "uflorida" graphs, one for each distribution (see the sketch below)
  Many changes & additions to the standard XMLs
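As a concrete illustration of the per-distribution graphs, the fragment below sketches how cluster-specific appliances might be wired to shared node files in one of the uflorida graphs. The uflorida.xml files are not reproduced on the slides, so every edge and node-file name here is an assumption based on the appliance names shown earlier; the dcache-pool edge reflects the dCache pool compute nodes of the production cluster described later.

    <?xml version="1.0" standalone="no"?>
    <!-- hypothetical fragment in the spirit of a uflorida graph; all names are assumptions -->
    <graph>
      <!-- per-cluster gatekeeper and compute appliances share common base node files -->
      <edge from="gatekeeper-pg" to="gatekeeper-base"/>
      <edge from="compute-pg" to="compute-base"/>
      <edge from="gatekeeper-dgt" to="gatekeeper-base"/>
      <edge from="compute-dgt" to="compute-base"/>
      <!-- production-cluster compute nodes also serve as dCache pools -->
      <edge from="compute-pg" to="dcache-pool"/>
    </graph>

Keeping one graph per distribution lets the RH7.3 and RHEL 3 trees diverge without conditional logic inside a single graph.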
The uflorida XML graphs
[Figure: uflorida.xml for RHEL 3 servers and uflorida.xml for RH7.3 servers]
The Servers
[Diagram: RHEL 3 and RH7.3 servers on a private LAN behind the WAN: uflorida-frontend, gatoraid1, gatoraid2, nfs-homes, ufdcache, alachua, archer, micanopy, ufgrid01–ufgrid04, GriPhyN, ufloridaDGT, ufloridaPG, plus compute nodes grinux01–grinux40, grinux41–grinuxN, and grinuxN+1–grinuxM]
Access ranges from all users with limited access from the WAN, to grid users only with limited access from the WAN, to no user login and very limited or no access from the WAN.
The Services
[Diagram: same server layout as the previous slide, annotated with the services each group provides]
uflorida-frontend, the big kahuna: our stripped-down version of the ROCKS frontend
– ROCKS administration node, RPM server, ROCKS DB server, kickstart generation, DHCP server, admin NFS server, etc.
– No users, no logins, no home …
– Strict firewall rules
– Primary DNS server
NFS servers
– Users' home area and data
– Grid users
User Interface machines
– Grid user login
– Access to the grid via Condor-G
GriPhyN & Analysis Servers
– User login machines
– User environment
Other services
– cvs pserver, webserver
Batch System Services
[Diagram: same server layout, highlighting the gatekeeper / master nodes in front of the grinux compute nodes]
gatekeeper / master nodes
– Grid access only: GRAM, GSIFTP …
– Other services kept to a minimum
dCache Disk Storage Service
[Diagram: same server layout, highlighting the dCache components]
dCache with SRM: a virtual file system (pnfs) with caching
– dCache pool nodes: 40 x 50 GB partitions
– dCache administrator ("admin door"), SRM & dCache webserver
– RAID fileserver: the entire 2 TB of disk on dCache
The uflorida XML graphs
[Figure: the RHEL 3 graph and the RH 7.3 graph, with callouts for the production cluster (dCache pool compute nodes) and the analysis cluster (NIS server and clients, analysis environment)]
UFlorida Website
Currently just the default ROCKS website, including the Ganglia pages.
Ours is accessible only from the .ufl.edu domain.
Status and Plans
Facility's Status
Participated in all major CMS data challenges
– Since Fall of 2001: a major US contributor of events
– Since end of 2002: grid-only MC production
– Contributed to CDF analysis
UFlorida group: 4 faculty & staff and 2-3 students
Recently re-deployed infrastructure
– Motivated mostly by security concerns
– Added new clusters for additional user communities
– Ramping up to support a full US-CMS Tier2 center
Facility's Future
Plan to redeploy with ROCKS 3.3.x
– Recast our own 2.3.2-based XMLs as real 3.3.x
– Gain access to new features
  Recast our tweaked and tuned services in terms of Rolls
  Make use of the new ROCKS on the WAN
Improve collaboration with our peers
– With US-CMS FNAL (Tier1)
– Statewide (FIU, FSU Tier3) and campus-wide (HPC)
– Other Tier2 centers, US CMS & ATLAS
Facility's Future
New hardware is always coming
– New dedicated analysis and file servers on order
  6 new dual-Xeon based servers
  8.0 TB of disk
– New Opteron systems on order
  Participate in the SC '04 bandwidth challenge, CIT to JAX
  Connect to new FLR 10 GigE network equipment
– Official Tier2 center status will bring new hardware
  Approximately 120 servers
  An additional 200 TB of storage
Summary
We have successfully built, from scratch, a computing facility at the University of Florida
– Supports HEP experiments (LHC, FNAL …)
– Supports Computer Science activities
– Expect a much more active analysis/user community
The infrastructure is designed to support a large increase in hardware, serving a larger community of users, in anticipation of LHC turn-on and beyond.