Michigan Grid Testbed Report
Shawn McKee, University of Michigan
UTA US ATLAS Testbed Meeting
April 4, 2002
Michigan Grid Testbed Layout (diagram)
Grid Machine Details

Machine   CPU           Memory    Disk                Network                        OS/kernel
atgrid    2 x 800 MHz   1024 MB   4 x 36 GB (RAID5)   100 Mb/s + 1000 Mb/s fiber     RH 6.2
linat01   2 x 450 MHz   512 MB    2 x 9 GB            100 Mb/s + 1000 Mb/s fiber     RH 6.2
linat02   2 x 800 MHz   768 MB    18 GB               100 Mb/s + 1000 Mb/s copper    RH 6.2
linat03   2 x 800 MHz   768 MB    4 x 18 GB           100 Mb/s + 1000 Mb/s copper    RH 6.2
linat04   2 x 800 MHz   512 MB    2 x 18 GB           100 Mb/s + 1000 Mb/s copper    (being rebuilt)
linat05   2 x 550 MHz   512 MB    35 GB               100 Mb/s + 1000 Mb/s fiber     (being added)
linat06   2 x 800 MHz   768 MB    450 GB (RAID5)      100 Mb/s + 1000 Mb/s fiber     RH 7.1 / 2.4.16
Grid Related Activities at UM
– Network monitoring and testing
– Security-related tools and configuration
– Crash-dump testing for Linux
– Web100 testing
– MGRID initiative (proposal sent to UM Administration)
– MJPEG video boxes for videoconferencing
– UM is now an "unsponsored" NSF/SURA Network Middleware Initiative (NMI) testbed site
– Authenticated QoS signaling and testing
Web100 Experience
We upgraded the kernels on many of our nodes and then applied the Web100 patches (alpha release).
The goal is to provide network tuning and debugging information and tools by instrumenting low-level code in the TCP stack and kernel.
Our experience has been mixed:
– Nodes with the patches crash every ~24-36 hours
– Not all of the application monitoring tools work
– It is difficult for a non-expert to get anything meaningful from the tools
Recommendation: wait for a real release!
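To make the instrumentation concrete, here is a minimal sketch of how the patched kernel's per-connection data might be enumerated. The /proc/web100 layout described in the comments is an assumption about the alpha patches; real tools decode the variables with the Web100 userland library.

```python
#!/usr/bin/env python3
# Minimal sketch, assuming a Web100-patched kernel that exposes one
# numbered directory per TCP connection under /proc/web100, plus a
# "header" file describing the binary variable layout.  The supported
# way to decode the per-connection variables is the userland Web100
# library; this script only enumerates the instrumented connections.
import os

WEB100_ROOT = "/proc/web100"  # added by the kernel patch (assumption)

def list_connections(root=WEB100_ROOT):
    """Return the IDs of the TCP connections the kernel is instrumenting."""
    if not os.path.isdir(root):
        raise RuntimeError("%s missing: kernel is not Web100-patched" % root)
    # Each numbered subdirectory corresponds to one instrumented connection.
    return sorted(d for d in os.listdir(root) if d.isdigit())

if __name__ == "__main__":
    for cid in list_connections():
        print("instrumented TCP connection:", cid)
```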
Iperf/Network Testing
We have been working on automated network testing and monitoring.
Perl scripts run Iperf tests, submitted via Globus, from LINAT01 (our gatekeeper) to each other testbed site's gatekeeper (a minimal sketch of this test loop appears below). We track UDP/TCP bandwidth, packet loss, jitter, and buffer sizes for each "direction" between each pair of sites.
Results are recorded by Cricket and are available as plots for various time frames.
Problems so far: Globus job submissions fail at certain sites, restarts of the Perl scripts must be automated, and "zombie" processes accumulate; the scripts need better exception handling.
We separately use Cricket to monitor:
– Round-trip times and packet losses, using ping
– Testbed node details (load average, CPU usage, disk usage, processes), using SNMP
– Switch and router statistics, using SNMP
The long-term goal is to deploy dedicated hardware monitors and beacons on the testbed.
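The production scripts are Perl; the sketch below shows the same idea in Python under stated assumptions: an iperf server is already listening at each remote site, and the hostnames are placeholders rather than the real testbed gatekeepers.

```python
#!/usr/bin/env python3
# Sketch of one leg of the automated Iperf mesh test.  Assumes an
# iperf server ("iperf -s") is already running at each remote site;
# the hostnames are placeholders.
import re
import subprocess

SITES = ["gatekeeper.site-a.example.edu", "gatekeeper.site-b.example.edu"]

def tcp_bandwidth(host, seconds=10):
    """Run a TCP iperf test against `host`; return Mbit/s, or None on failure."""
    try:
        out = subprocess.run(
            ["iperf", "-c", host, "-t", str(seconds), "-f", "m"],
            capture_output=True, text=True,
            timeout=seconds + 30,  # reap hung tests instead of leaving zombies
            check=True).stdout
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return None  # log and retry on the next cycle
    matches = re.findall(r"([\d.]+)\s+Mbits/sec", out)
    return float(matches[-1]) if matches else None

for site in SITES:
    bw = tcp_bandwidth(site)
    print("%s: %s" % (site, "unreachable" if bw is None else "%.1f Mbit/s" % bw))
```

The explicit timeout and error handling are the point: a hung transfer is killed and reported instead of lingering as a zombie process, which is exactly the exception handling the current scripts lack.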
MGRID (Michigan Grid Research and Infrastructure Development)
Various colleges and units at UM are very interested in grids and grid technology.
We have proposed the formation of an MGRID center, funded by the University, sized at 3 FTEs plus a director, with initial funding for three years.
The MGRID Center would be a cooperative center of faculty and staff from participating units, with a central core of technical staff, who together will carry out grid development and deployment activities at UM.
US ATLAS grids would be a focus of such a center. We should hear about MGRID by July 2002.
NMI Testbed
Michigan has been selected as an "unsponsored" NMI testbed member. Goals are to:
– Develop and release a first version of GRIDS and Middleware software
– Develop security and directory architectures, mechanisms, and best practices for campus integration
– Put in place associated support and training mechanisms
– Develop partnership agreements with external groups focused on adoption of the software
– Put in place a communication and outreach plan
– Develop a community repository of NMI software and best practices
NMI GRIDS
NMI-GRIDS components:
– Globus Toolkit 2.0 (resource discovery and management; authenticated access to and scheduling of distributed resources; coordinated performance of selected distributed resources to function as a dynamically configured "single" resource)
– GRAM 1.5
– MDS 2.2
– GPT v.?
– GridFTP
– Condor-G
– Network Weather Service
All services should accept X.509 credentials for authentication and access control (an example submission appears below).
These are much the same tools we are already using on our testbed.
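As a concrete illustration of the X.509-authenticated job path, the sketch below wraps the GT2 command-line client globus-job-run, which contacts a GRAM gatekeeper using the proxy credential previously created with grid-proxy-init. The gatekeeper address is a placeholder, not a real testbed hostname.

```python
#!/usr/bin/env python3
# Sketch of an authenticated GRAM submission via the GT2 command-line
# tools.  Assumes `grid-proxy-init` has already created a proxy from
# the user's X.509 certificate; the gatekeeper name is a placeholder.
import subprocess

GATEKEEPER = "gatekeeper.example.edu"  # placeholder address

def remote_hostname(gatekeeper):
    """Run /bin/hostname on the remote resource through its gatekeeper."""
    # globus-job-run authenticates with the GSI proxy credential and
    # hands the job to the remote job manager.
    result = subprocess.run(
        ["globus-job-run", gatekeeper, "/bin/hostname"],
        capture_output=True, text=True, timeout=60)
    if result.returncode != 0:
        raise RuntimeError("GRAM submission failed: " + result.stderr.strip())
    return result.stdout.strip()

if __name__ == "__main__":
    print(remote_hostname(GATEKEEPER))
```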
NMI EDIT (Enterprise and Desktop Integration Technologies)
The deliverables anticipated from NMI-EDIT for NMI Release 1 are of four types:
1. Code - Code is being developed, adapted, or identified for desktops (e.g. KX.509, OpenH.323, SIP clients) and for enterprise use (such as Metamerge connectors, Shibboleth modules for Apache, etc.). Code releases are generally clients, modules, plug-ins, and connectors, rather than stand-alone executables.
2. Objects - Objects include data and metadata standards for directories, certificates, and applications such as video. Examples include the eduPerson and eduOrg objectclasses, S/MIME certificate profiles, video objectclasses, etc. (see the sketch after this list).
3. Documents - These include white papers, conventions and best practices, and formal policies. There is an implied progression: basic development of a new core middleware area results in a white paper (scenarios and alternatives) intended to promote an architectural consensus as well as to inform researchers and campuses. The white paper in turn leads to deployments, which require conventions, best practices, and requisite policies. The core middleware areas being worked on within Release 1 include PKI, directories, account management, and video.
4. Services - "Within the net" operations are needed to register unique names and keys for organizations, services, etc. Roots and bridges for security and directory activities must be provided.
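As a sketch of the "Objects" deliverable, a directory entry carrying the eduPerson objectclass might look like the LDIF below; the DN and attribute values are placeholders, not a recommended schema layout.

```ldif
dn: uid=jdoe,ou=People,dc=example,dc=edu
objectClass: inetOrgPerson
objectClass: eduPerson
cn: Jane Doe
sn: Doe
uid: jdoe
eduPersonPrincipalName: jdoe@example.edu
eduPersonAffiliation: staff
```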
Authenticated QoS Work
We have been working with CITI (Andy Adamson) at UM on issues related to QoS (Quality of Service).
This is a critical issue for grids and for any application that requires a certain level of performance from the underlying network.
A secure signaling protocol has been developed and tested; it is being moved into the GSI (Globus Security Infrastructure). A purely illustrative sketch of such a request follows below.
A "Grid Portal" application is planned to provide web-based secure access to grids.
Our US ATLAS testbed could be a testing ground for such an application, if there is interest.
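Purely as a hypothetical illustration (the slides do not describe CITI's wire protocol), a bandwidth reservation request authenticated by a grid credential might be shaped like this. Every hostname, port, field name, and the JSON encoding are assumptions, and a real implementation would perform a full GSI handshake rather than passing a subject string.

```python
#!/usr/bin/env python3
# Hypothetical illustration only: the shape of an authenticated QoS
# reservation request.  The actual CITI signaling protocol is not
# shown in these slides, so every field, host, and port below is an
# assumption; a real client would authenticate with GSI/X.509 rather
# than sending a subject string.
import json
import socket

SIGNALING_HOST = "qos-daemon.example.edu"  # placeholder signaling daemon
SIGNALING_PORT = 4444                      # placeholder port

def request_reservation(src, dst, mbits, subject_dn):
    """Ask the (hypothetical) signaling daemon for a bandwidth reservation."""
    msg = {
        "credential": subject_dn,  # stand-in for a real GSI handshake
        "flow": {"src": src, "dst": dst},
        "bandwidth_mbps": mbits,
    }
    with socket.create_connection((SIGNALING_HOST, SIGNALING_PORT), timeout=10) as s:
        s.sendall(json.dumps(msg).encode())
        return s.recv(4096).decode()  # daemon's accept/deny answer
```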
Network Connectivity Diagram
Future UM Network Layout (diagram)
Future Additions to the UM Grid
We have been working closely with others on campus on grid-related activities.
Tom Hacker (CAC/Visible Human) has asked us to install VDT 1.0 on two campus installations with significant compute resources.
We hope to test how we can use and access shared resources as part of the US ATLAS grid testbed.
The primary issue is finding a block of time to complete the install and testing.
Hardware Resources – Arbor Lakes
Linux cluster:
– 100-processor Beowulf cluster; equipment donated by Intel Corporation
– Dual 800 MHz Pentium III per node, 1 GB RAM per node (512 MB per processor)
– 30 GB hard drive per node; the interconnect is Gigabit Ethernet (Intel copper Gigabit Ethernet adapters)
– "Master node" with 80 GB NFS fileserver, for login, text editing, and job submission
– 42 TB Tivoli mass storage system, accessed via NFS
(Diagram: master node and computation nodes on a Gigabit Ethernet interconnect; each computation node has 2 processors, 1 GB RAM, and a 30 GB hard drive.)
Hardware Resources – Media Union
AMD Linux cluster:
– 100 AMD processors, 2 per node
– 1 GB RAM per node (512 MB per processor)
– Interconnected with Mgnnect
– Red Hat Linux
– Distributed architecture
To Do…
– Install VDT 1.0, first at Arbor Lakes and the Media Union, then upgrade our own site
– Document the network details at each site
– Start gigabit-level testing to selected sites
– Get crash dumps to actually work
– Document and provide best practices on a WWW site for networking (HENP+NSF?) and grid-related software
– Determine how to leverage NMI testbed tools for US ATLAS