Published by Joleen Atkins. Modified over 8 years ago.
1
Environmental Monitoring and Alerting for Computing Room Facilities
Wednesday, November 17, 2004, 9:00 am – 10:00 am
Gerry Bellendir, Jack MacNerland, David Ritchie, and Mark Thomas
2
Agenda
FCC
New Muon -> LCC
HDCF -> GCC
Futures
Vulnerabilities
Discussion, Questions, etc.
3
FCC
Presented by Jack MacNerland
Smoke detection
Sprinklers
Under Floor Fire Suppression
Tape robot fire suppression
Power Logic Electrical Panel Monitoring
Security at FCC
4
FCC (cont’d)
Presented by Mark Thomas
Firus
–New developments
–Installed a FIRUS terminal in the OPS office so that the chillers at New Muon can be monitored.
–Set up a page to show critical info for FCC, New Muon, HDCF, and Casey’s Pond.
–Com Center monitors at night; FESS monitors during the day.
–CD/OPS also monitors.
5
FCC (cont’d)
Presented by David Ritchie
Metasys
–UPS and Generator Monitoring and Alerting via Metasys
–Current and future status (see Appendix A)
Other Monitoring
–CSS (Stan Naymola): Two types… lm_sensors.
–Can shut down systems that are hot.
–Self-contained, works independently of any other system.
–If >50% of the nodes are down, it notifies.
–Single nodes that turn themselves off are recorded in logs for investigation.
–CSS (cont’d): Independent temperature monitor located in the top of a rack.
–Recorded in ganglia as a record of room temperature.
–Emails when the temperature crosses its high and low limits.
–Does not page.
–CDF (Glenn Cooper): CDF nodes just have straight lm_sensors, using the RPM put together by the Farms group.
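The CSS behaviour listed above (hot nodes shut themselves down, single shutdowns are only logged, and a notification goes out once more than 50% of the nodes are down) could be sketched roughly as follows. This is only an illustration, not the CSS implementation: the `Node` class, the `apply_policy` function, and the 70 °C shutdown temperature are hypothetical, since the slides give no thresholds or code.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    temp_c: float      # latest temperature reading (e.g. from lm_sensors)
    up: bool = True


def apply_policy(nodes, shutdown_temp_c=70.0):
    """Shut down hot nodes, then decide whether to notify.

    shutdown_temp_c is a hypothetical value; the slides give no threshold.
    Returns (names_shut_down_now, notify), where notify is True only when
    strictly more than 50% of all nodes are down.
    """
    shut_down_now = []
    for node in nodes:
        if node.up and node.temp_c >= shutdown_temp_c:
            node.up = False                   # the node turns itself off
            shut_down_now.append(node.name)   # recorded for later investigation
    down = sum(1 for node in nodes if not node.up)
    notify = down * 2 > len(nodes)            # ">50% of the nodes are down"
    return shut_down_now, notify
```

With three nodes of which two are hot, the function returns both names and `notify=True`; with two nodes of which one is hot, exactly 50% are down, so the shutdown is logged but no notification is sent.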
6
New Muon -> LCC
Presented by Jack MacNerland
Smoke detection
Sprinklers
Under Floor Fire Suppression
Security (Pegasys)
7
New Muon (cont’d)
Presented by Mark Thomas
Firus
–The usual fire protection system
–Chillers
8
New Muon (cont’d)
Presented by David Ritchie
Metasys – See Appendix A.
Other – CDF: see the lm_sensors discussion above.
Other – Lattice QCD (Don Holmgren)…
–Omega temp. box
  In use for a couple of years.
  Alarms on high/low temperature, dry contact input.
  Only notification: dial out to a rotation of 4 phone numbers until acknowledged.
  Currently: Call Center, Amitoj's office number, DH office number.
–Discussion…
  Vulnerability: Omega box not able to reach someone (pre-call-center, post-operator-exit).
  Addressed with Netbotz unit
  –Connects to the network, can send e-mail, push files via FTP, and serve data via HTTP.
  –Has a "last call" pager on power loss.
  Have not switched to the Netbotz for notifying the call center; still use the Omega box.
  Have the Netbotz unit set to send e-mail to lqcd principals on various alarms.
  Also have trend plots and a live web page…
  http://lqcd.fnal.gov/cgi-bin/netbotz
  http://netbotz.fnal.gov/
Other – Lattice QCD (cont’d)
–IPMI
  Reads out CPU and system temperatures, fans.
  Includes vendor-specified thresholds.
–When a sufficient number of nodes are over temperature, we automatically declare an alarm and shut down…
  »Batch queues,
  »Operating systems, and
  »Power off the nodes via IPMI.
–Independently, the Netbotz and Omega boxes can trigger an alarm which causes the LQCD and/or ISA groups to manually initiate shutdowns if necessary.
We maintain trend plots for all measured quantities, and have automated mailings listing nodes with bad fans and/or high temperatures.
The trend plots are available by clicking on the vertical bars on:
http://lqcd.fnal.gov/cgi-bin/stat?health=all
or via individual nodes,
http://lqcd.fnal.gov/cgi-bin/stat?health=qcd0102
http://lqcd.fnal.gov/cgi-bin/stat?health=MRTG=qcd0102
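Two of the mechanisms on this slide lend themselves to a short sketch: the Omega box dialing through its number rotation until someone acknowledges, and the automatic alarm that fires once enough nodes are over their vendor thresholds. The code below is a rough illustration under stated assumptions, not the LQCD scripts. All function names, the `min_hot_nodes` and `max_attempts` parameters, and the `place_call` callback are hypothetical; the slides do not say what "a sufficient number" is, and the sketch assumes the shutdown covers every node rather than only the hot ones.

```python
from itertools import cycle


def dial_until_acknowledged(numbers, place_call, max_attempts):
    """Call each number in rotation until one acknowledges.

    place_call(number) is a hypothetical callback returning True on
    acknowledgement. Returns (acknowledging_number_or_None, attempts_made).
    """
    attempts = 0
    for attempt, number in zip(range(1, max_attempts + 1), cycle(numbers)):
        attempts = attempt
        if place_call(number):
            return number, attempts
    return None, attempts


def over_temperature(readings, thresholds):
    """Names of nodes whose reading exceeds their vendor-specified threshold."""
    return sorted(name for name, temp in readings.items()
                  if temp > thresholds[name])


def shutdown_plan(readings, thresholds, min_hot_nodes):
    """Ordered shutdown steps, or [] if too few nodes are hot.

    min_hot_nodes stands in for the unspecified "sufficient number".
    The order follows the slide: batch queues, operating systems, then
    power off via IPMI. Assumption: all nodes are shut down.
    """
    if len(over_temperature(readings, thresholds)) < min_hot_nodes:
        return []
    nodes = sorted(readings)
    return (["stop batch queues"]
            + [f"halt OS on {name}" for name in nodes]
            + [f"IPMI power off {name}" for name in nodes])
```

Returning the plan as a list of step descriptions, rather than executing anything, keeps the decision logic testable without touching real hardware; an actual deployment would replace each step with the site's own batch, OS, and IPMI commands.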
9
HDCF -> GCC
Presented by Jack MacNerland
Smoke detection
Sprinklers
Under Floor Fire Suppression
Security at GCC (Pegasys)
Presented by Mark Thomas
Firus
UPS Monitoring and Alerting via Metasys
–Connection under development
–See Appendix A.
10
HDCF -> GCC (cont’d)
Presented by David Ritchie
Other – lm_sensors (see above)
Other – auto-shutdown when the UPS goes to batteries.
–Zonatherm / Liebert have automatic shutdown capability.
  It may be acceptable to shut down the PCs in GCC upon the UPS going to batteries.
  Involves:
  –Agent PC running Liebert-provided software which senses the status of the UPS dry contacts.
  –Software (SNMP) notifying each PC, IP address by IP address, that it should shut down.
  –Cost ~$5,000.
–Outstanding issues
  Must hand-install software on all ~1400 PCs, and
  must manually enter 1400 IP addresses.
–Liebert seems interested in a joint effort.
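The agent-PC flow described above (sense the UPS dry contacts, then notify each PC by IP address) might look roughly like the sketch below. It is not the Liebert software: `parse_addresses`, `on_ups_status`, and the `send_shutdown` callback are hypothetical names, and the actual SNMP notification is left to the caller. Because the address list is entered by hand, the sketch validates each entry with Python's standard `ipaddress` module and reports the ones it rejects.

```python
import ipaddress


def parse_addresses(lines):
    """Split hand-entered lines into (valid_addresses, rejected_entries)."""
    valid, rejected = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue                      # ignore blank lines
        try:
            valid.append(str(ipaddress.ip_address(text)))
        except ValueError:
            rejected.append(text)         # typo or malformed address
    return valid, rejected


def on_ups_status(on_battery, addresses, send_shutdown):
    """Notify every address when the UPS is on battery.

    send_shutdown(address) is a hypothetical callback that returns True
    on success; in the scheme on the slide this would be an SNMP message.
    Returns the addresses that could not be notified.
    """
    if not on_battery:
        return []
    failed = []
    for address in addresses:
        try:
            ok = send_shutdown(address)
        except OSError:
            ok = False                    # unreachable host, socket error, etc.
        if not ok:
            failed.append(address)
    return failed
```

Collecting the failures instead of stopping at the first one matters with roughly 1400 targets: one unreachable PC should not prevent the rest from being told to shut down.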
11
Other Matters
Futures
–Facilities Environmental Event Notification Scheme
–Next Generation Metasys
Vulnerabilities
–FCC has loss of Casey's Pond water, or anything in that causality chain, as its main vulnerability (JM)
–New Muon has loss of electrical and/or loss of water as its primary vulnerability (age?, ownership?) (JM/DR)
–HDCF has loss of cooling without consequent loss of power as its main vulnerability (JM/DR)
Discussion, Questions, etc.
12
Metasys – Current
FESS (Mike Michalak), status as of 11/12:
–FCC is operational. (Power Logic panel monitoring work still required?)
–HDCF network connected to Metasys panel.
  Mike: should have HDCF up on the Metasys System Extended Architecture (Next Generation) next week (week of 11/15?).
  Power outage required to tie in the power meters.
–New Muon has no Metasys.
  NAE purchased for New Muon.
  Ready to plan connections at New Muon. Network connection will be required.
  Metasys System Extended Architecture (MSEA) will be installed with the new CRAC units as part of the New Muon project, which started on 11/15.
  Monitoring of chilled water temperature, chiller status, and pump status on MSEA will then begin.
13
Metasys – Future
FESS (Ted Thorson), Technology. Status is:
–New Metasys system is ready for deployment, awaiting approval of the Critical System Plan (CSP), a prerequisite to buying the PIX firewall and VPN concentrator.
–All existing equipment is on Ethernet, and
–All existing equipment has been migrated to the new system.
–However, no one will be able to see the equipment at HDCF or New Muon until the new system can be deployed.
FESS (Roger Slisz), Critical System Coordinator. To-do list is:
–Procure a VPN concentrator and a PIX firewall device.
–Secure VPN accounts for the initial round of named users.
–Complete the third draft of the CSP.
–Train the initial round of named users on how MSEA works and what they can and cannot do with it.
This has been a long, complex project, begun in February 2001. It is now perhaps close to first deployment.