Published by Ariel Newton. Modified over 9 years ago.
1
Hawkeye
Nick LeRoy
Computer Sciences Department
University of Wisconsin-Madison
nleroy@cs.wisc.edu
http://www.cs.wisc.edu/condor/hawkeye
2
www.cs.wisc.edu/condor
What is Hawkeye?
› A monitoring and management tool for distributed systems
› That's great, but...
  What does that mean?
  What can Hawkeye do for me?
3
What does that mean?
› Hawkeye is a tool that can be used to monitor various aspects of your computers
› Examples:
  System load monitoring
  Watching for run-away processes
  Monitoring the health of your Condor pool
4
What can Hawkeye do?
› Hawkeye can alert you when things go wrong. For example, Hawkeye can:
  Alert you when virtually any condition is found
  Alert you when various Condor problems are identified
  Allow you to specify your own custom alerts
5
Why Hawkeye?
› Make system administration easier
› Make Condor pool maintenance easier
6
Hawkeye Architecture
[Diagram; component labels: Hawkeye Module, Hawkeye Monitoring Agent, Hawkeye Manager, Condor Pool, Grid]
7
Hawkeye Matchmaking
› Hawkeye generates alerts using ClassAd matchmaking.
[Diagram; labels: Machine Ad, Trigger Ad, Match, Alert]
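The matchmaking idea can be illustrated with a minimal Python sketch. This is not Hawkeye's implementation: real Hawkeye evaluates ClassAd expressions, whereas here machine ads are plain dicts and the trigger is a pair of Python callables. The function `match_alerts`, the trigger layout, and the machine names and values are all hypothetical; only the attribute name `Condor_NumRunaway` comes from the slides.

```python
# Hypothetical stand-in for ClassAd matchmaking: ads are plain dicts and a
# trigger is a pair of Python callables, not a real ClassAd expression.

def match_alerts(machine_ads, trigger):
    """Return one alert string for each machine ad the trigger matches."""
    alerts = []
    for ad in machine_ads:
        if trigger["condition"](ad):
            alerts.append(trigger["text"](ad))
    return alerts

# Machine names and values are invented for illustration.
machine_ads = [
    {"Name": "node1", "Condor_NumRunaway": 0},
    {"Name": "node2", "Condor_NumRunaway": 2},
]

runaway_trigger = {
    # Match any machine reporting at least one runaway process.
    "condition": lambda ad: ad.get("Condor_NumRunaway", 0) > 0,
    "text": lambda ad: f"{ad['Name']}: {ad['Condor_NumRunaway']} runaway processes",
}

alerts = match_alerts(machine_ads, runaway_trigger)
```

Each machine ad is tested against the trigger independently, so one trigger can raise alerts for any number of machines.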
8
Hawkeye ClassAds
› Hawkeye uses ClassAds to represent collected data
  Schema-free data representation
  Provides matching mechanism
  Represent whatever data you gather in a way that works best for you
9
Hawkeye ClassAds
› Example ClassAd “snippet”:
  RAM_MemFree = 841932800
  RAM_MemShared = 0
  RAM_MemTotal = 1055367168
  RAM_SwapCached = 0
  RAM_SwapFree = 2147483647
  RAM_SwapTotal = 2147483647
10
Hawkeye ClassAds
› Example ClassAd “snippet” #2:
  Condor_NumExecs = 2
  Condor_NumMasters = 1
  Condor_NumRunaway = 2
  Condor_NumSchedds = 0
  Condor_NumShadows = 0
  Condor_NumStartds = 1
  Condor_NumStarters = 2
  Condor_RunawayPids = "3214,8753"
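Because the ads are schema-free name/value pairs, they are easy to consume from a script. The sketch below is a hypothetical helper, `parse_ad`, that handles only the two value kinds seen in the snippets (integers and quoted strings); it is not a full ClassAd parser and is not part of Hawkeye.

```python
def parse_ad(text):
    """Parse 'Name = value' lines into a dict (integers and quoted strings only)."""
    ad = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first '=' so string values may themselves contain '='.
        name, _, value = line.partition("=")
        value = value.strip()
        if value.startswith('"') and value.endswith('"'):
            ad[name.strip()] = value[1:-1]
        else:
            ad[name.strip()] = int(value)
    return ad

# A few attributes copied from the two example snippets.
snippet = '''
RAM_MemFree = 841932800
RAM_MemTotal = 1055367168
Condor_NumRunaway = 2
Condor_RunawayPids = "3214,8753"
'''

ad = parse_ad(snippet)
free_fraction = ad["RAM_MemFree"] / ad["RAM_MemTotal"]  # roughly 0.80
runaway_pids = [int(pid) for pid in ad["Condor_RunawayPids"].split(",")]
```

Derived values such as `free_fraction` are the kind of quantity a custom alert might test.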
11
Sample Alert Trigger
[
  AlertTrigger = ( MyType == "Pool" && Absent.count > 5 );
  AlertSeverity = ( Absent.count > 5 ) ? 1 : 0;
  Name = "Absent Nodes";
  AlertText = StrCat(Absent.count, " machines are missing in ", Name)
]
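To make the trigger's logic concrete, here is a rough Python transcription of its four attributes. Several things are assumptions: the matched ad is modelled as a nested dict, `Name` is resolved against the trigger ad itself (so it evaluates to "Absent Nodes"), and the function name `evaluate_absent_nodes` and the sample pool ad are invented. How Hawkeye actually scopes `Name` during matching is not stated in the slides.

```python
def evaluate_absent_nodes(ad):
    """Mirror the 'Absent Nodes' trigger for an ad given as a nested dict."""
    name = "Absent Nodes"            # Name attribute of the trigger ad
    absent = ad["Absent"]["count"]   # Absent.count in the matched ad
    # AlertTrigger = ( MyType == "Pool" && Absent.count > 5 )
    triggered = ad["MyType"] == "Pool" and absent > 5
    # AlertSeverity = ( Absent.count > 5 ) ? 1 : 0
    severity = 1 if absent > 5 else 0
    # AlertText = StrCat(Absent.count, " machines are missing in ", Name)
    text = f"{absent} machines are missing in {name}"
    return {"triggered": triggered, "severity": severity, "text": text}

# Invented pool ad with seven absent machines.
pool_ad = {"MyType": "Pool", "Absent": {"count": 7}}
result = evaluate_absent_nodes(pool_ad)
```

With five or fewer absent machines, or an ad whose `MyType` is not "Pool", `triggered` is false and no alert would be raised.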
12
Hawkeye at UW
› Currently at UW, we're using Hawkeye:
  To monitor our Condor cluster
  To aid in detecting and correcting cluster problems
  To monitor the US/CMS testbed health
16
Customizing Hawkeye
› Hawkeye allows you to run your own custom “modules” to gather data.
› Hawkeye allows you to set your own custom “alerts” on attributes generated by “standard” and “custom” modules.
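A custom module could be sketched as a small script that gathers some data and emits it as attributes. The sketch below assumes a module reports its data as `Name = value` lines in the style of the example snippets; the slides do not specify the actual module interface, so treat the output format, the `Disk_` prefix, and the function names as hypothetical.

```python
import shutil

def collect_disk(path="."):
    """Gather disk usage (in bytes) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return {"Total": usage.total, "Free": usage.free}

def format_attributes(prefix, values):
    """Render values as 'Prefix_Key = value' lines, quoting strings."""
    lines = []
    for key, value in values.items():
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"{prefix}_{key} = {rendered}")
    return "\n".join(lines)

if __name__ == "__main__":
    # Assumed convention: the module writes its attributes to standard output.
    print(format_attributes("Disk", collect_disk()))
```

A custom alert could then test an attribute such as `Disk_Free` in the same way the sample trigger tests `Absent.count`.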
17
What is the status of Hawkeye?
› Hawkeye 1.0 Release Candidate 1 (RC1)
› Current module library includes modules to monitor system load, users, disk space, Condor, and more
› Available from http://cs.wisc.edu/condor/hawkeye