Nick LeRoy Computer Sciences Department University of Wisconsin-Madison Hawkeye.


1 Nick LeRoy Computer Sciences Department University of Wisconsin-Madison nleroy@cs.wisc.edu http://www.cs.wisc.edu/condor/hawkeye Hawkeye

2 www.cs.wisc.edu/condor What is Hawkeye? › A monitoring and management tool for distributed systems › That's great, but...  What does that mean?  What can Hawkeye do for me?

3 www.cs.wisc.edu/condor What does that mean? › Hawkeye is a tool that can be used to monitor various aspects of your computers › Examples:  System load monitoring  Watching for runaway processes  Monitoring the health of your Condor pool

4 www.cs.wisc.edu/condor What can Hawkeye do? › Hawkeye can alert you when things go wrong. For example, Hawkeye can:  Alert you when virtually any condition is found  Alert you when various Condor problems are identified  Allow you to specify your own custom alerts

5 www.cs.wisc.edu/condor Why Hawkeye? › Make system administration easier › Make Condor pool maintenance easier

6 www.cs.wisc.edu/condor Hawkeye Architecture [Diagram; labels: Hawkeye Module, Hawkeye Monitoring Agent, Hawkeye Manager, Condor Pool, Grid]

7 www.cs.wisc.edu/condor Hawkeye Matchmaking › Hawkeye alerts are done using ClassAd matchmaking. [Diagram; labels: Machine Ad, Trigger Ad, Match, Alert]
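
The matchmaking idea on this slide can be illustrated with a minimal Python sketch. It is not Hawkeye's implementation: ads are modeled as plain dictionaries, a trigger's condition is a Python function rather than a ClassAd expression, and the names `match`, `machines`, and `triggers` are hypothetical. Each machine ad is tested against each trigger ad, and every pair that matches produces an alert.

```python
from typing import Any, Dict, List, Tuple

# Assumption: an "ad" is modeled as a plain dict of attribute name -> value.
Ad = Dict[str, Any]


def match(machine_ads: List[Ad], trigger_ads: List[Ad]) -> List[Tuple[str, str]]:
    """Return (machine name, trigger name) for every machine/trigger pair that matches."""
    alerts = []
    for machine in machine_ads:
        for trigger in trigger_ads:
            # The trigger's condition is evaluated against the machine ad.
            if trigger["AlertTrigger"](machine):
                alerts.append((machine["Name"], trigger["Name"]))
    return alerts


# Hypothetical machine ads, using an attribute name from the Condor snippet.
machines = [
    {"Name": "node1", "Condor_NumRunaway": 2},
    {"Name": "node2", "Condor_NumRunaway": 0},
]

# Hypothetical trigger ad: fire when a machine reports any runaway process.
triggers = [
    {
        "Name": "Runaway Processes",
        "AlertTrigger": lambda ad: ad.get("Condor_NumRunaway", 0) > 0,
    },
]

alerts = match(machines, triggers)
print(alerts)
```

Only `node1` reports runaway processes, so it is the only machine paired with the trigger.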

8 www.cs.wisc.edu/condor Hawkeye ClassAds › Hawkeye uses ClassAds to represent collected data  Schema-free data representation  Provides matching mechanism  Represent whatever data you gather in a way that works best for you

9 www.cs.wisc.edu/condor Hawkeye ClassAds › Example ClassAd “snippet”:
    RAM_MemFree = 841932800
    RAM_MemShared = 0
    RAM_MemTotal = 1055367168
    RAM_SwapCached = 0
    RAM_SwapFree = 2147483647
    RAM_SwapTotal = 2147483647

10 www.cs.wisc.edu/condor Hawkeye ClassAds › Example ClassAd “snippet” #2:
    Condor_NumExecs = 2
    Condor_NumMasters = 1
    Condor_NumRunaway = 2
    Condor_NumSchedds = 0
    Condor_NumShadows = 0
    Condor_NumStartds = 1
    Condor_NumStarters = 2
    Condor_RunawayPids = "3214,8753"
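
Because the two snippets are flat `name = value` pairs, they are easy to consume from a script. The following is a rough Python sketch, assuming only integer and double-quoted string values as they appear in the snippets; it is not a full ClassAd parser, and `parse_attributes` is a hypothetical helper, not part of Hawkeye.

```python
def parse_attributes(text: str) -> dict:
    """Parse flat 'Name = value' lines into a dict (integers and quoted strings only)."""
    ad = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        name, value = name.strip(), value.strip()
        if value.startswith('"') and value.endswith('"'):
            ad[name] = value[1:-1]  # quoted string: drop the quotes
        else:
            ad[name] = int(value)   # assumption: everything else is an integer
    return ad


# Attribute values copied from the two example snippets.
snippet = """
RAM_MemFree = 841932800
RAM_MemTotal = 1055367168
Condor_NumRunaway = 2
Condor_RunawayPids = "3214,8753"
"""

ad = parse_attributes(snippet)

# Derived values a monitoring script might compute from the raw attributes.
free_fraction = ad["RAM_MemFree"] / ad["RAM_MemTotal"]
runaway_pids = [int(pid) for pid in ad["Condor_RunawayPids"].split(",")]

print(f"free memory: {free_fraction:.1%}, runaway pids: {runaway_pids}")
```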

11 www.cs.wisc.edu/condor Sample Alert Trigger
    [
      AlertTrigger = ( MyType == "Pool" && Absent.count > 5 );
      AlertSeverity = ( Absent.count > 5 ) ? 1 : 0;
      Name = "Absent Nodes";
      AlertText = StrCat(Absent.count, " machines are missing in ", Name)
    ]
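
To make the logic of the sample trigger concrete, here is an approximate Python rendering of what it computes. This is a sketch under several assumptions, not how Hawkeye evaluates triggers: the pool ad is modeled as a dictionary, `Absent.count` is modeled as a nested dictionary lookup, and `Name` inside `AlertText` is assumed to resolve to the trigger's own `Name` attribute. The function name `evaluate_absent_nodes` is hypothetical.

```python
from typing import Optional

TRIGGER_NAME = "Absent Nodes"


def evaluate_absent_nodes(ad: dict) -> Optional[dict]:
    """Mirror the sample trigger: alert when a Pool ad reports more than 5 absent nodes."""
    # Assumption: Absent.count maps to a nested dict; a missing value counts as 0.
    absent = ad.get("Absent", {}).get("count", 0)

    # AlertTrigger = ( MyType == "Pool" && Absent.count > 5 )
    if not (ad.get("MyType") == "Pool" and absent > 5):
        return None

    return {
        "Name": TRIGGER_NAME,
        # AlertSeverity = ( Absent.count > 5 ) ? 1 : 0
        "AlertSeverity": 1 if absent > 5 else 0,
        # AlertText = StrCat(Absent.count, " machines are missing in ", Name)
        "AlertText": f"{absent} machines are missing in {TRIGGER_NAME}",
    }


# Hypothetical pool ads for illustration.
alert = evaluate_absent_nodes({"MyType": "Pool", "Absent": {"count": 7}})
quiet = evaluate_absent_nodes({"MyType": "Pool", "Absent": {"count": 3}})
print(alert)
print(quiet)
```

Note that whenever the trigger fires, `Absent.count > 5` already holds, so `AlertSeverity` as written in the sample is always 1 for a raised alert.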

12 www.cs.wisc.edu/condor Hawkeye at UW › Currently at UW, we're using Hawkeye:  To monitor our Condor cluster  To aid in detecting and correcting cluster problems  To monitor the US/CMS testbed health


16 Customizing Hawkeye › Hawkeye allows you to run your own custom “modules” to gather data. › Hawkeye allows you to set your own custom “alerts” on attributes generated by “standard” and “custom” modules.
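
As an illustration of what a custom module might look like, here is a small Python sketch that gathers disk usage and prints it as `name = value` attributes in the same style as the ClassAd snippets. The slides do not describe the module interface, so the assumption that a module simply writes attributes to standard output, the `Disk_` prefix, and the helper names `format_attributes` and `collect_disk` are all hypothetical.

```python
import shutil


def format_attributes(prefix: str, values: dict) -> str:
    """Render values as 'Prefix_Name = value' lines, quoting strings."""
    lines = []
    for name, value in sorted(values.items()):
        if isinstance(value, str):
            rendered = '"' + value + '"'
        else:
            rendered = str(value)
        lines.append(f"{prefix}_{name} = {rendered}")
    return "\n".join(lines)


def collect_disk(path: str = ".") -> dict:
    """Gather total and free bytes for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return {"Path": path, "Total": usage.total, "Free": usage.free}


# Assumption: the module reports its data by printing attributes to stdout.
report = format_attributes("Disk", collect_disk("."))
print(report)
```

A custom alert could then be written against the resulting attributes, for example one that fires when `Disk_Free` drops below a chosen threshold.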

17 www.cs.wisc.edu/condor What is the status of Hawkeye? › Hawkeye 1.0 Release Candidate 1 (RC1) › Current module library includes modules to monitor system load, users, disk space, Condor, and more › Available from http://cs.wisc.edu/condor/hawkeye

