Maintaining Large Vista Installations Amy Edwards, Ezra Freelove, & George Hernandez July 12, 2007
2 Agenda
Comparisons
Who is USG
Automation
Monitoring
Maintenance
More Tricks
Questions?
3 Informal Poll - Number of nodes (All prod clusters) now: Ours in bold (All prod clusters) by December:
4 Informal Poll – Number of DB Instances Including secondary and non-production Ours in bold
5 Vista Architecture
6 GeorgiaVIEW Project
University System of Georgia (USG) Vista Host
32 institutions & multiple consortial programs
>150,000 active students
–Active is 100+ actions
>11,000 active sections / term
7 Issues
Handling performance issues
Capacity planning
Upgrades
Replication
JMS sensitivity
Integration
8 Automation
Rolling restarts
–Managed nodes restarted weekly except JMS
Log cleanup to preserve space
Error reporting
–Application, tracking, vulnerabilities
Thread dumps
Sync admin node with backup
LDIS batch integration
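A minimal sketch of how a weekly rolling restart might be planned so that managed nodes go down in small batches and the JMS node is skipped. The node names, the `plan_rolling_restart` helper, and the batch size are illustrative assumptions, not the presenters' actual scripts.

```python
from typing import Iterable, List


def plan_rolling_restart(
    nodes: Iterable[str],
    excluded: Iterable[str],
    batch_size: int = 1,
) -> List[List[str]]:
    """Return restart batches in order, skipping excluded (e.g. JMS) nodes."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    skip = set(excluded)
    # Keep the original ordering so restarts walk the cluster predictably.
    eligible = [node for node in nodes if node not in skip]
    return [
        eligible[start:start + batch_size]
        for start in range(0, len(eligible), batch_size)
    ]


if __name__ == "__main__":
    # Hypothetical cluster layout; "node03-jms" stands in for the JMS node.
    cluster = ["node01", "node02", "node03-jms", "node04", "node05"]
    for batch in plan_rolling_restart(cluster, excluded=["node03-jms"], batch_size=2):
        print("restart together:", ", ".join(batch))
```

Keeping batches small means only a fraction of the cluster is out of service at any moment. The restart command itself is left out because it depends on the site's install.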
9 Monitoring
Nagios
–Sends alerts
Stats
–Custom AJAX web app
–Watch changes over time
AWStats
10 Nagios Example
11 Nagios Monitors
OS / Hardware
–Load
–Temperature
–Free space
Database
–Tablespace free space
–Listener
–Oracle processes
Application
–Direct-login
–Weblogic processes
–Java MBeans
    Default/Primary Pending Requests Current Count
    Java Heap Current
    JDBC Waiting for Connection Current Count
    Multicast Messages Lost
    Primary count
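Checks like the free-space monitor above are typically written as Nagios plugins, which print one status line and report state through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below shows that convention for a disk check. The thresholds and function names are assumptions for illustration, not the checks used in this deployment.

```python
import shutil
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_free_space(
    free_pct: float,
    warn_pct: float = 20.0,  # assumed threshold
    crit_pct: float = 10.0,  # assumed threshold
) -> Tuple[int, str]:
    """Map a free-space percentage to a Nagios exit code and status line."""
    if not 0.0 <= free_pct <= 100.0:
        return UNKNOWN, f"DISK UNKNOWN - invalid reading {free_pct!r}"
    if free_pct < crit_pct:
        return CRITICAL, f"DISK CRITICAL - {free_pct:.1f}% free"
    if free_pct < warn_pct:
        return WARNING, f"DISK WARNING - {free_pct:.1f}% free"
    return OK, f"DISK OK - {free_pct:.1f}% free"


def free_percent(path: str) -> float:
    """Return the percentage of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def main(path: str = "/") -> int:
    """Print the status line and return the exit code for Nagios."""
    code, message = classify_free_space(free_percent(path))
    print(message)
    return code
```

A deployed plugin would finish with `sys.exit(main())` so that Nagios receives the state as the process exit status.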
12 Stats
Short and long term analysis
–21 months of data
Graphs all Nagios data collected
Flexible creation of reports
Built with AJAX
13 Stats Examples I of III
14 Stats Examples II of III
15 Stats Examples III of III
16 AWStats
Records data from web server logs
Custom script grabs data from webserver.log files
Runs daily
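When logs are gathered from many nodes, they generally need to be combined in chronological order before AWStats processes them. The sketch below merges per-node log lines by timestamp. It assumes each line carries a Common Log Format timestamp in square brackets, which may not match the actual webserver.log layout, and the function names are hypothetical.

```python
import heapq
import re
from datetime import datetime
from typing import Iterable, Iterator

# Assumed timestamp shape: [12/Jul/2007:10:15:32 -0400]
_TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def log_time(line: str) -> datetime:
    """Extract the bracketed timestamp from one log line."""
    match = _TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*per_node_lines: Iterable[str]) -> Iterator[str]:
    """Lazily merge already-sorted per-node logs into one time-ordered stream."""
    return heapq.merge(*per_node_lines, key=log_time)


if __name__ == "__main__":
    node_a = [
        '10.0.0.1 - - [12/Jul/2007:10:00:00 -0400] "GET /a HTTP/1.1" 200 10',
        '10.0.0.1 - - [12/Jul/2007:10:00:05 -0400] "GET /c HTTP/1.1" 200 10',
    ]
    node_b = ['10.0.0.2 - - [12/Jul/2007:10:00:02 -0400] "GET /b HTTP/1.1" 200 10']
    for line in merge_logs(node_a, node_b):
        print(line)
```

`heapq.merge` relies on each node's log already being in time order, which is normally true for a single server's access log, and it avoids loading every file into memory at once.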
17 AWStats Examples I of II
18 AWStats Examples II of II
19 Specialized Nodes
Admin
JMS
Institutional Admin
–Integration
Chat
20 JMS Node
Provides special services
–Mail, LC creation, chat
Failure or migration of JMS node hinders usage
Services do not migrate well
–Allow targeted migration
–OTHERS: Pin JMS to a specific node
21 Integration
Batched LDIS data files
Cron runs nightly
Files broken up by:
–Type
–“Reasonable” number of records
Done on Inst node
–Issues with import can kill node
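Splitting a nightly batch by type and by a capped record count could look roughly like the sketch below. The `(record_type, payload)` shape, the `split_batch` name, and the cap are assumptions for illustration. The real LDIS files and what counts as a "reasonable" size are not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (record_type, payload).
Record = Tuple[str, str]


def split_batch(
    records: Iterable[Record],
    max_records: int,
) -> List[Tuple[str, List[str]]]:
    """Group records by type, then cut each group into capped chunks."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    by_type: Dict[str, List[str]] = defaultdict(list)
    for record_type, payload in records:
        by_type[record_type].append(payload)
    chunks: List[Tuple[str, List[str]]] = []
    # Types come out in first-seen order; records keep their input order.
    for record_type, payloads in by_type.items():
        for start in range(0, len(payloads), max_records):
            chunks.append((record_type, payloads[start:start + max_records]))
    return chunks


if __name__ == "__main__":
    nightly = [("person", "p1"), ("section", "s1"), ("person", "p2"), ("person", "p3")]
    for record_type, payloads in split_batch(nightly, max_records=2):
        print(record_type, payloads)
```

Smaller chunks limit how much work a single failed import wastes, which matters given that a bad import can take down the node.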
22 Touching Nodes
ssh & dsh
–Touch groups of nodes at once
–Useful for:
    Installs
    Gathering logs
    Locating a session
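The idea of running one command across a named group of nodes can be sketched in a few lines. This is a Python stand-in for what dsh does, not a replacement for it. The group names, host names, and helper functions are hypothetical.

```python
import subprocess
from typing import Dict, List

# Hypothetical node groups; dsh keeps similar lists in its own group files.
NODE_GROUPS: Dict[str, List[str]] = {
    "managed": ["node01", "node02", "node04"],
    "jms": ["node03"],
}


def build_ssh_commands(
    groups: Dict[str, List[str]],
    group: str,
    remote_command: str,
) -> List[List[str]]:
    """Build one ssh argv per host in the group (raises KeyError if unknown)."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return [
        ["ssh", "-o", "BatchMode=yes", host, remote_command]
        for host in groups[group]
    ]


def run_on_group(
    groups: Dict[str, List[str]],
    group: str,
    remote_command: str,
    timeout: float = 30.0,
) -> Dict[str, str]:
    """Run the command on each host in turn and collect stdout per host."""
    results: Dict[str, str] = {}
    for argv in build_ssh_commands(groups, group, remote_command):
        host = argv[-2]
        completed = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
        results[host] = completed.stdout
    return results
```

For example, `run_on_group(NODE_GROUPS, "managed", "grep -l SESSIONID /path/to/logs/*")` would be one way to find which node holds a session, with the search pattern and log path swapped for real values.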
23 Maintenance Page
Hosted on opposite f5
Two versions
–Scheduled maintenance
–Unscheduled outage
In an f5 outage, move DNS to other f5 so message still appears
24 Installs and Upgrades
Silent install scripts
Test in both development environments
–Create against a small database
–Measure time to complete against a full-size copy of production
Install to production
25 Powerlinks and Custom Development
Test in development
Try to break
Pilot in production
Release to all
26 Questions?
27 Want More? To view my resources and references for this presentation, visit Simply click “Advanced Search” and search by ezrafreelove and tag: ‘bbworld07’
28 Contact Information Ezra Freelove Amy Edwards George Hernandez