Published by Turner Worsham. Modified over 10 years ago.
CIT 470: Advanced Network and System Administration
Slide #1: Upgrades and Maintenance
Slide #2: Topics
1. Upgrade procedure
2. Maintenance windows
3. Service conversions
4. Centralization and decentralization
Slide #3: Upgrades
1. Develop a service checklist.
2. Verify that each software package will work with the new OS, or plan its upgrade.
3. Develop a test for each service.
4. Write a back-out plan.
5. Select a maintenance window.
6. Announce the upgrade.
7. Lock out users.
8. Do the upgrade.
9. Perform the tests.
10. Communicate success, or back out.
11. Let users back in.
Slide #4: Service Checklist
1. What services are provided by the server?
2. Who are the customers of each service?
3. What package provides each service?
4. What other services depend on the server?
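One way to keep the service checklist is as structured data, so the same records can later drive the announcement list and the compatibility checks. This is a minimal sketch: the `ServiceEntry` class, its field names, and the sample mail/web entries are hypothetical illustrations, not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    """One row of a pre-upgrade service checklist (hypothetical field names)."""
    service: str                 # what service the server provides
    customers: list[str]         # who uses the service
    package: str                 # which software package provides it
    dependents: list[str] = field(default_factory=list)  # services that rely on it


def affected_customers(checklist: list[ServiceEntry]) -> set[str]:
    """Everyone who should receive the upgrade announcement."""
    return {c for entry in checklist for c in entry.customers}


def packages_to_verify(checklist: list[ServiceEntry]) -> list[str]:
    """Packages whose compatibility with the new OS must be checked, sorted."""
    return sorted({entry.package for entry in checklist})


def dependent_services(checklist: list[ServiceEntry]) -> list[str]:
    """Other services that will be disrupted while this server is down."""
    return sorted({d for entry in checklist for d in entry.dependents})


# Hypothetical checklist for a single server.
checklist = [
    ServiceEntry("mail", ["staff", "students"], "postfix", dependents=["webmail"]),
    ServiceEntry("web", ["public"], "apache2"),
]
```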
Slide #5: Verify Software Compatibility
Don’t trust the vendor; test the software yourself.
What if the software isn’t compatible?
  – Upgrade to a release supported by both the old and the new OS.
  – Upgrade to a release supported by the new OS only.
  – No upgrade path: don’t upgrade the OS, or migrate the service to a VM running the old OS.
Slide #6: Verification Tests
Automate the tests with a script.
  – The script compares actual and expected output.
  – It prints OK or FAIL for each test.
  – Automation pays off because you’ll upgrade the server or OS more than once.
Tests can be simple.
  – A “Hello world” program for a compiler.
  – Use netcat to send a text message to a server.
Some services come with their own tests.
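A minimal sketch of such a test script, assuming Python 3.9+. `check_command` runs a program and compares its output with the expected text; `check_tcp` does what the netcat test does, using a plain socket. Both function names are hypothetical, and each prints OK or FAIL as the slide describes.

```python
import socket
import subprocess
import sys


def check_command(name: str, argv: list[str], expected: str) -> bool:
    """Run a command, compare its stripped stdout with the expected text, print OK/FAIL."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
        ok = result.stdout.strip() == expected
    except (OSError, subprocess.TimeoutExpired):
        ok = False  # command missing or hung: treat as a failure
    print(f"{'OK' if ok else 'FAIL'}: {name}")
    return ok


def check_tcp(name: str, host: str, port: int, message: str, expected_prefix: str) -> bool:
    """Netcat-style test: send a text message to a server and check the start of the reply."""
    try:
        with socket.create_connection((host, port), timeout=10) as conn:
            conn.sendall(message.encode())
            reply = conn.recv(1024).decode(errors="replace")
        ok = reply.startswith(expected_prefix)
    except OSError:
        ok = False  # connection refused, timed out, or reset
    print(f"{'OK' if ok else 'FAIL'}: {name}")
    return ok


# "Hello world" test for the Python interpreter itself.
check_command("python hello world", [sys.executable, "-c", "print('hello world')"], "hello world")
```

A web server check might look like `check_tcp("web", "www.example.edu", 80, "HEAD / HTTP/1.0\r\n\r\n", "HTTP/")`, where the host name is a placeholder for the server under test.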
Slide #7: Back-Out Plan
The back-out plan must be quick enough to perform within the maintenance window.
Upgrade strategies that support back-out:
  – Clone the disks, then perform the upgrade on the clones.
  – Clone the system disks and back up the data disks.
  – Use the snapshot capability of virtual machines.
  – Install the upgrade on new server hardware.
Slide #8: Select Maintenance Window
When?
  – Evening or weekend.
  – Vendor support may be unavailable at those times.
How long?
  – (t_upgrade + t_testing + t_debug + t_back-out) × 2, because you probably underestimated.
At what time will the back-out plan be initiated?
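The window-length rule and the back-out question can be turned into a small calculation. This sketch assumes the back-out takes no longer than estimated; the function name `plan_window` and the sample durations are hypothetical.

```python
from datetime import datetime, timedelta


def plan_window(start: datetime, upgrade: timedelta, testing: timedelta,
                debug: timedelta, back_out: timedelta, safety_factor: int = 2):
    """Return (window_end, back_out_deadline) for a maintenance window.

    Length follows the rule of thumb
    (t_upgrade + t_testing + t_debug + t_back-out) x safety_factor.
    """
    length = (upgrade + testing + debug + back_out) * safety_factor
    window_end = start + length
    # Latest moment the back-out can start and still finish before the window
    # closes, assuming it takes no longer than estimated.
    back_out_deadline = window_end - back_out
    return window_end, back_out_deadline


# Hypothetical estimates for a Saturday-evening OS upgrade.
start = datetime(2024, 6, 1, 18, 0)
end, deadline = plan_window(start,
                            upgrade=timedelta(hours=2), testing=timedelta(hours=1),
                            debug=timedelta(hours=1), back_out=timedelta(hours=1))
print(f"window ends {end:%a %H:%M}, start back-out no later than {deadline:%a %H:%M}")
```

With five hours of estimated work doubled to ten, the window runs until 04:00 and the back-out must begin by 03:00.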
Slide #9: Announcements
Keep them brief and direct, and always use the same format.
What you need to communicate:
  – Who is affected
  – What will happen
  – When
  – Why
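A fixed template is an easy way to keep every announcement in the same who/what/when/why format. A minimal sketch; the template layout and the example values are hypothetical.

```python
from datetime import datetime

# One fixed format for every announcement, so customers learn where to look.
TEMPLATE = (
    "Subject: [MAINTENANCE] {what}\n"
    "\n"
    "WHO:  {who}\n"
    "WHAT: {what}\n"
    "WHEN: {start:%A %B %d, %H:%M} to {end:%A %B %d, %H:%M}\n"
    "WHY:  {why}\n"
)


def announcement(who: str, what: str, start: datetime, end: datetime, why: str) -> str:
    """Fill in the fixed who/what/when/why template."""
    return TEMPLATE.format(who=who, what=what, start=start, end=end, why=why)


# Hypothetical example.
print(announcement(
    who="All users of the departmental mail server",
    what="Mail server OS upgrade; mail will be unavailable",
    start=datetime(2024, 6, 1, 18, 0),
    end=datetime(2024, 6, 2, 4, 0),
    why="The current OS release is reaching end of support",
))
```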
Slide #10: Test, Upgrade, Test
1. Perform the verification tests before the upgrade.
2. Perform the upgrade.
3. Repeat the tests.
4. Make sure a customer can access the system too.
5. Back out if the tests fail.
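The test, upgrade, test sequence can be expressed as a small driver. This is a sketch under the assumption that each test, the upgrade, and the back-out are supplied as callables; `upgrade_with_tests` and `run_checks` are hypothetical names. Step 4, having a real customer confirm access, stays manual.

```python
from typing import Callable

Check = Callable[[], bool]


def run_checks(checks: dict[str, Check]) -> list[str]:
    """Run every verification test and return the names of the ones that failed."""
    return [name for name, check in checks.items() if not check()]


def upgrade_with_tests(checks: dict[str, Check],
                       upgrade: Callable[[], None],
                       back_out: Callable[[], None]) -> bool:
    """Test, upgrade, test again; back out if the post-upgrade tests fail."""
    failed = run_checks(checks)
    if failed:
        # Tests must pass before the upgrade, or a later failure proves nothing.
        print("pre-upgrade tests failed, not upgrading:", ", ".join(failed))
        return False
    upgrade()
    failed = run_checks(checks)
    if failed:
        print("post-upgrade tests failed, backing out:", ", ".join(failed))
        back_out()
        return False
    print("upgrade succeeded; now have a customer confirm access")
    return True
```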
Slide #11: Success or Failure
Communicate success or failure. Keep it short.
Provide a contact in case something is broken.
Slide #12: Disabling Services
Follow the same procedure as for an upgrade.
Be certain no one is still using the service.
  – Check lists.
  – Use a network sniffer to check for traffic.
Disable the service so that it’s easy to re-enable.
  – Don’t delete the software until a grace period has passed.
  – Back up the software before deleting it.
Slide #13: Upgrade Tips
Don’t make two changes at the same time; it makes debugging much more difficult.
Practice the upgrade beforehand on a spare machine or VM.
Slide #14: Maintenance Windows
Scheduled for time-consuming changes:
  – Multiple sysadmins changing different systems.
  – Large-scale data migration.
  – Shutting down services with many dependents.
  – Hardware changes: AC, rewiring.
  – Moving to another data center.
Duration: an evening, a day, or a weekend.
Slide #15: Scheduling
Coordinate with the rest of the organization.
Avoid the end of the month, quarter, or year.
Schedule far in advance.
Plan the upgrade beforehand.
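Avoiding period ends can be checked mechanically when picking candidate dates. This sketch assumes calendar quarters and a three-day margin, both of which an organization with a different fiscal calendar would adjust; the function names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def period_end_conflict(day: date, margin_days: int = 3):
    """Return why `day` is a bad choice ("month end", "quarter end", "year end"), or None.

    Assumes calendar quarters; adjust for the organization's fiscal calendar.
    """
    last_day = calendar.monthrange(day.year, day.month)[1]
    if last_day - day.day >= margin_days:
        return None
    if day.month == 12:
        return "year end"
    if day.month in (3, 6, 9):
        return "quarter end"
    return "month end"


def candidate_saturdays(start: date, count: int, margin_days: int = 3) -> list[date]:
    """The next `count` Saturdays on or after `start` that avoid period ends."""
    day = start + timedelta(days=(5 - start.weekday()) % 7)  # weekday(): Monday=0, Saturday=5
    found = []
    while len(found) < count:
        if period_end_conflict(day, margin_days) is None:
            found.append(day)
        day += timedelta(days=7)
    return found
```

For example, starting from Saturday 1 June 2024, the Saturday of 29 June is skipped because it falls at the end of a quarter.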
Slide #16: Flight Director
A single person is responsible for:
  – Sending out announcements.
  – Approving and scheduling work proposals, ensuring that workers don’t conflict with each other.
  – Monitoring progress during the window, ensuring that testing is performed, and deciding if and when a back-out should be initiated.
  – Communicating success or failure at the end.
Slide #17: Change Proposals
1. What changes are going to be made?
2. What machines will be affected?
3. What are the premaintenance dependencies?
4. What needs to be up for the change to happen?
5. Who is performing the work?
6. How long will the change take?
7. What are the test procedures?
8. What is the back-out procedure?
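If change proposals are recorded as data, the flight director can detect conflicting work automatically. This sketch treats two proposals as conflicting when they touch the same machine in overlapping time slots; that definition, the `ChangeProposal` field names, and the sample proposals are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations


@dataclass
class ChangeProposal:
    """The eight proposal questions as fields (names are hypothetical)."""
    changes: str                 # 1. what changes will be made
    machines: set[str]           # 2. machines affected
    premaintenance: list[str]    # 3. work that must be done before the window
    needs_up: list[str]          # 4. what must be up for the change to happen
    owner: str                   # 5. who performs the work
    start: datetime
    duration: timedelta          # 6. how long the change will take
    tests: str = ""              # 7. test procedure
    back_out: str = ""           # 8. back-out procedure

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def conflicts(proposals: list[ChangeProposal]) -> list[tuple[str, str, list[str]]]:
    """Pairs of proposals that touch the same machine during overlapping time slots."""
    found = []
    for a, b in combinations(proposals, 2):
        overlap = a.start < b.end and b.start < a.end   # half-open intervals
        shared = a.machines & b.machines
        if overlap and shared:
            found.append((a.owner, b.owner, sorted(shared)))
    return found


# Hypothetical proposals for one window.
t0 = datetime(2024, 6, 1, 18, 0)
proposals = [
    ChangeProposal("Upgrade OS", {"mail1"}, [], ["network"], "alice", t0, timedelta(hours=3)),
    ChangeProposal("Replace disk", {"mail1", "nfs1"}, ["order disk"], ["network"], "bob",
                   t0 + timedelta(hours=2), timedelta(hours=1)),
    ChangeProposal("Rewire rack", {"sw2"}, [], [], "carol", t0, timedelta(hours=4)),
]
print(conflicts(proposals))
```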
Slide #18: Master Plan
Takes into account:
  – Dependencies (people, services, hardware)
  – Resources (people, time, hardware)
The schedule needs slack for when things go wrong.
Slide #19: Disabling Access
Disable all access at the start of the window:
  – Place notices on doors.
  – Disable remote access.
  – Announce over the PA system.
  – Set a helpdesk voicemail message.
This prevents people from using systems during maintenance and causing inconsistencies or accidental loss of data.
Slide #20: Shutdown/Boot Sequence
Use a proper sequence to ensure that all systems shut down or boot cleanly.
The sequence takes dependencies into account (boot in this order; shut down in reverse):
  – Network
  – Console servers
  – DNS
  – Authentication
  – License servers
  – File services
  – Database servers
  – Web and other application servers
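A boot sequence that respects dependencies is a topological sort, and the shutdown sequence is its reverse. This sketch uses Python's standard `graphlib` module (Python 3.9+); the dependency edges below are a hypothetical reading of the slide's list, not a prescription for any real site.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each service maps to the services it needs running first. These edges are a
# hypothetical example based on the slide's list; record your own site's real ones.
DEPENDS_ON = {
    "network": set(),
    "console servers": {"network"},
    "DNS": {"network"},
    "authentication": {"DNS"},
    "license servers": {"DNS", "authentication"},
    "file services": {"DNS", "authentication"},
    "database servers": {"authentication", "file services"},
    "web/app servers": {"database servers", "file services", "license servers"},
}


def boot_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Every service appears after everything it depends on (raises CycleError on a loop)."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Stop dependents before the services they rely on: the boot order reversed."""
    return boot_order(depends_on)[::-1]


print("boot:    ", " -> ".join(boot_order(DEPENDS_ON)))
print("shutdown:", " -> ".join(shutdown_order(DEPENDS_ON)))
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, which is worth discovering while planning rather than during the window.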
Slide #21: Deadlines
Each change must be completed by its deadline.
  – Back out if the change cannot be completed in time.
  – This ensures that dependent tasks won’t be started if they cannot be completed.
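Both halves of the deadline rule fit in a few lines: decide whether a change can still finish on time, and if it is backed out, find the dependent tasks that must not start. A sketch; the function names, the task names, and the transitive-blocking interpretation are assumptions.

```python
from datetime import datetime, timedelta


def must_back_out(now: datetime, deadline: datetime, remaining: timedelta) -> bool:
    """True if the change cannot finish by its deadline at the current estimate."""
    return now + remaining > deadline


def blocked_tasks(prereqs: dict[str, set[str]], backed_out: set[str]) -> set[str]:
    """Tasks that must not start because something they need was backed out.

    Blocking is transitive: if A is backed out and B needs A, then C (which needs B)
    is blocked too.
    """
    blocked = set(backed_out)
    changed = True
    while changed:
        changed = False
        for task, needs in prereqs.items():
            if task not in blocked and needs & blocked:
                blocked.add(task)
                changed = True
    return blocked - backed_out


# Hypothetical window: the SAN migration is running late at 22:00.
prereqs = {
    "new SAN": set(),
    "move database": {"new SAN"},
    "upgrade app servers": {"move database"},
    "rewire rack": set(),
}
if must_back_out(datetime(2024, 6, 1, 22, 0), datetime(2024, 6, 1, 23, 0), timedelta(hours=2)):
    print("back out; do not start:", sorted(blocked_tasks(prereqs, {"new SAN"})))
```

The unrelated rack rewiring is unaffected and can proceed on schedule.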
Slide #22: System Testing
Run the verification tests for each upgrade.
Run whole-system tests to ensure everything works together before the end of the window.
Shut down and restart all systems.
Slide #23: Completion
Send the postmaintenance announcement.
  – Write it before the window starts.
Re-enable remote access.
Be available early the next morning so that problems are detected and fixed quickly.
Slide #24: Postmortem
Meet after all problems are fixed.
Review the maintenance window:
  – What went wrong?
  – What went right?
  – How can future windows go better?
Collect data:
  – How long does an upgrade really take?
  – Track historical trends.
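Recording estimated and actual durations at each postmortem lets you replace the “× 2” guess from the window-length rule with a measured overrun factor. A minimal sketch; the record format and the sample history are hypothetical.

```python
from statistics import mean


def overrun_factor(history: list[tuple[float, float]]) -> float:
    """Average ratio of actual to estimated duration over past maintenance windows.

    history holds (estimated_hours, actual_hours) pairs recorded at each postmortem.
    """
    return mean(actual / estimated for estimated, actual in history)


def padded_estimate(estimate_hours: float, history: list[tuple[float, float]]) -> float:
    """Scale a new estimate by the measured overrun instead of a fixed guess."""
    return estimate_hours * overrun_factor(history)


# Hypothetical records from three earlier windows.
history = [(2.0, 3.0), (4.0, 6.0), (1.0, 2.0)]
print(f"past windows ran {overrun_factor(history):.2f}x over estimate")
```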
Slide #25: High Availability Sites
What’s high availability? (maximum downtime per year)
  – 99.9%: about 8.8 hours
  – 99.99%: about 53 minutes
  – 99.999%: about 5 minutes
  – 99.9999%: about 32 seconds
What’s different during maintenance?
  – Redundant systems.
  – No full shutdowns or reboots.
  – Availability must be closely monitored.
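The downtime budgets follow from multiplying the unavailable fraction by the length of a year. This sketch uses an average year of 365.25 days; using 365 days changes the results only slightly.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum yearly downtime allowed by an availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999, 99.9999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}%: {minutes:8.2f} minutes ({minutes / 60:.2f} hours) per year")
```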
Slide #26: Service Conversions
Replacing an existing service with a new one.
Use the “one, some, many” procedure: convert one customer, then a few, then everyone.
Communicate the change to customers.
Minimize service downtime.
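The one, some, many procedure amounts to converting customers in growing waves and stopping when a wave has problems. This sketch assumes the conversion and the “does it work?” check are supplied as callables; `one_some_many`, `convert_in_waves`, and the wave size of five are hypothetical choices.

```python
from typing import Callable


def one_some_many(customers: list[str], some: int = 5) -> list[list[str]]:
    """Split customers into waves: one pilot, then `some` more, then everyone else."""
    waves = [customers[:1], customers[1:1 + some], customers[1 + some:]]
    return [wave for wave in waves if wave]  # drop empty waves for small groups


def convert_in_waves(customers: list[str],
                     convert: Callable[[str], None],
                     works: Callable[[str], bool],
                     some: int = 5) -> tuple[list[str], list[str]]:
    """Convert wave by wave; stop at the first wave where any customer has problems.

    Returns (successfully converted customers, failing wave). Customers in later
    waves stay on the old service until the problem is fixed.
    """
    done: list[str] = []
    for wave in one_some_many(customers, some):
        for customer in wave:
            convert(customer)
        if not all(works(customer) for customer in wave):
            return done, wave
        done.extend(wave)
    return done, []
```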
Slide #27: Layers vs. Pillars
Layers
  – Perform one task for all customers, then move on to the next task.
  – Better for non-intrusive tasks.
Pillars
  – Perform all tasks for one customer, then move on to the next customer.
  – Better for intrusive tasks, as this reduces the number of intrusions.
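The difference between layers and pillars is just the order of two nested loops. This sketch counts each change of customer as one intrusion, which is an assumption about how visits are counted; the customer and task names are hypothetical.

```python
def layers(customers: list[str], tasks: list[str]) -> list[tuple[str, str]]:
    """Layers: one task for every customer, then the next task."""
    return [(task, customer) for task in tasks for customer in customers]


def pillars(customers: list[str], tasks: list[str]) -> list[tuple[str, str]]:
    """Pillars: every task for one customer, then the next customer."""
    return [(task, customer) for customer in customers for task in tasks]


def intrusions(schedule: list[tuple[str, str]]) -> int:
    """Count visits, treating consecutive steps on the same customer as one visit."""
    visits, previous = 0, None
    for _task, customer in schedule:
        if customer != previous:
            visits += 1
            previous = customer
    return visits


customers = ["ann", "bob", "cy"]                        # hypothetical
tasks = ["add RAM", "replace disk", "upgrade OS"]        # all intrusive
print("layers: ", intrusions(layers(customers, tasks)))   # 9 visits
print("pillars:", intrusions(pillars(customers, tasks)))  # 3 visits
```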
Slide #28: Avoid Flash Cuts
Avoid converting everyone at once.
Convert willing test subjects first.
Make both services available simultaneously.
  – Customers can try the new service and get used to it.
  – They can return to the old service if they experience problems.
Sometimes a flash cut is the only solution. It then requires:
  – Careful planning.
  – Comprehensive testing.
  – A back-out plan.
Slide #29: Centralization
A single, central focus of control.
Centralize distributed systems.
  – Distributed systems can be complex.
  – Multiple servers, one point of control.
Centralize administration.
  – A single point of contact for IT help.
  – Consolidated expertise.
Centralize infrastructure decisions.
  – Volume purchasing discounts.
  – A single PC model is easy to repair and to stock spare parts for.
Slide #30: Decentralization
Fault tolerance
  – Systems keep working even when the WAN is down.
  – Distributed systems can solve this too.
Customization
  – Some groups need customized software or hardware.
  – One size never fits all customers.