CIT 470: Advanced Network and System Administration
Slide #1: Change and Configuration Management
Slide #2: Topics
1. Change Management
2. Change Processes
3. Revision Control
4. Configuration Management
5. cfengine
Images from Pro Git.
Slide #3: Change Management
Effective planning and implementation of changes to systems.
Changes should:
1. Be well documented.
2. Have a backout plan.
3. Be reproducible.
Slide #4: Why do we need Change Management?
March 26-29, 2006: BART trains halted to avoid running into each other when computer systems crashed.
– Crashes on Monday and Tuesday resulted from software maintenance upgrades.
– Crash on Wednesday resulted from installing a backup system to avoid future crashes.
– Thousands of passengers stranded for several hours each time.
Slide #5: Change Management
1. Plan change.
2. Test change on single system.
3. Test change on multiple systems.
4. File a change request.
5. Change committee approves request.
6. Schedule change.
7. Communicate with users/admins.
8. Change systems at scheduled time.
9. Post-event analysis.
Slide #6: Testing Changes
Automated checks.
– Sanity checks like Samba testparm.
– Reboot system.
Test on one system first.
Then test on a set of systems.
– Dedicated test systems.
– System admin workstations.
– Virtual machines.
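An automated sanity check can be as small as a wrapper that runs a service's own config checker and reports whether it exited cleanly. The sketch below is illustrative only: the function name `run_sanity_check` and the 30-second timeout are assumptions, not part of the slides.

```python
import subprocess

def run_sanity_check(command, timeout=30):
    """Run a config checker; return (ok, output), where ok means exit status 0."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        # A missing or hung checker counts as a failed check, not a crash.
        return False, str(exc)
    return result.returncode == 0, result.stdout + result.stderr
```

For Samba, a call might look like `run_sanity_check(["testparm", "-s", "/etc/samba/smb.conf"])`, assuming the config lives at that path. Only push the change to further systems when `ok` is true.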
Slide #7: When do you need a Change Proposal?
Does the change impact critical services?
Critical machines/services
– Business critical: e-commerce server, etc.
– Essential services: routers, DNS, NFS, auth.
Non-critical machines/services
– Individual desktops
– Internal news web server
Slide #8: Change Proposal
1. Description of the change.
2. Systems impacted by change.
3. Why the change is being made.
4. Risks presented by the change.
5. Test procedure.
6. Backout plans.
7. How long the change will require.
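The seven sections of a change proposal map naturally onto a record with one field per section, which makes it easy to reject incomplete proposals before they reach the change committee. This is a minimal sketch; the class name `ChangeProposal` and its field names are hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass
class ChangeProposal:
    # One field per section of the proposal, in the order listed on the slide.
    description: str = ""
    systems_impacted: str = ""
    reason: str = ""
    risks: str = ""
    test_procedure: str = ""
    backout_plan: str = ""
    duration: str = ""

    def missing_sections(self):
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]
```

A proposal is ready for review only when `missing_sections()` returns an empty list.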
Slide #9: Communication
Communicate change to impacted people.
– What change is being made (in nontechnical terms).
– Which services will be unavailable.
– When and for how long they will be unavailable.
– What actions they need to take (if any).
Communication issues
– If you send too many notes, they’ll be ignored.
– Send notices only to those impacted.
– Push critical notices; use pull for non-critical.
Slide #10: Scheduling
Change     Scope                                When       Notification Type
Routine    Single host or user.                 Anytime.   Personal.
Major      Many hosts or users.                 Off-peak.  Push.
Sensitive  None, but major impact on failure.   Off-peak.  Pull.
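The scheduling table can be read as a lookup keyed on the kind of change. The sketch below encodes it that way; the thresholds in `classify_change` (more than one host or user means "major", and a narrow change with major failure impact is "sensitive") are assumptions made for illustration, since the slide does not define exact cutoffs.

```python
# When to make each kind of change and how to notify, taken from the table.
SCHEDULING = {
    "routine":   {"when": "anytime",  "notification": "personal"},
    "major":     {"when": "off-peak", "notification": "push"},
    "sensitive": {"when": "off-peak", "notification": "pull"},
}

def classify_change(hosts_or_users_affected, major_impact_on_failure):
    """Classify a change; the cutoffs here are illustrative assumptions."""
    if hosts_or_users_affected > 1:
        return "major"
    if major_impact_on_failure:
        return "sensitive"
    return "routine"

def schedule_for(hosts_or_users_affected, major_impact_on_failure):
    """Return (kind, scheduling policy) for a proposed change."""
    kind = classify_change(hosts_or_users_affected, major_impact_on_failure)
    return kind, SCHEDULING[kind]
```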
Slide #11: Change Freezes
Time when only minor updates can be done.
– End of quarter or year.
– “Crunch time” for projects.
Slide #12: Backing Out
Decide back-out conditions before downtime.
– Avoid the “just 5 more minutes” problem.
– Be sure that someone is keeping track of time.
Questions:
– How much time is required for back out?
– When is the latest time you can successfully back out?
– Will backing out this change prevent other changes from being committed?
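The "latest time you can successfully back out" is simple arithmetic: the end of the maintenance window minus the time the back-out itself takes. The sketch below works that out; the function names and the optional safety margin are assumptions added for illustration.

```python
from datetime import datetime, timedelta

def latest_backout_start(window_end, backout_duration, safety_margin=timedelta(0)):
    """Latest moment a back-out can begin and still finish inside the window."""
    return window_end - backout_duration - safety_margin

def must_back_out(now, window_end, backout_duration, safety_margin=timedelta(0)):
    """True once the deadline has arrived and the change is still not finished."""
    return now >= latest_backout_start(window_end, backout_duration, safety_margin)
```

For a window ending at 06:00, a 45-minute back-out, and a 15-minute margin, the deadline is 05:00. Fixing that time before the downtime starts is what prevents the "just 5 more minutes" problem.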
Slide #13: Backing Out: How to do it?
Service-level changes
– Use revision control system to revert config.
– Restart service.
Machine-level changes
– Soft cutover: Old service is still running.
– Hard cutover: Power up old server or restore from backups.
Issues
– Data migration.
– Compatibility.
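A service-level back-out can be scripted in advance so that nobody improvises under time pressure. The sketch below assumes the config directory is a git working tree and the service is managed by systemd; both are assumptions, as are the function names. It only builds the commands, so they can be reviewed before anything is run.

```python
import subprocess

def build_backout_commands(repo_dir, config_path, good_revision, service):
    """Return argv lists that revert one config file and restart its service."""
    return [
        # Restore the file as it was at the known-good revision.
        ["git", "-C", repo_dir, "checkout", good_revision, "--", config_path],
        # Restart so the service rereads the reverted file (assumes systemd).
        ["systemctl", "restart", service],
    ]

def run_backout(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

A hypothetical call would be `build_backout_commands("/etc/samba", "smb.conf", "abc123", "smbd")`, where the revision identifier is whatever the revision control system recorded before the change.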
Slide #14: Automatic Checks
Check integrity of critical files before use.
– Some services provide checks: LDAP, SMB.
– Check startup files by rebooting machine.
– Write your own checks for other files.
Most people only do this after they have a problem.
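For files that have no checker of their own, a home-grown check does not need to be elaborate. The sketch below validates a hypothetical `key = value` format (with `#` comments) and reports malformed lines, duplicate keys, and missing required keys; the format and the function name are assumptions chosen for illustration.

```python
def check_key_value_config(text, required_keys=()):
    """Return a list of problems found in a 'key = value' config; empty if clean."""
    problems = []
    seen = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # Blank lines and comments are always fine.
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key or not value.strip():
            problems.append(f"line {lineno}: expected 'key = value', got {raw!r}")
            continue
        if key in seen:
            problems.append(f"line {lineno}: duplicate key {key!r}")
        seen.add(key)
    for key in required_keys:
        if key not in seen:
            problems.append(f"missing required key {key!r}")
    return problems
```

Running a check like this before installing a new version of the file catches errors while the old, working version is still in place.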
Slide #15: Revision Control
Revision control systems provide
– Conflict management: prevents multiple people from modifying a file at once and corrupting it.
– Change history: records who modified the file, when, and why the change was made.
Revision control paradigms
– Lock-Modify-Unlock: rcs
– Copy-Modify-Merge: cvs, subversion, etc.
– Distributed: darcs, git, mercurial
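The lock-modify-unlock paradigm can be illustrated with an exclusive lock file: whoever creates the lock first may edit, and everyone else is refused until it is released. This sketch shows the idea only and is not how rcs itself is implemented; the names `exclusive_lock` and `FileLockedError` are hypothetical.

```python
import os
from contextlib import contextmanager

class FileLockedError(Exception):
    """Raised when someone else already holds the lock."""

@contextmanager
def exclusive_lock(path):
    """Hold an exclusive lock on path for the duration of the with-block."""
    lock_path = path + ".lock"
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise FileLockedError(f"{path} is locked by another editor") from None
    os.close(fd)
    try:
        yield lock_path  # Modify the file while the lock is held.
    finally:
        os.remove(lock_path)  # Unlock, even if the edit raised an error.
```

Copy-modify-merge systems drop the lock entirely and reconcile concurrent edits at commit time instead, which is why they scale better to many simultaneous editors.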
Slide #16: Local Version Control [figure]
Slide #17: Centralized Version Control [figure]
Slide #18: Distributed Version Control [figure]
Slide #19: Local Git Operations [figure]
Slide #20: Git File Lifecycle [figure]
Slide #21: Gitk history visualizer [figure]
Slide #23: References
1. Mark Burgess, Principles of Network and System Administration, 2nd edition, Wiley.
2. Aeleen Frisch, Essential System Administration, 3rd edition, O’Reilly.
3. Thomas A. Limoncelli and Christine Hogan, The Practice of System and Network Administration, Addison-Wesley.
4. Evi Nemeth et al, UNIX System Administration Handbook, 3rd edition, Prentice Hall.
5. Todd R. Weiss, “IT upgrades slow BART trains in San Francisco,” ComputerWorld, March 31.