Published by Milton Crawford. Modified over 6 years ago.
Life After Implementation: Ensuring 24 x 7 Availability
After implementing an enterprise directory service, high availability is critical to its acceptance. This session outlines monitoring tools and strategies for ensuring 24 x 7 availability.
Copyright John Ball. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.
Life after implementation
- Background
- 24 x 7: what it means
- Strategy & Design
- Tools & Monitoring
- Acceptance, Interaction, & Policy
11/19/2018
SUNY@Buffalo
Part of the NY state system: http://www.suny.edu/
- 27,000+ enrolled students
- 13,000+ staff & faculty
- 2 campuses
- 3 (4) geographically dispersed machine rooms
- 50/50 mix of central IT and departmental support
- Enterprise Metadirectory
- 5 Enterprise Directories: AD, DCE, Kerberos, 2 LDAPs
John Ball john@buffalo.edu
- Chief Information Technology Architect
- Middleware Area Coordinator (Manager)
- Machine Room Service Coordinator
- Project Manager
- System Administrator
- Geek
- Other duties as assigned…
24 x 7 availability
- 13 services run 24 x 7
- Services are composed of smaller components
- Many dependencies among services
- Limited window for scheduled service downtime: 5am-7am
- Downtime of any kind is wildly unpopular…
- Business Continuity (Disaster Recovery): the ability for the central computing center to “go away” and the business to be able to continue
Definition of Service (e-mail)
40+ machines that do “e-mail” as perceived by the campus, which include:
- News
- Listserv
- White pages
- LDAP
- Webmail
- Central
- Priority
- Mail throttling
- Anti-virus scanning
- Mail filtering
Definition of Service (24 x 7 Auth)
50+ machines, broken into the following components:
- Meta directory (Oracle)
- Accounts
- Active Directory support
- DCE
- Kerberos
- LDAP for Auth
- WebISO
- Shibboleth
Service Interdependency
Dependencies:
- E-mail depends on auth and DNS
- Auth depends on DNS and time, etc.
- Downtime for one means downtime for everything up the chain of dependency
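The "up the chain" effect can be sketched as a walk over a dependency map. This is only an illustration: the map below is hypothetical, built from the services named on this slide (with e-mail assumed to be the service that depends on auth and DNS), and `impacted_by` is an invented helper name.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it directly depends on.
# The entries follow the slide; treating "email" as the top of the chain is an assumption.
DEPENDS_ON = {
    "email": ["auth", "dns"],
    "auth": ["dns", "time"],
    "dns": [],
    "time": [],
}


def impacted_by(failed, depends_on):
    """Return every service that is down, directly or indirectly, when `failed` is down."""
    # Invert the map so we can ask "who depends on this service?"
    dependents = {name: set() for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted
```

With this map, `impacted_by("time", DEPENDS_ON)` reports both auth and e-mail, which is why a low-level service needs at least the availability of everything above it.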
Design/Architecture
- N+1 machines per service
- Individual components of a service can be maintained while the service is still available
- Network layout
- Load balancing on the network
- Clustering where appropriate
- Multiple machine rooms
- Test gear
N+1 Machines per service
- Common hardware base across multiple services
- Sun or Dell, same/similar model types:
  - Dell 1x50s or 2x50s (x is the generation of the machine)
  - Sun V120s (single CPU) or Sunfire 280s
- Machines for various services and applications are grouped by these principles at budget time
- Various other budget principles (cost, discount, end of support by model, etc.) are applied, and many of the same models are ordered
- This also applies to sparing and repair
N+1 Machines per service
- Standard “build”: ability to create multiple near-identical servers with minimal effort
  - Flash/Flare
  - Rsync
  - Ghost
- We can add or remove machines to/from a service based on repairs or scalability (short-term load or long-term sizing)
N+1 Machines per service
Configuring N+1 machines to respond to service requests:
- Service machine pooling (DCE, AD)
- DNS round robin
- Clustering (shared disk or state?)
- Network load balancing
- Various subnets (locations)
- “Next networking”
- Measuring the load
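As a rough illustration of pooling with round-robin selection, the sketch below rotates requests across the machines of one service and skips any machine that has been taken out of service. The `ServicePool` class and the host names are hypothetical; real DNS round robin or a network load balancer would do this outside the application.

```python
from itertools import count


class ServicePool:
    """Round-robin pool that skips machines marked out of service (illustrative only)."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.out_of_service = set()
        self._counter = count()

    def drain(self, machine):
        """Take a machine out of rotation, e.g. for maintenance or repair."""
        self.out_of_service.add(machine)

    def restore(self, machine):
        """Put a machine back into rotation."""
        self.out_of_service.discard(machine)

    def next_machine(self):
        """Return the next in-service machine in rotation."""
        active = [m for m in self.machines if m not in self.out_of_service]
        if not active:
            raise RuntimeError("no machines in service")
        return active[next(self._counter) % len(active)]
```

With N+1 machines in the pool, draining one still leaves N to answer requests, which is the point of the extra machine.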
Servicing Individual Machines
- Need the ability to take one or more machines out of service (planned or unplanned) without service interruptions.
- Machines for a service should not all be in the same rack, on the same plug strip, on the same circuit, in the same electrical panel, etc.
- The same goes for the network: not all on the same subnet, the same switch, etc.
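One way to audit this rule is to look for any physical or network attribute that every machine of a service has in common. The sketch below assumes a hypothetical inventory in which each machine is a dict with `rack`, `circuit`, `switch`, and `subnet` keys; those field names are illustrative, not taken from the talk.

```python
def shared_failure_points(machines, attributes=("rack", "circuit", "switch", "subnet")):
    """Return (attribute, value) pairs that all machines of one service share.

    Any pair returned is a single point of failure for the whole service.
    """
    shared = []
    for attr in attributes:
        values = {machine[attr] for machine in machines}
        # A shared value only matters when there is more than one machine.
        if len(machines) > 1 and len(values) == 1:
            shared.append((attr, values.pop()))
    return shared
```

An empty result means no single rack, circuit, switch, or subnet failure can take out every machine of the service at once.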
Test Gear
- Test environment as close to production as feasible
- N+1 provides the possibility to test on a set of production machines
- Ability to make “flash cuts” and “rolling changes” with N+1 machines
- This includes fallbacks…
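A rolling change with a fallback can be sketched as follows. The `apply`, `verify`, and `rollback` callables are placeholders for whatever the site actually uses to change, check, and restore a machine; nothing here reflects the author's tooling.

```python
def rolling_change(machines, apply, verify, rollback):
    """Change one machine at a time; if a check fails, fall back everything changed so far.

    Returns True when every machine was changed and verified, False after a fallback.
    """
    changed = []
    for machine in machines:
        apply(machine)
        changed.append(machine)
        if not verify(machine):
            # Undo in reverse order so the most recent change is reverted first.
            for done in reversed(changed):
                rollback(done)
            return False
    return True
```

Because only one machine is touched at a time, the remaining machines keep answering requests while the change rolls through.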
Business Continuity
- 3 (4th planned) machine rooms
- Geographically dispersed
- Service pooling behind 3 redundant Cisco Content Services Switches
- DNS round robin
- Core network redundancy (network loop with multiple internet connections)
- Clustering and shared SAN storage
Monitoring
- Custom monitoring: now legacy, mostly with DCE
- Monitor in a standard way across all services and service components
- “Up” does not just mean responding to pings
- What is the user wait time?
- Avoid reactive monitoring where possible
- Also looked at MON, SPONGE, and 7 other products that did not meet the requirements
Big Brother
- Monitors many “reactive things”: connectivity, CPU, disk space, etc.
- Monitoring of unique service components: web servers, SMTP connects, applications, etc.
- Monitor “user experience” (delays)
- Can monitor anything you can log
- When issues do arise: page / dynamic call list
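A "user experience" check can be sketched as timing a real operation rather than a ping. This is not Big Brother's own interface: `check_wait_time`, the thresholds, and the colour names are assumptions chosen for illustration, and `probe` stands in for any callable that performs the user-visible action (an LDAP bind, a webmail login, and so on).

```python
import time


def check_wait_time(probe, warn_seconds, fail_seconds, clock=time.monotonic):
    """Run `probe` and classify it by how long a user would have waited.

    Returns a (status, elapsed_seconds) tuple where status is
    "green", "yellow", or "red".
    """
    start = clock()
    try:
        probe()
    except Exception:
        # The operation failed outright, whatever the timing was.
        return "red", clock() - start
    elapsed = clock() - start
    if elapsed >= fail_seconds:
        return "red", elapsed
    if elapsed >= warn_seconds:
        return "yellow", elapsed
    return "green", elapsed
```

The "yellow" band is what makes the check proactive: staff can be told that a service is slowing down before a customer reports it as down.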
Staffing/Call List
- 30 support staff to support:
  - 13 different 24 x 7 services, which means people get paged at any time of the day or night for issues
  - 40+ other enterprise services
- Concept of service teams
- Escalation outside of service teams
- Everyone in the group at least knows how to spell LDAP and DCE.
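The service-team and escalation idea might be modelled as an ordered call list: the service's own team first, then people outside the team. Everything in this sketch is hypothetical, including the function name, the team layout, and the notion of an `unavailable` set.

```python
def build_call_list(service, teams, escalation, unavailable=()):
    """Return the paging order for a service.

    The service team is paged first, then the escalation contacts outside
    the team. People who are unavailable are skipped, and nobody is
    listed twice.
    """
    ordered = []
    for person in list(teams.get(service, [])) + list(escalation):
        if person not in unavailable and person not in ordered:
            ordered.append(person)
    return ordered
```

For example, with `teams = {"auth": ["alice", "bob"]}` and `escalation = ["carol", "alice"]`, marking bob as unavailable yields a call list of alice followed by carol.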
More on Monitoring
- If the customer is the first one to call about downtime, you already have a problem.
- Monitor proactively as much as possible
- Make monitoring “sane” for support staff
Policy
- How does data get into the Metadirectory?
- Concept of Data Custodian or “owner”
- What is the authoritative source for each piece of data?
- Need for a process to get information from the Enterprise Directory
- Naming standards?
- Security
How does Data get in?
- How often is the data changing?
- When do the updates occur, and to what systems or directories?
- Is the data being transferred in a secure manner?
  - SSN
  - User names
  - Passwords
Data Custodian
- This is the question of who has the authority to grant access to other data consumers, on campus and off.
- Data consumers can be affiliated people, applications, or off campus
- Principles for data access:
  - No one gets passwords
  - Very limited SSN
  - Applications get application-specific data
  - Departments can have groups prefixed with their entity codes
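The first three access principles can be expressed as a small release filter. This is a sketch under assumptions: the attribute names (`password`, `ssn`), the `ssn_approved` flag, and the `release_view` function are invented for illustration and do not describe the actual metadirectory schema.

```python
# Hypothetical attribute names; the real schema is not given in the talk.
NEVER_RELEASED = {"password"}   # "No one gets passwords"
RESTRICTED = {"ssn"}            # "Very limited SSN"


def release_view(record, allowed_attributes, ssn_approved=False):
    """Return only the attributes a consumer has been granted.

    `allowed_attributes` is the application-specific list approved by the
    data custodian. Passwords are never released, and SSN is released only
    when the custodian has explicitly approved it.
    """
    view = {}
    for name in allowed_attributes:
        if name in NEVER_RELEASED:
            continue
        if name in RESTRICTED and not ssn_approved:
            continue
        if name in record:
            view[name] = record[name]
    return view
```

Keeping the deny rules in the filter itself, rather than in each consumer's allow list, means a mistaken grant cannot leak a password.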
Authoritative Data Sources
- For each piece of data in an enterprise directory there needs to be a single authoritative source
- Groups are a prime example: DCE groups, LDAP groups, AD groups
- DCE, as a legacy, is authoritative
- In the future, the meta directory via GROUPER?
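A single authoritative source per attribute can be written down as a simple lookup that decides whose value wins when directories disagree. In the sketch below only the "groups come from DCE" rule is taken from the slide; the other mapping entry, the sample data, and the `merge_record` helper are assumptions.

```python
# Attribute -> authoritative source. Only the "groups" entry comes from the slide;
# the "mail" entry is a made-up example.
AUTHORITATIVE = {
    "groups": "dce",
    "mail": "ldap",
}


def merge_record(sources, authoritative):
    """Build one record by taking each attribute from its authoritative source only.

    `sources` maps a source name to that source's view of the record.
    Values held by non-authoritative sources are ignored.
    """
    merged = {}
    for attribute, source in authoritative.items():
        source_record = sources.get(source, {})
        if attribute in source_record:
            merged[attribute] = source_record[attribute]
    return merged
```

If the authoritative source for groups later moves to the metadirectory, only the mapping changes; the merge logic stays the same.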
Process for requests
- How does a consumer make a request to the data custodian?
- What will the data be used for?
- Auditing access?
Naming Standards
- Important to choose names that do not break various technologies or directories
- Concept of enterprise names and entity code names (departmental)
- All-Campus-Staff vs. All-CIT-Staff, or all.campus.staff vs. all.cit.staff
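One way to keep names portable is to normalize them to a restricted character set and a single separator before they reach any directory. The rules below (lowercase letters and digits only, one separator character) are an assumption for illustration, not the standard the author adopted.

```python
import re

_SEPARATORS = re.compile(r"[.\-_\s]+")
_SAFE_PART = re.compile(r"^[a-z0-9]+$")


def normalize_group_name(name, separator="."):
    """Lowercase a group name and join its parts with one separator.

    Raises ValueError if any part contains characters outside a-z and 0-9,
    since those are the ones most likely to break some directory.
    """
    parts = [p for p in _SEPARATORS.split(name.strip().lower()) if p]
    unsafe = [p for p in parts if not _SAFE_PART.match(p)]
    if not parts or unsafe:
        raise ValueError(f"unsafe group name: {name!r}")
    return separator.join(parts)
```

Under these assumed rules, `normalize_group_name("All-Campus-Staff")` gives `all.campus.staff`, and passing `separator="-"` produces the hyphenated form for a directory that does not accept dots.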
Enterprise Directory Interaction
- Ensure all Enterprise Directories are downstream of the MetaDirectory data
- For the exceptions: how does the data make it back to the MetaDirectory to populate the other Enterprise Directories? And how often?
- Persistent IDs across enterprise applications; password synchronization?
Questions?
John Ball