
How does Microsoft approach change management communication? What happens when I have an outage? What is the Service Health Dashboard? What is the future.



3
- How does Microsoft approach change management communication?
- What happens when I have an outage?
- What is the Service Health Dashboard?
- What is the future direction of Office 365 Service communication?
- What is service continuity?
- What are Post Incident Reports?
- What does Microsoft do to make sure uptime is good?

4 What is service continuity?
Service continuity is an approach to implementing and validating a combination of preventive and recovery controls. Office 365 service continuity includes strategies to:
- Increase the availability of the service
- Build the ability to recover from disasters
- Continuously learn and improve the service
A good measure of service continuity is service uptime.

5 What does uptime mean to my organization?
The objective is to describe the risk of an outage to an individual customer based on the aggregate uptime of the service.
- Longer outages have a greater impact on the uptime percentage
- Outages that affect a greater number of users have a greater impact
- More severe outages, in terms of users or duration, lead to greater deviations from 100%
The Office 365 service level agreement expresses uptime as a formula (shown on the slide, not included in this transcript). The aggregate uptime of service components can be expressed similarly.
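As a hedged reconstruction of the kind of formula this slide refers to: the published Microsoft Online Services SLA, as far as I know, defines monthly uptime in user-minutes, as (user minutes - downtime) / user minutes x 100. The sketch below applies that formula. The function name, the 1,000-user tenant, and the outage sizes are illustrative assumptions, not figures from the presentation.

```python
def monthly_uptime_percentage(user_minutes: float, downtime_user_minutes: float) -> float:
    """Uptime as a percentage of the total user-minutes in the month."""
    if user_minutes <= 0:
        raise ValueError("user_minutes must be positive")
    if not 0 <= downtime_user_minutes <= user_minutes:
        raise ValueError("downtime must be between 0 and user_minutes")
    return (user_minutes - downtime_user_minutes) / user_minutes * 100.0


# Hypothetical tenant: 1,000 users over a 30-day month.
users = 1000
minutes_in_month = 30 * 24 * 60
total_user_minutes = users * minutes_in_month

# The same 60-minute outage, affecting 200 users versus all 1,000 users.
partial = monthly_uptime_percentage(total_user_minutes, 200 * 60)
full = monthly_uptime_percentage(total_user_minutes, 1000 * 60)
print(f"200 users affected:  {partial:.4f}%")
print(f"all users affected:  {full:.4f}%")
```

Because downtime is weighted by affected users, the wider outage pulls the percentage further from 100% even though both outages last the same time, which matches the bullets on this slide.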

6 Service Credits
Customers are eligible for Service Credits whenever monthly uptime falls below 99.9%. Service credits are calculated according to the following table:

Monthly Uptime Percentage    Service Credit
< 99.9%                      25%
< 99%                        50%
< 95%                        100%
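The tiers on this slide can be read as a simple lookup. The following is a minimal sketch; the function name is made up for illustration, and it assumes the lowest matching tier wins (uptime below 95% earns 100%, not 25%), which the slide implies but does not state.

```python
def service_credit_percent(monthly_uptime: float) -> int:
    """Return the service credit (in percent) for a monthly uptime percentage."""
    # Check the most severe tier first so the largest applicable credit wins.
    if monthly_uptime < 95.0:
        return 100
    if monthly_uptime < 99.0:
        return 50
    if monthly_uptime < 99.9:
        return 25
    return 0  # at or above 99.9%: no credit


for uptime in (99.95, 99.5, 98.0, 94.0):
    print(uptime, service_credit_percent(uptime))
```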

7
Redundancy: physical redundancy; data redundancy; functional redundancy
Resiliency: active load balancing; recovery across “failure domains” regularly tested
Human backup: automated recovery alerts; 24x7 on-call engineer; on-call engineers are core product group members
Distributed workloads: distributed components are more resilient; most failures are contained to a single service; service component isolation
Complexity avoidance and graceful degradation: standardized hardware; fully automated deployment; built-in workload management mechanisms
Inspectability and predictability: detailed logging and tracing; deep internal monitoring augmented by extensive outside-in monitoring diagnostics

8 Redundancy: physical
Office 365 provides physical redundancy at multiple levels to protect against hardware failures:
- Network and hardware redundancy
- Facilities and power redundancy
- At least 2 datacenters per region
- Physical redundancy at disk, NIC, power supply, and server levels
- Datacenters located in seismically safe zones

9 Redundancy: functional
Online and offline functionality provide continuity in case of:
- Cloud disruptions
- Network interruptions
- The realities of business life (airplane mode)

10 Resiliency
- Active load balancing to restructure the system against rare extreme load conditions
- Automated failover to healthy resources in response to hardware or software failures and monitoring alerts
- Human-initiated failover to healthy resources in response to service incidents and customer-reported incidents
- Recovery across “failure domains” tested regularly
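To make "failover to healthy resources" across failure domains concrete, here is a toy sketch. It is not how Office 365 implements failover; the `Resource` type, its fields, and the selection rule (prefer a healthy resource in a different failure domain, fall back to any healthy one) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    name: str
    failure_domain: str  # hypothetical label, e.g. a rack or datacenter
    healthy: bool


def pick_failover_target(failed: Resource, pool: list) -> Optional[Resource]:
    """Choose a healthy replacement, preferring a different failure domain."""
    candidates = [r for r in pool if r.healthy and r.name != failed.name]
    other_domain = [r for r in candidates if r.failure_domain != failed.failure_domain]
    chosen = other_domain or candidates  # fall back to the same domain if needed
    return chosen[0] if chosen else None


pool = [
    Resource("server-a", "dc1", healthy=False),
    Resource("server-b", "dc1", healthy=True),
    Resource("server-c", "dc2", healthy=True),
]
target = pick_failover_target(pool[0], pool)
print(target.name if target else "no healthy resource available")
```

Preferring a different failure domain reflects the idea on this slide that a fault taking out one domain should not also take out the failover target.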

11 Distributed workloads
- Separation of function with distributed functional components
- Loose coupling serves to further limit the scope and impact of most failures
- Service component isolation to avoid failure cascades
- Replication of directory data across services ensures a seamless experience
Diagram labels: SPO, EXO, Microsoft Online ID, Office 365 Portal, Office 365 Provisioning, Lync

12 Human backup
- Automated recovery actions
- 24x7 on-call engineer: “Human in loop”
- Rapid response and information collection
- Dedicated support teams


14
Service incident: Service-interrupting incidents
Planned maintenance: Planned service maintenance, including transitions/upgrades, repair, and update scenarios
Service alteration: Changes to service features, capabilities, or business terms of service
Account life cycle: Milestones in the subscription life cycle

15 Additional Channels Primary Channels

16 Status and description (SHD icon column not reproduced)
Investigating: Monitors have indicated a service anomaly and/or Microsoft has received reports of a potential service incident. Microsoft is currently investigating.
Service Interruption: Microsoft has confirmed that normal services are being impacted. Microsoft is taking immediate action to understand the cause of the failure and determine the best course of action to restore service.
Service Degradation: Services are still active, but service responsiveness and/or delivery times may be slower than usual. Microsoft is working to restore normal service responsiveness.
Restoring Service: Microsoft has isolated the likely cause of the incident and is in the process of restoring service.
Extended Recovery: Services are restored and may be slower than usual.
Service Restored: Normal system services have been restored.
False Positive: The service is healthy and a service incident did not actually occur.
Additional Information: There is additional information provided.
Normal Service: The service is healthy.


18 Service Health Dashboard
- First and best content
- Regional
- Updated hourly
- Emergency Broadcast System will automatically redirect customers to http://status.office365.com

19 Click on “View history for past 30 days”

20 Click on “Incident ID MO2708”

21 RSS Feed
- Regional
- Tenant Admin
- Points to SHD
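An admin who subscribes to the feed can read it with any RSS client or a few lines of code. The sketch below parses a generic RSS 2.0 document with Python's standard library. The feed content is a made-up sample (the real feed URL and its fields are not given in the presentation), and `parse_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up sample in RSS 2.0 shape; the real SHD feed's fields may differ.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample service health feed</title>
    <item>
      <title>MO2708: Service degradation</title>
      <link>http://status.office365.com</link>
      <description>Investigating</description>
    </item>
    <item>
      <title>Planned maintenance</title>
      <link>http://status.office365.com</link>
      <description>Scheduled</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml: str) -> list:
    """Return one dict per <item>, with title, link, and description."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


items = parse_items(SAMPLE_FEED)
for entry in items:
    print(entry["title"], "->", entry["link"])
```

Each item links back to the Service Health Dashboard, consistent with the "Points to SHD" note on this slide.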

22 Community
http://community.office365.com
- Forums are a helpful resource
- TechNet or a local marketing site is used in countries without a full community site

23 Customer Email
- For a limited set of service incidents
- Explanation of incident
- Localized content

24 Twitter @Office365


26 Post Incident Reports (PIRs)
- Are published for service availability issues that span multiple customers
- Available within 5 business days
- Downloadable document accessible from SHD
- A PIR includes: Incident Information, Summary, Customer Impact, Incident Start Date and Time, Root Cause, Next Steps
- 30-day historical view in SHD
- New survey feedback option
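To see what "within 5 business days" means on a calendar, here is a small sketch that counts forward over weekdays. It makes two assumptions the slide does not state: the clock starts the day after a reference date of your choosing, and only Saturdays and Sundays are skipped (public holidays are ignored). The helper name and the dates are illustrative.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (Mon-Fri only)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


# Hypothetical reference dates: a Thursday and a Friday.
thursday = date(2013, 6, 6)
friday = date(2013, 6, 7)
print(add_business_days(thursday, 5))
print(add_business_days(friday, 5))
```

In both cases the window spans one weekend, so 5 business days corresponds to 7 calendar days.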

27 Click on “Post-incident report published”


30
- Focus is on future protection from similar issues
- Next steps determined
- Post-incident review within 5 days
- Service review within 30 days
- Improvement
- Solid next steps
- Tracked through delivery
- 1 immediate next step in PIR
- 10 additional changes in comprehensive plan

31
Type: Planned Maintenance Update
Description: Notification 5 business days prior to planned service maintenance. Notification includes start and end time.
Channel: Service Health Dashboard RSS Admin Feed (for subscribed admins)


33 Primary service alteration communication channel
- Tailored to your environment: only those actions you must take appear

34 Supporting service alteration communication channel
Nearly every task has an FAQ covering:
- The technical task required
- Why the change is important
- What happens if you don’t take action


37
Web browser
- Best experience: Latest version of Internet Explorer
- Recommended: Current and previous versions of Internet Explorer; latest versions of Chrome, Firefox and Safari
Office client
- Best experience: Office 365 ProPlus
- Recommended: Any Office client in mainstream support
- Not recommended: Office clients in extended support
Commercially reasonable support; 12 months’ notice of substantial user experience degradation
Operating system
- Best experience: Latest version of Windows or MacOS
- Supported: Any supported version of Windows or MacOS

38
- Transparent non-customer-impacting service maintenance
- More detailed information and programmatic approach around service updates and service incidents
- Tenant Level Reporting
- Service Health Dashboard
- Customer Preview Programs
- In Product Notifications


40 http://microsoft.com/msdn www.microsoft.com/learning http://channel9.msdn.com/Events/TechEd http://microsoft.com/technet


43 Thank you!
