Location of the theme gray Location of the theme blue ZTE Proprietary On Reliability of COTS Hardware Dr. Li Mo Chief Architect, CTO Group.

Slides:



Advertisements
Similar presentations
Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.
Advertisements

New Timing Distribution Mechanism TICTOC WG, IETF 71th Philadelphia, USA draft-ji-tictoc-new-timing-distribution-mechanism-00.txt Kuiwen Ji
OVERVIEW Virtualization Defined Server Virtualization
Distributed Data Processing
Distributed Systems Major Design Issues Presented by: Christopher Hector CS8320 – Advanced Operating Systems Spring 2007 – Section 2.6 Presentation Dr.
VSE Corporation Proprietary Information
Fully Compliant Cloud Based Repository Lessons along the way Mark Ellis, Electronic Records Management Consultant April 8th, 2014.
Relex Reliability Software “the intuitive solution
Data Center Site Infrastructure Tier Standard: Topology Dr. Natheer Khasawneh Sadeem Al-Saeedi (8276)
Availability in Globally Distributed Storage Systems
© 2009 EMC Corporation. All rights reserved. Introduction to Business Continuity Module 3.1.
A Server-less Architecture for Building Scalable, Reliable, and Cost-Effective Video-on-demand Systems Jack Lee Yiu-bun, Raymond Leung Wai Tak Department.
NETE4631:Capacity Planning (3)- Private Cloud Lecture 11 Suronapee Phoomvuthisarn, Ph.D. /
Chapter 19: Network Management Business Data Communications, 4e.
1 Northwestern University Information Technology Data Center Elements Research and Administrative Computing Committee Presented October 8, 2007.
Documenting the Existing Network - Starting Points IACT 418 IACT 918 Corporate Network Planning.
Introduction to the new mainframe: Large-Scale Commercial Computing © Copyright IBM Corp., All rights reserved. Chapter 1: The new mainframe.
© 2014 ScaleArc. All Rights Reserved. 1 Creating an Agile Data Environment for Apps in the Cloud Summer 2014.
Understanding Network Failures in Data Centers: Measurement, Analysis and Implications Phillipa Gill University of Toronto Navendu Jain & Nachiappan Nagappan.
VIRTUALIZATION AND YOUR BUSINESS November 18, 2010 | Worksighted.
Steve Craker K-12 Team Lead Geoff Overland IT and Data Center Focus on Energy Increase IT Budgets with Energy Efficiency.
Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland t CF CERN Business Continuity Overview Wayne Salter HEPiX April 2012.
Enabling High Efficiency Power Supplies for Servers : Update on industry, government and utility initiatives Brian Griffith System Power Architect Intel.
ATIF MEHMOOD MALIK KASHIF SIDDIQUE Improving dependability of Cloud Computing with Fault Tolerance and High Availability.
CHAPTER FIVE Enterprise Architectures. Enterprise Architecture (Introduction) An enterprise-wide plan for managing and implementing corporate data assets.
Lecture 13 Fault Tolerance Networked vs. Distributed Operating Systems.
Redundant Array of Independent Disks
Managing Multi-User Databases AIMS 3710 R. Nakatsu.
data center classifications or ratings
15 Maintaining a Web Site Section 15.1 Identify Webmastering tasks Identify Web server maintenance techniques Describe the importance of backups Section.
1 An SLA-Oriented Capacity Planning Tool for Streaming Media Services Lucy Cherkasova, Wenting Tang, and Sharad Singhal HPLabs,USA.
Version 4.0. Objectives Describe how networks impact our daily lives. Describe the role of data networking in the human network. Identify the key components.
Virtualization. Virtualization  In computing, virtualization is a broad term that refers to the abstraction of computer resources  It is "a technique.
Sofia, Bulgaria | 9-10 October SQL Server 2005 High Availability for developers Vladimir Tchalkov Crossroad Ltd. Vladimir Tchalkov Crossroad Ltd.
A novel approach of gateway selection and placement in cellular Wi-Fi system Presented By Rajesh Prasad.
1 © A. Kwasinski, 2015 Cyber Physical Power Systems Fall 2015 Power in Communications.
1 Drivers for Building Life Cycle Integrations Jim Sinopoli.
Business Data Communications, Fourth Edition Chapter 11: Network Management.
Clustering In A SAN For High Availability Steve Dalton, President and CEO Gadzoox Networks September 2002.
Chapter 5 McGraw-Hill/Irwin Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved.
OSIsoft High Availability PI Replication
CISN: Draft Plans for Funding Sources: OES/FEMA/ANSS/Others CISN-PMG Sacramento 10/19/2004.
NETE4631: Network Information System Capacity Planning (2) Suronapee Phoomvuthisarn, Ph.D. /
Use Cases for High Bandwidth Query and Control of Core Networks Greg Bernstein, Grotto Networking Young Lee, Huawei draft-bernstein-alto-large-bandwidth-cases-00.txt.
Network management Network management refers to the activities, methods, procedures, and tools that pertain to the operation, administration, maintenance,
Teknologi Pusat Data 12 Data Center Site Infrastructure Tier Standard: Topology Ida Nurhaida, ST., MT. FASILKOM Teknik Informatika.
CLOUD COMPUTING WHAT IS CLOUD COMPUTING?  Cloud Computing, also known as ‘on-demand computing’, is a kind of Internet-based computing,
1 CEG 2400 Fall 2012 Network Servers. 2 Network Servers Critical Network servers – Contain redundant components Power supplies Fans Memory CPU Hard Drives.
OSIsoft High Availability PI Replication Colin Breck, PI Server Team Dave Oda, PI SDK Team.
PERFORMANCE MANAGEMENT IMPROVING PERFORMANCE TECHNIQUES Network management system 1.
Austin Energy Elevated Requests for High Reliability Service South West Electric Distribution Exchange (SWEDE) May 8, 2015 David Tomczyszyn, P.E. – Distribution.
UNIT 7 MAINTENANCE OF DIGITAL SWITCHING SYSTEMS 9/23/2016.
Making Choice Possible in the Acquisition of Machinery Control Systems Program Executive Office Integrated Warfare Systems (PEO IWS)
Tier Certification Explained
Enterprise Architectures
Maintenance strategies
RENEWABLES AND RELIABILITY
Chapter 19: Network Management
Designing the Physical Architecture
Providing Application High Availability
High Availability Linux (HA Linux)
IOT Critical Impact on DC Design
MPE-PE Section Meeting
Operational procedures and tools for scheduled shutdowns at CC-IN2P3
Maximum Availability Architecture Enterprise Technology Centre.
Outline Introduction What is Data Center? Storage Equipment in DC.
GES SYSTEM THE IMPORTANCE OF GES SYSTEM IN BUILDING
CLUSTER COMPUTING.
Presentation transcript:

Location of the theme gray Location of the theme blue ZTE Proprietary On Reliability of COTS Hardware Dr. Li Mo Chief Architect, CTO Group

Agenda Chapter Title: Type: Calibri bold Size: 28-36pt Color: The theme blue ZTE Proprietary Differences between “Telecom Hardware” and COTS Hardware Analysis Framework Enhancing Application Reliability via Backups – Theoretical With one backup (1+1) With Different Types of Data Centers (type 1 – type 4) Remarks

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 3 ZTE Proprietary Comparing “Telecom Hardware” and COTS (Commercial of the Shelf) Hardware COTS Hardware May have smaller “mean time between failure” (MTBF) Relative smaller “mean time to repair” (MTTR) COTS procedures for software upgrade, patching, and maintenance contribute more to “scheduled down time” Different grade of reliability for data centers Strong fault detection and fault isolation capabilities at hardware level Well established traditions on software upgrade, patching, and maintenance Reliably Central Office assumed Telecom Hardware

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 4 ZTE Proprietary New Item to Consider for COTS – Site Down Time COTS Hardware Site downtime (scheduled, non- scheduled) with varying duration and varying intervals COTS Hardware May have smaller “mean time between failure” (MTBF) Relative smaller “mean time to repair” (MTTR) COTS procedures for software upgrade, patching, and maintenance contribute more to “scheduled down time” Different grade of reliability for data centers Type % Type % Type % Type % Types of DC Reliability

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 5 ZTE Proprietary Common Mechanisms to Improve Reliability – Application Level Backup Master Server Slave Server (Backup Server) Slave Server (Backup Server) Data Synchronization Benefits of Application Level Backup: Against failures Facilitate maintenance and upgrade Benefits of Application Level Backup: Against failures Facilitate maintenance and upgrade Various Failures Hardwar Failure Hypervisor/OS Failure Application Failure Various Failures Hardwar Failure Hypervisor/OS Failure Application Failure

Agenda Chapter Title: Type: Calibri bold Size: 28-36pt Color: The theme blue ZTE Proprietary Differences between “Telecom Hardware” and COTS Hardware Analysis Framework Enhancing Application Reliability via Backups – Theoretical With one backup (1+1) With Different Types of Data Centers (type 1 – type 4) Remarks

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 7 ZTE Proprietary Availability: P1 Availability: P2 COTS Server VM COTS Server VM vSwitch 1 vSwitch 2 System availability P1 * P2

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 8 ZTE Proprietary Introducing COTS – Focusing on Server Part of Reliability Providing 5 9s reliability with 4 9s availability per networking equipment

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 9 ZTE Proprietary Various Backup Schemes One Backup Master Server Slave Server Data Synchronization Other Apps Data Synchronization Two Backups Master Server Slave Server (Backup 1) Slave Server (Backup 1) Slave Server (Backup 2) Slave Server (Backup 2) Other Apps Data Synchronization Load Sharing Master Server 1 Master Server N N Backup Servers Other Apps Data Synchronization Master Server 1 Slave Server 2 Other Applicat ions Master Server 2 Slave Server 1 Other Applicati ons 1:1 Load sharing

Agenda Chapter Title: Type: Calibri bold Size: 28-36pt Color: The theme blue ZTE Proprietary Differences between “Telecom Hardware” and COTS Hardware Analysis Framework Enhancing Application Reliability via Backups – Theoretical With one backup (1+1) With Different Types of Data Centers (type 1 – type 4) Remarks

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 11 ZTE Proprietary A Simple Example – Markov State Transition Model Legend: M M Server is good M M Server is badCurrent Master M M λ μ M M M M S0S1 1-λ 1- μ Chapman – Kolmogorov Equation or The Result

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 12 ZTE Proprietary S1S1 S1S1 S2S2 S2S2 S1 S1S1 S1S1 S2S2 S2S2 S2 S1S1 S1S1 S2S2 S2S2 S0 S1S1 S1S1 S2S2 S2S2 S3 μ μ μ (1-λ)λ(1-s)(1-λ)λ λ λ λ 2 +λ(1- λ)s Legend: S S Server is good S S Server is bad Current Master S S 1+1 System – Markov State Transition Model Inverse of Server MTBF Inverse of Server MTTR Silent Error Probability

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 13 ZTE Proprietary Solving the Global Balancing Equation for Getting overall System Availability (1+1)

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 14 ZTE Proprietary μ MTTR=12 minMTTR=6 min

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 15 ZTE Proprietary Differences in Availability between Theoretical Data and Simulation for 1+1 Backup Case MTTR=6 minutes

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 16 ZTE Proprietary Improvement of 1+2 (dual backup) v.s. 1+1 (single backup) Defining the “percentage” of improvement as

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 17 ZTE Proprietary Revertive Maintenance Data Synchronization Master Server Other Apps Slave Server Other Apps DC A DC B Master Server Slave Server (Backup) Slave Server (Backup) Other Apps DC B In Maintenance DC B In Maintenance DC A

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 18 ZTE Proprietary The Impact of Site Maintenance is Negligible (Revertive)

Agenda Chapter Title: Type: Calibri bold Size: 28-36pt Color: The theme blue ZTE Proprietary Differences between “Telecom Hardware” and COTS Hardware Analysis Framework Enhancing Application Reliability via Backups – Theoretical With one backup (1+1) With Different Types of Data Centers (type 1 – type 4) Remarks

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 20 ZTE Proprietary Four Types of Data Centers (ANSI/TIA-942) Type 1 Single non-redundant distribution path serving the IT equipment. Non- redundant capacity components. Basic site infrastructure with expected availability of %. Type 2 Meets or exceeds all type 1 requirements. Redundant site infrastructure capacity components with expected availability of %. Type 3 Meets or exceeds all type 2 requirements. Multiple independent distribution paths serving the IT equipment. All IT equipment must be dual- powered and fully compatible with the topology of a site's architecture. Concurrently maintainable site infrastructure with expected availability of %. Type 4 Meets or exceeds all type 3 requirements. All cooling equipment is independently dual-powered, including chillers and heating, ventilating and air-conditioning (HVAC) systems Fault-tolerant site infrastructure with electrical power storage and distribution facilities with expected availability of %

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 21 ZTE Proprietary 1+1 System with Site Error – Markov State Transition Model (Revertive) S S S S S0 S S S S S2 μ -(1-λ)λs+2(1-λ)λ S S S S S1 μ λ λ 2 +λ(1- λ)s S6 S S S S S S S S S3 S S S S S5 μ -(1-λ)λs+2(1-λ)λ S S S S S4 μ λ λ 2 +λ(1- λ)s

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 22 ZTE Proprietary Service Impact Error Probability for Various Data Centers 1x /(DC MTBF)

Agenda Chapter Title: Type: Calibri bold Size: 28-36pt Color: The theme blue ZTE Proprietary Differences between “Telecom Hardware” and COTS Hardware Analysis Framework Enhancing Application Reliability via Backups – Theoretical With one backup (1+1) With Different Types of Data Centers (type 1 – type 4) Remarks

© ZTE Corporation. All rights reserved. Title: Type: Calibri bold Size: 32-40pt Color: The theme blue Text (1-5 Level): Type: Calibri Size: 20-28pt Color: Black Page 24 ZTE Proprietary Remarks COTS Hardware May have smaller “mean time between failure” (MTBF) Relative smaller “mean time to repair” (MTTR) COTS procedures for software upgrade, patching, and maintenance contribute more to “scheduled down time” Different grade of reliability for data centers It is possible to provide 5 9 availability with COTS hardware with application level backup The Impact of MTTR is not significant if it is reasonably small (e.g. less than 10 minutes) for typical hardware MTBF The impact of data center scheduled maintenance is negligible 5 9 availability with can only be achieved via type 3 and type 4 data centers

© ZTE Corporation. All rights reserved. Thanks!