Communication Software Development and Management. Course OD601. Class hours: 32. Credits: 2. Lecturer: 罗文彬. All rights reserved.

Class Subjects
- Communication Overview
- System Architecture Overview
- Performance and Reliability
- Operation, Administration, & Maintenance
- Development Methodology
- ISO9000/TL9000
- CMMI
- Project Management

Network System Characteristics
High performance and reliability are always key factors in a network system, because they directly affect the economics of network operation: reducing downtime and increasing availability increases network revenue. To ensure availability, network operators typically require equipment vendors to provide products with five-nines (99.999%) or six-nines (99.9999%) availability.
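
To make the availability targets concrete, the following minimal sketch converts a number of nines into the downtime it allows per year. The function name and the 365-day year are assumptions made for illustration; they are not part of the lecture.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year (leap years ignored)


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines.

    Five nines means 99.999% availability, i.e. an unavailability of 10^-5.
    """
    unavailability = 10.0 ** -nines
    return unavailability * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(5)  # about 5.26 minutes per year
six_nines = downtime_minutes_per_year(6)   # about 0.53 minutes (roughly 32 seconds)
```

Under these assumptions, each additional nine cuts the yearly downtime budget by a factor of ten.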

System Performance
A system consists of hardware and software, and their combined performance determines overall system performance.
(Figure: CPU, memory, I/O card.)

Performance Factors
Software factors:
- Software logic in high-runner (most frequently executed) functions
Hardware factors:
- CPU
- Memory
- I/O card
- Disk

High-Runner Software Functions
- System calls
- Message parsing
- Call logic
- Data access
- I/O
- Disk access
- Threads and processes

System Call
A process in user mode uses the CPU continuously. A system call requires the process to enter kernel mode to acquire exclusive system resources; when the call completes, the process returns to user mode. The lecture calls this transition a "process context switch" (strictly speaking, it is a user/kernel mode switch), and it costs extra CPU time. A process therefore uses the CPU much more efficiently when it makes fewer system calls.

Buffered I/O
Data has to be passed between processes to accomplish the desired software tasks. Buffering the data to be passed can greatly reduce the number of inter-process communication (IPC) operations. Because IPC involves system calls, buffered I/O is critical to real-time performance.
(Figure: the sending process writes to shared memory and the receiving process reads from it.)
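
The buffering idea can be sketched as a small batcher that collects several messages and hands them to the IPC layer in one call. `MessageBatcher`, its `batch_size` parameter, and the newline framing are hypothetical names and choices made for this illustration; the `send` callable stands in for whatever IPC primitive (and therefore system call) the real system uses.

```python
from typing import Callable, List


class MessageBatcher:
    """Collects messages and passes them on in batches (illustrative only)."""

    def __init__(self, send: Callable[[bytes], None], batch_size: int = 4) -> None:
        self._send = send              # stand-in for one IPC operation
        self._batch_size = batch_size  # assumed tuning parameter
        self._pending: List[bytes] = []

    def put(self, message: bytes) -> None:
        """Buffer one message; flush when the batch is full."""
        self._pending.append(message)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far as a single IPC operation."""
        if self._pending:
            self._send(b"\n".join(self._pending))
            self._pending.clear()


sent: List[bytes] = []  # records each simulated IPC operation
batcher = MessageBatcher(sent.append, batch_size=4)
for i in range(10):
    batcher.put(f"msg{i}".encode())
batcher.flush()  # push out the final partial batch
```

Here ten messages cost three send operations instead of ten. The trade-off is latency: a message may wait in the buffer until the batch fills or is flushed, so a real system would likely also flush on a timer.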

Database Optimization
Database performance is critical to overall system performance. Commercial databases usually provide tools for optimizing database performance, and these should be run on a regular basis. Databases that can run in memory, such as TimesTen, MySQL, and Berkeley DB, are commonly used real-time database products.

Hardware Resources: CPU, Memory, Disk
- Under normal traffic, system resources should be evenly utilized, up to 40%.
- Under overload traffic, system resources should be evenly utilized, up to 80%.
- High-runner processes or threads should be spread evenly across the CPUs.
- Disk I/O should be evenly distributed across all disks.
- The configuration that gives optimal throughput has to be found by testing in a lab environment with simulated traffic.

Threads/Processes & CPU
The number of threads in each process should be proportional to the CPU time that process uses. For example, if Proc A, B, and C use CPU time in the ratio 1 : 2 : 1.5, the ratio of threads in Proc A, B, and C should be 2 : 4 : 3. The number of threads also depends on the characteristics of the input messages: more threads are needed when each message takes longer to process. The same thread/process ratio should be replicated on every CPU. With two CPUs, for example, either create two identical sets of processes or use one set of processes with double the number of threads in the same ratio.
(Figure: Proc A, Proc B, and Proc C running on CPU 1 and CPU 2.)
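
The 1 : 2 : 1.5 to 2 : 4 : 3 conversion can be reproduced with a short sketch that scales a CPU-time ratio to the smallest whole-number thread counts and then multiplies by the number of CPUs. The function name `thread_counts` and the use of strings for the ratio terms are choices made for this example only.

```python
from fractions import Fraction
from functools import reduce
from math import gcd
from typing import List, Sequence


def thread_counts(cpu_ratio: Sequence[str], cpus: int = 1) -> List[int]:
    """Smallest whole-number thread counts matching a CPU-time ratio.

    Ratio terms are passed as strings ("1.5") so they convert to exact fractions.
    """
    fractions = [Fraction(term) for term in cpu_ratio]
    # Scale every term by the least common multiple of the denominators.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (f.denominator for f in fractions), 1)
    whole = [int(f * scale) for f in fractions]
    # Divide out any common factor so the counts are as small as possible.
    common = reduce(gcd, whole)
    return [n // common * cpus for n in whole]


one_cpu = thread_counts(["1", "2", "1.5"])           # [2, 4, 3]
two_cpus = thread_counts(["1", "2", "1.5"], cpus=2)  # [4, 8, 6]
```

The two-CPU result corresponds to the "one set of processes with double the number of threads" option described in the slide.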

Memory
Keeping data in memory is critical to system performance. Optimizing memory usage keeps more data in memory and can improve performance significantly. Data can be compressed to reduce memory usage, but compressing and decompressing it costs CPU time. Memory locking prevents multiple CPUs from being fully utilized, because a memory lock is a shared resource in a multi-CPU system.
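
The memory-versus-CPU trade-off of compression can be seen with Python's standard `zlib` module. The record contents below are made up for the example, and the achievable ratio depends entirely on how repetitive the real data is.

```python
import zlib

# Hypothetical, highly repetitive in-memory records (illustration only).
records = b"subscriber=0000000000;state=IDLE;" * 100

packed = zlib.compress(records, 6)  # costs CPU time, saves memory
restored = zlib.decompress(packed)  # costs CPU time again on every read

saved_bytes = len(records) - len(packed)
```

Whether the saving is worth it depends on how often the data is read, since every access to compressed data pays the decompression cost.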

Disk
- Minimizing disk I/O is critical to real-time performance.
- Disk head movement adds 5-10 ms of delay, which is significant for real-time performance.
- Reducing disk head movement by buffering I/O can improve system performance significantly.
- Using all the disks in parallel can improve I/O throughput significantly.
- Character I/O versus block I/O.
- Disk array versus mirrored disks.
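
A rough worked calculation shows why the 5-10 ms head movement matters. The sketch assumes every I/O costs exactly one head movement, ignores rotational latency and transfer time, and assumes the load spreads perfectly across disks; the function name is made up for the example.

```python
def max_random_ios_per_second(seek_ms: float, disks: int = 1) -> float:
    """Upper bound on random I/Os per second when each I/O needs one seek."""
    return disks * 1000.0 / seek_ms


fast_seek = max_random_ios_per_second(5)            # 200 I/Os per second
slow_seek = max_random_ios_per_second(10)           # 100 I/Os per second
four_disks = max_random_ios_per_second(10, disks=4) # 400 I/Os per second
```

Under these simplifying assumptions a single disk sustains only 100 to 200 random I/Os per second, which is why buffering writes and spreading I/O across disks both help.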

Performance Tuning
Performance tuning is an important step in network product development. Profile CPU usage to identify the functions that consume the most CPU time. Optimizing the top 10 CPU-consuming functions can improve system performance significantly. Performance benchmarking is a regular activity for every software release.
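
As one way to find the top CPU-consuming functions, the sketch below uses Python's standard `cProfile` and `pstats` modules. The lecture does not name a profiler, so the tool choice is an assumption, and `parse_message`, `log_event`, and `handle_call` are hypothetical stand-ins for real high-runner functions.

```python
import cProfile
import io
import pstats
from typing import Callable


def cpu_report(workload: Callable[[], object], limit: int = 10) -> str:
    """Profile `workload` and return a report of its top functions by own CPU time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        workload()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # "tottime" ranks functions by time spent in the function itself.
    stats.strip_dirs().sort_stats("tottime").print_stats(limit)
    return out.getvalue()


def parse_message() -> int:
    """Deliberately expensive stand-in for a high-runner function."""
    total = 0
    for i in range(200_000):
        total += i % 7
    return total


def log_event() -> None:
    """Cheap stand-in for a function that is not worth optimizing."""


def handle_call() -> None:
    parse_message()
    log_event()


report = cpu_report(handle_call)  # text table, most expensive function first
```

Running the same report on every release gives a simple benchmark for spotting functions that have become more expensive.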

System Reliability
A network system consists of both software and hardware, so increasing system reliability means improving both software and hardware reliability. Software faults contribute much more system downtime than hardware faults, so improving software reliability is the more effective way to improve system reliability.

Software Reliability (1)
Software reliability is determined by the downtime caused by software bugs. Improving it has three aspects:
1) Reduce the number of bugs.
2) Reduce the downtime caused by software bugs.
3) Reduce bug-fixing time.

Software Reliability (2)
1) Reduce the number of bugs
A software development process with quality control is the most effective way to reduce the number of software bugs and ensure software quality. The software development process is discussed in more detail later.

Software Reliability (3)
2) Reduce downtime caused by software bugs
A process can lose its heartbeat because:
- the process has died, or
- the process is too busy (for example, stuck in an infinite loop).
Recovery levels:
- Level 1: INIT kills the process that has lost consecutive heartbeats and re-initializes it.
- Level 2: INIT re-initializes the process and its global resources.
- Level 3: INIT restarts the whole system.
- Level 4: INIT triggers an OS reboot.
- Level 5: power off, then power on.
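
The escalation through the five recovery levels can be sketched as a small heartbeat monitor. The lecture does not say when INIT moves from one level to the next, so this sketch assumes it escalates each time the process again misses `miss_limit` consecutive heartbeats after a recovery, and that a received heartbeat resets the escalation. The class name, the limit of three, and both rules are assumptions.

```python
from dataclasses import dataclass

# Recovery actions per level, paraphrased from the slide.
RECOVERY_ACTIONS = {
    1: "kill and re-initialize the process",
    2: "re-initialize the process and its global resources",
    3: "restart the whole system",
    4: "reboot the OS",
    5: "power off, then power on",
}


@dataclass
class HeartbeatMonitor:
    miss_limit: int = 3  # assumed number of consecutive misses before recovery
    missed: int = 0      # consecutive intervals without a heartbeat
    level: int = 0       # last recovery level applied (0 = healthy)

    def heartbeat(self) -> None:
        """A heartbeat arrived: the process is healthy again."""
        self.missed = 0
        self.level = 0

    def interval_elapsed(self) -> int:
        """Call once per interval with no heartbeat.

        Returns the recovery level to apply now, or 0 if no action is due yet.
        """
        self.missed += 1
        if self.missed < self.miss_limit:
            return 0
        self.missed = 0
        self.level = min(self.level + 1, max(RECOVERY_ACTIONS))
        return self.level


monitor = HeartbeatMonitor(miss_limit=3)
first = [monitor.interval_elapsed() for _ in range(3)]   # [0, 0, 1]
second = [monitor.interval_elapsed() for _ in range(3)]  # [0, 0, 2]
```

Starting with the cheapest recovery keeps downtime short for the common case, and the more disruptive levels are reached only if the lighter ones fail to restore the heartbeat.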

Software Reliability (4)
3) Reduce bug-fixing time
Error messages should be written to the log file when unexpected software events are detected, such as an unexpected incoming message or unexpected parameters in an incoming message. The code should cover every logical branch of each "if..then..else.." statement, and an error message should be written to the log file whenever an unexpected branch is reached.
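
A minimal sketch of this logging discipline, using Python's standard `logging` module: every branch is handled explicitly, and the unexpected ones log enough context to reproduce the problem. The message types, the `called_number` parameter, and the logger name are invented for the example.

```python
import logging

logger = logging.getLogger("call_processing")  # hypothetical logger name


def handle_message(msg_type: str, params: dict) -> str:
    """Dispatch an incoming message, logging every unexpected case."""
    if msg_type == "SETUP":
        if "called_number" not in params:
            # Unexpected parameters in an expected message.
            logger.error("SETUP without called_number: params=%r", params)
            return "rejected"
        return "setup"
    elif msg_type == "RELEASE":
        return "release"
    else:
        # The branch that should never be reached still gets a log entry.
        logger.error("unexpected message type %r: params=%r", msg_type, params)
        return "ignored"
```

Including the offending message type and parameters in the log entry is what shortens bug-fixing time, because the failure can be traced without having to reproduce it first.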

Hardware Reliability (1)
Today's hardware technology can remove almost all hardware defects at the testing stage. Hardware faults are usually caused by components failing at random for environmental reasons such as dust, static, vibration, and temperature. Ways to increase hardware reliability:
1) Hardware redundancy
2) Hot-swappable hardware components
3) Spare-parts inventory for hardware replacement

Hardware Redundancy
Hardware redundancy is the most effective way of increasing system reliability from both the software and the hardware perspective.
(Figure: redundant components A, B, C, and D, labeled with availability figures 99.9% and 99.999%.)
Probability that component A fails: ~A = 0.001
Probability that components A and B fail together: 0.001 * 0.001
Probability that components A, B, and C fail together: 0.001 * 0.001 * 0.001

N+K Redundancy
(Figure: redundant components A, B, C, and D.)
Assume each component can process X amount of network traffic; with 4 identical components, the total traffic that can be processed is 4X.
If one component is reserved for redundancy, the system should be able to handle 3X of traffic with a probability of ???
If two components are reserved for redundancy, the system should be able to handle 2X of traffic with a probability of ???
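
One way to approach the two questions is a binomial calculation: the system keeps its rated capacity as long as no more than K of the N components are down. The sketch assumes that components fail independently and that each is 99.9% available. Both are assumptions, and the slide itself leaves the answers open.

```python
from math import comb


def capacity_probability(n: int, k: int, availability: float) -> float:
    """Probability that at most k of n independent components are down."""
    unavailability = 1.0 - availability
    return sum(
        comb(n, failed) * unavailability**failed * availability**(n - failed)
        for failed in range(k + 1)
    )


# Assumed per-component availability of 99.9%.
handle_3x = capacity_probability(4, 1, 0.999)  # 3+1: at most one component down
handle_2x = capacity_probability(4, 2, 0.999)  # 2+2: at most two components down
```

Under these assumptions, reserving more components for redundancy trades capacity for a higher probability of carrying the rated load.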

Reliability Model
(Figure: a three-layer system with components A and B in layer 1, components C, D, and E in layer 2, and components F and G in layer 3.)
Layer 1 availability: X = 1 - (~A * ~B)
Layer 2 availability: Y = 1 - (~C * ~D * ~E)
Layer 3 availability: Z = 1 - (~F * ~G)
System availability = X * Y * Z
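
The layered model translates directly into code: a layer is unavailable only if all of its redundant components fail, and the system is available only if every layer is. The failure probability of 0.001 per component is an assumed value for illustration, and the function names are made up for the example.

```python
from math import prod
from typing import Sequence


def layer_availability(failure_probs: Sequence[float]) -> float:
    """1 - (~A * ~B * ...): the layer fails only if every component fails."""
    return 1.0 - prod(failure_probs)


def system_availability(layers: Sequence[Sequence[float]]) -> float:
    """X * Y * Z: every layer must be available."""
    return prod(layer_availability(layer) for layer in layers)


p_fail = 0.001  # assumed failure probability of each component
layers = [
    [p_fail, p_fail],          # layer 1: components A, B
    [p_fail, p_fail, p_fail],  # layer 2: components C, D, E
    [p_fail, p_fail],          # layer 3: components F, G
]
availability = system_availability(layers)
```

With these numbers the two-component layers dominate the result, which suggests that adding redundancy to the thinnest layer gives the largest gain.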

Hardware Evolution
A spare-parts inventory is needed so that failed hardware can be replaced as soon as possible. Today's commercial hardware usually has a Mean Time Between Failures (MTBF) of around 4.5 years. Hardware technology becomes obsolete in 5 years, and commercial hardware is discontinued in 5 years. The software system should therefore be easy to port to the latest hardware, to make the best use of the hardware technology curve.

ATCA v2 Hardware Configuration
Core chassis features:
- 19" 14-slot rack-mount, 11U
- Dedicated 15th front slot for dual shelf manager
- SA Forum OpenHPI shelf manager
- Dual-star fabric backplane
- Front-access fan trays and dust filters
- ETSI and NEBS Level 3
14 single-processor SBC boards (Rouzic):
- Dual-core 2.16 GHz processor
- 8 GB memory

Hardware Deployment View
(Figure: chassis layout showing PEM A, PEM B, Pilot Blades, Switch Blades, and BE CPU Blades.)
Total capacity: 8000 TPS, 11M subscribers, 1+1 redundancy.