
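Lecture by 罗文彬 (Luo Wenbin). All Rights Reserved.
Communication Software Development and Management, Course OD601. Class hours: 32. Credits: 2. Instructor: 罗文彬 (Luo Wenbin).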



2. Class Subjects
- Communication Overview
- System Architecture Overview
- Performance and Reliability
- Operation, Administration, & Maintenance
- Development Methodology
- ISO9000/TL9000
- CMMI
- Project Management

3. Network System Characteristics
High performance and reliability are always key factors in a network system, and they have a direct impact on the economics of network operation: reducing network downtime and increasing network availability increases network revenue. To ensure availability, network operators typically require equipment vendors to provide products with five nines (99.999%) or six nines (99.9999%) of availability.
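To make "five nines" and "six nines" concrete, the short Python sketch below converts a number of nines into the downtime allowed per year. It assumes a 365-day year and treats all downtime alike (no separate budget for planned maintenance); the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def annual_downtime_minutes(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines.

    For example, nines=5 means 99.999% availability, i.e. the system may be
    unavailable for a fraction 10**-5 of the year.
    """
    unavailability = 10 ** -nines
    return unavailability * MINUTES_PER_YEAR


for n in (3, 5, 6):
    minutes = annual_downtime_minutes(n)
    print(f"{n} nines: {minutes:8.2f} min/year ({minutes * 60:8.1f} s)")
```

Five nines leaves roughly 5.3 minutes of downtime per year; six nines leaves roughly 32 seconds.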

4. System Performance
A system consists of hardware and software; their combined performance determines the overall system performance.
[Figure: CPU, Memory, I/O Card]

5. Performance Factors
Software factors: software logic in high-runner functions (the most frequently executed code paths)
Hardware factors: CPU, memory, I/O card, disk

6. High-Runner Software Functions
- System calls
- Message parsing
- Call logic
- Data access
- I/O
- Disk access
- Threads and processes

7. System Calls
A process running in user mode uses the CPU continuously. A system call requires the process to enter kernel mode to access protected system resources; when the call completes, the process returns to user mode. This transition between user and kernel mode (a mode switch, often loosely called a "process context switch") costs extra CPU time, so a process uses the CPU much more efficiently when it makes fewer system calls.
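The effect of making fewer system calls can be sketched in Python, where each `os.write()` on a raw file descriptor issues one `write` system call. The `BatchingWriter` class, its 4096-byte capacity, and the 64-byte records are illustrative assumptions, not part of the original design; in practice Python's own `io.BufferedWriter` does the same job.

```python
import os


class BatchingWriter:
    """Collects small records in a user-space buffer and hands them to the
    kernel with one os.write() (one write system call) per flush, instead of
    one system call per record. Hypothetical helper for illustration."""

    def __init__(self, fd: int, capacity: int = 4096):
        self.fd = fd
        self.capacity = capacity
        self.buf = bytearray()
        self.syscalls = 0  # how many os.write() calls were issued

    def write(self, record: bytes) -> None:
        self.buf += record
        if len(self.buf) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        data = bytes(self.buf)
        self.buf.clear()
        while data:  # os.write may accept fewer bytes than requested
            written = os.write(self.fd, data)
            self.syscalls += 1
            data = data[written:]


fd = os.open(os.devnull, os.O_WRONLY)
writer = BatchingWriter(fd)
for _ in range(1000):
    writer.write(b"x" * 64)  # unbuffered, this would be 1000 system calls
writer.flush()
os.close(fd)
print(writer.syscalls)  # 16: fifteen full 4096-byte flushes plus the remainder
```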

8. Buffered I/O
Data has to be passed between processes to accomplish the desired software tasks. Buffering the data to be passed can greatly reduce the number of IPC operations, and since each IPC operation involves a system call, buffered I/O is critical to real-time performance.
[Figure: sending process writes to shared memory; receiving process reads from it]
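A minimal sketch of batching IPC traffic: many small messages are framed with a 4-byte length prefix and sent with a single write. It uses a pipe (`os.pipe`) instead of the shared memory shown on the slide, purely to keep the example short; the framing format and the helper names are assumptions.

```python
import os
import struct


def pack_batch(messages: list[bytes]) -> bytes:
    """Prefix each message with its 4-byte big-endian length and concatenate,
    so a whole batch can cross the process boundary in one write."""
    return b"".join(struct.pack("!I", len(m)) + m for m in messages)


def unpack_batch(data: bytes) -> list[bytes]:
    """Split a received batch back into the original messages."""
    out, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from("!I", data, pos)
        pos += 4
        out.append(data[pos:pos + length])
        pos += length
    return out


messages = [f"msg-{i}".encode() for i in range(100)]
payload = pack_batch(messages)

r, w = os.pipe()
os.write(w, payload)  # one system call carries all 100 messages
os.close(w)

received = b""
while chunk := os.read(r, 65536):  # read until EOF
    received += chunk
os.close(r)

print(len(unpack_batch(received)), "messages received")
```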

9. Database Optimization
Database performance is critical to overall system performance. Commercial databases usually provide tools for optimizing database performance, and these should be run on a regular basis. In-memory database products such as TimesTen, MySQL, and Berkeley DB are commonly used for real-time data.
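As a small, self-contained illustration of routine database optimization, the sketch below uses SQLite's in-memory mode through Python's built-in `sqlite3` module. SQLite is a stand-in here; it is not one of the products named on the slide. The sketch creates an index, runs `ANALYZE` to refresh the query planner's statistics, and checks with `EXPLAIN QUERY PLAN` that lookups use the index. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the whole database lives in RAM
conn.execute("CREATE TABLE subscribers (msisdn TEXT, plan TEXT)")
conn.executemany(
    "INSERT INTO subscribers VALUES (?, ?)",
    [(f"1380000{i:04d}", "prepaid") for i in range(10_000)],
)

# Without an index every lookup scans all rows; add one, then refresh statistics.
conn.execute("CREATE INDEX idx_msisdn ON subscribers (msisdn)")
conn.execute("ANALYZE")

query = "SELECT plan FROM subscribers WHERE msisdn = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("13800001234",)).fetchall()
row = conn.execute(query, ("13800001234",)).fetchone()
print(plan[-1][-1])  # planner detail line; expected to mention idx_msisdn
print(row)
```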

10. Hardware Technology
[Figure: Memory, CPU]
Hardware resources: CPU, memory, disk.
- Under normal traffic, system resources should be evenly utilized, up to 40%.
- Under overload traffic, system resources should be evenly utilized, up to 80%.
- High-runner processes or threads should run evenly on each CPU.
- Disk I/O should be evenly distributed across all disks.
- The most effective configuration for optimal throughput has to be determined by testing in a lab environment with simulated traffic.
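The utilization guideline can be turned into a simple check over per-CPU (or per-disk) utilization samples, for example as part of a lab test run. The 40% and 80% limits come from the slide; the 10-percentage-point balance tolerance and the sample values are assumptions chosen for illustration.

```python
NORMAL_LIMIT = 0.40    # even utilization up to 40% under normal traffic
OVERLOAD_LIMIT = 0.80  # up to 80% under overload traffic


def check_balance(samples: list[float], limit: float,
                  max_spread: float = 0.10) -> tuple[bool, bool]:
    """Return (within_limit, balanced) for per-CPU or per-disk utilization.

    within_limit: no resource is busier than `limit`.
    balanced: busiest and idlest resources differ by at most `max_spread`
    (a hypothetical tolerance).
    """
    within_limit = max(samples) <= limit
    balanced = max(samples) - min(samples) <= max_spread
    return within_limit, balanced


print(check_balance([0.36, 0.38, 0.35, 0.39], NORMAL_LIMIT))    # (True, True)
print(check_balance([0.72, 0.20, 0.25, 0.30], NORMAL_LIMIT))    # (False, False)
print(check_balance([0.75, 0.78, 0.74, 0.79], OVERLOAD_LIMIT))  # (True, True)
```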

11. Threads/Processes & CPU
[Figure: Proc A, Proc B, Proc C running on CPU 1 and CPU 2]
The number of threads and processes should be proportional to their CPU usage. For example, if Proc A, B, and C use CPU time in the ratio 1 : 2 : 1.5, the numbers of threads in Proc A, B, and C should be in the ratio 2 : 4 : 3. The number of threads also depends on the characteristics of the input messages: more threads are needed when each message takes longer to process. The same thread/process ratio should be replicated on every CPU. For example, with two CPUs in the system, either run two identical copies of each process, or run one copy of each process with twice as many threads in the same ratio.
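The ratio arithmetic in the example (1 : 2 : 1.5 becomes 2 : 4 : 3) can be automated. This sketch takes the ratios as decimal strings so that `fractions.Fraction` represents them exactly, scales by the least common multiple of the denominators, reduces to lowest terms, and then multiplies by the number of CPUs for the replicated layout. It needs Python 3.9+ for the multi-argument `math.lcm` and `math.gcd`; the function name is hypothetical.

```python
import math
from fractions import Fraction


def thread_ratio(cpu_time_ratio: list[str], cpus: int = 1) -> list[int]:
    """Turn a CPU-time ratio such as 1 : 2 : 1.5 into the smallest
    whole-number thread ratio, replicated for `cpus` processors."""
    fracs = [Fraction(r) for r in cpu_time_ratio]     # exact: "1.5" -> 3/2
    scale = math.lcm(*(f.denominator for f in fracs))
    counts = [int(f * scale) for f in fracs]          # clear the fractions
    g = math.gcd(*counts)                             # reduce to lowest terms
    return [cpus * (c // g) for c in counts]


print(thread_ratio(["1", "2", "1.5"]))          # [2, 4, 3]
print(thread_ratio(["1", "2", "1.5"], cpus=2))  # [4, 8, 6]
```

Passing strings rather than floats avoids surprises with values such as 1.1, which have no exact binary floating-point representation.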

12. Memory
[Figure: Memory, CPU]
Keeping data in memory is critical to system performance. Optimizing memory usage keeps more data in memory and can improve performance significantly. Data can be compressed to reduce memory usage, but compressing and decompressing it costs CPU time. Memory locking operations prevent multiple CPUs from being fully utilized, because a memory lock is a resource shared by all CPUs in a multiprocessor system, so CPUs contending for the same lock must wait for each other.
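The compression trade-off can be seen with Python's built-in `zlib`: the stored form shrinks, but every access pays for a compress or decompress step. The record format is hypothetical, and the timing is wall-clock time for one round trip, so it only gives a rough feel for the CPU cost.

```python
import time
import zlib

# Hypothetical subscriber record, repeated to mimic a table of similar rows.
record = b"IMSI=460001234567890;STATE=IDLE;CELL=0x1A2B;" * 200

start = time.perf_counter()
packed = zlib.compress(record, level=6)  # fewer bytes held in memory...
restored = zlib.decompress(packed)       # ...at the cost of CPU time
elapsed_us = (time.perf_counter() - start) * 1e6

assert restored == record
print(f"raw {len(record)} B, compressed {len(packed)} B, "
      f"round trip {elapsed_us:.0f} us")
```

Highly repetitive records compress very well; less regular data saves less memory for the same CPU cost, which is why the trade-off has to be measured.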

13. Disk
[Figure: Disk, Memory, CPU]
- Minimizing disk I/O is critical to real-time performance.
- Disk head movement adds 5 to 10 ms of delay, which is significant for real-time performance.
- Reducing disk head movement by buffering I/O can improve system performance significantly.
- Using all disks in parallel can improve I/O throughput significantly.
- Character I/O versus block I/O.
- Disk array versus mirrored disks.
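Two of the disk techniques above can be illustrated numerically. The first function measures total head travel for a queue of requests, showing how buffering requests and issuing them in sorted order shortens head movement; the second maps logical blocks round-robin across several disks so they can be accessed in parallel (RAID 0-style striping). The cylinder numbers and the four-disk layout are hypothetical.

```python
def seek_distance(start: int, requests: list[int]) -> int:
    """Total head travel (in cylinders) to serve `requests` in the given order."""
    position, total = start, 0
    for cylinder in requests:
        total += abs(cylinder - position)
        position = cylinder
    return total


def stripe(block: int, disks: int) -> tuple[int, int]:
    """Round-robin striping: logical block -> (disk index, block on that disk).
    Consecutive blocks land on different disks, so they can be accessed in parallel."""
    return block % disks, block // disks


pending = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical buffered requests
print(seek_distance(53, pending))          # 640 cylinders in arrival order
print(seek_distance(53, sorted(pending)))  # 208 cylinders when issued sorted
print([stripe(b, 4) for b in range(6)])    # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```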

14. Performance Tuning
Performance tuning is an important step in network product development. Profile CPU usage to identify the functions that consume the most CPU time; optimizing the top 10 of these can improve system performance significantly. Performance benchmarking should be a regular activity for every software release.
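Python's built-in `cProfile` and `pstats` modules show the profiling step in miniature: run a workload under the profiler, then rank functions by their own CPU time and look at the top 10. `parse_message`, `log_event`, and `handle_traffic` are hypothetical stand-ins for high-runner code; a C/C++ system would use a native profiler instead, but the workflow is the same. The `Stats.stats` dictionary used for ranking is an internal but long-standing attribute; `stats.sort_stats("tottime").print_stats(10)` prints the same ranking as a formatted report.

```python
import cProfile
import pstats


def parse_message(n: int) -> int:  # hypothetical expensive high-runner
    total = 0
    for i in range(n):
        total += i * i
    return total


def log_event() -> str:  # hypothetical cheap function
    return "ok"


def handle_traffic() -> None:
    for _ in range(200):
        parse_message(2000)
        log_event()


profiler = cProfile.Profile()
profiler.enable()
handle_traffic()
profiler.disable()

stats = pstats.Stats(profiler)
# stats.stats maps (file, line, function) -> (primitive calls, total calls,
# own time, cumulative time, callers); rank by own time ("tottime").
top10 = sorted(stats.stats.items(), key=lambda kv: kv[1][2], reverse=True)[:10]
for (_file, _line, func), (_cc, calls, own, cum, _callers) in top10:
    print(f"{func:30.30s} calls={calls:5d} own={own:.4f}s cum={cum:.4f}s")
```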

15. System Reliability
A network system consists of both software and hardware, so increasing system reliability requires improving the reliability of both. Software faults contribute much more system downtime than hardware faults, so improving software reliability improves system reliability more effectively.

16. Software Reliability (1)
Software reliability is determined by the software downtime caused by software bugs. Improving software reliability has three aspects:
1) Reduce the number of bugs.
2) Reduce the downtime caused by software bugs.
3) Reduce bug-fixing time.

17. Software Reliability (2)
1) Reduce the number of bugs
A disciplined software development process with quality control is the most effective way to reduce the number of software bugs and ensure software quality. The software development process is discussed in more detail later.

18. Software Reliability (3)
2) Reduce downtime caused by software bugs
A process can lose its heartbeat because:
- the process dies, or
- the process is too busy (for example, stuck in an infinite loop).
Recovery levels:
- Level 1: INIT kills the process that has lost consecutive heartbeats and re-initializes it.
- Level 2: INIT re-initializes the process and its global resources.
- Level 3: INIT restarts the whole system.
- Level 4: INIT triggers an OS reboot.
- Level 5: power off, then power on.
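The recovery levels can be sketched as a small state machine that INIT might run for each monitored process. The slide does not say when to move to a higher level; this sketch assumes that after `max_missed` consecutive missed heartbeats the monitor applies the current level, escalates one level each time the previous recovery fails to restore the heartbeat, and resets to level 1 once a heartbeat arrives. The class, its threshold, and the escalation rule are hypothetical, and the code only decides which level to apply; it does not kill or restart anything.

```python
from typing import Optional

RECOVERY_LEVELS = {
    1: "kill and re-initialize the process",
    2: "re-initialize the process and its global resources",
    3: "restart the whole system",
    4: "reboot the OS",
    5: "power off, then power on",
}


class HeartbeatMonitor:
    """Hypothetical per-process monitor run by INIT.

    After `max_missed` consecutive missed heartbeats it returns the recovery
    level to apply, escalating one level each time recovery does not bring the
    heartbeat back; a received heartbeat resets the escalation to level 1.
    """

    def __init__(self, max_missed: int = 3):
        self.max_missed = max_missed
        self.missed = 0
        self.level = 1

    def tick(self, heartbeat_received: bool) -> Optional[int]:
        """Call once per heartbeat interval; returns a level to apply or None."""
        if heartbeat_received:
            self.missed = 0
            self.level = 1
            return None
        self.missed += 1
        if self.missed < self.max_missed:
            return None
        self.missed = 0
        action = self.level
        self.level = min(self.level + 1, max(RECOVERY_LEVELS))
        return action


monitor = HeartbeatMonitor(max_missed=3)
actions = [monitor.tick(False) for _ in range(9)]  # process stays silent
print(actions)  # [None, None, 1, None, None, 2, None, None, 3]
for level in filter(None, actions):
    print(level, RECOVERY_LEVELS[level])
```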

19. Software Reliability (4)
3) Reduce bug-fixing time
When an unexpected software event is detected, such as an unexpected incoming message or unexpected parameters in an incoming message, an error message should be written to the log file. The code should cover every logical branch of each "if ... then ... else" statement, and an error message should be written to the log file whenever an unexpected branch is reached.
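A minimal sketch of this logging discipline using Python's standard `logging` module: every branch is handled explicitly, and both unexpected parameters and the "impossible" final `else` produce an error entry with enough context to debug later. The message types, parameter names, and logger name are hypothetical. Here the log goes to standard error; writing to a log file would only need `filename=...` in `logging.basicConfig`.

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(name)s: %(message)s")
log = logging.getLogger("call_logic")  # hypothetical module logger


def handle(msg_type: str, params: dict) -> str:
    """Hypothetical call-logic dispatcher that logs every unexpected path."""
    if msg_type == "SETUP":
        if "called_number" not in params:
            # Unexpected parameters: log the whole message for later debugging.
            log.error("SETUP without called_number, params=%r", params)
            return "REJECT"
        return "PROCEED"
    elif msg_type == "RELEASE":
        return "CLEARED"
    else:
        # A branch that "should never happen" still gets an error log entry.
        log.error("unexpected message type %r, params=%r", msg_type, params)
        return "IGNORED"


print(handle("SETUP", {"called_number": "10086"}))  # PROCEED
print(handle("SETUP", {}))                          # REJECT, plus an error log line
print(handle("PING", {"seq": 7}))                   # IGNORED, plus an error log line
```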

20. Hardware Reliability (1)
Today's hardware technology can remove almost all hardware defects during the testing stage. Hardware faults are usually caused by random component failures due to environmental factors such as dust, static electricity, vibration, and temperature. Ways to increase hardware reliability:
1) Hardware redundancy
2) Hot-swappable hardware components
3) Spare-parts inventory for hardware replacement

21. Hardware Redundancy
Hardware redundancy is the most effective way to increase system reliability, from both the software and the hardware perspective.
[Figure: components A, B, C, D added in parallel, labeled 99.9%, 99.999%, 99.9999% availability]
Let ~A denote the probability that component A fails, ~A = 0.001. Assuming independent failures, the probability that A and B fail together is 0.001 * 0.001 = 10^-6, and the probability that A, B, and C all fail together is 0.001 * 0.001 * 0.001 = 10^-9.
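The same calculation in code: a redundant group is unavailable only when every member has failed, so its availability is 1 minus the product of the individual failure probabilities. Independence of failures is assumed, as in the calculation above; the function name is illustrative.

```python
from math import prod


def parallel_availability(failure_probs: list[float]) -> float:
    """A redundant group is down only if every member is down at once
    (failures assumed independent), so availability = 1 - product of ~X."""
    return 1.0 - prod(failure_probs)


for n in (1, 2, 3):
    a = parallel_availability([0.001] * n)
    print(f"{n} component(s): availability {a:.7%}")
```

One component gives 99.9%, two give 99.9999%, and three give 99.9999999%.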

22. N+K Redundancy
[Figure: components A, B, C, D added in parallel, labeled 99.9%, 99.999%, 99.9999% availability]
Assume each component can process X amount of network traffic, so four identical components can process 4X in total. If one component is reserved for redundancy, with what probability can the system handle 3X of traffic? If two components are reserved for redundancy, with what probability can the system handle 2X of traffic?
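One way to answer these questions is a k-out-of-n calculation: the system can carry kX of traffic when at least k of its n components are up, which is a binomial sum. This sketch assumes each component is independently available with probability 0.999 (the per-component figure used in the redundancy example); the function name is illustrative.

```python
from math import comb


def k_of_n_availability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n identical, independent components
    are up, each with availability p (binomial sum)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.999  # assumed availability of each component
print(f"3X with 4 components (N+1): {k_of_n_availability(4, 3, p):.12f}")
print(f"2X with 4 components (N+2): {k_of_n_availability(4, 2, p):.12f}")
```

Under these assumptions the N+1 case comes out near 99.9994% and the N+2 case near 99.9999996%.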

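23. Reliability Model
[Figure: three layers of redundant components: Layer 1 = A, B; Layer 2 = C, D, E; Layer 3 = F, G]
With ~A denoting the failure probability of component A (and likewise for the other components):
Layer 1 availability X = 1 - (~A * ~B)
Layer 2 availability Y = 1 - (~C * ~D * ~E)
Layer 3 availability Z = 1 - (~F * ~G)
System availability = X * Y * Z (the layers are in series)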
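The layer formulas translate directly into code: each layer is a parallel group, and the layers are combined in series. The failure probability of 0.001 for every component A to G is an assumption for illustration, and failures are assumed independent.

```python
from math import prod


def layer_availability(failure_probs: list[float]) -> float:
    """Parallel layer: down only if all of its components are down."""
    return 1.0 - prod(failure_probs)


def system_availability(layers: list[list[float]]) -> float:
    """Layers in series: the system is up only if every layer is up."""
    return prod(layer_availability(layer) for layer in layers)


q = 0.001  # assumed failure probability (~A ... ~G) of every component
layers = [
    [q, q],     # Layer 1: A, B    -> X
    [q, q, q],  # Layer 2: C, D, E -> Y
    [q, q],     # Layer 3: F, G    -> Z
]
print(f"System availability X * Y * Z = {system_availability(layers):.9f}")
```

The two-component layers dominate the result, so they are the first place to add redundancy if more availability is needed.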

24. Hardware Evolution
To ensure that failed hardware can be replaced as soon as possible, a spare-parts inventory is needed. Today's commercial hardware usually has a Mean Time Between Failures (MTBF) of around 40,000 hours (about 4.6 years). Hardware technology becomes obsolete within about 5 years, and commercial hardware is typically discontinued within 5 years. The software system should therefore be easy to port onto the latest hardware, to take full advantage of the hardware technology curve.
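Why spare parts matter can be shown with the standard steady-state formula A = MTBF / (MTBF + MTTR), where MTTR is the mean time to repair: keeping spares on hand shortens MTTR and so raises availability. The MTBF of 40,000 hours is from the text; the two MTTR values (72 hours to obtain a part versus 4 hours with a spare on site) are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def steady_state_availability(mtbf_h: float, mttr_h: float) -> float:
    """Standard steady-state formula A = MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


mtbf = 40_000  # hours, the commercial-hardware figure from the text
print(f"MTBF = {mtbf / HOURS_PER_YEAR:.1f} years")
for mttr in (72, 4):  # hypothetical: part ordered vs. spare on site
    a = steady_state_availability(mtbf, mttr)
    print(f"MTTR {mttr:2d} h -> availability {a:.4%}")
```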

25. ATCA v2 Hardware Configuration
Core chassis features:
- 19-inch, 14-slot rack-mount chassis, 11U
- Dedicated 15th front slot for dual shelf manager
- SA Forum OpenHPI shelf manager
- Dual Star fabric backplane
- Front-access fan trays and dust filters
- ETSI and NEBS Level 3
14 single-processor SBC boards (Rouzic):
- Dual-core 2.16 GHz processor
- 8 GB memory

26. Hardware Deployment View
[Figure: PEM A, PEM B, Pilot Blade, Switch Blade, BE CPU Blades]
Total capacity: 8000 TPS, 11M subscribers, 1+1 redundancy.

