Published by Erika Riley. Modified over 9 years ago.
1 Design and Implementation of TWAREN Hybrid Network Management System
National Center for High-Performance Computing
Speakers: Ming-Chang Liang & Li-Chi Ku
2 Outline
- Introduction
- Motivation
- Issues
- Design
- Implementation
- Future works
3 INTRODUCTION
4 About TWAREN
- Construction of TWAREN (TaiWan Advanced Research & Education Network) was completed at the end of 2003, and the network began operation and service at the beginning of 2004.
- In its initial phase, IP routing was the main service provided.
- The network management programs were those that came with the purchased network equipment, including CIC, Webtop, CW2K, HP OpenView, HP NNM and other solutions.
5 Initial phase of TWAREN
[Network topology diagram. Core nodes: Taipei, Hsinchu, Taichung, Tainan. Site labels: ASCC, NDHU, NCTU, NTHU, NCHU, NYSU, NCU, CCU, NCCU, NTU, NHLTC, NTTU, EBT, MOECC. Link types: 10GE, STM-64/OC-192, STM-16/OC-48, GE. Equipment labels: C7609, C6509, GSR.]
6 Initial phase of NMS
[Architecture diagram. Device labels: 3750, 2600, 15454, 15600, 2522, 7609, NAM, 12416. Management tools: CW2K (DFM), NNM, CTM, Cisco Info Center, Remedy Help Desk, ISM, WebTop. Data paths: Trap, PING, Polling, CLI, API, Gateway, Probe, Notification. Monitored protocols: HTTP, FTP, SMTP, DNS.]
7 Phase 2 of TWAREN
- At the end of 2006, TWAREN was upgraded with more protection methods and better availability. The result is called TWAREN phase 2.
- Tens of optical switches and hundreds of lightpaths then served as the foundation of the layer 2 VLAN services and the layer 3 IP routing services.
- In 2008, tens of VPLS switches were added to provide an additional multipoint VPLS VPN service.
- Layer 1 lightpaths can be protected by SNCP, layer 2 VLANs by spanning tree recalculation, and layer 2 VPLS by fast reroute technology.
- These improvements turned TWAREN phase 2 into a true hybrid network capable of providing multiple layers of services and high availability.
8 Architecture of TWAREN phase 2
[Network topology diagram. Link types: STM64, STM16, 10GE, GE. Core nodes (NCHC): Taipei, Hsinchu, Taichung, Tainan. Site labels: NTU, NCU, NSYSU, NCHU, NCTU, NTHU, ASCC, NCCU, NCKU, CCU, MOEcc, NCNU, NIU, NDHU, NHLTC, NTTU. Equipment labels: 6509, 7609, 7609C, 15454, 15600, 12816, 3750.]
9 MOTIVATION
10 Why do we need a new NMS?
- The architecture of TWAREN phase 2 became more and more complicated.
- Because TWAREN phase 2 has more protection methods, a single hardware or circuit failure will not interrupt the service provided to end users.
- The initial-phase NMS was no longer adequate for the hybrid network because it was hard to determine and predict the correlation between failures and affected services.
11 Requirements for new NMS
- Automatically determine the correlation between failures, affected services, affected customers and severity level on this highly protected network.
- Provide a single integrated visual user interface.
- Use an integrated database, logs, message flows and exchange protocols.
- After several surveys, we decided to develop a new NMS suitable for monitoring all services provided by TWAREN phase 2.
12 ISSUES
13 Uncertainty of SNMP implementation
- SNMP trap and MIB implementations differ even across equipment of the same brand.
- The SNMP OIDs or the returned values may change with an OS upgrade on the same equipment, and such changes are usually hard to discover beforehand.
- Therefore, the system must be designed so that these changes can be accommodated with minimal modifications.
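One way to accommodate such changes with minimal modification is to keep OIDs and value decoders in a lookup table keyed by device model and OS version, so that an OS upgrade changes data rather than code. The sketch below is only an illustration of that idea: the model name, OS version prefixes, OIDs and the `resolve_metric` helper are all hypothetical and are not taken from the TWAREN NMS.

```python
from typing import Callable, NamedTuple


class MetricSpec(NamedTuple):
    oid: str                         # placeholder OID, not a real vendor OID
    decode: Callable[[int], float]   # converts the raw SNMP value to a common unit


# Hypothetical table: (model, OS version prefix, metric) -> how to read it.
# Supporting a new OS release means adding a row here, not changing code.
OID_TABLE = {
    ("model-a", "12.", "temperature"): MetricSpec(
        "1.3.6.1.4.1.99999.1.1", lambda raw: float(raw)),   # whole degrees
    ("model-a", "15.", "temperature"): MetricSpec(
        "1.3.6.1.4.1.99999.2.7", lambda raw: raw / 10.0),   # tenths of a degree
}


def resolve_metric(model: str, os_version: str, metric: str) -> MetricSpec:
    """Return the entry whose OS version prefix is the longest match."""
    best_prefix = None
    best_spec = None
    for (m, prefix, name), spec in OID_TABLE.items():
        if m == model and name == metric and os_version.startswith(prefix):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix, best_spec = prefix, spec
    if best_spec is None:
        raise KeyError(f"no OID mapping for {model} {os_version} {metric}")
    return best_spec


spec = resolve_metric("model-a", "15.2(1)", "temperature")
print(spec.oid, spec.decode(215))
```

A collector would call `resolve_metric` before each poll and pass the raw value through `decode`, so the rest of the system always sees the same unit.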
14 Lack of skilled programmers
- Our programmers are the same people as the members of the operations team.
- We are not professional programmers and do not share a common programming language.
- The system must be partially available and operational during the early phase of its development so that it can evolve along with real needs.
- Therefore, a unified standard of communication between modules is necessary.
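A unified communication standard between modules could be as simple as one shared message envelope that any language can produce and parse. The slides do not say which format was chosen, so the sketch below assumes JSON, and the `Event` fields and the `encode` and `decode` helpers are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Event:
    source: str        # module that produced the event (hypothetical name)
    component_id: int  # ID of the affected component
    kind: str          # event type, e.g. "link_down"
    severity: int      # smaller number = more severe (assumed convention)
    detail: str        # free-text description


def encode(event: Event) -> str:
    """Serialize an event to a JSON string with a stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


def decode(text: str) -> Event:
    """Parse a JSON string produced by encode() back into an Event."""
    return Event(**json.loads(text))


message = encode(Event("trap-collector", 135, "link_down", 2, "Port_9 down"))
print(message)
print(decode(message))
```

Because the envelope is plain JSON text, a module written in another language only has to agree on the field names, not on any shared library.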
15 Huge historical data and computing
- To minimize the false positive and false negative rates, baseline thresholds are of much better quality when they are generated dynamically from historical data.
- Therefore, we need to store sufficiently large historical data sets and to retrieve them very efficiently when calculating those thresholds.
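The slides do not state how the dynamic thresholds are computed. As one common possibility, the sketch below derives a band of mean plus or minus k standard deviations from a window of past samples. The function names, the value of `k` and the sample data are assumptions made for illustration.

```python
from statistics import mean, stdev


def baseline_threshold(history, k=3.0):
    """Return (low, high) bounds as mean +/- k sample standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value, history, k=3.0):
    """True if value falls outside the band derived from history."""
    low, high = baseline_threshold(history, k)
    return value < low or value > high


# Hypothetical room temperature readings in degrees Celsius.
readings = [20, 21, 19, 20, 22, 20, 21, 19]
print(baseline_threshold(readings))
print(is_anomalous(35, readings), is_anomalous(20.5, readings))
```

In practice the history window would be read from the long-term database, which is why retrieval speed matters: the band has to be recomputed as new samples arrive.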
16 Automatically determine affected services and customers
- TWAREN phase 2 inherently guards against a single hardware or circuit failure, so such a failure is less likely to affect actual service provisioning.
- An intelligent management system that can determine which services a failure affects will reduce the management cost.
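The reasoning above can be sketched as a small check: a protected service is interrupted only when every one of its alternative paths contains a failed circuit, and it merely loses protection when some but not all paths are hit. The data model, the state names and the sample services below are hypothetical and much simpler than a real correlation engine.

```python
def service_state(paths, failed):
    """Classify one service given its alternative paths and the failed circuits.

    paths  -- list of paths, each a list of circuit IDs
    failed -- set of circuit IDs currently down
    """
    if not paths:
        raise ValueError("a service needs at least one path")
    broken = [bool(set(path) & failed) for path in paths]
    if all(broken):
        return "down"        # no usable path remains
    if any(broken):
        return "degraded"    # still up, but protection is lost
    return "ok"


def assess(services, failed):
    """Map every service name to its state for the given failures."""
    return {name: service_state(paths, failed) for name, paths in services.items()}


# Hypothetical services: vlan-100 has two disjoint paths, vlan-200 has one.
services = {
    "vlan-100": [["c1", "c2"], ["c3", "c4"]],
    "vlan-200": [["c2"]],
}
print(assess(services, {"c2"}))
```

Joining the resulting states with customer and severity tables would give the operator the affected scope directly instead of a raw list of alarms.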
17 DESIGN
18 1st Stage System Architecture
[Architecture diagram. Labels: Data Collectors, Traps, MIBs, Syslogs, Net flows, Telnet/SSH, TL1, Mirror, Interactive, Passive, Monitor Objs, Current Status DB, Long Term DB, Threshold DB, Case/Action DB, Threshold Analyzer, Fault Detection, Fault Location, Auto Action, Control API, GUI & Ticket System, Report System.]
19 Relationship of Data Tables
Basic Data Tables:
- Component
- People
- Location
- Unit
- Vendor
- etc.
Relationship Tables:
- Circuit
- VLAN Services
- VPLS Services
- ONS Light Path
- ONS Cross Connection
- etc.
20 Basic Data Tables

Component Data Table
Component_ID  Parent_C_ID  Name
1             0            TN7609P
12            1            Slot_1
2             0            TP15454
16            2            Slot_3
135           12           Port_9

People Data Table
ID  Name  Phone       Address  Service_Time  Service_WeekDay
1   John  0939123123  xxxxxxx  8-17          1,3,5
2   Mary  0958123123  xxxxxxx  ALL

Location Data Table
ID  Name   Address
1   MOEcc  xxxxx
2   NTU    xxxxx

Unit Data Table
ID  Name
1   NCKU
18  THU

Vendor Data Table
ID  Name
1   CHT
2   APBT
3   RingLine
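The Component table forms a tree through Parent_C_ID, so the full location of a port can be recovered by walking up to the root. The sketch below assumes the sample rows split as (Component_ID, Parent_C_ID, Name) and that a parent of 0 marks a top-level device. Both are readings of the flattened slide, and `component_path` is a hypothetical helper.

```python
# Sample rows as read from the slide: Component_ID -> (Parent_C_ID, Name).
COMPONENTS = {
    1: (0, "TN7609P"),
    12: (1, "Slot_1"),
    135: (12, "Port_9"),
    2: (0, "TP15454"),
    16: (2, "Slot_3"),
}


def component_path(component_id, components=COMPONENTS):
    """Return 'device/slot/port' by following Parent_C_ID up to the root (0)."""
    names = []
    seen = set()
    current = component_id
    while current != 0:
        if current in seen:
            raise ValueError(f"cycle detected at component {current}")
        seen.add(current)
        parent, name = components[current]
        names.append(name)
        current = parent
    return "/".join(reversed(names))


print(component_path(135))
print(component_path(16))
```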
21 Relationship Data Tables

Circuit Data Table
Columns: ID, Name, Vendor, Identify, From_CID, To_CID, Bandwidth
Rows (cell boundaries lost in extraction):
1Taipei_Tainan_STM6418D5432671335STM64
2NCHU_NCNU_10GE2ST16987236710GE

ONS Topology Link Table
Columns: NodeA, NodeB, PortA, PortB
Rows (cell boundaries lost in extraction):
124514672346
163223123421

ONS Cross Connection Table
Columns: CRS, PortA, PortB, SNCP_CRS, ChannelA, ChannelB, Size
Rows (cell boundaries lost in extraction):
4821744175605134
213321334324173316
24354645342111716

ONS Light Path Table
Columns: LP, PortFrom, PortTo, SNCP_LP, CRS_Trace, Size
Rows (cell boundaries lost in extraction):
2231223450359,556,522,4754
983434445599482,541,33516
993434445598482,469,541,33516
22 IMPLEMENTATION
23 Current monitor objects
- Trap monitor: used interfaces, BGP, etc.
- Environment of equipment room: temperature (auto threshold), voltage
- Statuses of equipment: temperature, CPU, RAM, fans, power supply
- BGP peering with other networks: statuses, number of exchanged routes (auto threshold), utilization analysis
- Performance monitor: end-to-end RTT (auto threshold), end-to-end packet loss rate (auto threshold), end-to-end availability
- Throughput: backbone (auto threshold), designated interfaces
- Top N: bytes, flows, packets
- Routes monitor: the routes of customers (exact comparison)
- VPLS VPN: throughput of CE side, MACs of VPN
- Optical network: current topology of lightpaths
- VLAN: current topology of VLAN
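Most items above use automatic thresholds, but the routes monitor uses exact comparison: the observed route set must match the expected one. A minimal sketch of that check, assuming routes are given as CIDR prefixes, is shown below. The `compare_routes` helper and the documentation-range prefixes are illustrative and not taken from the TWAREN NMS.

```python
import ipaddress


def compare_routes(expected, observed):
    """Return prefixes that are missing or unexpected, after normalizing them."""
    want = {ipaddress.ip_network(p) for p in expected}
    have = {ipaddress.ip_network(p) for p in observed}
    return {
        "missing": sorted(str(n) for n in want - have),
        "unexpected": sorted(str(n) for n in have - want),
    }


# Hypothetical customer routes using documentation address ranges.
expected = ["192.0.2.0/24", "198.51.100.0/24"]
observed = ["192.0.2.0/24", "203.0.113.0/24"]
print(compare_routes(expected, observed))
```

Any non-empty list in the result would raise an alarm, since an exact comparison tolerates neither a withdrawn nor an extra prefix.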
24 Future works
- Combine all developed monitor objects into a single integrated visual user interface.
- Enhance the monitoring of the optical, VPLS and VLAN networks.
- Automatically determine the fault location, root cause and affected scope.
- Minimize the false positive and false negative rates.