Presentation on developments for the period May - Sep 2006 on Fabric Management C. S. R.C. Murthy, Rohitashva Sharma, Salim A. Pathan & Dinesh Sarode.


1 Presentation on developments for the period May - Sep 2006 on Fabric Management C. S. R.C. Murthy, Rohitashva Sharma, Salim A. Pathan & Dinesh Sarode

2 OraMon performance enhancements
- Lemon OraMon monitoring currently receives ~250 samples/sec from nearly 3000 machines.
- Future estimate: 10000 machines (including on-behalf entities) and ~1000 samples/sec.
- Security services using public/private keys are to be integrated.
- Determining the CPU and memory stability of OraMon and Oracle requires stress tests.
- Targets: a rate of 2000 samples/sec, ~10000 hosts and 1024-bit digital signatures.
- A few phases of the tests were completed during Mar – Apr 2006, and the work continued.

3 OraMon performance enhancements (contd.)
- Full-scale tests were launched from ~2000 lxbatch hosts with 10000 virtual hosts, RSA1024/sha1 authentication and an aggregate rate of 2000 samples/sec.
- The large number of virtual hosts and the higher rates made Oracle's response very slow.
- CPU and memory demand was high for OraMon as well as Oracle.
- OraMon's memory consumption kept increasing, and OraMon crashed on malloc failure at these rates.
- Conclusion: the whole system is not stable at higher rates with a large number of virtual hosts.

4 OraMon performance enhancements (contd.)
Investigation findings:
- Bulk update failures cause the entire latest table to be read from the database and each record in the queue to be compared with all the retrieved records.
- A bug in the code resulted in non-convergence of the latest table data.
- Large-interval partitioning at target rates puts a heavy load on primary key comparison during history table inserts.
Solutions implemented:
- Efficient handling of latest table updates using the SQL MERGE statement.
- A 3-hour interval partitioning scheme at the target rate of 2000 samples/sec.
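The MERGE-based update of the latest table can be illustrated with a small upsert sketch. This is not the OraMon code: it uses Python's built-in `sqlite3` module as a stand-in for Oracle, and since SQLite has no `MERGE`, its `INSERT ... ON CONFLICT DO UPDATE` clause (SQLite 3.24 or newer) plays the same role. The table name `latest`, its columns and the sample values are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per (host, metric) holding the newest sample.
SCHEMA = """
CREATE TABLE latest (
    host   TEXT    NOT NULL,
    metric INTEGER NOT NULL,
    ts     INTEGER NOT NULL,
    value  TEXT,
    PRIMARY KEY (host, metric)
)
"""

# Stand-in for Oracle's MERGE: insert a new row, or update the existing
# row only when the incoming sample is newer than the stored one.
UPSERT = """
INSERT INTO latest (host, metric, ts, value)
VALUES (?, ?, ?, ?)
ON CONFLICT (host, metric) DO UPDATE
SET ts = excluded.ts, value = excluded.value
WHERE excluded.ts > latest.ts
"""

def update_latest(conn, samples):
    """Apply a batch of (host, metric, ts, value) samples in one transaction."""
    with conn:
        conn.executemany(UPSERT, samples)

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
update_latest(conn, [("lxb01", 1, 100, "0.5"), ("lxb02", 1, 100, "0.7")])
# The second batch has a newer sample for lxb01 and a stale one for lxb02.
update_latest(conn, [("lxb01", 1, 160, "0.9"), ("lxb02", 1, 90, "0.1")])
rows = conn.execute("SELECT host, ts, value FROM latest ORDER BY host").fetchall()
```

The point of the single statement is that the database decides between insert and update per row, so no client-side read of the whole latest table is needed.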

5 Results:
- Achieved stable operation of OraMon and Oracle at ~2000 samples/sec, ~10000 virtual hosts and public key based authentication.
- Recommendations were made regarding the current and future operating setup of OraMon and monitoring.
[Graphs: TCP, 2700 samples/sec, rsa-sha1, 9500 virtual hosts, 6 hours. Panels: CPU utilization (OraMon server); VM (OraMon server)]

6 [Graphs: TCP, 1600 samples/sec, rsa-sha1, 9500 virtual hosts, 48 hours. Panels: CPU utilization (OraMon server); VM (OraMon server); CPU utilization (DB server); disk I/O write (DB server)]

7 OraMon failure recovery, duplicates handling and other bug fixes
Handling of duplicate samples while inserting into historical tables:
- Duplicates are inevitable due to redundant monitoring of on-behalf entities.
- History table insertion and latest table update fail if there are duplicate samples in the queue.
- Implemented duplicate removal followed by insertion and update of the samples, to prevent loss of data.
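The duplicate removal step can be sketched as a single pass over the queue before the bulk insert. The sketch assumes a sample is identified by host, metric and timestamp; that key, and the dictionary layout of a sample, are assumptions made for illustration and are not taken from the OraMon source.

```python
def remove_duplicates(queue):
    """Return the queued samples with duplicates removed, keeping the first
    occurrence of each (host, metric, ts) key and preserving queue order."""
    seen = set()
    unique = []
    for sample in queue:
        key = (sample["host"], sample["metric"], sample["ts"])  # assumed key
        if key not in seen:
            seen.add(key)
            unique.append(sample)
    return unique

# Two monitoring nodes report the same on-behalf sample (hypothetical data).
queue = [
    {"host": "switch-a", "metric": 20, "ts": 1000, "value": "up"},
    {"host": "switch-a", "metric": 20, "ts": 1000, "value": "up"},
    {"host": "switch-a", "metric": 20, "ts": 1060, "value": "up"},
]
cleaned = remove_duplicates(queue)
```

Keeping the first occurrence rather than dropping the whole batch is what prevents data loss: the remaining samples can still be inserted and used to update the latest table.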

8 OraMon failure recovery, duplicates handling and other bug fixes (contd.)
OraMon recovery from temporary DB connect failures:
- Temporary database connect failures should be handled properly.
- Implemented connection recovery to overcome temporary DB failures.
Graceful shutdown of OraMon on receiving signals:
- Proper DB disconnection on OraMon shutdown is desirable.
- Implemented graceful shutdown of OraMon on signals.
OraMon crashes on long metric field lengths:
- A few new metrics have field lengths > 1000 bytes.
- Identified the cause of the bug and removed it.
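Connection recovery of the kind described above is often written as a retry loop with a growing delay. The sketch below is one possible shape, not the OraMon implementation: the `connect` callable, the retry count and the doubling delay are all assumptions, and `ConnectionError` stands in for whatever error the real database driver raises.

```python
import time

def connect_with_retry(connect, retries=5, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or `retries` attempts have failed.

    The wait doubles after every failed attempt; the last failure is
    re-raised so the caller can decide what to do next.
    """
    for attempt in range(1, retries + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == retries:
                raise
            sleep(delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting, for example with a fake `connect` that fails twice before succeeding.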

9 Public/private key based Lemon sample encryption
- Encryption of samples is an RFE for Lemon security.
- The implementation is to be based on the public/private key of the server.
- Asymmetric key encryption cannot be applied to data whose size exceeds the key modulus.
- The modulus of a 1024-bit key is 128 bytes long (about 308 decimal digits).
- Recursive encryption/decryption is necessary to achieve full sample encryption.
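The size limit and the need to process a sample piece by piece can be shown with a toy example. The sketch below uses textbook RSA with a deliberately tiny key built from two known Mersenne primes and no padding, so it is insecure and only illustrates the chunking idea; it is written as a loop over chunks rather than recursively, and it is not the Lemon module. All names are hypothetical.

```python
# Toy key from two known Mersenne primes; far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # modulus, 150 bits (19 bytes)
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)

# Every chunk is prefixed with a 0x01 marker byte so leading zero bytes
# survive the integer conversion. Marker plus chunk must stay one byte
# shorter than the modulus so the resulting integer is always below N.
CHUNK = (N.bit_length() + 7) // 8 - 2

def encrypt(data):
    """Encrypt bytes chunk by chunk; returns one integer per chunk."""
    blocks = []
    for i in range(0, len(data), CHUNK):
        m = int.from_bytes(b"\x01" + data[i:i + CHUNK], "big")
        blocks.append(pow(m, E, N))
    return blocks

def decrypt(blocks):
    """Decrypt the integers produced by encrypt() back into bytes."""
    out = b""
    for c in blocks:
        m = pow(c, D, N)
        raw = m.to_bytes((m.bit_length() + 7) // 8, "big")
        out += raw[1:]                 # drop the 0x01 marker byte
    return out

message = b"\x00host=lxb01 metric=4101 value=" + bytes(40)
blocks = encrypt(message)
```

With a real 1024-bit key the same structure applies, only with a larger chunk size; a production implementation would use a vetted cryptographic library and a standard padding scheme, which reduces the usable bytes per chunk further.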

10 Public/private key based Lemon sample encryption (contd.)
- A module was implemented to encrypt/decrypt the sample stream recursively.
- The TCP/UDP transports require changes to maintain and transmit both normal and encrypted samples.
- The required changes in the source have been identified; implementation will follow soon.

11 Wassh2 Re-engineering
- Wassh is a parallel SSH execution tool currently used in the CERN-CC.
- Executes shell commands on remote hosts in parallel.
- Communicates with CDB.
- Written in several languages: Perl, Python, C and Haskell.

12 Wassh2 - Task
Re-implementation of the existing Wassh. It involves:
- Comparison of the existing Wassh with other open-source solutions
- Preparation of a design document based on the comparison and existing RFEs
- Isolation of Wassh and CDB communication
- Implementation of Wassh2

13 Wassh2 - Design
[Diagram. Components: Wassh2 Front-End; SSM (optional site-specific module); Parallel Engine; SSH; SSH Servers; Wassh-Decorate. Flow labels: User Options; Target Selection Options; List of Hosts; Parallelism Options; Shell Command; Output; Formatted Output]
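The roles of the parallel engine and the output decoration step in this design can be sketched with Python's standard thread pool. This is an illustration under stated assumptions, not the Wassh2 code: the function names are hypothetical, the default runner simply shells out to the system `ssh` client in batch mode, and the runner is injectable so the engine can be tried without any SSH servers.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_ssh(host, command, timeout=30):
    """Run command on host via the system ssh client.

    Returns (exit code, standard output)."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode, proc.stdout

def run_parallel(hosts, command, parallelism=10, runner=run_ssh):
    """Run command on every host, at most `parallelism` at a time.

    Returns a dict mapping host to the runner's result."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(lambda h: runner(h, command), hosts))
    return dict(zip(hosts, results))

def decorate(results):
    """Format raw output by prefixing every line with its host name."""
    lines = []
    for host, (_code, output) in results.items():
        for line in output.splitlines():
            lines.append(f"{host}: {line}")
    return lines
```

A call such as `decorate(run_parallel(["lxb01", "lxb02"], "uptime", parallelism=2))` would then yield one prefixed line per line of remote output, assuming the hosts accept non-interactive SSH logins.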

14 Wassh2 Front-End
- Responsible for handling options.
- Enumerates basic target selection options such as:
  - Targets with numeric wildcards, e.g. lxb[01-10]
  - Reading the target list from a file/STDIN
- Uses the optional Site Specific Module (SSM) to expand site-specific nomenclature.
- Communicates with the other modules.
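Expansion of a numeric wildcard such as `lxb[01-10]` can be sketched as follows. The exact wildcard grammar accepted by Wassh2 is not given in the slides, so the `[low-high]` form with preserved zero padding is an assumption, and `expand_target` is a hypothetical name.

```python
import re

RANGE = re.compile(r"\[(\d+)-(\d+)\]")

def expand_target(pattern):
    """Expand numeric ranges like lxb[01-10] into a list of host names.

    The first range is expanded, then the function recurses to handle
    any further ranges in the same pattern."""
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]
    low, high = match.group(1), match.group(2)
    width = len(low)  # keep zero padding: 01 expands to 01, 02, ...
    hosts = []
    for n in range(int(low), int(high) + 1):
        name = pattern[:match.start()] + str(n).zfill(width) + pattern[match.end():]
        hosts.extend(expand_target(name))
    return hosts

hosts = expand_target("lxb[01-10]")
```

A plain name without brackets is returned unchanged, so the same function can be applied to every entry of a target list read from a file or STDIN.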

15 Wassh2 – CERN-SSM
- Interface between CERN CDB and the Wassh front-end.
- Uses the HTTP/XML based CDBSQL API to connect to CDB and gather XML information.
- Returns a list of hosts depending on the target selection options.
- If more than one option is given, the options are ANDed to generate the host list.
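ANDing several target selection options amounts to intersecting the host sets that each option returns. A minimal sketch, assuming each option has already been resolved to a list of host names; the option meanings and host names in the example are hypothetical.

```python
def select_hosts(option_results):
    """Intersect the host lists returned for each selection option.

    Returns a sorted list of hosts that match every option."""
    sets = [set(hosts) for hosts in option_results]
    if not sets:
        return []
    return sorted(set.intersection(*sets))

# Hypothetical example: hosts in a cluster AND hosts of a hardware type.
by_cluster = ["lxb01", "lxb02", "lxb03"]
by_hardware = ["lxb02", "lxb03", "lxb07"]
selected = select_hosts([by_cluster, by_hardware])
```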

16 Wassh2 – Work done
- The design document is prepared.
- The first prototype version of Wassh2 is developed.
- Modifications suggested by the CERN team are being made.

17 Sensor-Exception
- Responsible for generating exceptions/alarms based on local metrics.
- Generated exceptions are used to display alarms in LAS (Lemon Alarm System).
- Supports logical correlation of multiple metrics.

18 Sensor-Exception
Work done in sensor-exception:
- Support for on-behalf metric correlation
  - Generates exceptions for all on-behalf metrics
- Support for alarm state management
  - An exception/alarm can be turned off dynamically without stopping monitoring
- Support for minimum occurrences of an exception
  - Suppresses transient alarms
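The minimum-occurrences rule and the dynamic on/off switch can be sketched as a small state holder. This is an illustration only, assuming that "minimum occurrences" means consecutive violations; the class and attribute names are hypothetical and not taken from sensor-exception.

```python
class ExceptionFilter:
    """Report an alarm only after `min_occurrences` consecutive violations."""

    def __init__(self, min_occurrences, enabled=True):
        self.min_occurrences = min_occurrences
        self.enabled = enabled  # can be flipped while monitoring continues
        self.count = 0          # current run of consecutive violations

    def update(self, violated):
        """Feed one evaluation result; returns True while the alarm is active."""
        self.count = self.count + 1 if violated else 0
        return self.enabled and self.count >= self.min_occurrences

alarm = ExceptionFilter(min_occurrences=3)
states = [alarm.update(v) for v in [True, True, False, True, True, True, True]]
```

A short spike of two violations never reaches the threshold, which is how transient alarms are suppressed; setting `enabled` to `False` silences the alarm while the counter keeps tracking the metric.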

19 Lemon XML-Gateway
- XML-Gateway is an interface to the Lemon monitoring-repository (MR).
- XML-Gateway is developed to overcome shortcomings of the existing SOAP interface.
- The SOAP server crashes on large queries, when the number of samples exceeds a limit.

20 Lemon XML-Gateway (contd.)
Development work includes:
1. Developing new methods for fetching data from the monitoring-repository (OraMon as well as FlatMon).
2. Development of XML wrapper classes.
3. Development of a gateway program to receive client requests.
4. An XML Schema to describe the XML data.

21 Lemon XML-Gateway (contd.)
- Raw XML data can be requested from XML-Gateway over HTTP.
- The performance of XML-Gateway was evaluated and found satisfactory.
- The next step is to develop the Lemon XML-API in each of the following languages: Perl, PHP, C++, Python and Java.
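Requesting raw XML over HTTP and turning it into records could look like the sketch below, which uses only the Python standard library. The slides do not show the gateway URL, its query parameters or its XML Schema, so the element and attribute names, the example document and its values are all invented for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET

def fetch_xml(url, timeout=10):
    """Fetch raw XML text from the given URL over HTTP."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")

def parse_samples(xml_text):
    """Extract (host, metric, timestamp, value) tuples.

    Assumes a hypothetical layout of <sample> elements carrying host,
    metric and timestamp attributes, with the value as element text."""
    root = ET.fromstring(xml_text)
    return [
        (s.get("host"), s.get("metric"), int(s.get("timestamp")), s.text)
        for s in root.iter("sample")
    ]

# Hypothetical response body, used here instead of a live gateway.
EXAMPLE = """<samples>
  <sample host="lxb01" metric="4101" timestamp="1157000000">0.42</sample>
  <sample host="lxb02" metric="4101" timestamp="1157000000">0.17</sample>
</samples>"""

records = parse_samples(EXAMPLE)
```

Against a real gateway, `parse_samples(fetch_xml(url))` would be the equivalent call, with the parsing adapted to the published XML Schema.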

22 Lemon XML-Gateway Performance Graphs (OraMon) Contd…

23 Lemon XML-Gateway Performance Graphs (OraMon) Contd…

24 Lemon XML-Gateway Performance Graphs (FlatMon) Contd…

25 Lemon XML-Gateway Performance Graphs (FlatMon) Contd…

26 Lemon XML-API
- It was decided to develop the Lemon XML-API in C++.
- SWIG will later be used to generate interfaces in other languages.
- The Lemon XML-API includes methods to fetch XML data from the Lemon XML-Gateway.
- Methods will also be provided to query local data.

27 Lemon XML-API (contd.)
- Development work on the C++ API is in progress.
- libxml++ will be used for parsing XML on the client side.
- Some C++ classes have been developed for the Lemon XML-API, taking SWIG limitations into account.

28 CCTracker
- Currently provides a display-only (read) interface.
- Now designed to initiate updates through the CCTracker Client.
- The design clearly separates generic and site-specific components and is highly configurable.

29 CCTracker Design
[Diagram. Labels: CCTracker Client (shown twice); Servlet; DB; CCService; Site Specific Logic; Linux; Windows; Direct XML; SOAP Web Service; Updates; View / Read]

30 Developments
- XML Schema defined.
- Java object binding for the XML implemented with the Castor API.
- Database mapping of XML/Java objects with Castor JDO.
  - Useful for development and testing of new features that initiate updates.

31 Developments …
CCService (a web service):
- Handles the database updates.
- Authenticates and authorizes users for managing the Computer Center.
- Implements use cases of managing infrastructure objects.
- The functionality is also implemented as a client and server exchanging XML messages through HTTP POST.

32 CCTracker Client – new features
- The default view shows empty and filled racks in different colors.
- Infrastructure related to the logical model, i.e. domain, cluster and sub-cluster, is now shown.
- View properties (right mouse click) features implemented:
  - Infrastructure objects: display location and attributes, with other relevant information in a tabbed view.
  - Logical objects: linked to infrastructure objects.

33 Snapshots Castor Cluster in FIO Domain

34 Snapshots View Properties

35 Thank You…

