
1 Slide 1

2 Slide 2 Session Description This session reviews a series of issues encountered during real-world troubleshooting of some of our largest System Platform installations in Europe. For each issue we present the problem, the solution we found, and the explanation for the solution. These lessons learned can be very useful if you are dealing with a big Galaxy. Duration: 90 min

3 Slide 3 WWTSS-11 Lessons from Large Implementations Presenters: Peter VonTluck, Tony Vella. Invensys Operation Management, October 2012. © 2012 Invensys. All Rights Reserved. The names, logos, and taglines identifying the products and services of Invensys are proprietary marks of Invensys or its subsidiaries. All third party trademarks and service marks are the proprietary marks of their respective owners. Invensys proprietary & confidential.

4 Slide 4 Topics Why? The issues, challenge, problem Why? Describe the solutions, workaround, troubleshooting, best practices. Solution Example Reference Workshop

5 Slide 5 Contents: –Optimizing Runtime HMI Performance in a Multi-User Galaxy –Duplicate Alarms –Long Time Deployment –System is Running at High CPU Load –Unwanted Import of UDO Version IAS 3.1 SP3 P1 into a Galaxy 3.1 SP3 –Tuning the Platform\Engine for Large Applications: "Platform XXX exceed maximum heartbeat timeout of XXX"; Engines\Platforms fail to start up or shut down (watchdog timeout) –Tips and Tricks, Nice to Know: customize Application Manager to show name and description for new managed apps; query for objects in checked-out state and force check-in of all objects; automatically generate minidumps for App Engines; enable PAE to support maximum RAM

6 Slide 6 What do we mean by Large System? We have observed these issues on many large customer systems: TetraPak, Porsche, Nestle, Thales. Porsche Zuffenhausen, Germany (Paintshop): 150 Platforms, 120 HMI nodes, >60 Engines, >20k Objects (240K Scripts), 250K I/O (>50 PLCs). Thales, UK (Network Rail): 20 Platforms, 4 HMI nodes (1400 users!), 80 Engines, 30k Objects, 600K I/O.

7 Slide 7 What do we mean by Large System? We have observed these issues on many large customer systems: TetraPak, Porsche, Nestle, Thales. TetraPak SP 2012 (Vinamilk, project in progress): 20 Platforms, 80 Engines, 3000 Objects, 6000 I/O. Nestle (Biessenhofen, Avenches, Timashevsk, Torun): 60 Platforms, 40 HMI nodes, 100 Engines, 20k Objects.

8 Slide 8 Optimizing Runtime HMI Performance in a Multi-User Galaxy Ideal situation: define a real multi-user environment. TN 665 from WDN documents this topic well.

9 Slide 9 Optimizing Runtime HMI Performance in a Multi-User Galaxy ISSUE: When the GR node is busy with deploy operations (for example imports, object check-ins, deploy\undeploy), InTouch client applications can show slow data updates in ArchestrA Graphics (AG), for example a delay of 10-20 seconds before values appear in the graphic.

10 Slide 10 Optimizing Runtime HMI Performance in a Multi-User Galaxy: Solution Part 1 Solution: If the delay is observed only when the GR is busy or when the GR node is shut down, add the registry DWORD value "ResolverShortcutEnabled" under the following key and set it to 1: HKEY_LOCAL_MACHINE\Software\ArchestrA\Framework\Lmx Note that this registry value does not exist by default. Note: this setting changes the standard behavior of the system and enables round robin for the Anonymous Engine cache file.
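The registry change on this slide can be captured as a .reg fragment so that it is applied identically on every node. The sketch below only builds the text of such a fragment; the key path and value name come from the slide, while the helper name `reg_dword_fragment` is hypothetical. Review the output before importing it on a real system.

```python
def reg_dword_fragment(key_path: str, value_name: str, value: int) -> str:
    """Return .reg file text that sets one DWORD value under key_path."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{key_path}]\n"
        # .reg files store DWORD values as 8 hexadecimal digits
        f'"{value_name}"=dword:{value:08x}\n'
    )


# Key and value name as given on the slide.
LMX_KEY = r"HKEY_LOCAL_MACHINE\Software\ArchestrA\Framework\Lmx"
fragment = reg_dword_fragment(LMX_KEY, "ResolverShortcutEnabled", 1)
print(fragment)
```

Generating the text rather than writing to the registry directly keeps the change reviewable and easy to roll out to many client nodes.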

11 Slide 11 Optimizing Runtime HMI Performance in a Multi-User Galaxy: What is Round Robin? What is Anonymous Engine? Round Robin: a simple algorithm in which time slices are assigned to each process in equal portions and in circular order, handling all processes without priority. Anonymous engine file: this file holds the object handle cache that is needed to resolve the object part of a reference. The file is created when a platform is deployed. Depending on the OS, it is located in one of the following folders: \Documents and Settings\All Users\Application Data\ArchestrA\Cache (or) \ProgramData\ArchestrA\Cache
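As a generic illustration of the round-robin idea defined on this slide (equal turns, circular order, no priority), here is a minimal sketch. It does not reproduce ArchestrA internals; the function name and the engine and request labels are hypothetical.

```python
from itertools import cycle


def round_robin_assign(requests, engines):
    """Hand out requests to engines one at a time, in circular order."""
    if not engines:
        raise ValueError("at least one engine is required")
    assignment = {engine: [] for engine in engines}
    # cycle() repeats the engine list endlessly; zip stops when requests run out
    for request, engine in zip(requests, cycle(engines)):
        assignment[engine].append(request)
    return assignment


print(round_robin_assign(["ref1", "ref2", "ref3", "ref4", "ref5"], ["E1", "E2"]))
```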

12 Slide 12 Optimizing Runtime HMI Performance in a Multi-User Galaxy: How does the System work? Standard behavior: The HMI application first checks whether the object portion of the reference is already resolved by looking it up in the object handle cache (i.e. the anonymous engine cache file). If the object handle is not found in the cache, the reference is resolved by the GR. As a consequence, while the GR is busy all indirect references in the HMI application wait to be resolved. Practical effect: a series of ####### is shown instead of good values.

13 Slide 13 Optimizing Runtime HMI performance in a Multi-User Galaxy: Registry Setting by Default

14 Slide 14 Optimizing Runtime HMI performance in a Multi-User Galaxy: Registry setting with the keys created

15 Slide 15 Optimizing Runtime HMI Performance in a Multi-User Galaxy: Changing Standard Behavior With the above registry setting, when the GR is busy the references are resolved from the cache file of the first available engine of the galaxy*. IMPORTANT*: The system searches for available Anonymous engines in Platform ID order. Because it goes from engine to engine while the GR is busy, it can take a significant time to return to WindowViewer if there are many engines and platforms to examine. It is therefore highly recommended to have an AOS on a Platform with a low Platform ID.

16 Slide 16 Enable Round Robin for Anonymous Engine Cache File: Optimization – Solution Part 2 Optimization: We observed that even with round robin enabled for anonymous engines, runtime performance was not as expected. On further investigation, we discovered that the issue was due to a high number of broken references. Broken references: Broken references cause massive reference binding on the GR. It is important to find out whether there are broken references in the PLC and in the graphics, and to delete them.

17 Slide 17 Optimization – Solution Part 2: How to Find Out if You Have Broken References You can check how many bind requests the GR resolves by monitoring the following attributes: Gr.BindCnt: number of bind requests resolved by the GR; GR.BindFailCnt: number of bind failures.
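One way to use these two counters is to sample them twice and compare the growth between samples. The sketch below assumes the values have been read by hand (for example from Object Viewer) and typed in; it does not read ArchestrA attributes itself. It also assumes, without confirmation from the slides, that the two counters are disjoint, so that the failure share is failures divided by resolved plus failed. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindSample:
    bind_cnt: int       # value of BindCnt at sample time
    bind_fail_cnt: int  # value of BindFailCnt at sample time


def bind_delta(before: BindSample, after: BindSample):
    """Return (resolved, failed, fail_share) between two samples."""
    resolved = after.bind_cnt - before.bind_cnt
    failed = after.bind_fail_cnt - before.bind_fail_cnt
    if resolved < 0 or failed < 0:
        raise ValueError("counters went backwards; was the GR restarted?")
    total = resolved + failed  # assumption: the two counters do not overlap
    fail_share = failed / total if total else 0.0
    return resolved, failed, fail_share


resolved, failed, share = bind_delta(BindSample(1000, 10), BindSample(1300, 110))
print(f"resolved={resolved} failed={failed} fail share={share:.0%}")
```

A failure count that keeps growing while no deployment is in progress is a hint that clients keep asking the GR for references that can never be resolved.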

18 Slide 18 How to Find Out if You Have Broken References: GR Log Flags By enabling the following log flags on the GR, you can search for reference binding requests hitting the GR: wwPackageServer: ReferenceBinding; LMX: ReferenceBinding

19 Slide 19 How to Find out if you have Broken References: What you Need to Find in the GR Logger

20 Slide 20 Optimizing Runtime HMI Performance in a Multi-Users Galaxy: GR log Flags – Good References How the logger looks: Good references have Galaxy, Platform, and Engine addressed in «exit results»

21 Slide 21 Optimizing Runtime HMI Performance in a Multi-User Galaxy: Log Flags – Broken References Broken references: the exit results contain a series of Galaxy 0, Platform 0, Engine 0...

22 Slide 22 Optimizing Runtime HMI Performance in a Multi-User Galaxy: Client Logger “Invalid Reference”

23 Slide 23 Optimizing Runtime HMI Performance in a Multi-User Galaxy: Note on the Registry Setting The hotfix (HF) is included from WAS 3.1 SP3 onward. See the ReadMe file, Resolved Issues of InTouch 10.1 SP3, for details (Xref HF 1996 - L00104841). Note: for previous versions, contact Tech Support to get the right HF.

24 Slide 24 Duplicate Alarms If your WWALMDB is growing quickly, you might have duplicate alarms stored in it. A duplicate alarm is a row stored in the AlarmDB multiple times (same values in every field, including datetime). Invensys-Wonderware has released the following HFs to prevent this issue: InTouch 10.1 SP2 - CR L00112069 - HF 2467; InTouch 10.1 SP2 P01 - CR L00117164; InTouch 10.1 SP3 - CR L00114473; InTouch 10.1 SP3 P01 - CR L00113075 - HF 2514. Note: the HF is included in SP2012.

25 Slide 25 Duplicate Alarms: Cleaning Up WWALMDB The HF prevents the AlarmDB logger from storing new duplicate alarms in your DB. However, you still need to clean up the duplicates already stored. Many cleanup queries are available, with very different performance... And the winner is...! Do not use it in production!

26 Slide 26 Duplicate Alarms: Cleaning Up WWALMDB – Test Results Tests done: 7 GB DB with 9,000,000 rows, of which more than 8,500,000 were duplicates. Deleting the duplicates from the alarm master and detailed\consolidated data took about 1 hour (the first query we tried took 2 days).
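The presenters' cleanup query is not reproduced in this transcript. As a stand-in, the sketch below shows the general "keep one row per group of identical values" technique on a throwaway SQLite table. The table `demo_alarms` and its columns are hypothetical and are not the WWALMDB schema; the real database runs on SQL Server, where the syntax and the performance characteristics differ.

```python
import sqlite3


def delete_exact_duplicates(conn, table, columns):
    """Delete rows that repeat an earlier row in all given columns.

    Table and column names are interpolated into the SQL text, so they
    must come from trusted code, never from user input.
    """
    cols = ", ".join(columns)
    cur = conn.execute(
        f"DELETE FROM {table} WHERE rowid NOT IN "
        f"(SELECT MIN(rowid) FROM {table} GROUP BY {cols})"
    )
    conn.commit()
    return cur.rowcount  # number of rows removed


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_alarms (event_time TEXT, tag TEXT, state TEXT)")
    rows = [
        ("2012-10-01 08:00:00", "Pump1.Hi", "ALM"),
        ("2012-10-01 08:00:00", "Pump1.Hi", "ALM"),  # exact duplicate
        ("2012-10-01 08:00:00", "Pump1.Hi", "ALM"),  # exact duplicate
        ("2012-10-01 08:05:00", "Pump1.Hi", "RTN"),
        ("2012-10-01 08:05:00", "Tank2.Lo", "ALM"),
    ]
    conn.executemany("INSERT INTO demo_alarms VALUES (?, ?, ?)", rows)
    removed = delete_exact_duplicates(
        conn, "demo_alarms", ["event_time", "tag", "state"]
    )
    remaining = conn.execute("SELECT COUNT(*) FROM demo_alarms").fetchone()[0]
    conn.close()
    return removed, remaining


print(demo())
```

As the test results on the slide show, the choice of query matters enormously at this scale, so any cleanup should be rehearsed on a restored copy of the database first.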

27 Slide 27 Long time Deployment ISSUE: InTouch view applications in some cases take a long time to deploy. At the end graphics are missing. The problem mainly occurs if a lot of interlaced graphics are used.

28 Slide 28 Long time Deployment Cause of the behavior: Take a look at SQL Profiler during deployment:

29 Slide 29 Long Time Deployment Solution: This issue was caused by repetitive calls to a stored procedure (internal_get_clientcontrol_and_feature_files.sql). This has been changed to a single call. The hotfix (L00114486), available for IAS 3.1 SP3, will be part of IAS 3.5 SP1.

30 Slide 30 System is Running at High CPU Load ISSUE: Historian and Wonderware Information Server, installed on the same node, run at high CPU load. aaRetSVC.exe and w3wp.exe consume most of the CPU. As a consequence, connection problems occur.

31 Slide 31 System is Running at High CPU Load Solution: Because of repeated connection problems (caused by high CPU load), history blocks become fragmented. Retrieval therefore takes longer and consumes more CPU, which in turn causes further connection problems. Historian and Information Server should not run on the same node, as each affects the performance of the other.

32 Slide 32 Unwanted Import of UDO Version IAS 3.1 SP3 P1 into a Galaxy 3.1 SP3 ISSUE: In a multi-user environment, it may be necessary to import objects from a Galaxy with a higher version. When such an object (a UserDefinedObject) is imported, all the Platforms are marked with the following icon. Setup: Local GR version 3.1 SP3; Production GR version 3.1 SP3 P1; objects exchanged by export/import of an aapkg file.

33 Slide 33 Unwanted Import of UDO version IAS 3.1 SP3 P1 into a Galaxy 3.1 SP3 Consequences: No deployment of objects is possible BEFORE the Platform nodes are redeployed. Unwanted software changes are deployed to all the Platforms.

34 Slide 34 Unwanted Import of UDO Version IAS 3.1 SP3 P1 into a Galaxy 3.1 SP3 Worst-case scenario: Restore the backup and redeploy all the nodes during production! Any other solutions?

35 Slide 35 Unwanted Import of UDO Version IAS 3.1 SP3 P1 into a Galaxy 3.1 SP3 Solution: First answer the question: which files are changed when a UserDefinedObject with a higher version is imported into a galaxy?

36 Slide 36 Changed files Recover the Galaxy

37 Slide 37 Recover the Galaxy
1. Run the following commands on the Galaxy database:
   update dbo.gobject set software_upgrade_needed = 0 where software_upgrade_needed <> 0 and template_definition_id = 15;
   delete from dbo.file_pending_update;
2. Stop the GR Platform Engine.
3. Copy (overwrite) the original binaries into the 2 locations shown above.
4. If any new remote IDE sessions were opened after the object was imported, some of the new editor and package binaries might have been copied to those remote machines too. In that case, close those IDE sessions and overwrite those files with the original binaries in the 2 locations mentioned above.
Changes such as modifying the registry are not required.

38 Slide 38 Tuning the Platform for Large Applications Issue: "Warning - Platform XXX exceed maximum heartbeat timeout of XXX ms" (component: NmxSvc)

39 Slide 39 Tuning the Platform for Large Applications Solution: Set appropriate values in the Platform and AppEngine configuration editors.

40 Slide 40 Tuning the Engine for Large Applications Explanation: maximum heartbeats timeout = WinPlatform.NetNMXHeartbeatPeriod * (WinPlatform.NetNMXHeartbeatsMissedConsecMax + 1). By default, the value of this formula is 2000 * (3 + 1) = 8000 ms, which corresponds to the timeout message in the Logger: "Platform 1 exceed maximum heartbeats timeout of 8000ms"

41 Slide 41 Tuning the Engine for Large Applications Example: if you increase the maximum number of consecutive missed heartbeats to 6, you see the same message with a timeout of 14000 ms (2000 * (6 + 1)). The maximum heartbeats timeout must be higher than the time difference actually observed between heartbeats. (Detection of any communication failure depends on this time limit.) The tolerated time difference never exceeds the value defined by the formula above.
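The formula from Slide 40 is easy to check numerically. The sketch below reproduces the two values quoted on the slides (8000 ms and 14000 ms) and adds a small helper for the reverse question: how many missed heartbeats are needed to reach a target timeout. Function and parameter names are hypothetical; only the formula itself comes from the slides.

```python
import math


def max_heartbeat_timeout_ms(heartbeat_period_ms: int, missed_consec_max: int) -> int:
    """NetNMXHeartbeatPeriod * (NetNMXHeartbeatsMissedConsecMax + 1)."""
    return heartbeat_period_ms * (missed_consec_max + 1)


def missed_count_for_timeout(heartbeat_period_ms: int, target_timeout_ms: int) -> int:
    """Smallest missed-heartbeat setting whose timeout is >= the target."""
    return max(0, math.ceil(target_timeout_ms / heartbeat_period_ms) - 1)


print(max_heartbeat_timeout_ms(2000, 3))      # default settings
print(max_heartbeat_timeout_ms(2000, 6))      # example from Slide 41
print(missed_count_for_timeout(2000, 15000))  # setting needed for at least 15 s
```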

42 Slide 42 Tuning the Engine for Large Applications Issue: Engines\Platforms fail to start up or shut down. Solution: increase the watchdog timeout (the default value is 30000 ms). Under the registry key [HKEY_LOCAL_MACHINE\SOFTWARE\ArchestrA\Framework\Platform] enter the following values: "WatchdogStartupTimeout"=dword:000493e0 (300000 ms) "WatchdogShutdownTimeout"=dword:000493e0 Note: This should be sufficient for a large system. Setting the values too high can delay the discovery that an Engine has hung or crashed during startup or shutdown, since the Bootstrap considers the Engine healthy until the timeout expires.
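Registry DWORD values are written in hexadecimal, which makes them easy to mistype. The sketch below converts between milliseconds and the `dword:` notation used above, so the values on the slide can be verified or a different timeout derived. The helper names are hypothetical.

```python
def ms_to_reg_dword(ms: int) -> str:
    """Format a millisecond value the way a .reg file writes a DWORD."""
    if not 0 <= ms <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return f"dword:{ms:08x}"


def reg_dword_to_ms(text: str) -> int:
    """Parse 'dword:xxxxxxxx' back into an integer number of milliseconds."""
    prefix, _, hex_digits = text.partition(":")
    if prefix.lower() != "dword" or not hex_digits:
        raise ValueError(f"not a dword value: {text!r}")
    return int(hex_digits, 16)


print(reg_dword_to_ms("dword:000493e0"))  # value from the slide
print(ms_to_reg_dword(30000))             # the default watchdog timeout
```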

43 Slide 43 Tips & Tricks Nice to Know: Customize Application Manager to show Name and Description for new managed apps. Default: when you create a new InTouch managed app, the default name provided is IntouchBlanktemplate. You can rename it from the IDE, but the default name remains in InTouch Application Manager on the client machines. How?

44 Slide 44 Tips & Tricks Nice to Know: Scripts to list all objects in checked-out state. Scripts to force check-in of all objects (NOT supported, ONLY for diagnostics!)

45 Slide 45 Tips & Tricks Nice to Know: Enable PAE to support maximum RAM. Physical Address Extension (PAE) is a feature that allows 32-bit x86 processors to access a physical address space (including random access memory and memory-mapped devices) larger than 4 gigabytes. To enable PAE, open the Boot.ini file and add the /PAE parameter to the ARC path.
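The 4 gigabyte limit follows directly from the width of a 32-bit physical address. The short sketch below does the arithmetic; the 36-bit figure for PAE is general x86 background rather than something stated on the slide, and the amount of RAM actually usable also depends on the Windows edition.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits: int) -> int:
    """Size of a byte-addressed physical address space, in GiB."""
    return 2 ** address_bits // GIB


print(addressable_gib(32))  # plain 32-bit physical addressing
print(addressable_gib(36))  # 36-bit physical addressing, as introduced with PAE
```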

46 Slide 46 References: Wonderware Tech Notes & Articles available: –Multi-User Development of an ArchestrA Galaxy: Best Practices (TN 665) –Optimizing InTouch Application Performance (WDN Article) –Improving Application Performance with ArchestrA Graphics (TN 644) –Deleting InTouch Application Files Without Affecting the Application (TN 85) –Industrial Application Server Platform Deployment Checklist (TN 478) –Fine-Tuning AppEngine Redundancy Settings (TN 401) –Tuning Recommendations for Redundancy in Large Systems (WDN Article)

47 Slide 47 Questions? THANK YOU

48 Slide 48

