G. Thomas & D. Davids (CERN) & O. Holme (ETH Zurich)
JCOP FWWG Meeting: 24/04/2012
Test status of the SYSTEC USB-CANmodul16 interface remotely connected via a DIGI AnywhereUSB (USB-over-Ethernet device)
Outline
- Summary of the last tests done and issues fixed
- Additional tests and analysis done
- Current known issues/limitations
- LS1 and long term?
EN/ICE
Recovery testing and results
Test conditions: continuous sending/receiving of bursts of CAN frames on two CAN ports at 125 kbit/s; reboots, short and longer disconnections (< ~1 min)
Recovery tests and results (FIXED and TESTED on Windows XP and Windows 7):
- Power cycle of the SYSTEC USB-CANmodul16
- Power cycle of the DIGI interface
- Simultaneous power cycle of the DIGI and the USB-CANmodul16 (rack level)
- HW reset of the DIGI (button on the front panel of the device)
- HW disconnection of the USB cable
- Reboot of the PC (where the application is running)
- Software reboot of the DIGI (via the web interface or Config utility)
  Result: OK
- Software Ethernet disconnection (via the web interface or Config utility)
- HW disconnection of the Ethernet cable (behavior differs between short and long disconnections)
- PC crash!
- Network switch failure (behavior differs between short and long disconnections)
  Result: partial recovery; OK once a watchdog timeout in the SYSTEC CAN modules is disabled!!
Power cycle tests and USB disconnection: OK. The application reconnects automatically, re-establishes the connected CAN ports in their previous states, and continues to run.
Ethernet disconnection tests: different symptoms are observed depending on the duration of the disconnection. In some cases the application recovers and continues to run, but the CAN status LEDs of the SYSTEC blink abnormally and can only be reset by power cycling the SYSTEC CAN interface or by a firmware upload. The problem was due to a watchdog timeout conflict between the SYSTEC and the DIGI!
PROBLEMS SOLVED by disabling a software "watchdog timeout" introduced in the SYSTEC firmware, which also clears the USB error status. <There are two types of watchdog timeouts. The first one is the watchdog peripheral provided by the microcontroller; it cannot be deactivated via software. The second one is a software watchdog implemented by SYSTEC, called "Status Timeout". The Status Timeout can be disabled with USB-CANmodul Control since version V4.15.
You have to select a logical device in the hardware tab, hold the CTRL key and right-click the logical device, then select "Set Status Timeout" in the context menu. Setting this value to zero disables the feature. The Status Timeout value is written to the EEPROM of the logical device, so after the next power-on the Status Timeout is still disabled (value zero).>
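The automatic recovery described on the recovery slide (reconnect, then re-establish each connected CAN port in its previous state) can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's actual code: `CanInterfaceSupervisor`, `PortState` and the `open_port` callable are hypothetical names standing in for the real SYSTEC driver calls, and the retry count and delay are assumed values.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PortState:
    """Hypothetical record of a CAN port's configuration before a failure."""
    bitrate_kbit: int
    is_open: bool


@dataclass
class CanInterfaceSupervisor:
    """Sketch: remember CAN port states and restore them after a reconnection.

    `open_port(port, bitrate_kbit)` is a hypothetical stand-in for the real
    driver call; it is assumed to raise OSError while the device (or the
    USB-over-Ethernet link) is still unreachable.
    """
    open_port: Callable[[int, int], None]
    ports: Dict[int, PortState] = field(default_factory=dict)

    def remember(self, port: int, bitrate_kbit: int) -> None:
        # Record the state we want to re-establish after a failure.
        self.ports[port] = PortState(bitrate_kbit, is_open=True)

    def restore_all(self, retries: int = 5, delay_s: float = 1.0) -> bool:
        """Reopen every previously open port; return True on full recovery."""
        pending: List[int] = [p for p, s in self.ports.items() if s.is_open]
        for _ in range(retries):
            still_pending: List[int] = []
            for port in pending:
                try:
                    self.open_port(port, self.ports[port].bitrate_kbit)
                except OSError:
                    # Device not back yet: retry this port in the next round.
                    still_pending.append(port)
            pending = still_pending
            if not pending:
                return True
            time.sleep(delay_s)
        return False
```

Only ports that are still pending are retried, so ports already restored are not reopened while the link comes back.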
Additional tests and analysis
- Tests with 5-port and 14-port DIGI devices
- Usability tests (driver installation/uninstallation) under Windows 7 (32-bit/64-bit) and Windows Server 2003
- Several days of load testing under Windows 7 (32-bit) and Windows Server 2003
- Network latency tests (CMS ECAL requirements)
Test setup with a 5-port DIGI
Test setup with a 14-port DIGI
CMS ECAL network latency tests
- 14-port DIGI purchased for all ECAL CAN-based readout
- Problems seen with Wiener OPC data after deployment
  - Invalid data for Wiener devices every couple of days
  - Production system rolled back until the problem is understood
- The Wiener OPC server has hard-coded timeouts for HW polling
  - Additional latency of the DIGI suspected of causing the problem
- Latency tests designed with EN-ICE to investigate this
November 22, 2011 (EN/ICE)
CMS ECAL network latency tests
[Figure: test setup. PC running the CAN data generator & latency test, connected via Ethernet (GPN) to the DIGI AnywhereUSB, then via USB to the SYS TEC USB-CANmodul16 (CAN side); Wireshark capturing the IP traffic]
- Danny's CAN data generator & test tool
  - CAN frames are generated with unique IDs
  - The time between sending and receiving each frame is measured
- Wireshark used to capture and understand the IP traffic
  - Another tool from Danny analyses the capture files
  - Quick identification of significant delays
- Infrequent but long delays (high latency) were seen
  - Correlated with network traffic events
  - Worst delays due to packet loss
  - More delays seen on more complex networks
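The measurement principle (unique frame IDs, send-to-receive time per frame) comes down to matching timestamps by frame ID. A minimal sketch, assuming the tool records send and receive timestamps in seconds on the same clock, keyed by frame ID; the function names and the 150 ms threshold for a "significant" delay are illustrative assumptions, not the actual test tool:

```python
from typing import Dict, List, Tuple


def frame_latencies(sent: Dict[int, float], received: Dict[int, float]) -> Dict[int, float]:
    """Send-to-receive latency in ms for each frame ID seen on both sides.

    Timestamps are assumed to be in seconds, taken on the same clock.
    """
    return {fid: (received[fid] - t_sent) * 1000.0
            for fid, t_sent in sent.items() if fid in received}


def significant_delays(latencies: Dict[int, float],
                       threshold_ms: float = 150.0) -> List[Tuple[int, float]]:
    """(frame ID, latency) pairs above the assumed threshold, sorted by ID."""
    return sorted((fid, lat) for fid, lat in latencies.items() if lat > threshold_ms)


def missing_frames(sent: Dict[int, float], received: Dict[int, float]) -> List[int]:
    """Frame IDs sent but not received within the recorded run."""
    return sorted(set(sent) - set(received))
```

Because each ID is unique, a late frame cannot be confused with a later one, and frames that never arrive within the run show up separately from slow ones.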
CMS ECAL network latency: further tests
[Figure: latency test setup (PC with the CAN data generator & latency test, Ethernet (GPN), DIGI AnywhereUSB, USB, SYS TEC USB-CANmodul16, CAN) with Wireshark capturing at two points]
- The DIGI sends all packets
- The DIGI network interface performs as well as or better than the other network hardware in the tests
[Figure: OPC setup (PC running PVSS and the Wiener OPC server, Ethernet (GPN), DIGI AnywhereUSB, USB, SYS TEC USB-CANmodul16, CAN, Wiener PL508-DO) with Wireshark capturing the IP traffic]
- Invalid OPC data correlates exactly with high DIGI latency caused by packet loss
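One way to check the correlation between invalid OPC data and high DIGI latency is to test whether each invalid-data timestamp falls inside a high-latency window. The sketch below assumes latency samples given as (send time in s, latency in ms) pairs and invalid-data timestamps on the same clock; the 150 ms threshold, the 0.5 s tolerance and all names are hypothetical choices for illustration:

```python
from typing import List, Sequence, Tuple

Window = Tuple[float, float]


def high_latency_windows(samples: Sequence[Tuple[float, float]],
                         threshold_ms: float = 150.0) -> List[Window]:
    """Turn (send_time_s, latency_ms) samples into (start_s, end_s) windows.

    Only frames slower than the (assumed) threshold produce a window; it spans
    from the send time to the receive time of the delayed frame.
    """
    return [(t, t + lat_ms / 1000.0) for t, lat_ms in samples if lat_ms > threshold_ms]


def correlated_fraction(invalid_times: Sequence[float],
                        windows: Sequence[Window],
                        tolerance_s: float = 0.5) -> float:
    """Fraction of invalid-data timestamps within tolerance of a high-latency window."""
    if not invalid_times:
        return 0.0
    hits = sum(
        1
        for t in invalid_times
        if any(start - tolerance_s <= t <= end + tolerance_s for start, end in windows)
    )
    return hits / len(invalid_times)
```

An "exact" correlation, as observed in the tests, would correspond to a fraction of 1.0.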
CMS ECAL network latency results
Latency     Explanation
< 50 ms     Typical round-trip latency
< 150 ms    Worst cases in the normal situation
~ 400 ms    Delay due to an address resolution (ARP) request being sent to the DIGI
~ 1.2 s     Usual delay in case of packet loss; causes problems with the Wiener OPC server
- Performance is usually very good
  - Uses TCP PUSH for low-latency transfer
- TCP packets are lost more frequently than expected (not due to the DIGI)
- DIGI application flow control seems to limit data transfer and increase latency when recovering from errors
  - Perhaps they can improve this…
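The latency bands in the table can be used to summarize a set of measurements. In this sketch the band edges (50, 150 and 800 ms) and the 1000 ms poll timeout are assumptions chosen for illustration: the slide gives only typical values, and the actual hard-coded HW polling timeout of the Wiener OPC server is not stated.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed band edges: the slide lists typical values (< 50 ms, < 150 ms,
# ~400 ms, ~1.2 s), not exact boundaries between the categories.
BANDS = [
    (50.0, "typical round trip"),
    (150.0, "worst case, normal situation"),
    (800.0, "ARP-like delay (~400 ms)"),
    (float("inf"), "packet-loss-like delay (~1.2 s)"),
]


def classify(latency_ms: float) -> str:
    """Return the label of the first band whose upper edge exceeds the latency."""
    for upper, label in BANDS:
        if latency_ms < upper:
            return label
    return BANDS[-1][1]


def summarize(latencies_ms: Iterable[float],
              poll_timeout_ms: float = 1000.0) -> Tuple[Counter, int]:
    """Count samples per band, and how many exceed a hypothetical poll timeout."""
    lat = list(latencies_ms)
    counts = Counter(classify(x) for x in lat)
    too_late = sum(1 for x in lat if x > poll_timeout_ms)
    return counts, too_late
```

With a fixed poll timeout somewhere between the ARP-like and packet-loss-like bands, only the ~1.2 s delays would be reported as bad-quality data, which matches the observation that packet loss causes the Wiener OPC server problems.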
Current known issues/limitations
- Windows 7 (64-bit) and Windows Server versions are currently NOT supported by SYSTEC
- No Linux driver available for the DIGI
- Few equivalent devices on the market (only one other supplier found, and its device is out of stock!)
- High DIGI latency
LS1 and long-term actions?
- LS1 requirements, if the DIGI solution is kept:
  - SYSTEC support needed for Windows 7 (64-bit) and Windows Server 2008 R2
  - Network latency causing bad-quality values in the Wiener OPC server to be fixed (at DIGI level?)
  - Additional tests/requirements?
- "Long-term" Ethernet-CAN interface solution
  - Solution for OPC UA
  - Alternative solutions?
  - Long term: the goal is to remove these hardware and software layers (CAN-to-USB and USB-to-Ethernet) by using a direct CAN-Ethernet interface properly integrated in the OPC servers