Redundancy

Presentation transcript:

Redundancy

2. Redundancy: The need for redundancy
EPICS is great software, but it lacks redundancy support, which is essential for some highly critical applications such as cryogenic plants.

Original EPICS redundancy
Developed by DESY in collaboration with SLAC.
Supports the vxWorks operating system only.

What is a redundant IOC?
[Diagram: CA clients, public Ethernet, private Ethernet, shared network, hardware, and IOC#1 and IOC#2 each hosting PV1, PV2, PV3]

EPICS redundancy terminology
RMT (Redundancy Monitoring Task): the key component of the EPICS redundancy implementation.
CCE (Continuous Control Executive): the data “exchanger” for the EPICS IOC.
RMT driver: a piece of software that conforms to the RMT API.

Redundant EPICS IOC internals
[Diagram: normal IOC, RMT, CCE, rsrv, scans]

RMT functions
Check the “health” of the drivers.
Control the drivers (start, stop, sync, etc.).
Check connectivity with the network.
Communicate with the “partner”.
Decide when to switch to the partner.
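The decision step listed above can be illustrated with a minimal sketch. This is not the actual RMT logic: the `Role` and `Status` types, their fields, and the switching rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    MASTER = "master"
    SLAVE = "slave"


@dataclass
class Status:
    """Snapshot of one IOC as seen by its monitor (all fields are hypothetical)."""
    drivers_ok: bool       # every registered driver reports healthy
    network_ok: bool       # the public network is reachable
    partner_alive: bool    # heartbeats from the partner are still arriving
    partner_healthy: bool  # the partner reports healthy drivers and network


def next_role(role: Role, s: Status) -> Role:
    """Decide the role for the next cycle from the current role and status."""
    local_ok = s.drivers_ok and s.network_ok
    partner_ok = s.partner_alive and s.partner_healthy
    if role is Role.MASTER:
        # Hand over only if we are unhealthy and the partner can take over.
        return Role.SLAVE if (not local_ok and partner_ok) else Role.MASTER
    # Slave: take over if the partner is gone or unhealthy and we are fine.
    return Role.MASTER if (local_ok and not partner_ok) else Role.SLAVE
```

In this sketch a master that is unhealthy keeps its role when the partner cannot take over, so the pair never ends up with no master at all.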

Generalization of EPICS redundancy
Other laboratories, including KEK, showed interest in redundancy for EPICS.
Redundancy is needed on platforms other than vxWorks.
RMT could be used to make other software redundant on Linux and other systems, even software unrelated to EPICS.

Generalization of EPICS redundancy (continued)
All vxWorks-specific code was replaced with EPICS/OSI (Operating System Independent) library calls.
Additional libOSI functions were implemented.

Generalized version
Works on vxWorks, Linux, Darwin (Mac OS X), and virtually any EPICS-supported OS.
Can be used to add redundancy to other software.

Generalized version (continued)
The generalization made it possible to include EPICS redundancy support in the EPICS base distribution, since base has all the “hooks” needed for a redundant IOC.

Some numbers
Switchover time is under 3 seconds; with a normal (non-redundant) IOC it could take from several minutes to hours.
CCE can handle synchronization of about 5000 records per second.
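As a rough back-of-the-envelope reading of the CCE figure, the sketch below estimates how long one full pass over a database would take. It assumes the quoted rate is constant and that each record needs exactly one update per pass; neither assumption comes from the slides, and `sync_seconds` is a hypothetical helper.

```python
def sync_seconds(n_records: int, records_per_second: float = 5000.0) -> float:
    """Estimated time to synchronize n_records at a constant rate."""
    return n_records / records_per_second


# Example: a database of 15,000 records at about 5000 records per second.
print(sync_seconds(15000))  # 3.0
```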

Switch-over “time loss”

Switch-over “time loss” (continued)

Redundant Channel Access gateway

CA gateway
A very common program, widely used in many laboratories.
Used to make two or more subnets CA-visible to each other and to provide access control, for example read access for everyone outside the control network.

CA gateway operation
[Diagram: CA gateway between subnet 1 and subnet 2, showing request and reply]

CA gateway needs redundancy
It is a single point of failure: if it is not working, the whole subnet becomes unreachable from the other subnet.

Redundant CA gateway
Has no critical internal state data to be synchronized between peers.
Can be redundant “out of the box”, but the client would see multiple replies.
It would be very nice to have “load balancing”, which would improve response time and throughput.

Confusing redundancy
Client: Who has PV?
GW #1: I have!
GW #2: I have!
Client: I’m confused!

Let’s add RMT
[Diagram: gateways labelled S and M, with a firewall]
Client: Who has PV?
GW #1: I have!
GW #2: I have!
Client: OK!

Redundancy only
RMT runs as a separate process, which does all the monitoring, health checking and decision making.
The gateway runs as usual.
On the “SLAVE” we block replies from the gateway with a firewall rule.
No modification to the gateway source code, which means no new bugs whatsoever.
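The effect of the firewall rule can be shown with a toy simulation. It only models which search replies reach the client; the gateway names, the PV sets, and the `search_replies` helper are hypothetical and do not touch the real CA protocol or any real firewall.

```python
def search_replies(pv, gateways, blocked):
    """Return the names of gateways whose reply to a search for `pv` reaches the client.

    gateways: mapping of gateway name -> set of PV names it can serve
    blocked:  set of gateway names whose replies are dropped by the firewall
    """
    return [name for name, pvs in gateways.items()
            if pv in pvs and name not in blocked]


# Both gateways serve the same PVs (hypothetical names).
gateways = {"GW1": {"PV1", "PV2"}, "GW2": {"PV1", "PV2"}}

# Without the firewall rule the client sees two answers for one PV.
print(search_replies("PV1", gateways, blocked=set()))    # ['GW1', 'GW2']

# With replies from the slave (here GW2) blocked, only the master answers.
print(search_replies("PV1", gateways, blocked={"GW2"}))  # ['GW1']
```

On a switchover the blocked set would simply flip to the other gateway, which is why the gateway code itself needs no changes in this scheme.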

Add load balancing
Inform the gateway about its partner’s status, i.e. whether it is alive.
Load-balance using the “directory service” feature of the CA protocol.

First query
[Diagram: gateways labelled S and M, with a firewall]
Client: Who has PV?
GW #1: I have!
GW #2: I have!
Client: OK!

Second query
[Diagram: gateways labelled S and M, with a firewall]
Client: Who has PV2?
GW #1: GW2 has!
GW #2: GW1 has!
Client: OK!
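One way to picture the load-balancing idea is a responder that answers each search with the address the client should connect to, which may be its partner's. The sketch below is an assumption-laden illustration: the slides do not state the balancing policy, so the round-robin rule, the `SearchResponder` class, and the addresses are all hypothetical.

```python
from itertools import cycle


class SearchResponder:
    """Answers PV searches with the address of the gateway to connect to."""

    def __init__(self, own_addr, partner_addr):
        self.own_addr = own_addr
        self.partner_addr = partner_addr
        self.partner_alive = False  # updated from the partner status
        self._turn = cycle([own_addr, partner_addr])

    def reply(self, pv):
        """Return the address the client should use for `pv`."""
        if not self.partner_alive:
            # No partner available: serve everything ourselves.
            return self.own_addr
        # Partner available: alternate between the two gateways.
        return next(self._turn)


gw = SearchResponder("gw1:5064", "gw2:5064")
print(gw.reply("PV1"))  # gw1:5064 (partner not known to be alive)
gw.partner_alive = True
print(gw.reply("PV1"))  # gw1:5064
print(gw.reply("PV2"))  # gw2:5064
```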

Redundant IOC on ATCA

Advanced Telecommunications Computing Architecture (ATCA)
Example boards and crates

Advanced Telecommunications Computing Architecture (ATCA)
ATCA is a relatively new standard targeted as a platform for highly available applications.

Why run RIOC on ATCA
ATCA is a modern industry standard for HA applications.
Very reliable (99.999% design availability).
ATCA is suggested as a platform for the ILC control system.
ATCA is hardware designed for critical applications, and RIOC is software designed for critical applications.
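To put the 99.999% figure in perspective, the short calculation below converts an availability fraction into expected downtime per year. The helper name is hypothetical, and a 365-day year is assumed.

```python
def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime in minutes per year for a given availability fraction."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes, ignoring leap years
    return (1.0 - availability) * minutes_per_year


print(round(downtime_minutes_per_year(0.99999), 2))  # 5.26
```

So “five nines” corresponds to a little over five minutes of downtime per year.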

ATCA shelf manager
Data is exchanged through the redundant Intelligent Platform Management Bus (IPMB).

“Plain” RIOC on ATCA

“Plain” RIOC on ATCA (continued)
RIOC can run on ATCA without modification.
But it does not know anything about the “smart” hardware of ATCA.
Basically the same as running on two normal PCs.

Benefits of using an ATCA-“aware” RIOC
Failures can be “predicted”: for example, if the temperature starts to rise while the CPU is still working, we can initiate the fail-over procedure before the hardware actually fails, so fail-over occurs in a more stable and controlled environment.
Client connections can be closed gracefully, allowing the client to reconnect to the backup IOC within 1 second.
In case of a “real” hardware failure, reconnection would occur only after 30 seconds.
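The “predicted failure” idea can be sketched as a simple trend check on temperature readings. The threshold, the number of samples, and the `should_fail_over` function are assumptions for illustration; the slides do not describe the actual criteria used.

```python
def should_fail_over(temps_c, warn_c=70.0, rising_samples=3):
    """Return True if the last readings rise strictly and the latest has reached warn_c.

    temps_c:        temperature readings in degrees Celsius, oldest first
    warn_c:         hypothetical warning threshold
    rising_samples: how many of the latest readings must form a rising trend
    """
    if len(temps_c) < rising_samples:
        return False  # not enough data to call it a trend
    recent = temps_c[-rising_samples:]
    rising = all(a < b for a, b in zip(recent, recent[1:]))
    return rising and recent[-1] >= warn_c


print(should_fail_over([60, 65, 71, 74]))  # True: rising and above the threshold
print(should_fail_over([72, 71, 70]))      # False: hot but cooling down
```

Because the CPU is still working when the check fires, the fail-over can close client connections gracefully instead of waiting for a timeout.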

ATCA-“aware” RIOC

HPI usage example: redundant EPICS IOC
HPI (Hardware Platform Interface) is used to monitor the health of each blade and of the shelf.
This information is used to decide on failover.

HPI usage example: redundant EPICS IOC (continued)
HPI is platform independent.
Instead of ATCA we can use a “conventional” server PC.
OpenHPI has /dev/sysfs mappings on Linux.