The ILC Control System
J. Carwardine, C. Saunders, N. Arnold, F. Lenkszus (Argonne), K. Rehlich, S. Simrock (DESY), B. Banerjee, B. Chase, E. Gottschalk, P. Joireman, P. Kasley, S. Lackey, P. McBride, V. Pavlicek, J. Patrick, M. Votava, S. Wolbers (Fermilab), K. Furukawa, S. Michizono (KEK), R.S. Larsen, R. Downing (SLAC)

ILC Control System - Carwardine, Saunders, et al ICALEPCS 07

Content
- ILC overview
- Controls challenges and conceptual design
  - Availability
  - Services
  - Configuration management
- Collaboration
- Wrap-up

ILC accelerator overview (J. Bagger)
- Two 11-km-long 250-GeV linacs with 16,000 cavities and 640 RF units.
- A 4.5-km beam delivery system with a single interaction point.
- 5-GeV electron and positron damping rings, 6.7 km circumference.
- Polarized PC gun electron source and undulator-based positron source.

Technically Driven Timeline
[Figure: timeline chart, all regions, ~5 yrs. Milestones: BCD, RDR, EDR, Begin Const, End Const. Labels: Siting Plan being Developed, Site Select, Site Prep, Engineer Design, R&D and Industrialization (Gradient, e-Cloud, Cryomodule, Full Production, System Tests & XFEL), Detector Construct, Detector Install, Pre-Operations, Construction, Start-up.]

Control System challenges
- Mainly driven by scale & complexity of ILC accelerators
  - 100,000 devices, several million control points
  - Main linacs: 640 RF klystrons, 2000 cryomodules, 16,000 cavities
  - Control system: front-end crates
- Accelerator operations: reliance on automation & feedback
- Accelerator availability goal of 85% (control system: 99%)
- Precision timing & synchronization over 10s of km.
- Remote operations.
- ILC funding model: multi-national in-kind contributions.

Control System functional model
- Client Tier: GUIs, scripting, HMI for high-level apps
- Services Tier: "business logic", device abstraction, feedback engine, state machines, online models…
- Front-End Tier: equipment interfaces, control-point level

Physical model: front-end

Physical model: global layer

Preliminary component counts

Component                 Description                                                           Quantity
1U Switch                 Initial aggregator of network connections from technical systems      8356
Controls Shelf            Standard chassis for front-end processing and instrumentation cards   1195
Controls Rack             Standard rack populated with one to three controls shelves            753
Aggregator Switch         High-density connection aggregator for 2 sectors of equipment         71
Controls Backbone Switch  Backbone networking switch for controls network                       126
LLRF Controls Station     Two racks per station for signal processing and motor / piezo drives  668
Phase Ref. Link           Redundant fiber transmission of 1.3-GHz phase reference               68

Addressing the challenges
- Large-scale deployment
  - High Availability, strong emphasis on diagnostics, QA.
  - Resource management
  - Emphasize standards-based solutions.
- Extensive reliance on automation and 5 Hz feedback
  - Automation and feedback engines as Services.
  - Make all control & monitor points available to the feedback engine; synchronize control and monitor actions to 5 Hz beam pulses.
- Controls integration of in-kind contributed equipment
  - Scope, span of control, treaty points…
  - ?
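
The idea of a feedback engine running as a service and acting once per beam pulse can be sketched as follows. This is a minimal illustration only: the class name `FeedbackService`, the proportional-gain correction, and the convention that a failed monitor reports `None` are all assumptions, not part of the ILC design.

```python
from statistics import mean

PULSE_RATE_HZ = 5  # beam pulse rate quoted on the slide


class FeedbackService:
    """Hypothetical feedback engine: one correction step per beam pulse."""

    def __init__(self, gain: float, target: float = 0.0) -> None:
        self.gain = gain          # assumed proportional gain
        self.target = target      # desired average monitor reading
        self.setpoint = 0.0       # value written back to the control point

    def on_pulse(self, readings: dict) -> float:
        """Process the monitor readings of one pulse and return the new setpoint.

        `readings` maps a monitor name to its value, or to None when the
        monitor is known to have failed (an assumed convention).
        """
        valid = [v for v in readings.values() if v is not None]
        if not valid:
            # No usable data for this pulse: hold the previous setpoint.
            return self.setpoint
        error = mean(valid) - self.target
        self.setpoint -= self.gain * error
        return self.setpoint


fb = FeedbackService(gain=0.5)
step1 = fb.on_pulse({"bpm1": 1.0, "bpm2": None, "bpm3": 3.0})
step2 = fb.on_pulse({"bpm1": None, "bpm2": None, "bpm3": None})
```

Dropping failed monitors from the average is one simple form of adapting the feedback in software; a real engine would be driven by the timing system rather than called by hand.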

Availability
- ILC control system availability goal: 99% by design
  - front-end crates → % per crate.
- Cannot afford a long period of identifying & fixing problems once machine operations begin.
- 'Best effort' approach may not be sufficient → investigate High Availability techniques.
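
The per-crate figure is missing from this transcript, but the reasoning behind it can be sketched. The example assumes a simple series model (the control system is up only when every crate is up, with independent failures) and borrows the 1195 controls shelves from the preliminary component counts as a stand-in for the crate count; both are assumptions rather than statements from the slides.

```python
def per_crate_availability(system_goal: float, n_crates: int) -> float:
    """Availability each crate needs when all n_crates must be up at once."""
    return system_goal ** (1.0 / n_crates)


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per year, in minutes, for a given availability."""
    minutes_per_year = 365 * 24 * 60
    return (1.0 - availability) * minutes_per_year


N_CRATES = 1195       # assumed: controls shelf count used as crate count
SYSTEM_GOAL = 0.99    # control system availability goal from the slide

per_crate = per_crate_availability(SYSTEM_GOAL, N_CRATES)
print(f"required per-crate availability: {per_crate:.5%}")
print(f"allowed downtime per crate: {annual_downtime_minutes(per_crate):.1f} min/year")
```

Under these assumptions each crate needs roughly 99.999% availability, which is only a few minutes of downtime per crate per year. That order of magnitude is why a 'best effort' approach looks insufficient.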

Accelerator Availability
- Accelerator availability goal: 85%
- Control system availability goal: 99%
- Requires intrinsic reliability + rapid recovery
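
"Intrinsic reliability + rapid recovery" corresponds to the two terms of the standard steady-state availability formula, A = MTBF / (MTBF + MTTR). The numbers below are illustrative assumptions, not ILC figures.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def max_mttr(goal: float, mtbf_hours: float) -> float:
    """Longest mean repair time that still meets the availability goal."""
    return mtbf_hours * (1.0 - goal) / goal


# Assumed example values (hours); chosen only to show the trade-off.
baseline = availability(500.0, 8.0)
faster_recovery = availability(500.0, 4.0)   # halve the repair time
more_reliable = availability(1000.0, 8.0)    # double the time between failures

print(f"baseline:        {baseline:.4f}")
print(f"faster recovery: {faster_recovery:.4f}")
print(f"more reliable:   {more_reliable:.4f}")
print(f"MTTR budget for 99% at MTBF=500 h: {max_mttr(0.99, 500.0):.2f} h")
```

Halving the repair time gains exactly as much availability as doubling the time between failures, so investment in diagnostics and hot-swap can be as effective as investment in more reliable hardware.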

HA requires different considerations
- Apply techniques not typically used on an accelerator
  - Development culture must be different.
  - Cannot build ad hoc with in-situ testing.
  - Build modeling, simulation, testing, and monitoring into hardware and software methodology up front.
- Hardware availability
  - Instrumentation electronics to servers and disks.
  - Redundancy where feasible, otherwise adapt in software.
  - Modeling and simulation
- Software availability
  - Equally important.
  - Software has many more internal states: difficult to predict.
  - Modeling and simulation needed here for networking and software.
  - Robustness
  - Exception handling.

Relative cost/benefit of HA techniques (C. Saunders)
[Chart: cost (some effort-laden, some materials-laden) versus availability (benefit) for the techniques listed below.]
1. Good administrative practices
2. Disk volume management
3. Automation (supporting RF tune-up, magnet conditioning, etc.)
4. COTS redundancy (switches, routers, NFS, RAID disks, database, etc.)
5. Extensive monitoring (hardware and software)
6. Model-based configuration management (change management)
7. Adaptive machine control (detect failed BPM, modify feedback)
8. Development methodology (testing, standards, patterns)
9. Application design (error code checking, etc.)
10. Hot-swap hardware
11. Manual failover (e.g. bad memory, live patching)
12. Model-based automated diagnosis
13. Automatic failover

HA R&D objectives
- Learn about HA (High Availability) in context of accelerator controls
  - Bring in expertise (RTES, training, NASA, military, …)
  - Explore standards-based methodologies
- Develop (adopt) a methodology for examining control system failures
  - Fault tree analysis, FMEA, scenario-based FMEA
  - Others…
- Develop policies for detecting and managing failure modes
  - Development and testing methodology
  - Instrumentation, out-of-band monitoring (independent diagnostics)
  - Workarounds
  - Redundancy
- Develop "vertical" prototypes
  - I.e. how we might implement the above policies
  - Integrate portions of "vertical" prototypes with accelerator facilities

Front-end electronics requirements
- HA-specific requirements
  - Intelligent Platform Management
  - Remote power on/off and reset/initialize for individual boards.
  - Highly improved diagnostics capabilities in all electronics subsystems.
  - Support redundancy: processors, comms links, power supplies, …
  - Hot-swappable components: circuit boards, fans, power supplies, …
- Platform basic requirements
  - Standard modular architecture
  - Broad industry support of core components
  - Wide range of COTS modules + support custom instrumentation.
  - 'High performance' + cost-effective.

If not VME or VXI, then what…?
- Candidate standards include ATCA, uTCA, VME64x, VXS, VPX, other VITA standards…
- Of systems available today, ATCA offers the best representative feature set
  - Represents best practices of decades of telecom platform development.
  - Increasing evidence of commercial products for controls applications.
  - Growing interest in the Controls and DAQ community.
  - Being evaluated by several labs. Strong candidate for XFEL.
- Two flavors
  - ATCA: full-featured, large form factor
  - uTCA: reduced feature set, smaller form factor, lower cost.

ATCA crates (R. Larsen)
[Figure labels: Slot Crate w/ Shelf Manager; Fabric Switch; Dual IOC Processors; Rear View; 16 Slot Dual Star Backplane; 4 Hot-Swappable Fans; Shelf Manager; Dual IOCs; Fabric Switch; Dual 48VDC Power Interface.]

Fault detection and remediation ("Shelf" Management) (C. Saunders)
[Diagram labels: Client Tier; Services Tier; Front-end tier; Controls Protocol; IPMI, HPI, SNMP, others…; CPU1; CPU2; I/O; Custom; SM; sensor.]
Shelf Manager:
- Identify all boards on shelf
- Power cycle boards (individually)
- Reset boards
- Monitor voltages/temps
- Manage Hot-Swap LED state
- Switch to backup flash mem bank
- More…
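
The monitor-then-remediate role of a shelf manager can be illustrated with a small sketch. Everything here is hypothetical: the `Board` record, the sensor limits, and the reset-then-power-cycle escalation policy are assumptions made for illustration and do not correspond to any IPMI, HPI, or vendor API.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESET = "reset"
    POWER_CYCLE = "power_cycle"


@dataclass
class Board:
    slot: int
    temp_c: float
    volts_12: float
    fault_count: int = 0   # consecutive scans with a sensor out of range


# Assumed limits, for illustration only.
MAX_TEMP_C = 70.0
V12_MIN, V12_MAX = 11.4, 12.6


def check_board(board: Board) -> Action:
    """Decide on a remediation for one board from its sensor readings."""
    healthy = board.temp_c <= MAX_TEMP_C and V12_MIN <= board.volts_12 <= V12_MAX
    if healthy:
        board.fault_count = 0
        return Action.NONE
    board.fault_count += 1
    # Assumed escalation: try a reset first, power cycle if the fault persists.
    return Action.RESET if board.fault_count == 1 else Action.POWER_CYCLE


def scan_shelf(boards: list) -> dict:
    """Check every board on the shelf individually; returns slot -> action."""
    return {board.slot: check_board(board) for board in boards}


shelf = [Board(slot=1, temp_c=45.0, volts_12=12.0),
         Board(slot=2, temp_c=85.0, volts_12=12.0)]
first_scan = scan_shelf(shelf)
second_scan = scan_shelf(shelf)
```

Because remediation is decided per slot, a single overheating board can be reset or power cycled without disturbing the rest of the shelf.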

ATCA Shelf w/ dedicated shelf Management Controllers

SAF Availability Management Framework (C. Saunders)
- Open standard from telecom industry geared towards highly available, highly distributed systems.
- Manages software runtime lifecycle, fault reporting, failover policies, etc.
- Works in combination with a collection of well-defined services to provide a powerful environment for application software components.
- Potential application to critical core control system software such as IOCs, device servers, gateways, name-servers, data reduction, etc.
- Know exactly what software is running where.
- Gracefully restart components, manage state for component hot-swap.
- Uniform diagnostics to troubleshoot problems.
- Implementations: OpenClovis, OpenSAF, Self-Reliant, Element, …

SAF: Availability Management Framework (C. Saunders)
A simple example of software component runtime lifecycle management.
[Diagram: AMF Logical Entities: a Service Group with a Service Unit and Component on Node U and on Node V, one active and one standby; a Service Instance is work assigned to a Service Unit. Service Unit Administrative States: Locked-Instantiation, Locked, Unlocked, Shutting down.]
1. Service unit starts out un-instantiated.
2. State changed to locked, meaning software is instantiated on node, but not assigned work.
3. State changed to unlocked, meaning software is assigned work (Service Instance).
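
The three lifecycle steps above can be written as a small state machine. This is a simplified sketch, not the API of any SAF implementation: the transition table is an assumption based on the four administrative states named on the slide, and it treats the slide's "un-instantiated" starting point as the locked-instantiation state.

```python
from enum import Enum


class AdminState(Enum):
    LOCKED_INSTANTIATION = "locked-instantiation"  # software not instantiated
    LOCKED = "locked"                              # instantiated, no work assigned
    UNLOCKED = "unlocked"                          # may be assigned a service instance
    SHUTTING_DOWN = "shutting-down"                # finishing current work


# Assumed (simplified) set of legal administrative transitions.
ALLOWED = {
    AdminState.LOCKED_INSTANTIATION: {AdminState.LOCKED},
    AdminState.LOCKED: {AdminState.UNLOCKED, AdminState.LOCKED_INSTANTIATION},
    AdminState.UNLOCKED: {AdminState.SHUTTING_DOWN, AdminState.LOCKED},
    AdminState.SHUTTING_DOWN: {AdminState.LOCKED},
}


class ServiceUnit:
    """Hypothetical service unit tracking only its administrative state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = AdminState.LOCKED_INSTANTIATION

    def set_state(self, new_state: AdminState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: {self.state.value} -> {new_state.value} not allowed")
        self.state = new_state

    @property
    def accepts_work(self) -> bool:
        return self.state is AdminState.UNLOCKED


unit = ServiceUnit("ioc-service-unit")   # step 1: starts un-instantiated
unit.set_state(AdminState.LOCKED)        # step 2: instantiated, no work yet
unit.set_state(AdminState.UNLOCKED)      # step 3: work can be assigned
```

Rejecting illegal jumps, such as going straight from un-instantiated to unlocked, is what lets a framework "know exactly what software is running where".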

HA software framework is just the start
- SAF (Service Availability Forum) implementations won't "solve" HA problems
  - You still have to determine what you want to do and encode it in the framework; this is where the work lies
    - What are failures
    - How to identify failure
    - How to compensate (failover, adaptation, hot-swap)
- Is resultant software complexity manageable?
  - Potential fix worse than the problem
  - Always evaluate: "am I actually improving availability?"
- Where should we apply high availability techniques?

Configuration management
Example: replacing circuit board

…underlying assumptions
- Electronics board functions:
  - Hot-swappable
  - Remote reset / re-initialize.
  - Unique ID, available remotely
  - Remotely configurable ("DIP switches")
  - Remotely updatable software, firmware, FPGA code
  - Separate Standby and Run modes
  - On-board self-test
- RDB contains:
  - Information on all installed and spare electronics boards
  - Information for every crate / slot / location
  - Current version of all software, firmware, FPGA code
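
Taken together, these assumptions allow a board replacement to be driven from the RDB. The sketch below is hypothetical throughout: the record layout, the crate and board names, the version strings, and the order of steps are invented for illustration, and a Python dict stands in for the relational database.

```python
from dataclasses import dataclass


@dataclass
class SlotRecord:
    """What the (stand-in) RDB expects to find at one crate/slot location."""
    board_type: str
    firmware: str
    settings: dict


@dataclass
class BoardInfo:
    """What a newly inserted board reports about itself remotely."""
    board_id: str
    board_type: str
    firmware: str


# Hypothetical RDB content keyed by (crate, slot).
RDB = {("crate-017", 5): SlotRecord("bpm-digitizer", "2.3.1", {"gain": 4})}


def plan_replacement(location: tuple, board: BoardInfo, rdb: dict = RDB) -> list:
    """Return the ordered steps needed to bring a swapped-in board into service."""
    expected = rdb[location]
    if board.board_type != expected.board_type:
        return [f"reject {board.board_id}: expected {expected.board_type}"]
    steps = ["hold board in standby mode"]
    if board.firmware != expected.firmware:
        steps.append(f"load firmware {expected.firmware}")
    steps.append(f"apply settings {expected.settings}")
    steps.append("run on-board self-test")
    steps.append("switch to run mode")
    steps.append(f"record {board.board_id} at {location} in RDB")
    return steps


spare = BoardInfo(board_id="SN-00042", board_type="bpm-digitizer", firmware="2.2.0")
steps = plan_replacement(("crate-017", 5), spare)
wrong = plan_replacement(("crate-017", 5), BoardInfo("SN-00077", "llrf-adc", "1.0.0"))
```

The design point is that the slot, not the board, owns the configuration: any compatible spare can be made identical to its predecessor without manual setup.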

Services Tier architecture (C. Saunders)
[Diagram labels: Client Tier (Applications, with Service Interface and Channel-Oriented Interface); General Purpose Network; Services (with Service Interface, Channel-Oriented Interface, Deployment & Mgmt. Interface, DB Access); RDB (Engineering model, Physics model, Controls model, Operational data); Front-end Tier (not shown).]
Protocols:
- DOP: distributed object protocol
- MQP: message queuing protocol
- SRTP: soft real-time protocol
- DBP: database protocol
- DMP: deployment & mgmt protocol
Services: Device Abstraction, Model Interaction, Archiving, Save/Restore, Feedback, Deployment & Mgmt, Automation, …
Definitions:
- Applications: graphical interfaces, operator consoles
- Services: well-defined high-level functions available to any application or other service
- Channel-oriented Interface: traditional high-performance, direct access to control points

Why Services?
- Some activities are not well suited to channel-oriented interfaces
  - Complex
    - May require lots of parameters and a sequence of interactions
  - Dynamic
    - May be added and removed frequently during operations
    - May require dynamic allocation (network latency and/or CPU loading)
- It should be possible to create a well-defined interface for common control system functions.
- Services allow rapid prototyping of high-level apps through composition, while maintaining an impedance to changing the core functions.
- Someone is going to do this anyway

Possible Services APIs
- Script execution service
- Archiving service
- Logging service
- Data processing & visualization
- Save, Compare, Restore
- Alarm Management
- RDB calls
- Locking (channel, instance, …)
- Math & logic functions
- Event sequencer / synchronizer
- Device server
- Data concentrator
- Feedback / dynamical control
- Video processing, compression
- Out-of-band monitoring
- Exception handling
- Resource management
- Authentication / access control
- Notification ( , phone, sms, …)

Example: Knob Service Sequence (C. Saunders)
[Sequence diagram participants: GUI, knob, front ends.]

Example: Feedback Service Sequence (C. Saunders)
[Sequence diagram participants: GUI, feedback, front ends.]

Services Tier realization…
- Defining a Services Tier does not define where it runs
  - Front-end processors
  - Client workstations
  - Dedicated processors
  - Central servers
- Interfaces are hard to define
  - API inputs, outputs.
  - Services dependencies.

Collaboration
- Inherently an international collaboration
  - Resource-limited (2-3 FTEs)
  - Main collaborators: Argonne, DESY, Fermilab, KEK, SLAC
- Heavily reliant on activities at new and operating accelerators
  - Benefit from existing work.
  - Prototype & evaluate ideas and techniques.
- Strong connection with DESY XFEL, Fermilab ILCTA, KEK STF
- We need more people to support the Global Design Effort, contribute ideas, collaborate in developing & evaluating ideas.

Work Package topics
- Electronics platform evaluations (e.g. ATCA, uTCA)
- High Availability
- Risk analysis of design, FMEA
- Services architecture development
- Integrated software development on an international scale
- Configuration management
- Control System architecture design
- Network simulation/modeling
- Machine protection
- Remote Operations
- Evaluate potential controls integration tools
- Cost optimization
- Controls integration

Wrap-up
- We have identified technical challenges that can be pursued now, and have created work packages that describe ways to address them.
- There is more work than a few FTEs can do …we are looking for more collaborators
- ILC:
- Controls: