ILC Controls: High Availability Software

Argonne National Laboratory is managed by The University of Chicago for the U.S. Department of Energy.

Presentation transcript:

2 Outline
- Opening comments
- ILC software architecture refresher
- The HA stack
- Primary and management protocols
- HPI (Hardware Platform Interface) summary
- AIS (Application Interface Specification) summary
- Bottom-up, are these a good fit?
  - HPI and HPI-ATCA
  - AIS
- Conclusions
- A proposed “stack” for ILC HA research
- Tasks

3 Opening Comments
- Don’t build any critical-path software infrastructure without access to source code
- HA software is a hard problem
- SAF specifications are an impressive unification of known techniques
- SAF implementations won’t “solve” the HA problem
  - You still have to determine what you want to do and encode it in the framework; this is where the work lies
    1. What are the failures?
    2. How to identify a failure
    3. How to compensate (redundancy, reconfiguration, or both)
- How long until known-reliable, SAF-compliant products come out?
  - Compare to the time between the OMG CORBA spec and good implementations…
- Is the resultant software complexity manageable?
  - The potential fix could be worse than the problem

4 Architecture Refresher

5 SAF and ILC Controls Architecture
[Diagram: SAF components mapped onto the three controls tiers]
- Tiers: Real-Time Tier, Services Tier (middleware), Client Tier
- Diagram labels: SM (Shelf Manager), CPU1, CPU2, I/O 1, I/O 2, checkpoints, CLM (Cluster Membership Service), GUI, sensor, container, object, HPI, AIS
- Failed I/O card or power supply: fix locally (localization)
- Hung task: escalate; report upwards
- Crashed middleware container: escalate; report upwards
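
The three failure examples on this slide amount to a small policy table: some failures are handled where they occur, others are escalated and reported to the tier above. A minimal sketch of that idea follows. The failure-kind strings and the `handle_failure` helper are hypothetical names, and escalating unknown failures by default is an assumption, not something the slide states.

```python
from enum import Enum


class Action(Enum):
    FIX_LOCALLY = "fix locally"
    ESCALATE = "escalate and report upwards"


# Policy table built from the slide's three examples. The failure-kind
# strings are hypothetical identifiers chosen for this sketch.
POLICY = {
    "io_card_failed": Action.FIX_LOCALLY,
    "power_supply_failed": Action.FIX_LOCALLY,
    "task_hung": Action.ESCALATE,
    "middleware_container_crashed": Action.ESCALATE,
}


def handle_failure(kind: str) -> Action:
    """Return the action for a failure kind.

    Assumption: anything not listed in the table is escalated, so an
    unrecognized failure is never silently absorbed at the local level.
    """
    return POLICY.get(kind, Action.ESCALATE)
```

Keeping the policy as data rather than branching logic makes it easy to review against a documented list of failure points.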

6 Primary and Management Protocols
How do they interact?
- Primary connection management is informed by the management protocol
- Specific actions are carried out over the primary protocol based on info from the management protocol
[Diagram labels: State Info, Primary Controls Protocol, HA Management Protocol, Level N, Level N+1]
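
One way to picture this interaction is a connection manager at Level N+1 that receives state info over the management protocol and uses it to decide which Level N endpoint the primary protocol should talk to. The sketch below is illustrative only: `ConnectionManager`, the state strings `"active"`, `"standby"` and `"failed"`, and the `send` stand-in are all assumptions, not part of any SAF or controls API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConnectionManager:
    """Chooses the primary-protocol endpoint from management-protocol state."""

    # endpoint name -> last state reported over the management protocol
    states: Dict[str, str] = field(default_factory=dict)
    current: Optional[str] = None

    def on_state_info(self, endpoint: str, state: str) -> None:
        """Record state info and re-select the primary endpoint if needed."""
        self.states[endpoint] = state
        if self.current is None or self.states.get(self.current) != "active":
            self.current = self._pick()

    def _pick(self) -> Optional[str]:
        # The first endpoint reported "active" wins; dicts keep insertion order.
        for name, state in self.states.items():
            if state == "active":
                return name
        return None

    def send(self, request: str) -> str:
        """Stand-in for sending a request over the primary protocol."""
        if self.current is None:
            raise ConnectionError("no active endpoint known")
        return f"{self.current} <- {request}"
```

The primary path never probes for liveness itself here; it only acts on what the management path has reported, which is the separation the slide describes.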

7 HPI (Hardware Platform Interface) Summary
- HPI subsumes IPMI (established), SNMP, and others
- Sessions, Domains, Entities, Resources
- Client access to manage events
  - RDR repository (SNMP OIDs)
  - Physical components
- HPI passes info as IPMI packets over RMCP
- HPI-ATCA
  - Expose ATCA entities through HPI (hot-swap LEDs, etc.)

8 AIS (Application Interface Specification) Summary
- C-code interface specification
  - No protocols or other language bindings given
- AMF (Availability Management Framework): the tie that binds
  - Object lifecycle state diagrams (behavior)
- Services
  - Message: similar to JMS, MQSeries, Tuxedo
  - Log, Notification, Events
  - Cluster Membership: redundant instances within a “group”
  - Checkpoint: save my state so a standby can take over
  - Distributed Lock: basic need of a distributed, coordinated system
  - IMMS: what is out there, configured and deployed
- LDAP-like DNs (Distinguished Names) identify resources
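
The checkpoint idea ("save my state so a standby can take over") can be sketched without any SAF library. The code below is not the AIS C interface: `CheckpointStore` is a hypothetical in-memory stand-in for a checkpoint service, and `Counter` is a toy component invented for illustration.

```python
import copy
from typing import Any, Dict


class CheckpointStore:
    """In-memory stand-in for a checkpoint service (not the AIS API)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def write(self, name: str, state: Any) -> None:
        # Deep-copy so later changes on the active side do not leak in.
        self._data[name] = copy.deepcopy(state)

    def read(self, name: str) -> Any:
        return copy.deepcopy(self._data[name])


class Counter:
    """Toy component: counts processed samples and checkpoints each update."""

    def __init__(self, store: CheckpointStore, name: str = "counter") -> None:
        self.store = store
        self.name = name
        self.state = {"count": 0}

    def process(self) -> None:
        self.state["count"] += 1
        self.store.write(self.name, self.state)

    def take_over(self) -> None:
        """Called on the standby: resume from the last checkpoint."""
        self.state = self.store.read(self.name)
```

For example, if the active instance calls `process()` three times and then fails, a standby sharing the same store can call `take_over()` and continue from a count of 3. A real checkpoint service would also have to replicate the store itself, which this sketch ignores.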

9 Bottom-up: Are These a Good Fit?
- HPI and HPI-ATCA
  - Yes! IPMI and SNMP implementations are all gravitating to HPI
  - Interoperability is very useful to us here
  - Unified view of hardware resources
    - Front-end CPUs and I/O cards
    - Servers (database and application)
    - NADs (network attached devices)
- AIS
  - Hard problem
  - Anyone claiming to have produced a solid, 100% compliant AIS product is probably exaggerating
  - C-code interface only so far
  - Not clear that components will be interoperable
    - Are we really going to be shopping for COTS control system middleware components?

10 HA Middleware: The Contenders
(SAF presentation dated 4/26/05) (note: not a good story…)
- Commercial Cluster SW
  - Pro: Transparent to application; ISV support
  - Con: Failover too slow; Proprietary
- FT OS Single System Image
  - Pro: Transparent
  - Con: Scalability; Very complex to implement
- FT CORBA
  - Pro: Reasonably Transparent; Industry Standard
  - Con: Failover times; Heterogeneity; Management
- Telco HA Middleware
  - Pro: Fast Fail-over; Extensible; Management
  - Con: Intrusive; Non-Intuitive Model

11 FT-CORBA (fault tolerant)

12 FT-CORBA
- No existing CORBA-based control system is HA
  - Tango: uses open-source JacORB
  - ACS: uses open-source ORBacus
  - NIF: uses Visibroker with custom connection management
- No commercial FT-CORBA ORB as of the beginning of 2004
  - Spec out since 2001: not a good sign
- Very little open-source FT-CORBA exists (mostly academic)
  - GroupPAC
  - OCI (Object Computing Inc.) TAO

13 CORBA Alternative: ZeroC ICE
- ICE (Internet Communications Engine)
  - High-performance middleware
  - Open source, GPL licensed
  - Multiple language bindings (C++, Java, PHP, Python, C# so far)
  - Used by Hewlett Packard and FCS (Future Combat Systems)
  - Very much like CORBA, but addresses substantial complexity and performance issues with CORBA (not designed by committee)
- HA features
  - Has explicit support for storing object state to a db
  - Coarse-grain failover only so far (server to server)
- Could possibly even use this to unify RTP (Real Time Protocol) and DOP (Distributed Object Protocol)

14 Options from the World of Java Web Development
- JBoss
  - Open-source middleware container
  - Lots of sophisticated, solid features for redundant deployment
- JINI
  - Java RMI service lookup/discovery protocol
  - Very useful for connection management
- Spring Framework
  - Lightweight middleware container
  - Alternative to EJB 2.0
- EJB 3.0
  - Response to Spring and flaws in EJB 2.0

15 Middleware HA: My Conclusions
- This is a hard problem to solve
  - It’s OK if this part of our efforts here takes longer to solidify
- OS-based clustering is too slow and complex
- SAF AIS specification is great on paper, but…
  - No implementations yet that offer full compliance
  - No bindings other than C, so far as I can tell
- FT-CORBA is not looking good
- Proprietary Telco solutions: need I say more?
- Success stories seem to use non-HA standards to build an HA system
  - Use the set of standards that matches your culture
    - I.e., Java (JINI/RMI) or non-FT CORBA
  - Build the needed HA behavior custom to your requirements
    - Add in checkpointing, active/standby, connection mgmt, etc.
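
As one illustration of what "custom" active/standby behavior might involve, the sketch below detects failure by missed heartbeats and treats the highest-priority live member as active. Everything here is an assumption made for illustration: the `HeartbeatMonitor` class, the priority ordering, and the choice to pass time in explicitly (so the logic stays deterministic) do not come from the slides or from any named framework.

```python
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Treats a member as failed if no heartbeat arrived within `timeout`."""

    def __init__(self, members: List[str], timeout: float) -> None:
        self.timeout = timeout
        self.order = list(members)  # priority order: first is preferred
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, member: str, now: float) -> None:
        """Record that `member` was heard from at time `now` (seconds)."""
        self.last_seen[member] = now

    def alive(self, now: float) -> List[str]:
        """Members heard from within the timeout, in priority order."""
        return [
            m for m in self.order
            if m in self.last_seen and now - self.last_seen[m] <= self.timeout
        ]

    def active(self, now: float) -> Optional[str]:
        """Highest-priority live member is active; the rest are standby."""
        live = self.alive(now)
        return live[0] if live else None
```

With members `["cpu1", "cpu2"]` and a 2-second timeout, `cpu2` becomes active once `cpu1` has been silent for more than 2 seconds. The timeout directly bounds failover time, which is the property the contenders above are criticized for.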

16 Middleware HA: Conclusions (2)
- My inclination is to look at ICE and/or standard CORBA
- Build basic HA features following the model of SAF AIS where reasonable
- Need more knowledge to even evaluate SAF AIS compliant products
- Wait for commercial and open-source implementations of AIS…
- In the meantime, build à la carte from known stable frameworks

17 Proposed Stack for ILC HA Research
[Diagram labels: SM, CPU1, CPU2; legend: COTS, Custom]
- Arrow ATCA Starter Kit
  - Pigeon Point shelf manager: need SM SDK?
  - Dual (quad) x86 processors: we need the board developer's kit
- Run EPICS iocCore on dual CPUs
- ICE middleware tier
  - Examine suitability
  - Build prototype HA features
- Java GUI applications
- Protocols shown: IPMI v1.5 over RMCP, Channel Access, ICE protocol

18 Tasks
1. Study and document points of failure (look at FNAL project…)
   - How to identify failure
   - How to recover (redundancy and/or reconfiguration)
2. Port EPICS iocCore to ATCA CPUs
   - RTOS?
   - Explore redundancy and checkpointing within iocCore
3. Establish middleware server
   - Explore HA feature development within ICE
   - RMCP to ATCA shelf manager
   - Channel Access to ATCA CPUs
4. Look at custom hardware development in ATCA, including potential associated additions to shelf manager software