Software Configuration Management Lessons Learned Patrick Bong Safety Systems Group Stanford Linear Accelerator Center.

Presentation transcript:

Software Configuration Management Lessons Learned Patrick Bong Safety Systems Group Stanford Linear Accelerator Center

Background
August 23, 2007
  – There was a failure of a Programmable Logic Controller (PLC)
  – The PLC was repaired using the incorrect version of software
  – The failure revealed systemic shortcomings within the safety systems group

Development
R&D efforts
  – Two engineers
  – Ten architectures
  – Three working demonstrations
  – Numerous software revisions
Design Effort
  – Two engineers
  – Three programmable systems
  – Two versions of software
      – Test lab version (development)
      – Field version (reviewed and approved)

Approved Architecture

Proposed Policy
Software repository
  – Single location to retrieve approved software (Concurrent Versions System)
Testing
  – Verification of approved software version
  – Black box certification
Repair policy
  – Failed CPU requires system certification
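
The "verification of approved software version" step could be sketched as a checksum comparison. This is only an illustration, not the procedure the group used: it assumes the approved program is available as a file and that its SHA-256 digest was recorded when the version was approved. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_approved(candidate: Path, approved_digest: str) -> bool:
    """True only if the candidate file matches the recorded approved digest.

    approved_digest is assumed to have been recorded at approval time
    (hypothetical; the slides do not say how versions were identified).
    """
    return sha256_of(candidate) == approved_digest.strip().lower()
```

A repair would proceed only when is_approved returns True for the file about to be loaded; any mismatch means the file is not the reviewed and approved version.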

Development Software Locations
Not in CVS
  – A:\Floppy Disks
  – C:\Desk Top Drive
  – D:\CD Burner
  – F:\Flash Drives
  – V:\Group Drive
  – Z:\Employee Drive
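
One way to find copies like these is to scan the known locations for program files that do not match the approved version. The sketch below is a hedged illustration only: the file pattern "*.prog", the function name, and the use of a SHA-256 digest to identify the approved version are all assumptions, not details from the presentation.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List


def find_uncontrolled_copies(
    roots: Iterable[Path], pattern: str, approved_digest: str
) -> List[Path]:
    """List files under the given roots whose contents differ from the approved version.

    pattern is a glob such as "*.prog" (hypothetical extension).
    Roots that do not exist, such as an unplugged flash drive, are skipped.
    """
    strays: List[Path] = []
    for root in roots:
        if not root.is_dir():
            continue
        for path in sorted(root.rglob(pattern)):
            if not path.is_file():
                continue
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if digest != approved_digest:
                strays.append(path)
    return strays
```

Every path returned is a copy that could be loaded by mistake during a repair, so each one is a candidate for deletion or for check-in to the repository.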

Vendor Notification
Allen-Bradley issued an engineering note
  – We determined that the application software was not affected by the problem
  – We determined that we did not wish to upgrade the operating system without an appropriate amount of testing
  – Our test and development systems did not exhibit any problems
The engineering note did not specify that the operating system was affected by the problem

System in Operation
The Laser Safety System is certified
The Personnel Protection System is certified
  – The safety system operates as expected
  – The software is verified to be the correct version
  – The system is power cycled to verify the power-up cycle
The internal registers of the CPU are reset
The battery system installation is completed
  – The temporary power is removed
  – The battery power is turned on
The safety systems are power cycled
Four months pass with no problems
  – The system manager/engineer goes on vacation
  – The system fails while the manager is on vacation

Day One
Two electrically isolated systems fail safe
Staff – one technician
  – Technical depth
      – Wiring
      – Hardware
      – Software (safety vs. process control)
      – Debugging tools
It is not possible to restore the system
  – The on-call tech has insufficient ability

Safety System Architecture

Day Two
There were two simultaneous core dumps
Staff – eight
  – Authority to authorize a repair
  – Technical ability to troubleshoot the system
The recovered system reveals problems
  – The status reported on EPICS was not entirely correct
One version of approved software in CVS
Development software in numerous software locations
Troubleshooting by black box methods

Day Three
The recovered system requires certification
Staff – one technician, one engineer
  – The manager has returned
The certification document was not consistent with the previously executed version
  – A hardcopy was not available, so a document was printed from Microsoft Word
  – Track Changes in Microsoft Word
      – Created two versions
  – Version Control

Shortcomings
Failure to react to vendor notification
  – Not a root cause
      – Would have delayed but not prevented an incident
Personnel Resources
  – Training
      – No qualified backup engineer with adequate knowledge of the system
  – Authorization
      – No authorized backup engineer or technician
Document Control
  – Controlled copies of procedures
      – A controlled hardcopy of the certification procedure was unavailable
  – Written policies
      – Lack of a robust and reliable procedure for retrieval of software
  – Released documentation
      – Lack of a clear procedure for retrieval of current documentation

Root Causes
Insufficient resources within the safety group
Lack of skill sets
  – Need for another highly skilled safety system expert
Inadequate peer review
  – Design
  – Software
  – Documentation
  – Procedures
Insufficient document control process
  – Controls Department needed to define a document control process
Lack of external verification and formal tracking of actions required by RSC
  – This should be done by a Controls Department member with authority to allocate resources

Corrective Actions
Hire more staff
Develop safety systems documentation
Training and cross training
Formal peer reviews
Formal tracking and verification
Department directives defining authority

Hire more staff
Staffing Plan
  – Address the immediate need for another senior safety system engineer by arranging for a sabbatical visit by a senior engineer from another laboratory
  – A program is being put in place to foster cooperation with other labs
  – Filled employee requisitions for key missing positions including
      – A safety system manager
      – A senior safety engineer
      – A documentation specialist
      – An associate engineer

Documentation
Created a web-accessible site on the SLAC intranet for all safety system group documentation
Created an accurate and complete document hierarchy and document catalog
Created a document describing the documentation process
Created procedures describing how to download from CVS and upload to A/B and Pilz PLCs
  – Deployed procedures on the SLAC intranet
Created a Roles and Responsibilities document for the PPS group that has been approved by the Controls Department line management
  – The document provides
Completed PPS Group training in the new documentation system and PLC code management system

More Documentation
Develop software configuration management procedures and tools
  – Defined a configuration management process
  – Documented the process, including the reviews-and-approvals process following major/minor changes
  – Developed a procedure for retrieving and deploying the current software version from CVS
A tool for extracting the coded version number from an Allen-Bradley PLC was developed
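
Once a version number can be extracted from the controller, it can be compared with the approved version before and after a repair. The sketch below assumes a "major.minor" version string, which the slides do not confirm, and leaves out how the number is read from the PLC. All names are hypothetical and this is not the tool described above.

```python
from dataclasses import dataclass
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Parse an assumed 'major.minor' string into a pair of integers."""
    parts = text.strip().split(".")
    if len(parts) != 2 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a major.minor version: {text!r}")
    return int(parts[0]), int(parts[1])


@dataclass(frozen=True)
class VersionCheck:
    """Pairs the approved version with the one reported by the controller."""

    approved: str   # version recorded for the approved release (assumed format)
    installed: str  # version string extracted from the PLC by some other tool

    @property
    def matches(self) -> bool:
        # Compare numerically so that "2.01" and "2.1" count as the same version.
        return parse_version(self.approved) == parse_version(self.installed)
```

For example, VersionCheck("2.1", "2.2").matches is False, which would block certification until the approved version is loaded.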

Training and cross training
Training of PPS Group members in new documentation and software procedures
  – Work has started on cross training in the PPS Group, with more formalized job assignments
  – This is the beginning of a formal training program
  – For the short term, training in the new documentation and software management procedures has been completed

Reviews, Tracking and Verification
Reviews – Establish a formal Life Cycle and process for PPS Reviews
  – Internal to the PPS group: peer review of the design, software, and documentation
  – Internal to the Controls Department
      – Tracking and external verification of the requirements mandated by the RSC
      – Formal statement of policies on approval authorities, PLC operating system software upgrades, etc.
  – External
      – An annual review of safety systems practices at SLAC by experts from other laboratories

Department directives
Organization
  – Re-organize the Safety Section in the Controls Department to establish an independent team focused on the operational issues, including procedures, documentation, liaison with Operation and RSC
  – The new safety section will consist of three branches including
      – Engineering
      – Operations, Procedures, Compliance, Liaisons with OPS, RSC, etc.
      – Maintenance

Software Configuration Management
Policy
  – A written policy must exist
Execution
  – Staff must be aware of the policy
  – Staff must be trained to apply the policy
Quality Control
  – Formal tracking and verification need to be in place to ensure that the policy is being followed

Q&A
State Your
  – Name
  – Organization
  – Title
  – Question