FAULT TOLERANT POWER SYSTEMS Carsten Nesgaard Advisors: Professor Michael A. E. Andersen Professor Seth R. Sanders Ext. collaborators:



Overview: The chart shown to the right represents the focal points of the Ph.D. project and reflects the key elements of the presentation at hand.

Increased awareness: Modern fault tolerance originated in the field of high-accuracy software for critical applications, but it applies equally well to hardware systems, since the weakest link within a given system determines the overall reliability. An unreliable power supply degrades system performance even when the remaining system elements are highly reliable.
Consequences of system downtime: inability to carry out financial transactions; substantial losses in sales; loss of customer services etc.

Fault tolerance (definition): The ability of a system to respond gracefully to an unexpected hardware or software failure. There are many levels of fault tolerance, the lowest being the ability to continue operation in the event of a power failure. Many fault tolerant computer systems mirror all operations - that is, every operation is performed on two or more duplicate systems, so if one fails the other can take over. Source:

Distributions: The following table contains the key functions and parameters concerning distributions in reliability evaluation.
Columns: Distribution | Failure density f(t) | Survivor function R(t) | Hazard rate λ(t) | Variance σ²
Rows: Poisson | Gaussian | Exponential | Weibull
(The table entries are not preserved in this transcript.)
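For reference, the standard textbook forms for the exponential and Weibull rows are sketched below. They are not taken from the slide itself; λ denotes a constant failure rate, and the Weibull scale α and shape β are symbol names assumed here.

```latex
% Exponential distribution (constant hazard rate \lambda)
f(t) = \lambda e^{-\lambda t}, \qquad
R(t) = e^{-\lambda t}, \qquad
\lambda(t) = \lambda, \qquad
\sigma^2 = \frac{1}{\lambda^2}

% Weibull distribution (scale \alpha > 0, shape \beta > 0)
f(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1}
       e^{-(t/\alpha)^{\beta}}, \qquad
R(t) = e^{-(t/\alpha)^{\beta}}, \qquad
\lambda(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1}, \qquad
\sigma^2 = \alpha^2\left[\Gamma\!\left(1+\tfrac{2}{\beta}\right)
           - \Gamma\!\left(1+\tfrac{1}{\beta}\right)^{2}\right]
```

With β = 1 the Weibull forms reduce to the exponential ones with λ = 1/α, which is the constant hazard rate case.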

Network modeling: Assuming the failure rate for each block/component within a given network can be found in MIL-HDBK-217F, the following simplifications can be applied:
Constant hazard rate → exponential distribution
MTBF → reciprocal of the failure rate
Reliability network reductions are independent of the distribution used: $R_{\mathrm{Series}} = \prod_i R_i$ and $R_{\mathrm{Parallel}} = 1 - \prod_i (1 - R_i)$
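The simplifications above can be sketched in a few lines of Python. This is a minimal illustration, assuming independent blocks and the usual series and parallel reductions (product of reliabilities, and one minus the product of unreliabilities). The failure rates and mission time are hypothetical values, not MIL-HDBK-217F data.

```python
import math


def reliability(failure_rate, hours):
    """Survivor function R(t) = exp(-lambda * t) for a constant hazard rate."""
    return math.exp(-failure_rate * hours)


def mtbf(failure_rate):
    """MTBF is the reciprocal of a constant failure rate."""
    return 1.0 / failure_rate


def series(reliabilities):
    """Series reduction: every block must work."""
    return math.prod(reliabilities)


def parallel(reliabilities):
    """Parallel reduction: at least one block must work."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Hypothetical block failure rates in failures per hour (assumed values).
block_rates = [2e-6, 5e-7, 1e-6]
mission_hours = 10_000

# One converter modeled as a series network of its blocks.
r_converter = series([reliability(lam, mission_hours) for lam in block_rates])
# Two identical converters in parallel (full redundancy).
r_redundant = parallel([r_converter, r_converter])

print(f"single converter: {r_converter:.4f}")
print(f"redundant pair:   {r_redundant:.6f}")
```

Note that the reduction functions only take reliabilities as input, so they apply unchanged if R(t) comes from a distribution other than the exponential.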

System identification: Since no system can be made tolerant to all possible faults, it is essential that critical faults are identified and characterized during the design: critical faults with a realistic probability of occurrence; the level of criticality (component, system, operator etc.).
Two examples of critical failures in a redundant power supply: over-voltage at the output (resulting in loss of the load); short circuit of the input bus (resulting in loss of power).
Both of these failures lead to a loss of the load, thus undermining the concept of redundancy.

System identification:
Fault detection: If critical failure modes cannot be avoided in the design of a given system, these failure modes must be continuously monitored if fault tolerance within the system is to be maintained.
Fault isolation: If a fault is detected within a given system, the proper precautions must be taken by either dynamic replacement or redundancy. This prevents the propagation of a fault from its origin at one point within the system to a point where it can have a critical effect on a process or a user.

System identification:
Fault prediction (estimation): As opposed to the above-mentioned topics, which must be an integrated part of any fault tolerant system, a system's ability to predict faults based on continuous measurements of key components is a desirable feature, made possible mainly by advances in digital controllers.
Redundancy control: Based on the two keywords fault detection and fault isolation, a redundancy control algorithm has been developed using array-based logic. A paper describing the approach taken has been submitted to COMPEL2002.
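As a rough illustration of how detection results can drive isolation decisions, the sketch below maps the fault state of two parallel supplies to a redundancy action. It is a plain lookup table and is not the array-based logic algorithm described in the submitted paper. The names `ACTIONS` and `redundancy_action`, the two-unit setup and the action strings are all assumptions made for this example.

```python
# Hypothetical two-unit redundant supply. Each key is
# (fault detected in unit A, fault detected in unit B).
ACTIONS = {
    (False, False): "share load between A and B",
    (True, False): "isolate A, B carries full load",
    (False, True): "isolate B, A carries full load",
    (True, True): "controlled shut-down",
}


def redundancy_action(fault_a: bool, fault_b: bool) -> str:
    """Return the assumed action for the detected fault state."""
    return ACTIONS[(fault_a, fault_b)]


print(redundancy_action(True, False))
```

Keeping the decision in a table makes every combination of fault states explicit, so a missing case shows up as a missing row rather than as an untested branch.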

System identification: Dividing the three fault parameters into a high power and a low power category, one sees that fault isolation falls into the high power category, whereas detection and prediction of faults fall into the low power category due to the surveillance nature of these topics.
Figure: Redundant network with mutually exclusive block failure rates. The block values indicate the probability of block success.

Power system: Based on the system identification of the overall power system, the following subjects must be considered: power supply topology (high efficiency, component stress etc.); control scheme; redundancy vs. optimised component selection; cost price; active/passive current sharing in redundant power supplies; thermal surveillance; probability of malfunction.

Power system: Based on the data found in MIL-HDBK-217F, a table containing block-level failure rates for different converter topologies shall be established. In its basic form the Buck topology has no components directly connected across the power input v_g(t). Source: Fundamentals of Power Electronics, second ed., Erickson/Maksimovic.

Reliability / availability:
Redundancy: no redundancy (series systems, high quality components); full redundancy (parallel systems, low quality components); partial redundancy; standby systems.
The term reliability relates to a system's ability to stay in the operating state without failure. Reliability is therefore unsuitable as a measure for continuously operated systems that can tolerate failures. Such systems are described by the term availability, which is interpreted as the probability of finding the system in the operating state at some time into the future.
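The difference between the two measures can be made concrete with a small sketch. It assumes a single repairable unit with constant failure rate and constant repair rate (a two-state operating/failed model), which the slides do not specify. The rates used are hypothetical.

```python
import math


def steady_state_availability(mtbf_hours, mttr_hours):
    """Long-run fraction of time in the operating state."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def availability(t_hours, failure_rate, repair_rate):
    """A(t) for a two-state repairable unit that starts in the operating state."""
    total = failure_rate + repair_rate
    return repair_rate / total + (failure_rate / total) * math.exp(-total * t_hours)


def reliability(t_hours, failure_rate):
    """R(t): probability of no failure at all up to time t."""
    return math.exp(-failure_rate * t_hours)


# Hypothetical rates: MTBF of 10,000 h and mean time to repair of 10 h.
lam, mu = 1e-4, 0.1
for t in (10, 1_000, 100_000):
    print(t, round(reliability(t, lam), 4), round(availability(t, lam, mu), 6))
```

In this model the reliability decays towards zero as time increases, while the availability settles at MTBF / (MTBF + MTTR), which is why availability is the more useful figure for systems that are repaired and kept in service.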

Digital vs. analog control: Surveillance and control of highly reliable power supplies can be performed by either digital or analog circuitry. Traditionally the analog approach has been taken (bandwidth, accuracy etc.). With increased processor speed and lower cost, the digital approach offers a wide variety of sophisticated control schemes that enable ‘intelligent’ redundancy management.

Digital vs. analog control: The main purposes of implementing a digital control scheme in DC/DC converter applications are: the possibility of advanced fault detection (location, impact etc.); fault isolation (controlled shut-down, redundancy control etc.); fault estimation based on selected measurement parameters.
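A minimal sketch of what digital fault detection and fault estimation might look like in software is given below. The measured quantities, the limit values (`V_OUT_MAX`, `V_IN_MIN`, `TEMP_MAX`) and the linear temperature extrapolation are assumptions chosen for illustration; they are not taken from the test converter in the presentation.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    v_out: float   # output voltage [V]
    v_in: float    # input bus voltage [V]
    temp_c: float  # temperature of a key component [deg C]


# Hypothetical limits for a 5 V converter fed from a 12 V bus.
V_OUT_MAX = 5.5
V_IN_MIN = 9.0
TEMP_MAX = 100.0


def detect_faults(m):
    """Fault detection: compare one measurement against fixed limits."""
    faults = []
    if m.v_out > V_OUT_MAX:
        faults.append("output over-voltage")
    if m.v_in < V_IN_MIN:
        faults.append("input bus under-voltage (possible short circuit)")
    return faults


def samples_until_limit(temps, limit=TEMP_MAX):
    """Fault estimation: linear extrapolation from the first to the last sample.

    Returns the estimated number of sample periods until the limit is
    reached, or None if the temperature is not rising.
    """
    if len(temps) < 2:
        return None
    slope = (temps[-1] - temps[0]) / (len(temps) - 1)
    if slope <= 0:
        return None
    return max(0.0, (limit - temps[-1]) / slope)


print(detect_faults(Measurement(v_out=6.0, v_in=12.0, temp_c=40.0)))
print(samples_until_limit([60, 62, 64]))
```

Detection here works on a single sample, whereas estimation needs a history of samples, which is one reason a controller with memory suits the prediction task.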

Digital vs. analog control: The following list of pros and cons concerns the power system's surveillance and control circuitry.
Digital, pros: noise margin; temperature stability; implementation of control algorithms; multiple surveillance functions.
Digital, cons: discrete values, thus bit errors; finite sample time.
Analog, pros: short reaction time; high accuracy.
Analog, cons: noise and temperature sensitive; no or very little ‘intelligence’; single-function surveillance circuitry.

Digital vs. analog control: In order to test the implementation of different surveillance schemes, a Buck converter has been assembled.
Figure: Test converter with switches for external fault simulation. It provides 4 measurement points for oscilloscope connection, 4 switches for fault simulation, and an interface to a microcontroller incl. various measurement parameters.

Chosen approach: Based on this presentation the following basic rules have been deduced:
1. Know precisely what the system is supposed to do when working under both normal and abnormal circumstances.
2. Group fault causes into different classes, thus identifying and categorizing all critical failure modes.
3. Determine fault containment regions within the system. This is important since fault propagation in any system is to be prevented.
4. Determine the application failure margins and balance the level of fault tolerance with the cost of implementation.

Summary: An overview of the main topics within the field of fault tolerant power systems has been presented. These include: identification of power systems; probability analysis of power systems; digital vs. analog control schemes; fault detection, fault isolation and fault prediction.