Fault Detection in a HW/SW CoDesign Environment Prepared by A. Gaye Soykök.

Outline
– Introduction
– System Specification
– Fault Model
– Some Terminology
– Methodologies Analysis
– Reliable Communication
– HW/SW Partitioning

Introduction
– System reliability aspects are generally considered only toward the end of the design process, at low abstraction levels.
– Working at low abstraction levels introduces more overhead.
– Not all systems can be considered at low levels.
– It is better to handle fault detection at higher levels.
– It is better to assess whether fault detection should be done in HW or in SW, for the sake of system performance.

Introduction
– At system level, several parameters are considered and one design is chosen among several alternatives:
  – Time constraints
  – Power consumption
  – Testability
  – Area

Introduction
– Fault detection facilities are introduced at system level.
  – HW/SW binding of components is affected.
– System specification: which parts are critical and need fault detection.
– Design methodologies: how these detection facilities are applied, in either HW or SW.
– HW/SW partitioning: which parts are in SW and which are in HW, guided by the methodologies.

System Specification
– Language: the user should be able to specify which sections require reliability aspects. Ex: SystemC or OCCAM.
– Architecture: CPU (DSP or general purpose), coprocessors (ASIC or FPGA).

Fault Model
– Single functional failure
  – Any number of physical faults cause one functional module to perform incorrectly.
  – The HW is faulty; the software is affected by the hardware.
  – The CPU, the communication channels, one of the coprocessors, or the memory may fail.
  – A module failure is detected before any other module fails.
– Temporal, architectural and informational redundancy are adopted.

Some Terminology
– Nominal: the original system function elements.
– Checking: redundant elements for fault detection.
– Checker: the element that compares the checking and nominal results.
– Each of these elements can be independently implemented in either HW or SW.
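As an illustration of the three roles, here is a minimal Python sketch that assumes an all-software implementation. The summation task and the names nominal_sum, checking_sum, checker and run_with_detection are hypothetical; they do not come from the presentation.

```python
from typing import Callable, Sequence


def nominal_sum(values: Sequence[int]) -> int:
    """Nominal element: the original system function."""
    return sum(values)


def checking_sum(values: Sequence[int]) -> int:
    """Checking element: a redundant, independently written computation."""
    total = 0
    for v in reversed(values):
        total += v
    return total


def checker(nominal_result: int, checking_result: int) -> bool:
    """Checker element: compares the two results; True means they agree."""
    return nominal_result == checking_result


def run_with_detection(
    values: Sequence[int],
    nominal: Callable[[Sequence[int]], int] = nominal_sum,
    checking: Callable[[Sequence[int]], int] = checking_sum,
) -> int:
    """Run nominal and checking elements, then raise if the checker sees a mismatch."""
    n = nominal(values)
    c = checking(values)
    if not checker(n, c):
        raise RuntimeError("fault detected: nominal and checking results differ")
    return n
```

Because the three elements are separate callables, any one of them could in principle be swapped for a hardware-backed implementation without touching the others, which is the independence the slide describes.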

HW or SW
– Nominal SW, checker SW, checking SW
  – Checking and checker are executed either by the system processor or by a dedicated processor.
  – Ex: self-checking SW, assertions, dual processor, and VLIW.
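The assertion-based case can be sketched as follows. This is only an assumed example: the sorting task, the FaultDetected exception and the checked_sort function are hypothetical names chosen for illustration. The check uses an explicit raise rather than Python's assert statement, because assert statements are skipped when the interpreter runs with -O.

```python
from collections import Counter
from typing import Callable, List


class FaultDetected(Exception):
    """Raised when an executable assertion on the result does not hold."""


def checked_sort(
    data: List[int],
    sort_fn: Callable[[List[int]], List[int]] = sorted,
) -> List[int]:
    """Run the nominal sort, then verify properties every correct result must have."""
    result = sort_fn(data)  # nominal computation (SW)

    # Checking (SW): derive two properties of a correct output.
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    same_items = Counter(result) == Counter(data)

    # Checker (SW): signal an error if either property is violated.
    if not (ordered and same_items):
        raise FaultDetected("sorted output violates its assertions")
    return result
```

Unlike full duplication, the assertions do not recompute the result; they only test properties of it, which usually costs less but may cover fewer faults.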

HW or SW (Cont’d)
– Nominal SW, checker HW, checking SW
  – Ex: interface for functional redundancy check, VLIW with hardware, DMA checker.
– Nominal SW, checker HW, checking HW
  – CED (concurrent error detection) solutions are implemented entirely in HW. Ex: dynamically configurable checker.

HW or SW (Cont’d)
– Nominal HW, checker HW, checking HW
  – Classical approach. Ex: duplication, TSC (totally self-checking) devices.

Methodologies Analysis - Concepts
– Number and type of processing elements
– Whether a special architecture is necessary
– Synchronization issues between processing elements
– Allocation of checker memory space
– Checker structure and complexity
– Selection of a checker methodology to raise errors in case of mismatches

Methodologies Analysis - Metrics
– Detection latency: the time between the instant an error occurs and the instant it is detected.
– Coverage: how many of the existing faults can be detected.
– Performance degradation: the overhead caused by the fault detection facilities compared to the nominal functions.
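The slide defines these metrics in words only. One possible way to turn them into numbers is sketched below; the formulas (a time difference, a detected/total ratio, and a relative slowdown) are assumptions made for illustration, and the function names are hypothetical.

```python
def detection_latency(t_error: float, t_detected: float) -> float:
    """Time elapsed between the occurrence of an error and its detection."""
    if t_detected < t_error:
        raise ValueError("an error cannot be detected before it occurs")
    return t_detected - t_error


def coverage(detected_faults: int, total_faults: int) -> float:
    """Fraction of the considered faults that the scheme detects (0.0 to 1.0)."""
    if total_faults <= 0 or not 0 <= detected_faults <= total_faults:
        raise ValueError("need 0 <= detected_faults <= total_faults and total_faults > 0")
    return detected_faults / total_faults


def performance_degradation(t_nominal: float, t_with_detection: float) -> float:
    """Relative slowdown caused by fault detection, versus the nominal function alone."""
    if t_nominal <= 0:
        raise ValueError("nominal execution time must be positive")
    return (t_with_detection - t_nominal) / t_nominal
```

For example, under these assumed definitions a nominal run of 100 time units that takes 125 units with detection enabled has a performance degradation of 0.25.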

Methodologies Analysis - Metrics (Cont’d)
– Material cost: the cost of the physical components.
– Design cost: the effort needed to design the system.

Reliable Communication
– Apart from data processing, communication also needs to be reliable.
– Hardware redundancy: line duplication.
– Information redundancy: data encoding.
– Most effective when data encoding is used where SW is involved and the hardware sections employ dedicated lines (duplicated, encoded).
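Both forms of redundancy can be illustrated with a short Python sketch. The presentation does not name a specific code, so the single even-parity bit used here is an assumption, and all function names are hypothetical.

```python
def encode_even_parity(byte: int) -> int:
    """Information redundancy: append an even-parity bit (bit 8) to an 8-bit value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    parity = bin(byte).count("1") % 2
    return (parity << 8) | byte


def check_even_parity(word: int) -> bool:
    """True if the 9-bit word still contains an even number of 1 bits."""
    return bin(word & 0x1FF).count("1") % 2 == 0


def check_duplicated_lines(line_a: int, line_b: int) -> bool:
    """Hardware redundancy modelled in software: two copies of a line must agree."""
    return line_a == line_b
```

A single parity bit detects any odd number of flipped bits but misses an even number of flips, so a stronger code would be needed where multiple-bit errors are expected.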

HW/SW Partitioning
– The partitioning step takes place after the system has been specified, the methodologies have been assessed, and different alternatives have been produced with their cost functions.
– Evaluate the cost functions and the constraints of the user.
– Reliability aspects make partitioning more complex.
– Perform partitioning in two stages.

HW/SW Partitioning (Cont’d)
– First level: classical aspects and functions are taken into account.
– Second level: given the first-level solution, reliability aspects are introduced, and the solution with the best trade-off that still satisfies the first-level constraints is chosen from the solution set.
– If no reliability constraints are given, the second level is not carried out.

HW/SW Partitioning (Cont’d)
– If a specific architecture is required for reliability (for example, dual processor), the first level benefits from earlier partitioning solutions.
– A solution may not exist after the reliability constraints are introduced, and the first level may need to be repeated.

HW/SW Partitioning (Cont’d)
– Reliability constraints, which drive the second stage, may be:
  – Hard, ex: 100% fault coverage
  – Soft, ex: any fault coverage
– Parameters considered:
  – Fault coverage
  – Performance degradation
  – Detection latency
  – Area overhead
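The two-stage flow described on the partitioning slides can be sketched roughly as follows. This is not the author's algorithm: the Candidate fields, the unweighted trade_off_cost, the assumption that all reliability parameters are normalized to the range 0 to 1, and the rule of picking the smallest-area design when no reliability constraint is given are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One HW/SW partitioning alternative (all fields are illustrative)."""
    name: str
    area: float
    exec_time: float
    fault_coverage: float      # 0.0 to 1.0
    perf_degradation: float    # normalized, lower is better
    detection_latency: float   # normalized, lower is better
    area_overhead: float       # normalized, lower is better


def first_level(cands: List[Candidate], max_area: float, max_time: float) -> List[Candidate]:
    """First level: keep only candidates that meet the classical constraints."""
    return [c for c in cands if c.area <= max_area and c.exec_time <= max_time]


def trade_off_cost(c: Candidate) -> float:
    """Assumed unweighted trade-off: penalties minus coverage (lower is better)."""
    return c.perf_degradation + c.detection_latency + c.area_overhead - c.fault_coverage


def second_level(feasible: List[Candidate],
                 hard_min_coverage: Optional[float] = None) -> Optional[Candidate]:
    """Second level: apply a hard coverage constraint if given, then pick the best trade-off."""
    pool = feasible
    if hard_min_coverage is not None:
        pool = [c for c in feasible if c.fault_coverage >= hard_min_coverage]
    # None signals that no solution exists and the first level may need to be repeated.
    return min(pool, key=trade_off_cost, default=None)


def partition(cands: List[Candidate], max_area: float, max_time: float,
              reliability_required: bool,
              hard_min_coverage: Optional[float] = None) -> Optional[Candidate]:
    """Run the first level, and the second level only if reliability constraints exist."""
    feasible = first_level(cands, max_area, max_time)
    if not reliability_required:
        return min(feasible, key=lambda c: c.area, default=None)
    return second_level(feasible, hard_min_coverage)


CANDIDATES = [
    Candidate("A", 100, 10, 1.0, 0.30, 0.10, 0.40),
    Candidate("B", 80, 12, 0.9, 0.10, 0.20, 0.15),
    Candidate("C", 200, 8, 1.0, 0.05, 0.05, 0.60),
]
```

With these made-up candidates, a soft constraint selects the best overall trade-off, while a hard 100% coverage constraint restricts the choice to designs with full coverage; when the classical constraints are tightened so that no such design survives the first level, partition returns None.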

Conclusion
– Design for reliability has been merged into the HW/SW codesign process, resulting in a final design that has on-line fault detection properties.
– Future work: introducing fault tolerance into the HW/SW codesign process.