NMP ST8 Dependable Multiprocessor (DM)

Presentation transcript:

NMP ST8 Dependable Multiprocessor (DM)
Dr. John R. Samson, Jr.
Honeywell Defense & Space Systems
13350 U.S. Highway 19 North, Clearwater, Florida 33764
(727) 539-2449
john.r.samson@honeywell.com
High Performance Embedded Computing Workshop (HPEC), 18-20 September 2007

Outline
- Introduction
  -- Dependable Multiprocessor* technology
    --- overview
    --- hardware architecture
    --- software architecture
- Current Status & Future Plans
- TRL6 Technology Validation
- TRL7 Flight Experiment
- Summary & Conclusion
* Formerly known as the Environmentally-Adaptive Fault-Tolerant Computer (EAFTC).
The Dependable Multiprocessor effort is funded under NASA NMP ST8 contract NMO-710209.
This presentation has not been published elsewhere, and is hereby offered for exclusive publication except that Honeywell reserves the right to reproduce the material in whole or in part for its own use and where Honeywell is so obligated by contract, for whatever use is required thereunder.

DM Technology Advance: Overview
A high-performance, COTS-based, fault tolerant cluster onboard processing system that can operate in a natural space radiation environment. Values in parentheses are the NASA Level 1 Requirements (Minimum).
- high throughput, low power, scalable, & fully programmable: >300 MOPS/watt (>100)
- high system availability: >0.995 (>0.95)
- high system reliability for timely and correct delivery of data: >0.995 (>0.95)
- technology independent system software that manages cluster of high performance COTS processing elements
- technology independent system software that enhances radiation upset tolerance
Benefits to future users if DM experiment is successful:
- 10X to 100X more delivered computational throughput in space than currently available
- enables heretofore unrealizable levels of science data and autonomy processing
- faster, more efficient applications software development
  -- robust, COTS-derived, fault tolerant cluster processing
  -- port applications directly from laboratory to space environment
    --- MPI-based middleware
    --- compatible with standard cluster processing application software including existing parallel processing libraries
- minimizes non-recurring development time and cost for future missions
- highly efficient, flexible, and portable SW fault tolerant approach applicable to space and other harsh environments
- DM technology directly portable to future advances in hardware and software technology
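To give the availability figures a concrete scale, the sketch below converts an availability value into the total outage time it allows. It assumes the simplest reading (availability as the fraction of time the system is up) and an arbitrary 24-hour window; neither the function name nor the window comes from the presentation, and the project's formal definition may differ.

```python
def max_downtime_minutes(availability: float, period_hours: float) -> float:
    """Total outage time (minutes) allowed in a period at a given availability.

    Assumption: availability is the fraction of the period the system is up.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours * 60.0


# 24 hours is an illustrative window, not a figure from the slides.
goal = max_downtime_minutes(0.995, 24.0)     # DM figure from the slide
minimum = max_downtime_minutes(0.95, 24.0)   # NASA Level 1 minimum
print(f"goal: {goal:.1f} min/day, minimum: {minimum:.1f} min/day")
```

Under these assumptions the 0.995 figure leaves about 7.2 minutes of outage per day for detection and recovery, against 72 minutes at the 0.95 minimum.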

Dependable Multiprocessor Technology
Desire: ‘Fly high performance COTS multiprocessors in space’
To satisfy the long-held desire to put the power of today’s PCs and supercomputers in space, three key issues (SEUs, cooling, and power efficiency) need to be overcome. DM has addressed and solved all three issues.
- Single Event Upset (SEU): Radiation induces transient faults in COTS hardware, causing erratic performance and confusing COTS software.
  -- DM Solution: robust control of cluster; enhanced, SW-based SEU-tolerance
- Cooling: Air flow is generally used to cool high performance COTS multiprocessors, but there is no air in space.
  -- DM Solution: tapped the airborne conductively-cooled market
- Power Efficiency: COTS only employs power efficiency for compact mobile computing, not for scalable multiprocessing.
  -- DM Solution: tapped the high performance density mobile market

DM Hardware Architecture
[Block diagram. Labels: Main Processor; Co-Processor; Memory (Volatile & NV); Net & Instr IO; Mass Data Storage Unit*; Custom S/C or Sensor I/O*]
* Examples: Other mission-specific functions

DMM Top-Level Software Layers
DMM: Dependable Multiprocessor Middleware
[Layer diagram. Recoverable labels:]
- System Controller: Hardware: Honeywell RHSBC; OS: WindRiver VxWorks 5.4; DMM
- Data Processor: Hardware: Extreme 7447A; OS: WindRiver PNE-LE (CGE) Linux; DMM; Application
- Interconnect: cPCI (TCP/IP over cPCI)
- Layer categories: Scientific Application; Generic Fault Tolerant Framework; OS/Hardware Specific
- Other labels: DMM components and agents; Application Programming Interface (API); SAL (System Abstraction Layer); FPGA; Mission Specific Applications; Policies; Configuration Parameters; S/C Interface SW and SOH and Exp. Data Collection

DMM Software Architecture “Stack”

Examples: User-Selectable Fault Tolerance Modes
Fault Tolerance Option: Comments
- NMR Spatial Replication Services: Multi-node HW SCP and Multi-node HW TMR
- NMR Temporal Replication Services: Multiple execution SW SCP and Multiple Execution SW TMR in same node with protected voting
- ABFT: Existing or user-defined algorithm; can either detect or detect and correct data errors with less overhead than NMR solution
- ABFT with partial Replication Services: Optimal mix of ABFT to handle data errors and Replication Services for critical control flow functions
- Check-pointing Roll Back: User can specify one or more check-points within the application, including the ability to roll all the way back to the original
- Roll forward: As defined by user
- Soft Node Reset: DM system supports soft node reset
- Hard Node Reset: DM system supports hard node reset
- Fast kernel OS reload: Future DM system will support faster OS re-load for faster recovery
- Partial re-load of System Controller/Bridge Chip configuration and control registers: Faster recovery than complete re-load of all registers in the device
- Complete System re-boot: System can be designed with defined interaction with the S/C; TBD missing heartbeats will cause the S/C to cycle power
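As an illustration of the temporal replication idea (multiple executions in the same node followed by a vote), here is a minimal Python sketch. It is not the DMM API: `run_temporal_tmr` and `make_flaky_sum` are hypothetical names, the upset is simulated by flipping one bit of a result, and the voter here is ordinary unprotected code, whereas the slide calls for protected voting.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def run_temporal_tmr(task: Callable[[], Hashable], runs: int = 3) -> Hashable:
    """Run `task` several times on one node and return the majority result."""
    results = [task() for _ in range(runs)]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= runs // 2:
        # No strict majority: the fault cannot be masked by voting alone.
        raise RuntimeError(f"no majority among results: {results}")
    return value


def make_flaky_sum(data: Sequence[int], corrupt_on_call: int) -> Callable[[], int]:
    """Build a task whose result has one bit flipped on a chosen call."""
    calls = {"n": 0}

    def task() -> int:
        calls["n"] += 1
        total = sum(data)
        if calls["n"] == corrupt_on_call:
            total ^= 1 << 4  # simulated transient upset in the result
        return total

    return task


# The second of three executions is corrupted; the vote masks it.
voted = run_temporal_tmr(make_flaky_sum([1, 2, 3, 4], corrupt_on_call=2))
print(voted)
```

The same voter would serve spatial replication if the three results came from three nodes rather than three executions; the trade-off is node count against execution time.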

DM Technology Readiness & Experiment Development Status and Future Plans
[Timeline figure. Recoverable content:]
- Technology readiness: NASA ST8 Project Confirmation Review; TRL5 Technology Validation (Technology in Relevant Environment); TRL6 Technology Validation (Technology Demonstration in a Relevant Environment); TRL7 Technology Validation (Flight) X*
- Experiment development: Preliminary Design Review, 5/31/06 (Preliminary Experiment HW & SW Design & Analysis); Critical Design Review, 6/27/07 (Final Experiment HW & SW Design & Analysis); Flight Readiness Review X* (Built/Tested HW & SW Ready to Fly)
- Launch 11/09*; Mission 1/10 - 6/10*
- Radiation testing: Preliminary Radiation Testing (Critical Component Survivability & Preliminary Rates); Final Radiation Testing (Complete Component & System-Level Beam Tests)
- Other dates shown on the timeline: 5/17/06, 10/27/06, 9/08, 10/08; 5/06, 4/07, & 5/07
- Test results indicate DM components will survive and upset adequately @ 455 km x 960 km x 98.2° orbit
Key: - Complete
* Per direction from NASA Headquarters 8/3/07; the ST8 project ends with TRL6 Validation

DM Phase C/D Flight Testbed System
- System Controller: Wind River OS - VxWorks 5.4; Honeywell RHSBC (PPC 603e)
- Data Processor: Wind River OS - PNE-LE 4.0 (CGE) Linux; Extreme 6031 PPC 7447a with AltiVec co-processor
- Memory Card: Aitech S990
- Networks: cPCI; Ethernet: 100Mb/s
[Block diagram. Labels: Spacecraft Computer; RS422 Interface; Message Process; System Controller (SCIP, DMM); four Data Processors (DMM each); Data Processor (Emulates Mass Service) (DMM); Rad Tolerant Memory Module; Point-to-Point Ethernet; cPCI]
SCIP: S/C Interface Process

DM Phase C/D Flight Testbed
[Photo. Labels: Custom Commercial Open cPCI Chassis; System Controller (flight RHSBC); Backplane Ethernet Extender Cards; Flight-like Mass Memory Module; Flight-like COTS DP nodes]

TRL6 Technology Validation Demonstration (1)
Automated Fault Injection Tests:
[Diagram. Labels: Host; NFTAPE; CTSIM or S/C Emulator; Ethernet; cPCI Phase C/D Testbed System; Chassis; System Controller; RTMM; DP Boards; DP Board with NFTAPE kernel Injector and NFTAPE interface]
KEY:
- RTMM: Rad Tolerant Memory Module
- DP: COTS Data Processor
- NFTAPE: Network Fault Tolerance And Performance Evaluation tool
- CTSIM: Command & Telemetry Simulator
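The general shape of an automated fault injection campaign can be sketched in a few lines of Python. This is a conceptual illustration only and does not use or imitate NFTAPE's interfaces: `inject_bit_flip` and `run_campaign` are hypothetical names, the injection target is a plain byte buffer rather than kernel or application state, and a CRC-32 comparison stands in for whatever detection mechanism the system under test provides.

```python
import random
import zlib
from typing import Tuple


def inject_bit_flip(buf: bytearray, rng: random.Random) -> Tuple[int, int]:
    """Flip one randomly chosen bit in `buf`; return (byte index, bit index)."""
    byte_index = rng.randrange(len(buf))
    bit_index = rng.randrange(8)
    buf[byte_index] ^= 1 << bit_index
    return byte_index, bit_index


def run_campaign(payload: bytes, trials: int, seed: int = 0) -> int:
    """Inject one bit flip per trial and count how many are detected."""
    rng = random.Random(seed)          # seeded so a campaign is repeatable
    golden = zlib.crc32(payload)       # reference checksum of fault-free data
    detected = 0
    for _ in range(trials):
        target = bytearray(payload)    # fresh copy for every injection
        inject_bit_flip(target, rng)
        if zlib.crc32(bytes(target)) != golden:
            detected += 1
    return detected


hits = run_campaign(b"science data block", trials=100)
print(f"detected {hits} of 100 injected faults")
```

Because a CRC detects every single-bit error, this toy campaign reports full coverage; a real campaign is interesting precisely where the outcome is not known in advance, such as upsets in control flow or in the middleware itself.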

TRL6 Technology Validation Demonstration (2)
System-Level Proton Beam Tests:
[Diagram. Labels: Proton Beam Radiation Source; Aperture for Radiation Beam; Borax Shield; Additional Cooling Fan; DP Board on cPCI Extender Card; Extender Card; cPCI Phase C/D Test Bed; System Controller; RTMM; DP Boards; Ethernet; CTSIM or S/C Emulator]
KEY:
- RTMM: Rad Tolerant Memory Module
- DP: COTS Data Processor
- CTSIM: Command & Telemetry Simulator

Dependable Multiprocessor Experiment Payload on the ST8 “NMP Carrier” Spacecraft
The ST8 DM Experiment Payload is a stand-alone, self-contained, bolt-on system.
[Payload drawing. Labels: DM Payload; Power Supply Module; MIB; Mass Memory Module; RHPPC-SBC System Controller; 4 xPedite 6031 DP nodes; Test, Telemetry, & Power Cables]
ST8 Orbit:
- sun-synchronous
- 955 km x 460km @ 98.2° inclination
Software:
- Multi-layered System SW: OS, DMM, APIs, FT algorithms
- SEU-Tolerance: detection; autonomous, transparent recovery
- Applications: 2DFFT, LUD, Matrix Multiply, GSFC Neural Sensor application
- Multi-processing: parallelism, redundancy; combinable FT modes
Flight Hardware:
- Dimensions: 10.6 x 12.2 x 24.0 in. (26.9 x 30.9 x 45.7 cm)
- Weight (Mass): ~ 61.05 lbs (27.8 kg)
- Power: ~ 121 W (max)
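Matrix Multiply is one of the listed applications and is also the textbook case for ABFT (algorithm-based fault tolerance). The sketch below shows the classic checksum-matrix scheme in pure Python, offered as an illustration of the technique rather than the DM implementation. All function names are hypothetical, integer matrices are used so the checks are exact, and only a single corrupted data element is corrected.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def abft_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply with a checksum row appended to A and a checksum column to B.

    The result carries row sums in its last column and column sums in its
    last row, so the checks travel through the computation with the data.
    """
    a_cs = a + [[sum(col) for col in zip(*a)]]
    b_rs = [row + [sum(row)] for row in b]
    return matmul(a_cs, b_rs)


def check_and_correct(c_full: Matrix) -> Optional[Tuple[int, int]]:
    """Repair one corrupted data element in place; return its position."""
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None                       # checks pass, nothing to do
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        others = sum(c_full[i][:m]) - c_full[i][j]
        c_full[i][j] = c_full[i][m] - others   # rebuild from the row checksum
        return (i, j)
    # Includes the case where a checksum element itself was hit.
    raise RuntimeError("error pattern not correctable in this sketch")


c = abft_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
c[1][0] ^= 1 << 3                         # simulated upset in one element
fixed_at = check_and_correct(c)
product = [row[:2] for row in c[:2]]      # strip the checksum row and column
print(fixed_at, product)
```

The checks cost one extra row and column rather than a second or third full execution, which is the sense in which ABFT carries less overhead than replication.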

Overview of DM Payload Flight Experiment Operation
[Flow diagram of S/C events and DM Payload responses. Recoverable sequence:]
- S/C Power On -> DM warms up to start-up temperature
- S/C Power Up -> DM Init.: 1) Syst. Cntrl. operational, 2) DP Nodes power on sequence, 3) Syst. SW*
- Continuous execution after start-up, as long as DM experiment is “on”: DM Environment Data Collection; DM Experiment Application Sequence
- Uplink or S/C Command -> DM Payload responds to command
- Periodic SOH Message <- DM System Controller data collection for periodic SOH message
- Experiment Telemetry Message <- DM System Controller data collection for experiment telemetry msg. on system-level SEU event detection
- S/C Imm. Power Off Indication -> DM System Controller power down sequence
* S/C Interface, OS, HAM, DMM
Note:
1) The data collected for the periodic SOH message includes summary experiment statistics on the environment and on system operation and performance.
2) Data collection for the Experiment Data Telemetry message is triggered by detection of a System-Level SEU event.
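The two telemetry paths described in the notes (a periodic SOH message and an event-triggered experiment telemetry message) can be mimicked with a small tick-driven simulation. Everything here is hypothetical: the `PayloadSim` class, the message strings, and the five-tick SOH period are illustrative choices, not the flight software's interfaces or rates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PayloadSim:
    """Toy model of the System Controller's two telemetry paths."""
    soh_period: int                       # ticks between SOH messages (assumed)
    tick: int = 0
    seu_count: int = 0
    messages: List[str] = field(default_factory=list)

    def step(self, seu_detected: bool = False) -> None:
        """Advance one tick; emit event-driven and periodic messages."""
        self.tick += 1
        if seu_detected:
            # Event path: telemetry is collected when an SEU is detected.
            self.seu_count += 1
            self.messages.append(f"EXP_TLM tick={self.tick} seu#{self.seu_count}")
        if self.tick % self.soh_period == 0:
            # Periodic path: SOH carries summary statistics so far.
            self.messages.append(f"SOH tick={self.tick} seu_total={self.seu_count}")


sim = PayloadSim(soh_period=5)
for t in range(1, 11):
    sim.step(seu_detected=(t == 7))       # one simulated SEU at tick 7
for line in sim.messages:
    print(line)
```

In this run the SOH messages at ticks 5 and 10 bracket a single experiment telemetry message at tick 7, and the second SOH message reflects the event in its running total.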

DM Technology - Platform Independence
DM technology has already been ported successfully to a number of platforms with heterogeneous HW and SW elements:
- Pegasus II with Freescale 7447a 1.0GHz processor with AltiVec vector processor, with existing DM TRL5 Testbed
- 35-Node Dual 2.4GHz Intel Xeon processors with 533MHz front-side bus and hyper-threading (Kappa Cluster)
- 9-Node Dual Motorola G4 7455 @ 1.42 GHz, with AltiVec vector processor (Sigma Cluster)
- DM flight experiment 7447a COTS processing boards with DM TRL5 Testbed
- State-of-the-art IBM multi-core Cell processor
  -- DMM working on Cell; awaiting integration & demonstration with the DM TRL5 Testbed
[Photos: DM TRL6 “Wind Tunnel” with COTS 7447a ST8 Flight Boards; 35-Node Kappa Cluster at UF; DM TRL5 Testbed System with COTS 750fx boards]

NASA GSFC Application Port to DM - Demonstrated Ease of Use
Time to port a previously unseen application, the NASA Goddard Neural System Application written in FORTRAN and Java, to the DM TRL5 testbed:* approximately one man-week, including time to find and test FORTRAN compilers that would work on the DM system!
* Port performed by Adam Jacobs, doctoral student at the University of Florida, member of the ST8 DM team. Neural System application provided by Dr. Steve Curtis (NASA GSFC) and Dr. Michael Rilee (CSC/NASA GSFC).

Summary & Conclusion
Flying high performance COTS in space is a long-held desire/goal:
- Space Touchstone (DARPA/NRL)
- Remote Exploration and Experimentation (REE) (NASA/JPL)
- Improved Space Architecture Concept (ISAC) (USAF)
NMP ST8 DM project is bringing this desire/goal closer to reality.
Successful DM Experiment CDR on 6/27/07.
DM technology is applicable to a wide range of missions:
- science and autonomy missions
- landers/rovers
- CEV docking computer
- MKV
- UAVs (Unattended Airborne Vehicles)
- UUVs (Unattended or Un-tethered Undersea Vehicles)
- ORS (Operationally Responsive Space)
- Stratolites
- ground-based systems
- rad hard space applications