FLASH Mitigation Strategies for Space Applications Charles Howard Southwest Research Institute.

Presentation transcript:

2 Abstract
The MMS mission requires a high-density, non-volatile solid-state recorder (SSR). The SSR will be implemented with screened commercial FLASH devices characterized for radiation effects (both TID and SEE). In an extensive collaborative effort by NEPP and SWRI, multiple manufacturers and devices have been characterized. The additional SEU failure modes exhibited by FLASH devices compel mitigation techniques that extend beyond traditional bit error correction. A discussion of mitigation techniques and the tradeoffs among FPGA complexity/utilization, bandwidth, and total memory will be presented.

3 Why FLASH?
“I am your density” – George McFly, Back to the Future
SDRAM
  – 512Mx8 in an MCM?
SRAM
  – Yeah, right…
FLASH
  – 512Mx8 discrete parts
      1Gx8 available
  – 4Gx8 MCMs (8Gx8 possible)
  – NON-VOLATILE…

4 Why NOT FLASH?
Space qualified parts?
  – General availability sorely lacking
  – No Rad foundry providing FLASH
Legacy / Lack thereof
  – Radiation testing of commercial products is a strenuous process…
  – Each wafer lot must be tested
  – “Long term” availability for commodity parts? NOT!

5 NEPP/SWRI testing of FLASH
SEE response is generally excellent for all flash products
  – Error cross-sections orders of magnitude lower than for standard volatile memories
None of the parts suffered SEL
  – There were other destructive effects, usually failure of the erase circuit.
The SEFI rate is a concern with flash memories.
  – What do you call a SEFI that won’t clear after a power cycle?

6 FLASH Memory in Space Environment
“The SEFI (Single Event Functional Interrupt) rate is of greater concern for space applications than the bit error rate”
  – TID and SEE Response of Advanced 4G NAND Flash Memories, NSREC08, T. R. Oldham

7 Mitigation Considerations
Class of Error
  – SEUs
  – SEL
  – SEFI
  – “Permanent” SEFI
Cost of implementation/mitigation
  – Area
  – Mass
  – Power
  – Required FPGA logic

8 Error Classes
SEU
  – Address to satisfy MAR
      Some form of ECC
SEL
  – Sufficiently low to neglect
      Component design issue
SEFI (part becomes nominal after power cycle/reset)
  – More likely than SEU, must address
      Detect & power cycle/reset
Permanent SEFI
  – More likely than SEU, must address
  – Different mitigation approach!
      ???
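The error classes above pair each class with a response. A minimal sketch of that pairing follows. The names `ErrorClass`, `classify_sefi`, and `RESPONSE` are hypothetical and do not come from the presentation, and the permanent-SEFI entry reflects the later caveat that such a failure can be approximated as the loss of a component.

```python
from enum import Enum, auto


class ErrorClass(Enum):
    """Error classes named on the slide (identifiers are illustrative)."""
    SEU = auto()
    SEL = auto()
    SEFI = auto()
    PERMANENT_SEFI = auto()


def classify_sefi(functional_after_reset: bool) -> ErrorClass:
    """A SEFI that clears after a power cycle/reset is an ordinary SEFI;
    one that does not clear is treated as permanent."""
    if functional_after_reset:
        return ErrorClass.SEFI
    return ErrorClass.PERMANENT_SEFI


# Response per class, paraphrasing the slide. The PERMANENT_SEFI entry is an
# assumption based on the "approximate the loss of a component" caveat.
RESPONSE = {
    ErrorClass.SEU: "correct with some form of ECC",
    ErrorClass.SEL: "neglect (component design issue)",
    ErrorClass.SEFI: "detect and power cycle/reset",
    ErrorClass.PERMANENT_SEFI: "treat as a lost component",
}

if __name__ == "__main__":
    for cleared in (True, False):
        cls = classify_sefi(cleared)
        print(cls.name, "->", RESPONSE[cls])
```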

9 Module Topology

10 CAVEAT STATEMENTS
I am not doing the probability calculations
Consider a DWORD storage system for reference
Permanent SEFIs are not recoverable:
  – Loss of Erase, Write or Read Circuit
  – Can approximate the loss of a component
Block based failures and permanent SEFIs are roughly equivalent
  – Lose a “unit” of data (BLOCK x 4 x n) ~ “component”
Simple addressing and memory management
  – No exotic stuff like linked lists

11 Design Options
Unmitigated
SEC/DED (Traditional EDAC)
Reed-Solomon
Parallel Reed-Solomon
TMR
Redundancy
ECC “Plus”

12 Unmitigated
0% more memory
  – Area / Power / Mass: 1x
Implementation concerns
  – Addressing scheme: Simple
  – Memory management metrics: Simple
Utilization -- logic required to implement
  – I/O count: 1x
  – Gates: Baseline
Susceptibility
  – Bit: Any Single Bit Error
  – Byte or component: NOPE…

13 SEC/DED
25% more memory
  – Area / Power / Mass: 1.25x
Implementation concerns
  – Addressing scheme: Simple
  – Memory management metrics: Simple
Utilization -- logic required to implement
  – I/O count: 1.25x
  – Gates: Hamming cost
Susceptibility (Immunity)
  – Bit: Any Single Bit Error
  – Byte or component: NOPE…
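To make the SEC/DED option concrete, here is a minimal sketch of an extended Hamming code over one 32-bit DWORD. It is an illustration under stated assumptions, not the encoder used in the SSR: the function names and bit layout are hypothetical. The code needs 6 Hamming check bits plus 1 overall parity bit, giving a 39-bit codeword. Assuming byte-wide devices, that fits in five x8 parts instead of four, which is consistent with the 25% figure on the slide.

```python
from typing import Optional, Tuple

DATA_BITS = 32
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32]           # Hamming check bits (1-indexed)
CODE_BITS = DATA_BITS + len(PARITY_POSITIONS)     # 38 Hamming positions
DATA_POSITIONS = [p for p in range(1, CODE_BITS + 1)
                  if p not in PARITY_POSITIONS]


def encode(data: int) -> int:
    """Return a 39-bit codeword: bit 0 is overall parity, bits 1..38 are Hamming."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    for p in PARITY_POSITIONS:
        parity = 0
        for pos in range(1, CODE_BITS + 1):
            if pos & p and (word >> pos) & 1:
                parity ^= 1
        if parity:
            word |= 1 << p                        # make each check group even
    if bin(word).count("1") % 2:
        word |= 1                                 # make overall parity even
    return word


def decode(word: int) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected', 'double' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, CODE_BITS + 1):
        if (word >> pos) & 1:
            syndrome ^= pos
    overall_odd = bin(word).count("1") % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        if syndrome > CODE_BITS:
            return None, "uncorrectable"          # odd-weight multi-bit error
        word ^= 1 << syndrome                     # syndrome 0 means the parity bit itself
        status = "corrected"
    else:
        return None, "double"                     # nonzero syndrome, even overall parity
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status
```

As the slide notes, this protects against any single bit error per word but not against the loss of a whole byte or component.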

14 Reed Solomon (Block)
25% more memory
  – Area / Power / Mass: 1.25x
Implementation concerns
  – Addressing scheme: Straightforward
  – Memory management metrics: Simple
Utilization -- logic required to implement
  – I/O count: 1.25x
  – Gates: Encoder/Decoder/RAM
  – Bandwidth: Likely Adverse
Susceptibility (Immunity)
  – Bit, byte: Many/codeblock
  – Component failures: NOPE…

15 Parallel Reed Solomon
50% more memory
  – Area / Power / Mass: 1.5x
Implementation concerns
  – Addressing scheme: Simple
  – Memory management metrics
Utilization -- logic required to implement
  – I/O count: 1.5x
  – Gates: Encoder/Decoder
Susceptibility (Immunity)
  – Bit, byte, byte “plus”: YEAH!
  – SOME component failures: 2/3 (NOT IN THE RS)

16 TMR
200% more memory
  – Area / Power / Mass: 3x
Implementation concerns
  – Addressing scheme: Simple
  – Memory management metrics: Simple
Utilization -- logic required to implement
  – I/O count: 3X or TDM
  – Bus loading / signal integrity: Ouch…
  – Gates: Voters (plus)
Susceptibility (Immunity)
  – Bit, byte or component: OH, YEAH! We can handle anything!
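The "Voters" entry refers to majority voting across the three copies. A minimal sketch of a bitwise two-of-three voter follows; the function names are hypothetical, and a flight design would implement this in FPGA logic rather than software.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_copies(a: int, b: int, c: int) -> list:
    """Return the indices (0, 1, 2) of copies that differ from the voted result."""
    voted = vote(a, b, c)
    return [i for i, copy in enumerate((a, b, c)) if copy != voted]


if __name__ == "__main__":
    good = 0xCAFEF00D
    dead = 0x00000000          # e.g. a whole component returning zeros
    print(hex(vote(good, dead, good)))
    print(disagreeing_copies(good, dead, good))
```

Because the vote is taken per bit, the same logic masks a single flipped bit, a bad byte, or an entire failed component, as long as the other two copies agree on each bit.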

17 Redundant Memory
X% more memory
  – Area / Power / Mass: X
Implementation concerns
  – Addressing scheme: Simple
  – Memory management metrics: Simple
Utilization -- logic required to implement
  – I/O count: X
  – Gates: Minimal
Susceptibility (Immunity)
  – Bit, byte or component: Nope.

18 ECC with Warm Spare
25-50% more memory per dword
  – Area / Power / Mass: 1.5x
Implementation concerns
  – Addressing scheme: Simple
  – Memory management metrics: Straightforward
Utilization -- logic required to implement
  – I/O count: 1.5x
  – Bus loading / signal integrity
  – Gates: ECC & steering
Susceptibility (Immunity)
  – Bit, byte or component: OH, YEAH! We can handle anything!
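The "steering" logic can be pictured as a small map from logical byte lanes to physical devices. The sketch below is only an illustration: the class `LaneSteering` is hypothetical, and the layout of four data lanes, one ECC lane, and one warm spare per DWORD is an assumption chosen to match the 1.5x figure. The actual topology is shown only in the slide figures.

```python
class LaneSteering:
    """Map logical byte lanes to physical devices, with warm spares held in reserve."""

    def __init__(self, active: int = 5, spares: int = 1):
        # Assumed layout: 4 data lanes + 1 ECC lane are active, 1 device is spare.
        self.lane_map = list(range(active))               # logical lane -> device id
        self.spares = list(range(active, active + spares))
        self.failed = set()

    def fail_device(self, device: int) -> bool:
        """Record a permanent failure and steer the affected lane to a spare.

        Returns False when an active device fails and no spare is left.
        """
        self.failed.add(device)
        if device in self.spares:
            self.spares.remove(device)                    # lost a spare, no lane affected
            return True
        if device not in self.lane_map:
            return True                                   # device was not in use
        if not self.spares:
            return False                                  # lane is lost
        lane = self.lane_map.index(device)
        self.lane_map[lane] = self.spares.pop(0)
        return True


if __name__ == "__main__":
    steering = LaneSteering()
    steering.fail_device(2)
    print(steering.lane_map)
```

The sketch covers only the remapping. Recovering data already written to the failed device is a separate problem that it does not address.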

19 Memory Topology

20 Failure 1

21 Failure 2

22 Observations
ECC covers SEU errors
Warm Spare compensates for SEFIs and block errors
ECC with Warm Spare is a superior option
  – Susceptibility to permanent SEFIs plummets
  – Memory availability remains near 100%
Block based errors mapped to spare
SEFI based errors map to spare
ECC with Warm Spare is roughly equivalent to full TMR at half the power, mass, area, and cost
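The "half the power, mass, area, and cost" comparison follows from the Area / Power / Mass multipliers quoted on the individual option slides. A small sketch of that arithmetic is below. The dictionary and function names are hypothetical, the multipliers are copied from the slides, and the Redundant Memory option is left out because its multiplier is given only as X.

```python
# Area / Power / Mass multipliers as quoted on the option slides.
FOOTPRINT = {
    "Unmitigated": 1.0,
    "SEC/DED": 1.25,
    "Reed-Solomon (block)": 1.25,
    "Parallel Reed-Solomon": 1.5,
    "TMR": 3.0,
    "ECC with Warm Spare": 1.5,
}


def relative_to(option: str, reference: str = "TMR") -> float:
    """Footprint of one option as a fraction of the reference option."""
    return FOOTPRINT[option] / FOOTPRINT[reference]


if __name__ == "__main__":
    for name in FOOTPRINT:
        print(f"{name:24s} {FOOTPRINT[name]:.2f}x  "
              f"({relative_to(name):.0%} of TMR)")
```

With these figures, ECC with Warm Spare comes to 1.5 / 3.0 = 0.5 of the TMR footprint, which is the "half" quoted in the observations.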

23 Summary
Memory modules allow highest density/area
Mitigation is user’s choice depending upon design goals but must cover SEFI and SEU
ECC with Warm Spare is roughly equivalent to full TMR at half the power, mass, area, and cost

References
  – TID and SEE Response of an Advanced Samsung 4Gb NAND Flash Memory (NSREC07); T. R. Oldham, M. Friendlich, J. W. Howard, Jr., M. D. Berg, H. S. Kim, T. L. Irwin, and K. A. LaBel
  – TID and SEE Response of Advanced 4G NAND Flash Memories (NSREC08); T. R. Oldham, M. Suhail, M. R. Friendlich, M. A. Carts, R. L. Ladbury, H. S. Kim, M. D. Berg, C. Poivey, S. P. Buchner, A. B. Sanders, C. M. Seidleck, and K. A. LaBel
  – SEE and TID of Emerging Non-Volatile Memories; D. N. Nguyen and L. Z. Scheick, Jet Propulsion Laboratory, California Institute of Technology
  – A Case Study of Single Event Functional Interrupts (SEFIs) in COTS SDRAMs (NSREC08); Joe Benedetto and George Ott, Radiation Assured Devices