A 3D Data Transformation Processor Dimitrios Megas, Kleber Pizolato, Timothy Levin, and Ted Huffmire WESS 2012 October 11, 2012.


Disclaimer The views presented in this talk are those of the speaker and do not necessarily reflect the views of the United States Department of Defense or the National Science Foundation.

Split Manufacturing Face-to-Back (F2B) Bonding

Basic Idea Combine using 3D integration: – Processor – Compression coprocessor – Cryptographic coprocessor

Basic Idea CPU Layer + Coprocessor Layer

Basic Idea
Real-time trace collection
– Compress trace prior to transmission to off-chip storage for offline program analysis
Optional encryption step can protect the compressed data from interception
– High-performance stand-alone encryption service
– XTRec: Secure Real-time Execution Trace Recording on Commodity Platforms (CMU)
– Trusted computing: mitigate glitch attack against TPM (runtime hash of memory, capture sequence of instructions executed)

Basic Idea Real-time trace collection – The amount of data collected depends on the granularity of the collection and the speed of the system – Monitoring and collecting more signals results in a larger data stream
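As a rough illustration of this dependence, the sketch below estimates the raw trace bandwidth from the width of each collected record, the clock frequency, and how often a record is collected. The function name, the parameters, and every number are hypothetical and are not taken from the talk.

```python
def trace_bandwidth_bits_per_s(bits_per_record: int,
                               clock_hz: float,
                               records_per_cycle: float) -> float:
    """Raw trace bandwidth = record width x clock rate x collection granularity."""
    return bits_per_record * clock_hz * records_per_cycle


# Hypothetical case: a 64-bit record (32-bit PC + 32-bit data address),
# a 100 MHz clock, and one record collected every 4 cycles.
narrow = trace_bandwidth_bits_per_s(64, 100e6, 0.25)

# Monitoring twice as many signals doubles the data stream.
wide = trace_bandwidth_bits_per_s(128, 100e6, 0.25)

print(f"{narrow / 1e9:.1f} Gbit/s vs {wide / 1e9:.1f} Gbit/s")
```

Under these assumed numbers the stream grows linearly with each of the three factors, which is why preprocessing before off-chip transmission matters.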

Outline Motivation and Background Design Goals Design Choices System Architecture Conclusions and Future Work

Cryptographic Coprocessing 3D vs. 2D

Medical Image Processing [Cong 2011]

3D-MAPS V1 vs V2, Georgia Tech [Kim et al., ISSCC 2012]

                    3D-MAPS V1             3D-MAPS V2
# of tiers          2 (1 logic, 1 SRAM)    5 (2 logic, 3 DRAM)
# of cores          64                     128
Memory capacity     256KB SRAM             256MB DRAM & 512KB SRAM
Logic footprint     5mm X 5mm              10mm X 10mm
DRAM footprint      -                      20mm X 12mm
Bonding style       F2F                    F2F and F2B
TSV/F2F usage       ~50K / ~50K            ~150K / ~185K
Memory access*      2048 bit/cycle SRAM    1024 bit/cycle DRAM
Freq / power        277MHz / 4.0W          175MHz / 10.4W

* Wide-I/O allows 512 bit/cycle DRAM access

Stack Up Comparison TSV usage – 3D-MAPS V1: For I/O (204 redundancy) – 3D-MAPS V2: For I/O (204 redundancy) and DRAM access (9 redundancy)

What is 3Dsec?
Economics of High Assurance
– High NRE Cost, Low Volume
– Gap between DoD and Commercial
Disentangle security from the COTS
– Use a separate chip for security
– Use 3-D Integration to combine: Control Plane, Computation Plane
– Need to add posts to the COTS chip design
Dual use of computation plane

Pros and Cons
Why not use a co-processor? On-chip?
Pros
– High bandwidth and low latency
– Controlled lineage
– Direct access to internal structures
Cons
– Thermal and cooling
– Design and testing
– Manufacturing yield

Cost
Cost of fabricating systems with 3-D
– Fabricating and testing the security layer
– Bonding it to the host layer
– Fabricating the vias
– Testing the joined unit

Circuit-Level Modifications
Passive vs. Active Monitoring
– Tapping
– Re-routing
– Overriding
– Disabling

3-D Application Classes
Enhancement of native functions
Secure alternate service
Isolation and protection
Passive monitoring
– Information flow tracking
– Runtime correctness checks
– Runtime security auditing

Outline Motivation and Background Design Goals Design Choices System Architecture Conclusions and Future Work

Design Goals High Performance Ability to gather and compress architectural state of a processor at runtime

Outline Motivation and Background Design Goals Design Choices System Architecture Conclusions and Future Work

Design Choices
Manufacturing process
– Face-to-face (F2F)
Compression algorithm/hw
– Two stages: filtering + general-purpose
Crypto algorithm/hw
– AES-128, SHA-1, SHA-512
Interface between planes
– 128 F2F vias up, 32 down (direct connection)

Design Choices
Other Issues
– Coordination between planes: control words in special registers
– Interface within control plane: output of compression → input of crypto
– Delivery of I/O and power: use existing capability of computation plane
– Computation plane hardware: high-performance general-purpose processor
– Clock synchronization: tree network

Compression Study
Use TCgen to compress a set of trace files generated using Pin
– Traces capture memory access behavior of various Linux applications
Vary parameters of TCgen for each field
– TCgen is prediction-based compression
– Which algorithm is most effective?
Apply general-purpose compression in second stage (gzip)
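The two-stage idea (prediction first, general-purpose compression second) can be sketched as follows. This is a minimal illustration and not TCgen's actual algorithm or file format: the simple stride predictor, the one-byte hit/miss encoding, and the synthetic trace are all assumptions, and zlib's DEFLATE stands in for gzip.

```python
import struct
import zlib

MASK32 = 0xFFFFFFFF


def stride_predict_encode(values):
    """Stage 1: emit one byte for a correctly predicted value,
    or a miss flag followed by the raw 32-bit value."""
    out = bytearray()
    last, stride, hits = 0, 0, 0
    for v in values:
        predicted = (last + stride) & MASK32
        if v == predicted:
            out.append(0)                 # hit: a single code byte
            hits += 1
        else:
            out.append(1)                 # miss: flag byte ...
            out += struct.pack("<I", v)   # ... plus the raw value
        stride = (v - last) & MASK32      # learn the most recent stride
        last = v
    return bytes(out), hits


def two_stage_compress(values):
    """Stage 2: run a general-purpose compressor over the stage-1 stream."""
    stage1, hits = stride_predict_encode(values)
    return zlib.compress(stage1, 9), hits


# Synthetic address field: 1000 sequential 4-byte accesses (hypothetical data).
trace = [0xBFF10000 + 4 * i for i in range(1000)]
raw = struct.pack("<1000I", *trace)
compressed, hits = two_stage_compress(trace)

print(f"correct predictions: {100 * hits / len(trace):.1f}%")
print(f"raw {len(raw)} bytes, compressed {len(compressed)} bytes")
```

A trace this regular is a best case; real traces mix several access patterns, which is why the study varies the predictor configuration for each field.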

Trace Files (generated by Pin) Fields of each trace record: Instruction Count, PC, Address, Size [sample records not recoverable]

PC Field Number of correct predictions (%) for each configuration of TCgen when compressing the PC field (average of all 5 trace files)

Data Address Field Number of correct predictions (%) for each configuration of TCgen when compressing address field (average of all 5 trace files)

PC Field Compression ratio for the PC field

Data Address Field Compression ratio for the data address field

Outline Motivation and Background Design Goals Design Choices System Architecture Conclusions and Future Work

Computation Plane CPU

Control Plane Compression coprocessor (DFCM + gzip)
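For readers unfamiliar with DFCM (differential finite context method) prediction, the sketch below shows the general technique in software: a table indexed by a hash of the most recent strides predicts the next stride, which is added to the last value. The order, table size, hash function, and test data are assumptions chosen for illustration and do not describe the coprocessor's hardware.

```python
MASK32 = 0xFFFFFFFF


class DFCMPredictor:
    """Software sketch of a differential finite context method predictor."""

    def __init__(self, order=2, table_bits=12):
        self.mask = (1 << table_bits) - 1
        self.table = [0] * (1 << table_bits)  # predicted stride per context
        self.history = [0] * order            # most recent strides
        self.last = 0                         # last value seen

    def _index(self):
        # Hash the stride history into a table index (hypothetical hash).
        h = 0
        for s in self.history:
            h = (h * 31 + s) & self.mask
        return h

    def predict(self):
        return (self.last + self.table[self._index()]) & MASK32

    def update(self, value):
        stride = (value - self.last) & MASK32
        self.table[self._index()] = stride    # remember what followed this context
        self.history = self.history[1:] + [stride]
        self.last = value


def hit_rate(values, predictor):
    """Fraction of values the predictor guesses before seeing them."""
    hits = 0
    for v in values:
        if predictor.predict() == v:
            hits += 1
        predictor.update(v)
    return hits / len(values)


# Hypothetical address stream with a repeating stride pattern (+4, +4, +64),
# which a single-stride predictor would keep missing.
strides = [4, 4, 64]
values, addr = [], 0x1000
for i in range(300):
    values.append(addr)
    addr += strides[i % 3]

rate = hit_rate(values, DFCMPredictor())
print(f"correct predictions: {100 * rate:.0f}%")
```

Because the context is a history of strides instead of absolute values, the predictor keeps working when the same access pattern repeats at a different base address.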

Control Plane gzip unit (within compression coprocessor)

Control Plane AES/SHA
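The compress-then-hash data flow between the compression coprocessor and the SHA unit can be mimicked in software as below. This is only a sketch under assumptions: the block boundaries, the running-digest scheme, and the function name are hypothetical, zlib stands in for the compression stage, and the AES step is omitted because the Python standard library has no AES implementation.

```python
import hashlib
import zlib


def digest_chain(blocks):
    """Compress each trace block, then fold it into a running SHA-512 digest.

    Returns a list of (compressed_block, digest_so_far_hex) pairs.
    """
    running = hashlib.sha512()
    out = []
    for block in blocks:
        compressed = zlib.compress(block)   # output of compression ...
        running.update(compressed)          # ... is the input of the hash
        out.append((compressed, running.copy().hexdigest()))
    return out


# Two hypothetical trace blocks.
chain = digest_chain([b"a" * 100, b"b" * 100])
for compressed, digest in chain:
    print(len(compressed), digest[:16])
```

An offline analyzer holding the final digest could recompute it over the stored blocks to detect modification or reordering of the recorded trace.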

Control Plane Microprocessor interface unit

Full 3D System 3D IC

Outline Motivation and Background Design Goals Design Choices System Architecture Conclusions and Future Work

Conclusions
Applications: trusted computing, reverse engineering of malicious software, post-mortem analysis of a system that has suffered an attack
Simple preprocessing can decrease bandwidth (also gives power advantages)
There is much to do before making silicon. It is useful to quantify the high-level tradeoffs:
– Data to compress
– Sampling rate
– Number of TSVs
– Throughput
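One way to relate these four quantities is a back-of-the-envelope via count, sketched below. It assumes each via carries one bit per clock cycle, that raw trace data travels up to the control plane and compressed data travels back down, and it uses made-up rates picked only to show the arithmetic; none of these figures come from the talk.

```python
import math


def vias_needed(bits_per_s: int, via_clock_hz: int) -> int:
    """Vias required if each via moves one bit per clock cycle (assumption)."""
    return math.ceil(bits_per_s / via_clock_hz)


raw_rate = 12_800_000_000   # hypothetical raw trace rate, bits/s
via_clock = 100_000_000     # hypothetical via clock, Hz
compression_ratio = 4       # hypothetical raw:compressed ratio

up = vias_needed(raw_rate, via_clock)                         # raw data going up
down = vias_needed(raw_rate // compression_ratio, via_clock)  # compressed data coming down

print(f"vias up: {up}, vias down: {down}")
```

Lowering the sampling rate or filtering signals before they cross planes reduces the upward count; a better compression ratio reduces only the downward count.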

Future Work
Independent I/O and power delivery
– How to share the I/O of computation plane?
Floor Planning
– How much logic/memory can you fit between the TSVs? It would be helpful for the 3D chip to be pin-compatible with the 2D package.
– Use a network/share the TSVs?
Joining dissimilar technology nodes
– Use buffers, redundant hardware

Future Work
More types of trace files
– General-purpose interface, migration path
– Can you test/verify computation plane without knowing what the control plane will be?
– Characteristics of a “typical” trace file?
Hierarchy of compression, for power not just for compression ratio?
– Lossy compression?!
Trust issues
– Who generates the write signal?
– How to protect the key?
– Can monitored software turn off monitoring?
Hardware implementation
– Simulation
– FPGA prototype
– Tape-out

Split Manufacturing Discussion Points
– Can we trust the result of split manufacturing?
– Could this approach harm security?
– Is it worth it? When is it worth it?
– Why not always use a trusted foundry?
– Are trusted foundries a band-aid solution to the offshoring trend?
– How to trust a trusted foundry?
– Why not use redundancy with majority vote?
– Can we do everything from scratch?

Split Manufacturing Discussion Points – How to raise alarm if network interface is controlled by adversary? Use challenge-response protocols? – Security architecture Packaging considerations Distributed posts, policy state? If computation plane can perform AES, why perform AES in control plane?
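To make the challenge-response question concrete, here is a minimal sketch of one such protocol using an HMAC over a fresh nonce. It assumes the control plane and a remote verifier already share a secret key; the function names, the status message, and the choice of HMAC-SHA-256 are illustrative assumptions and are not proposed in the talk.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Verifier side: a fresh random nonce prevents replay of old responses."""
    return secrets.token_bytes(16)


def respond(key: bytes, challenge: bytes, status: bytes) -> bytes:
    """Control-plane side: authenticate the status report for this challenge."""
    return hmac.new(key, challenge + status, hashlib.sha256).digest()


def verify(key: bytes, challenge: bytes, status: bytes, tag: bytes) -> bool:
    """Verifier side: constant-time check of the returned tag."""
    return hmac.compare_digest(respond(key, challenge, status), tag)


key = secrets.token_bytes(32)   # hypothetical pre-shared key
challenge = make_challenge()
tag = respond(key, challenge, b"ALARM")

ok = verify(key, challenge, b"ALARM", tag)
tampered = verify(key, challenge, b"OK", tag)
print(ok, tampered)
```

An adversary controlling the network interface can still drop messages, so the verifier would have to treat a missing or invalid response as an alarm in itself.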

Questions? faculty.nps.edu/tdhuffmi