1 Complete Information Flow Tracking from the Gates Up
Tiwari, Wassel, Mazloom, Mysore, Chong, Sherwood (UCSB), ASPLOS 2009
Shimin Chen, LBA Reading Group

2 Introduction
- In a traditional microprocessor, information is leaked practically everywhere and by everything.
- This can be a serious problem for exceptionally sensitive financial, military, and personal data.
  - Cryptography, authentication
- Developers in these domains are willing to go to remarkable lengths to minimize the amount of leaked information:
  - flushing the cache before and after executing a piece of critical code (Osvik et al. 2006)
  - attempting to scrub the branch predictor state (Aciicmez et al. 2007)
  - normalizing the execution time of loops by hand (Kocher 1996)
  - randomizing or prioritizing the placement of data into the cache (Lee et al. 2005)
- Previous work on DIFT (dynamic information flow tracking) is not adequate.

3 GLIFT: Gate-Level Information-Flow Tracking
- This paper presents a processor architecture and implementation that can track all information flows.
- A novel logic discipline: GLIFT logic
  - Augment arbitrary logic blocks with tracking logic
  - Make compositions of augmented blocks
- Synthesizable processor implementation with a restricted ISA
  - Provably sound information-flow tracking
  - Allows tasks such as public-key cryptography and message authentication

4 Theoretical Understanding
- In a Turing-complete machine, the general problem of determining whether information flows in a program from variable x to variable y is undecidable: any procedure purported to decide it could be applied to the statement "if f(x) halts then y := 0" and thus provide a solution to the halting problem for arbitrary recursive functions (Denning and Denning 1977).
- The paper builds a machine that, by construction, does not allow unbounded execution.
- All hidden flows of information are made explicit.

5 Outline: Introduction; Gate Level Information Flow Tracking; Architecture; Evaluation; Conclusions

6 Idea
- Understand how information flows through primitive logic gates.
- Compose these gates together into more complex structures.
- Treat the whole processor as a logical function:
  - it operates on a set of inputs
  - it results in a set of outputs
  - the trust of the outputs should be determined from the trust of the inputs
- Assumption: trust is a binary state, trusted (0) or untrusted (1).

7 GLIFT for an AND gate
[Figures: AND gate; AND gate truth table; shadow logic for the AND gate; partial truth table for the shadow logic]
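
The figures for this slide did not survive, so the following is a minimal Python sketch of what shadow logic for an AND gate can look like. It assumes the encoding from the previous slide (taint bit 1 = untrusted). The formula is reconstructed from the gate's semantics rather than copied from the slide, and the names `glift_and` and `brute_force_and_taint` are made up for illustration.

```python
from itertools import product


def glift_and(a, a_t, b, b_t):
    """AND gate with shadow logic. All arguments are 0/1; taint 1 = untrusted."""
    out = a & b
    # The output is untrusted only when an untrusted input can change it.
    # A trusted 0 on one input forces the output to 0 and masks the other input.
    out_t = (a & b_t) | (b & a_t) | (a_t & b_t)
    return out, out_t


def brute_force_and_taint(a, a_t, b, b_t):
    """Reference check: untrusted iff varying the untrusted inputs changes the output."""
    a_choices = (0, 1) if a_t else (a,)
    b_choices = (0, 1) if b_t else (b,)
    outputs = {x & y for x, y in product(a_choices, b_choices)}
    return int(len(outputs) > 1)


# The shadow formula agrees with the brute-force definition on all 16 rows.
for a, a_t, b, b_t in product((0, 1), repeat=4):
    assert glift_and(a, a_t, b, b_t)[1] == brute_force_and_taint(a, a_t, b, b_t)
```

The interesting rows are the ones where a trusted 0 meets an untrusted input: the output is 0 regardless, so it stays trusted.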

8 Composing Larger Functions
- Use a MUX as a simple example.
- The shadow logic of the MUX can be composed from the shadow logic of its gates.
- The composed shadow logic is not minimal, but it is always sound. For example, it does not exploit the fact that the two inputs to the OR gate can never both be 1.
- If S is trusted and the selected input is trusted, o is trusted.
- If S is untrusted, o is untrusted unless both a and b are trusted and are equal.
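
As a rough illustration of "not minimal but always sound", the sketch below composes a MUX shadow from per-gate shadow functions and compares it with a brute-force precise answer. It assumes taint 1 = untrusted and that S = 1 selects a; the gate formulas and all function names are reconstructions for this example, not taken from the slides.

```python
from itertools import product


def glift_and(a, a_t, b, b_t):
    return a & b, (a & b_t) | (b & a_t) | (a_t & b_t)


def glift_or(a, a_t, b, b_t):
    return a | b, ((1 - a) & b_t) | ((1 - b) & a_t) | (a_t & b_t)


def glift_not(a, a_t):
    return 1 - a, a_t


def mux_composed(s, s_t, a, a_t, b, b_t):
    """o = (s AND a) OR (NOT s AND b), with the shadow built gate by gate."""
    x, x_t = glift_and(s, s_t, a, a_t)
    ns, ns_t = glift_not(s, s_t)
    y, y_t = glift_and(ns, ns_t, b, b_t)
    return glift_or(x, x_t, y, y_t)


def mux_precise_taint(s, s_t, a, a_t, b, b_t):
    """Untrusted iff varying the untrusted inputs can change the MUX output."""
    choices = [(0, 1) if t else (v,) for v, t in ((s, s_t), (a, a_t), (b, b_t))]
    outputs = {(a2 if s2 else b2) for s2, a2, b2 in product(*choices)}
    return int(len(outputs) > 1)


# Soundness: the composed shadow never reports "trusted" for an output
# that the precise analysis considers untrusted.
for s, s_t, a, a_t, b, b_t in product((0, 1), repeat=6):
    assert mux_composed(s, s_t, a, a_t, b, b_t)[1] >= mux_precise_taint(s, s_t, a, a_t, b, b_t)
```

With an untrusted S and trusted a = b = 1, the precise answer is "trusted" because the output is 1 either way, while the composed shadow conservatively reports "untrusted".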

9 Outline: Introduction; Gate Level Information Flow Tracking; Architecture; Evaluation; Conclusions

10 Step 1: Handling Conditionals
- Problem with a conventional architecture: if a branch depends on X and X is untrusted, then
  - the PC becomes untrusted
  - the selected instruction becomes untrusted
  - the bits that select the target register are untrusted
  - all of the registers may be marked as untrusted
- The PC must be kept trusted.

11 Solution: Predication
- All instructions are executed.
- If the predicate is 0, the instruction has no effect: the target register is not overwritten.
- The PC stays trusted.
- Predicates can become untrusted.
  - Suppose P0 is untrusted.

12 Example
- The line selecting R2 is untrusted.
- The other control lines are trusted.
- R2 will be marked untrusted whether P0 is 0 or 1.
- End result: whether or not the untrusted predicate is true, the destination is marked as untrusted.
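
The effect described on the last two slides can be modelled at the register level with a short Python sketch. This is a simplification under stated assumptions: the real design tracks trust per bit at the gate level, while this model uses one flag per register and conservatively marks the destination untrusted whenever the predicate is untrusted. `Cell` and `predicated_move` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: int
    untrusted: bool = False


def predicated_move(regs, preds, pred, dst, src):
    """Model `if (pred) dst <- src` without changing control flow."""
    p = preds[pred]
    old, new = regs[dst], regs[src]
    value = new.value if p.value else old.value
    if p.untrusted:
        # Whether the write happened depends on untrusted data.
        untrusted = True
    else:
        untrusted = new.untrusted if p.value else old.untrusted
    regs[dst] = Cell(value, untrusted)


def run_example(p0_value):
    regs = {"R1": Cell(5), "R2": Cell(9), "R3": Cell(7)}
    preds = {"P0": Cell(p0_value, untrusted=True)}
    pc = 0
    predicated_move(regs, preds, "P0", "R2", "R1")
    pc += 1  # the PC advances the same way for either predicate value
    return regs, pc
```

Running `run_example(0)` and `run_example(1)` gives different values in R2 but the same trust labels: R2 untrusted, R1 and R3 trusted, and the same PC.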

13 Step 2: Handling Loops
- Loops are hard:
  for (i=0; i<=X; i++) A[i]=1;
- Information flows from X to A[X+1]: A[X+1]==0 tells us about X.
- Information flows from X to A[X+n] for all n.
- Implicit timing channel
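
To make the implicit flow concrete, here is a small Python sketch of the loop above together with an observer who recovers X from the array alone. The array size of 8 and the function names are arbitrary choices for illustration; the recovery only works while X + 1 is still inside the array.

```python
def fill(x, size=8):
    """Python rendering of: for (i=0; i<=X; i++) A[i]=1;"""
    a = [0] * size
    for i in range(x + 1):
        a[i] = 1
    return a


def recover_x(a):
    """The first element that is still 0 sits at index X + 1."""
    return a.index(0) - 1
```

The elements the loop never touched are exactly what reveal X, which is why every A[X+n] has to be treated as depending on X.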

14 Solution: Statically Specify Number of Iterations
- countjump instruction: specifies
  - the number of loop iterations
  - the jump target address
- Example (my understanding from the description):
  Loop start address: ……
  ……
  countjump # iterations, loop start address
- The first time countjump is encountered, the # iterations is loaded into an internal loop counter register.
- The loop counter register is decremented every time countjump is encountered, and PC ← loop start address.
- When the register becomes 0, PC ← PC + 1.
- countjump cannot be predicated.
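
One possible reading of the countjump semantics above, written as a tiny Python interpreter. Several details are assumptions: the program encoding, a single loop counter (so no nested loops), and the choice that an iteration count of N makes the loop body run exactly N times.

```python
def run(program, max_steps=10_000):
    """Execute a list of ("op", name) and ("countjump", iterations, target) tuples."""
    pc = 0
    loop_counter = None  # None means the counter has not been loaded yet
    trace = []
    steps = 0
    while pc < len(program) and steps < max_steps:
        steps += 1
        instr = program[pc]
        if instr[0] == "countjump":
            _, iterations, target = instr
            if loop_counter is None:
                loop_counter = iterations  # first encounter: load the counter
            loop_counter -= 1              # every encounter: decrement
            if loop_counter > 0:
                pc = target                # PC <- loop start address
            else:
                loop_counter = None
                pc += 1                    # PC <- PC + 1
        else:
            trace.append(instr[1])
            pc += 1
    return trace


PROGRAM = [
    ("op", "init"),
    ("op", "body"),
    ("countjump", 3, 1),
    ("op", "after"),
]
```

Because the iteration count is an immediate in the instruction, the sequence of PC values is fixed before the program sees any data.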

15 Early Termination
- In C, the break statement can terminate a loop early.
- Here, the paper proposes predicating all the instructions in the loop on the termination condition.
- Once the termination condition becomes true, the loop still runs its remaining iterations, but the loop body has no effect.
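
A sketch of the idea in Python, using a linear search as a made-up example. The loop always runs its full trip count, and a `found` flag plays the role of the termination predicate. Python's conditional expression is of course not a hardware predicate; this only models the observable effect.

```python
def find_first(values, key):
    """Return (index of first match or -1, number of iterations executed)."""
    found = False  # termination predicate
    index = -1
    iterations = 0
    for i in range(len(values)):      # fixed trip count, no break
        iterations += 1
        hit = values[i] == key
        take = hit and not found      # predicate guarding the loop body
        index = i if take else index  # the body has no effect once found is set
        found = found or hit
    return index, iterations
```

For example, `find_first([4, 7, 7, 1], 7)` returns `(1, 4)`: the match is at index 1, but all four iterations still execute.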

16 Step 3: Constraining Loads and Stores
- Indirect loads and stores are bad, e.g., M[reg] ← value.
  - If reg is untrusted, then essentially all memory locations become untrusted.
  - Intuitively, the problem is that accessing one untrusted address causes every other address to become implicitly untrusted by virtue of not being accessed or modified.
- Limit the ISA to allow only:
  - Direct load/store: addresses are immediate constants.
  - Loop-relative addressing: load-looprel, store-looprel
    - e.g., load-looprel R0, 0x100, C0 loads M[0x100 + C0]
    - C0..C7 are counters: explicitly initialized by init-counter and incremented by a fixed value with increment-counter
    - counter operations cannot be predicated
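
The loop-relative addressing mode can be mimicked with a small Python model. The method names mirror the instruction names on the slide, but the class itself, the memory size, and the operand order of `store_looprel` are assumptions made for this sketch.

```python
class Machine:
    def __init__(self, memory_size=1024):
        self.mem = [0] * memory_size
        self.regs = [0] * 8      # R0..R7
        self.counters = [0] * 8  # C0..C7

    def init_counter(self, c, value):
        self.counters[c] = value

    def increment_counter(self, c, step):
        self.counters[c] += step

    def load_looprel(self, r, base, c):
        # Address = immediate base + counter; no data register is involved.
        self.regs[r] = self.mem[base + self.counters[c]]

    def store_looprel(self, r, base, c):
        self.mem[base + self.counters[c]] = self.regs[r]


def sum_block(machine, base, length):
    """Sum `length` words starting at `base` using only loop-relative loads."""
    total = 0
    machine.init_counter(0, 0)
    for _ in range(length):
        machine.load_looprel(0, base, 0)
        total += machine.regs[0]
        machine.increment_counter(0, 1)
    return total
```

Since the counters are changed only by unpredicated instructions with immediate operands, the sequence of addresses touched does not depend on any data value.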

17 Proof-of-Concept Implementation
- Written in Verilog
- Synthesized onto a Stratix II FPGA using Altera's Quartus II software
- 32-bit machine
- 64KB instruction memory, 64KB data memory
- Registers:
  - a program counter
  - 8 general-purpose registers
  - 2 predicate registers
  - 8 registers to store loop counters (that count down the number of iterations)
  - 8 other registers to store explicit array indices (used as offsets for load-looprel and store-looprel instructions)
- No pipelining

18 Augment the Processor with GLIFT Logic
- Each bit of processor state is explicitly shadowed:
  - every register gets a shadow register
  - every memory gets a shadow RAM
- The logic and signals are shadowed by generating the proper trust propagation logic.

19 ISA

20 A code snippet from the SubBytes function in the AES encryption algorithm
Basically, this is the following in C:
for (i=0; i<16; i++) { state[i] = SBox[state[i]]; }

21 Outline: Introduction; Gate Level Information Flow Tracking; Architecture; Evaluation; Conclusions

22 Hardware Impact
- Altera's Nios is a commercial product: RISC instruction set, reasonably optimized
- Nios econ: unpipelined 6-stage core, without caches, branch predictors, etc.
- Nios std: pipelined, 4KB instruction cache
- GLIFT base: unpipelined, no tracking
- GLIFT full: GLIFT base + tracking

23 Hardware Impact
- 70% area increase compared to GLIFT base
- Small frequency degradation: adding GLIFT tracking does not have a big impact on latency

24 Application Kernels
- Dynamic instruction counts vary substantially.
- FSM and AES have a lot of table look-ups, which become full table iterations.
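
Why a table look-up turns into a full table iteration can be sketched in Python as follows. Since the ISA has no indirect loads, a data-dependent index cannot be used as an address, so every entry is read and a predicate keeps only the matching one. The four-entry table is a made-up stand-in (not the AES S-box), and this is not how the paper's kernels are actually written.

```python
def oblivious_lookup(table, index):
    """Return (table[index], number of reads), touching every entry."""
    result = 0
    reads = 0
    for i in range(len(table)):
        reads += 1
        match = int(i == index)  # predicate: 1 only for the wanted entry
        result = match * table[i] + (1 - match) * result
    return result, reads


def substitute_all(state, table):
    """Analogue of state[i] = SBox[state[i]] with full-table scans."""
    return [oblivious_lookup(table, s)[0] for s in state]
```

Each look-up costs one pass over the whole table, so the dynamic instruction count of a look-up-heavy kernel grows with the table size.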

25 Conclusions
- Bigger, slower, harder to program, and computationally less powerful
- For the first time, provides the ability to account for all information flows through the chip
- My learning:
  - A deeper understanding of information leaks
  - The effort required to prevent leaks is very significant
  - Programmability is sacrificed: restrictions on loops and loads/stores
  - The proof-of-concept does not even address issues such as caches