Performing Security Auditing In Hardware


Fuzzing Processor: Performing Security Auditing In Hardware (TONY)
Tony Fynn, Dustin Locke

Overview (TONY)
* What is fuzzing?
* Project goals
* Architecture details
* Optimizations
* Performance
* Conclusion

What is Fuzzing?
* Sending semi-random data to an application to try to make it misbehave
* Used to detect vulnerabilities

TONY: The idea is to send data that looks good but can break the application: properly formatted, but with nasty values. Fuzzing is used for in-the-field vulnerability assessment and can detect various vulnerabilities (buffer overflows, format strings, integer overflows, etc.). Generally the fuzzer is an intermediary:
* Starts with some good data source
* Intercepts and modifies it
* Sends it on its way to the application
* Waits to see what happens (hopefully the application crashes)
In general, fuzzing is very effective and accounts for a large number of found vulnerabilities.

Types of Fuzzing
* Dumb fuzzing: naively fuzzes all data
* Intelligent fuzzing: selectively fuzzes certain fields
* Figure: TCP packet (Source port, Destination port, Sequence number, Acknowledgment number, Hdr length, Reserved/flags, Window size, Checksum, Urgent pointer, Options, Data)

TONY: There are two types of fuzzing: dumb (or naïve) fuzzing and intelligent fuzzing.
* Dumb fuzzing just mangles all available data (that is, it is naïve about what data it corrupts).
* Intelligent fuzzing uses knowledge of the data structure to fuzz only “interesting” fields and leaves other necessary fields alone (such as checksums, routing addresses, and ports).
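The difference between the two styles can be sketched as a choice of per-byte mask over the fixed 20-byte TCP header. This is only an illustration: the choice of which fields count as “interesting”, the name `byte_mask`, and the convention that mask bit i selects byte i are assumptions, not details from the slides.

```python
# Byte offset and length of each field in the fixed 20-byte TCP header.
TCP_FIELDS = {
    "source_port": (0, 2),
    "destination_port": (2, 2),
    "sequence_number": (4, 4),
    "acknowledgment_number": (8, 4),
    "hdr_length_flags": (12, 2),
    "window_size": (14, 2),
    "checksum": (16, 2),
    "urgent_pointer": (18, 2),
}


def byte_mask(fields_to_fuzz):
    """Return a mask with bit i set when header byte i may be fuzzed.

    Assumption: bit i of the mask corresponds to byte i of the data.
    """
    mask = 0
    for name in fields_to_fuzz:
        offset, length = TCP_FIELDS[name]
        for i in range(offset, offset + length):
            mask |= 1 << i
    return mask


# Dumb fuzzing: every header byte is fair game.
dumb_mask = byte_mask(TCP_FIELDS)

# Intelligent fuzzing: a hypothetical selection that leaves ports and the
# checksum untouched so the packet still reaches the application.
smart_mask = byte_mask(["sequence_number", "window_size", "urgent_pointer"])
```

Here `dumb_mask` covers all 20 header bytes, while `smart_mask` selects only bytes 4 to 7, 14 to 15, and 18 to 19.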

Goals
* Ability to fuzz multiple types of data (robust)
* Intelligent fuzzing: using structural knowledge to our advantage
* High speed
* The goal would be to have a network protocol fuzzer that accepts packets on one side, mangles them, and sends them on their way through the other side
* For our purposes, we perform the fuzzing operation on data from input files

TONY: There are fuzzers out there, but they are almost exclusively software applications (SPIKE, Protos, Smudge, Peach, etc.). A company called Security Innovations in Florida makes a “fuzzing appliance” called Hydra, but it is just a Linux box with 2 network cards (fuzzing is still done in software). So, we want to implement a fuzzer in hardware. What do we want to do with this project?
* We don’t want to be limited in the type of data we can fuzz (e.g., we don’t want to build simply a TCP fuzzer).
* We want to build an “intelligent” fuzzer.
* For live networked applications, we want it to be able to keep up with reasonably high-bandwidth line speeds.
The canonical example would be a fuzzer that sits between two networked hosts and mangles data as it passes through. Our goal is not to build such a network fuzzer, but to create the hardware that would make such a fuzzer possible. Thus, our testing was done with input data files, not live network data.

Architecture
* Register File: 256-bit registers, 32-bit mask
* New Instructions: fzlw, fzsw, fuzz, mskh, mskl
* Fuzzing Unit
* 256-bit SRAM
* Figure: block diagram showing PC, IMEM, ALU, general registers, data memory, MUXes, 256-bit fuzzing registers with 32-bit mask, fuzzing unit, and fuzzer SRAM (256-bit data, addr, wr_en)

Fuzzing Unit
* Takes as input a data word and a mask specifying which bytes are “fuzzable” in the data word
* Generates a random number and XORs fuzzable data bytes with corresponding random number bytes
* Figure: bit patterns 11010110, 00001111, 00000110, 11010000

DUSTIN: The operation of the fuzzing unit is fairly simple.
* Generate a random number internally.
* Get the mask over the bus and expand it to its 256-bit representation (just duplicate each bit 8 times).
* AND the random number with the mask expansion to get a temporary value.
* Get the actual data value over the input bus, and XOR it with the temporary value to get the fuzzed result.
* Send the result out over the output bus.
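The fuzzing-unit steps described here can be modeled in software roughly as follows. This is a sketch, not the authors' hardware: the function names, the use of Python's `secrets` module as the random source, and the convention that mask bit i covers data byte i are all assumptions.

```python
import secrets

WORD_BYTES = 32  # a 256-bit data word is 32 bytes


def expand_mask(mask32):
    """Expand a 32-bit per-byte mask to 256 bits by duplicating each bit 8 times."""
    expanded = 0
    for i in range(WORD_BYTES):
        if (mask32 >> i) & 1:
            expanded |= 0xFF << (8 * i)
    return expanded


def fuzz_word(data256, mask32, random256=None):
    """XOR the fuzzable bytes of data256 with random bytes; leave the rest alone."""
    if random256 is None:
        # Stand-in for the unit's internal random number generator.
        random256 = secrets.randbits(256)
    temp = random256 & expand_mask(mask32)  # keep random bits only in fuzzable bytes
    return data256 ^ temp


# One byte of data, fuzzable, with a fixed "random" value for reproducibility.
result = fuzz_word(0b11010110, mask32=0b1, random256=0b00000110)
```

With the fixed random value above, the low byte 11010110 becomes 11010000, and any byte whose mask bit is 0 passes through unchanged.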

Register File
* 256-bit word length
* Parallel data registers and 32-bit mask registers
* Read operation puts the data word as well as its corresponding mask on the data output lines
* Figure: Register 1 / Mask 1, Register 2 / Mask 2, Register 3 / Mask 3, Register 4 / Mask 4, …, Register 8 / Mask 8

DUSTIN: There are 8 parallel register pairs, each with 256-bit data and a 32-bit mask. Each bit in the mask corresponds to a byte of the data. When a fuzzing operation is initiated, both the data and its corresponding mask are sent out to the fuzzing unit. The data is loaded from the special SRAM. The mask is set manually by the programmer using the “mask high” and “mask low” instructions.
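A small software model of the paired registers may help make the read and mask-setting behavior concrete. The class and method names are hypothetical, and the assumption that `mskh` and `mskl` each carry a 16-bit immediate (upper and lower half of the mask) is inferred from the 32-bit mask being set in two instructions, not stated in the slides.

```python
class FuzzRegisterFile:
    """Hypothetical model: 8 pairs of (256-bit data register, 32-bit mask register)."""

    NUM_REGS = 8

    def __init__(self):
        self.data = [0] * self.NUM_REGS
        self.mask = [0] * self.NUM_REGS

    def load(self, reg, word256):
        # Models loading a data word from the fuzzer SRAM, truncated to 256 bits.
        self.data[reg] = word256 & ((1 << 256) - 1)

    def mskh(self, reg, imm16):
        # "Mask high": assumed to replace the upper 16 bits of the mask.
        self.mask[reg] = (self.mask[reg] & 0x0000FFFF) | ((imm16 & 0xFFFF) << 16)

    def mskl(self, reg, imm16):
        # "Mask low": assumed to replace the lower 16 bits of the mask.
        self.mask[reg] = (self.mask[reg] & 0xFFFF0000) | (imm16 & 0xFFFF)

    def read(self, reg):
        # A read returns the data word together with its mask.
        return self.data[reg], self.mask[reg]


rf = FuzzRegisterFile()
rf.load(0, 0xABCD)
rf.mskh(0, 0x000C)
rf.mskl(0, 0xC0F0)
pair = rf.read(0)
```

After these calls, `pair` holds the data word 0xABCD alongside the mask 0x000CC0F0, which is what a fuzz operation on register 0 would hand to the fuzzing unit.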

Optimizations
* Mask in register file is per-byte, not per-bit: each bit masks an entire byte in the data word
* 256-bit random number generated from 32 parallel 8-bit random numbers
* Avoids an expensive 256-bit multiply
* Drastically reduces gate delay of fuzzer

DUSTIN: One of our goals was to perform fuzzing at relatively high speeds. To ensure this happened, we introduced some optimizations.

First, our original design used a bit-to-bit data mask for the data to be fuzzed. Loading the mask registers would take many instructions without a compact representation. Logically, intelligent fuzzing is done based on “fields”, which are generally at byte granularity. So, we modified the mask registers to be bit-to-byte, and thus only 32 bits wide instead of 256. Now setting the mask takes two instructions (one for each half of the mask, high and low).

Second, we essentially need a 256-bit random number to XOR our data with. The random number is generated using a multiply and an add, and doing a 256-bit multiply is prohibitively expensive. Instead, we effectively have 32 8-bit random number generators, whose outputs are combined to produce a single 256-bit random number. This increases the hardware, but drastically reduces gate delay.
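The second optimization can be sketched as 32 independent 8-bit multiply-and-add generators whose outputs are concatenated. The slides do not give the generator's constants or seeding, so the multiplier `A`, increment `C`, the seeds, and the lane-to-byte ordering below are purely illustrative assumptions.

```python
LANES = 32    # 32 lanes x 8 bits = 256 bits
A, C = 5, 3   # hypothetical multiply/add constants, not from the slides


def step_lanes(state):
    """Advance every 8-bit lane with one multiply and one add, modulo 256."""
    return [(A * s + C) & 0xFF for s in state]


def combine(state):
    """Concatenate the 32 lane outputs into one 256-bit number (lane i -> byte i)."""
    word = 0
    for i, s in enumerate(state):
        word |= s << (8 * i)
    return word


state = list(range(LANES))  # hypothetical distinct seeds, one per lane
state = step_lanes(state)
rand256 = combine(state)
```

Each lane only needs an 8-bit multiplier, and the lanes do not depend on one another, which is the property the notes credit for the reduced gate delay.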

Data Throughput
* Fuzzing unit has maximum gate delay of 21 ns
* Translates to maximum clock speed of about 48 MHz
* Effectively fuzz 256 bits of data in 5 clock cycles (for large amounts of data and a full pipeline)
* Resulting maximum throughput is ~2.5 Gbps for dedicated application
* Able to keep up with line speed of OC-48 fiber line (~2.5 Gbps)

TONY: Synthesis of our individual units shows the fuzzer to be the bottleneck, at 21 ns. This means the maximum clock speed for our processor is about 48 MHz. It takes at most 5 instructions to fuzz a single 256-bit word of data (including one load, two mask sets, and a fuzz). This means our maximum throughput is about 2.5 Gigabits per second (256 bits × 48 MHz / 5 cycles), assuming a full pipeline and a dedicated application. That is enough to keep up with an OC-48 fiber line. Note that for block fuzzing (i.e., where the mask does not change), this will be faster.
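The throughput figure follows from simple arithmetic, worked through below. The function name is hypothetical, and the one-instruction-per-cycle reading of a “full pipeline” is an assumption; the 4-cycle case is included only to show how the result changes if the mask-setting sequence is shorter than 5 instructions.

```python
def throughput_gbps(gate_delay_ns, bits_per_word, cycles_per_word):
    """Throughput in Gbps, assuming one instruction completes per clock cycle."""
    clock_hz = 1e9 / gate_delay_ns  # 21 ns critical path -> about 47.6 MHz
    return clock_hz * bits_per_word / cycles_per_word / 1e9


clock_mhz = 1e3 / 21                        # about 47.6, rounded to 48 on the slide
five_cycles = throughput_gbps(21, 256, 5)   # about 2.44 Gbps
four_cycles = throughput_gbps(21, 256, 4)   # about 3.05 Gbps
```

Using the unrounded 47.6 MHz clock gives roughly 2.44 Gbps at 5 cycles per word; the rounded 48 MHz figure gives about 2.46 Gbps, which the slide reports as ~2.5 Gbps.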

Conclusion/Summary
* Able to fuzz multiple types of data? Yes
* Able to perform intelligent fuzzing? Use of data mask allows selective fuzzing
* High speed? Able to keep up with OC-48
* It is entirely possible to perform intelligent, reconfigurable fuzzing in hardware at high speeds

TONY: Our goal was to show that fuzzing can be done in hardware efficiently, and that it can be done at high speeds. We were able to fuzz data at a high enough rate to keep up with an OC-48 fiber line. We also wanted the fuzzing to be as effective and robust as software applications. Intelligent fuzzing is enabled through mask registers, and control of how the data is fuzzed is given to the programmer.

Questions (TONY)