
Presentation transcript:

Advanced Bit Manipulation Instructions for Commodity Processors
Yedidya Hilewitz and Ruby B. Lee
Princeton Architecture Laboratory for Multimedia and Security
Department of Electrical Engineering, Princeton University

Background and Motivation
- Advanced bit manipulations are not well supported by commodity microprocessors
- These operations are performed using "programming tricks" (see Hacker's Delight)
- Bit manipulations play a role in applications of increasing importance
- We propose adding direct support for a few key bit manipulation operations to accelerate these applications

Example Applications
- Cryptography: permutations in ciphers and hash functions, e.g., TDES
- Random Number Generators: extract bits from source of entropy
  - Von Neumann Extractor (Intel RNG): given bit-pair sequence {x_{2i}, x_{2i+1}} from entropy pool, extract x_{2i} if the bits differ
  - Toeplitz Matrix Multiply Extractor: multiply bit string from entropy pool by a binary Toeplitz matrix
- LSB Steganography: embed secret message in least significant bits of image or audio file

New Instructions
- Permutation
  - Butterfly and Inverse Butterfly
- Bit Gather and Bit Scatter
  - Parallel Extract and Parallel Deposit
- Bit Matrix Multiply
- Other bit manipulation instructions (not covered here)
  - Bit matrix transpose
  - Population count

Butterfly and Inverse Butterfly
- Butterfly
  - lg(n) stages of n 2:1 MUXes split into n/2 pairs that pass through or swap inputs
  - bfly + ibfly = general permutation network
  - Any of the n! permutations of n bits can be done with one pass of both instructions
- Inverse Butterfly

Parallel Extract and Parallel Deposit
- Parallel Extract (bit gather)
  - Extracts bits from r2 flagged by 1's in r3 and compresses and right justifies them in the result register
- Parallel Deposit (bit scatter)
  - Deposits in the result register, at positions flagged by 1's in r3, the right justified bits from r2
[Figure: parallel extract and parallel deposit examples showing registers r1, r2 and r3]

Yedidya Hilewitz and Ruby B. Lee, "Fast Bit Gather, Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors," to appear in Journal of VLSI Signal Processing Systems.
Yedidya Hilewitz and Ruby B. Lee, "Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions," Proceedings of the IEEE 17th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), September 11-13, 2006 (Best Paper Award).

Bit Matrix Multiply
- bmm.n C = B, A
  - A, B, C: n × n bit matrices
  - C = A × B mod 2
  - for i from 1 to n, for j from 1 to n: $c_{i,j} = a_{i,1} b_{1,j} \oplus a_{i,2} b_{2,j} \oplus \dots \oplus a_{i,n} b_{n,j}$
- bmm.8 unit (pictured above) can be directly incorporated into the ALU (<¼ size)

Yedidya Hilewitz and Ruby B. Lee, "Achieving Very Fast Bit Matrix Multiplication in Commodity Microprocessors," Princeton University Department of Electrical Engineering Technical Report CE-L , August

New Shifter Architecture
- Brand new shifter architecture that replaces the shifter with a new unit that directly supports bit manipulation operations
- New shifter performs
  - basic shifter operations: shift, rotate, extract and deposit
  - multimedia shift-permute operations: mix
  - advanced bit manipulation operations: bfly, ibfly, pex, pdep

Yedidya Hilewitz and Ruby B. Lee, "A New Basis for Shifters in General-Purpose Processors for Existing and Advanced Bit Manipulations," to appear in IEEE Transactions on Computers.
Yedidya Hilewitz and Ruby B. Lee, "Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors," Proceedings of 18th IEEE Symposium on Computer Arithmetic (ARITH-18), June 2007.

Applications (and Speedup)
- Cryptography
- Random number generation
  - Von Neumann Extractor
  - Toeplitz Matrix Multiply
- Steganography
- Cryptanalysis (Gaussian elimination)
- Other applications:
  - Binary compression
  - Binary image morphology
  - Bioinformatics
  - Communications coding
  - FFT
  - Finite field arithmetic
  - Integer compression
  - Pattern matching
  - Other applications suggested by you!
- Speedups, in the order listed on the poster: (up to 2.24× speedup), (9.9× speedup), (14.9× speedup), (2.92× speedup)

Summary and Conclusions
- Advanced bit manipulations play an important role in many applications
- We have introduced a few select bit manipulation instructions that speed up these applications
- We have evolved the shifter to a new design using butterfly and inverse butterfly datapaths to support basic and advanced bit manipulation instructions
- Advanced bit manipulations are no longer esoteric "programming tricks" but rather supported directly by microprocessors at only a marginal cost

Ongoing and Future Work
- Identify new applications where bit manipulation instructions are useful (e.g., LFSR and FCSR RNGs, software radio)
- Implementation
  - Refine current circuit implementation
  - Integrate new shifter in scalable crypto co-processor (PAX)
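The semantics of parallel extract, parallel deposit and bit matrix multiply described in this transcript can be sketched in software as follows. This is only a bit-by-bit reference model in Python, not the hardware design from the poster. The function names `pex`, `pdep` and `bmm` echo the instruction mnemonics, but the signatures, the `width` parameter and the matrix encoding (a list of row integers in which bit j of row i holds element (i, j)) are assumptions made for illustration.

```python
def pex(r2: int, r3: int, width: int = 64) -> int:
    """Parallel extract (bit gather): take the bits of r2 flagged by 1's
    in the mask r3 and pack them, right justified, into the result."""
    result = 0
    out_pos = 0  # next free position in the right-justified result
    for i in range(width):
        if (r3 >> i) & 1:
            result |= ((r2 >> i) & 1) << out_pos
            out_pos += 1
    return result


def pdep(r2: int, r3: int, width: int = 64) -> int:
    """Parallel deposit (bit scatter): place the right-justified bits of r2
    at the positions flagged by 1's in the mask r3."""
    result = 0
    in_pos = 0  # next right-justified source bit to consume
    for i in range(width):
        if (r3 >> i) & 1:
            result |= ((r2 >> in_pos) & 1) << i
            in_pos += 1
    return result


def bmm(a: list[int], b: list[int], n: int = 8) -> list[int]:
    """Bit matrix multiply, C = A x B mod 2, for n x n bit matrices.

    Assumed encoding: each matrix is a list of n integers (rows), and
    bit k of a[i] is element (i, k). Row i of C is then the XOR of the
    rows b[k] for every k where a[i] has bit k set."""
    c = []
    for i in range(n):
        row = 0
        for k in range(n):
            if (a[i] >> k) & 1:
                row ^= b[k]
        c.append(row)
    return c
```

For example, with the mask `0b11110000`, `pex(0b10110010, 0b11110000)` gathers the upper four bits into `0b1011`, and `pdep(0b1011, 0b11110000)` scatters them back to give `0b10110000`. The loops take one step per bit, which is exactly the cost the proposed single-instruction hardware support is meant to avoid.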