Accelerating Homomorphic Evaluation on Reconfigurable Hardware Thomas Pöppelmann, Michael Naehrig, Andrew Putnam, Adrian Macias.

Slides:



Advertisements
Similar presentations
Deep Packet Inspection: Where are We? CCW08 Michela Becchi.
Advertisements

14. Aug Towards Practical Lattice-Based Public-Key Encryption on Reconfigurable Hardware SAC 2013, Burnaby, Canada Thomas Pöppelmann and Tim Güneysu.
Commercial FPGAs: Altera Stratix Family Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
+ Accelerating Fully Homomorphic Encryption on GPUs Wei Wang, Yin Hu, Lianmu Chen, Xinming Huang, Berk Sunar ECE Dept., Worcester Polytechnic Institute.
TIE Extensions for Cryptographic Acceleration Charles-Henri Gros Alan Keefer Ankur Singla.
ECE 734: Project Presentation Pankhuri May 8, 2013 Pankhuri May 8, point FFT Algorithm for OFDM Applications using 8-point DFT processor (radix-8)
1 SECURE-PARTIAL RECONFIGURATION OF FPGAs MSc.Fisnik KRAJA Computer Engineering Department, Faculty Of Information Technology, Polytechnic University of.
System Simulation Of 1000-cores Heterogeneous SoCs Shivani Raghav Embedded System Laboratory (ESL) Ecole Polytechnique Federale de Lausanne (EPFL)
VEGAS: Soft Vector Processor with Scratchpad Memory Christopher Han-Yu Chou Aaron Severance, Alex D. Brant, Zhiduo Liu, Saurabh Sant, Guy Lemieux University.
Multithreaded FPGA Acceleration of DNA Sequence Mapping Edward Fernandez, Walid Najjar, Stefano Lonardi, Jason Villarreal UC Riverside, Department of Computer.
Zheming CSCE715.  A wireless sensor network (WSN) ◦ Spatially distributed sensors to monitor physical or environmental conditions, and to cooperatively.
Computes the partial dot products for only the diagonal and upper triangle of the input matrix. The vector computed by this architecture is added to the.
Characterization Presentation Neural Network Implementation On FPGA Supervisor: Chen Koren Maria Nemets Maxim Zavodchik
MEMOCODE 2007 HW/SW Co-design Contest Documentation of the submission by Eric Simpson Pengyuan Yu Sumit Ahuja Sandeep Shukla Patrick Schaumont Electrical.
A Performance and Energy Comparison of FPGAs, GPUs, and Multicores for Sliding-Window Applications From J. Fowers, G. Brown, P. Cooke, and G. Stitt, University.
Radu Muresan CODES+ISSS'04, September 8-10, 2004, Stockholm, Sweden1 Current Flattening in Software and Hardware for Security Applications Authors: R.
Introduction to FPGA and DSPs Joe College, Chris Doyle, Ann Marie Rynning.
Field Programmable Gate Array (FPGA) Layout An FPGA consists of a large array of Configurable Logic Blocks (CLBs) - typically 1,000 to 8,000 CLBs per chip.
System Architecture A Reconfigurable and Programmable Gigabit Network Interface Card Jeff Shafer, Hyong-Youb Kim, Paul Willmann, Dr. Scott Rixner Rice.
Study of AES Encryption/Decription Optimizations Nathan Windels.
Thomas Pöppelmann Hardware Security Group Horst Görtz Institute for IT Security Implementing Lattice-Based Cryptography.
RUN-TIME RECONFIGURATION FOR AUTOMATIC HARDWARE/SOFTWARE PARTITIONING Tom Davidson, Karel Bruneel, Dirk Stroobandt Ghent University, Belgium Presenting:
Juanjo Noguera Xilinx Research Labs Dublin, Ireland Ahmed Al-Wattar Irwin O. Irwin O. Kennedy Alcatel-Lucent Dublin, Ireland.
ObliviStore High Performance Oblivious Cloud Storage Emil StefanovElaine Shi
Digital signature using MD5 algorithm Hardware Acceleration
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical.
Ross Brennan On the Introduction of Reconfigurable Hardware into Computer Architecture Education Ross Brennan
IEEE Globecom-2006, NXG-02: Broadband Access ©Copyright All Rights Reserved 1 FPGA based Acceleration of Linear Algebra Computations. B.Y. Vinay.
A Compact and Efficient FPGA Implementation of DES Algorithm Saqib, N.A et al. In:International Conference on Reconfigurable Computing and FPGAs, Sept.
Performance and Overhead in a Hybrid Reconfigurable Computer O. D. Fidanci 1, D. Poznanovic 2, K. Gaj 3, T. El-Ghazawi 1, N. Alexandridis 1 1 George Washington.
Types of Computers Mainframe/Server Two Dual-Core Intel ® Xeon ® Processors 5140 Multi user access Large amount of RAM ( 48GB) and Backing Storage Desktop.
COMPUTER SCIENCE &ENGINEERING Compiled code acceleration on FPGAs W. Najjar, B.Buyukkurt, Z.Guo, J. Villareal, J. Cortes, A. Mitra Computer Science & Engineering.
Efficient FPGA Implementation of QR
Company LOGO High Performance Processors Miguel J. González Blanco Miguel A. Padilla Puig Felix Rivera Rivas.
Timing Channel Protection for a Shared Memory Controller Yao Wang, Andrew Ferraiuolo, G. Edward Suh Feb 17 th 2014.
Research on Reconfigurable Computing Using Impulse C Carmen Li Shen Mentor: Dr. Russell Duren February 1, 2008.
Advanced Computer Architecture, CSE 520 Generating FPGA-Accelerated DFT Libraries Chi-Li Yu Nov. 13, 2007.
SHA-3 Candidate Evaluation 1. FPGA Benchmarking - Phase Round-2 SHA-3 Candidates implemented by 33 graduate students following the same design.
High Performance Computing Processors Felix Noble Mirayma V. Rodriguez Agnes Velez Electric and Computer Engineer Department August 25, 2004.
Enabling Dynamic Data and Indirect Mutual Trust for Cloud Computing Storage Systems.
PROCStar III Performance Charactarization Instructor : Ina Rivkin Performed by: Idan Steinberg Evgeni Riaboy Semestrial Project Winter 2010.
J. Greg Nash ICNC 2014 High-Throughput Programmable Systolic Array FFT Architecture and FPGA Implementations J. Greg.
Swankoski MAPLD 2005 / B103 1 Dynamic High-Performance Multi-Mode Architectures for AES Encryption Eric Swankoski Naval Research Lab Vijay Narayanan Penn.
An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines By: David Chui Supervisor: Professor P. Chow.
Network On Chip Platform
Implementing and Optimizing a Direct Digital Frequency Synthesizer on FPGA Jung Seob LEE Xiangning YANG.
FPGA Implementation of RC6 including key schedule Hunar Qadir Fouad Ramia.
Attribute-Based Encryption With Verifiable Outsourced Decryption.
Proposal for an Open Source Flash Failure Analysis Platform (FLAP) By Michael Tomer, Cory Shirts, SzeHsiang Harper, Jake Johns
Harnessing the Cloud for Securely Outsourcing Large- Scale Systems of Linear Equations.
Company LOGO Project Characterization Spring 2008/9 Performed by: Alexander PavlovDavid Domb Supervisor: Mony Orbach GPS/INS Computing System.
A New Class of High Performance FFTs Dr. J. Greg Nash Centar ( High Performance Embedded Computing (HPEC) Workshop.
DDRIII BASED GENERAL PURPOSE FIFO ON VIRTEX-6 FPGA ML605 BOARD PART B PRESENTATION STUDENTS: OLEG KORENEV EUGENE REZNIK SUPERVISOR: ROLF HILGENDORF 1 Semester:
2/19/2016http://csg.csail.mit.edu/6.375L11-01 FPGAs K. Elliott Fleming Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Mona: Secure Multi-Owner Data Sharing for Dynamic Groups in the Cloud.
Philipp Gysel ECE Department University of California, Davis
Corflow Online Tutorial Eric Chung
Hardware Architecture
Packing Techniques for Homomorphic Encryption Schemes Scott Thompson CSCI-762 4/28/2016.
HPEC 2003 Linear Algebra Processor using FPGA Jeremy Johnson, Prawat Nagvajara, Chika Nwankpa Drexel University.
M. Bellato INFN Padova and U. Marconi INFN Bologna
Programmable Hardware: Hardware or Software?
Hiba Tariq School of Engineering
Hybrid Cloud Architecture for Software-as-a-Service Provider to Achieve Higher Privacy and Decrease Securiity Concerns about Cloud Computing P. Reinhold.
Implementation of IDEA on a Reconfigurable Computer
Efficient CRT-Based RSA Cryptosystems
Shenghsun Cho, Mrunal Patel, Han Chen, Michael Ferdman, Peter Milder
Types of Computers Mainframe/Server
Dynamic High-Performance Multi-Mode Architectures for AES Encryption
Final Project presentation
Presentation transcript:

Accelerating Homomorphic Evaluation on Reconfigurable Hardware Thomas Pöppelmann, Michael Naehrig, Andrew Putnam, Adrian Macias

Outline Motivation Somewhat Homomorphic Encryption Implementation Results Conclusion Copyright © Infineon Technologies AG All rights reserved.

Outline Motivation Somewhat Homomorphic Encryption Implementation Results Conclusion Copyright © Infineon Technologies AG All rights reserved.

Motivation: Secure Computation › Homomorphic encryption allows computation on data without revealing the data itself to the entity performing the computation › Premium example: cloud computing –Clients encrypt data and store encrypted data in the cloud –The cloud server performs computation but an attacker (malicious admin, hacker, intelligence agency) cannot see the actual data –The client just gets the result › Practical usable for specialized, high risk, or privacy sensitive applications –Genomic data processing –Medical data processing –Machine learning Copyright © Infineon Technologies AG All rights reserved.

Motivation: Homomorphic evaluation Encrypt ? ? ? ? ? ? ADD MUL ? ? Decrypt 1500 ADD ? ? Decrypt 80 Untrusted environment, e.g., the cloud SW Evaluation This work: How can we use FPGAs to accelerate homomorphic evaluation? Copyright © Infineon Technologies AG All rights reserved.

Outline Motivation Somewhat Homomorphic Encryption Implementation Results Conclusion Copyright © Infineon Technologies AG All rights reserved.

YASHE [BLLN’13] ]Joppe W. Bos, Kristin Lauter, Jake Loftus, Michael Naehrig: Improved Security for a Ring-Based Fully Homomorphic Encryption Scheme. IMACC 2013, Copyright © Infineon Technologies AG All rights reserved.

YASHE’ Algorithms Copyright © Infineon Technologies AG All rights reserved.

YASHE RMult c’ polynomial multiplication Multiplication by scalar and rounding n log(q) Copyright © Infineon Technologies AG All rights reserved.

YASHE KeySwitch Several polynomial multiplications by constant evaluation key Copyright © Infineon Technologies AG All rights reserved.

YASHE (Selected) Parameters DescriptionParameterSet-ISet-II Polynomial coefficientsn Modulusq Plaintext spacet1024 Windows sizew Evaluation keysq/w18 Supported multiplications (levels)L19 Size of one polynomial-62 KB1 MB Size of evaluation key-124 KB8 MB  Parameters are supposed to provide at least 80 bits of security Copyright © Infineon Technologies AG All rights reserved.

Outline Motivation Somewhat Homomorphic Encryption Implementation Results Conclusion Copyright © Infineon Technologies AG All rights reserved.

Target Platform: Catapult › Microsoft project [PCC+14] to accelerate cloud services using Altera Stratix V FPGAs –Each server contains an FPGA –FPGAs are connected by complex dedicated communication network › 2x throughput over software- only Bing ranking in 1632 server/FPGA prototype › Of independent interest to CHES community [PCC+14] Andrew Putnam et al, A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services. ISCA Copyright © Infineon Technologies AG All rights reserved.

Implementation Challenges on FPGAs › Target device: Stratix V –172,600 ALMs (logic blocks) –4.9 MB internal memory –1590 embedded multipliers › Implementation Challenge –Usage of external memory –Large multipliers required (e.g., 512x512-bits) –Development more complex FPGA (4 MB) DRAM0 (4 GB) DRAM0 (4 GB) DDR PC PCIe DRAM1 (4 GB) DRAM1 (4 GB) DDR Evaluation operations Copyright © Infineon Technologies AG All rights reserved.

Number Theoretic Transform Copyright © Infineon Technologies AG All rights reserved.

Cached-NTT Copyright © Infineon Technologies AG All rights reserved.

Architecture › NttCore computes one or several groups of NTT in parallel › NTTButterfly implements pipelined 512x512-bit multiplier (supports also 1040x1040-bit in 4 cycles) › Dual buffering of data set in FPGA to support computation and memory access in parallel Copyright © Infineon Technologies AG All rights reserved. Microcode engine wit approx. 15 instructions

Outline Motivation Somewhat Homomorphic Encryption Implementation Results Conclusion Copyright © Infineon Technologies AG All rights reserved.

Results › SET II implementation requires 141,090 (82%) ALMs, 577 (36%) DSPs and 17,626,400 (43%) BRAM bits on Altera Stratix V › Congested design and constraints of DRAM and PCIe controller result in relatively low clock frequency › Cycle counts measured online to account for DRAM latency ImplementationMultAdd Set I MHz 675,326 cycles19,057 cycles 6.75 ms0.19 ms Set II MHz 3,212,506 cycles61,775 cycles ms0.94 ms Copyright © Infineon Technologies AG All rights reserved.

Comparison [LN14] Tancrède Lepoint, Michael Naehrig: A Comparison of the Homomorphic Encryption Schemes FV and YASHE. AFRICACRYPT 2014 [RJVDV’15] Roy et al. Modular Hardware Architecture for Somewhat Homomorphic Function Evaluation. CHES 2015 AuthorParameterArchitectureMultAdd Our workYASHE, n=16384FPGA (Stratix 5)49 ms1 ms [LN’14]YASHE, n= GHz Intel i749 ms0.7 ms [RJVDV’15]YASHE, n=32768FPGA (Virtex-7)112 ms0.17 ms Copyright © Infineon Technologies AG All rights reserved. › Schemes and parameters differ widely › Costs of FPGAs are hard to compare (FPGA+I/O+board) › Numerous ASIC, GPU implementations published while schemes got better and better

Outline Motivation Somewhat Homomorphic Encryption Implementation Results Conclusion Copyright © Infineon Technologies AG All rights reserved.

Conclusion › Cached-FFT/NTT solves problem of limited on-chip memory › Large multiplier (1040x1040) is a bottleneck => Chinese remainder theorem (see next talk) › Biggest effort/time was spend on external memory interface –Has to be taken into account for evaluation of costs and performance –Simulation and debugging more complex Copyright © Infineon Technologies AG All rights reserved.

Future Work › Further evaluation of different implementation techniques (maybe CRT+cached-NTT) › Exploration of different and larger parameter sets › Can evaluation operations (MUL/ADD) be performed directly in frequency (NTT/FFT) domain › Multi-FPGA design or specialized board FPGA RAM Copyright © Infineon Technologies AG All rights reserved.

Thank You Thank you for your attention! Any Questions? Contact: Full paper: Copyright © Infineon Technologies AG All rights reserved.