Harnessing Multicore Processors for High Speed Secure Transfer Raj Kettimuthu Argonne National Laboratory.

Slides:



Advertisements
Similar presentations
1 Memory Performance and Scalability of Intel’s and AMD’s Dual-Core Processors: A Case Study Lu Peng 1, Jih-Kwon Peir 2, Tribuvan K. Prakash 1, Yen-Kuang.
Advertisements

Structure of Computer Systems
Conventional Encryption: Algorithms
Computer Abstractions and Technology
MULTICORE PROCESSOR TECHNOLOGY.  Introduction  history  Why multi-core ?  What do you mean by multicore?  Multi core architecture  Comparison of.
OPTERON (Advanced Micro Devices). History of the Opteron AMD's server & workstation processor line 2003: Original Opteron released o 32 & 64 bit processing.
INTEL COREI3 INTEL COREI5 INTEL COREI7 Maryam Zeb Roll#52 GFCW Peshawar.
Chapter 8 Hardware Conventional Computer Hardware Architecture.
CPU Processor Speed Timeline Speed =.02 Mhz Year= 1972 Transistors= 3500 It takes 66, CPU’s to equal 1 i7.
Processor history / DX/SX SX/DX Pentium 1997 Pentium MMX
CSCE101 – 4.2, 4.3 October 17, Power Supply Surge Protector –protects from power spikes which ruin hardware. Voltage Regulator – protects from insufficient.
Multiprocessors ELEC 6200 Computer Architecture and Design Instructor: Dr. Agrawal Yu-Chun Chen 10/27/06.
ECE 526 – Network Processing Systems Design
Lecture 37: Chapter 7: Multiprocessors Today’s topic –Introduction to multiprocessors –Parallelism in software –Memory organization –Cache coherence 1.
HS06 on the last generation of CPU for HEP server farm Michele Michelotto 1.

 The processor number is one of several factors, along with processor brand, specific system configurations and system-level benchmarks, to be.
ORIGINAL AUTHOR JAMES REINDERS, INTEL PRESENTED BY ADITYA AMBARDEKAR Overview for Intel Xeon Processors and Intel Xeon Phi coprocessors.
Sven Ubik, Petr Žejdl CESNET TNC2008, Brugges, 19 May 2008 Passive monitoring of 10 Gb/s lines with PC hardware.
GridFTP Guy Warner, NeSC Training.
Performance Tradeoffs for Static Allocation of Zero-Copy Buffers Pål Halvorsen, Espen Jorde, Karl-André Skevik, Vera Goebel, and Thomas Plagemann Institute.
I/O bound Jobs Multiple processes accessing the same disc leads to competition for the position of the read head. A multi -threaded process can stream.
CuMAPz: A Tool to Analyze Memory Access Patterns in CUDA
Computer Performance Computer Engineering Department.
Kenichi Kourai (Kyushu Institute of Technology) Takuya Nagata (Kyushu Institute of Technology) A Secure Framework for Monitoring Operating Systems Using.
1b.1 Types of Parallel Computers Two principal approaches: Shared memory multiprocessor Distributed memory multicomputer ITCS 4/5145 Parallel Programming,
Company LOGO High Performance Processors Miguel J. González Blanco Miguel A. Padilla Puig Felix Rivera Rivas.
Multi-core architectures. Single-core computer Single-core CPU chip.
MIDeA :A Multi-Parallel Instrusion Detection Architecture Author: Giorgos Vasiliadis, Michalis Polychronakis,Sotiris Ioannidis Publisher: CCS’11, October.
Multi-Core Architectures
Multi-core Programming Introduction Topics. Topics General Ideas Moore’s Law Amdahl's Law Processes and Threads Concurrency vs. Parallelism.
Parallel Processing - introduction  Traditionally, the computer has been viewed as a sequential machine. This view of the computer has never been entirely.
High Performance Computing Processors Felix Noble Mirayma V. Rodriguez Agnes Velez Electric and Computer Engineer Department August 25, 2004.
Multi-core.  What is parallel programming ?  Classification of parallel architectures  Dimension of instruction  Dimension of data  Memory models.
Chris Rapier, Benjamin Bennett Pittsburgh Supercomputing Center Mardi Gras 2008 Enabling High Performance Bulk Data Transfers With SSH Chris Rapier Benjamin.
History of Microprocessor MPIntroductionData BusAddress Bus
Buffer-On-Board Memory System 1 Name: Aurangozeb ISCA 2012.
A Measurement Based Memory Performance Evaluation of High Throughput Servers Garba Isa Yau Department of Computer Engineering King Fahd University of Petroleum.
PFPC: A Parallel Compressor for Floating-Point Data Martin Burtscher 1 and Paruj Ratanaworabhan 2 1 The University of Texas at Austin 2 Cornell University.
Srihari Makineni & Ravi Iyer Communications Technology Lab
Parallel TCP Bill Allcock Argonne National Laboratory.
Parallelization and Characterization of Pattern Matching using GPUs Author: Giorgos Vasiliadis 、 Michalis Polychronakis 、 Sotiris Ioannidis Publisher:
DYNES Storage Infrastructure Artur Barczyk California Institute of Technology LHCOPN Meeting Geneva, October 07, 2010.
TinySec : Link Layer Security Architecture for Wireless Sensor Networks Chris Karlof :: Naveen Sastry :: David Wagner Presented by Anil Karamchandani 10/01/2007.
Evolution of Microprocessors Microprocessor A microprocessor incorporates most of all the functions of a computer’s central processing unit on a single.
Computer Architecture By Chris Van Horn. CPU Basics “Brains of the Computer” Fetch Execute Cycle Instruction Branching.
Hyper Threading Technology. Introduction Hyper-threading is a technology developed by Intel Corporation for it’s Xeon processors with a 533 MHz system.
Multi-Core Development Kyle Anderson. Overview History Pollack’s Law Moore’s Law CPU GPU OpenCL CUDA Parallelism.
Shashwat Shriparv InfinitySoft.
GridFTP Richard Hopkins
Copyright © Curt Hill Parallelism in Processors Several Approaches.
MULTICORE PROCESSOR TECHNOLOGY.  Introduction  history  Why multi-core ?  What do you mean by multicore?  Multi core architecture  Comparison of.
Chris Rapier, Benjamin Bennett Pittsburgh Supercomputing Center HPN-SSH TIP’08 Enabling High Performance Bulk Data Transfers With SSH Chris Rapier Benjamin.
PROCESSOR Ambika | shravani | namrata | saurabh | soumen.
GridFTP Guy Warner, NeSC Training Team.
Lab Activities 1, 2. Some of the Lab Server Specifications CPU: 2 Quad(4) Core Intel Xeon 5400 processors CPU Speed: 2.5 GHz Cache : Each 2 cores share.
Programming Multi-Core Processors based Embedded Systems A Hands-On Experience on Cavium Octeon based Platforms Lab Exercises: Lab 1 (Performance measurement)
Central Processing Unit (CPU) The Computer’s Brain.
Analyzing Memory Access Intensity in Parallel Programs on Multicore Lixia Liu, Zhiyuan Li, Ahmed Sameh Department of Computer Science, Purdue University,
Recent experience with PCI-X 2.0 and PCI-E network interfaces and emerging server systems Yang Xia Caltech US LHC Network Working Group October 23, 2006.
MAHARANA PRATAP COLLEGE OF TECHNOLOGY SEMINAR ON- COMPUTER PROCESSOR SUBJECT CODE: CS-307 Branch-CSE Sem- 3 rd SUBMITTED TO SUBMITTED BY.
CPU Central Processing Unit
Hot Processors Of Today
Core i7 micro-processor
Intel’s Core i7 Processor
File Transfer Issues with TCP Acceleration with FileCatalyst
Chapter 1 Introduction.
Hybrid Programming with OpenMP and MPI
Network Systems and Throughput Preservation
Multicore and GPU Programming
Presentation transcript:

Harnessing Multicore Processors for High Speed Secure Transfer Raj Kettimuthu Argonne National Laboratory

Security Clear  No security at all Authentication  Verify the connection only, data is clear Integrity  Verify each packet, data is clear  High processing requirements Private  Encrypt all data  Very high processing requirements

Security Security context  Includes a symmetric key  Associated with each TCP connection  Used for faster security processing Cipher Block Chaining  TLS/SSL, GSI, etc  Previously encrypted cipher text is used in the encryption of the current block  Cannot parallelize

High Performance Transfers True security makes the CPU the bottleneck  Thus auth or clear are selected  there is no data integrity Post transfer checksums  Used to verify data once transfer completes  Should be considered part of transfer time  Less efficient (most re-open and read data)  cannot parallelize

Parallel Streams Multiple TCP streams between endpoints  Network optimization  Used to work around TCP backoff A security context with each stream  Can use to parallelize security processing  Each stream has own context  cipher block chain associated with that context

Multiple Cores Up coming technologies  Quad core commodity  Future will bring..... Parallel processing power  No increase in memory  No increase in bus bandwidth  No increase in NIC bandwidth Increases the ratio of processing power to network speed

Transfer Resources and Streams Memory needed is a function of BWDP  BWDP total = RTT * BW Parallel streams do not increase the BDWP  Each stream shares an equal portion  BWDP stream = RTT * (BW/|streams|) Required resources are not a function of |streams|  memory is function of BWDP  bus/disk/NIC speed is a function of transfer rate

Full Encryption Perfect application of multiple cores  1 stream per processor  Allows parallelism of security processing  Requires a core, but no more memory  Results show linear or close to linear increase on different dual-core architectures  Some interesting results on dual-core architectures with hyper threading

Results Pentium Dual Core 1.1 GHz Security Level Single P1 Single P2 Pool P1 Pool P2 Clear Authenticated Safe Private

Results Itanium Dual Core 1.2 GHz Security Level Single P1 Single P2 Pool P1 Pool P2 Clear Authenticated899 Safe Private

Results Opteron 64 bit Dual Core 2.4 GHz Security Level Single P1 Single P2 Pool P1 Pool P2 Clear897 Authenticated897 Safe Private

Results Xeon dual core 2.8 GHz with hyper threading

Results Xeon dual core 2.8 GHz with hyper threading

Results Opteron dual core 1GHz with hyper threading

Results Opteron dual core 1GHz with hyper threading

Results Experiments to see the effect of high thread count and high number of parallel streams |Stream| > min(|CPU|, |Thread|) does not fetch much benefit |Thread| > |CPU| does not fetch any benefit - it seems to hurt the performance

Results

Stripes Striping to extrapolate for higher number of cores 4 way striping with dual cores on each stripe Results show that 6 cores may be sufficient to get a throughput for fully encrypted transfers that is equivalent to that of a clear transfer with Gigabit NICs  Intel Xeon dual core 2.4 GHz 2 way stripe with 1 thread per stripe is better than 1 way stripe with 2 threads  Suspected cause is availability of 2 Gigabit NICs for 2 way stripe  Need to do more tests to verify this

Stripes

S1,C2 S2,C2