MIT Lincoln Laboratory 999999-1 XYZ 3/11/2005 VSIPL and SAR Performance on Multiple Generations of Intel® Processors Peter Carlston, Platform Architect,

Slides:



Advertisements
Similar presentations
Intel Pentium 4 ENCM Jonathan Bienert Tyson Marchuk.
Advertisements

Original Authors: Stefan Rusu, Simon Tam, Harry Muljono, Jason Stinson, David Ayers, Jonathan Chang, Raj Varada, Matt Ratta, Sailesh Kottapalli Some slides.
Intel® Core™ Duo Processor Behrooz Jafarnejad Winter 2006.
OPTERON (Advanced Micro Devices). History of the Opteron AMD's server & workstation processor line 2003: Original Opteron released o 32 & 64 bit processing.
INTEL COREI3 INTEL COREI5 INTEL COREI7 Maryam Zeb Roll#52 GFCW Peshawar.
Microprocessors I Time: Sundays & Tuesdays 07:30 to 8:45 Place: EE 4 ( New building) Lecturer: Bijan Vosoughi Vahdat Room: VP office, NE of Uni Office.
Processor history / DX/SX SX/DX Pentium 1997 Pentium MMX
Introduction to ARM Architecture, Programmer’s Model and Assembler Embedded Systems Programming.
Linear Algebra on GPUs Vasily Volkov. GPU Architecture Features SIMD architecture – Don’t be confused by scalar ISA which is only a program model We use.
Cosc 2150 Current CPUs Intel and AMD processors. Notes The information is current as of Dec 5, 2014, unless otherwise noted. The information for this.
Copyright © 2006, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners Intel® Core™ Duo Processor.
Embedded Systems Programming
Introduction to Intel Core Duo Processor Architecture Al-Asli, Mohammed.
Kathy Grimes. Signals Electrical Mechanical Acoustic Most real-world signals are Analog – they vary continuously over time Many Limitations with Analog.
Hiep Hong CS 147 Spring Intel Core 2 Duo. CPU Chronology 2.
1 Comparing The Intel ® Core ™ 2 Duo Processor to a Single Core Pentium ® 4 Processor at Twice the Speed Performance Benchmarking and Competitive Analysis.
111 *Other names and brands may be claimed as the property of others Q Sell Up Guide Intel ® Core™ i7 (Bloomfield) vs. Lynnfield Positioning Intel.
Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU Presented by: Ahmad Lashgar ECE Department, University of Tehran.
Processors Menu  INTEL Core™ i Processor INTEL Core™ i Processor  INTEL Core i Processor INTEL Core i Processor  AMD A K.
/ 6.338: Parallel Computing Project FinalReport Parallelization of Matrix Multiply: A Look At How Differing Algorithmic Approaches and CPU Hardware.
COMPUTER ARCHITECTURE
ORIGINAL AUTHOR JAMES REINDERS, INTEL PRESENTED BY ADITYA AMBARDEKAR Overview for Intel Xeon Processors and Intel Xeon Phi coprocessors.
Adam Meyer, Michael Beck, Christopher Koch, and Patrick Gerber.
Simultaneous Multithreading: Maximizing On-Chip Parallelism Presented By: Daron Shrode Shey Liggett.
Intel® IPP. Fighting for the performance Intel® IPP. Fighting for the performance Novosibirsk, 2008 Boris Sabanin Novosibirsk, 2008 Boris Sabanin.
Christopher Mitchell CDA 6938, Spring The Discrete Cosine Transform  In the same family as the Fourier Transform  Converts data to frequency domain.
Current Computer Architecture Trends CE 140 A1/A2 29 August 2003.
Implementation of Parallel Processing Techniques on Graphical Processing Units Brad Baker, Wayne Haney, Dr. Charles Choi.
Company LOGO High Performance Processors Miguel J. González Blanco Miguel A. Padilla Puig Felix Rivera Rivas.
Pre-Pentium Intel Processors /
Processor Development The following slides track three developments in microprocessors since Clock Speed – the speed at which the processor can carry.
History of Microprocessor MPIntroductionData BusAddress Bus
Select the 2nd Gen Intel® Core™ Processor that is Best for YourBusiness Intel® Core™ i3 Processor— Affordable Business PC. CPU Frequency 3.3 GHz with.
© 2007 SET Associates Corporation SAR Processing Performance on Cell Processor and Xeon Mark Backues, SET Corporation Uttam Majumder, AFRL/RYAS.
 Copyright, HiPERiSM Consulting, LLC, George Delic, Ph.D. HiPERiSM Consulting, LLC (919) P.O. Box 569, Chapel Hill, NC.
Hyper Threading (HT) and  OPs (Micro-Operations) Department of Computer Science Southern Illinois University Edwardsville Summer, 2015 Dr. Hiroshi Fujinoki.
David Abbott - JLAB DAQ group Embedded-Linux Readout Controllers (Hardware Evaluation)
CPU Inside Maria Gabriela Yobal de Anda L#32 9B. CPU Called also the processor Performs the transformation of input into output Executes the instructions.
1 Latest Generations of Multi Core Processors
Hyper Threading Technology. Introduction Hyper-threading is a technology developed by Intel Corporation for it’s Xeon processors with a 533 MHz system.
Intel Confidential - NDA Only *Other names and brands may be claimed as the property of others AMD* Athlon* 64 X (2x1 MB L2 Cache, 2.40 GHz) Intel®
1)Leverage raw computational power of GPU  Magnitude performance gains possible.
Yang Yu, Tianyang Lei, Haibo Chen, Binyu Zang Fudan University, China Shanghai Jiao Tong University, China Institute of Parallel and Distributed Systems.
Today’s Computers By Sharif Safdary. The right computer for you. Advantages to Laptops Advantages to Laptops Size Size Weight Weight Portability Portability.
THE BRIEF HISTORY OF 8085 MICROPROCESSOR & THEIR APPLICATIONS
DSP Architectures Additional Slides Professor S. Srinivasan Electrical Engineering Department I.I.T.-Madras, Chennai –
Linear Algebra Libraries: BLAS, LAPACK, ScaLAPACK, PLASMA, MAGMA
Copyright © 2004, Dillon Engineering Inc. All Rights Reserved. An Efficient Architecture for Ultra Long FFTs in FPGAs and ASICs  Architecture optimized.
TI Information – Selective Disclosure Implementation of Linear Algebra Libraries for Embedded Architectures Using BLIS September 28, 2015 Devangi Parikh.
Chap 4: Processors Mainly manufactured by Intel and AMD Important features of Processors: Processor Speed (900MHz, 3.2 GHz) Multiprocessing Capabilities.
PROCESSOR Ambika | shravani | namrata | saurabh | soumen.
I7’s Core. Intel’s Core i7 Content Overview Socket SSE 4.2 Instruction Set Cores –Intel Quickpath Interconnect –Nehalem - new micro-architecture –EP,
Lab Activities 1, 2. Some of the Lab Server Specifications CPU: 2 Quad(4) Core Intel Xeon 5400 processors CPU Speed: 2.5 GHz Cache : Each 2 cores share.
C.E. Goutis V.I.Kelefouras University of Patras Department of Electrical and Computer Engineering VLSI lab Date: 20/11/2015 Compilers for Embedded Systems.
Hardware Architecture
HPEC-1 SMHS 7/7/2016 MIT Lincoln Laboratory Focus 3: Cell Sharon Sacco / MIT Lincoln Laboratory HPEC Workshop 19 September 2007 This work is sponsored.
Embedded Software Design Week III Processor Basics Raspberry Pi -> Blinking LEDs & pushing buttons.
Intel and AMD processors
Multiple Processor Systems
ARM Embedded Systems
MPI Software Technology, Inc. VSIPL for Diverse Architectures
Presented by: Tim Olson, Architect
Matrix Multiplication Continued
A Quantitative Analysis of Stream Algorithms on Raw Fabrics
Core i7 micro-processor
Intel’s Core i7 Processor
Linchuan Chen, Peng Jiang and Gagan Agrawal
64 BIT COMPUTING By: Kapil Kaushik VIII Sesmester(IT)
Introduction to CUDA Programming
CS 286 Computer Organization and Architecture
Presentation transcript:

MIT Lincoln Laboratory XYZ 3/11/2005 VSIPL and SAR Performance on Multiple Generations of Intel® Processors Peter Carlston, Platform Architect, Embedded Computing Division Intel Corporation

MIT Lincoln Laboratory XYZ 3/11/2005 VSIPL and SAR Benchmark Systems Benchmark Systems and Methodology –Freescale MCP8641D 1 GHz GE Fanuc DSP230 single board computer; AXISLib VSIPL 2.3; VxWorks 6.6; 2 cores; 1MB cache/core; ~25W max processor power –Intel Core 2 Duo SL GHz HP 2530P Laptop; N.A. Software VSIPL Beta for Intel Architecture; Linux; 2 cores; 6 MB shared cache; 31.5 W max power incl. chipset –Intel Core i5-430M 2.0 GHz (fixed) Acer Aspire AS Laptop; N.A. Software VSIPL AVX Beta; Linux; 2 cores; 3MB shared cache; ~39W max power incl. chipset –Intel Core i7 2710QE 2.0 GHz (fixed) Intel ‘Emerald Lake’ Customer Ref. Board; N.A. Software VSIPL AVX Beta; 4 cores; 6MB shared cache; Linux; ~51W max power incl. chipset –Intel Core i7-2715QE 2.1 GHz; Intel Turbo Boost enabled GE Intelligent Platforms DSP280; AXISLib-AVX Beta; 1.0; Scientific Linux 6.0 x64, Kernel ; ~51W max power incl. chipset VSIPL Tests conducted by GE-IP and NA Software –Single Thread tests only; data in warm caches where possible SAR/SARMTI Tests conducted by NA Software, 2010

MIT Lincoln Laboratory XYZ 3/11/2005 VSIPL 1D FFT Relative Performance –2008 Intel processor vs Freescale: Clock speed benefits Intel 256K points –2010 Intel processor: Minor or no improvement vs 2008 Intel processor –2011 Intel processor: Major micro-architecture enhancement: 2x load/store bandwidth; ring bus; 256-bit SIMD registers (Intel® AVX) vsip_ccfftip_f

MIT Lincoln Laboratory XYZ 3/11/2005 VSIPL and SAR Performance on Multiple Generations of Intel® Processors VSIPL* Function Intel® Core™ Processor 2011 vs Freescale* MPC- 8641D Intel® Core™ Processors 2010 vs 2008 Intel® Core™ Processors 2011 vs 2008 Intel® Core™ Processors 2011 vs 2010 Intel® Core™ i AVX vs SSE 1D FFT5 – 14 X < 1.1X X2.3 – 3.5 X X Multiple 1D FFT5.4 – 10.4 X2.5 – 3.3 X2.6 – 3.7 X1.3 – 1.6X 2D FFT, non-square matrices No Data 2 – 2.3 X0.7 – 1.3X 2D FFT, smaller square matrices 2.3 – 3.1 X1.1 – 1.6 X Complex Matrix Transpose 7 – 26 X < 1.1X 0.6 – 2.14 X2 – 4.1 XNone Vector Multiply4.6 – 72 X2.3 – 8.3 X2.1 – 4.5 X1 – 2.9 X Vector Sine1.1 – 1.4 X3.6 – 3.7 X3.1 – 4.7 X0.4 – 1.8 X Vector Cosine1.1 – 1.4 X2.2 – 2.6 X2.8 – 2.9 X1.4 X Vector Square Root2.6 – 4.5 X2.3 – 3.2 X2.3 – 2.9 X X Vector Scatter No Data 4.2 – 8.2x1.5 – 4.1 X Vector Gather2.5 – 4.8x1.5 – 4.1 X

MIT Lincoln Laboratory XYZ 3/11/2005 SAR/SARMTI Performance on 2010 and 2011 Intel® Processors –Core for core, the 2011 Intel® processor performs >2X better on these algorithms –Running with 4 threads yields a 3-4X performance benefit (additional 12W max thermal design power) N.A. Software Algorithm System 2 Threads (cores) 4 Threads (cores) SAR Intel® Core™ i7-2710QE with Intel® AVX 1.0 (2011) s0.027 s Intel® Core™ i5-430M with Intel SSE 4.2 (2010) s0.121 s* Intel® Core™ i7-2710QE Speed Up2.3X4.4X SARMTI Intel® Core™ i7-2710QE with Intel® AVX 1.0 (2011) 6.03 s3.841 s Intel® Core™ i5-430M with Intel SSE 4.2 (2010) s s* Intel® Core™ i7-2710QE Speed Up2.5X3.5X * 4 thread timings utilize Intel Hyper Threading since only a 2-core version is available.