Unconventional applications of Intel® Xeon Phi™ Processor (KNL)

Presentation transcript:

Unconventional applications of Intel® Xeon Phi™ Processor (KNL) Antonio Cisternino (@cisterni)

Intel® Xeon Phi™ Processor Knights Landing (KNL). Unlike its predecessor (Knights Corner), KNL offers full x86 support and is bootable. Each core has two 512-bit-wide FPUs for vectorization. The core is derived from Silvermont, though significantly changed. Several modes support different workloads, though using MCDRAM as a cache is usually preferred.

How does the non-FPU part work? We tested software stacks as far as possible from the typical HPC stack. The stack we used: Linux; Mono (open source .NET implementation); F# (using fsharpi, the language REPL); Visual Studio Code + Ionide; Firefox. This platform stresses both the CPU (JIT compilation and code execution) and memory (garbage collection).

Visual Studio Code running on KNL, accessed using X Windows (data science using FsLab)

(Easier) parallel programming. Parallel task library: 52x. [Figure: sequential vs. parallel]
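The code behind this slide is not part of the transcript. As a rough illustration of why a task library makes parallel programming "easier", the sketch below shows the same computation written once with a plain `map` and once with an executor's `map`. It is a Python stand-in, not the F#/.NET code used in the talk; `is_prime`, `count_primes_sequential` and `count_primes_parallel` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n: int) -> bool:
    """Hypothetical CPU-bound work item: primality by trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def count_primes_sequential(numbers):
    # Plain map: items are processed one after another on one thread.
    return sum(map(is_prime, numbers))


def count_primes_parallel(numbers, workers=4):
    # Same computation; the executor hands items to a pool of workers.
    # Threads keep this sketch portable. In CPython, a
    # ProcessPoolExecutor is needed for real CPU-bound speedup.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(is_prime, numbers))
```

The only change between the two versions is where `map` comes from, which is the point of such libraries: the work decomposition stays the same and the scheduling is delegated to the pool.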

A platform for microservices? We used the Suave F# web server to create a process exposing a simple function through HTTP, and the http_load tool to stress test a number of URLs with different degrees of parallelism. We tested a large number (up to 128) of web servers running on a Dell C6320p with an Intel® Xeon Phi™ Processor 7210 and on a Dell R730 with two Xeon E5-2680 v4 CPUs. Measured: latency of connection and of data reception; number of fetches over 10 seconds made by a number of concurrent threads to all the servers. We also tried 256 web servers: the Xeon failed to start all the instances properly, whilst KNL managed to start all of them.
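The measurement described above (many servers, a fixed number of concurrent client threads, a fixed time window, fetch count and latency as outputs) can be sketched as follows. This is a Python stand-in using only the standard library, not the Suave and http_load setup from the talk. All names (`HelloHandler`, `start_servers`, `run_load`, `stop_servers`) are hypothetical, and unlike http_load it records one overall latency per fetch rather than separate connection and data-reception times.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in for one microservice: answers every GET with a fixed body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet during the run


def start_servers(count):
    """Start `count` servers on OS-chosen free ports; return (servers, urls)."""
    servers, urls = [], []
    for _ in range(count):
        srv = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
        threading.Thread(target=srv.serve_forever, daemon=True).start()
        servers.append(srv)
        urls.append(f"http://127.0.0.1:{srv.server_address[1]}/")
    return servers, urls


def run_load(urls, threads, seconds):
    """Fetch the URLs round-robin from `threads` workers for `seconds`.

    Returns (total_fetches, mean_latency_in_seconds).
    """
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    latencies = []
    lock = threading.Lock()
    deadline = time.monotonic() + seconds

    def worker(offset):
        i, local = offset, []
        while time.monotonic() < deadline:
            start = time.monotonic()
            with opener.open(urls[i % len(urls)], timeout=5) as resp:
                resp.read()
            local.append(time.monotonic() - start)
            i += 1
        with lock:
            latencies.extend(local)

    pool = [threading.Thread(target=worker, args=(k,)) for k in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    mean = sum(latencies) / len(latencies) if latencies else 0.0
    return len(latencies), mean


def stop_servers(servers):
    for srv in servers:
        srv.shutdown()
        srv.server_close()
```

A run shaped like the 8-threads-against-64-servers case would be `servers, urls = start_servers(64)` followed by `run_load(urls, threads=8, seconds=10)` and `stop_servers(servers)`. Here the servers share one process with the client, so the numbers say nothing about KNL or Xeon; the sketch only shows the shape of the experiment.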

The (trivial) web server
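The slide's source code (F# with Suave) is not captured in this transcript. As an illustration of what "a process exposing a simple function through HTTP" amounts to, here is a minimal standard-library Python sketch. The function `square`, the `/square?n=...` route, and the helper names are all assumptions made for the example, not the talk's code.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlsplit


def square(n: int) -> int:
    """The 'simple function' being exposed (hypothetical choice)."""
    return n * n


class FunctionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect requests such as /square?n=7
        parts = urlsplit(self.path)
        try:
            if parts.path != "/square":
                raise KeyError(parts.path)
            n = int(parse_qs(parts.query)["n"][0])
        except (KeyError, ValueError):
            self.send_error(400, "expected /square?n=INTEGER")
            return
        body = json.dumps({"n": n, "square": square(n)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def make_server(port=0):
    """Bind to 127.0.0.1; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), FunctionHandler)


def demo(n):
    """Start a server, call /square?n=<n> once, shut down, return the JSON."""
    server = make_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5
        )
        conn.request("GET", f"/square?n={n}")
        result = json.loads(conn.getresponse().read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

To run it as a long-lived service instead of a one-shot demo, call `make_server(8080).serve_forever()` and request `http://127.0.0.1:8080/square?n=7`.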

64 web servers accessed by 64 threads. [Charts: Xeon Phi; Xeon E5-2680 v4 (2x)]

64 web servers accessed by 8 threads. [Charts: Xeon Phi; Xeon E5-2680 v4 (2x)]

Comparison of KNL vs. Xeon. [Chart panels, threads/web servers: 64/64, 8/64, 128/128, 8/128]

Conclusions. A single Intel® Xeon Phi™ Processor core is less powerful than a Xeon core (by a factor of around 5-6x). I/O-bound workloads may nevertheless benefit from the massively parallel architecture (including MCDRAM benefits). Mixed workloads are possible, allowing vector-aware code to be coordinated by more traditional languages. Productivity benefits from mature software stacks. The Intel® Xeon Phi™ Processor looks viable as a platform for microservices, especially given its reduced power consumption.