Home Exam 1: Video Encoding on Intel x86 using Streaming SIMD Extensions (SSE) and Advanced Vector Extensions (AVX) Home Exam 1: Video Encoding on Intel.

Slides:



Advertisements
Similar presentations
Changing picture brightness using SSE Changing picture brightness using SSE Another SSE example….
Advertisements

Assessment of Data Path Implementations for Download and Streaming Pål Halvorsen 1,2, Tom Anders Dalseng 1 and Carsten Griwodz 1,2 1 Department of Informatics,
INF5063 – GPU & CUDA Håkon Kvale Stensland iAD-lab, Department for Informatics.
Home Exam 2: Video Encoding on GPUs using nVIDIA CUDA with Managed Memory Home Exam 2: Video Encoding on GPUs using nVIDIA CUDA with Managed Memory September.
We’ll be spending minutes talking about Quiz 1 that you’ll be taking at the next class session before you take the Gateway Quiz today.
We’ll be spending a few minutes talking about Quiz 2 on Sections that you’ll be taking the next class session, before you work on Practice Quiz.
Philips Research ICS 252 class, February 3, The Trimedia CPU64 VLIW Media Processor Kees Vissers Philips Research Visiting Industrial Fellow
Streaming SIMD Extension (SSE)
Lecture 38: Chapter 7: Multiprocessors Today’s topic –Vector processors –GPUs –An example 1.
Introduction Introduction Håkon Kvale Stensland August 26 th, 2011 INF5063: Programming heterogeneous multi-core processors.
Introduction Introduction Håkon Kvale Stensland August 19 th, 2012 INF5063: Programming heterogeneous multi-core processors.
Introduction Introduction Håkon Kvale Stensland August 28 th, 2012 INF5063: Programming heterogeneous multi-core processors.
Introduction Introduction Håkon Kvale Stensland August 22 th, 2014 INF5063: Programming heterogeneous multi-core processors.
M-JPEG M-JPEG April 15, 2015 INF5063: Programming heterogeneous multi-core processors.
INF5063 – GPU & CUDA Håkon Kvale Stensland Department for Informatics / Simula Research Laboratory.
INF5063 – GPU & CUDA Håkon Kvale Stensland iAD-lab, Department for Informatics.
Håkon Kvale Stensland Simula Research Laboratory
IXP: Bump in the Wire IXP: Bump in the Wire INF5063: Programming Asymmetric Multi-Core Processors 15 April 2015.
Introduction Introduction 28/ INF5063: Programming heterogeneous multi-core processors.
INF5062 – Competition Håkon Kvale Stensland & Håvard Espeland Simula Research Laboratory.
Implementing Parallel Graph Algorithms Spring 2015 Implementing Parallel Graph Algorithms Lecture 1: Introduction Roman Manevich Ben-Gurion University.
INF5063 – GPU & CUDA Håkon Kvale Stensland iAD-lab, Department for Informatics.
Introduction to Computer Programming in C
1 Distributed Computing Algorithms CSCI Distributed Computing: everything not centralized many processors.
Programming with CUDA, WS09 Waqar Saleem, Jens Müller Programming with CUDA and Parallel Algorithms Waqar Saleem Jens Müller.
Programming with CUDA WS 08/09 Lecture 13 Thu, 04 Dec, 2008.
Speex encoder project Presented by: Gruper Leetal Kamelo Tsafrir Instructor: Guz Zvika Software performance enhancement using multithreading, SIMD and.
This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit
Abstraction IS 101Y/CMSC 101 Computational Thinking and Design Tuesday, September 17, 2013 Carolyn Seaman University of Maryland, Baltimore County.
WHAT IS THIS? OBJECTIVE AND OUTCOMES Candidates should be able to: Describe and explain the CPU as fetching, decoding and executing of instructions and.
CIS4930/CDA5125 Parallel and Distributed Systems Florida State University CIS4930/CDA5125: Parallel and Distributed Systems Instructor: Xin Yuan, 168 Love,
Assignment 5/9 – 2005 INF 5070 – Media Servers and Distribution Systems:
Performance Enhancement of Video Compression Algorithms using SIMD Valia, Shamik Jamkar, Saket.
Performance of mathematical software Agner Fog Technical University of Denmark
Hyper Threading (HT) and  OPs (Micro-Operations) Department of Computer Science Southern Illinois University Edwardsville Summer, 2015 Dr. Hiroshi Fujinoki.
C o n f i d e n t i a l 1 Course: BCA Semester: III Subject Code : BC 0042 Subject Name: Operating Systems Unit number : 1 Unit Title: Overview of Operating.
EEL5708/Bölöni Lec 1.1 August 21, 2006 Lotzi Bölöni Fall 2006 EEL 5708 High Performance Computer Architecture Lecture 1 Introduction.
1 CS1430: Programming in C++ Section 2 Instructor: Qi Yang 213 Ullrich
UNDER THE GUIDANCE DR. K. R. RAO SUBMITTED BY SHAHEER AHMED ID : Encoding H.264 by Thread Level Parallelism.
CS533 Concepts of Operating Systems Class 1 Course Overview.
Intermediate 2 Computing Unit 2 - Software Development.
We’ll be spending a few minutes talking about Quiz 2 on Sections that you’ll be taking the next class session, before you work on Practice Quiz.
Lab Activities 1, 2. Some of the Lab Server Specifications CPU: 2 Quad(4) Core Intel Xeon 5400 processors CPU Speed: 2.5 GHz Cache : Each 2 cores share.
Architectural Effects on DSP Algorithms and Optimizations Sajal Dogra Ritesh Rathore.
Processor Performance & Parallelism Yashwant Malaiya Colorado State University With some PH stuff.
Processor Level Parallelism 1
Introduction Introduction September 26, 2016 INF5063: Programming heterogeneous multi-core processors.
High Performance Computing on an IBM Cell Processor --- Bioinformatics
CS533 Concepts of Operating Systems Class 1
Real-Time Ray Tracing Stefan Popov.
We’ll be spending minutes talking about Quiz 1 that you’ll be taking at the next class session before you take the Gateway Quiz today.
We’ll be spending a few minutes talking about Quiz 2 on Sections that you’ll be taking the next class session, before you work on Practice Quiz.
Lecture 2: Intro to the simd lifestyle and GPU internals
Vector Processing => Multimedia
Teaching Computing to GCSE
Compiler Back End Panel
CS533 Concepts of Operating Systems Class 1
VLIW DSP vs. SuperScalar Implementation of a Baseline H.263 Encoder
Compiler Back End Panel
INF5063: Programming heterogeneous multi-core processors
1.1 The Characteristics of Contemporary Processors, Input, Output and Storage Devices Types of Processors.
Team Project Review the background information and project teams on our course site Work with the engineers (4-5) and your PM team member(s) You have.
Distributed Computing:
Lecture on High Performance Processor Architecture (CS05162)
CS 286 Computer Organization and Architecture
Identifying the Work to Be Done
Hands on Session
Performance and Code Tuning Overview
CS533 Concepts of Operating Systems Class 1
Scalable light field coding using weighted binary images
Presentation transcript:

Home Exam 1: Video Encoding on Intel x86 using Streaming SIMD Extensions (SSE) and Advanced Vector Extensions (AVX) Home Exam 1: Video Encoding on Intel x86 using Streaming SIMD Extensions (SSE) and Advanced Vector Extensions (AVX) August 22 nd 2014 INF5063: Programming heterogeneous multi-core processors … because the OS-course is just to easy! Håkon Kvale Stensland

INF5063, Pål Halvorsen, Carsten Griwodz, Håvard Espeland, Håkon Stensland University of Oslo Video Encoding Pål wants to encode some videos on his computer… Pål wants to spend all his budget on a new PowerPoint 2013 license, so new hardware or a H.264 licence is out of the question! It is therefore your task to make the encoder as efficient as possible on a single core (the rest of the cores are occupied running PowerPoint animations).

INF5063, Pål Halvorsen, Carsten Griwodz, Håvard Espeland, Håkon Stensland University of Oslo Codec63 Codec63 is basically MJPEG with intra-frame dependencies. It adds Motion Estimation and Motion Compensation from the MJPEG example. It has a more efficient DCT implementation (1D vs. 2D) Will be the basis for all three home exams in the course. The precode that is handed out today is a basic single threaded Codec63 encoder and decoder written in C.

INF5063, Pål Halvorsen, Carsten Griwodz, Håvard Espeland, Håkon Stensland University of Oslo Precode You are not allowed to change out the Motion Estimation, Motion Compensation or DCT algorithms. You are not allowed to paste code from other projects / encoders. You can use your own machines for this assignment, but you have to make sure that the code works on the machines at the Simula network You only need to optimize the Codec63 encoder! Your implementation is supposed to be single threaded, and optimized to use the instruction level parallelism available in a single x86 core.

INF5063, Pål Halvorsen, Carsten Griwodz, Håvard Espeland, Håkon Stensland University of Oslo Your task Utilize the instruction level parallelism (ILP), CPU vector unit and other tricks to get the most performance out of a single core. Start by profiling the encoder to see which parts of the encoder that are the bottlenecks. Remember, after optimizing one part of the code, more profiling might be needed to find new bottlenecks. Write a scientific report with details on which parts of the encoding process that benefited from your optimizations. The report should also explain how your code works. Remember to also report on optimization that did not work as you expected!

INF5063, Pål Halvorsen, Carsten Griwodz, Håvard Espeland, Håkon Stensland University of Oslo How are you evaluated? Make sure that your implementation compiles and run, and that it can produce correct video output (we also check the motion prediction). The use of the parallelization potential (SIMD) in a x86 CPU for: Motion Estimation, Motion Compensation and DCT/iDCT. Quality of the report. Is profiling of the code done between the different steps and how are the different optimization attempts documented and discussed in the report. Presentation of your solution on a “poster session” is required to pass the exam!

INF5063, Pål Halvorsen, Carsten Griwodz, Håvard Espeland, Håkon Stensland University of Oslo Formal Information  Deadline: Thursday September 18 th – 12:00  The assignment will be graded, and count 33% of the final grade.  Deliver your code, report and poster to:  Prepare a poster (two A3 pages) and a short talk (2 minutes without slides) to pitch your poster for the class (September 19 th ). Best poster & presentation will be awarded!

INF5063, Pål Halvorsen, Carsten Griwodz, Håvard Espeland, Håkon Stensland University of Oslo Last but not least!  Codec63 precode available for download in git. Clone the repository and work on you own local version. git clone  Bugs in the code can be reported in Bitbucket’s issue tracking system, or on ( inf5063 at ifi.uio.no )

Good Luck! PS! Start early!