Compilation Techniques for Multimedia Processors. Andreas Krall and Sylvain Lelait, Technische Universität Wien.

Presentation transcript:

Compilation Techniques for Multimedia Processors
Andreas Krall and Sylvain Lelait
Technische Universität Wien

Motivation
- High processing power needed by multimedia applications
- Special instruction sets for multimedia data
  - Implemented by special processors
  - Multimedia instruction set extensions
    - Visual Instruction Set (VIS) of UltraSPARC
    - AltiVec extension of PowerPC
    - MMX extension of Pentium
    - MAX-2 instruction set of HP PA-RISC

Problems and Solutions
- New functionality has not been exploited properly
  - Have to code in assembly language
  - Use the provided system libraries
  - Call macros from a high-level language
- Compile a program coded in a high-level language into multimedia instructions
  - Classic vectorization
  - Vectorization by loop unrolling

Classic Vectorization
- Loop analysis
- Loop normalization
- Scalar expansion
- Dependence analysis
- Vectorization
- Alignment management
- Strip mining
- Constant expansion
- Lower iteration space
- Lower alignment
- Instruction selection and register allocation
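
Two of the phases listed above, constant expansion and strip mining, can be illustrated with a small portable C sketch. It is not taken from the presentation: the function names, the `VL` constant, and the choice of eight 8-bit lanes (a 64-bit register) are assumptions made for illustration, and the inner lane loop only stands in for what would be a single multimedia instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VL 8  /* assumed vector length: eight 8-bit lanes in a 64-bit register */

/* Reference scalar loop: the form the compiler starts from. */
void add_const_scalar(uint8_t *a, const uint8_t *b, uint8_t k, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++)
        a[i] = (uint8_t)(b[i] + k);   /* wraps modulo 256 */
}

/* Same computation after constant expansion and strip mining. */
void add_const_strip_mined(uint8_t *a, const uint8_t *b, uint8_t k, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));

    /* Constant expansion: replicate the scalar k into every lane. */
    uint8_t kv[VL];
    for (size_t lane = 0; lane < VL; lane++)
        kv[lane] = k;

    /* Strip loop: each iteration covers one full strip of VL elements.
     * The lane loop models one vector load, add, and store. */
    size_t i = 0;
    for (; i + VL <= n; i += VL)
        for (size_t lane = 0; lane < VL; lane++)
            a[i + lane] = (uint8_t)(b[i + lane] + kv[lane]);

    /* Remainder loop: leftover elements when n is not a multiple of VL. */
    for (; i < n; i++)
        a[i] = (uint8_t)(b[i] + k);
}
```

Both functions produce the same array for any `n`; the strip-mined version only changes how the iteration space is traversed. Alignment handling, which the slide lists as a separate phase, is left out of this sketch.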

Vectorization by Loop Unrolling
- Loop analysis
- Compute unrolling degree
- Loop unrolling
- Dependence analysis
- Dependence verification
- Generation of vector instructions
- Alignment management
- Lower iteration space
- Lower alignment
- Instruction selection and register allocation
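
The first steps of this approach can be sketched in portable C under some simplifying assumptions. Nothing below comes from the presentation itself: the 64-bit register width, the function names, and the dependence test (a single loop-carried dependence with constant distance `d`) are hypothetical choices for illustration. The four grouped statements in the unrolled body are the candidates that a compiler would replace with one vector load, one vector add, and one vector store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REG_BITS 64  /* assumed multimedia register width */

/* Unrolling degree: how many elements fit into one register. */
unsigned unroll_degree(unsigned elem_bits) {
    assert(elem_bits > 0 && REG_BITS % elem_bits == 0);
    return REG_BITS / elem_bits;
}

/* Simplified dependence verification for a[i] = a[i - d] + b[i]:
 * `degree` consecutive iterations may be packed when the value read
 * was produced before the packed group starts (d >= degree) or when
 * there is no forward-carried flow dependence (d <= 0). */
int can_pack(long distance, unsigned degree) {
    return distance <= 0 || distance >= (long)degree;
}

/* Reference loop with dependence distance d (d >= 1). */
void recur_scalar(uint16_t *a, const uint16_t *b, size_t d, size_t n) {
    for (size_t i = d; i < n; i++)
        a[i] = (uint16_t)(a[i - d] + b[i]);
}

/* Unrolled by 4 (16-bit elements in a 64-bit register). */
void recur_unrolled(uint16_t *a, const uint16_t *b, size_t d, size_t n) {
    unsigned u = unroll_degree(16);
    assert(u == 4);  /* the body below is written out for degree 4 */
    size_t i = d;
    if (can_pack((long)d, u)) {
        for (; i + 4 <= n; i += 4) {
            /* Four isomorphic loads: would become one vector load. */
            uint16_t t0 = a[i - d],     t1 = a[i + 1 - d];
            uint16_t t2 = a[i + 2 - d], t3 = a[i + 3 - d];
            /* Four isomorphic adds and stores: one vector add, one store. */
            a[i]     = (uint16_t)(t0 + b[i]);
            a[i + 1] = (uint16_t)(t1 + b[i + 1]);
            a[i + 2] = (uint16_t)(t2 + b[i + 2]);
            a[i + 3] = (uint16_t)(t3 + b[i + 3]);
        }
    }
    /* Remainder iterations, or the whole loop if packing is unsafe. */
    for (; i < n; i++)
        a[i] = (uint16_t)(a[i - d] + b[i]);
}
```

With `d = 4` the unrolled body is used and matches the scalar loop; with `d = 1` the check fails and the function falls back to the scalar loop, since each iteration would need a value computed by the previous one in the same group.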

Experimental Results