Compilers as Collaborators and Competitors of High-Level Specification Systems
David Padua, University of Illinois at Urbana-Champaign

Towards a Synthesis

There is much interaction and overlap between compilers and code generation from very high-level specifications. Both technologies could merge into a “supercompiler” technology.
- Thesis, antithesis → synthesis

Higher Levels of Abstraction …

One of the main goals of software research is to facilitate program development: raise the level of abstraction, specifying what rather than how.
- Subroutines: control abstraction
- Data abstraction mechanisms

… Higher Levels of Abstraction

Programming is simplified by using macro operations from a catalog. Modules (subroutines/classes/…) may be:
- Part of the language (Fortran 90, MATLAB, SETL)
- Standard libraries, either hand-written or automatically generated
- Application specific (usually hand-written)

Performance and Abstraction

In many cases the main mechanism for attaining high performance is to develop high-performance library routines. For example, the recommended MATLAB programming style is to use functions as much as possible. This approach does not always work: real applications make little use of pre-existing libraries.
- One reason: data structures are not always in the right format.
- Another: the overhead associated with class accesses.
For this reason, with current technology, higher level ⇒ lower performance.

Automatic Generation of Modules from Specifications …

Several systems aim at generating the fastest possible routines for certain classes of computations:
- The algorithms are relatively simple.
- A very high performance implementation can be tedious and time-consuming to produce.
Examples of these systems include:
- ATLAS
- FFTW
- Spiral

… Automatic Generation of Modules from Specifications

Other systems try to simplify the generation of complete applications. Although performance is also a concern, language design and correctness are the most important issues.
- Ellpack
- GPSS
- Many CAD systems

ATLAS

Generate several versions of BLAS routines:
- Different tile sizes
- Different degrees of unrolling
- Loop ordering is fixed
Run all and choose the fastest.
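The ATLAS strategy described above can be sketched as a generate-and-measure loop. The sketch below is illustrative, not ATLAS itself: it times a pure-Python blocked matrix multiply for each candidate tile size and keeps the fastest (ATLAS generates and benchmarks compiled C kernels, and also searches over unroll factors).

```python
import time

def matmul_tiled(A, B, n, tile):
    """Blocked multiply of two n x n matrices (lists of lists) with a given tile size."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        s = C[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C

def autotune(n=64, tiles=(4, 8, 16, 32)):
    """ATLAS-style empirical search: time every candidate version, keep the fastest."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    best_tile, best_time = None, float("inf")
    for tile in tiles:
        t0 = time.perf_counter()
        matmul_tiled(A, B, n, tile)           # run this version ...
        elapsed = time.perf_counter() - t0    # ... and measure it
        if elapsed < best_time:
            best_tile, best_time = tile, elapsed
    return best_tile
```

The winning tile size depends on the machine the search runs on, which is exactly the point: the decision is made by measurement, not by a compiler's static model.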

FFTW

Recursive divide-and-conquer:
- Plan: a factorization tree; factorization stops at certain sizes.
- Execution: call codelets.
Codelets:
- Subroutines for small-size FFTs
- Optimized and fully unrolled
- Generated by a dedicated compiler
Adapts to the environment at run time (dynamic programming).
[Figure: a factorization tree with nodes F_1024, F_128, F_16, and F_8.]
Cooley-Tukey step: F_rs = (I_r ⊗ F_s) L (F_r ⊗ I_s) T
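The planning step can be sketched as a memoized (dynamic-programming) search over factorization trees. Everything below is an assumption-laden toy: the codelet sizes and the additive cost model are invented for illustration; FFTW's real planner measures actual execution times of candidate plans.

```python
from functools import lru_cache

CODELET_SIZES = {2, 4, 8, 16}   # sizes assumed to have fully unrolled codelets

@lru_cache(maxsize=None)
def plan(n):
    """Return (cost, factorization tree) for an n-point FFT, n a power of two.

    A leaf is a codelet; an internal node splits n = r * s and recurses.
    The cost model (twiddle pass + recursive halves) is purely illustrative.
    """
    if n in CODELET_SIZES:
        return (n, f"codelet({n})")
    best = None
    r = 2
    while r * r <= n:
        if n % r == 0:
            s = n // r
            cost_r, tree_r = plan(r)
            cost_s, tree_s = plan(s)
            cost = n + s * cost_r + r * cost_s   # one pass over n points, plus subplans
            if best is None or cost < best[0]:
                best = (cost, f"({tree_r} x {tree_s})")
        r *= 2
    return best
```

Because `plan` is memoized, each subproblem size is solved once, which is the dynamic-programming adaptation the slide refers to.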

SPIRAL

[Block diagram: a DSP transform specification enters the Formula Generator, which emits SPL formulae; the SPL Compiler translates these into C/FORTRAN programs; Performance Evaluation on the target architecture feeds a Search Engine that drives the Formula Generator; the output is DSP libraries.]

Supercompilers …
Integration of Very High Level Specifications with Conventional Languages

Besides conventional subroutines (selected from a catalog), the languages accepted by supercompilers would also call “macros” which could be used to generate code as a function of:
- The target machine
- The value of data
- The structure of data
- The shape of data
- The rest of the program
- Numerical properties

… Supercompilers …

Macros could be subroutines or class methods. Expanding classes could include data representation selection (including data distribution):
- SETL
- Automatic dense → sparse conversion techniques
- Automatic data distribution techniques

… Supercompilers

In theory at least, generating code from specifications rather than from specific HLL implementations should lead to better performance: all the benefits of abstraction without the performance penalty.

Vectorizers and High Level Specifications

Original loop:

   do i=1,n
     a(i)=b(i)+c(i)
     d(i)=a(i)+d(i-1)
     if (m > d(i)) m=d(i)
   end do

After loop distribution:

   do i=1,n
     a(i)=b(i)+c(i)
   end do
   do i=1,n
     d(i)=a(i)+d(i-1)
   end do
   do i=1,n
     if (m > d(i)) m=d(i)
   end do

Vector form:

   a(1:n)=b(1:n)+c(1:n)
   d(1:n)=lin-rec(a,d,1,n)
   m=min(m,d(1:n))
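The three distributed loops compute an elementwise sum, a first-order linear recurrence (a shifted prefix sum), and a reduction. A Python sketch of what the vector form evaluates (the function name and pure-Python style are illustrative, not part of any vectorizer):

```python
def vectorized_pass(b, c, d0, m0):
    """Evaluate the three distributed loops as whole-array operations.

    b, c : input sequences of equal length
    d0   : initial value of the recurrence, i.e. d(0)
    m0   : incoming value of the reduction variable m
    """
    n = len(b)
    a = [b[i] + c[i] for i in range(n)]   # a(1:n) = b(1:n) + c(1:n)
    d, run = [0] * n, d0                  # d(1:n) = lin-rec(a, d, 1, n)
    for i in range(n):                    # d(i) = a(i) + d(i-1): a prefix sum of a
        run = a[i] + run
        d[i] = run
    m = min(m0, min(d))                   # m = min(m, d(1:n))
    return a, d, m
```

The recurrence is the interesting case: it cannot become a plain elementwise operation, which is why the vectorizer must recognize it as a known idiom (`lin-rec`) and map it to a library primitive such as a parallel prefix sum.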

Back End Compilers and Supercompilers …

Back end compilers take care of:
- Machine code generation
- Register allocation
- Conventional optimizations
But they are not really trusted by today’s module generation systems (competitors):
- The existence of ATLAS is an indictment of current compiler technology.
- FFTW does clustering to improve register allocation.
- Spiral does a variety of conventional optimizations.

Optimizations in Spiral

[Pipeline: Formula Generator → SPL Compiler → C/Fortran Compiler]
- Formula Generator: high-level scheduling, loop transformations
- SPL Compiler: high-level optimizations (constant folding, copy propagation, common subexpression elimination, dead code elimination)
- C/Fortran Compiler: low-level optimizations (instruction scheduling, register allocation)
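Two of the high-level passes listed above, constant folding and dead code elimination (with copy/constant propagation feeding them), can be sketched on a toy straight-line IR. The IR shape, opcode names, and single-assignment assumption below are all invented for illustration; they are not Spiral's actual representation.

```python
def optimize(code, live_out):
    """Toy local optimizer over a straight-line, single-assignment IR.

    Instructions are (dest, op, a, b) tuples with op in {'+', '*', 'copy'}
    (b is None for 'copy'); operands are variable names or numeric constants.
    Performs constant/copy propagation, constant folding, and dead code
    elimination, keeping only results needed for the variables in live_out.
    """
    consts, copies, out = {}, {}, []
    for dest, op, a, b in code:
        def val(x):                      # propagate copies, then constants
            x = copies.get(x, x)
            return consts.get(x, x)
        a, b = val(a), (val(b) if b is not None else None)
        if op == 'copy':
            (consts if isinstance(a, (int, float)) else copies)[dest] = a
            continue
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            consts[dest] = a + b if op == '+' else a * b   # constant folding
            continue
        out.append((dest, op, a, b))
    # dead code elimination: backward pass keeping only needed instructions
    needed, kept = set(live_out), []
    for dest, op, a, b in reversed(out):
        if dest in needed:
            kept.append((dest, op, a, b))
            needed.update(x for x in (a, b) if isinstance(x, str))
    kept.reverse()
    return kept
```

For example, an instruction stream that copies a constant, adds to it, and also computes an unused sum collapses to a single multiply with a folded constant operand.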

[Figure] Basic Optimizations (FFT, N = 2^5, SPARC, f77 -fast -O5)

[Figure] Basic Optimizations (FFT, N = 2^5, PII, g77 -O6 -malign-double)

[Figure] Basic Optimizations (FFT, N = 2^5, MIPS, f77 -O3)

Can Module Generators Rely on Back End Compilers?

Not always, but using back end compilers will always be necessary for portability (collaborators). But compilers can also hinder efforts to get good performance:
- For example, bad register allocation can have a serious negative impact.
- We need a standard set of commands to control the transformations applied by the compiler.

… Back End Compilers and Supercompilers

In supercompilers, transformations should be done by the back end whenever possible. Reason: a back-end transformation applies to all parts of the program, not only to the very high-level components.

Search …

Search is an important component of module generators. It is also used by conventional compilers, but compilers usually work with static predictions rather than actual execution times:
- KAP tried all possible loop permutations.
- SGI-PRO tries many combinations of unrolling.
- The Superoptimizer and similar systems search exhaustively.
- Most compiler optimization algorithms are heuristics with no search involved.

… Search …

In supercompilers, search could also be done across several algorithms, looking for a good data representation and data distribution for the whole program.

… Search …

The search strategy could make use of actual execution times combined with static performance prediction:
- Static prediction is not very accurate today.
- Tight performance bounds could prune the search.
- Some decisions could be made at run time, via IF statements/multiversion loops or JIT compilers.

… Search

Some search could be based on data-dependent behavior:
- Profiling
- A “representative” data set
The search strategy is important given that the space of possibilities is often large and not monotonic, and it is difficult to know how far the search process is from the optimum.
- Need to develop tight bounds.
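The role of tight bounds in the search can be sketched as follows: if a cheap static lower bound on a candidate's running time already exceeds the best time measured so far, the expensive run-and-measure step can be skipped. This is an illustrative sketch of bound-based pruning, not the strategy of any particular system.

```python
def prune_search(candidates, measure, lower_bound):
    """Empirical search over candidates, pruned by a static lower bound.

    measure(c)     : actually run candidate c and return its time
    lower_bound(c) : a static estimate guaranteed not to exceed measure(c)
    """
    best, best_time = None, float('inf')
    for cand in candidates:
        if lower_bound(cand) >= best_time:   # bound says this cannot win
            continue                         # ... so skip the measurement
        t = measure(cand)
        if t < best_time:
            best, best_time = cand, t
    return best, best_time
```

The tighter the bound, the more measurements are avoided, which is why developing tight bounds matters when the space of candidates is large.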

Size of the Search Space

[Table: number of formulas versus transform size N; the counts grow combinatorially with N, the largest entries shown reaching into the billions.]
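The combinatorial growth behind such a table can be illustrated by counting factorization trees under a deliberately minimal rule set: assume a transform of size 2^n either stays a leaf only at size 2, or splits into an ordered pair of sizes 2^r and 2^(n-r). This toy count (the Catalan numbers) understates the real numbers, since actual rule sets like Spiral's admit many more rewritings per size.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num_binary_plans(n):
    """Count binary factorization trees for a transform of size 2^n.

    Toy rule set: size 2^1 is a leaf; size 2^n (n > 1) must split into
    an ordered pair of subproblems of sizes 2^r and 2^(n-r).
    """
    if n == 1:
        return 1
    return sum(num_binary_plans(r) * num_binary_plans(n - r) for r in range(1, n))
```

Even this restricted count passes a million trees by n = 16, so exhaustive enumeration is hopeless and a guided search strategy is essential.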

Coverage

We need a class of specifications large enough to represent most of the computation; the effectiveness of the approach will depend on coverage. Current libraries are a good start, but it is not clear how much of typical applications these libraries cover. To impact programming in general, current approaches would have to be extended to other domains such as sparse computations, sorting, searching, …

Conclusions

As we understand algorithm choices and their impact on performance better, it becomes feasible to automate much of the process of selecting data structures and algorithms to maximize performance.
- A first step: a repository of routines/classes with several implementations of each subroutine.
- But generation based on context could lead to better performance. In particular, generation from very high-level specifications could allow the generation of code that combines several operations in ways that are impossible to achieve with current encapsulation mechanisms.