Advanced Computer Architecture CSE 8383 Ranya Alawadhi
1 Advanced Computer Architecture CSE 8383 Ranya Alawadhi

2 Compilers for Instruction-Level Parallelism
Schlansker, M., Conte, T. M., Dehnert, J., Ebcioglu, K., Fang, J. Z., and Thompson, C. 1997. Compilers for Instruction-Level Parallelism. Computer 30, 12 (Dec. 1997), 63-69.

3 Agenda
 Instruction Level Parallelism (ILP)
 ILP Compiler Roles
 Areas of Interest to ILP Compilers
 Conclusion
 Questions?

4 Instruction Level Parallelism (ILP)
 Allows a sequence of instructions derived from a sequential program to be parallelized for execution on multiple pipelined functional units
 Advantages:
  Improves performance
  The programmer is not required to rewrite existing applications
  Works with current software programs
 Implementation:
  Hardware-centric
  Software-centric
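A toy sketch of the core idea (my illustration, not from the paper): instructions from a sequential program that do not depend on one another can be packed into groups that issue in parallel on multiple functional units. The instruction format and the greedy packing below are invented for the example.

```python
# Sketch (illustrative only): pack independent instructions from a
# sequential program into groups that could issue in parallel.
# Instructions are (dest, sources) pairs; this toy ignores resource
# limits and assumes registers are written only once (SSA-like names).

def pack_parallel(instrs):
    """Greedily place each instruction in the earliest group that
    comes after every group writing one of its source registers."""
    groups = []          # each group: list of (dest, srcs) issued together
    writer_group = {}    # register -> index of the group that writes it
    for dest, srcs in instrs:
        earliest = max((writer_group[s] + 1 for s in srcs if s in writer_group),
                       default=0)
        while len(groups) <= earliest:
            groups.append([])
        groups[earliest].append((dest, srcs))
        writer_group[dest] = earliest
    return groups

# r1 = a; r2 = b; r3 = r1 + r2; r4 = c  -> r4 is independent of r3
prog = [("r1", []), ("r2", []), ("r3", ["r1", "r2"]), ("r4", [])]
for i, group in enumerate(pack_parallel(prog)):
    print(i, [dest for dest, _ in group])
```

Four sequential instructions collapse into two parallel groups: the dependent add (r3) must wait, while the unrelated load (r4) moves up beside the first two.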

5 ILP Compiler Roles
 Enhance performance
 Eliminate the complex processing needed to parallelize code
 Accelerate the nonlooping code prevalent in most applications

6 Optimization Criteria
Operation count vs. processor model: traditional compilers minimize the number of operations executed, while ILP compilers optimize the schedule against a model of the target processor.

7 Statistical Compilation
 Statistical information is used to:
  Predict the outcome of conditional branches
  Improve program optimization & scheduling
  Improve the performance of frequently taken paths
 Examples of statistical information:
  The location of operands in the cache
  The probability of a memory alias
  The likelihood that an operand has a specific value
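The first use above, predicting conditional branches from profile statistics, can be sketched as follows. The profile format and the 80% bias threshold are my assumptions for illustration, not anything prescribed by the paper.

```python
# Sketch (not from the paper): using branch profile statistics to mark
# likely outcomes, as a profile-guided ILP compiler might do before
# forming scheduling regions. Profile format and threshold are invented.

def annotate_branches(profile, threshold=0.8):
    """Mark each conditional branch 'likely-taken', 'likely-not-taken',
    or 'unbiased' based on (taken, not_taken) execution counts."""
    hints = {}
    for branch, (taken, not_taken) in profile.items():
        total = taken + not_taken
        if total == 0:
            hints[branch] = "unbiased"          # never seen in training run
        elif taken / total >= threshold:
            hints[branch] = "likely-taken"
        elif not_taken / total >= threshold:
            hints[branch] = "likely-not-taken"
        else:
            hints[branch] = "unbiased"
    return hints

profile = {"loop_back": (990, 10), "error_check": (1, 999), "data_dep": (60, 40)}
print(annotate_branches(profile))
```

Heavily biased branches (loop back-edges, error checks) get a hint the optimizer can exploit; near 50/50 branches are left alone.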

8 ILP Scheduling
 To achieve high performance, ILP compilers must jointly schedule multiple basic blocks
 The formation of scheduling regions is best performed using control-flow statistics
 ILP schedulers address complex trade-offs using heuristics based on approximations

9 Dynamic Compilation
 Static compilation: tunes code to a single implementation of a specific processor architecture
 Dynamic compilation: transparently customizes an executable file during execution
  Uses information not known when the software was distributed
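A minimal sketch of the "information not known at distribution time" idea, using run-time code generation (illustrative only; a real dynamic compiler works on machine code, not Python source):

```python
# Sketch (illustrative, not a real dynamic compiler): specialize code at
# run time on a value that was unknown when the program was distributed,
# folding the now-known constant directly into generated code.

def make_scaler(factor):
    """Generate and compile a multiply-by-`factor` function at run time."""
    src = f"def scale(x):\n    return x * {factor}\n"
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["scale"]

# `factor` arrives only at run time (e.g. from a config file or the user)
scale_by_3 = make_scaler(3)
print(scale_by_3(14))  # 42
```

Once the constant is folded in, downstream optimizations (strength reduction, constant propagation) that were impossible statically become available.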

10 Program Analysis
 Especially memory analysis
 Benefits:
  Improves program schedules
  Improves code quality
  Better use of the cache hierarchy
 Analysis techniques derived for sequential processors may produce poor results on ILP processors
 Performing analysis over large amounts of code can be unacceptably slow and consume too much memory
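One classic memory-analysis technique (my example, not one the paper singles out) is the GCD dependence test: deciding whether two affine array references can ever touch the same element. When they cannot, the compiler is free to reorder or overlap them.

```python
# Sketch of the GCD dependence test: can A[a*i + b] and A[c*j + d]
# ever name the same element for integer loop indices i and j?
from math import gcd

def may_alias(a, b, c, d):
    """The equation a*i - c*j = d - b has an integer solution
    iff gcd(a, c) divides (d - b); if it does not, the two accesses
    never overlap and the compiler may reorder them freely."""
    g = gcd(a, c)
    return (d - b) % g == 0

print(may_alias(2, 0, 2, 1))  # A[2i] vs A[2j+1]: even vs odd -> False
print(may_alias(1, 0, 1, 3))  # A[i]  vs A[j+3]: can overlap  -> True
```

Note the test is conservative in the useful direction: a False answer proves independence, while a True answer only means a dependence might exist.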

11 Program Transformation
 Representations used to find fine-grained parallelism:
  Program graph
  Machine model
 Transformations that support ILP:
  Expression reassociation
  Loop unrolling
  Tail duplication
  Register renaming
  Procedure inlining
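Loop unrolling, one of the transformations listed above, can be sketched on a toy intermediate representation (the IR and string-based renaming below are invented for illustration). Unrolling by 4 gives the scheduler four independent copies of the body to overlap.

```python
# Sketch of loop unrolling on a toy IR: each op is a string and the
# induction variable "i" is rewritten per copy. Real compilers operate
# on a proper program graph, not strings.

def unroll(body, trip_count, factor):
    """Return the unrolled main body plus a remainder ("epilogue") body
    for iterations left over when factor does not divide trip_count."""
    unrolled = []
    main_iters = trip_count - trip_count % factor
    for i in range(0, main_iters, factor):
        for k in range(factor):
            # rename the induction variable in each copy of the body
            unrolled.extend(op.replace("i", str(i + k)) for op in body)
    epilogue = [op.replace("i", str(j)) for j in range(main_iters, trip_count)
                for op in body]
    return unrolled, epilogue

body = ["load r1, A[i]", "add r2, r1, 1", "store r2, B[i]"]
main, tail = unroll(body, 10, 4)
print(len(main), len(tail))  # 24 6 -> 8 iterations unrolled, 2 left over
```

The epilogue is why unrolling interacts with code size: every unrolled loop carries a cleanup copy of its body.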

12 Increasing Hardware Parallelism
 Current processor designs attempt to use more functional units to provide increased hardware parallelism
 Compilers take on increasingly complex responsibilities to ensure efficient use of hardware resources
 The number of operations “in flight” measures the amount of parallelism the compiler must provide to keep an ILP processor busy
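As a back-of-the-envelope worked example (my numbers, not the paper's): the number of operations in flight is roughly issue width times average operation latency, so that product is the parallelism the compiler must expose to keep the machine busy.

```python
# Back-of-the-envelope sketch: operations "in flight" ~ issue width x
# average operation latency (a Little's-law-style estimate). The figures
# below are hypothetical, chosen only to make the arithmetic concrete.

def ops_in_flight(issue_width, avg_latency_cycles):
    return issue_width * avg_latency_cycles

# A hypothetical 6-issue processor whose operations average 3 cycles:
print(ops_in_flight(6, 3))  # 18 independent operations must be in flight
```

Eighteen independent operations per instant is far beyond what a single basic block typically offers, which is why the multi-block scheduling of slide 8 is necessary.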

13 Architectures & Compilers
 To assess new architectures, compilers must incorporate the proposed architectural features
 Compilers are the only way to evaluate an architecture’s performance on real applications

14 Promising Areas of Research
 ILP compiler techniques are evolving from scientific-computing technology into a broadly useful scalar technology
 Obstacles still inhibit the efficient use of hardware parallelism
 Beyond their rewards, beneficial techniques can also generate side effects

15 Techniques to Reduce Compile Time
 New compilation strategies result in long compile times
 To speed compilation:
  Careful application partitioning
  Better algorithms for analysis and optimization

16 Conclusion
 ILP represents a paradigm shift that redefines the traditional field of compilation
 ILP compilation presents challenges not addressed by traditional compilers
 As the amount of hardware parallelism scales up, compilers take on increasingly complex responsibilities to ensure efficient use of hardware resources

17 Questions?