Trace-Based Optimization for Precomputation and Prefetching
Madhusudan Raman
Supervisor: Prof. Michael Voss

Presentation transcript:

Motivation
Processors read and write data from memory.
Memory access accounts for over 50% of execution time.
[Figure: CPU, Cache, Main Memory]

SMT/Hyperthreading
Some CPUs run multiple threads at a time (Pentium IV, IBM Power5).
Can we use one thread to speed up another?
Yes: prefetch data into the shared cache.
[Figure: CPU 1, CPU 2, Cache, Main Memory]
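The idea of one thread warming the shared cache for another can be illustrated with a toy simulation. This is only a sketch under strong assumptions, not the presentation's implementation: the names `SharedCache` and `count_demand_misses` are hypothetical, the cache is a small fully associative LRU model, and timing is ignored (a prefetch is assumed to finish before the main thread needs the line).

```python
from collections import OrderedDict


class SharedCache:
    """Hypothetical model: a tiny fully associative LRU cache shared by both threads."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()

    def touch(self, line):
        """Access a cache line. Return True on a hit; on a miss, fill the line."""
        hit = line in self.lines
        if hit:
            self.lines.move_to_end(line)  # mark as most recently used
        else:
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
        return hit


def count_demand_misses(addresses, capacity_lines=16, lookahead=0):
    """Replay the main thread's line accesses and count its cache misses.

    A helper thread (assumed, not from the slides) touches the line that the
    main thread will need `lookahead` accesses later. lookahead=0 disables it.
    """
    cache = SharedCache(capacity_lines)
    misses = 0
    for i, line in enumerate(addresses):
        if lookahead and i + lookahead < len(addresses):
            cache.touch(addresses[i + lookahead])  # helper-thread prefetch
        if not cache.touch(line):
            misses += 1
    return misses


if __name__ == "__main__":
    stream = list(range(100))  # streaming access: every line is new
    print(count_demand_misses(stream))               # no helper thread
    print(count_demand_misses(stream, lookahead=4))  # helper runs 4 accesses ahead
```

In this model the streaming workload misses on every access without the helper, while with the helper only the first few accesses (those issued before any prefetch could cover them) miss. A real SMT system would also have to account for prefetch latency and for the helper competing with the main thread for cache space.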

TOPP (Trace-Based Optimization for Precomputation and Prefetching)
Inspect the application as it runs.
Detect “costly” memory accesses.
On the fly, generate and execute code to fetch program data before it is needed.
[Figure: Inspect, Identify costly memory accesses, Generate Code; CPU 1 - Run Program; CPU 2 - Execute Prefetching Code]
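One way to picture the "detect costly memory accesses" step is as a ranking over sampled load events. The sketch below is an illustration, not TOPP's actual algorithm: the function name `find_costly_loads`, the `(load_pc, stall_cycles)` sample format, and the 90% coverage threshold are all assumptions made for the example.

```python
from collections import defaultdict


def find_costly_loads(samples, coverage=0.9):
    """Pick the load instructions responsible for most of the memory stall time.

    samples:  iterable of (load_pc, stall_cycles) pairs, standing in for data
              that sampling of hardware performance counters might provide
              (hypothetical format).
    coverage: fraction of total stall cycles the returned loads should cover.

    Returns load PCs, most expensive first.
    """
    cost = defaultdict(int)
    for pc, cycles in samples:
        cost[pc] += cycles  # accumulate stall cycles per load instruction

    total = sum(cost.values())
    costly, covered = [], 0
    ranked = sorted(cost.items(), key=lambda kv: kv[1], reverse=True)
    for pc, cycles in ranked:
        if total == 0 or covered >= coverage * total:
            break  # enough of the stall time is already accounted for
        costly.append(pc)
        covered += cycles
    return costly


if __name__ == "__main__":
    samples = [(0x400, 200), (0x404, 5), (0x400, 300), (0x410, 90), (0x404, 5)]
    print([hex(pc) for pc in find_costly_loads(samples)])
```

Restricting attention to the few loads that dominate stall time keeps the generated prefetching code small, which matters when it has to be produced and run while the program is executing.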

Why is this novel?
Transparent to the application user.
Could be made transparent to the developer.
Optimizations are done completely at runtime.
Uses Trace-Based Optimization.
Driven by built-in hardware performance counters.