An analytical model for ATLAS


An analytical model for ATLAS
Joint work with Keshav Pingali (Cornell), Gerald DeJong, Maria Garzaran

Modeling for Tile Size (NB)
Models of increasing complexity:
-3*NB^2 ≤ C: whole working set fits in L1
-NB^2 + NB + 1 ≤ C: fully associative cache, optimal replacement, line size of 1 word
-Line size > 1 word
-LRU replacement
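As a rough illustration of the first two models, the sketch below finds the largest NB that satisfies each inequality. It assumes C is the L1 capacity measured in matrix elements; the function names and the sample capacity of 4096 elements are hypothetical and do not come from the slides.

```python
import math


def nb_whole_working_set(c):
    """Largest NB with 3*NB^2 <= C (three NB x NB tiles fit in the cache)."""
    # NB^2 is an integer, so 3*NB^2 <= C is equivalent to NB^2 <= C // 3.
    return math.isqrt(c // 3)


def nb_optimal_replacement(c):
    """Largest NB with NB^2 + NB + 1 <= C (fully associative, optimal
    replacement, line size of 1 word)."""
    nb = math.isqrt(c)  # upper bound, since NB^2 <= C must hold
    while nb > 0 and nb * nb + nb + 1 > c:
        nb -= 1
    return nb


if __name__ == "__main__":
    capacity = 4096  # hypothetical: e.g. a 32 KB L1 holding 8-byte elements
    print(nb_whole_working_set(capacity))
    print(nb_optimal_replacement(capacity))
```

The less conservative model admits a noticeably larger tile for the same capacity, which is the point of refining the model.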

Explanation for LRU Model (I)

Explanation for LRU Model (II)
Each iteration of j requires:
-NB^2 elements of A
-a column of C (NB elements)
-a column of B (NB elements)
In the middle of iteration j+1, being able to reuse the elements of A requires holding not one but two columns of B, plus one extra element of C. Summing these terms gives: NB^2 + 3*NB + 1 ≤ C
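The counting argument above can be sketched in code. The footprint below simply adds the terms the slide lists (the tile of A, one column of C, two columns of B, and one extra element of C); C is again assumed to be the cache capacity in elements, and the function names are hypothetical.

```python
def lru_footprint(nb):
    """Elements that must stay cached under LRU, per the slide's argument."""
    tile_a = nb * nb         # NB^2 elements of A, reused across iterations of j
    column_c = nb            # one column of C
    columns_b = 2 * nb       # two columns of B (iterations j and j+1)
    extra_c = 1              # one extra element of C
    return tile_a + column_c + columns_b + extra_c


def nb_lru(c):
    """Largest NB whose LRU footprint fits in a cache of c elements."""
    nb = 0
    while lru_footprint(nb + 1) <= c:
        nb += 1
    return nb


if __name__ == "__main__":
    print(nb_lru(4096))  # hypothetical capacity of 4096 elements
```

The extra NB-sized terms cost only a small reduction in tile size relative to the optimal-replacement model for the same capacity.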

Finding MU, NU, KU
Following our code-generating strategy, we need:
MU*NU + MU + NU + TR ≤ NR
If we simplify MU = NU, we get:
NU^2 + 2*NU + (TR - NR) ≤ 0
Once NU is obtained:
MU = (NR - TR - NU) / (NU + 1)
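A minimal sketch of this calculation follows. It rewrites the quadratic as (NU + 1)^2 ≤ NR - TR + 1 to get the largest integer NU, then applies the MU formula. Rounding MU down to an integer is an assumption made here, and the function name and the sample values NR = 16, TR = 2 are hypothetical.

```python
import math


def register_tile(nr, tr):
    """Return (MU, NU) satisfying MU*NU + MU + NU + TR <= NR."""
    budget = nr - tr + 1
    if budget < 4:
        raise ValueError("too few registers for NU >= 1")
    # NU^2 + 2*NU + (TR - NR) <= 0  <=>  (NU + 1)^2 <= NR - TR + 1
    nu = math.isqrt(budget) - 1
    # MU*(NU + 1) <= NR - TR - NU; floor division is an assumption of this sketch
    mu = (nr - tr - nu) // (nu + 1)
    return mu, nu


if __name__ == "__main__":
    mu, nu = register_tile(16, 2)
    print(mu, nu)
    # Confirm the original register constraint holds for the result.
    print(mu * nu + mu + nu + 2 <= 16)
```

Because MU is recomputed after NU is fixed, MU can come out larger than NU, which uses registers that the MU = NU simplification would leave idle.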