
Information, Computing & Communication – ICC Module 3 Lesson 2 – Memory Hierarchies
Clip 9 – Locality
School of Computer Science & Communications
B. Falsafi (charts), Ph. Janson (commentary)
© 2015 Ph. Janson

Outline
►Clip 1 – Technologies
►Clip 2 – Concept
►Clip 3 – Principle
►Clip 4 – Implementation
►Clip 5 – Reading memory
►Clip 6 – Writing memory
►Clip 7 – Cache management – the Least Recently Used algorithm
►Clip 8 – A simulated example
►Clip 9 – Locality

Have a closer look at cache accesses

Program: while n > 0 { s ← s + n; n ← n − 1 }

[Slide diagram: processor, cache, and main memory. The first load of n misses and its block is placed in the cache; the value 2 is then returned from the cache (in cache in n − 1 cases). Every subsequent step hits in the cache: reading s (returns 0), adding s + n (0 + 2), writing s, reading n again (returns 2), adding n − 1 (2 − 1), and writing n back.]

Two things are happening here.
Fact 1: identical addresses are re-accessed over time.

[Same slide diagram as before: the repeated reads and writes of n and s keep hitting the cache after the first access.]

That is what is called temporal locality
►Data is found in the cache because identical addresses are accessed multiple times within a short period of time
►This is typical in practice: all “interesting” algorithms contain loops that re-access the same variables many times
(During a week of winter sports, you reuse your snowboard every day)

Two things are happening here.
Fact 2: addresses in the same block are re-accessed over time.

[Same slide diagram: the variables n and s lie in the same 4-word block, so the single block fetch that brought in n also brought in s.]

This is what is called spatial locality
►Data is found in the cache because of multiple accesses to different addresses within a short range of space (the same block)
►This is typical in practice: all “interesting” algorithms work with related, closely located variables
(When you go skiing, you need your left ski and your right ski … and your ski boots)

What if blocks were smaller?
►2 words per block instead of 4
►More, smaller blocks in the cache
►n and s would sit in different blocks
►This would cause 2 cache misses instead of 1
►Less spatial locality
►But performance can be maintained with a smaller cache

[Slide diagram: main memory with 2-word blocks at addresses 0, 2, …, 12, 14 holding m, n, and s.]

What if blocks were larger?
►8 words per block instead of 4
►Fewer blocks fit in the cache at the same time
►m, n, and s would no longer fit in the cache at the same time
►This would cause 2 cache misses instead of 1 at each execution of the program
►Less temporal locality
►But this can be compensated by a larger cache

Spatial vs. temporal locality & block size

►Fewer, larger blocks: better spatial locality ✗ worse temporal locality
►More, smaller blocks: better temporal locality ✗ worse spatial locality

[Slide plot: cache misses as a function of block size.]

The optimal block size depends on the size of the cache and on the number and usage of the program’s variables.

ICC Module 3 Lesson 2 – Memory Hierarchies 11 / 13 © 2015 Ph. Janson Programmers cannot ignore locality

Adding the elements of a matrix by rows or by columns

for (i = 0; i < n; i++) {        /* row by row */
    for (j = 0; j < n; j++) {    /* then column by column */
        acc += m[i][j];
    }
}

for (j = 0; j < n; j++) {        /* column by column */
    for (i = 0; i < n; i++) {    /* then row by row */
        acc += m[i][j];
    }
}

ICC Module 3 Lesson 2 – Memory Hierarchies 13 / 13 © 2015 Ph. Janson Results