Interleaved Multi-Bank Scratchpad Memories: A Probabilistic Description of Access Conflicts DAC '15, June 07 - 11, 2015, San Francisco, CA, USA.



Background: Shared on-chip memory consists of multiple separately accessible banks that form a common address space for all processors. Advantage: efficient communication between processors. Disadvantage: interference among the processors. Solution: more banks and an optimized address mapping.

Address Mapping: contiguous mapping, pseudo-random mapping, and sequentially interleaved mapping (SIM). The aim of this work is to quantitatively evaluate the properties and characteristics of SIM systems.

Outline: Background, Problem definition, Occupancy distribution, Markov model, Evaluation.

Problem Definition: We consider a platform with c processor cores and b independently accessible memory banks. pa denotes the access probability and pseq the sequential access probability. A denotes the random number of accesses requested in any given cycle, and I the number of banks serving accesses in any given cycle. Given c, b, pa, and pseq, compute the distribution of the number I of memory banks serving accesses.

The classic occupancy distribution: the number of actual memory accesses is a, with a = c.
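The slide's formula did not survive extraction. As a rough sketch, the textbook occupancy distribution can be computed as follows, assuming each of the a accesses picks one of the b banks independently and uniformly at random (an assumption of this sketch, not a statement taken from the slides). The function name `occupancy_pmf` is hypothetical.

```python
from math import comb


def occupancy_pmf(a: int, b: int) -> list:
    """Return [P(I = 0), ..., P(I = b)] for a accesses spread over b banks.

    Assumes every access picks a bank independently and uniformly at random.
    """
    pmf = []
    for i in range(b + 1):
        # Inclusion-exclusion: number of ways a accesses cover exactly i given banks.
        onto = sum((-1) ** j * comb(i, j) * (i - j) ** a for j in range(i + 1))
        # Choose which i banks are busy, then divide by all b**a assignments.
        pmf.append(comb(b, i) * onto / b ** a)
    return pmf


def expected_busy_banks(a: int, b: int) -> float:
    """Mean of the distribution above."""
    return sum(i * p for i, p in enumerate(occupancy_pmf(a, b)))
```

Under the same assumption the mean has the closed form b(1 - (1 - 1/b)^a), which gives a quick consistency check on the distribution.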

Adding access probabilities: A follows the binomial distribution.
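One way to read this step is as a mixture: the number of accesses A is drawn from a binomial distribution, and for each value of A the occupancy distribution applies. The sketch below assumes A ~ Binomial(c, p_a) and uniform, independent bank selection; both the parameterisation and the function names are assumptions made for illustration.

```python
from math import comb


def occupancy_pmf(a: int, b: int) -> list:
    """P(I = i), i = 0..b, for a uniform random accesses to b banks."""
    return [
        comb(b, i)
        * sum((-1) ** j * comb(i, j) * (i - j) ** a for j in range(i + 1))
        / b ** a
        for i in range(b + 1)
    ]


def busy_banks_pmf(c: int, b: int, p_a: float) -> list:
    """P(I = i) when each of c cores issues an access with probability p_a."""
    result = [0.0] * (b + 1)
    for a in range(c + 1):
        # Binomial weight: probability that exactly a of the c cores access memory.
        weight = comb(c, a) * p_a ** a * (1 - p_a) ** (c - a)
        for i, p in enumerate(occupancy_pmf(a, b)):
            result[i] += weight * p
    return result
```

For example, `busy_banks_pmf(4, 8, 0.5)` returns nine probabilities that sum to 1; under these assumptions its mean equals b(1 - (1 - p_a/b)^c).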

Limitations of the model: Sequential access patterns of the applications cannot be taken into account. The model also ignores that accesses which cannot be served immediately are served in subsequent cycles, where they interfere with new accesses.

Markov Model

Memory throughput by Markov steady state
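The general recipe behind this slide can be sketched as: find the stationary distribution of the chain, then average the per-state throughput over it. The code below is a generic illustration using power iteration on a small hypothetical transition matrix; it is not the state space or the transition matrix used in the paper, and the function names are made up for this example.

```python
def steady_state(P, max_iter=10_000, tol=1e-12):
    """Approximate the stationary distribution of a row-stochastic matrix P.

    Uses power iteration, which assumes the chain is irreducible and aperiodic.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[s] * P[s][t] for s in range(n)) for t in range(n)]
        if max(abs(x - y) for x, y in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def expected_throughput(P, throughput):
    """Average the per-state throughput over the stationary distribution."""
    return sum(p * t for p, t in zip(steady_state(P), throughput))


# Hypothetical two-state chain: state 0 serves 1 access, state 1 serves 2.
P_toy = [[0.5, 0.5], [0.25, 0.75]]
toy_value = expected_throughput(P_toy, [1, 2])
```

For this toy chain the stationary distribution is (1/3, 2/3), so the expected throughput is 5/3.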

Transition probabilities For a state s, the associated throughput is

Adding sequential access patterns

s  t 1. s  s’, one access request removed from each’s queue 2. s’  t, distributing new access requests

Adding access probabilities

Experimental evaluation: gem5 ARM simulator. The GSM, FFT, blowfish, string search, and JPEG examples were chosen to obtain a high diversity in behaviour.

Accuracy of the occupancy model: For a small number of banks, the throughput is likely to be close to that number. For a sufficiently large number of banks, the number of waiting accesses is small. The maximum relative error is 12.0% for b=8.

Benchmark Results

Conclusions from the occupancy model: As long as the ratio of banks to cores is constant, a system can be scaled arbitrarily without changing the expected throughput per bank or per core. The throughput converges exponentially with the product pa*r to the maximum value b. For pa*r < 0.3, the throughput can be regarded as growing approximately linearly with pa.
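These conclusions are consistent with a short derivation, given here as a sketch. It assumes that r denotes the cores-to-banks ratio c/b (the slides do not define r), that each core accesses memory independently with probability p_a, and that the bank is chosen uniformly at random.

```latex
\begin{align*}
P(\text{a given bank is idle}) &= \left(1 - \frac{p_a}{b}\right)^{c} \\
E[I] &= b\left[1 - \left(1 - \frac{p_a}{b}\right)^{c}\right]
      \approx b\left(1 - e^{-p_a c / b}\right)
      = b\left(1 - e^{-p_a r}\right) \\
E[I] &\approx b\, p_a\, r = c\, p_a \qquad \text{for small } p_a r
\end{align*}
```

The per-bank expectation E[I]/b depends on c and b only through r, which matches the scaling statement, and the term e^{-p_a r} is the exponential approach to the maximum b. The linear form follows from the first-order expansion of 1 - e^{-x}; at x = 0.3 the exact value is about 0.26, so the linear approximation is already somewhat optimistic there.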

Application example: System design System with c=16 cores and b=32 banks. System 1: interleaving over all 32 banks. System 2: interleaving for 16 banks + one “private” memory bank for each core.

Application example: System design System 2 performs better for

Discussion of the synchronisation effect: For <0.4, the synchronisation effect is insignificant. Even the speedup from pseq=0 to pseq=1 is less than 5% in this system. There are only a few cases in which performance is likely to be a decisive factor for opting for a SIM system rather than for pseudo-random mapping.