Memory Access Scheduling and Binding Considering Energy Minimization in Multi-Bank Memory Systems. Chun-Gi Lyuh, Taewhan Kim. DAC 2004, June 7-11, 2004.

Presentation transcript:

Memory Access Scheduling and Binding Considering Energy Minimization in Multi-Bank Memory Systems
Chun-Gi Lyuh, Taewhan Kim
DAC 2004, June 7-11, 2004, San Diego, California, USA

2015/7/14 Abstract
- Memory-related activity is one of the major sources of energy consumption in embedded systems. Many types of memories used in embedded systems allow multiple operating modes (e.g., active, standby, nap, power-down) to facilitate energy saving, and the potential saving increases when the embedded system uses multiple memory banks whose operating modes are controlled independently. In this paper, we propose an integrated approach to the problem of maximally utilizing the operating modes of multiple memory banks by solving three important tasks simultaneously: (1) assignment of variables to memory banks, (2) scheduling of memory access operations, and (3) determination of the operating modes of banks. Specifically, for an instance of tasks 1 and 2, we formulate task 3 as a shortest path (SP) problem in a network and solve it optimally.

Abstract (cont.)
- We then develop an SP-based heuristic that solves tasks 2 and 3 efficiently in an integrated fashion. We then extend the proposed approach to address the limited register constraint in the processor. Experiments with a set of benchmark programs confirm that the proposed approach reduces energy consumption by 15.76% relative to the conventional greedy approach.

Outline
- Abstract
- What's the problem
- Related work
- Motivating example
- Problem formulation
- Problem approach
- Optimal cost computation
  - SP-based network formulation
  - Fast incremental SP computation
- Consideration of register constraint
- Experiment results
- Conclusion

What's the problem
- Lowering energy consumption in SoC design
- Better architectures have been used to meet performance requirements, not energy requirements

Related works
- Parallelizing variable accesses among multiple memory banks: [1], [2]
- Automatic data migration to reduce energy consumption by exploiting the temporal affinity among data: [3]
- Low-energy partitioning for a given schedule of memory accesses in code: [4]
- Execution-profile-based energy-optimal algorithm for the automatic partitioning of on-chip SRAMs: [5]
- Low-power task scheduling technique for multiple devices: [6]

Motivating example

Motivating example (cont.)

Problem approach
- Optimal cost computation
- SP-based network formulation
- Fast incremental SP computation
- Consideration of register constraint

Problem formulation
- $S(G)$: schedule of the memory accesses in data-flow graph $G$
- $B(S)$: binding of variables to memory banks according to $S(G)$
- $f_{mode}(S, B, M_i, c_j)$: operating mode of bank $M_i$ at clock step $c_j$
- $E(f_{mode}(S, B, M_i, c_j))$: energy consumed in that mode
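
The definitions above suggest an objective of the form "sum E over every bank and clock step". The following is a minimal sketch of that sum, not the paper's cost model: the `MODE_ENERGY_NJ` table and its values are hypothetical placeholders, the mode names follow the abstract's list, and mode-transition costs are ignored.

```python
# Hypothetical per-clock-step energy (nJ) of each operating mode.
# Placeholder values for illustration only; they are not taken from the paper.
MODE_ENERGY_NJ = {"active": 3.0, "standby": 1.0, "nap": 0.3, "power-down": 0.05}


def total_energy(f_mode):
    """Sum E(f_mode(S, B, M_i, c_j)) over all banks M_i and clock steps c_j.

    f_mode maps a bank name to the list of modes that bank is in at
    clock steps c_1, c_2, ... (an assumed representation).
    """
    return sum(
        MODE_ENERGY_NJ[mode]
        for modes_per_step in f_mode.values()
        for mode in modes_per_step
    )


# Two banks over four clock steps; M2 naps once its single access is done.
example = {
    "M1": ["active", "active", "standby", "active"],
    "M2": ["active", "nap", "nap", "nap"],
}
print(total_energy(example))
```

Keeping a bank in a low-power mode lowers its term in the sum, which is why the binding and the schedule (which decide when each bank is idle) matter as much as the mode choice itself.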

Problem formulation example (cont.)
- B: {M1: i, a, index, y, temp; M2: x, k}
- S: {M1: Load_i → Load_a → St_index → Load_y → St_temp; M2: Load_x → Load_k}
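
As a rough illustration, the binding B and the per-bank access order S from this slide can be written as plain dictionaries and checked against each other. The helper names `accessed_variable` and `is_consistent` are hypothetical, and the parsing assumes the slide's `Load_<var>` / `St_<var>` naming.

```python
# Binding B and per-bank access order S, transcribed from the slide.
B = {
    "M1": ["i", "a", "index", "y", "temp"],
    "M2": ["x", "k"],
}
S = {
    "M1": ["Load_i", "Load_a", "St_index", "Load_y", "St_temp"],
    "M2": ["Load_x", "Load_k"],
}


def accessed_variable(op):
    """Return the variable named in an access such as 'Load_i' or 'St_index'."""
    return op.split("_", 1)[1]


def is_consistent(binding, schedule):
    """True if every access scheduled on a bank touches a variable bound to it."""
    return all(
        accessed_variable(op) in binding[bank]
        for bank, ops in schedule.items()
        for op in ops
    )


print(is_consistent(B, S))  # True
```

A check like this is useful when exploring alternative bindings: moving a variable to another bank must also move its accesses, otherwise the schedule no longer matches the binding.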

SP-based network formulation
- Initial phase: initial result of S and B
- Refinement phase: divide the scheduling task into two subtasks:
  - r-scheduling: relative order of accesses to a bank
  - a-scheduling: absolute schedule determined from the r-scheduling
- Definition:
  - Switch: $M1C_i$ $M2C_j$
  - Move: $M1C_i$ $M2C_j$

SP-based network formulation (cont.)

SP-based network formulation (cont.)

SP-based network formulation (cont.)

SP-based network
- [a, b]: x = 11.54 nJ
- $O(N^2)$
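
To make the idea of choosing operating modes by a shortest path concrete, here is a small sketch for a single bank. It is not the paper's network construction: it uses a simple layered graph whose nodes are (clock step, mode) pairs, solved by dynamic programming. The three modes, the `STAY_NJ` and `WAKE_NJ` tables, and the rule that only moving toward a more active mode costs energy are all assumptions made for illustration.

```python
# Illustrative numbers only (nJ); neither table comes from the paper.
MODES = ["active", "standby", "nap"]
STAY_NJ = {"active": 3.0, "standby": 1.0, "nap": 0.2}  # energy per clock step
WAKE_NJ = {"active": 0.0, "standby": 0.5, "nap": 4.0}  # cost to get back to active


def transition_nj(prev, nxt):
    """Assumed rule: going to a deeper mode is free, waking up costs energy."""
    return max(0.0, WAKE_NJ[prev] - WAKE_NJ[nxt])


def min_energy_modes(accessed):
    """Choose one mode per clock step for a single bank at minimum energy.

    accessed[j] is True when the bank is accessed at clock step j, which
    forces 'active' there. The bank is assumed to start in 'active'.
    Returns (total energy, list of modes).
    """
    if not accessed:
        return 0.0, []
    inf = float("inf")
    cost = {m: (0.0 if m == "active" else inf) for m in MODES}
    back = []  # back[j][m] = best predecessor mode of m at step j
    for is_access in accessed:
        new_cost, pred = {}, {}
        for m in MODES:
            if is_access and m != "active":
                new_cost[m], pred[m] = inf, None
                continue
            p = min(MODES, key=lambda q: cost[q] + transition_nj(q, m))
            new_cost[m] = cost[p] + transition_nj(p, m) + STAY_NJ[m]
            pred[m] = p
        cost = new_cost
        back.append(pred)
    # Walk the predecessors back from the cheapest final mode.
    last = min(MODES, key=lambda m: cost[m])
    path = [last]
    for pred in reversed(back[1:]):
        path.append(pred[path[-1]])
    path.reverse()
    return cost[last], path


total, modes = min_energy_modes([True, False, False, False, False, False, True])
print(total, modes)
```

With these placeholder numbers, a long idle gap is cheapest in `nap` despite the wake-up cost, while a one-step gap such as `[True, False, True]` is cheapest in `standby`. The sketch runs in time proportional to the number of clock steps times the square of the number of modes; the $O(N^2)$ on the slide refers to the paper's own formulation.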

Fast incremental SP computation
- $O(N)$?

Consideration of register constraint

Experiment results

Conclusion
- Utilizing the operating modes:
  - assigning variables to memory banks
  - scheduling of memory access operations
  - determining operating modes of banks
- SP (shortest path)-based integrated technique