Software solutions for challenges in embedded systems
Sri Hari Krishna Narayanan, The Pennsylvania State University, USA

Theme: While hardware solutions to challenges in embedded systems are very important, software can also play an important role. Software techniques that solve three important problems are presented here.

Temperature crisis alleviation in CMPs and NoCs

Challenge: To prevent chips from overheating.
Solution:
1. On-chip temperature rises when the chip runs continuously at high power density.
2. Develop a schedule that lets processors cool down between executing portions of the task.
3. Such a schedule is possible because applications do not use all of the available processor cores.

Figure (temperature-sensitive loop parallelization flow):
Phase 1, Profiling: the input program is profiled to obtain cycle times, chunk sizes, and energy consumption.
Phase 2, Temperature-Sensitive Scheduling: a scheduler combines the profile data, architecture details, and the HotSpot thermal model to produce a temperature-sensitive schedule.
Phase 3, Locality-Based Scheduling: a scheduler refines this into a temperature- and locality-sensitive schedule.
Phase 4, Code Generation: a code generator built on the Omega Library emits optimized, temperature-sensitive code.

Example input program shown in the flow figures:

```c
#include <stdio.h>

#define N 5000
#define ITER 1

/* mp_numthreads() appears to come from the SGI MP runtime (libmp) that the
 * original example targets; it is declared here so the file compiles, and
 * linking requires that runtime. */
extern int mp_numthreads(void);

int du1[N], du2[N], du3[N];
int au1[N][N][2], au2[N][N][2], au3[N][N][2];
int a11 = 1, a12 = -1, a13 = -1;
int a21 = 2, a22 = 3, a23 = -3;
int a31 = 5, a32 = -5, a33 = -2;
int l;
int sig = 1;

int main(void)
{
    int kx, ky, kz;

    printf("Thread:%d\n", mp_numthreads());

    /* Initialization loop */
    for (kx = 0; kx < N; kx = kx + 1) {
        for (ky = 0; ky < N; ky = ky + 1) {
            for (kz = 0; kz <= 1; kz = kz + 1) {
                au1[kx][ky][kz] = 1;
                au2[kx][ky][kz] = 1;
                au3[kx][ky][kz] = 1;
            }
        }
    }
    return 0;
}
```

Challenge: To prevent NoC-based chips from overheating.
Solution:
1. On-chip temperature rises because of high power density.
2. Applications typically do not use all of the available processor cores.
3. Design a task-to-processor mapping that reduces power density.

Figure (NoC mapping flow): the input code passes through an ILP module and a mapping module; the figure compares a default (performance-oriented) mapping, a mapping with reduced overall power density, and a thermal-aware mapping.

Securing code semantics

Challenge: To prevent an untrusted host from gleaning the semantics of the client's code, i.e., to prevent reverse engineering.
Solution: Translating the original code produces new code that (a) is semantically different, (b) accesses data in a different pattern, and (c) stores to different locations.

Figure (“Mobile” computing with code-level transformations): the figure contrasts traditional computing and “mobile” computing, where the server works with the original code and data and produces the original results, with “mobile” computing using code-level transformations, where the server receives transformed code and data and returns transformed results, from which the original results are obtained.

Soft Error Detection

Figure (soft error detection flow): in Phase 1, an invariant detector analyzes the loop body and an invariant filter filters the detected invariants; in Phase 2, a code modifier adds checker code to the loop body.

Figure (error detection rate, 0% to 100%): our approach versus full duplication on the iter-merge, adi, heap-sort, bubble-sort, and mxm benchmarks.

Challenge: To detect single event upsets in the data caches.
Solution:
1. Invariants are data properties that must hold throughout execution.
2. Checking whether they hold during execution indicates whether the execution was successful or a soft error occurred.

The graphs show that the peak on-chip temperature is always kept below a preset threshold, indicated by the solid line.

Figure: percentage of execution during which a thermal emergency is experienced.
Figure: normalized execution time.
Figure: soft error detection rate.
Figure: increase in code size due to checker code.