Performance Aware Secure Code Partitioning
Sri Hari Krishna Narayanan, Mahmut Kandemir, Richard Brooks
Presenter: Sri Hari Krishna Narayanan

Presentation transcript:

Performance Aware Secure Code Partitioning
Sri Hari Krishna Narayanan, Mahmut Kandemir, Richard Brooks
Presenter: Sri Hari Krishna Narayanan

2 Outline
- Introduction to secure code partitioning
- Motivation through multi-level security
- Our Code Partitioning Domain
- Workload Balancing Algorithm
- Example
- Results

3 Introduction
Secure code partitioning is a process that partitions code and data among several mutually untrusted hosts that need to cooperate to complete a task in parallel.
[Figure: a compiler thread takes the original application and data, together with authenticated trust declarations, and partitions them across hosts H0 through H4.]

4 Motivation – Multi-level security
Why performance aware? Secure code partitioning, when performed in a performance-agnostic manner, can lead to a skewed load across the hosts.
Let us look at Multi-Level Security (MLS):
- Uses qualifiers on data to classify them according to their sensitivity level.
- Uses qualifiers on hosts to classify them according to their capability level.
Classic MLS lattice of 4 levels: unclassified < confidential < secret < top secret
- Data A: (CONFIDENTIAL), Data B: (SECRET), Data C: (UNCLASSIFIED)
Categories – used to separate data: COMINT, HUMINT, ELINT
- Data A: (CONFIDENTIAL, {ELINT}), Data B: (SECRET, {COMINT}), Data C: (UNCLASSIFIED, {HUMINT})

5 Multilevel Security
Host A: (CONFIDENTIAL, {COMINT, HUMINT}); Host B: (CONFIDENTIAL, {ELINT}); Host C: (TOP SECRET, {COMINT, HUMINT, ELINT})
Data A: (CONFIDENTIAL, {ELINT}); Data B: (SECRET, {COMINT}); Data C: (UNCLASSIFIED, {HUMINT})
[Figure: Data A, B, and C placed on the lattice of levels UNCLASSIFIED < CONFIDENTIAL < SECRET < TOP SECRET and categories COMINT, HUMINT, ELINT.]

6 Multilevel Security
Host A: (CONFIDENTIAL, {COMINT, HUMINT}); Host B: (CONFIDENTIAL, {ELINT}); Host C: (TOP SECRET, {COMINT, HUMINT, ELINT})
Data A: (CONFIDENTIAL, {ELINT}); Data B: (SECRET, {COMINT}); Data C: (UNCLASSIFIED, {HUMINT})
[Figure: the same lattice, now annotated with the hosts that may access each item: Data A by Hosts B and C, Data B by Host C only, Data C by Hosts A and C.]
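The access mapping in the figure follows the standard MLS dominance rule: a host may access a data item only if the host's level is at least the data's sensitivity level and the data's categories are a subset of the host's categories. Below is a minimal Python sketch of that check, restating the slide's example; the function is the textbook dominance rule, not code from the paper.

```python
# Hedged sketch: the standard MLS dominance check, applied to the slide's example.
LEVELS = {"UNCLASSIFIED": 0, "CONFIDENTIAL": 1, "SECRET": 2, "TOP SECRET": 3}

def dominates(host_level, host_cats, data_level, data_cats):
    """A host may access a data item iff its level is at least the data's level
    and the data's categories are a subset of the host's categories."""
    return LEVELS[host_level] >= LEVELS[data_level] and set(data_cats) <= set(host_cats)

hosts = {
    "Host A": ("CONFIDENTIAL", {"COMINT", "HUMINT"}),
    "Host B": ("CONFIDENTIAL", {"ELINT"}),
    "Host C": ("TOP SECRET", {"COMINT", "HUMINT", "ELINT"}),
}
data = {
    "Data A": ("CONFIDENTIAL", {"ELINT"}),
    "Data B": ("SECRET", {"COMINT"}),
    "Data C": ("UNCLASSIFIED", {"HUMINT"}),
}

for d, (dl, dc) in data.items():
    allowed = [h for h, (hl, hc) in hosts.items() if dominates(hl, hc, dl, dc)]
    print(d, "accessible by", allowed)
# Expected result: Data A -> [Host B, Host C]; Data B -> [Host C];
# Data C -> [Host A, Host C], matching the annotated lattice above.
```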

7 Our Code Partitioning Domain
The aim is to partition a given task evenly among a set of hosts. However, different hosts have access to different portions of the data. Further, hosts are hierarchically related; e.g., Host 1 can access all the data that Hosts 3 and 4 can.
[Figure: the data decompositions over hosts H0 through H4 (left) and the corresponding host hierarchy tree (right).]

8 Our Code Partitioning Domain
[Figure: the compiler thread partitions the original application and data among hosts H0 through H4, following the data decompositions and the host hierarchy tree.]

9 Workload Balancing Challenges
There are three challenges:
- Representing the data and the computation that can be performed on them
  – In this work, we target codes that are structured as a series of loops that access data.
  – So the unit of workload distribution is a loop iteration.
- Calculating the initial/default workload on each host
- Reassigning the workload of the hosts
  – Three algorithms

10 Determining the data and the iterations
- The data accessed by a host in an iteration
- All the iterations that access a particular data object that may be accessed by a host
- All the iterations that may be executed on a host
- The default iterations that are executed on a host

11 Reassigning the workload
The ideal average, N_avg, is calculated. BottomToTop allocates at most N_avg iterations to every host. TopToBottom increases the allowed number of iterations for hosts that remain unbalanced.
ReassignHHT():
1: N_avg := TotalNumberOfIterations / TotalNumberOfHosts
2: BottomToTop(h_root, N_avg)
3: while carryout(h_root, I_k) > 0 do
4:   N_avg := N_avg + N_avg * 0.1
5:   TopToBottom(h_root, N_avg, 0)
6: end while

12 Bottom To Top
1: for all H_i in the HHT, visited in post-order fashion, do
2:   if H_i is unbalanced then
3:     Pass on iterations to the parent node
4:   else
5:     if H_i is a leaf then
6:       Mark H_i as balanced
7:     else
8:       Temp := N_avg - H_i's iterations
9:       H_i accepts Temp iterations from H_i's children
10:    end if
11:  end if
12: end for

13 Top To Bottom
1: for all hosts H_i in the HHT, visited in pre-order fashion, do
2:   if H_i is balanced then
3:     Return
4:   end if
5:   if the increased iterations available to H_i's parent balance H_i then
6:     Mark H_i as balanced
7:     Return
8:   else
9:     carry_reduce := 0
10:    if the increased iterations available to H_i balance it then
11:      Mark H_i as balanced
12:      Return
13:    else
14:      Balance H_i as much as allowed.
15:      Call TopToBottom on H_i's children recursively if it is required to balance them.
16:    end if
17:  end if
18: end for
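To make the three routines above easier to follow, here is one compact Python sketch covering ReassignHHT, BottomToTop, and TopToBottom together. The HostNode class, the carry field, and the exact bookkeeping are assumptions made for illustration; this is a rendering of the slide-level pseudocode, not the paper's implementation.

```python
# Hedged sketch of workload reassignment over a host hierarchy tree (HHT).
class HostNode:
    def __init__(self, name, load, children=None):
        self.name = name
        self.load = load              # iterations currently assigned to this host
        self.children = children or []
        self.carry = 0                # excess iterations not yet placed anywhere

def bottom_to_top(node, n_avg):
    """Post-order pass: an overloaded host keeps at most n_avg iterations and
    passes its excess toward the parent; an underloaded host absorbs excess
    carried up from its children, up to the n_avg cap."""
    for child in node.children:
        bottom_to_top(child, n_avg)
        node.carry += child.carry     # collect what the subtree could not place
        child.carry = 0
    if node.load > n_avg:             # unbalanced: shed the excess upward
        node.carry += node.load - n_avg
        node.load = n_avg
    else:                             # accept carried work, up to the cap
        take = min(n_avg - node.load, node.carry)
        node.load += take
        node.carry -= take

def top_to_bottom(node, n_avg):
    """Pre-order pass with a relaxed cap: place leftover iterations as high in
    the tree as possible, recursing into children only while excess remains."""
    take = min(max(n_avg - node.load, 0), node.carry)
    node.load += take
    node.carry -= take
    for child in node.children:
        if node.carry == 0:
            break
        child.carry, node.carry = node.carry, 0    # hand the remainder down
        top_to_bottom(child, n_avg)
        node.carry += child.carry     # take back whatever the subtree rejected
        child.carry = 0

def reassign_hht(root, total_iterations, total_hosts):
    n_avg = total_iterations / total_hosts
    bottom_to_top(root, n_avg)
    while root.carry > 0:             # carryout at the root is still positive
        n_avg += n_avg * 0.1          # relax the per-host cap by 10%
        top_to_bottom(root, n_avg)
    return n_avg
```

When run on the eight-host tree from the example slides that follow, a driver of this shape keeps relaxing the cap until the root no longer carries unplaced iterations, roughly mirroring (up to rounding) the 50 → 55 → 60 → 66 → 72 progression shown there.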

14 Example
Based on the Gauss-Seidel method:
for (i = 2 to N-1)
  for (j = 2 to N-1)
    B[i, j] := (A[i-1, j] + A[i+1, j] + A[i, j-1] + A[i, j+1]) * 1/α;
  endfor
endfor
[Figure: the block decompositions of Array A and Array B across hosts H0 through H7, and the corresponding HHT.]
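To connect this loop with the sets defined on slide 10, here is a small hedged sketch that enumerates the iterations a host may execute: an iteration qualifies only if every array element it touches lies in data the host can access. The block-row decomposition, the value of N, and the helper names are illustrative assumptions, not the decomposition drawn in the paper's figure.

```python
# Hedged sketch: iterations a host *may* execute for the stencil loop above,
# under an assumed block-row decomposition of A and B across 8 hosts.
N = 32
HOSTS = 8
ROWS_PER_HOST = N // HOSTS

def accessed_rows(i, j):
    """Array rows touched by iteration (i, j): A[i-1,j], A[i+1,j], A[i,j-1],
    A[i,j+1] and B[i,j] only involve rows i-1, i, and i+1."""
    return {i - 1, i, i + 1}

def may_execute(accessible_rows):
    """Iterations whose accessed rows all lie in data the host can read."""
    return [(i, j)
            for i in range(2, N)          # i = 2 .. N-1
            for j in range(2, N)          # j = 2 .. N-1
            if accessed_rows(i, j) <= accessible_rows]

# Example: each host can read only its own block of rows (no hierarchy yet).
for h in range(HOSTS):
    own_rows = set(range(h * ROWS_PER_HOST, (h + 1) * ROWS_PER_HOST))
    print(f"H{h}: may execute {len(may_execute(own_rows))} iterations locally")
```

In the paper's setting a host higher in the HHT can also read its descendants' data, so its accessible set, and hence the set of iterations it may execute, is larger; the default iterations actually assigned to a host are a subset of that set.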

15 Example continued
[Figure: the host hierarchy tree (HHT) for the example, over hosts H0 through H7.]

16 Example
Assignment of initial iterations: H0 = 30, H1 = 80, H2 = 80, H3 = 20, H4 = 40, H5 = 70, H6 = 40, H7 = 40.
N_avg = 400/8 = 50. The maximum load on any node is 80, while the average is 50.

17 Example – Operation of BottomToTop
N_avg = 400/8 = 50. Each host whose load exceeds N_avg (H1 = 80, H2 = 80, H5 = 70) is capped and passes its excess iterations toward its parent.
[Figure: the HHT during BottomToTop; H0, H1, and H2 now hold 50 iterations each, while H3 = 20, H4 = 40, H6 = 40, H7 = 40.]

18 Example – Operation of BottomToTop
N_avg = 400/8 = 50.
[Figure: after the BottomToTop pass, H0 = 50, H1 = 50, H2 = 50, H5 = 50, H3 = 20, H4 = 40, H6 = 40, H7 = 40; the remaining excess is carried up toward the root.]

19 Example – Operation of TopToBottom
Increase the allowed load, N_avg, to 55.
[Figure: H0 = 55, H1 = 55, H2 = 55, H3 = 20, H4 = 40, H6 = 40, H7 = 40; hosts that can take no more iterations are marked Balanced.]

20 Example continued
[Figure: with the allowed load at 55: H0 = 55, H1 = 55, H2 = 55, H3 = 20, H4 = 40, H6 = 40, H7 = 40.]

21 Example continued
Increase the allowed load to 60.
[Figure: H0 = 60, H1 = 60, H2 = 60, H3 = 20, H4 = 40, H6 = 40, H7 = 40.]

22 Example continued
Increase the allowed load to 66.
[Figure: H0 = 66, H1 = 60, H2 = 66, H3 = 20, H4 = 40, H6 = 40, H7 = 40.]

23 Example continued
Increase the allowed load to 72.
The HHT is now 'balanced' at the root node. The maximum load on any node is 68 (down from 80).
[Figure: final loads H0 = 68, H1 = 60, H2 = 66, H3 = 20, H4 = 40, H6 = 40, H7 = 40.]

24 Experimental Results
Two metrics were studied across multiple HHTs and data decompositions:
- Execution time (EXE)
- Standard deviation of workload (STD)
Two scenarios were studied:
- Default data decomposition + multiple HHTs
- Default HHT + multiple data decompositions
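The slides do not define STD beyond its name; presumably it is the usual standard deviation of the per-host workload, which under that assumption would be

\[
\mathrm{STD} = \sqrt{\frac{1}{H}\sum_{i=0}^{H-1}\bigl(w_i - \bar{w}\bigr)^2},
\qquad \bar{w} = \frac{1}{H}\sum_{i=0}^{H-1} w_i,
\]

where H is the number of hosts and w_i is the number of iterations assigned to host H_i.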

25 Experimental Results (1/2) – Default HHT + multiple data decompositions
[Figure: overall finish time for the different data decompositions with a default HHT.]
[Figure: STD for the different data decompositions with a default HHT.]

26 Experimental Results (2/2) – Default data decomposition + multiple HHTs
[Figure: overall finish time for the different HHTs with a default data decomposition.]
[Figure: STD for the different HHTs with a default data decomposition.]

27 Conclusion
- Showed that load balancing is required in secure code partitioning.
- Proposed performance-aware secure code partitioning to reduce the overall finish time.
- Achieved better load balancing compared to the original method as well.

Thank you Sri Hari Krishna Narayanan