1
Security-Driven Heuristics and A Fast Genetic Algorithm for Trusted Grid Job Scheduling
Shanshan Song, Ricky Kwok, and Kai Hwang
University of Southern California, Los Angeles, CA 90089 USA
Presented by Shanshan Song at IEEE IPDPS’05, Denver, Colorado, April 6, 2005
This work was supported by NSF ITR Grant 0325409
2
http://GridSec.usc.edu April 6, 2005
Presentation Outline:
Motivations
The System Model
Three security-driven scheduling strategies
  To bind security to existing time-driven heuristics for parallel job scheduling
A New Space-Time Genetic Algorithm (STGA)
Performance Metrics and Workloads
NAS and PSA Benchmark Results
Conclusions
3
Motivations
Highly shared Grid resources create severe security problems and privacy concerns.
Most schedulers ignore the risk factor when scheduling large numbers of jobs in a risky real-life Grid environment.
4
Parallel Job Scheduling Scenario in Risky Computational Grids
[Figure: scheduling scenario. Labels recovered from the slide: Deterministic, Adaptive, Historical database, Security-Driven Model. Legend: high-secure site, low-secure site, high-demand job, low-demand job.]
5
(a) Secure (b) Risky (c) f-Risky
"The bad thing always could happen" (Murphy’s Law)
[Figure: illustration of the three modes around a historical database. Captions recovered from the slide:]
  "We are scared, let us just wait …"
  "We don’t care, just do it."
  "I am courageous, not a kid anymore …"
  "I calculate too, maybe I am lucky …"
  "I run a calculated risk, but wait a while …"
6
Three Scheduling Modes
Secure mode: allocate jobs only to those Grid sites whose security level (SL) exceeds the job's security demand (SD), i.e. SD < SL.
Risky mode: allocate jobs to any available Grid site without checking the risk level or the job demand.
f-risky mode: allocate jobs only to those Grid sites where the risk taken is at most f, e.g. f = 0.5 (50%).
Risk scale: Secure (0), f-Risky (f), Risky (100%)
The Failure Model: [equation not captured in the transcript]
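The three modes can be read as three filters on the candidate sites for a job. The sketch below is only an illustration: the slide's failure model did not survive in the transcript, so the exponential form in `failure_probability`, the rate `lam`, and all function names are assumptions rather than the authors' definitions.

```python
import math

def failure_probability(sd, sl, lam=1.0):
    """Assumed failure model (not taken from the slide): no risk when the
    site's level covers the demand, otherwise risk grows with the gap."""
    if sd <= sl:
        return 0.0
    return 1.0 - math.exp(-lam * (sd - sl))

def eligible_sites(sd, site_levels, mode, f=0.5, lam=1.0):
    """Return the indices of sites that a job with demand sd may use."""
    if mode == "secure":
        # Strict inequality SD < SL, as stated on the slide.
        return [i for i, sl in enumerate(site_levels) if sd < sl]
    if mode == "risky":
        # No check at all: every available site is a candidate.
        return list(range(len(site_levels)))
    if mode == "f-risky":
        # Accept a site only if the assumed failure probability is at most f.
        return [i for i, sl in enumerate(site_levels)
                if failure_probability(sd, sl, lam) <= f]
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical example: three sites, one job with demand 0.7.
levels = [0.4, 0.6, 0.9]
for mode in ("secure", "f-risky", "risky"):
    print(mode, eligible_sites(0.7, levels, mode, f=0.5, lam=5.0))
```

With these made-up numbers the secure mode keeps only the last site, the f-risky mode also admits the middle one, and the risky mode admits all three.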
7
Scheduling Heuristics under Three Risky Modes
Min-Min heuristic: For each job, the resource site that gives the earliest expected completion time is determined first. The job with the minimum earliest expected completion time is then selected and assigned to the corresponding site.
Sufferage heuristic: Better mappings can be generated by assigning a site to the job that would “suffer” most in terms of expected completion time if that particular site were not assigned to it.
Heuristic operational modes: Secure, f-Risky, Risky
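A minimal sketch of the Min-Min loop described above, assuming an expected-execution-time table `etc[job][site]` and sites that start idle. The optional `allowed` argument is a hypothetical hook for the operational modes: it restricts each job to the sites its mode permits, and the sketch assumes every job has at least one permitted site. The numbers are illustrative, not from the talk.

```python
def min_min(etc, allowed=None):
    """Min-Min mapping.

    etc[j][s]  -- expected execution time of job j on site s
    allowed[j] -- optional list of site indices job j may use
    Returns (assignment {job: site}, per-site ready times).
    """
    n_jobs, n_sites = len(etc), len(etc[0])
    ready = [0.0] * n_sites          # time at which each site becomes free
    unassigned = list(range(n_jobs))
    assignment = {}
    while unassigned:
        best = None                  # (completion time, job, site)
        for j in unassigned:
            sites = allowed[j] if allowed is not None else range(n_sites)
            # Site giving this job its earliest expected completion time.
            s = min(sites, key=lambda k: ready[k] + etc[j][k])
            ct = ready[s] + etc[j][s]
            if best is None or ct < best[0]:
                best = (ct, j, s)
        ct, j, s = best              # job with the minimum earliest completion
        assignment[j] = s
        ready[s] = ct
        unassigned.remove(j)
    return assignment, ready

# Hypothetical table: 3 jobs x 2 sites.
etc = [[4, 6], [3, 5], [8, 2]]
assignment, ready = min_min(etc)
print(assignment, max(ready))
```

Passing a different `allowed` list per mode leaves the heuristic itself unchanged, which is one way to read "binding" security to a time-driven heuristic.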
8
Genetic Algorithm (GA)
A Genetic Algorithm (GA) is a popular technique for searching large solution spaces.
  It is powerful for generating good solutions.
  It is not widely deployed because of its long computation time.
[Figure: Traditional GA vs. STGA in terms of number of evolution iterations. Axes: Number of Evolution Iterations, Solution Quality. Labels: Generate Random Initial Population, STGA Starting Point, Good Solution is found, GA, STGA.]
9
How does STGA Work?
STGA: Space-Time Genetic Algorithm
[Figure: one batch of jobs, e.g. (%%, ***, ####), is matched against a lookup table of Input/Solution pairs such as (%,**, ###) → (423…56) and (%,****, ###) → (368…89). Solutions from the lookup table and randomly generated solutions form the initial population, which the GA evolves into the final solution. Sample chromosomes shown: (456 … 34), (167 … 89), (123 … 786).]
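The seeding step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the lookup table is modelled as a list of (signature, solution) pairs, and the batch signature, the squared-distance `similarity` measure, and the `n_seeds` parameter are all hypothetical.

```python
import random

def similarity(a, b):
    """Hypothetical similarity between two batch signatures (higher is closer)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def seed_population(history, signature, n_jobs, n_sites, pop_size, n_seeds, rng):
    """Build an initial population from stored solutions plus random ones.

    history -- list of (signature, solution) pairs from earlier batches
    """
    # Only stored solutions of the right length can be reused directly.
    usable = [(sig, sol) for sig, sol in history if len(sol) == n_jobs]
    usable.sort(key=lambda entry: similarity(entry[0], signature), reverse=True)
    population = [list(sol) for _, sol in usable[:min(n_seeds, pop_size)]]
    # Fill the remaining slots with randomly generated job-to-site mappings.
    while len(population) < pop_size:
        population.append([rng.randrange(n_sites) for _ in range(n_jobs)])
    return population

history = [((1.0, 2.0), [0, 1, 2]), ((5.0, 5.0), [2, 2, 2]), ((1.1, 2.1), [1, 1, 0])]
pop = seed_population(history, (1.0, 2.0), n_jobs=3, n_sites=3,
                      pop_size=5, n_seeds=2, rng=random.Random(0))
print(pop[:2])
```

Starting from solutions that worked for similar batches is what would let the search begin closer to a good solution than a purely random population.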
10
STGA Convergence Time
STGA converges at 50 iterations: fast!
11
Performance Metrics and Workloads
Performance Metrics
  Makespan, slowdown ratio, and average response time
  Site utilization
  Number of failed jobs and number of risk-taking jobs
Numerical Aerodynamic Simulation (NAS) Workload
  A package containing three months' worth of sanitized accounting records for the 128-node iPSC/860 located in the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center.
Parameter Sweep Application (PSA) Workload
  Contains a set of independent tasks
  Each task has its own input files for different parameters
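The slide names the metrics without defining them. The sketch below uses common definitions, which are assumptions here: makespan is the latest finish time (with time starting at 0), response time is finish minus arrival, slowdown ratio is average response time over average execution time, and utilization is a site's busy time over the makespan. The job records are hypothetical.

```python
def schedule_metrics(jobs, n_sites):
    """Compute schedule metrics from finished job records.

    jobs -- list of dicts with 'arrival', 'start', 'finish' times and 'site'
    """
    makespan = max(j["finish"] for j in jobs)
    responses = [j["finish"] - j["arrival"] for j in jobs]
    executions = [j["finish"] - j["start"] for j in jobs]
    avg_response = sum(responses) / len(jobs)
    avg_execution = sum(executions) / len(jobs)
    busy = [0.0] * n_sites
    for j in jobs:
        busy[j["site"]] += j["finish"] - j["start"]
    return {
        "makespan": makespan,
        "avg_response": avg_response,
        "slowdown": avg_response / avg_execution,
        "utilization": [b / makespan for b in busy],
    }

# Hypothetical trace: three jobs on two sites.
jobs = [
    {"arrival": 0, "start": 0, "finish": 4, "site": 0},
    {"arrival": 1, "start": 4, "finish": 6, "site": 0},
    {"arrival": 2, "start": 2, "finish": 10, "site": 1},
]
print(schedule_metrics(jobs, n_sites=2))
```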
12
Performance Results (Makespan)
NAS trace workload (16000 jobs, 12 sites)
Job arrival rate and workload are from trace data
STGA evolution iterations: 100 (GA: 1000 iterations)
13
Performance Results (Response Time)
NAS trace workload (16000 jobs, 12 sites)
Job arrival rate and workload are from trace data
14
Performance Results (Utilization)
NAS trace workload (16000 jobs, 12 sites)
Job arrival rate and workload are from trace data
15
Scalability Analysis
The scalability analysis is conducted on the number of simulated jobs (PSA workload)
N = 1000, 2000, 5000, and 10000
16
Conclusions
The security binding technique can be applied to improve any time-driven heuristic for online scheduling of parallel jobs in an open, risky Grid computing environment.
The new STGA algorithm works by swiftly generating good scheduling solutions based on prior job execution experience on Grid platforms. Both NAS and PSA benchmark results show the superiority of STGA over the heuristic algorithms applied.
17
Min-Min and Sufferage Heuristics
Min-Min heuristic: For each job, the resource site that gives the earliest expected completion time is determined first. The job with the minimum earliest expected completion time is then selected and assigned to the corresponding site.
Sufferage heuristic: Better mappings can be generated by assigning a site to the job that would “suffer” most in terms of expected completion time if that particular site were not assigned to it.
Expected Time to Complete Matrix
         Job1  Job2  Job3
  Site1     3     5     7
  Site2     2     4     3
  Site3     6     9    10
Suffer value: 1 1 4
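The suffer values on this slide can be reproduced by taking, for each job, the difference between its second-best and best expected completion times across sites. The sketch below does only that and picks the first job to place. It is not the full Sufferage heuristic, which also handles contention for sites over several rounds, and the function names are hypothetical.

```python
def sufferage_values(etc_by_site):
    """etc_by_site[s][j]: expected time to complete job j on site s.
    Returns, per job, second-best time minus best time."""
    n_jobs = len(etc_by_site[0])
    values = []
    for j in range(n_jobs):
        times = sorted(row[j] for row in etc_by_site)
        values.append(times[1] - times[0])
    return values

def first_pick(etc_by_site):
    """Job that would suffer most, paired with its best site (zero-based)."""
    values = sufferage_values(etc_by_site)
    job = max(range(len(values)), key=lambda j: values[j])
    site = min(range(len(etc_by_site)), key=lambda s: etc_by_site[s][job])
    return job, site

# Matrix from the slide: rows are Site1..Site3, columns are Job1..Job3.
etc = [[3, 5, 7],
       [2, 4, 3],
       [6, 9, 10]]
print(sufferage_values(etc))  # [1, 1, 4]
print(first_pick(etc))        # (2, 1), i.e. Job3 goes to Site2
```

Job3 loses 4 time units if it is denied Site2, while Job1 and Job2 lose only 1 each, so Job3 is served first even though Job1 has the smaller completion time on Site2.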
18
Genetic Algorithm Overview
Genetic Algorithms (GAs) are a popular technique for searching large solution spaces.
They rely on ‘selection’, ‘crossover’, and ‘mutation’ operations:
  Selection: keeps good solutions
  Crossover: global optimization
  Mutation: local jumping
[Figure: worked example on 5-bit chromosomes with fitness values, shown in four panels: Initial Population (fitness 0.3, 0.6, 0.9, 0.6), After selection, After crossover (fitness 1.0, 0.4, 0.9, 0.6), After mutation (fitness 1.0, 0.4, 0.8, 0.6).]
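The three operations can be sketched on bit-string chromosomes like those in the figure. The slide does not say which variants were used, so roulette-wheel selection, single-point crossover, and independent bit-flip mutation are assumptions chosen as the textbook defaults. The fitness function is a placeholder.

```python
import random

def select(population, fitness, k, rng):
    """Roulette-wheel selection: fitter chromosomes are more likely to be kept."""
    weights = [fitness(c) for c in population]
    return [list(c) for c in rng.choices(population, weights=weights, k=k)]

def crossover(a, b, rng):
    """Single-point crossover: the two parents swap tails after a random cut."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom, rate, rng):
    """Flip each bit independently with probability rate (a small local jump)."""
    return [1 - g if rng.random() < rate else g for g in chrom]

rng = random.Random(0)
population = [[0, 1, 0, 1, 0], [1, 1, 0, 0, 1], [0, 0, 0, 1, 0], [0, 1, 0, 0, 1]]
fitness = lambda c: 1 + sum(c)   # placeholder fitness, always positive
parents = select(population, fitness, k=4, rng=rng)
child1, child2 = crossover(parents[0], parents[1], rng)
print(mutate(child1, rate=0.1, rng=rng))
```

Crossover recombines large pieces of two solutions, which moves the search across the space, while a low mutation rate only perturbs a solution slightly.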
19
How does a GA apply to job scheduling?
What we have:
  A set of resource sites
  A number of jobs
Solution to generate:
  A job-to-site mapping
One solution (chromosome in GA):
  Job1   Job2   Job3   Job4   Job5
  site4  site3  site5  site2  site2
Initial population (size=200):
  Job1   Job2   Job3   Job4   Job5
  site4  site3  site5  site2  site2
  site3  site1  site5  site3  site2
  site1  site3  site4  site2  site6
  ...
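In code, such a chromosome is simply a list whose i-th entry is the site chosen for job i. The sketch below generates a random initial population of 200 chromosomes, as on the slide, and scores each one by makespan. Using makespan as the fitness, the `etc[job][site]` time table, and the assumption that jobs on one site run back to back are illustrative choices, not details confirmed by the talk.

```python
import random

def random_population(n_jobs, n_sites, size, rng):
    """Each chromosome maps job index -> site index (zero-based)."""
    return [[rng.randrange(n_sites) for _ in range(n_jobs)] for _ in range(size)]

def makespan(chrom, etc):
    """Assumed fitness: total load of the busiest site.
    etc[j][s] is a hypothetical execution time of job j on site s."""
    load = [0.0] * len(etc[0])
    for job, site in enumerate(chrom):
        load[site] += etc[job][site]
    return max(load)

# Hypothetical time table for 5 jobs on 6 sites.
etc = [[(j + 1) * (s + 1) for s in range(6)] for j in range(5)]

# The slide's sample chromosome (site4, site3, site5, site2, site2), zero-based.
sample = [3, 2, 4, 1, 1]
print(makespan(sample, etc))

population = random_population(n_jobs=5, n_sites=6, size=200, rng=random.Random(0))
best = min(population, key=lambda c: makespan(c, etc))
print(best, makespan(best, etc))
```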