Robust Network Supercomputing with Malicious Processes (Reliably Executing Tasks Upon Estimating the Number of Malicious Processes)
Kishori M. Konwar*, Sanguthevar Rajasekaran, Alexander A. Shvartsman
*Computer Science & Engineering Department, University of Connecticut, Storrs, CT

2 Motivation
- Internet supercomputing is increasingly becoming a powerful tool for harnessing massive amounts of computational resources
- High-bandwidth Internet connections are widely available
- There is an enormous number of processes around the world
- It comes at a cost substantially lower than acquiring a supercomputer or building a cluster of powerful machines

4 TASKS

6 PrimeNet Server
- PrimeNet Server is a distributed, massively parallel scientific computing Internet supercomputer
- Supported by Entropia.com; ranks among the most powerful computers in the world
- A project comprising about 30,000 PCs and laptops
- Currently sustains 22,296 billion floating point operations per second (gigaflops), i.e., operations that involve fractional numbers

7 SETI@home
- The SETI@home project is a massively distributed cooperative computer
- Used for the analysis of gigabytes of data for the Search for Extraterrestrial Intelligence (SETI)
- Comprises millions of volunteer machines around the world
- The SETI@home project reported its speed to be more than 57,290 billion floating point operations per second

8 Reliability Issues
- The master and perhaps certain workers are reliable
  - they will correctly execute the tasks assigned by the server
- However, workers are commonly unreliable
  - they may return incorrect results to the master due to unintended failures caused, e.g., by over-clocked processors
  - they may deceptively claim to have performed assigned work so as to obtain an incentive, such as a higher rank

10 Some Previous Studies
- [FGLS05] assumed that the worker processes might act maliciously and hence deliberately return wrong results
  - the goal is to design algorithms that enable the master to accept correct results with high probability at a lower cost
  - they provided a randomized algorithm
  - unfortunately, the cost complexity results depend on several parameters and are hard to interpret

11 Some Previous Studies (cont’d)
- [GM05] considered the problem of maximizing the expected number of correct results
  - the tasks are dependent
  - any worker computes correctly with probability p < 1
  - any incorrectly computed task corrupts all dependent tasks
  - the goal is to compute a schedule that maximizes the expected number of correct results under a given time constraint
  - they showed the optimization problem to be NP-hard
  - they provided some solutions on a restricted DAG

12 Overview
- Models of Computation
- Stopping Rule Algorithm based solution
- Detection of Faulty Processors
- Performing Tasks with Faulty Workers
- Conclusions

14 Models of Computation
- Processes take steps in lock step, i.e., in synchrony
- Processes communicate by exchanging messages
- The tasks are independent and idempotent
- Processes are subject to failures and can return incorrect results maliciously
- Workers $P = \{1, 2, \ldots, n\}$ and a master $M$

15 Work Complexities
- [CDS01] defines work complexity as the number of available processor steps
  - All steps taken by processes during the execution of the algorithm are counted, including the steps of idling and waiting non-faulty processes
- [DHW92] define work as the number of performed tasks, counting multiplicities
  - This approach does not charge for idling and waiting; it is called task-oriented work

16 Few Comments
- We say that an event $E$ occurs with high probability (w.h.p.) to mean that $\Pr[E] = 1 - O(n^{-\alpha})$ for some constant $\alpha > 0$.

17 Modeling Failures
- Failure model $F_a$
  - An f-fraction, $0 < f < 1/2$, of the n workers may fail
  - Each possibly faulty worker independently exhibits faulty behavior with probability $0 < p < 1/2$
  - The master has no a priori knowledge of f and p

18 Modeling Failures (cont’d)
- Failure model $F_b$
  - There is a fixed bound on the f-fraction, $0 < f < 1/2$, of the n workers that can be faulty
  - Any worker from the remaining $(1-f)$-fraction of the workers fails with probability $0 < p < 1/2$, independently of other workers
  - The master knows the values of f and p

19 Algorithmic Template
procedure for master process M, task T
  Choose a set $S \subseteq P$
  Send task T to each processor $p \in S$
  Wait for the results from the processes in S
  Decide on the result value v from the responses
procedure for worker $w \in P$
  Wait to receive a task from master M
  Upon receiving a task from M
    Execute the task
    Send the result to M
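The template above can be sketched as a small single-process simulation. This is only an illustration under stated assumptions: the names `worker` and `master`, the dictionary of faulty flags, and the decision rule (take the most common response) are all hypothetical choices, since the slide leaves the choice of S and the decision rule open.

```python
import random
from collections import Counter


def worker(task, is_faulty, p, rng):
    """Hypothetical worker: run the task, but lie with probability p if faulty."""
    result = task()
    if is_faulty and rng.random() < p:
        return ("incorrect", result)  # stand-in for any wrong answer
    return result


def master(task, workers, sample_size, p, rng):
    """Master procedure: pick S, send T, collect the results, decide on v.

    `workers` maps a worker id to True if that worker is faulty.
    The decision rule (most common response) is an assumption of this sketch.
    """
    chosen = rng.sample(sorted(workers), sample_size)  # S, a subset of P
    responses = [worker(task, workers[w], p, rng) for w in chosen]
    value, _count = Counter(responses).most_common(1)[0]
    return value


if __name__ == "__main__":
    rng = random.Random(1)
    workers = {w: (w < 3) for w in range(10)}  # workers 0..2 are faulty
    print(master(lambda: 6 * 7, workers, sample_size=7, p=0.4, rng=rng))
```

With 3 faulty workers and |S| = 7, at least 4 of the 7 responses are honest, so the most common response is the correct one in this particular setup.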

21 $(\epsilon, \delta)$-approximation algorithm
- $Z$ is a random variable distributed in the interval [0,1] with mean $\mu_Z$
- $Z_1, Z_2, Z_3, \ldots$ are independently and identically distributed according to the random variable $Z$
- An $(\epsilon, \delta)$-approximation algorithm, with $0 < \epsilon < 1$, $\delta > 0$, for estimating $\mu_Z$ satisfies $\Pr[\mu_Z(1-\epsilon) \le \hat{\mu}_Z \le \mu_Z(1+\epsilon)] > 1 - \delta$, where $\hat{\mu}_Z$ is the estimated value of $\mu_Z$

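22 Stopping Rule Algorithm [Dagum, Karp, Luby, and Ross 1995]
Input parameters: $(\epsilon, \delta)$ with $0 < \epsilon < 1$, $\delta > 0$
Let $\Upsilon_1 = 1 + (1+\epsilon)\Upsilon$   // $\lambda = 0.72$ and $\Upsilon = 4\lambda \log(2/\delta)/\epsilon^2$
Initialize $N \leftarrow 0$, $S \leftarrow 0$
While $S < \Upsilon_1$ do: $N \leftarrow N+1$, $S \leftarrow S + Z_N$
Output: $\hat{\mu}_Z \leftarrow \Upsilon_1 / N$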
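A minimal Python sketch of the stopping rule, written directly from the steps on this slide. The function name `stopping_rule` and the `sample` callback are hypothetical; the sketch assumes the logarithm is the natural logarithm and that λ = e − 2 (approximately 0.72). It also assumes the samples have positive mean, since otherwise the loop would not terminate.

```python
import math
import random


def stopping_rule(sample, eps, delta):
    """Estimate the mean of a [0, 1]-valued random variable.

    `sample` is a zero-argument callable returning one draw Z_N.
    Returns (estimate, number_of_samples_used).
    """
    lam = math.e - 2  # about 0.72 (assumed value of lambda)
    upsilon = 4 * lam * math.log(2 / delta) / eps ** 2
    upsilon1 = 1 + (1 + eps) * upsilon

    n, s = 0, 0.0
    while s < upsilon1:  # keep sampling until the running sum reaches the threshold
        n += 1
        s += sample()
    return upsilon1 / n, n


if __name__ == "__main__":
    rng = random.Random(0)
    # Bernoulli draws with an (unknown to the algorithm) mean of 0.3
    estimate, used = stopping_rule(lambda: 1.0 if rng.random() < 0.3 else 0.0,
                                   eps=0.1, delta=0.05)
    print(estimate, used)
```

Note that the number of samples is not fixed in advance: the smaller the mean, the longer the running sum takes to reach the threshold, so more samples are drawn.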

23 Stopping Rule Theorem
Theorem (Stopping Rule Theorem) [Dagum, Karp, Luby, and Ross]: Let $Z$ be a random variable in [0,1] with $\mu_Z = E[Z] > 0$. Let $\hat{\mu}_Z$ be the estimate produced and let $N_Z$ be the number of experiments that SRA runs with respect to $Z$ on input $\epsilon$ and $\delta$. Then,
(i) $\Pr[\mu_Z(1-\epsilon) \le \hat{\mu}_Z \le \mu_Z(1+\epsilon)] > 1 - \delta$,
(ii) $E[N_Z] \le \Upsilon_1/\mu_Z$, and
(iii) $\Pr[N_Z > (1+\epsilon)\Upsilon_1/\mu_Z] \le \delta/2$

24 Algorithm $A_{f,p}$ to estimate f and p

25 Work Complexity of $A_{f,p}$
Theorem: Algorithm $A_{f,p}$ is an $(\epsilon, \delta)$-approximation algorithm, $0 < \epsilon < 1$, $\delta > 0$, for the estimation of f and p with work complexity $O(\log^2 n)$, complexity $O(n \log n)$, message complexity $O(\log^2 n)$ and time complexity $O(\log n)$, with high probability.

27 Detection of Faulty Processors
- Lemma: It is not possible to perform all the n tasks correctly in the failure model $F_a$ with linear complexity (i.e., $O(n)$) with high probability.

28 Detection of Faulty Processors
procedure for master process M
  Initially, $F \leftarrow \emptyset$
  For $t = 0, \ldots, k \log n$, $k > 0$
    Choose a set $S \subseteq P \setminus F$
    Send each process $p \in S$ a “test” task
    Wait for the results from the processes in S
    If the response is faulty
      $F \leftarrow F \cup \{p : p \text{ is a faulty process}\}$
    End If
  End For
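The detection loop above can be simulated as follows. This is a sketch under simplifying assumptions, not the procedure analyzed in the talk: the function name `detect_faulty` is hypothetical, the master is assumed to know the correct answer to each "test" task, and S is taken to be every worker not yet flagged, which is simpler than whatever choice of S yields the stated work bound.

```python
import math
import random


def detect_faulty(n, faulty, p, k, rng):
    """Run about k*log2(n) test rounds and return the set of flagged workers.

    `faulty` is the hidden set of faulty worker ids; a faulty worker answers
    a test task incorrectly with probability p, an honest worker never does.
    """
    flagged = set()  # the set F on the slide
    rounds = max(1, math.ceil(k * math.log2(n)))
    for _ in range(rounds):
        # Assumption of this sketch: S is all of P \ F.
        candidates = [w for w in range(n) if w not in flagged]
        for w in candidates:
            answered_wrong = w in faulty and rng.random() < p
            if answered_wrong:
                flagged.add(w)
    return flagged


if __name__ == "__main__":
    rng = random.Random(7)
    hidden_faulty = set(range(20))
    found = detect_faulty(n=64, faulty=hidden_faulty, p=0.4, k=10, rng=rng)
    print(sorted(found))
```

Because only wrong answers cause flagging, an honest worker is never flagged; a faulty worker escapes only if it answers every one of its test tasks correctly, which becomes exponentially unlikely as the number of rounds grows.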

29 Detection of Faulty Processors
- Lemma: The algorithm detects all faulty processes among the n workers in $O(\log n)$ time with $O(n)$ work, with high probability.
- Theorem [Karp 04]: Suppose that $a(x)$ is a non-decreasing, continuous function that is strictly increasing on $\{x \mid a(x) > 0\}$, and $m(x)$ is a continuous function. Then for every positive real $x$ and every positive integer $t$, $\Pr[T(x) > u(x) + t\,a(x)] \le (m(x)/x)^t$, where $u(x)$ is the solution to the equation $u(x) = a(x) + u(m(x))$, with $m^0(x) := x$ and $m^{i+1}(x) := m(m^i(x))$.

31 Performing Tasks under $F_a$
procedure for master process M:
  Initially, $C \leftarrow \emptyset$, $J \leftarrow$ set of n tasks
  Randomly choose a set, possibly with repetition, $S \subseteq P$, $|S| = kn/\log n$ workers, where $k > 0$ is a constant
  For $i = 1, \ldots, k' \log n$, $k' > 0$
    Send to each worker $p \in S$ a “test” task
    Collect the responses from all the workers
  End For
  If all the responses from a worker $p \in S$ are correct then
    $C \leftarrow C \cup \{p\}$
  End If
  For $i = 1, \ldots, n/|C|$
    Send $|C|$ jobs from J, not sent in a previous iteration, one to each worker in C
    Collect the responses from the C workers
  End For
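The two phases of this procedure (certify a sample of workers with test tasks, then hand the real tasks to the certified workers in batches) can be sketched as a simulation. The function name `perform_tasks_fa`, the representation of tasks as zero-argument callables, and the use of a Python set for S (which drops repeated draws) are assumptions of this sketch; it also assumes the master can verify answers to test tasks.

```python
import math
import random


def perform_tasks_fa(tasks, faulty, p, k, k_prime, rng):
    """Simulate the master under failure model F_a with n workers and n tasks.

    Returns (results, certified) where results maps task index to the value
    reported for it (None stands for an incorrect result).
    """
    n = len(tasks)
    log_n = max(1.0, math.log2(n))

    def answers_correctly(w):
        # A faulty worker misbehaves on each task independently with probability p.
        return not (w in faulty and rng.random() < p)

    # Phase 1: draw about k*n/log n workers with repetition, then test each of them.
    draws = max(1, math.ceil(k * n / log_n))
    sample = {rng.randrange(n) for _ in range(draws)}
    rounds = math.ceil(k_prime * log_n)
    certified = sorted(
        w for w in sample if all(answers_correctly(w) for _ in range(rounds))
    )
    if not certified:
        raise RuntimeError("no worker passed every test task")

    # Phase 2: send |C| not-yet-sent tasks per iteration, one per certified worker.
    results = {}
    for start in range(0, n, len(certified)):
        batch = range(start, min(start + len(certified), n))
        for w, j in zip(certified, batch):
            results[j] = tasks[j]() if answers_correctly(w) else None
    return results, certified


if __name__ == "__main__":
    rng = random.Random(11)
    jobs = [lambda j=j: j * j for j in range(64)]
    out, trusted = perform_tasks_fa(jobs, set(range(20)), 0.4, 2, 10, rng)
    print(len(trusted), out[5])
```

The design point the sketch tries to make visible is that the expensive testing is applied only to a sample of roughly n / log n workers, while the n real tasks are each executed once.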

32 Work and Time Complexities Theorem: The algorithm performs all n tasks correctly in O(log n) time and has O(n) work and complexities, with high probability.

34 Performing Tasks under $F_b$
procedure for master process M
  For $t = 0, \ldots, k \log n$, $k > 0$
    Choose a random permutation $\pi \in_R S_n$
    Foreach $j \in [n]$
      Send task j to processor $\pi(j)$
    End For
    Collect the responses from all the workers
  End For
  Foreach $j \in [n]$
    Choose the majority of the results of computation for task j as the result
  End For
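The permutation-and-majority procedure above can be sketched as a simulation. The function name `perform_tasks_fb` and the task representation are hypothetical. As a pessimistic assumption, workers in the faulty set always answer incorrectly and all incorrect answers for a task coincide, so they compete with the correct value as a single bloc in the vote.

```python
import math
import random
from collections import Counter


def perform_tasks_fb(tasks, faulty, p, k, rng):
    """Simulate the master under failure model F_b and return one result per task."""
    n = len(tasks)
    rounds = max(1, math.ceil(k * math.log2(n)))
    votes = [Counter() for _ in range(n)]

    for _ in range(rounds):
        perm = list(range(n))
        rng.shuffle(perm)  # a uniformly random permutation of the workers
        for j in range(n):
            w = perm[j]  # task j goes to processor pi(j)
            correct = tasks[j]()
            # Assumption: faulty workers always lie; the rest fail with probability p.
            wrong = w in faulty or rng.random() < p
            votes[j][("incorrect", correct) if wrong else correct] += 1

    # Decide each task by the most frequent response across the rounds.
    return [votes[j].most_common(1)[0][0] for j in range(n)]


if __name__ == "__main__":
    rng = random.Random(5)
    jobs = [lambda j=j: j + 1 for j in range(64)]
    print(perform_tasks_fb(jobs, set(range(6)), 0.1, 20, rng)[:5])
```

Re-randomizing the assignment in every round means no task is stuck with the same faulty worker, so each task's votes are spread across many different workers before the majority is taken.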

35 Work and Time Complexities Theorem: The algorithm performs all n tasks correctly in O(log n) time and has and work complexities O(n log n), for 0 ½ with high probability

37 Conclusions
- Perform tasks under the above models where the tasks are dependent
- The dependency graph can be a DAG
- Quantify work and time complexities in terms of some characteristics of the DAG
