Trust-Sensitive Scheduling on the Open Grid
Jon B. Weissman, with help from Jason Sonnek and Abhishek Chandra
Department of Computer Science, University of Minnesota
Trends in HPDC Workshop, Amsterdam, 2006
Background
Public donation-based infrastructures are attractive
–positives: cheap, scalable, fault tolerant (UW-Condor, …)
–negatives: “hostile” - uncertain resource availability/connectivity, node behavior, end-user demand => best-effort service
Background
Such infrastructures have been used for throughput-based applications
–just make progress; all tasks are equal
Service applications are more challenging
–all tasks are not equal
–explicit boundaries between user requests
–may even have SLAs, QoS, etc.
Service Model
Distributed service
–request -> set of independent tasks
–each task mapped to a donated node
–metric: makespan
–e.g., BLAST service: a user request (input sequence) + a chunk of the DB form a task
BOINC + BLAST
workunit = input_sequence + chunk of DB
–workunits are generated when a request arrives
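To make the request-to-task fan-out concrete, here is a minimal sketch; the Workunit class and make_workunits helper are illustrative names invented for this example, not BOINC API calls.

# Illustrative sketch (hypothetical names): one request fans out into one workunit per DB chunk.
from dataclasses import dataclass
from typing import List

@dataclass
class Workunit:
    input_sequence: str   # the user's query sequence
    db_chunk_id: int      # which slice of the database this task searches

def make_workunits(input_sequence: str, num_db_chunks: int) -> List[Workunit]:
    # Each workunit is an independent task that can be mapped to a different donated node.
    return [Workunit(input_sequence, chunk) for chunk in range(num_db_chunks)]

# Example: one BLAST query against a database split into 8 chunks -> 8 tasks.
tasks = make_workunits("ACGT...", 8)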
The Challenge
Nodes are unreliable
–timeliness: heterogeneity, bottlenecks, …
–cheating: hacked, malicious (> 1% of SETI nodes), misconfigured
–failure
–churn
For a service, this matters
Some data – timeliness
(plots: computation heterogeneity – both across and within nodes; communication heterogeneity – both across and within nodes; measured on PlanetLab, a lower bound)
The Problem for Today
Deal with node misbehavior
Result verification
–application-specific verifiers – not general
–redundancy + voting
Most approaches assume ad-hoc replication
–under-replicate: task re-execution (higher latency)
–over-replicate: wasted resources (lower throughput)
Using information about the past behavior of a node, we can intelligently size the amount of redundancy
System Model
Problems with ad-hoc replication
(figure: task x sent to group A, task y sent to group B; groups mix reliable and unreliable nodes)
Smart Replication
Reputation
–ratings based on past interactions with clients
–simple sample-based probability (r_i) over a window
–extend to the worker group (assuming no collusion) => likelihood of correctness (LOC)
Smarter Redundancy
–variable-sized worker groups
–intuition: higher-reliability clients => smaller groups
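A minimal sketch of the sample-based rating, assuming r_i is simply the fraction of a node's recent verified results that were correct; the class name, window size, and prior for unseen nodes are illustrative choices, not details from the talk.

from collections import deque

class NodeRating:
    """Sample-based reliability estimate r_i over a sliding window of outcomes."""
    def __init__(self, window: int = 100):
        self.outcomes = deque(maxlen=window)  # True = result verified correct

    def record(self, correct: bool) -> None:
        self.outcomes.append(correct)

    @property
    def r(self) -> float:
        # Neutral prior for unseen nodes (a design choice, not from the talk).
        if not self.outcomes:
            return 0.5
        return sum(self.outcomes) / len(self.outcomes)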
Terms
LOC (Likelihood of Correctness) of a group g
–the ‘actual’ probability of getting a correct answer from the group of clients g
Target LOC
–the task success rate that the system tries to ensure while forming client groups
–related to the statistics of the underlying reliability distribution
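One plausible way to compute a group's LOC, assuming independent node failures, no collusion, and strict-majority voting over the returned results; this is a sketch consistent with the definitions above, not necessarily the exact formulation used in the system.

from itertools import combinations
from math import prod

def group_loc(ratings: list[float]) -> float:
    """Probability that a strict majority of the group returns the correct result,
    assuming node i is correct independently with probability ratings[i]."""
    n = len(ratings)
    need = n // 2 + 1  # strict majority
    loc = 0.0
    for k in range(need, n + 1):
        for correct in combinations(range(n), k):
            wrong = set(range(n)) - set(correct)
            loc += prod(ratings[i] for i in correct) * prod(1 - ratings[j] for j in wrong)
    return loc

# Example: three nodes with ratings 0.9, 0.8, 0.6 -> LOC ≈ 0.876
print(group_loc([0.9, 0.8, 0.6]))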
Trust-Sensitive Scheduling
Guiding metrics
–throughput: the number of successfully completed tasks in an interval
–success rate s: the ratio of throughput to the number of tasks attempted
Scheduling Algorithms
First-Fit
–attempt to form the first group that satisfies the target LOC
Best-Fit
–attempt to form the group that best satisfies the target LOC
Random-Fit
–attempt to form a random group that satisfies the target LOC
Fixed-Size
–randomly form fixed-size groups, ignoring client ratings
Random-Fit and Fixed-Size are our baselines
Minimum group size = 3
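A hedged sketch of First-Fit grouping under some extra assumptions: nodes are considered in decreasing order of rating, the group_loc helper from the LOC sketch above is available, and the minimum group size of 3 from this slide is enforced.

def first_fit_group(ratings_by_node: dict[str, float],
                    target_loc: float,
                    min_size: int = 3) -> list[str]:
    """Greedily add nodes (highest-rated first) until the group's LOC meets the target.
    Uses group_loc(ratings) from the earlier LOC sketch."""
    group: list[str] = []
    for node in sorted(ratings_by_node, key=ratings_by_node.get, reverse=True):
        group.append(node)
        if len(group) >= min_size and \
           group_loc([ratings_by_node[n] for n in group]) >= target_loc:
            return group
    return group  # target not reachable with the available nodes

# Example: with target LOC 0.5, three highly rated nodes suffice.
available = {"a": 0.95, "b": 0.9, "c": 0.85, "d": 0.4, "e": 0.3}
print(first_fit_group(available, target_loc=0.5))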
Different Groupings
(figure: example groupings with target LOC = 0.5)
Evaluation
Simulated a wide variety of node reliability distributions
Set the target LOC to match the success rate of Fixed-Size
–goal: match the success rate of Fixed-Size (which over-replicates) yet achieve higher throughput
–if desired, can drive throughput even higher (but success rate would suffer)
Comparison
(chart: gain in %)
Open question: how much better could we have done?
Non-stationarity
Nodes may suddenly shift gears
–deliberately malicious, virus, detach/rejoin
–underlying reliability distribution changes
Solution
–window-based rating (reduce the history from infinite to a bounded window)
Experiment: “blackout” at round 300 (30% of nodes affected)
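A small illustration of why bounding the rating window matters, reusing the NodeRating sketch from the Smart Replication slide; the specific numbers are made up for illustration.

# A node is correct for 300 rounds, then a "blackout": it starts answering wrongly.
fresh = NodeRating(window=50)        # bounded history
stale = NodeRating(window=10**9)     # effectively infinite history
for _ in range(300):
    fresh.record(True); stale.record(True)
for _ in range(50):
    fresh.record(False); stale.record(False)
print(fresh.r)   # 0.0 after 50 bad rounds: the behavior shift is detected quickly
print(stale.r)   # still ~0.86: the long correct history masks the change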
Role of the Target LOC
Key parameter
Too large
–groups will be too large (low throughput)
Too small
–groups will be too small (low success rate)
Adaptively learn it (parameterless)
–maximize throughput × s: “goodput”
–or could bias toward throughput or s
Adaptive Algorithm
Multi-objective optimization
–choose the target LOC to simultaneously maximize throughput and success rate s
–use a weighted combination to reduce the multiple objectives to a single one: maximize w1 × throughput + w2 × s
–employ hill-climbing and feedback techniques to control dynamic parameter adjustment
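A hedged sketch of the adaptive loop, assuming the single objective is w1 × throughput + w2 × s and a simple hill-climbing step on the target LOC; the step size, bounds, and measurement interface are assumptions, not details from the talk.

def adapt_target(current_target: float,
                 measure,            # callable: target -> (throughput, success_rate), e.g. one scheduling round
                 w1: float = 1.0, w2: float = 0.0,
                 step: float = 0.05, rounds: int = 20) -> float:
    """Hill-climb the target LOC to maximize w1*throughput + w2*s."""
    def objective(t: float) -> float:
        tput, s = measure(t)
        # Note: throughput is a count and s is in [0, 1]; in practice throughput
        # may need normalizing before combining (the later slide uses w1=1, w2=0).
        return w1 * tput + w2 * s

    best, best_val = current_target, objective(current_target)
    for _ in range(rounds):
        for candidate in (best - step, best + step):
            candidate = min(max(candidate, 0.0), 1.0)   # keep the target LOC in [0, 1]
            val = objective(candidate)
            if val > best_val:
                best, best_val = candidate, val
    return best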
Adapting the Target LOC
(plot: target LOC over time in the blackout example)
Throughput (w1 = 1, w2 = 0)
Current/Future Work
Implementation of a reputation-based scheduling framework (BOINC and PlanetLab)
Mechanisms to retain node identities (hence r_i) under node churn
–“node signatures” that capture the characteristics of the node
Current/Future Work (cont’d)
Timeliness
–extending reliability to encompass time
–a node whose performance is highly variable is less reliable
Client collusion
–detection: group signatures
–prevention: combine quiz-based tasks with reputation systems; form random groupings
Thank you.