Multicast Pull Scheduling Kirk Pruhs
The Big Problem [Figure: applications placed by audience size and content richness: Movie Distribution, Database Replication via Internet, Harry Potter Book Download, Software Download, Olympics, Pay-Per-View Movies, Today’s Internet.]
The Standard Centralized Unicast Pull Approach is Not Scalable: it creates unnecessary network congestion and overloads the server.
One Possible Solution: Multicast
Response Times for the 3 Different Multicast Distribution Methods [Figure: average response time versus load, from low to high, with curves for Multicast Push, Unicast Pull, and Multicast Pull.]
Appropriate Distribution Method Depends on Popularity of Data [Figure: Multicast Push, Multicast Pull, and Unicast Pull arranged by data popularity.]
Another Application: Realtime Outbreak Detection System (RODS). Developed at Pitt. Deployed in Utah for the Winter Olympics. Now collects information on 70% of doctors' visits in Utah. Since the anthrax attacks, RODS has received lots of funding.
Project Goals: (1) Build a prototype data dissemination system that uses all three basic data dissemination methods appropriately. Supported by an NSF grant from the ANIR program. Joint work with Panos Chrysanthis and Vincenzo Liberatore. (2) Study the interesting data management problems that arise in such a system. Supported by an NSF grant from the CCR program.
Middleware Architecture and Data Management Issues [Figure: middleware with a server side and a client side, shown with the Application Layer and the Multicast Transport Layer (e.g. Java Reliable Multicast). Components: Document Selection, Multicast Push Scheduling, Multicast Pull Scheduling, Unicast Pull Scheduling, Indexing, Caching.]
Rest of the Talk: Multicast Pull Scheduling [Image from Newsweek magazine.]
Simple Example Instance of Multicast Pull Scheduling [Figure: an input and a schedule for it; average response time = ( )/4.]
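One way to make the average response time objective concrete is the small sketch below. It assumes unit-sized documents and integer time slots, where a broadcast in slot t completes at time t + 1 and satisfies every request for that document that has arrived by time t. The function name, the slot model, and the sample instance are illustrative assumptions, not the instance pictured on the slide.

```python
from typing import List, Optional, Tuple


def average_response_time(requests: List[Tuple[int, str]],
                          schedule: List[Optional[str]]) -> float:
    """Average response time of a unit-sized multicast pull schedule.

    requests: (arrival slot, document) pairs, arrival >= 0.
    schedule: schedule[t] is the document broadcast in slot t (None = idle).
    """
    total = 0
    for arrival, doc in requests:
        # The request is served by the first broadcast of its document
        # that starts at or after its arrival; that broadcast ends at t + 1.
        finish = next(
            (t + 1 for t in range(arrival, len(schedule)) if schedule[t] == doc),
            None,
        )
        if finish is None:
            raise ValueError(f"request for {doc!r} at time {arrival} is never served")
        total += finish - arrival
    return total / len(requests)


# Hypothetical instance: the two requests for A at time 0 share one broadcast.
requests = [(0, "A"), (0, "A"), (0, "B"), (1, "A")]
schedule = ["A", "B", "A"]
print(average_response_time(requests, schedule))  # (1 + 1 + 2 + 2) / 4 = 1.5
```

The point of the example is that one broadcast serves every waiting request for the same document, which is what separates multicast pull from unicast pull.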
Standard Worst-case Algorithm Analysis Technique: The competitive ratio of algorithm A is max_I A(I)/Opt(I), where A(I) is the average response time on input I using algorithm A and Opt(I) is the average response time of the optimal schedule on I. For example, a 2-competitive algorithm A guarantees that it will produce a schedule with average response time at most twice the optimal average response time.
Warm-up Problem: Unit Sized Documents. Obvious algorithm? Most Requests First (MRF): broadcast the document with the most outstanding requests. Surprisingly, MRF has an unbounded competitive ratio (proof on the next slide). Moral: multicast pull scheduling is trickier than it might first appear.
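The MRF rule can be sketched as a slot-by-slot simulation. This is a minimal sketch under assumed conventions: unit-sized documents, integer arrival slots, a broadcast in slot t completing at t + 1, and ties broken alphabetically. The function name `mrf` and the sample instance are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def mrf(requests: List[Tuple[int, str]]) -> Tuple[List[Optional[str]], float]:
    """Simulate Most Requests First on unit-sized documents.

    requests: non-empty list of (arrival slot, document), arrival >= 0.
    Returns the document broadcast in each slot (None for an idle slot)
    and the resulting average response time.
    """
    if not requests or any(t < 0 for t, _ in requests):
        raise ValueError("need at least one request, all with arrival >= 0")
    arrivals: Dict[int, List[str]] = defaultdict(list)
    for t, doc in requests:
        arrivals[t].append(doc)

    pending: Dict[str, List[int]] = {}  # document -> arrival times still waiting
    schedule: List[Optional[str]] = []
    total_response = 0
    remaining = len(requests)
    t = 0
    while remaining > 0:
        for doc in arrivals.get(t, []):
            pending.setdefault(doc, []).append(t)
        if pending:
            # MRF rule: most outstanding requests; ties go to the
            # alphabetically first document (an assumption of this sketch).
            doc = max(sorted(pending), key=lambda d: len(pending[d]))
            served = pending.pop(doc)
            # One broadcast in slot t finishes at t + 1 and serves them all.
            total_response += sum(t + 1 - a for a in served)
            remaining -= len(served)
            schedule.append(doc)
        else:
            schedule.append(None)
        t += 1
    return schedule, total_response / len(requests)


# Hypothetical instance: requests for B pile up while A is being sent.
schedule, avg = mrf([(0, "A"), (0, "A"), (0, "B"), (1, "B"), (1, "B")])
print(schedule, avg)
```

On this small instance MRF sends A first and then B. The sketch only illustrates the selection rule; it does not reproduce the lower bound instance from the talk.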
Most Requests First (MRF) is not an O(1)-competitive algorithm [Figure: an input with groups of n jobs and n^2 jobs; the MRF schedule has average response time ~ n, while the optimal schedule has average response time ~ 1.]
There is no O(1)-competitive Online Server Scheduling Algorithm for Multicast Pull [Figure: an input with groups of n jobs and n^2 jobs; the online schedule has average response time ~ n, while the optimal schedule has average response time ~ 1.]
Resource Augmentation Analysis: Compare the limited (e.g. online) algorithm with more resources (e.g. a faster processor or more processors) to the optimal algorithm with fewer resources. Online algorithm A is s-speed c-competitive if max_I A_s(I)/Opt_1(I) ≤ c, where the subscript denotes processor speed. Example: a 2-speed 3-competitive algorithm equipped with a speed-2 processor guarantees an average response time at most 3 times the optimal average response time for a speed-1 processor.
Classic Server QoS Curves [Figure: average response time versus load, from low to high, for Online and Optimal, with labels Fast processor and Slow processor.] Online is not O(1)-competitive. Online is O(1)-speed O(1)-competitive.
Old Chinese Saying: Three blind shoemakers are better than one politician
Most Requests First (MRF) is not an O(1)-competitive algorithm [Figure: an input with groups of n jobs and n^2 jobs; the MRF schedule has average response time ~ n, while the optimal schedule has average response time ~ 1. Annotation: O(1)-speed.]
The Power of the Adversary in Multicast Pull Scheduling: Recall the general lower bound instance. Intuition: the adversary forces the online algorithm to labor on sequential work.
Definition of Parallel and Sequential Work [Figure: two plots, one for parallel work and one for sequential work, each showing the rate at which work is completed against the processing power devoted to the work, marked from low to high.]
Another Application Where Sequential Work Arises: Scheduling Jobs on a Multi-Processor [Figure: an input with sequential work and parallel work, and one possible optimal schedule on processors P1 and P2; average response time = ( )/3.]
The Main Result to Date (with Jeff Edmonds): A method to construct a multicast pull scheduling algorithm B from a nonclairvoyant unicast scheduling algorithm A. If algorithm A is an s-speed c-competitive algorithm when jobs have parallel and sequential components, then B is a (2+ε)s-speed c-competitive algorithm. This formalizes the surprising insight that the difficulty of multicast pull scheduling equals the difficulty of unicast scheduling of jobs with parallel and sequential components.
Constructing the Multicast Pull Algorithm B from the Unicast Algorithm A [Figure: the multicast pull input, the unicast input, Algorithm A’s unicast schedule, and Algorithm B’s multicast pull schedule.]
Equipoise Algorithm for Unicast Scheduling of Jobs with Parallel and Sequential Components: Equipoise (Round Robin) transmits each file at the same rate. Edmonds (1999) showed that the algorithm Equipoise is a (2+ε)-speed O(1 + 1/ε)-competitive algorithm.
The BEquipoise Multicast Pull Algorithm: BEquipoise broadcasts each document at a rate proportional to the number of requests to that document. The algorithm BEquipoise is a (4+ε)-speed O(1 + 1/ε)-competitive algorithm. BEquipoise will work reasonably well if the server load is below 1/4. BEquipoise is not a 2-speed O(1)-competitive algorithm.
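The proportional-rate rule can be written down in a few lines. This is a minimal sketch of the rate computation only, assuming the server's bandwidth is a single divisible resource of a given speed; the function name `bequipoise_rates` and the `speed` parameter are hypothetical, and the sketch says nothing about how the rates are realized on a real channel.

```python
from typing import Dict


def bequipoise_rates(outstanding: Dict[str, int], speed: float = 1.0) -> Dict[str, float]:
    """Split the server's speed among documents in proportion to request counts.

    outstanding: document -> number of outstanding requests for it.
    Returns document -> broadcast rate; documents with no requests are omitted.
    """
    total = sum(outstanding.values())
    if total == 0:
        return {}  # nothing is waiting, so nothing is broadcast
    return {doc: speed * count / total
            for doc, count in outstanding.items() if count > 0}


# Hypothetical state: three requests waiting for A, one for B.
rates = bequipoise_rates({"A": 3, "B": 1})
print(rates)  # {'A': 0.75, 'B': 0.25}
```

Seen per request, every outstanding request receives the same share of the bandwidth, which mirrors how Equipoise treats each job equally in the unicast setting.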
Possible O(1)-competitive (1+ε)-speed Algorithms? Unit sized files: Longest Wait First (LWF): send out the document for which the sum of the ages of the outstanding requests is maximized. Arbitrary sized files: Longest Total Stretch First (LTSF): send out the document for which the sum of the ages of the outstanding requests, divided by the file size, is maximized. These appear to be the current experimental champions (Acharya and Muthukrishnan).
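The two selection rules differ only in a division by file size, which the sketch below makes explicit. It assumes the server knows the arrival time of every outstanding request and breaks ties alphabetically; the function names, the data layout, and the sample numbers are hypothetical.

```python
from typing import Dict, List


def total_age(arrivals: List[float], now: float) -> float:
    """Sum of the ages (now - arrival) of a document's outstanding requests."""
    return sum(now - a for a in arrivals)


def lwf_pick(pending: Dict[str, List[float]], now: float) -> str:
    """Longest Wait First: the document with the largest total age.

    pending must be non-empty; ties go to the alphabetically first document.
    """
    return max(sorted(pending), key=lambda d: total_age(pending[d], now))


def ltsf_pick(pending: Dict[str, List[float]],
              sizes: Dict[str, float], now: float) -> str:
    """Longest Total Stretch First: largest total age divided by file size."""
    return max(sorted(pending), key=lambda d: total_age(pending[d], now) / sizes[d])


# Hypothetical state at time 10: three fresh requests for A, one old request for B.
pending = {"A": [9, 9, 9], "B": [2]}
sizes = {"A": 1, "B": 4}
print(lwf_pick(pending, now=10))          # total ages: A = 3, B = 8
print(ltsf_pick(pending, sizes, now=10))  # stretches: A = 3/1, B = 8/4
```

Here LWF picks B because its single request has waited longest in total, while a rule based on request counts alone would pick A. LTSF picks A once B's larger size is taken into account.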
Future Directions: Are LWF and LTSF (1 + ε)-speed O(1 + 1/ε)-competitive algorithms for multicast pull scheduling? If not, is a (1 + ε)-speed O(1 + 1/ε)-competitive algorithm possible? One possibility is to find a (1 + ε)-speed O(1 + 1/ε)-competitive algorithm for unicast scheduling of jobs with arbitrary speed-up curves, and to remove the factor of two in the speed in our reduction. Is there an O(1)-competitive polynomial-time offline algorithm for multicast pull scheduling? The problems are known to be NP-hard (Erlebach and Hall). This is open for both unit sized files and arbitrary sized files.
Every talk has to have a Dilbert cartoon.