A Server-less Architecture for Building Scalable, Reliable, and Cost-Effective Video-on-demand Systems Jack Lee Yiu-bun, Raymond Leung Wai Tak Department of Information Engineering The Chinese University of Hong Kong
Contents 1. Introduction 2. Challenges 3. Server-less Architecture 4. Performance 5. Conclusion
1. Introduction Traditional Client-server Architecture: Clients connect to the server and request streaming. Server capacity limits the system capacity. Cost increases with system scale. Server-less Architecture: Motivated by the availability of powerful user devices. Each user node contributes memory, network bandwidth, and storage to the system. Costs are shared by users.
1. Introduction Composed of clusters Each node serves as a mini server
2. Challenges Video Data Storage; Retrieval and Transmission Scheduling; Fault Tolerance; Distributed Directory Service; Heterogeneous User Nodes; System Adaptation (node joining/leaving)
3. Server-less Architecture Storage Policy Video data is divided into fixed-size blocks and then distributed among nodes in the cluster (data striping) Low storage requirement, load balanced Capable of fault tolerance using redundant blocks (discussed later)
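The storage policy above can be illustrated with a minimal sketch. It assumes a simple round-robin placement (block i goes to node i mod N); the slides only state that blocks are distributed among the nodes, so the placement rule and the name `stripe_blocks` are illustrative assumptions.

```python
def stripe_blocks(video: bytes, block_size: int, num_nodes: int) -> dict:
    """Split video into fixed-size blocks and place them round-robin.

    Round-robin placement is an assumption made for this sketch.
    """
    placement = {node: [] for node in range(num_nodes)}
    for offset in range(0, len(video), block_size):
        block_index = offset // block_size
        node = block_index % num_nodes  # assumed placement rule
        placement[node].append(video[offset:offset + block_size])
    return placement


# 20 bytes in 4-byte blocks gives 5 blocks spread over 3 nodes.
placement = stripe_blocks(bytes(range(20)), block_size=4, num_nodes=3)
blocks_per_node = {node: len(blocks) for node, blocks in placement.items()}
```

Each node stores roughly 1/N of the video, which is where the low storage requirement and the balanced load come from.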
3. Server-less Architecture Retrieval and Transmission Scheduling Round-based scheduler Retrieval scheduling in terms of macro rounds composed of GSS groups (micro rounds) Transmission lasts for one macro round
3. Server-less Architecture Fault Tolerance Recovers not only from a single node failure but also from multiple simultaneous node failures Redundancy by Forward Error Correction (FEC) code, e.g. Reed-Solomon Erasure Code (REC)
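The idea of recovering lost blocks from redundant blocks can be sketched with the simplest erasure code, a single XOR parity block. This is not Reed-Solomon: XOR parity tolerates only one lost block per group, whereas the Reed-Solomon erasure code named in the slides tolerates multiple failures. The sketch only shows the recovery principle, and the helper name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks stored on three nodes, plus one parity block on a fourth.
data_blocks = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data_blocks)

# Suppose the node holding data_blocks[1] fails: XOR the survivors with parity.
survivors = [data_blocks[0], data_blocks[2], parity]
recovered = xor_blocks(survivors)
```

With h redundant blocks from a stronger code, the same principle extends to any h simultaneous node failures.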
4. Performance Evaluation Reliability Analysis Find out the system mean time to failure (MTTF) Assuming independent node failure/repair rate Tolerate up to h failures by redundancy Analysis by Markov chain model
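A minimal sketch of such an analysis follows, assuming a birth-death Markov chain whose state is the number of failed nodes. With i nodes failed, the failure rate is assumed to be (N - i)λ and the repair rate iμ (independent failures and repairs), and the system fails on reaching h + 1 failed nodes. The exact chain used by the authors is not given in the slides, so these transition rates and the function name `system_mttf` are assumptions.

```python
def system_mttf(num_nodes, max_failures, failure_rate, repair_rate):
    """Expected time until max_failures + 1 nodes are down simultaneously.

    Uses the birth-death recurrence E_i = 1/f_i + (g_i/f_i) * E_{i-1},
    where E_i is the expected time to go from i to i + 1 failed nodes,
    f_i = (N - i) * failure_rate and g_i = i * repair_rate (assumed rates).
    Requires max_failures < num_nodes.
    """
    mttf = 0.0
    step_time = 0.0  # E_{i-1}; unused at i = 0 because g_0 = 0
    for failed in range(max_failures + 1):
        fail = (num_nodes - failed) * failure_rate
        repair = failed * repair_rate
        step_time = 1.0 / fail + (repair / fail) * step_time
        mttf += step_time
    return mttf


# Without redundancy the system fails at the first node failure: 1 / (N * rate).
no_redundancy = system_mttf(10, 0, 0.001, 0.0)
# Two nodes, one tolerated failure, unit failure and repair rates.
one_spare = system_mttf(2, 1, 1.0, 1.0)
```

Increasing `max_failures` (the h in the slide) or the repair rate lengthens the MTTF under this model.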
4. Performance Evaluation Redundancy Level Defined as the proportion of nodes serving redundant data Redundancy level required to achieve the target system MTTF versus number of nodes
4. Performance Evaluation System Response Time: the sum of the scheduling delay and the prefetch delay. Prefetch Delay: the time required to receive the first group of blocks from all nodes. It increases linearly with system scale, so it is not scalable and ultimately limits the cluster size. Solution: multiple parity groups.
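The linear growth of the prefetch delay can be made concrete with a rough sketch. It assumes that every node sends one block per round and that the nodes together deliver video data at exactly the playback bit-rate, so the first group of blocks takes (N - h) · Q · 8 / R seconds to arrive. This timing model, the parameter names, and the example figures are assumptions for illustration, not values from the slides.

```python
def prefetch_delay(num_nodes, num_redundant, block_bytes, video_bitrate_bps):
    """Seconds to receive one block from every node (assumed model).

    Assumes the aggregate rate of useful video data equals the video
    bit-rate, so one round carries (N - h) data blocks of block_bytes each.
    """
    data_nodes = num_nodes - num_redundant
    return data_nodes * block_bytes * 8 / video_bitrate_bps


# Hypothetical figures: 64 KiB blocks and a 4 Mbps video.
small = prefetch_delay(10, 2, 65536, 4_000_000)
large = prefetch_delay(20, 4, 65536, 4_000_000)
```

Under this model, doubling the cluster size at a fixed redundancy level doubles the delay, which is why the cluster size is ultimately bounded by the response-time constraint.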
4. Performance Evaluation Multiple Parity Groups Instead of a single parity group, the redundancy is encoded with multiple parity groups Playback begins after receiving the data of the first parity group
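As a minimal sketch of the grouping, the nodes of a cluster can be partitioned into parity groups of nearly equal size, each encoded independently. Contiguous partitioning and the function name `parity_groups` are assumptions; the slides do not say how nodes are assigned to groups.

```python
def parity_groups(num_nodes, num_groups):
    """Partition node ids 0..num_nodes-1 into contiguous, near-equal groups.

    Contiguous assignment is an assumption made for this sketch.
    """
    base, extra = divmod(num_nodes, num_groups)
    groups, start = [], 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)  # spread the remainder
        groups.append(list(range(start, start + size)))
        start += size
    return groups


groups = parity_groups(10, 3)
first_group_size = len(groups[0])
```

Playback then needs to wait only for the blocks of the first group (`first_group_size` nodes) rather than for blocks from all nodes in the cluster.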
4. Performance Evaluation Multiple Parity Groups Performance gain: shorten the prefetch delay Drawback: higher redundancy level to maintain the same system MTTF Tradeoff between response time and redundancy level
4. Performance Evaluation System Response Time Increases with cluster size Shortened by using multiple parity groups
4. Performance Evaluation System Dimensioning What are the system configurations if the system a. achieves an MTTF of 10,000 hours, and b. keeps the response time under a constraint of 5 seconds?
5. Conclusion Server-less Architecture Scalable: an acceptable redundancy level achieves a reasonable response time within a cluster; the system scales further by forming new autonomous clusters. Reliable: fault tolerance by redundancy; reliability comparable to that of a high-end server, according to the Markov chain analysis. Cost-Effective: costs are shared by all users.
5. Conclusion Future Work: Distributed Directory Service; Heterogeneous User Nodes; Dynamic System Adaptation (node joining/leaving, data re-distribution)
End of Presentation Thank you.