Prediction-based Prefetching to Support VCR-like Operations in Gossip-based P2P VoD Systems
Tianyin Xu, Weiwei Wang, Baoliu Ye, Wenzhong Li, Sanglu Lu, Yang Gao
Nanjing University
Dislab, NJU CS

Outline
- Background
  - P2P VoD streaming; gossip-based systems; VCR-like interactive behavior
- Motivation
- Solutions
  - System architecture; prefetching model; data scheduling; VCR-like operation support
- Performance Evaluation
- Conclusions

Background (1)
P2P media streaming
- Everyone can be a content producer/provider.
- Cache-and-relay mechanism: peers actively cache media content and relay it to other peers that request it.
- P2P live streaming is very successful: CoolStreaming (INFOCOM’05), PPLive, Joost.

Background (2)
P2P VoD streaming is challenging!
- It must provide free access to any segment of the video at any time through VCR-like operations.
- VCR-like (Video Cassette Recorder) operations: random seek, pause, fast forward/backward (FF/FB).
- The “jump” process is the most important, because most VCR-like operations can be implemented with it:
  - random seek and pause: 1 jump;
  - FF/FB: a series of jumps.

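The reduction of VCR-like operations to jumps can be illustrated with a minimal sketch. The function name `jump_targets` and its parameters are hypothetical and not taken from the talk; segments are assumed to be numbered 0 to `last_segment`, and fast forward/backward is assumed to advance by a fixed stride per jump.

```python
def jump_targets(start, stride, count, last_segment):
    """Return the segments visited when an operation is emulated by jumps.

    Assumption (not from the slides): a random seek is one jump with an
    arbitrary stride, FF is a series of jumps with a positive stride, and
    FB is a series of jumps with a negative stride.
    """
    targets = []
    position = start
    for _ in range(count):
        # Clamp each jump target to the valid segment range.
        position = max(0, min(last_segment, position + stride))
        targets.append(position)
        if position in (0, last_segment):
            break  # reached the start or the end of the video
    return targets


# Random seek: a single jump. Fast forward: three jumps of 4 segments each.
seek = jump_targets(10, 40, 1, 100)
fast_forward = jump_targets(10, 4, 3, 100)
```

Under this view, reducing the latency of a single jump reduces the latency of every operation built from it.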
Motivation (1)
How to support the “jump”?
- Optimizing the index overlay to realize fast segment relocation
  - Jump => locate-and-download process;
  - Necessary, but far from sufficient.
- Prediction-based prefetching
  - Aims for zero jump delay;
  - Proactively prefetches segments that are likely to be requested by future VCR-like operations;
  - Relies on prediction accuracy.
Question: Is the prediction feasible?

User Access Patterns (1)
- Users rarely view a movie from beginning to end.
- The total playing time of a user is quite limited and tends to be short.
- This is because some popular segments (called highlights) attract more user requests than non-popular segments.
(Brampton et al., NOSSDAV-2007; Zheng et al., P2PMMS)

User Access Patterns (2)
Probability distribution of object and segment popularity
- Log-normal distribution
- Zipf distribution
(Brampton et al., NOSSDAV-2007; Yu et al., EUROSYS)

User Access Patterns (3)
- Fast forward is more frequent than fast backward.
- Short jumps are more frequent than long jumps.
(Cheng et al., IPTPS-2007; Brampton et al., NOSSDAV-2007)

Motivation (2)
Our objective: an effective prediction-based prefetching scheme
- Effective prediction model, based on user access patterns
- Easy to integrate into current P2P VoD systems
- Practical data scheduling

System Architecture (1)
- Solution 1: let the server do prediction for each user [1]
  - Pro: the server has large volumes of user viewing logs
  - Con: poor scalability
- Solution 2: let the clients exchange user logs and do prediction [2]
  - Pro: scalable
  - Cons: (1) lack of large volumes of user logs; (2) high computing cost and training time
- Our solution
  - Server side: offline pattern mining => prediction model
  - Peer side: lightweight online prediction
[1] Huang et al., “A User-Aware Prefetching Mechanism for Video Streaming”, WWW
[2] He et al., “VOVO: VCR-Oriented Video-On-Demand in Large-Scale Peer-to-Peer Networks”, TPDS

System Architecture (2)
Take full advantage of the tracker
- The tracker has a large volume of user viewing logs;
- Every node has to contact the tracker to join the system and initialize its neighbor and partner lists.

Prediction Approach: Overview
- Frequent sequential pattern mining
  - PrefixSpan [1]: mining sequential patterns efficiently by prefix-projected pattern growth
- Splitting video segments into abstract states
- Mapping user logs to abstract states
- Construct Contingency Table (CCT)
- Model utilization
[1] Pei et al., “Mining Sequential Patterns by Pattern Growth: The PrefixSpan Approach”, TKDE

Prediction Approach (1)
Frequent Sequential Patterns

Prediction Approach (2)
The sequential patterns found may overlap.
Splitting approach
- Filter out the sub-patterns.
- Scan over the remaining sequential patterns and cut them into non-overlapping intervals, e.g. [1,7], [8,12].
- Take intervals that do not appear in the mined sequential patterns as separate intervals.
- Split the contiguous intervals into states of appropriate granularity (parameters: MIN, MAX).

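The splitting steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' algorithm: each mined pattern is simplified to an inclusive range of segment indices, overlaps are cut in order of start position, and only an upper bound on state length (the hypothetical `max_len`, standing in for MAX) is enforced; merging of too-short intervals (MIN) is left out. The input ranges are invented so that the output begins with the [1,7], [8,12] intervals shown on the slide.

```python
def split_into_states(patterns, first, last, max_len):
    """Cut (start, end) pattern ranges into non-overlapping states.

    Assumptions (not from the slides): ranges are inclusive, the video
    covers segments first..last, and max_len plays the role of MAX.
    """
    unique = set(patterns)
    # Step 1: drop sub-patterns, i.e. ranges fully contained in another range.
    kept = sorted(
        p for p in unique
        if not any(q != p and q[0] <= p[0] and p[1] <= q[1] for q in unique)
    )
    # Step 2: cut overlaps and turn uncovered gaps into separate intervals.
    intervals = []
    cursor = first
    for start, end in kept:
        if start > cursor:
            intervals.append((cursor, start - 1))  # gap not covered by any pattern
            cursor = start
        if end >= cursor:
            intervals.append((cursor, end))
            cursor = end + 1
    if cursor <= last:
        intervals.append((cursor, last))
    # Step 3: split intervals longer than max_len into smaller states.
    states = []
    for start, end in intervals:
        while end - start + 1 > max_len:
            states.append((start, start + max_len - 1))
            start += max_len
        states.append((start, end))
    return states


# Hypothetical mined ranges: (2, 4) is a sub-pattern of (1, 7).
states = split_into_states([(1, 7), (5, 12), (2, 4)], first=1, last=20, max_len=8)
```

With a smaller `max_len` the same input yields finer states, which is the granularity trade-off the MIN and MAX parameters control.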
Prediction Approach (3)
- Map raw user logs into state transitions, e.g. map to [1,6] [7,13]
- Transition table construction
  - Simple frequency counting

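A minimal sketch of the mapping and counting steps is given below. The log format (one list of visited segment indices per user session), the function names, and the third state are assumptions made for illustration; only the states [1,6] and [7,13] come from the slide. Transitions that stay within the same state are skipped here, which is a design choice of this sketch.

```python
from bisect import bisect_right
from collections import Counter


def to_state(segment, states):
    """Return the index of the state whose range contains the segment."""
    starts = [start for start, _ in states]
    return bisect_right(starts, segment) - 1


def build_transition_table(logs, states):
    """Count state-to-state transitions over all user logs."""
    table = {}
    for log in logs:
        sequence = [to_state(segment, states) for segment in log]
        for src, dst in zip(sequence, sequence[1:]):
            if src != dst:  # ignore moves inside one state
                table.setdefault(src, Counter())[dst] += 1
    return table


def predict_next(table, state, k=1):
    """Return the k most frequent successor states of the given state."""
    return [s for s, _ in table.get(state, Counter()).most_common(k)]


# Hypothetical states and logs; each log lists segments a user visited in order.
states = [(1, 6), (7, 13), (14, 20)]
logs = [[2, 9, 15], [3, 8], [5, 16]]
table = build_transition_table(logs, states)
likely = predict_next(table, 0)
```

A peer that holds such a table only needs a lookup at run time, which fits the lightweight online prediction on the peer side.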
Data Scheduling
Two-stage scheduling strategy
- Stage 1: fetch urgent segments into the playback buffer
  - Guarantees the continuity of normal playback
  - Urgent line mechanism [1]
- Stage 2: prefetch based on prediction
  - Reduces jump latency
  - Utilizes residual bandwidth
[1] Li et al., “ContinuStreaming: Achieving High Playback Continuity of Gossip-based Peer-to-Peer Streaming”, IPDPS

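One scheduling round of the two-stage strategy might look like the sketch below. It is a simplified stand-in, not the urgent line mechanism of ContinuStreaming: the urgent region is assumed to be a fixed window of segments after the playback position, and bandwidth is modeled as a per-round budget counted in segments. All names are hypothetical.

```python
def schedule_round(play_pos, urgent_window, have, predicted, budget):
    """Choose the segments to request in one scheduling round.

    Assumptions (not from the slides): `have` is the set of segments already
    buffered, `predicted` lists prefetch candidates in priority order, and
    `budget` is how many segments can be downloaded this round.
    """
    # Stage 1: missing segments close to the playback position come first.
    urgent = [s for s in range(play_pos, play_pos + urgent_window) if s not in have]
    requests = urgent[:budget]
    # Stage 2: spend whatever budget is left on predicted jump targets.
    residual = budget - len(requests)
    for segment in predicted:
        if residual == 0:
            break
        if segment not in have and segment not in requests:
            requests.append(segment)
            residual -= 1
    return requests


# Playback at segment 10; segments 10 and 11 are buffered; 4 downloads allowed.
requests = schedule_round(10, 4, {10, 11}, [40, 41, 12, 70], budget=4)
```

Because stage 2 only uses the residual budget, prefetching in this sketch never takes bandwidth away from normal playback.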
VCR-like Operation Support
The jump process caused by VCR-like operations:
- Case 1: the jump segment is already prefetched on the local peer => just play it back.
- Case 2: the jump segment is cached in a partner’s buffer => download and play back.
- Case 3: the segment is cached neither on the local peer nor by the partners => relocate, connect and download.

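The three cases amount to a simple lookup order, sketched below. The function name, the returned action labels, and the representation of buffers as sets of segment indices are assumptions for illustration only.

```python
def handle_jump(target, local_buffer, partner_buffers):
    """Decide how to serve a jump to the target segment.

    Assumption (not from the slides): buffers are sets of segment indices
    and partner_buffers maps a partner id to that partner's buffer.
    """
    # Case 1: the segment was prefetched locally, so playback starts at once.
    if target in local_buffer:
        return ("play_local", None)
    # Case 2: a current partner caches the segment, so download it from there.
    for partner, buffer in partner_buffers.items():
        if target in buffer:
            return ("download_from_partner", partner)
    # Case 3: nobody nearby has it; relocate, connect to new peers, download.
    return ("relocate", None)


partners = {"peer_a": {30, 31}, "peer_b": {50, 51}}
action = handle_jump(50, {10, 11, 40}, partners)
```

Accurate prediction moves more jumps into case 1, which is the only case with no network delay.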
Simulation Settings
- User log generation: modified GISMO [1], using a log-normal distribution so that users tend to jump around hot scenes.
- The simulation is built on a topology of 5000 peer nodes, based on the transit-stub model generated by GT-ITM.
- The streaming rate is S = 256 Kbps; the download bandwidth is randomly distributed in [1.5S, 5S].
- The default size of the playback buffer is 30 Mbytes, i.e., each peer can cache the most recent 120 seconds of the stream (100 for playback, 20 for prefetching).
- Peer arrivals follow a Poisson process with λ = 5.
[1] GISMO: A Generator of Internet Streaming Media Objects and Workloads

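The arrival and bandwidth settings could be reproduced along the following lines. This is a sketch, not the authors' simulator: the slide does not state the time unit of λ or the shape of the bandwidth distribution, so exponential inter-arrival times with rate λ per unit time and a uniform bandwidth distribution are assumed here. The function name and the seed are hypothetical.

```python
import random

STREAM_RATE_KBPS = 256  # streaming rate S from the slide


def sample_peers(n, arrival_rate=5.0, seed=0):
    """Return (arrival_time, download_bandwidth_kbps) for n peers.

    Assumptions (not from the slides): a Poisson process is simulated via
    exponential inter-arrival times, and bandwidth is uniform in [1.5S, 5S].
    """
    rng = random.Random(seed)
    now = 0.0
    peers = []
    for _ in range(n):
        now += rng.expovariate(arrival_rate)  # mean gap is 1 / arrival_rate
        bandwidth = rng.uniform(1.5 * STREAM_RATE_KBPS, 5 * STREAM_RATE_KBPS)
        peers.append((now, bandwidth))
    return peers


peers = sample_peers(5000)
```

Fixing the seed keeps runs repeatable, which helps when comparing scheduling strategies on the same workload.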
Performance Evaluation (1)
Performance Evaluation (2)
Performance Evaluation (3)
Performance Evaluation (4)

Conclusions
- A practical architecture that can be used in almost all existing P2P VoD systems
- A novel and simple prediction approach
  - State abstraction plays an important role
- A two-stage data scheduling strategy

The End