Does Internet media traffic really follow the Zipf-like distribution?

Lei Guo (1), Enhua Tan (1), Songqing Chen (2), Zhen Xiao (3), and Xiaodong Zhang (1)
(1) Ohio State University, (2) George Mason University, (3) IBM Research

Physical explanations of media access patterns

Unlike Web objects, media objects have large file sizes and long lifespans. A single-parameter Zipf-like distribution cannot characterize the access patterns of media objects well. In the stretched exponential model, parameter c characterizes the effect of media file sizes, and parameter a characterizes the non-stationary effect of media access aging.

Physical meaning of parameter c

In general, despite the different techniques and systems used for media delivery, the greater the median file size of a workload, the greater the stretch factor c of the stretched exponential model of its reference rank distribution.

Same system (KaZaa network)

  Workload    KaZaa-03   KaZaa-02
  File size   5 MB       300 MB
  c           0.14       0.45

Same delivery approach (streaming)

  Workload    IFILM-06   ST-CLT-05   HPLabs-99   CTVoD-04
  File size   2.25 MB    4.5 MB      120 MB      300 MB
  c           0.15       0.2         0.3         0.4

Similar file size, different systems/approaches

  Workload    KaZaa-03   IFILM-06    KaZaa-02   CTVoD-04
  Approach    P2P        Streaming   P2P        Streaming
  File size   5 MB       2.25 MB     300 MB (both)
  c           0.15 (both)            0.45       0.4

Physical meaning of parameter a

In a coarse time granularity, media systems often have a constant object birth rate and a constant media request rate. Consider the average number of references per object in the system, that is, the number of requests divided by the number of accessed objects, where the accessed objects include old objects born before t = 0 and new objects born after t = 0.

For media systems with a constant object birth rate, a constant request rate, and a constant median file size, the stretch factor c is a time-invariant constant. Parameter a increases gradually with time but tends to converge to a constant.

The reference rank distribution of media objects is therefore non-stationary, and parameters c and a have significant impacts on caching performance.

[Figure panels: entertainment workloads; workload of a server]

Internet media traffic: Zipf-like or not?
It is commonly agreed that Web traffic follows the Zipf-like distribution. However, existing studies on media traffic are largely workload specific because of the variety of media delivery systems and the diversity of media content, and the observed access patterns often differ from or even conflict with each other.

The stretched exponential distribution of Internet media traffic

Analyzing a wide variety of media workloads collected from different kinds of Internet media systems, we find that the reference ranks of media objects follow the stretched exponential (SE) distribution:

$$ y^c = -a \log i + b, \qquad 1 \le i \le N $$

  i : rank of media objects
  y : number of references
  N : number of objects
  c : stretch factor
  a : negative of the slope
  b : normalization factor

[Figure: y^c plotted against log i for VoD media, live media, Web media, and P2P media workloads; the fitted line has intercept b and slope -a]

The SE model is accepted by the Chi-square test, while the Zipf model is rejected.

In a short time period, most accessed objects are old.

Conclusion

Internet media access patterns follow the stretched exponential distribution. Media caching with a client-server model is far less effective than Web content caching. The stretched exponential distribution lays an analytical foundation for establishing peer-to-peer caching systems to deliver the rapidly increasing Internet media content.

Implications on media caching

Request concentration in SE workloads decreases with parameter c and increases with parameter a. Thus, for media workloads, request concentration increases with time. Long-term caching can exploit the higher request concentration, given a huge amount of storage.

Assume N objects with unit storage volume and a cache size proportional to N.

[Figure: optimal hit ratios of SE and Zipf workloads]

When concentration dominates the locality, caching of a media (SE) workload is far less efficient than caching of a Web (Zipf-like) workload.

Long-term caching

With long-term caching, request correlation can be further exploited due to the decay of object popularity.
However, it may take months to years and a huge amount of storage to achieve a significant performance improvement. With scalable storage and a huge amount of pre-existing media content held by potential users, a P2P-based caching system seems attractive.

Client-server model is not efficient

Temporal locality comes from request concentration and request correlation. For short periods such as one week, object popularity is almost stationary, so locality mainly comes from concentration.

[Figure: the increase of a slows down after a long time; panels: workload of a server, client side workload]

In existing measurements, the reported Zipf-like observations are either very rough (for example, only the head or the tail of the distribution curve follows Zipf's law) or may contain extraneous traffic such as streaming media ads, which does not reflect real user access patterns. A general model of Internet media access patterns is highly desirable for traffic engineering on the Internet and is critical for designing, benchmarking, and evaluating Internet media distribution systems.

Reported access patterns in existing measurement studies:

  Web media systems and VoD media systems
    USITS'01: Zipf-like
    NOSSDAV'02: non Zipf-like
    MMCN'00: non Zipf-like
    EUROSYS'06: Zipf-like

  Live streaming and IPTV systems
    IMW'02: Zipf-like
    IMC'04: non Zipf-like

  P2P media systems
    SOSP'03: non Zipf-like
    INFOCOM'04: Zipf-like
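The stretched exponential model can be illustrated with a small fitting sketch. It assumes the form y^c = -a log i + b suggested by the poster's legend and axis labels (y^c against log i, slope -a, intercept b). The function names, the synthetic data, and the normalization b = 1 + a log N (so that the least popular object has one reference) are assumptions made for illustration, not taken from the poster. The stretch factor c is treated as given, and a and b are estimated by ordinary least squares.

```python
import numpy as np

def se_reference_counts(n_objects, c, a):
    """Synthetic reference counts for ranks 1..N under y^c = -a*log(i) + b."""
    ranks = np.arange(1, n_objects + 1, dtype=float)
    # Assumption: normalize so that the object of rank N has y = 1.
    b = 1.0 + a * np.log(n_objects)
    counts = (b - a * np.log(ranks)) ** (1.0 / c)
    return ranks, counts

def fit_se(ranks, counts, c):
    """Least-squares estimate of (a, b) for a fixed stretch factor c."""
    slope, intercept = np.polyfit(np.log(ranks), counts ** c, 1)
    return -slope, intercept

def r_squared(x, y):
    """Goodness of a straight-line fit of y against x."""
    slope, intercept = np.polyfit(x, y, 1)
    residual = y - (slope * x + intercept)
    return 1.0 - residual.var() / y.var()

if __name__ == "__main__":
    ranks, counts = se_reference_counts(1000, c=0.3, a=0.5)
    a_hat, b_hat = fit_se(ranks, counts, c=0.3)
    print("estimated a, b:", a_hat, b_hat)
    # SE coordinates: y^c against log i. Zipf coordinates: log y against log i.
    print("R^2 in SE coordinates:  ", r_squared(np.log(ranks), counts ** 0.3))
    print("R^2 in Zipf coordinates:", r_squared(np.log(ranks), np.log(counts)))
```

On data generated from the SE form, the points fall on a straight line in SE coordinates but bend in log-log (Zipf) coordinates. This sketch only compares straight-line fits; it does not reproduce the Chi-square test mentioned in the poster.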

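The caching argument (N unit-size objects, with a cache that holds the most popular ones) can be sketched in the same way. The code below assumes a stationary workload, so the best hit ratio is simply the share of requests going to the cached objects. The parameter values, the Zipf exponent, the function names, and the normalization b = 1 + a log N are illustrative assumptions, not values from the poster.

```python
import numpy as np

def se_counts(n_objects, c, a):
    """Reference counts under y^c = -a*log(i) + b for ranks 1..N."""
    ranks = np.arange(1, n_objects + 1, dtype=float)
    b = 1.0 + a * np.log(n_objects)  # assumption: y = 1 at rank N
    return (b - a * np.log(ranks)) ** (1.0 / c)

def zipf_counts(n_objects, alpha):
    """Reference counts proportional to i^(-alpha) for ranks 1..N."""
    ranks = np.arange(1, n_objects + 1, dtype=float)
    return ranks ** (-alpha)

def optimal_hit_ratio(counts, cache_fraction):
    """Share of requests served when the cache holds the most popular objects."""
    counts = np.sort(np.asarray(counts, dtype=float))[::-1]
    cache_slots = int(round(cache_fraction * counts.size))
    return counts[:cache_slots].sum() / counts.sum()

if __name__ == "__main__":
    n = 10000
    workloads = {
        "SE (c=0.2, a=0.5)": se_counts(n, c=0.2, a=0.5),
        "SE (c=0.4, a=0.5)": se_counts(n, c=0.4, a=0.5),
        "Zipf (alpha=0.8)": zipf_counts(n, alpha=0.8),
    }
    for name, counts in workloads.items():
        ratios = [optimal_hit_ratio(counts, f) for f in (0.01, 0.05, 0.10)]
        print(name, [round(r, 3) for r in ratios])
```

Varying c and a in this sketch shows the direction of the effect described above: with the other parameter held fixed, the hit ratio at a given cache size falls as c grows and rises as a grows.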
