An Analysis of Facebook Photo Caching, by Huang et al., SOSP 2013
Presented by Phuong Nguyen
Some animations and figures are borrowed from the original paper and presentation.
Photos on Facebook: Overview Profile Feed Album billion photos, as of Sep 2013
Photos on Facebook: Overview: Storage Backend, FB Cache Layers, Akamai CDN, Full-stack Study
FACEBOOK PHOTO CACHING: HOW DOES IT WORK?
Client-based Browser Cache (diagram: Client -> Browser Cache, local fetch)
Geo-distributed Edge Cache (FIFO) (diagram: Client / Browser Cache (millions) -> Edge Cache at PoPs (tens))
Single Global Origin Cache (FIFO) (diagram: Client / Browser Cache (millions) -> Edge Cache at PoPs (tens) -> Origin Cache in data centers (four); requests reach the Origin Cache via Hash(url))
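The Hash(url) routing into the Origin Cache can be illustrated with a minimal sketch. It assumes simple modulo hashing over a fixed server list; the slides do not say which hash function or placement scheme is actually used, and the server names and the function `pick_origin_server` are hypothetical.

```python
import hashlib

# Hypothetical server names; the slide only states that the Origin Cache
# spans four data centers.
ORIGIN_SERVERS = ["origin-0", "origin-1", "origin-2", "origin-3"]


def pick_origin_server(url, servers=ORIGIN_SERVERS):
    """Map a photo URL to one Origin server by hashing the URL.

    Assumption: plain modulo hashing. The real system may use a
    different scheme (for example consistent hashing).
    """
    digest = hashlib.md5(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    # The same URL always lands on the same server, so the Origin layer
    # behaves like one logical cache regardless of which Edge forwards it.
    print(pick_origin_server("https://example.com/photo/123.jpg"))
```

Because the mapping depends only on the URL, every Edge Cache forwards a miss for a given photo to the same Origin server, which is what lets the Origin layer act as a single global cache.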
Haystack Backend (diagram: Client / Browser Cache (millions) -> Edge Cache at PoPs (tens) -> Origin Cache in data centers (four) -> Backend (Haystack))
FULL-STACK CACHE STUDY: DATA COLLECTION
Trace Collection
Objective: collect a representative sample that permits correlation of events related to the same request
Instrumentation scope (diagram): Browser Cache (client), Edge Cache (PoP), Origin Cache (data center), Backend (Haystack)
Sampling Strategies
- Request-based: sample requests randomly; biased toward popular content
- Object-based: focus on a subset of photos selected by a deterministic test on photoId; gives fair coverage of unpopular photos and enables cross-stack analysis
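A deterministic test on photoId could look like the sketch below. It is an illustration only: the hash function, the bucket count, and the names `is_sampled` and `sample_rate_percent` are assumptions, not details taken from the study.

```python
import hashlib


def is_sampled(photo_id, sample_rate_percent=10):
    """Decide whether a photo belongs to the sampled subset.

    The decision depends only on the photo id, so every layer of the
    stack (browser, Edge, Origin, backend) makes the same choice and all
    requests for a sampled photo are logged, popular or not.
    """
    digest = hashlib.sha1(str(photo_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return bucket < sample_rate_percent


if __name__ == "__main__":
    ids = [str(i) for i in range(1000)]
    sampled = [pid for pid in ids if is_sampled(pid)]
    print(len(sampled), "of", len(ids), "photo ids fall in the sample")
```

Since the test is a pure function of the id, logs collected independently at different layers can later be joined on the photo, which request-based random sampling does not guarantee.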
WORKLOAD ANALYSIS
Analysis Objectives
- Traffic sheltering effects of caches
- Photo popularity distribution
- Geographic traffic distribution & collaborative caching
- Can we make the cache better? Impact of sizes & algorithm
- Could we know which photos to cache?
ANALYSIS: TRAFFIC SHELTERING
Traffic Sheltering
Layer                        Requests arriving   Hit ratio   Traffic share
Browser Cache (client)       77.2M               65.5%       65.5%
Edge Cache (PoP)             26.6M               58.0%       20.0%
Origin Cache (data center)   11.2M               31.8%       4.6%
Backend (Haystack)           7.6M                -           9.9%
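The numbers on this slide are mutually consistent, which can be checked with a short calculation: each layer passes on only its misses, and a layer's traffic share is the fraction of all client requests it ends up serving. The function name `shelter` is illustrative.

```python
def shelter(total_requests, hit_ratios):
    """Propagate requests through a stack of caches.

    Returns (arriving, shares): the number of requests arriving at each
    layer (caches first, backend last) and the fraction of all client
    requests served by each layer.
    """
    arriving = []
    shares = []
    remaining = float(total_requests)
    for ratio in hit_ratios:
        arriving.append(remaining)
        hits = remaining * ratio
        shares.append(hits / total_requests)
        remaining -= hits  # only misses travel to the next layer
    # Whatever is left is served by the backend.
    arriving.append(remaining)
    shares.append(remaining / total_requests)
    return arriving, shares


if __name__ == "__main__":
    layers = ["Browser", "Edge", "Origin", "Backend"]
    arriving, shares = shelter(77.2e6, [0.655, 0.580, 0.318])
    for name, count, share in zip(layers, arriving, shares):
        print(f"{name:8s} arriving={count / 1e6:5.1f}M share={share:6.1%}")
```

Starting from 77.2M requests and the three hit ratios, the computed values round to the per-layer request counts and traffic shares shown on the slide.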
ANALYSIS: PHOTO POPULARITY IMPACT
Popularity Distribution: skewness is reduced after each layer of cache
Popularity Impact on Caches
ANALYSIS: GEOGRAPHIC TRAFFIC DISTRIBUTION & COLLABORATIVE CACHING
Substantial Remote Traffic at Edge
Share of requests served locally, by city: Atlanta 20%, Miami 35%, Dallas 50%, Chicago 60%, LA 18%, NYC 35%
Substantial Remote Traffic at Edge
Where Atlanta's requests are served: Atlanta (local) 20%, D.C. 35%, Miami 20%, Chicago 10%, Dallas 5%, NYC 5%, California 5%
80% of Atlanta's requests are served by remote Edges
Collaborative Edge
Impact of Using Collaborative Edge
Collaborative Edge increases hit ratio by 18%
ANALYSIS: IMPACTS OF CACHE SIZE & ALGORITHM
Potential Improvement Study
Methodology: cache simulation
- Replay the trace (first 25% as warm-up)
- Evaluate using the remaining 75%
Improvement factors: cache size, caching algorithm
Evaluation metric: hit ratio
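The simulation methodology can be sketched as a small trace replayer. This is a simplified illustration, not the study's simulator: capacity is counted in objects rather than bytes, only FIFO and LRU are modelled, and the function name `simulate` and its parameters are assumptions.

```python
from collections import OrderedDict


def simulate(trace, capacity, policy="fifo", warmup_fraction=0.25):
    """Replay a request trace and return the hit ratio after warm-up.

    trace: sequence of object keys (for example photo URLs).
    capacity: maximum number of cached objects (assumption: the slide
        does not say whether sizes are in bytes or objects).
    policy: "fifo" or "lru".
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if policy not in ("fifo", "lru"):
        raise ValueError("policy must be 'fifo' or 'lru'")

    cache = OrderedDict()  # oldest entry first
    warmup_end = int(len(trace) * warmup_fraction)
    hits = 0
    evaluated = 0

    for i, key in enumerate(trace):
        hit = key in cache
        if hit:
            if policy == "lru":
                cache.move_to_end(key)  # refresh recency on a hit
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the oldest entry
            cache[key] = True
        # Requests in the warm-up prefix fill the cache but are not scored.
        if i >= warmup_end:
            evaluated += 1
            hits += 1 if hit else 0

    return hits / evaluated if evaluated else 0.0


if __name__ == "__main__":
    trace = ["a", "b", "a", "c", "a", "d", "a", "e"]
    for policy in ("fifo", "lru"):
        print(policy, simulate(trace, capacity=2, policy=policy))
```

Running the same trace with different `capacity` and `policy` values gives the two improvement factors named above, with hit ratio as the single metric.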
Edge Cache with Different Sizes & Algorithms (figure label: Infinite Cache)
The same hit ratio can be achieved with a smaller cache and a higher-performing algorithm
Edge Cache with Different Sizes & Algorithms (figure label: Infinite Cache)
A more sophisticated algorithm can achieve a better hit ratio with the same cache size
ANALYSIS: WHICH PHOTOS TO CACHE?
Intuitions
Properties intuitively associated with photo traffic:
- The age of photos
- The number of Facebook followers associated with the owner
Content Age Effect
- Fresh content is popular and tends to be effectively cached throughout the hierarchy
- An age-based cache replacement algorithm could be effective
Social Effect
- The more popular a photo's owner is, the more likely the photo is to be accessed
- Browser caches tend to have lower hit ratios for popular users (“viral” effect)
DISCUSSIONS
Discussions
Evaluation method:
- Only desktop clients are considered; mobile clients are excluded
- Trends by mobility of users
- Sampling: object-based sampling might not represent a realistic workload
- Impact of caching done by Akamai CDN
- The method for correlating requests is not perfect
Latency issue:
- Evaluation mainly focuses on hit ratio & traffic sheltering, not latency
- Latency of collaborative caching is not evaluated
Discussions (cont.)
Other potential improvements:
- Improved caching algorithm that takes photo metadata into account
- Optimal placement of resizing functionality along the stack
- Clairvoyant caching might be possible based on predicting future accesses (e.g., photos from the same album, photos appearing on the news feed)
- Address geographical diversity by improving the routing policy (e.g., put more weight on locality)
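For reference, Clairvoyant caching with perfect knowledge of the future can be sketched as follows. This is a simplified variant that always admits the requested object and evicts the cached object whose next access lies furthest in the future; capacity is counted in objects, and the function name `clairvoyant_hit_ratio` is illustrative. A deployable version would have to replace the exact `next_use` values with predictions such as the album or news feed signals mentioned above.

```python
def clairvoyant_hit_ratio(trace, capacity):
    """Hit ratio of a furthest-next-use eviction policy on a known trace."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")

    # next_use[i] = index of the next request for trace[i], or infinity.
    next_use = [float("inf")] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache = {}  # key -> index of that key's next access
    hits = 0
    for i, key in enumerate(trace):
        if key in cache:
            hits += 1
        elif len(cache) >= capacity:
            # Evict the object that will be needed furthest in the future.
            victim = max(cache, key=cache.get)
            del cache[victim]
        cache[key] = next_use[i]

    return hits / len(trace) if trace else 0.0


if __name__ == "__main__":
    trace = ["a", "b", "a", "c", "a", "d", "a", "e"]
    print(clairvoyant_hit_ratio(trace, capacity=2))
```

Replaying a trace through this function gives an upper reference point against which practical algorithms can be compared at the same cache size.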
THANK YOU!