An Analysis of Facebook Photo Caching, by Huang et al., SOSP 2013. Presented by Phuong Nguyen. Some animations and figures are borrowed from the original paper and presentation.

Presentation transcript:

Photos on Facebook: Overview. Profile, Feed, Album. … billion photos, as of Sep 2013.

Photos on Facebook: Overview. Akamai CDN; FB Cache Layers and Storage Backend (Full-stack Study).

FACEBOOK PHOTO CACHING: HOW DOES IT WORK?

Client-based Browser Cache. The client first checks its own browser cache (local fetch).

Geo-distributed Edge Cache (FIFO). Client (Millions): Browser Cache. PoP (Tens): Edge Cache.

Single Global Origin Cache (FIFO). Client (Millions): Browser Cache. PoP (Tens): Edge Cache. Data Center (Four): Origin Cache, with requests routed by Hash(url).
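The Hash(url) label on this slide suggests that each photo URL is mapped to an Origin server by hashing. A minimal sketch of that idea, assuming a plain modulo scheme; the server names and the choice of SHA-256 are hypothetical, since the slide does not give the actual hash function or server layout:

```python
import hashlib

# Hypothetical server names; the slide only says the Origin Cache
# spans four data centers.
ORIGIN_SERVERS = ["origin-a", "origin-b", "origin-c", "origin-d"]


def pick_origin_server(url: str, servers=ORIGIN_SERVERS) -> str:
    """Map a photo URL to one server with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Because the mapping depends only on the URL, every Edge cache sends a miss for the same photo to the same Origin server, which is what lets the Origin layer behave as one global cache.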

Haystack Backend. Client (Millions): Browser Cache. PoP (Tens): Edge Cache. Data Center (Four): Origin Cache and Backend (Haystack).

FULL-STACK CACHE STUDY: DATA COLLECTION

Trace Collection. Objective: collect a representative sample that permits correlation of events related to the same request. Instrumentation scope: Browser Cache (Client), Edge Cache (PoP), Origin Cache and Backend (Haystack) (Data Center).

Sampling Strategies. Request-based: sample requests randomly; biased toward popular content. Object-based: focus on a subset of photos selected by a deterministic test on photoId; gives fair coverage of unpopular photos and enables cross-stack analysis.
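Object-based sampling can be sketched as a hash test on the photo identifier. The function name, the SHA-256 hash, and the 10% default rate below are assumptions for illustration; the slide only says the test on photoId is deterministic:

```python
import hashlib


def is_sampled(photo_id: str, sample_percent: int = 10) -> bool:
    """Deterministic test on photoId: a photo is either always in the
    sample or never in it, regardless of which layer logs the request."""
    digest = hashlib.sha256(photo_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return bucket < sample_percent


def sample_trace(requests, sample_percent: int = 10):
    """Keep every request whose photo falls in the sampled subset."""
    return [r for r in requests if is_sampled(r, sample_percent)]
```

Since the decision depends only on the photoId, the browser, Edge, Origin, and Backend logs all keep the same photos, so events for one photo can be lined up across layers. A photo's popularity does not change its chance of being selected, unlike request-based sampling.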

WORKLOAD ANALYSIS

Analysis Objectives. Traffic sheltering effects of caches. Photo popularity distribution. Geographic traffic distribution & collaborative caching. Can we make the cache better? Impact of sizes & algorithm. Could we know which photos to cache?

ANALYSIS: TRAFFIC SHELTERING

Traffic Sheltering. Requests arriving at each layer: Browser Cache (Client) 77.2M, Edge Cache (PoP) 26.6M, Origin Cache (Data Center) 11.2M, Backend (Haystack) 7.6M. Hit ratios: Browser 65.5%, Edge 58.0%, Origin 31.8%. Traffic share: Browser 65.5%, Edge 20.0%, Origin 4.6%, Backend 9.9%.
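The hit ratios and traffic shares follow from the request counts at each layer. A small sketch of the arithmetic, using the rounded counts from the slide (so the results match the slide's percentages only to within a few tenths of a point); the function and key names are illustrative:

```python
# Requests (in millions) arriving at each layer, in stack order.
ARRIVALS = {"browser": 77.2, "edge": 26.6, "origin": 11.2, "backend": 7.6}


def sheltering(arrivals):
    """Return per-layer hit ratio and share of total client traffic served."""
    layers = list(arrivals)
    total = arrivals[layers[0]]
    stats = {}
    for layer, next_layer in zip(layers, layers[1:]):
        # Requests served here are those that do not reach the next layer.
        served = arrivals[layer] - arrivals[next_layer]
        stats[layer] = {
            "hit_ratio": served / arrivals[layer],
            "traffic_share": served / total,
        }
    last = layers[-1]
    # The backend serves everything that reaches it.
    stats[last] = {"hit_ratio": 1.0, "traffic_share": arrivals[last] / total}
    return stats


if __name__ == "__main__":
    for layer, s in sheltering(ARRIVALS).items():
        print(f"{layer}: hit {s['hit_ratio']:.1%}, share {s['traffic_share']:.1%}")
```

The hit ratio is measured against the requests that reach a layer, while the traffic share is measured against all client requests. That is why the Edge has a 58.0% hit ratio but serves only 20.0% of total traffic.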

ANALYSIS: PHOTO POPULARITY IMPACT

Popularity Distribution. Skewness is reduced after each layer of cache.

Popularity Impact on Caches

ANALYSIS: GEOGRAPHIC TRAFFIC DISTRIBUTION & COLLABORATIVE CACHING

Substantial Remote Traffic at Edge. Share of requests served by the local Edge: Atlanta 20%, Miami 35%, Dallas 50%, Chicago 60%, LA 18%, NYC 35%.

Substantial Remote Traffic at Edge. Atlanta: 20% local, 5% Dallas, 35% D.C., 5% NYC, 20% Miami, 5% California, 10% Chicago. 80% of Atlanta's requests are served by remote Edges.

Collaborative Edge

Impact of Using Collaborative Edge. A collaborative Edge increases hit ratio by 18%.

ANALYSIS: IMPACTS OF CACHE SIZE & ALGORITHM

Potential Improvement Study. Methodology: cache simulation. Replay the trace, using the first 25% as warm-up and evaluating on the remaining 75%. Improvement factors: cache size, caching algorithm. Evaluation metric: hit ratio.
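The replay methodology can be sketched as a small trace-driven simulator. This is an illustration under stated assumptions, not the study's simulator: capacity is counted in objects rather than bytes, the trace is a list of photo identifiers, and LRU is included only as one example of an alternative to the FIFO policy named on the earlier slides:

```python
from collections import OrderedDict


def simulate(trace, capacity, policy="FIFO", warmup_fraction=0.25):
    """Replay a trace and return the hit ratio measured after warm-up."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache = OrderedDict()  # oldest (FIFO) or least recently used (LRU) first
    warmup = int(len(trace) * warmup_fraction)
    hits = evaluated = 0
    for i, key in enumerate(trace):
        hit = key in cache
        if i >= warmup:  # only the remaining 75% counts toward the metric
            evaluated += 1
            hits += hit
        if hit:
            if policy == "LRU":
                cache.move_to_end(key)  # refresh recency on a hit
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the entry at the front
            cache[key] = True
    return hits / evaluated if evaluated else 0.0
```

Running the same trace with several capacities and policies gives the hit-ratio curves that the following slides compare. A capacity larger than the number of distinct photos plays the role of the infinite cache.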

Edge Cache with Different Sizes & Algorithms. Reference line: Infinite Cache. The same hit ratio can be achieved with a smaller cache and a higher-performing algorithm.

Edge Cache with Different Sizes & Algorithms. Reference line: Infinite Cache. A more sophisticated algorithm can achieve a better hit ratio with the same cache size.

ANALYSIS: WHICH PHOTOS TO CACHE?

Intuitions. Properties intuitively associated with photo traffic: the age of photos; the number of Facebook followers associated with the owner.

Content Age Effect. Fresh content is popular and tends to be effectively cached throughout the hierarchy. An age-based cache replacement algorithm could be effective.

Social Effect. The more popular a photo's owner is, the more likely the photo is to be accessed. Browser caches tend to have lower hit ratios for popular users (“viral” effect).

DISCUSSIONS

Discussions. Evaluation method: only desktop clients are considered, excluding mobile clients; trends due to user mobility. Sampling: object-based sampling might not represent a realistic workload. Impact of caching done by the Akamai CDN. The method for correlating requests is not perfect. Latency issue: evaluation mainly focuses on hit ratio & traffic sheltering, not latency; latency of collaborative caching is not evaluated.

Discussions (cont.). Other potential improvements: an improved caching algorithm that takes photo metadata into account; optimal placement of resizing functionality along the stack; Clairvoyant caching might be possible by predicting future accesses (e.g., photos from the same album, photos that appear on the news feed); address geographical diversity by improving the routing policy (e.g., put more weight on locality).
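Clairvoyant caching is usually understood as Belady's offline policy: on a miss with a full cache, evict the object whose next access lies farthest in the future. The sketch below assumes the whole trace is known in advance and that all objects have unit size; a practical system would have to replace the exact next-access positions with predictions, as the slide suggests:

```python
def clairvoyant_hit_ratio(trace, capacity):
    """Hit ratio of Belady-style eviction with full knowledge of the trace."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    never = float("inf")
    # next_use[i] = position of the next access to trace[i], or infinity.
    next_use = [never] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], never)
        last_seen[trace[i]] = i

    cache = {}  # key -> position of its next access
    hits = 0
    for i, key in enumerate(trace):
        if key in cache:
            hits += 1
        elif len(cache) >= capacity:
            # Evict the cached object that is needed farthest in the future.
            victim = max(cache, key=cache.get)
            del cache[victim]
        cache[key] = next_use[i]  # insert or refresh the next-access position
    return hits / len(trace) if trace else 0.0
```

This gives an upper bound for demand-filled caches of the same capacity on the same trace, so it is useful as a yardstick when judging how much room a better online algorithm has left.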

THANK YOU!