
CSE 486/586 Distributed Systems Web Content Distribution --- 2 Case Study: Facebook Haystack. Steve Ko, Computer Sciences and Engineering, University at Buffalo.

Recap. Engineering principle: make the common case fast, and rare cases correct (from the Patterson & Hennessy books). This principle cuts through generations of systems. Example? The CPU cache. Knowing the common cases means understanding your workload, e.g., is it read dominated? Write dominated? Mixed?

Recap. "Hot" vs. "very warm" vs. "warm" photos. Hot: popular, a lot of views. Very warm: somewhat popular, still a lot of views. Warm: unpopular, but still a lot of views in aggregate. (Figure: popularity vs. items sorted by popularity.)
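
A minimal sketch of this kind of heavy-tailed popularity, assuming a Zipf-like view distribution (the exponent, item count, and view totals below are illustrative, not Facebook's numbers): a tiny "hot" head gets a large share of views, while the long "warm" tail still adds up to a sizable share in aggregate.

```python
# Minimal sketch: a Zipf-like view distribution over N photos.
# The exponent s and the totals are illustrative, not Facebook's numbers.
def zipf_views(n_items=1_000_000, total_views=1_000_000_000, s=1.0):
    weights = [1.0 / (rank ** s) for rank in range(1, n_items + 1)]
    norm = sum(weights)
    return [total_views * w / norm for w in weights]

views = zipf_views()
total = sum(views)
print(f"top 1,000 'hot' photos:     {sum(views[:1000]) / total:.1%} of views")
print(f"tail 900,000 'warm' photos: {sum(views[100_000:]) / total:.1%} of views")
```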

"Very warm" and "warm" photos. Hot photos are served by a CDN. Warm photo characteristics: not very popular, not entirely "cold" (i.e., occasional views), but a lot in aggregate. Facebook does not want to cache everything in the CDN due to diminishing returns. Facebook stats (from their 2010 paper): 260 billion images (~20 PB), 1 billion new photos per week (~60 TB), one million image views per second at peak. Approximately 10% of views are not served by the CDN, which is still a lot of traffic.
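
A quick back-of-envelope check on these numbers, as a sketch (the per-image figures are simple averages derived from the stats above, not numbers stated in the paper):

```python
# Back-of-envelope arithmetic using the 2010 numbers quoted above.
total_images = 260e9          # 260 billion images
total_bytes = 20e15           # ~20 PB
weekly_photos = 1e9           # 1 billion new photos per week
weekly_bytes = 60e12          # ~60 TB per week
peak_views_per_sec = 1e6      # one million image views per second at peak

print(f"avg stored size per image: {total_bytes / total_images / 1e3:.0f} KB")
print(f"avg size per new photo:    {weekly_bytes / weekly_photos / 1e3:.0f} KB")
print(f"non-CDN views at peak:     {0.10 * peak_views_per_sec:,.0f} per second")
```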

Popularity Comes with Age

Facebook Photo Storage. Three generations of photo storage: NFS-based (today), Haystack (today) for very warm photos, and f4 (next time) for warm photos. Common characteristic: all of them are after-CDN storage. Each generation solves a particular problem observed in the previous generation.

1st Generation: NFS-Based. The browser sends an HTTP request, and the web server returns a CDN URL (Akamai). If the CDN has the photo, it returns it; if not, it retrieves the photo from the photo store.
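
A minimal sketch of that request path, assuming the CDN simply falls back to the photo store on a miss (the names and stand-in dictionaries below are illustrative, not real APIs):

```python
# Illustrative control flow for the NFS-based generation; cdn_cache and
# photo_store are stand-in dictionaries, not real services.
cdn_cache = {}
photo_store = {"p1.jpg": b"<jpeg bytes>"}   # stand-in for the NFS photo store

def web_server(photo_id):
    # The web server does not serve image bytes; it hands back a CDN URL.
    return f"http://cdn.example.com/{photo_id}"

def cdn_fetch(photo_id):
    if photo_id in cdn_cache:               # CDN hit: serve from the edge
        return cdn_cache[photo_id]
    data = photo_store[photo_id]            # CDN miss: pull from the photo store
    cdn_cache[photo_id] = data
    return data

url = web_server("p1.jpg")                  # browser gets the CDN URL...
data = cdn_fetch("p1.jpg")                  # ...then fetches the photo via the CDN
```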

1st Generation: NFS-Based. Each photo is stored as a single file. Observed problem: thousands of files in each directory, which is extremely inefficient due to metadata management; roughly 10 disk operations are needed for a single image: chained filesystem inode reads for its directories and for the file itself, plus the file data read. This is in fact a well-known problem with many files in a directory, so be aware when you do this. Each file needs its own inode (128 or 256 bytes), so the metadata quickly outgrows what can be cached in memory, and a lot of operations are necessary just for metadata retrieval.
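
One way to see where the ~10 disk operations come from, as a rough sketch: with a cold metadata cache, each path component can cost an inode read plus a directory-data read, and then the file's own inode and data must be read too (the path layout below is hypothetical):

```python
# Rough count of disk reads to serve one photo with a cold metadata cache,
# assuming each path component costs an inode read plus a directory-data read.
# The path layout is hypothetical.
path = "/photos/albums/12345/67890/photo_987654321.jpg"
components = path.strip("/").split("/")

dir_reads = 2 * (len(components) - 1)   # inode + data block per directory
file_reads = 1 + 1                      # the file's inode + the image data itself
print(dir_reads + file_reads, "disk reads in the worst case")   # ~10
```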

CSE 486/586 Administrivia PA4 due 5/6 (Friday) Final: Thursday, 5/12, 8am – 11am at Knox 20

2nd Generation: Haystack. A custom-designed photo storage. What would you try? (Hint: too many files!) Starting point: one big file containing many photos, which reduces the number of disk operations required per read to one; all metadata management is done in memory. Design focus: simplicity, i.e., something buildable within a few months. Three components: Directory, Cache, and Store.
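
A minimal sketch of that starting point, assuming a single append-only volume file and an in-memory map from photo id to offset (the class and on-disk layout here are illustrative, not Haystack's actual needle format):

```python
import os

# Minimal sketch of "one big file, metadata in memory".
class BigFileStore:
    def __init__(self, path):
        self.f = open(path, "a+b")      # single large append-only file
        self.offsets = {}               # photo_id -> (offset, size), kept in RAM

    def write(self, photo_id, data):
        self.f.seek(0, os.SEEK_END)
        self.offsets[photo_id] = (self.f.tell(), len(data))
        self.f.write(data)              # append-only
        self.f.flush()

    def read(self, photo_id):
        offset, size = self.offsets[photo_id]   # metadata lookup: memory only
        self.f.seek(offset)
        return self.f.read(size)                # exactly one disk read

store = BigFileStore("volume.dat")
store.write("photo-1", b"<jpeg bytes>")
assert store.read("photo-1") == b"<jpeg bytes>"
```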

Haystack Architecture. The browser sends an HTTP request; the Directory is consulted to look up photos, and the web server constructs the photo URLs. Depending on the URL, the browser contacts either the CDN or the Cache.

Haystack Directory. Helps with URL construction for an image: http://⟨CDN⟩/⟨Cache⟩/⟨Machine id⟩/⟨Logical volume, Photo⟩. Staged lookup: the CDN strips out its portion, the Cache strips out its portion, and then the machine strips out its portion. Logical & physical volumes: a logical volume is replicated as multiple physical volumes, the physical volumes are what actually get stored, and each volume contains multiple photos. The Directory maintains this logical-to-physical mapping.
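
A sketch of the URL scheme and the staged stripping described above; the URL pattern comes from this slide, while the helper names and example values are assumptions:

```python
# http://<CDN>/<Cache>/<Machine id>/<Logical volume, Photo>
# Each stage strips its own component and forwards the rest downstream.

def build_photo_url(cdn, cache, machine_id, logical_volume, photo_id):
    # Constructed by the web server using the mapping kept by the Directory.
    return f"http://{cdn}/{cache}/{machine_id}/{logical_volume},{photo_id}"

url = build_photo_url("cdn.example.com", "cache-7", "machine-42", "lv-123", "987654")
path = url[len("http://"):]
for stage in ("CDN", "Cache", "Machine"):
    component, path = path.split("/", 1)       # this stage strips its portion
    print(f"{stage} handled '{component}', forwards '{path}'")
# The Store machine is left with "lv-123,987654": logical volume + photo id.
```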

Haystack Cache. A Facebook-operated CDN built as a DHT, with photo IDs as the key. It further removes traffic to the Store and mainly caches newly uploaded photos, which yields a high cache hit rate (new photos attract most of the views).
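
A minimal sketch of DHT-style routing keyed by photo id, assuming simple consistent hashing over the cache machines (the ring construction and node names are assumptions; the slide only says the Cache is a DHT with photo IDs as keys):

```python
import hashlib
from bisect import bisect_right

# Map each photo id to a cache machine with consistent hashing (illustrative).
NODES = ["cache-a", "cache-b", "cache-c"]

def _h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

ring = sorted((_h(f"{node}#{i}"), node) for node in NODES for i in range(100))
ring_keys = [k for k, _ in ring]

def cache_node_for(photo_id):
    idx = bisect_right(ring_keys, _h(photo_id)) % len(ring)
    return ring[idx][1]

print(cache_node_for("987654"))   # the cache machine responsible for this photo
```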

Haystack Store. Maintains physical volumes; one volume is a single large file (~100 GB) containing many photos, called needles. Each needle carries: a cookie (a random number embedded in the URL), a key (the photo id), an alternate key (the photo size, e.g., small, medium), flags (e.g., delete status), and padding for 8-byte alignment.
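
A sketch of serializing the needle fields listed above; the field widths and struct layout here are assumptions for illustration, not the paper's exact byte format:

```python
import struct

# Illustrative needle serialization: cookie, key, alternate key, flags,
# data size, the photo bytes, then padding to an 8-byte boundary.
HEADER = struct.Struct("<QQBBI")   # cookie, key, alt_key, flags, size (assumed widths)

def pack_needle(cookie, key, alt_key, flags, data):
    body = HEADER.pack(cookie, key, alt_key, flags, len(data)) + data
    pad = (-len(body)) % 8          # so the next needle starts 8-byte aligned
    return body + b"\x00" * pad

needle = pack_needle(cookie=0x1234, key=987654, alt_key=2, flags=0, data=b"<jpeg>")
assert len(needle) % 8 == 0
```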

Haystack Store. Metadata is managed in memory: an index from (key, alternate key) to (flags, size, volume offset) gives a quick lookup for both reads and writes, so a disk operation is only required for the actual image read. Writes and deletes: writes are append-only, and a delete is only marked and later garbage-collected. Indexing: enables fast construction of the in-memory metadata.
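
A sketch of those index semantics: (key, alternate key) maps to (flags, size, volume offset), writes append to the volume, deletes only set a flag until compaction, and reads cost at most one disk operation. The class and method names are illustrative:

```python
DELETED = 0x1

# Illustrative in-memory index over one Haystack volume:
# (key, alternate key) -> (flags, size, volume offset).
class VolumeIndex:
    def __init__(self):
        self.index = {}

    def on_write(self, key, alt_key, size, offset):
        # Called after appending a needle to the volume file (append-only).
        self.index[(key, alt_key)] = (0, size, offset)

    def on_delete(self, key, alt_key):
        # Delete only sets a flag; compaction later reclaims the space.
        flags, size, offset = self.index[(key, alt_key)]
        self.index[(key, alt_key)] = (flags | DELETED, size, offset)

    def lookup(self, key, alt_key):
        # Memory-only lookup; the single disk read happens only for live photos.
        entry = self.index.get((key, alt_key))
        if entry is None or entry[0] & DELETED:
            return None
        _, size, offset = entry
        return size, offset

idx = VolumeIndex()
idx.on_write(key=987654, alt_key=2, size=4096, offset=0)
assert idx.lookup(987654, 2) == (4096, 0)
idx.on_delete(987654, 2)
assert idx.lookup(987654, 2) is None
```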

Daily Stats with Haystack. Photos uploaded: ~120 M. Haystack photos written: ~1.44 B. Photos viewed: 80–100 B (thumbnails: 10.2%, small: 84.4%, medium: 0.2%, large: 5.2%). Haystack photos read: ~10 B.
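
Some quick arithmetic on these daily numbers, as a sketch (the 12x factorization in the comment is an inference, not a figure stated on this slide):

```python
# Quick arithmetic on the daily stats quoted above.
uploaded = 120e6              # photos uploaded per day
needles_written = 1.44e9      # Haystack photos (needles) written per day
haystack_reads = 10e9         # Haystack photos read per day

print(f"needles written per uploaded photo: {needles_written / uploaded:.0f}")
# The 12x factor is consistent with ~4 stored sizes x 3 replicated physical
# volumes, but that breakdown is an inference, not stated on this slide.
print(f"average Haystack reads per second:  {haystack_reads / 86_400:,.0f}")
```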

Summary. Two different types of workload for a social networking web service: posts (read/write) and photos (write-once, read-many). The photo workload follows a Zipf distribution: "hot" photos can be handled by the CDN, while caching "warm" photos gives diminishing returns. Haystack is Facebook's 2nd-generation photo storage; its goal is to reduce disk I/O for warm photos: one large file with many photos, metadata stored in memory, and an internal CDN (the Cache).