Download presentation
Presentation is loading. Please wait.
Published byStacy Windham Modified over 10 years ago
1
Finding a needle in Haystack Facebook’s Photo Storage
Shakthi Bachala
2
Outline Scenario Goal Problem Previous Approach Current Approach
Evaluation Advantages Critic Conclusion
3
Scenario : April 2009 October 2011 Total 15 billion Photos
4*15 billion images= 60 billion images 1.5 petabytes of data 65 billion Photos 4*65 billion images = 260 billion images 20 petabytes of data Upload Rate 220 million photos / week 25 terabytes of data 1 billion photos / week 60 terabytes of data Serving Rate 550,000 images / sec 1 million images / sec
4
Goal High throughput and low latency Fault-tolerant Cost-effective
Simple
5
Previous Approach : Typical design for Photo Sharing
6
Previous Approach : NFS based design for Photo Sharing at facebook
7
Previous Approach – NFS based design
Traditional file system architecture performs poorly under Facebook's kind of workload NFS - based Design: CDN effectively serves the hottest photos (profile pictures and recently updated photos), but facebook also generates a lot of requests for less popular images (long tail images). These are not handled by CDN Normal website had 99% CDN hit rate but facebook had around 80% CDN hit rate
8
Long Tail Issue
9
Previous Approach cont..
Problems with that approach were: Wastage of storage capacity due to metadata Large metadata per file Each image stored as a file Large number of disk operations for reads Because of large directories (large directories containing thousands of files) Change of the directory structures and changing from large directories to small directories has brought down the iops approximately from 10 to
10
Current Approach – Haystack Architecture
11
Current Approach- Haystack Components
The main components of Haystack architecture are: Haystack Directory Haystack Cache Haystack Store
12
Current Approach- Haystack Directory
The main goals of directory are: Map logical volumes to physical volumes 3 Physical volumes( on 3 nodes) per one logical volume Load balance Writes across logical volumes Reads across physical volumes (any of the 3 stores) Caching strategy: Whether the photo request should be handled by the CDN or by the cache URL generation volume id, Image id> The directory would Identify the logical volumes that are read only either because of operational reason or because those volumes have reached their storage capacity
13
Current Approach- Haystack Cache
The Cache receives HTTP requests for photos from browser or CDNs It is a distributed hash table with photo id as the key to locate the cached data If the photo id is missing in cache , the cache fetches the data from photo server and replies it to the browser or CDN depending on the request
14
Current Approach- Haystack Cache
Caches a photo if it satisfies the following two conditions: The request directly come from a user and instead of CDN Facebook’s experience with the NFS-based design showed post-CDN caching is ineffective as it is unlikely that a request misses in the CDN would hit in our internal cache The photos is fetched by the write enabled store Photos are most heavily accessed soon after they are uploaded File systems generally work better when doing either writes or reads but not both
15
Current Approach- Haystack Cache Hit Rate
16
Current Approach : Haystack Store
Replaces the storage and photo server layer in NFS based Design with this structure:
17
Current Approach : Haystack Store
Storage : 12x 1TB SATA, RAID6 Filesytem: Single approx. 10 TB xfs filesystem. Haystack: Log structured , append only object store containing needles as object abstractions 100 haystacks per node each 100GB in size
18
Current Approach: Haystack Store File
19
Current Approach: Operations in Haystack
Photo Read Look up offset /size of the image in the incore index Read Data (approx. 1 iop) Photo Write Asynchronously append images one by one to the haystack file Next haystack file when becomes full Asynchronously append index records to the index file Flush index file if too many dirty index records Update incore index
20
Current Approach: Operations in Haystack
Photo Delete Lookup offset of the image in the incore index Mark the image needle flag as “DELETED” Update incore index Index File: Provides minimum metadata to locate the needle in the Haystack store Subset of Header metadata
21
Current Approach: Haystack Index File
22
Haystack Based Design - Photo Upload
23
Haystack Based Design - Photo Download
24
Current Approach: Operations in Haystack
Filesystem: Haystack uses XFS, an extent based file system It has two main advantages: The block maps for several contiguous large files can be small enough to be stored in the main memory XFS provides efficient file pre allocation, mitigating fragmentation and reigning in how large block maps can grow
25
Current Approach: Haystack Optimization
Compaction: Infrequent online operation Create a copy of haystack skipping duplicates and deleted photos The patterns of deletes to photo views, young photos are a lot more likely to be deleted Last year about 25% of the photos got deleted
26
Current Approach: Haystack Optimization
Saving More Memory: With the following two techniques store machines reduced their main memory footprints by 20% Eliminate the need for an in-memory representation of flags by setting the offset to be 0 for deleted photos. Store machine do not keep track of cookie values in main memory and instead check the supplied cookie after reading from the disk
27
Current Approach: Haystack Optimization
Batch Uploads: Disks perform better with large sequential writes instead of small random writes, so facebook uses batch uploads whenever possible Many users upload entire albums to facebook instead of each picture which gives an opportunity to batch the uploads
28
Evaluation -Data
29
Evaluation – Production Workload
30
Advantages Simple design
Decrease number of disk operations by reducing the average metadata per photo This system is robust enough to handle a very large amount of data Fault Tolerant
31
Critic I thought this approach is very facebook specific . Any other?
32
Conclusion Built a simple but robust data storage mechanism for facebook photo storage to accommodate long tail of photo requests which was not possible by previous approaches
33
References http://www.usenix.org/event/osdi10/tech/
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.