1
Storage Systems for Managing Voluminous Data
CS455: Introduction To Distributed Systems
Meghana Santhapur, Pratyusha Reddy Degapudi, Rasika Warade
Department Of Computer Science
2
WHY IS THIS PROBLEM IMPORTANT?
The Big Data universe is beginning to explode
Need to store and manage large volumes of data efficiently
Need to select a storage system for a particular use case
So where do we see this explosion?
source:
3
PROBLEM CHARACTERIZATION
               April 2009                     Current
Total          15 billion photos              65 billion photos
               60 billion images              260 billion images
               1.5 PB                         20 PB
Upload rate    220 million photos per week    1 billion photos per week
               25 TB                          60 TB
Serving rate   550,000 images per sec         1 million images per sec

Photo storage systems
Block storage? File storage? Object storage?
Source: Facebook Engineering research group
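The figures above imply some useful per-image averages. A minimal sketch of that arithmetic follows. It assumes decimal units (1 PB = 10^15 bytes) and that the TB figures under "Upload rate" are weekly volumes; the slide states neither, and the names `SNAPSHOTS` and `summarize` are made up for this illustration.

```python
# Assumption: decimal units; the slide does not say whether PB/TB are binary.
TB = 10**12
PB = 10**15

# Figures copied from the slide:
# (photos, images, stored bytes, photos uploaded per week, bytes uploaded per week)
SNAPSHOTS = {
    "April 2009": (15 * 10**9, 60 * 10**9, 15 * PB // 10, 220 * 10**6, 25 * TB),
    "Current": (65 * 10**9, 260 * 10**9, 20 * PB, 10**9, 60 * TB),
}


def summarize(photos, images, stored, weekly_photos, weekly_bytes):
    """Derive per-photo and per-image averages from one column of the table."""
    return {
        "images_per_photo": images / photos,
        "avg_stored_image_kb": stored / images / 1000,
        "avg_upload_per_photo_kb": weekly_bytes / weekly_photos / 1000,
    }


if __name__ == "__main__":
    for label, row in SNAPSHOTS.items():
        print(label, summarize(*row))
```

Under these assumptions both columns work out to 4 stored images per photo, and the average stored image grows from about 25 KB to about 77 KB.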
4
TRADE-OFF SPACE FOR SOLUTIONS
Thousands of files in each directory: more than 10 disk operations to read an image
Directory size reduced to hundreds of images: 3 disk operations to read an image
File handles cached in photo servers
File handles of every image kept in memcache
Caching did not reduce disk operations enough; metadata overhead remained
OBJECT STORAGE
5
FACEBOOK HAYSTACK STORE LAYOUT
DOMINANT APPROACHES
Can you find a needle in a haystack?
It maintains an in-core (in-memory) index for all photos
This eliminates unnecessary metadata
Source: Facebook Engineering research group
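The idea on this slide (many photos in one file, located through an in-memory index) can be sketched in a few lines. This is a simplified illustration only, not Facebook's implementation: the class name `TinyHaystack` and its methods are hypothetical, and a real store would add per-record headers, checksums, deletion flags and index recovery.

```python
import os
import tempfile


class TinyHaystack:
    """Toy append-only photo store: one big file plus an in-memory index."""

    def __init__(self, path):
        # "w+b" truncates any existing file; acceptable for a demo.
        self._file = open(path, "w+b")
        self._index = {}  # photo_id -> (offset, size), kept entirely in memory

    def put(self, photo_id, data):
        """Append the photo bytes and remember where they landed."""
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(data)
        self._file.flush()
        self._index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        """One seek and one read; no per-photo file system metadata lookup."""
        offset, size = self._index[photo_id]
        self._file.seek(offset)
        return self._file.read(size)

    def close(self):
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = TinyHaystack(os.path.join(tmp, "haystack.dat"))
        store.put(1, b"first photo bytes")
        store.put(2, b"second photo")
        print(store.get(2))
        store.close()
```

Because the index lives in memory, a read needs no directory or inode lookup per photo, which is the metadata saving the slide refers to.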
6
DOMINANT APPROACHES (Contd…)
MObStor/DORA
DORA provides backend service to MObStor
Elimination of metadata from object storage
Use of data locators

HBase
Runs on top of the Hadoop framework
Vector data is converted to Well Known Binary and Well Known Text
Can handle spatial images
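For readers unfamiliar with the two formats named above, here is a minimal sketch of one 2D point written as Well Known Text and as Well Known Binary. It illustrates the formats only; the slides do not describe how the conversion is actually done on HBase, and the function names are hypothetical.

```python
import struct


def point_to_wkt(x, y):
    """Well Known Text for a 2D point."""
    return f"POINT ({x} {y})"


def point_to_wkb(x, y):
    """Well Known Binary for a 2D point, little-endian.

    Layout: 1 byte for byte order (1 = little-endian), a 4-byte unsigned
    geometry type (1 = Point), then x and y as 8-byte doubles.
    """
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(blob):
    """Decode the little-endian point encoding produced above."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", blob)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian WKB points are handled here")
    return x, y


if __name__ == "__main__":
    print(point_to_wkt(1.5, 2.0))
    print(point_to_wkb(1.5, 2.0).hex())
```

The binary form is fixed-width (21 bytes per point), which makes it compact to store as a cell value, while the text form stays human-readable.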
7
INSIGHTS GLEANED
Object Storage
Ignores the file system and puts everything in a bucket
Much faster than file systems
Performance does not degrade as the cluster grows

Haystack Store
Replaced the old NFS infrastructure, which carried too much file system metadata
Allows storage of multiple photos in a single file with less metadata

MObStor/DORA
Supports high request rates for operational storage
Added features to do object storage on cheaper systems
8
WHAT THE PROBLEM SPACE WOULD LOOK LIKE IN THE FUTURE
9
TRADE-OFF SPACE AND SOLUTIONS IN THE FUTURE
Currently, object storage is the smartest solution for voluminous data
Scaling in a less expensive way suggests open source programs
Storage systems supporting both object and block storage, and possibly file storage as well
Metadata can be further reduced by using a dynamic data structure to locate the servers responsible for data
What is the world's largest possible haystack?
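The slide does not name the dynamic data structure it has in mind. One well-known candidate is a consistent hash ring, sketched below as an assumption rather than as the presenters' proposal; `HashRing` and the server names in the demo are hypothetical. The ring maps an object key to a server without any per-object location metadata.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a 64-bit position on the ring."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, servers=(), replicas=64):
        self._replicas = replicas
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        for i in range(self._replicas):
            point = _hash(f"{server}#{i}")
            if point in self._owner:
                continue  # position already taken; extremely unlikely
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove_server(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def locate(self, key):
        """Return the server owning the first ring position after the key."""
        if not self._points:
            raise LookupError("no servers on the ring")
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


if __name__ == "__main__":
    ring = HashRing(["store-1", "store-2", "store-3"])
    print(ring.locate("photo-12345"))
```

With this structure, adding or removing a server only remaps the keys that fall on that server's ring positions, so lookups need no central table that grows with the number of objects.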