Storage Systems for Managing Voluminous Data


1 Storage Systems for Managing Voluminous Data
By Manoj Krishna Panguluri, Bhavik Mistry, Sandeep Kasavaraju
CS455: Introduction to Distributed Systems
Department of Computer Science, Colorado State University

2 Importance
Data from various sources is increasing.
Processing this data requires some form of storage.
Areas where this research is important: Geographic Information Systems, High Energy Physics, Satellite Imaging.

3 Problem Characterization
Information handled is on the order of petabytes.
Hard disk capacity is not increasing proportionally.
Challenges: contiguous storage; retrieval from distributed storage; management of data arriving at high rates; a scalable approach.

4 Trade-off Space
Problems with the relational model led to NoSQL.
ACID vs. BASE
Categories of NoSQL: key-value stores, column-oriented databases, document stores, graph databases.
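To make the four categories concrete, here is a minimal sketch of one record expressed in each model using plain Python structures. The record, its field names, and its values are hypothetical and chosen only for illustration; real stores add persistence, distribution, and query languages on top of these shapes.

```python
# One hypothetical "user" record in each of the four NoSQL data models.
# All keys, field names, and values are illustrative assumptions.

# Key-value store: an opaque value looked up by a single key.
key_value = {"user:42": b'{"name": "Ada", "city": "Fort Collins"}'}

# Column-oriented: row key -> column family -> column -> value.
column_oriented = {
    "user:42": {
        "profile": {"name": "Ada", "city": "Fort Collins"},
        "activity": {"logins": 3},
    }
}

# Document store: the value is a structured document the store can look into.
document_store = {
    "users": [
        {"_id": 42, "name": "Ada", "address": {"city": "Fort Collins"}},
    ]
}

# Graph database: nodes plus explicit, labelled relationships between them.
graph = {
    "nodes": {42: {"name": "Ada"}, 7: {"name": "Grace"}},
    "edges": [(42, "FOLLOWS", 7)],
}


def city_from_document(store, user_id):
    """Look inside a document, which a pure key-value store cannot do."""
    for doc in store["users"]:
        if doc["_id"] == user_id:
            return doc["address"]["city"]
    return None
```

The key-value form treats the value as opaque bytes, while the other three expose structure (columns, nested fields, or relationships) that the store itself can use.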

5 Dominant Approaches
Key-Value Store: Dynamo
Highly available; stores data on solid-state drives. Different in terms of target requirements.
Column-Oriented Database: Bigtable
Offers consistency, fault tolerance and persistence. Three components: library, master server, tablet servers. Offers access control at the column-family level.
Column-Oriented Database: Cassandra
Provides high availability with no single point of failure. Aims to run on top of an infrastructure of hundreds of nodes. Manages persistent state even when components fail.
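The idea of staying available while components fail can be sketched with a toy replicated key-value store. This is not Dynamo's or Cassandra's actual protocol; the `Replica` and `ReplicatedStore` classes and the single version counter are assumptions made for illustration. Writes succeed as long as any replica is reachable, and reads return the newest version among the replicas that can be reached, which may be stale.

```python
class Replica:
    """One storage node in the toy cluster (hypothetical helper)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}  # key -> (version, value)


class ReplicatedStore:
    """Toy store that keeps serving requests while any replica is reachable."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]
        self.clock = 0  # simple counter standing in for real versioning

    def put(self, key, value):
        """Write to every reachable replica; return how many accepted it."""
        self.clock += 1
        written = 0
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = (self.clock, value)
                written += 1
        if written == 0:
            raise RuntimeError("no replica reachable")
        return written

    def get(self, key):
        """Return the newest value among reachable replicas (may be stale)."""
        versions = [
            r.data[key] for r in self.replicas if r.up and key in r.data
        ]
        if not versions:
            raise KeyError(key)
        return max(versions, key=lambda pair: pair[0])[1]


store = ReplicatedStore(["a", "b", "c"])
store.put("k", "v1")           # all three replicas accept the write
store.replicas[0].up = False   # replica "a" fails
store.put("k", "v2")           # still succeeds on "b" and "c"
```

If replica "a" later comes back while "b" and "c" are unreachable, a read returns the older value "v1": the store stayed available but gave up consistency for that read.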

6 Dominant Approaches (Cont.)
Document Store: MongoDB
Provides features such as aggregation, ad hoc queries and indexing. Documents are stored in BSON format. Uses GridFS for storing large files. Applications include CERN's LHC and UIDAI Aadhaar.
Graph Database: Neo4j
Provides an object-oriented, flexible network structure. Reliable, ACID compliant, highly available and scalable. Used in software involving complex relationships, such as social networking.
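The appeal of a network structure for relationship-heavy data such as social networks can be sketched in a few lines. This is an in-memory illustration, not Neo4j's API; the `Node` class and `friends_of_friends` function are hypothetical names. Each node holds direct references to its neighbours, so a traversal follows references instead of joining tables.

```python
class Node:
    """A person in a toy social graph (hypothetical helper)."""

    def __init__(self, name):
        self.name = name
        self.friends = []  # direct references to other Node objects

    def befriend(self, other):
        """Create an undirected friendship edge."""
        self.friends.append(other)
        other.friends.append(self)


def friends_of_friends(person):
    """Names two hops away, excluding the person and their direct friends."""
    direct = {id(f) for f in person.friends}
    result = set()
    for friend in person.friends:
        for candidate in friend.friends:
            if candidate is not person and id(candidate) not in direct:
                result.add(candidate.name)
    return sorted(result)


alice, bob, carol, dave = (Node(n) for n in ["alice", "bob", "carol", "dave"])
alice.befriend(bob)
bob.befriend(carol)
carol.befriend(dave)
print(friends_of_friends(alice))  # ['carol']
```

The cost of the traversal depends on the number of neighbours visited rather than on the total number of people stored.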

7 Insights Gleaned
Dynamo: availability over consistency.
Dynamic replication of data and access to the replicas.
Column indices for storing data, with compression for efficient storage.
Operations and mechanisms of Bigtable and Cassandra.
Documents as values, where each document may have a different internal structure.
Graph structure for storing information, and the concept of direct pointers.
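One reason column-wise storage pairs well with compression is that values within a single column tend to repeat. A minimal sketch, assuming run-length encoding as the compression scheme (the slides do not name one) and hypothetical sensor rows in which every row has the same fields:

```python
from itertools import groupby

# Hypothetical row-oriented records; names and values are illustrative only.
rows = [
    {"sensor": "s1", "region": "west", "reading": 10},
    {"sensor": "s2", "region": "west", "reading": 12},
    {"sensor": "s3", "region": "west", "reading": 12},
    {"sensor": "s4", "region": "east", "reading": 9},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def rle_encode(values):
    """Run-length encode: consecutive equal values become (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Inverse of rle_encode."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows)
print(rle_encode(columns["region"]))  # [('west', 3), ('east', 1)]
```

Stored row by row, the repeated "west" values are separated by other fields; stored column by column, they sit next to each other and collapse into a single run.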

8 Problem Space in the Future
90% of the data in the world today has been created in the last two years alone.
Data growth is being driven by unstructured data and billions of large objects.
Unstructured data leads to increased reliance on growing file storage pools and to increased storage administration.
Research may be directed toward optimized storage.
Companies will look to create custom data storage mechanisms.

9 Trade-off Space and Solutions in Future
Focus shifts from retrieval time to storage.
Object storage seems to be the biggest base.
Big Data as a Service (BDaaS).
Difficult to predict the exact nature of data.
Generic data stores for unstructured data.
Data store of databases.
Ultra compression + distributed storage.

