Storage Systems for Managing Voluminous Data


Storage Systems for Managing Voluminous Data
By Manoj Krishna Panguluri, Bhavik Mistry, Sandeep Kasavaraju
CS455: Introduction to Distributed Systems
Department of Computer Science, Colorado State University

Importance
Data volumes from many sources keep increasing.
Processing that data requires some form of storage.
Some areas where this research is important:
    Geographic Information Systems
    High Energy Physics
    Satellite Imaging

Problem Characterization
The information handled is on the order of petabytes.
Hard disk capacity is not increasing proportionally.
Challenges:
    Contiguous storage
    Retrieval from distributed storage
    Management of data arriving at high rates
    A scalable approach

Trade-off Space
Problems with the relational model led to NoSQL.
ACID vs. BASE
Categories of NoSQL:
    Key-value stores
    Column-oriented databases
    Document stores
    Graph databases
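
The four NoSQL categories differ mainly in the shape of a stored record. The following is a minimal sketch using only plain Python structures and made-up sample data; the names `kv_store`, `column_store`, `document_store`, `graph`, and `followers_of` are illustrative and do not come from any real product.

```python
# Key-value store: an opaque value looked up by a single key.
kv_store = {"user:42": b"serialized-profile-bytes"}

# Column-oriented store: row key -> column family -> column -> value.
column_store = {
    "user:42": {
        "profile": {"name": "Ada", "city": "Fort Collins"},
        "activity": {"last_login": "2013-11-02"},
    }
}

# Document store: each value is a self-describing, possibly nested document.
document_store = {
    "user:42": {
        "name": "Ada",
        "tags": ["gis", "hep"],
        "address": {"city": "Fort Collins"},
    }
}

# Graph database: nodes plus explicit, named relationships between them.
graph = {
    "nodes": {"ada": {"kind": "person"}, "bob": {"kind": "person"}},
    "edges": [("ada", "FOLLOWS", "bob")],
}


def followers_of(g, node):
    """Return the nodes that have a FOLLOWS edge pointing at `node`."""
    return [src for (src, rel, dst) in g["edges"]
            if rel == "FOLLOWS" and dst == node]
```

The same logical entity (a user) appears in each model; what changes is how much of its structure the store itself understands and can query.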

Dominant Approaches
Key-Value Store: Dynamo
    Highly available; stores data on solid state drives.
    Differs from other storage systems in its target requirements.
Column-Oriented Databases: BigTable
    Offers consistency, fault tolerance, and persistence.
    Three components: a client library, a master server, and tablet servers.
    Offers access control at the column family level.
Column-Oriented Databases: Cassandra
    Provides high availability with no single point of failure.
    Aims to run on top of an infrastructure of hundreds of nodes.
    Manages persistent state even when components fail.
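
To make the column-oriented model and column-family access control more concrete, here is a toy sketch, not Bigtable's actual implementation. It assumes a sparse map keyed by (row, "family:qualifier") with timestamped versions, and a hypothetical per-family set of permitted users. The class name `TinyTable`, its methods, and the sample rows are all invented for illustration.

```python
import time


class TinyTable:
    """Toy sparse map: (row, 'family:qualifier', timestamp) -> value."""

    def __init__(self, families):
        # Hypothetical ACL: column family name -> set of permitted users.
        self.acl = {name: set(users) for name, users in families.items()}
        # (row, column) -> list of (timestamp, value), newest first.
        self.cells = {}

    def _check(self, user, column):
        family = column.split(":", 1)[0]
        if family not in self.acl:
            raise KeyError(f"unknown column family: {family}")
        if user not in self.acl[family]:
            raise PermissionError(f"{user} may not access family {family}")

    def put(self, user, row, column, value, timestamp=None):
        self._check(user, column)
        ts = time.time() if timestamp is None else timestamp
        versions = self.cells.setdefault((row, column), [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)

    def get(self, user, row, column):
        """Return the most recent value, or None if the cell is empty."""
        self._check(user, column)
        versions = self.cells.get((row, column))
        return versions[0][1] if versions else None


table = TinyTable({"contents": {"crawler", "reader"}, "anchor": {"crawler"}})
table.put("crawler", "com.example.www", "contents:html", "<html>v1</html>", timestamp=1)
table.put("crawler", "com.example.www", "contents:html", "<html>v2</html>", timestamp=2)
latest = table.get("reader", "com.example.www", "contents:html")
```

Permissions are checked once per column family rather than per cell, which is the granularity the slide describes; a `reader` asking for any `anchor:` column would raise `PermissionError` in this sketch.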

Dominant Approaches (Cont.)
Document Store: MongoDB
    Provides features such as aggregation, ad hoc queries, and indexing.
    Documents are stored in BSON format.
    Uses GridFS for storing large files.
    Applications include CERN's LHC and UIDAI Aadhaar.
Graph Databases: Neo4j
    Provides an object-oriented, flexible network structure.
    Reliable, ACID compliant, highly available, and scalable.
    Used in software involving complex relationships, such as social networking.
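
Ad hoc queries and aggregation over schemaless documents can be illustrated without a database server. The sketch below is plain Python, not the MongoDB API; the helper names `find` and `average_by` and the sample records are assumptions made up for this example.

```python
from collections import defaultdict

# Made-up sample documents; note that they do not share one fixed schema.
documents = [
    {"_id": 1, "detector": "A", "energy_gev": 120.5, "tags": ["muon"]},
    {"_id": 2, "detector": "B", "energy_gev": 98.0},
    {"_id": 3, "detector": "A", "energy_gev": 140.0, "run": {"year": 2012}},
]


def find(docs, **criteria):
    """Ad hoc query: documents whose top-level fields equal all criteria."""
    return [d for d in docs
            if all(d.get(k) == v for k, v in criteria.items())]


def average_by(docs, group_field, value_field):
    """Aggregation: mean of value_field for each distinct group_field."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for d in docs:
        if group_field in d and value_field in d:
            totals[d[group_field]][0] += d[value_field]
            totals[d[group_field]][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


detector_a = find(documents, detector="A")
mean_energy = average_by(documents, "detector", "energy_gev")
```

Because each document carries its own field names, the query and aggregation code only has to tolerate missing fields; no schema migration is needed when a new field such as `run` appears.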

Insights Gleaned
Dynamo favors availability over consistency.
Dynamic replication of data, and access to the replicas.
Column indices for storing data, with compression for efficient storage.
Operations and mechanisms in Bigtable and Cassandra.
A document as the stored value, where each document may have a different internal structure.
A graph structure for storing information, with direct pointers between related nodes.
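
The "availability over consistency" trade-off can be sketched with two replicas that always accept local writes and reconcile later. This is a deliberate simplification: it resolves conflicts with last-write-wins on a caller-supplied version number, whereas Dynamo itself tracks versions with vector clocks and lets the application reconcile divergent copies. The names `Replica` and `sync` are hypothetical.

```python
class Replica:
    """One copy of the data; always accepts reads and writes locally."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        # Keep the write only if it is newer than what this replica holds.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Anti-entropy pass: exchange entries so both replicas converge."""
    for key in set(a.data) | set(b.data):
        for src, dst in ((a, b), (b, a)):
            if key in src.data:
                version, value = src.data[key]
                dst.write(key, value, version)


r1, r2 = Replica("r1"), Replica("r2")
# During a network partition each replica keeps serving its own clients.
r1.write("cart", ["book"], version=1)
r2.write("cart", ["book", "pen"], version=2)
stale_read = r1.read("cart")      # r1 has not yet seen the newer write
sync(r1, r2)                      # partition heals, replicas reconcile
converged = (r1.read("cart"), r2.read("cart"))
```

Both replicas stayed writable throughout, at the cost of `r1` briefly returning an older value; that window of staleness is what the slide means by choosing availability over consistency.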

Problem Space in the Future
90% of the data in the world today has been created in the last two years alone.
Data growth is being driven by unstructured data and billions of large objects.
Unstructured data leads to increased reliance on growing file storage pools and to increased storage administration.
Research may be put into optimized storage.
Companies will be looking to create custom data storage mechanisms.

Trade-off Space and Solutions in the Future
Focus shifts from retrieval time to storage.
Object storage seems to be the biggest base.
Big Data as a Service (BDaaS)
Difficult to predict the exact nature of the data.
Generic data stores for unstructured data
Data store of databases
Ultra compression + distributed storage