Computer Science Storage Systems and Sensor Storage Research Overview.

Presentation transcript:

Storage Research Overview
Hyperion
  – High-volume stream archival system
Bandwidth-efficient data migration in enterprise storage systems
Use of flash storage in data centers

Hyperion Stream Store
Streaming data is common in environments such as network monitoring, system monitoring, sensors, and RFID
  – Archive data for retrospective querying and forensics
Hyperion: high-volume stream archival for distributed network monitoring
  – Gigabit link: 250K packets per second
  – Archive and index in real time while supporting interactive querying
  – Neither a commodity RDBMS nor a general-purpose file system is suitable
[USENIX 2007]

Hyperion Design
Multiple monitor nodes, each monitoring multiple network links
StreamFS: high-performance stream file system
Local index: multi-level signature index based on Bloom filters
Distributed index for querying multiple nodes
Can scale to million pkts/s with StreamFS and 200K pkts/s indexing per core on a commodity multi-core PC
[Figure: Hyperion node, showing monitor/capture, StreamFS, signature index, and distributed index]
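
The local index described above can be illustrated with a minimal sketch. This is not Hyperion's implementation: the class names, the two-level layout, the filter sizes, and the choice of SHA-256 as the hash are all assumptions made for illustration. It shows only the general idea of a multi-level signature index, where a coarse Bloom filter per group of segments lets a query skip whole groups before checking the per-segment filters.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key (illustrative choice).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SignatureIndex:
    """Hypothetical two-level index: one filter per segment, one per group."""

    def __init__(self, segments_per_group=4):
        self.segments_per_group = segments_per_group
        self.segment_filters = []
        self.group_filters = []

    def add_segment(self, keys):
        seg_id = len(self.segment_filters)
        if seg_id % self.segments_per_group == 0:
            # Start a new group; its filter is larger because it holds more keys.
            self.group_filters.append(BloomFilter(num_bits=4096))
        seg_filter = BloomFilter()
        for key in keys:
            seg_filter.add(key)
            self.group_filters[-1].add(key)
        self.segment_filters.append(seg_filter)
        return seg_id

    def candidate_segments(self, key):
        """Return ids of segments that may hold key; they must still be scanned."""
        hits = []
        for g, group_filter in enumerate(self.group_filters):
            if not group_filter.might_contain(key):
                continue  # the whole group is skipped without touching its segments
            start = g * self.segments_per_group
            end = min(start + self.segments_per_group, len(self.segment_filters))
            for seg_id in range(start, end):
                if self.segment_filters[seg_id].might_contain(key):
                    hits.append(seg_id)
        return hits
```

A query such as `index.candidate_segments("10.0.0.7")` returns a superset of the segments that actually contain the key, so the stored data for each candidate still has to be read to confirm a match.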

Online Data Migration
Enterprise storage systems: multiple volumes mapped onto each array
  – Load imbalances and hotspots can occur
Goal: automatically resolve hotspots on volumes in large storage systems
Focus: minimize migration cost (bytes migrated to resolve a hotspot)
Bandwidth-to-space ratio algorithm
  – Displacement and swapping of volumes
  – Implemented in Linux LVM
[ICAC 06]
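
To make the bandwidth-to-space idea concrete, here is a simplified greedy sketch. It is not the algorithm from the cited work: it covers only displacement (not swapping), and the `Volume` and `Array` types, field names, and units are hypothetical. The intuition it illustrates is that moving the volume with the most bandwidth per gigabyte removes the most load for the fewest bytes migrated.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    size_gb: float
    bandwidth_mbps: float  # observed I/O load on this volume


@dataclass
class Array:
    name: str
    capacity_gb: float
    max_bandwidth_mbps: float  # load above this marks the array as a hotspot
    volumes: list = field(default_factory=list)

    def load(self):
        return sum(v.bandwidth_mbps for v in self.volumes)

    def used_gb(self):
        return sum(v.size_gb for v in self.volumes)


def resolve_hotspot(hot, others):
    """Displace volumes from `hot`, highest bandwidth-to-space ratio first.

    Returns a list of (volume name, destination array name) moves.
    """
    moves = []
    by_ratio = sorted(
        hot.volumes, key=lambda v: v.bandwidth_mbps / v.size_gb, reverse=True
    )
    for vol in by_ratio:
        if hot.load() <= hot.max_bandwidth_mbps:
            break  # hotspot resolved
        # Destinations must have both the space and the spare bandwidth.
        fits = [
            a for a in others
            if a.used_gb() + vol.size_gb <= a.capacity_gb
            and a.load() + vol.bandwidth_mbps <= a.max_bandwidth_mbps
        ]
        if not fits:
            continue  # try the next-best volume
        dest = min(fits, key=lambda a: a.load())
        hot.volumes.remove(vol)
        dest.volumes.append(vol)
        moves.append((vol.name, dest.name))
    return moves
```

With a hot array holding a 100 GB volume at 60 MB/s and a 10 GB volume at 50 MB/s, the sketch moves the 10 GB volume first, since it sheds almost as much load for a tenth of the data.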

Semantic-aware Replication
Replication for disaster recovery: synchronous replication for tight recovery point objectives
  – Latency increases with geographic separation
  – Use of an intermediary does not improve consistency
  – Too stringent for certain applications
Semantic-aware replication: hybrid approach
  – Use synchronous replication for “important” writes
  – Use asynchronous replication for other writes
  – Automatically infer which mode to use for each request
  – Transparent to applications
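
The hybrid approach above can be sketched as a small write dispatcher. The slides do not say how importance is inferred, so the `is_important` predicate here is a hypothetical stand-in supplied by the caller, and `send_to_remote` is an assumed blocking call that returns once the remote site has acknowledged the write. The sketch also assumes write order must be preserved at the remote site, so a synchronous write first flushes any queued asynchronous ones.

```python
from collections import deque


class HybridReplicator:
    """Routes each write to synchronous or asynchronous replication."""

    def __init__(self, is_important, send_to_remote):
        self.is_important = is_important      # hypothetical classifier
        self.send_to_remote = send_to_remote  # assumed to block until acked
        self.pending = deque()                # async writes not yet shipped

    def write(self, request):
        if self.is_important(request):
            # Flush earlier async writes first so the remote copy sees
            # writes in the same order as the primary.
            self.drain()
            self.send_to_remote(request)
            return "sync"
        self.pending.append(request)
        return "async"

    def drain(self):
        """Ship all queued async writes; returns how many were sent."""
        sent = 0
        while self.pending:
            self.send_to_remote(self.pending.popleft())
            sent += 1
        return sent
```

In a real system `drain` would run in the background; it is called explicitly here to keep the example single-threaded and deterministic.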

Flash Storage in Data Centers
Flash-based storage is becoming popular
  – Higher performance but also higher cost than disk drives
How can flash storage be exploited in data centers?
Use flash drives as an accelerator between disk storage and servers
  – Focus on video storage, where performance is key
Exploit flash disk as non-volatile storage in servers
  – Fast hibernate/resume => efficient power management in data centers

Sensor Storage Overview
Flash memory is becoming extremely energy-efficient
Exploit flash memory trends to design more efficient in-network sensor storage and querying systems
  – Capsule: flash-based object storage system
  – STONES: storage-centric sensor networks
[Figure: energy cost (uJ/byte) of communication and storage by generation of sensor platform; labels: CC1000, CC2420, Telos STM NOR, Atmel NOR, Micron NAND 128MB]

Capsule Overview
Object-based storage abstraction
Energy- and memory-optimized library of objects
Checkpointing and rollback for failure recovery
Storage reclamation to deal with finite storage capacity
Portable to NAND/NOR flash memories and different sensor platforms
[SenSys 06]
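
As a rough illustration of how an object abstraction with checkpointing and rollback can sit on write-once storage, the sketch below builds a stack on an append-only log. It is not Capsule's code or API: `FlashLog`, `LogStack`, and the record layout are hypothetical, and a Python list stands in for flash. Each pushed record carries a pointer to the previous top, so no record is ever updated in place, and a checkpoint only needs to persist the current head pointer.

```python
class FlashLog:
    """Append-only store standing in for flash: records are never rewritten."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # address of the new record

    def read(self, addr):
        return self._records[addr]


class LogStack:
    """Stack whose elements are chained by back-pointers in the log."""

    def __init__(self, log):
        self.log = log
        self.head = None  # address of the top element, held in RAM

    def push(self, value):
        self.head = self.log.append(("data", value, self.head))

    def pop(self):
        if self.head is None:
            raise IndexError("pop from empty stack")
        _, value, prev = self.log.read(self.head)
        self.head = prev  # the popped record stays in the log
        return value

    def checkpoint(self):
        """Persist the in-RAM head pointer; returns the checkpoint address."""
        return self.log.append(("ckpt", self.head))

    def rollback(self, checkpoint_addr):
        """Restore the stack to the state saved at checkpoint_addr."""
        record = self.log.read(checkpoint_addr)
        if record[0] != "ckpt":
            raise ValueError("address does not hold a checkpoint")
        self.head = record[1]
```

Because popped records remain in the log, rolling back after pops restores them without any copying. The sketch leaves out storage reclamation, which would be needed to recover space held by records that no checkpoint can reach.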

StonesDB Overview
StonesDB: flash memory-optimized archival data management architecture that supports sensor data storage, indexing, and aging of data.
[Figure labels: Query Engine, Partitioned Access Methods]
[CIDR 07]

Extra Slides

Mapping App Data Needs to Storage
Map application data structures to Capsule objects that offer efficient flash implementation
[Figure: application data needs (Debug logs, Data Archival & Indexing, Signal Processing, Packet Queue, Calibration Tables, Data Processing) mapped to Capsule objects (Queue, Array, Stream, Stack, File, Index) stored as pages on flash]

Local Data Management Stack

Distributed Data Management Stack

STONES
Design an archival data management architecture that:
  – Supports energy-efficient sensor data storage, indexing, and aging by optimizing for flash memories.
  – Supports energy-efficient processing of SQL-type queries, as well as data mining and search queries.
  – Is configurable to heterogeneous sensor platforms with different memory and processing constraints.

Technology Trends in Storage
[Figure: energy cost (uJ/byte) of communication and storage by generation of sensor platform; labels: CC1000, CC2420, Telos STM NOR, Atmel NOR, Micron NAND 128MB]