Hyperion: High Volume Stream Archival. Divya Muthukumaran.


Area Network Monitoring: identify problems due to overloaded or crashed servers, network connections, or other devices. Example: to determine the status of a web server, monitoring software may periodically send an HTTP request to fetch a page.

Live Monitoring: packets are examined in real time; traffic statistics are computed and continually updated; captured packet headers are discarded once examined. Why the need to store packet headers? Example: network forensics, that is, going back to examine the root cause of a problem, such as how an intruder gained entry or how a worm infection happened.

What is the need for such a system? Querying and examining live data. Data archival: capture the data at wire speed, then index and store it. Efficiently support retrieval and processing of archived data. Specifically designed to handle the needs of high-volume stream archival.

Why not traditional databases? Some statistics: a single gigabit link can generate over 100,000 packets and tens of MB of archival data per second. A monitor may record from multiple links.

Design Principles. Support queries, not reads: this implies the need to maintain indexes. Writes are sequential and immutable. Archive locally, summarize globally: scalability vs. the need to avoid flooding. Scalability favors local archiving and indexing to avoid network writes; the need to answer distributed queries favors sharing information across nodes.

Hyperion: three key components. Stream file system: high-volume archiving and querying. Multi-level index structure: high update rates plus reasonable lookup performance. Distributed index layer: distributes a summary of local indices to enable distributed querying.

Design choices for the Hyperion storage system: storage of multiple high-speed traffic streams without loss; support for concurrent read activity without loss of write performance; re-use of storage in a buffer-like fashion.

Stream File System: stores streams as opposed to files. Characteristics. Recycled: when storage is full, new data replaces old data; in a general-purpose (GP) file system, new data is lost and old data is retained. Immutable. Record-oriented: data is written in fixed- or variable-length records.
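The "recycled" property can be illustrated with a small in-memory sketch. This is not StreamFS code: the class name `RecycledStream` and the byte-capacity accounting are assumptions made only to show new records displacing the oldest ones once the store is full.

```python
from collections import deque


class RecycledStream:
    """Hypothetical fixed-capacity record store (illustration only).

    When the store is full, the oldest records are dropped to make room,
    so new data replaces old data instead of being rejected.
    """

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.records = deque()  # oldest record on the left

    def append(self, record: bytes) -> None:
        if len(record) > self.capacity_bytes:
            raise ValueError("record is larger than the whole store")
        # Recycle: evict the oldest records until the new one fits.
        while self.used_bytes + len(record) > self.capacity_bytes:
            old = self.records.popleft()
            self.used_bytes -= len(old)
        self.records.append(record)  # records are never modified afterwards
        self.used_bytes += len(record)
```

A general-purpose file system would instead fail the write with an out-of-space error, which is the opposite policy.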

Can we use a GP FS? Need to map streams to files.

LogFile Rotation

Stream FS

Stream FS Organization: log-structured FS. What problem? Cleaning/garbage collection. StreamFS solves the cleaning problem. Guarantee: a storage guarantee for each stream. Small segment size (1 or ½ MB). Check whether the next segment is surplus: if yes, overwrite it; otherwise skip it. Advantages? Storage reservation, and best-effort use of remaining storage.
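The overwrite-or-skip rule can be sketched as follows. This is a simplified model, not the StreamFS implementation: it assumes a segment is "surplus" when it is free or when its owning stream currently holds more segments than its guarantee, and the names `SegmentAllocator`, `guarantees`, and `held` are hypothetical.

```python
class SegmentAllocator:
    """Hypothetical circular segment allocator with per-stream guarantees."""

    def __init__(self, num_segments: int, guarantees: dict):
        self.owner = [None] * num_segments      # stream id per segment
        self.guarantees = dict(guarantees)      # stream id -> reserved segments
        self.held = {s: 0 for s in guarantees}  # stream id -> segments in use
        self.write_ptr = 0                      # next segment to examine

    def _is_surplus(self, idx: int) -> bool:
        # Assumption: free segments and segments beyond the owner's
        # guarantee may be overwritten.
        owner = self.owner[idx]
        return owner is None or self.held[owner] > self.guarantees[owner]

    def allocate(self, stream_id) -> int:
        """Advance the write pointer; overwrite surplus, skip guaranteed."""
        n = len(self.owner)
        for _ in range(n):
            idx = self.write_ptr
            self.write_ptr = (self.write_ptr + 1) % n
            if self._is_surplus(idx):
                old = self.owner[idx]
                if old is not None:
                    self.held[old] -= 1
                self.owner[idx] = stream_id
                self.held[stream_id] += 1
                return idx
            # Not surplus: this segment is protected by a guarantee, skip it.
        raise RuntimeError("no surplus segment available")
```

Because no data is ever copied to reclaim space, there is no separate cleaning pass in this model. The sketch deliberately leaves out the case where every segment is protected by a guarantee and a stream must recycle its own oldest segment.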

Reads: first get the index, then use the index to get the data. Persistent handles are returned from each write operation and passed to the read operation to retrieve data. What does the handle contain? Disk location and approximate length, which allow data to be retrieved directly.

Handle issues: validate the handle. How? A self-certifying record header containing the ID of the stream, the permissions of the stream, the record length, and a hash (used for validating the handle).
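A minimal sketch of handle validation, under stated assumptions: the header layout, the 8-byte truncated SHA-256, and the names `Handle`, `write_record`, and `read_record` are all invented for illustration, and the "disk" is just a `bytearray`. The slides do not say what the hash covers; here it covers the header fields and the payload.

```python
import hashlib
import struct
from typing import NamedTuple, Optional

# Assumed on-disk header: stream id, permissions, record length, hash.
HEADER = struct.Struct("<IHI8s")


class Handle(NamedTuple):
    offset: int         # disk location
    approx_length: int  # approximate length of header plus record
    digest: bytes       # hash expected in the record header


def _digest(stream_id: int, perms: int, payload: bytes) -> bytes:
    h = hashlib.sha256(struct.pack("<IHI", stream_id, perms, len(payload)))
    h.update(payload)
    return h.digest()[:8]


def write_record(disk: bytearray, stream_id: int, perms: int,
                 payload: bytes) -> Handle:
    """Append header + payload and return a persistent handle."""
    digest = _digest(stream_id, perms, payload)
    offset = len(disk)
    disk.extend(HEADER.pack(stream_id, perms, len(payload), digest))
    disk.extend(payload)
    return Handle(offset, HEADER.size + len(payload), digest)


def read_record(disk: bytearray, handle: Handle) -> Optional[bytes]:
    """Return the payload, or None if the handle is stale or invalid."""
    end = handle.offset + HEADER.size
    if handle.offset < 0 or end > len(disk):
        return None
    stream_id, perms, length, stored = HEADER.unpack_from(disk, handle.offset)
    payload = bytes(disk[end:end + length])
    if len(payload) != length:
        return None
    # The header certifies itself: its hash must match both the handle
    # and a fresh hash of what is actually on disk.
    if stored != handle.digest or _digest(stream_id, perms, payload) != stored:
        return None
    return payload
```

If the segment has been recycled since the write, the bytes at `offset` no longer hash to the handle's digest, so the read fails cleanly instead of returning another stream's data.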

Stream FS Organization. Record: variable length; on-disk record plus header. Block: fixed length; holds multiple records of the same stream. Block map: every nth block holds the stream ID and in-stream sequence number for each of the preceding n-1 blocks; used for easy write allocation.

Stream FS Organization

Indexing: uses signature-based indices, with a signature for each segment. A signature can tell whether a record with key k is present in the segment or not; it does not tell you where in the segment the record is.

Multi-level Indices

Multi-Level Indices: uses a Bloom filter. Hash(key) -> b bits, of which k bits are set to 1. H(key1) || H(key2) || … || H(keyn) = Hs (the signature), where || is bitwise OR. How to check for the presence of a record? Compute the hash of its key kr, H(kr). If a bit is set in H(kr) but not in Hs, the value is not present. False positives are possible.
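The signature scheme can be sketched as below. The values `B_BITS = 256` and `K_BITS = 4`, the use of SHA-256 to pick bit positions, and the two-level lookup in `candidate_segments` are assumptions for illustration; the slides give neither the parameters nor the hash function.

```python
import hashlib

B_BITS = 256  # assumed signature width b
K_BITS = 4    # assumed number of bit positions chosen per key (k)


def key_hash(key: bytes) -> int:
    """H(key): a b-bit mask with up to k bits set (positions may collide)."""
    mask = 0
    for i in range(K_BITS):
        d = hashlib.sha256(bytes([i]) + key).digest()
        mask |= 1 << (int.from_bytes(d[:4], "big") % B_BITS)
    return mask


def segment_signature(keys) -> int:
    """Hs: bitwise OR of H(key) over every key stored in the segment."""
    sig = 0
    for key in keys:
        sig |= key_hash(key)
    return sig


def may_contain(sig: int, key: bytes) -> bool:
    """False means definitely absent; True may be a false positive."""
    return (key_hash(key) & ~sig) == 0


def candidate_segments(top_sig: int, segment_sigs, key: bytes):
    """Two-level lookup: consult the summary first, then each segment."""
    if not may_contain(top_sig, key):
        return []
    return [i for i, s in enumerate(segment_sigs) if may_contain(s, key)]
```

Segments returned by `candidate_segments` still have to be scanned, since the signature says nothing about where in the segment a record lies.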

Distributed Index: how to handle distributed queries without flooding? Maintain a distributed index giving an integrated view of all nodes. A coarse-grained summary of the data at each node is needed; the top-level index in Hyperion can be used. One index node per time interval: all nodes send their top-level indices to this node (a temporally distributed index).
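One way to picture "one index node per time interval" is a fixed mapping from time to node. The round-robin assignment, the interval length, and the function names below are assumptions; the slides do not say how the index node for an interval is chosen.

```python
def index_node_for(timestamp: float, interval_s: int, nodes: list):
    """Node that collects every monitor's top-level index for this interval.

    Assumption: intervals are assigned to nodes round-robin.
    """
    return nodes[(int(timestamp) // interval_s) % len(nodes)]


def index_nodes_for_range(start: float, end: float, interval_s: int,
                          nodes: list):
    """(interval start, index node) pairs a query over [start, end] must ask."""
    first = int(start) // interval_s
    last = int(end) // interval_s
    return [(i * interval_s, nodes[i % len(nodes)])
            for i in range(first, last + 1)]
```

A query for a time range then contacts only the index nodes for the intervals it overlaps, and those summaries identify which monitors are worth querying, rather than flooding every node.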