University of Massachusetts, Amherst, Department of Computer Science. Hyperion: High Volume Stream Archival for Retrospective Querying. Peter Desnoyers.

Presentation transcript:

Hyperion: High Volume Stream Archival for Retrospective Querying. Peter Desnoyers and Prashant Shenoy, University of Massachusetts.

Packet monitoring with history. Packet monitor: capture and search packet headers (e.g., Snort, tcpdump, Gigascope). With history: capture, index, and store packet headers, and run interactive queries on the stored data. This provides new capabilities: network forensics (When was a system compromised? From where? How?) and management (after-the-fact debugging).

Challenges. Speed. Storage: a write rate high enough to store data without loss, and the capacity to retain it long enough. Queries must search millions of packet records. Indexing must happen in real time to support online queries. Commodity hardware. For each link monitored: 1 Gbit/s × 80% utilization = 800 Mbit/s = 100 MB/s, and 100 MB/s ÷ 400 B/pkt = 250,000 pkts/s.
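
The per-link rate above is plain arithmetic on link speed, utilization, and average packet size. A minimal sketch of that calculation, using a hypothetical helper named `packets_per_second`:

```python
def packets_per_second(link_bits_per_s: float, utilization: float, avg_packet_bytes: float) -> float:
    """Packet arrival rate for a link running at the given utilization."""
    # Convert the utilized bandwidth from bits/s to bytes/s.
    bytes_per_s = link_bits_per_s * utilization / 8
    # Divide by the average packet size to get packets per second.
    return bytes_per_s / avg_packet_bytes


rate = packets_per_second(1e9, 0.80, 400)
print(f"{rate:,.0f} pkts/s")  # 250,000 pkts/s
```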

Existing approaches. Packet monitoring with history requires a new system. Criteria: event rates, archive, index/query, commodity HW. Streaming query systems (GigaScope, Bro, Snort): Yes, No, Yes. Peer-to-peer systems (MIND, PIER): No, Yes. Conventional DBMS: No, Yes. CoMo: Yes, No, Yes. Proprietary systems*: ?, Yes, No. *Niksun NetDetector, Sandstorm NetInterceptor.

Outline of talk: Introduction and Motivation, Design, Implementation, Results, Conclusions.

Hyperion Design. Multiple monitor systems; a high-speed storage system; a local index; a distributed index for query routing. (Diagram: a Hyperion node, comprising monitor/capture, storage, index, and distributed index.)

Storage Requirements. Real-time: writes must keep up or data is lost. Prioritized: reads shouldn't interfere with writes. Aging: old data is replaced by new. Stream storage: different behavior, since packet monitoring is different from typical applications.

Behavior: Typical app. vs. Hyperion
Behavior           Typical app      Hyperion
Likely deletes     Newest files     Oldest data
File size          Random, small    Streaming
Sequential reads   Yes              No

Log-structured stream storage. Goal: minimize seeks despite interleaved writes on multiple streams. A log-structured file system minimizes seeks: writes are interleaved at an advancing write frontier, and free space is collected by a segment cleaner. But a general-purpose segment cleaner performs poorly on streams. (Diagram: successive writes from streams A, B, and C placed in order at the write frontier along the disk.)
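
The benefit of a single advancing write frontier can be illustrated by counting non-sequential head movements. This is a toy model, not StreamFS code: blocks are unit-sized, `per_stream_layout` is a hypothetical baseline that gives each stream its own fixed disk area, and the arrival order of streams A, B, and C is made up for illustration.

```python
from collections import defaultdict


def count_seeks(positions):
    """Count head movements that are not to the immediately following block."""
    return sum(1 for prev, cur in zip(positions, positions[1:]) if cur != prev + 1)


def log_layout(writes):
    """Log-structured placement: every write lands at the current write frontier."""
    frontier = 0
    positions = []
    for _stream in writes:
        positions.append(frontier)
        frontier += 1  # the frontier only ever advances
    return positions


def per_stream_layout(writes, region_size=1000):
    """Hypothetical baseline: each stream appends within its own fixed disk region."""
    region_of = {}
    next_block = defaultdict(int)
    positions = []
    for stream in writes:
        region = region_of.setdefault(stream, len(region_of))
        positions.append(region * region_size + next_block[stream])
        next_block[stream] += 1
    return positions


# Interleaved arrivals from three streams (hypothetical order).
writes = ["A", "C", "A", "B", "B", "C"]
print("log-structured seeks:", count_seeks(log_layout(writes)))    # 0
print("per-stream seeks:", count_seeks(per_stream_layout(writes)))  # 4
```

Under this model the log layout never seeks between writes; the cost moves to reclaiming free space, which is the job of the segment cleaner.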

Hyperion StreamFS. How can we improve on a general-purpose file system? Rely on application use patterns and eliminate unneeded features. StreamFS is a log structure with no segment cleaner: no deletes (data is simply overwritten), no fragmentation, and no segment-cleaning overhead. Operation: write a fixed-size segment, then advance the write frontier to the next segment that is ready for deletion, skipping any that are not.
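
The write path described above can be sketched as a circular array of fixed-size segments with no cleaner. This is a hypothetical illustration: `StreamLog`, `write_segment`, and the per-stream minimum-age rule used to decide when a segment is "ready for deletion" are assumptions, not the actual StreamFS interface.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Segment:
    stream: Optional[str] = None  # None means the segment has never been written
    written_at: float = 0.0


class StreamLog:
    """Hypothetical cleaner-free log: a circular array of fixed-size segments."""

    def __init__(self, num_segments: int, retention: Dict[str, float]):
        self.segments = [Segment() for _ in range(num_segments)]
        self.retention = retention  # stream -> minimum age (s) before overwrite (assumed rule)
        self.frontier = 0

    def _deletable(self, seg: Segment, now: float) -> bool:
        if seg.stream is None:
            return True
        return now - seg.written_at >= self.retention.get(seg.stream, 0.0)

    def write_segment(self, stream: str, now: float) -> int:
        """Write one segment at the next deletable position, skipping retained ones."""
        n = len(self.segments)
        for step in range(n):
            pos = (self.frontier + step) % n
            if self._deletable(self.segments[pos], now):
                self.segments[pos] = Segment(stream, now)  # overwrite in place
                self.frontier = (pos + 1) % n
                return pos
        raise RuntimeError("no segment is old enough to overwrite")


log = StreamLog(4, retention={"A": 100.0, "B": 0.0})
print([log.write_segment(s, now=0.0) for s in ["A", "B", "A", "B"]])  # [0, 1, 2, 3]
print(log.write_segment("B", now=5.0))  # 1: segment 0 (stream A) is still retained
print(log.write_segment("A", now=6.0))  # 3: segment 2 (stream A) is skipped as well
```

Because a segment is overwritten only when its whole contents are disposable, no live data ever has to be copied elsewhere, so the sketch needs no cleaning pass.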

StreamFS Design. Record: a single write, packed into a segment. Segment: fixed-size and holding a single stream; segments are interleaved into a region. Region: contains a region map identifying the segments in the region, used when the write frontier wraps. Directory: locates streams on disk. (Diagram showing record, segment, region, region map, and the directory entry for Stream_A.)
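
The record/segment/region/directory hierarchy can be modeled with a few data classes. This sketch tracks metadata only (no bytes reach a disk); the sizes `SEGMENT_SIZE` and `SEGMENTS_PER_REGION`, the class names, and `reclaim_region` are hypothetical, and frontier wrap-around itself is not modeled, only the region-map lookup it would need.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

SEGMENT_SIZE = 64        # bytes per segment (hypothetical; tiny for illustration)
SEGMENTS_PER_REGION = 4  # segment slots per region (hypothetical)


@dataclass
class Region:
    # Region map: which stream owns each segment slot in this region.
    region_map: List[str] = field(default_factory=list)


@dataclass
class Volume:
    regions: List[Region] = field(default_factory=list)
    # Directory: stream name -> (region index, slot) of each segment, oldest first.
    directory: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)
    # Records buffered in each stream's open (not yet written) segment.
    open_segments: Dict[str, List[bytes]] = field(default_factory=dict)

    def append_record(self, stream: str, record: bytes) -> None:
        """Pack a record into the stream's open segment, writing the segment when full."""
        if len(record) > SEGMENT_SIZE:
            raise ValueError("record larger than a segment")
        buf = self.open_segments.setdefault(stream, [])
        if sum(len(r) for r in buf) + len(record) > SEGMENT_SIZE:
            self._write_segment(stream)
            buf = self.open_segments[stream]
        buf.append(record)

    def _write_segment(self, stream: str) -> None:
        """Place one full segment at the next slot; segments of all streams interleave."""
        if not self.regions or len(self.regions[-1].region_map) == SEGMENTS_PER_REGION:
            self.regions.append(Region())
        region_idx = len(self.regions) - 1
        slot = len(self.regions[region_idx].region_map)
        self.regions[region_idx].region_map.append(stream)
        self.directory.setdefault(stream, []).append((region_idx, slot))
        self.open_segments[stream] = []

    def reclaim_region(self, region_idx: int) -> None:
        """On wrap-around, the region map says whose segments are about to be lost."""
        for slot, stream in enumerate(self.regions[region_idx].region_map):
            self.directory[stream].remove((region_idx, slot))
        self.regions[region_idx] = Region()


vol = Volume()
for _ in range(6):
    vol.append_record("Stream_A", bytes(30))  # two 30-byte records fit in one 64-byte segment
    vol.append_record("Stream_B", bytes(30))
print(vol.directory)  # {'Stream_A': [(0, 0), (0, 2)], 'Stream_B': [(0, 1), (0, 3)]}
vol.reclaim_region(0)
print(vol.directory)  # {'Stream_A': [], 'Stream_B': []}
```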

StreamFS optimizations. Data retention: controls how much history is saved and lets the file system make delete decisions. (Diagram labels: new data, reservation, old data is deleted.) Speed balancing: worst-case speed is set by the slowest tracks; the solution is to interleave fast and slow sections, so that worst-case speed is set by the average track instead.
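
Speed balancing can be illustrated with a simple zoned-disk model in which the transfer rate falls linearly from the outer to the inner tracks. The 100 and 50 MB/s rates, the 100-track disk, the 10-write window, and the strict alternation between the two ends are all assumptions made for illustration; the slide does not say how StreamFS sizes or orders its fast and slow sections.

```python
def track_rates(n, fast=100.0, slow=50.0):
    """Hypothetical zoned disk: transfer rate falls linearly from outer to inner track."""
    return [fast - (fast - slow) * i / (n - 1) for i in range(n)]


def sequential_order(n):
    """Write tracks in order from the outer edge inward."""
    return list(range(n))


def interleaved_order(n):
    """Alternate between the fast (outer) and slow (inner) ends of the disk."""
    order, lo, hi = [], 0, n - 1
    while lo <= hi:
        order.append(lo)
        if lo != hi:
            order.append(hi)
        lo, hi = lo + 1, hi - 1
    return order


def worst_window_throughput(order, rates, window):
    """Lowest throughput over any `window` consecutive equal-sized writes."""
    worst = float("inf")
    for start in range(len(order) - window + 1):
        chunk = order[start:start + window]
        elapsed = sum(1.0 / rates[t] for t in chunk)  # time to write one unit per track
        worst = min(worst, window / elapsed)           # harmonic-mean throughput
    return worst


rates = track_rates(100)
print(round(worst_window_throughput(sequential_order(100), rates, 10), 1))
print(round(worst_window_throughput(interleaved_order(100), rates, 10), 1))
```

With sequential placement the worst window falls entirely on the slowest inner tracks, while alternating ends pairs every slow write with a fast one, so the worst window lands much closer to the disk's average rate.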

Local Index. Requirements: high insertion speed and interactive query response. Index and search mechanisms, rated by search speed / insert speed: exhaustive search: No / Yes; B-tree: Yes / No; hash index: Yes / No; signature index: Yes.

Signature Index. Compress the data into a signature, store the signature separately, search the signature rather than the data, and retrieve the data itself only on a match. Signature algorithm: Bloom filter. No false negatives: it never misses a result. False positives: they cost extra read overhead. (Diagram: records, keys, signature.)
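
A Bloom filter signature and the search-signature-then-read-data pattern can be sketched as follows. The filter size, hash count, SHA-256 double hashing, and the example keys are assumptions; the slide names only the Bloom filter technique.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed at k positions per key."""

    def __init__(self, m_bits: int, k_hashes: int):
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, key: bytes):
        # Derive k positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: bytes) -> bool:
        # False means the key is definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# One signature per block of records; the keys are hypothetical source addresses.
blocks = {
    0: [b"10.0.0.1", b"10.0.0.2"],
    1: [b"192.168.1.5", b"10.0.0.9"],
}
signatures = {}
for block_id, keys in blocks.items():
    sig = BloomFilter(m_bits=256, k_hashes=4)
    for key in keys:
        sig.add(key)
    signatures[block_id] = sig

query = b"10.0.0.9"
# Search the small signatures first; only candidate blocks are read.
candidates = [b for b, sig in signatures.items() if sig.might_contain(query)]
# The exact check on the data discards any false-positive candidates.
matches = [b for b in candidates if query in blocks[b]]
print(matches)  # [1]
```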

Signature index efficiency. Overhead = bytes searched = index size + false positives (data scanned). Concise index: low index scan cost, but high false-positive scan cost. Verbose index: high index scan cost, but low false-positive scan cost. (Plot: bytes searched vs. index size.)
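
The trade-off can be quantified with the standard Bloom filter estimate p = (1 - e^(-kn/m))^k for an m-bit signature holding n keys probed by k hashes. The sketch below assumes one signature per data block and that each false positive costs one full block read; the block size, key count, and block count are hypothetical.

```python
import math


def false_positive_rate(m_bits: int, n_keys: int) -> float:
    """Textbook Bloom filter estimate using the (rounded) optimal hash count."""
    k = max(1, round(m_bits / n_keys * math.log(2)))
    return (1 - math.exp(-k * n_keys / m_bits)) ** k


def bytes_searched(m_bits: int, n_keys: int, num_blocks: int, block_bytes: int) -> float:
    """Index bytes scanned plus data bytes read because of false positives."""
    index_bytes = num_blocks * m_bits / 8
    false_positive_bytes = num_blocks * false_positive_rate(m_bits, n_keys) * block_bytes
    return index_bytes + false_positive_bytes


# Hypothetical workload: 1,000 blocks of 64 KB with 1,000 distinct keys per block.
for m in (1_000, 4_000, 16_000, 64_000):
    print(m, round(bytes_searched(m, 1_000, 1_000, 65_536)))
```

Under these assumptions the total is large for very small signatures (false-positive reads dominate) and for very large ones (the index scan dominates), with a minimum in between.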

Multi-level signature index. Concise index: low scan overhead. Verbose index: low false-positive overhead. Use both: scan the concise index, then check its positives in the verbose index. (Diagram: concise index, verbose index, data records.)
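
A two-level lookup can be sketched by keeping a small (concise) and a large (verbose) Bloom signature for each data block. Keeping both levels per block, and the chosen bit and hash counts in `CONCISE` and `VERBOSE`, are assumptions for illustration; Hyperion's actual index may group records differently.

```python
import hashlib


def bit_positions(key: bytes, m: int, k: int):
    """k Bloom filter bit positions for `key` in an m-bit signature (double hashing)."""
    d = hashlib.sha256(key).digest()
    h1 = int.from_bytes(d[:8], "big")
    h2 = int.from_bytes(d[8:16], "big") | 1
    return [(h1 + i * h2) % m for i in range(k)]


def make_signature(keys, m: int, k: int) -> int:
    bits = 0
    for key in keys:
        for p in bit_positions(key, m, k):
            bits |= 1 << p
    return bits


def may_contain(bits: int, key: bytes, m: int, k: int) -> bool:
    return all((bits >> p) & 1 for p in bit_positions(key, m, k))


CONCISE = (64, 2)    # (bits, hashes) per block: hypothetical small signature
VERBOSE = (1024, 6)  # (bits, hashes) per block: hypothetical large signature


def build_index(blocks):
    """Build a concise and a verbose signature for every data block."""
    return [(make_signature(b, *CONCISE), make_signature(b, *VERBOSE)) for b in blocks]


def lookup(key: bytes, blocks, index):
    """Scan concise signatures, confirm positives in the verbose level, then read data."""
    verbose_checks = 0
    hits = []
    for block_id, (concise, verbose) in enumerate(index):
        if not may_contain(concise, key, *CONCISE):
            continue  # rejected by the cheap scan
        verbose_checks += 1
        if not may_contain(verbose, key, *VERBOSE):
            continue  # concise false positive caught without reading data
        if key in blocks[block_id]:  # exact check against the data records
            hits.append(block_id)
    return hits, verbose_checks


blocks = [[f"host-{b}-{i}".encode() for i in range(8)] for b in range(50)]
index = build_index(blocks)
hits, checks = lookup(b"host-17-3", blocks, index)
print(hits, checks)
```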

Distributed Index. Query routing: send queries only to nodes holding matches, using the signature index. Index distribution: aggregate indexes at a cluster head, route queries through the cluster head, and rotate the cluster head for load sharing.
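
Query routing through a cluster head can be sketched with one aggregate Bloom signature per monitor node. The node names, signature parameters, `ClusterHead` class, and round-robin rotation via `itertools.cycle` are hypothetical; the slide does not specify the aggregation format or the rotation policy.

```python
import hashlib
from itertools import cycle

M_BITS, K_HASHES = 512, 4  # hypothetical signature size and hash count


def positions(key: bytes):
    d = hashlib.sha256(key).digest()
    h1 = int.from_bytes(d[:8], "big")
    h2 = int.from_bytes(d[8:16], "big") | 1
    return [(h1 + i * h2) % M_BITS for i in range(K_HASHES)]


def signature(keys) -> int:
    """Bloom signature summarizing the keys a node has indexed."""
    bits = 0
    for key in keys:
        for p in positions(key):
            bits |= 1 << p
    return bits


class ClusterHead:
    """Hypothetical aggregator holding one signature per monitor node."""

    def __init__(self):
        self.node_signatures = {}

    def update(self, node: str, sig: int) -> None:
        # OR in newly indexed data so the node's signature covers all of it.
        self.node_signatures[node] = self.node_signatures.get(node, 0) | sig

    def route(self, key: bytes):
        """Nodes that may hold `key`; all others never see the query."""
        return sorted(node for node, bits in self.node_signatures.items()
                      if all((bits >> p) & 1 for p in positions(key)))


node_data = {
    "mon1": [b"10.0.0.1", b"10.0.0.2"],
    "mon2": [b"172.16.0.7"],
    "mon3": [b"10.0.0.1", b"192.168.0.9"],
}
heads = cycle(sorted(node_data))  # rotate the cluster-head role among nodes
for epoch in range(3):
    head_name = next(heads)
    head = ClusterHead()  # the new head collects every node's signature
    for node, keys in node_data.items():
        head.update(node, signature(keys))
    print(epoch, head_name, head.route(b"10.0.0.1"))
```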

Implementation. Components: StreamFS, index, capture, RPC/query/index distribution, and query API. Built on the Linux OS with a Python framework. (Diagram of Hyperion components: Linux kernel capture, StreamFS, RPC/query/index distribution, query API, index.)

Outline of talk: Introduction and Motivation, Design, Implementation, Results, Conclusions.

Experimental Setup. Hardware: Linux cluster; dual 2.4 GHz Xeon CPUs; 1 GB memory; 4 × 10K RPM SCSI disks; SysKonnect SK98xx NIC with U. Cambridge driver. Test data: packet traces from the UMass Internet gateway* (400 Mbit/s, 100K pkts/s).

StreamFS write performance. Tested configurations: NetBSD/LFS, Linux/XFS (SGI), and StreamFS. Workload: multiple streams at multiple rates, with logfile rotation used for LFS and XFS. Results: a 50% boost in worst-case throughput over the general-purpose file systems tested, fast enough to store 1,000,000 packet headers/s.

StreamFS read/write. Workload: continuous writes with random reads. StreamFS sustains its write throughput, while XFS throughput collapses. StreamFS can handle stream read+write traffic without data loss; XFS cannot.

Index Performance. Calculation benchmark: 250,000 pkts/s. Query: 380M packet headers (26 GB of data), using a selective query (1 packet returned). Query results: 13 MB of data fetched to query 26 GB of data (1:2000). (Plot: data fetched (MB) vs. index size.)
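
The figures on this slide can be cross-checked with a few lines of arithmetic. Decimal units (1 GB = 10^9 bytes) are assumed, and the derived values (average stored bytes per header, time to index these headers at the benchmark rate) are not stated on the slide.

```python
data_bytes = 26e9     # 26 GB of stored packet headers (decimal units assumed)
headers = 380e6       # 380M packet headers
fetched_bytes = 13e6  # 13 MB read to answer the selective query

bytes_per_header = data_bytes / headers   # average stored size of one header record
selectivity = data_bytes / fetched_bytes  # data scanned vs. a full scan
index_seconds = headers / 250_000         # time to index these headers at 250K pkts/s

print(f"{bytes_per_header:.1f} B/header, 1:{selectivity:.0f}, {index_seconds / 60:.1f} min")
# 68.4 B/header, 1:2000, 25.3 min
```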

System Performance. Workload: trace replay with simultaneous queries; speed measured in K pkts/s. Packet loss measured as #transmitted - #received. Results: up to 175K pkts/s with negligible packet loss. (Plot: loss rate vs. packets/s.)
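
Loss as defined on the slide is a difference of two counters; normalizing by the number of packets transmitted gives the loss rate plotted against packets/s. The helper name `loss_rate` and the counter values in the usage line are hypothetical, not measured results.

```python
def loss_rate(transmitted: int, received: int) -> float:
    """Fraction of replayed packets the monitor failed to capture."""
    if transmitted <= 0:
        raise ValueError("transmitted must be positive")
    lost = transmitted - received  # packet loss: #transmitted - #received
    return lost / transmitted


# Hypothetical counters from one replay run (not measured values).
print(f"{loss_rate(1_750_000, 1_749_990):.2e}")  # 5.71e-06
```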

Conclusions. Hyperion: packet monitoring with retrospective queries. Key components: storage, with a 50% improvement over general-purpose file systems; an index that inserts at 250K pkts/s and supports interactive queries over hundreds of millions of packets; and a system that captures, indexes, and queries at 175K pkts/s.

Questions?