University of Massachusetts, Amherst, Department of Computer Science
Hyperion: High Volume Stream Archival for Retrospective Querying
  Peter Desnoyers and Prashant Shenoy
  University of Massachusetts
Packet monitoring with history
  Packet monitor: capture and search packet headers
    E.g.: Snort, tcpdump, Gigascope
  … with history:
    Capture, index, and store packet headers
    Interactive queries on stored data
  Provides new capabilities:
    Network forensics: When was a system compromised? From where? How?
    Management: After-the-fact debugging
  [Figure: monitor and storage]
Challenges
  Speed
    Storage rate, capacity: store data without loss and retain it long enough
    Queries must search millions of packet records
    Indexing in real time for online queries
  Commodity hardware
  For each link monitored: 1 Gbit/s × 80% ÷ 400 B/pkt = 250,000 pkts/s
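The per-link estimate above can be reproduced with a few lines of arithmetic. The sketch below only restates the slide's numbers; the constant and function names are illustrative and not part of Hyperion.

```python
# Back-of-the-envelope packet rate for one monitored link (numbers from the slide).
LINK_RATE_BITS_PER_S = 1_000_000_000   # 1 Gbit/s link
UTILIZATION = 0.80                     # 80% load
AVG_PACKET_BYTES = 400                 # average packet size


def packets_per_second(link_bits_per_s, utilization, avg_packet_bytes):
    """Return the packet arrival rate implied by link speed, load, and packet size."""
    bytes_per_s = link_bits_per_s * utilization / 8   # bits to bytes
    return bytes_per_s / avg_packet_bytes


rate = packets_per_second(LINK_RATE_BITS_PER_S, UTILIZATION, AVG_PACKET_BYTES)
print(f"{rate:,.0f} pkts/s")
```

Every packet at this rate has to be captured, indexed, and written to disk, which is what drives the storage and index requirements in the rest of the talk.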
Existing approaches
  System                                           Event rates  Archive  Index, query  Commodity HW
  Streaming query systems (GigaScope, Bro, Snort)  Yes          No       No            Yes
  Peer-to-peer systems (MIND, PIER)                No           Yes      Yes           Yes
  Conventional DBMS                                No           Yes      Yes           Yes
  CoMo                                             Yes          Yes      No            Yes
  Proprietary systems*                             ?            Yes      Yes           No
  * Niksun NetDetector, Sandstorm NetInterceptor
  Packet monitoring with history requires a new system.
Outline of talk
  Introduction and Motivation
  Design
  Implementation
  Results
  Conclusions
Hyperion Design
  Multiple monitor systems
  High-speed storage system
  Local index
  Distributed index for query routing
  [Figure: Hyperion node, with monitor/capture, storage, index, and distributed index]
Storage Requirements
  Real-time: writes must keep up or data is lost
  Prioritized: reads shouldn't interfere with writes
  Aging: old data replaced by new
  Stream storage: different behavior
  Behavior: typical app. vs. Hyperion
    Behavior          Typical app    Hyperion
    Likely deletes    Newest files   Oldest data
    File size         Random, small  Streaming
    Sequential reads  Yes            No
  Packet monitoring is different from typical applications.
Log structured stream storage
  Goal: minimize seeks despite interleaved writes on multiple streams
  Log-structured file system minimizes seeks
    Interleave writes at advancing frontier
    Free space collected by segment cleaner
  But: general-purpose segment cleaner performs poorly on streams
  [Figure: writes from streams A, B, and C interleaved at the write frontier, by disk position, steps 1 to 4]
Hyperion StreamFS
  How to improve on a general-purpose file system?
    Rely on application use patterns
    Eliminate un-needed features
  StreamFS: log structure with no segment cleaner
    No deletes (just over-write)
    No fragmentation
    No segment cleaning overhead
  Operation:
    Write fixed-size segment
    Advance write frontier to next segment ready for deletion
  [Figure: write frontier skips segments that are not ready for deletion]
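The operation described on this slide (write a fixed-size segment, then advance the frontier to the next segment that may be overwritten) can be sketched as a circular log. This is a toy in-memory model, not the StreamFS on-disk format: the class name, the `protected` set standing in for "not ready for deletion", and the segment count are all assumptions made for illustration.

```python
class CircularSegmentLog:
    """Toy model of a log with a wrapping write frontier and no segment cleaner."""

    def __init__(self, num_segments):
        self.segments = [None] * num_segments   # None means never written
        self.frontier = 0                       # index of the next segment to try
        self.protected = set()                  # segments not yet ready for deletion

    def write(self, stream, data):
        """Write one segment at the frontier, skipping protected segments."""
        n = len(self.segments)
        for _ in range(n):
            slot = self.frontier
            self.frontier = (self.frontier + 1) % n
            if slot not in self.protected:
                # Old contents are simply overwritten; there is no delete step.
                self.segments[slot] = (stream, data)
                return slot
        raise RuntimeError("every segment is protected; nothing can be overwritten")


log = CircularSegmentLog(num_segments=4)
first_pass = [log.write("A", i) for i in range(4)]   # fills slots 0..3 in order

log.protected.add(0)                 # slot 0 must be retained for now
skipped_to = log.write("B", 99)      # frontier wraps, skips slot 0, lands on slot 1
print(first_pass, skipped_to)
```

Because space is reclaimed only by overwriting at the frontier, the model never has to copy live data the way a general-purpose segment cleaner does.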
StreamFS Design
  Record: single write, packed into:
  Segment: fixed-size, single stream, interleaved into:
  Region, which contains:
    Region map: identifies segments in region; used when write frontier wraps
  Directory: locate streams on disk
  [Figure: record, segment, region, region map, and directory (Stream_A …)]
StreamFS optimizations
  Data retention
    Control how much history is saved
    Lets filesystem make delete decisions
  Speed balancing
    Worst-case speed set by slowest tracks
    Solution: interleave fast and slow sections
    Worst-case speed now set by average track
  [Figure: new data, reservation, old data is deleted]
Local Index
  Requirements:
    High insertion speed
    Interactive query response
  Index and search mechanisms
    Mechanism          Search speed  Insert speed
    Exhaustive search  No            Yes
    B-tree             Yes           No
    Hash index         Yes           No
    Signature index    Yes           Yes
Signature Index
  Compress data into signature
  Store signature separately
  Search signature, not data
  Retrieve data itself on match
  Signature algorithm: Bloom filter
    No false negatives: never misses a result
    False positives: extra read overhead
  [Figure: records, keys, signature]
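A minimal sketch of the idea on this slide, assuming one Bloom-filter signature per block of records. The filter size, the number of hash functions, the SHA-256-based hashing, and the use of IP address strings as keys are illustrative choices, not Hyperion's actual parameters.

```python
import hashlib


class BloomSignature:
    """Bloom-filter signature over a set of keys (sizes here are arbitrary)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "present or false positive".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_block(records):
    """Pair a block of records with a signature kept separately from the data."""
    signature = BloomSignature()
    for record in records:
        signature.add(record)
    return signature, records


def search(blocks, key):
    """Scan signatures only; read a data block only when its signature matches."""
    hits, blocks_read = [], 0
    for signature, records in blocks:
        if signature.might_contain(key):
            blocks_read += 1   # counts true matches and any false positives
            hits.extend(r for r in records if r == key)
    return hits, blocks_read


blocks = [
    build_block(["10.0.0.1", "10.0.0.2"]),
    build_block(["10.0.0.3", "10.0.0.4"]),
    build_block(["10.0.0.5", "10.0.0.6"]),
]
hits, blocks_read = search(blocks, "10.0.0.4")
print(hits, blocks_read)
```

A key that was inserted always sets all of its bits, so the block holding it is never skipped; a key that was not inserted can still match by chance, which costs an extra block read but never a wrong result.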
Signature index efficiency
  Overhead = bytes searched:
    Index size
    False positives (data scan)
  Concise index:
    Index scan cost: low
    False positive scans: high
  Verbose index:
    Index scan cost: high
    False positive scans: low
  [Figure: bytes searched vs. index size]
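The trade-off on this slide can be illustrated with a rough cost model. The sketch assumes that bytes searched is the sum of the index scanned plus the data read because of false positives, that the query is selective (true matches are negligible), and that the false-positive rate follows the standard Bloom-filter approximation (1 - e^(-k/b))^k for b bits per key and k hash functions. The record count and record size are made-up inputs, not measurements from the talk.

```python
import math


def bloom_false_positive_rate(bits_per_key, num_hashes):
    """Standard Bloom-filter approximation for b bits per key and k hashes."""
    return (1.0 - math.exp(-num_hashes / bits_per_key)) ** num_hashes


def bytes_searched(num_records, record_bytes, bits_per_key, num_hashes=3):
    """Index bytes scanned plus data bytes read because of false positives."""
    index_bytes = num_records * bits_per_key / 8
    data_bytes = num_records * record_bytes
    false_positive_bytes = bloom_false_positive_rate(bits_per_key, num_hashes) * data_bytes
    return index_bytes + false_positive_bytes


NUM_RECORDS = 1_000_000    # hypothetical
RECORD_BYTES = 64          # hypothetical

costs = {
    b: bytes_searched(NUM_RECORDS, RECORD_BYTES, b)
    for b in (1, 2, 4, 8, 16, 32, 64)
}
best = min(costs, key=costs.get)
for b, cost in costs.items():
    print(f"{b:>2} bits/key: {cost / 1e6:8.2f} MB searched")
print("cheapest setting in this sweep:", best, "bits/key")
```

Under these assumptions the cost is high at both ends of the sweep: a very concise index is dominated by false-positive data scans, and a very verbose one is dominated by the index scan itself.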
Multi-level signature index
  Concise index: low scan overhead
  Verbose index: low false positive overhead
  Use both:
    Scan concise index
    Check positives in verbose index
  [Figure: concise index, verbose index, data records]
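The two-step lookup can be sketched as follows, assuming each block of records carries both a small (concise) and a large (verbose) Bloom-filter signature. How Hyperion actually maps index levels to data is not stated on the slide, so the per-block pairing, the signature sizes, and the key format below are illustrative assumptions.

```python
import hashlib

CONCISE_BITS = 16     # small signature: cheap to scan, many false positives
VERBOSE_BITS = 512    # large signature: costlier to scan, few false positives
NUM_HASHES = 3


def positions(key, num_bits):
    """Bit positions for a key, derived from salted SHA-256 digests."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % num_bits
        for i in range(NUM_HASHES)
    ]


def make_signature(keys, num_bits):
    sig = 0
    for key in keys:
        for pos in positions(key, num_bits):
            sig |= 1 << pos
    return sig


def matches(sig, key, num_bits):
    return all((sig >> pos) & 1 for pos in positions(key, num_bits))


def build(blocks):
    """Return (concise signature, verbose signature, records) for each block."""
    return [
        (make_signature(b, CONCISE_BITS), make_signature(b, VERBOSE_BITS), b)
        for b in blocks
    ]


def query(index, key):
    stats = {"concise_checked": 0, "verbose_checked": 0, "blocks_read": 0}
    hits = []
    for concise, verbose, records in index:
        stats["concise_checked"] += 1           # every concise signature is scanned
        if not matches(concise, key, CONCISE_BITS):
            continue
        stats["verbose_checked"] += 1           # only concise positives reach here
        if not matches(verbose, key, VERBOSE_BITS):
            continue
        stats["blocks_read"] += 1               # only verbose positives touch data
        hits.extend(r for r in records if r == key)
    return hits, stats


blocks = [[f"10.0.{b}.{h}" for h in range(4)] for b in range(8)]
index = build(blocks)
hits, stats = query(index, "10.0.5.2")
print(hits, stats)
```

Each level filters the candidates passed to the next, so the large signatures and the data are consulted only for blocks the cheaper level could not rule out.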
Distributed Index
  Query routing:
    Send queries only to nodes holding matches
    Use signature index
  Index distribution:
    Aggregate indexes at cluster head
    Route queries through cluster head
    Rotate cluster head for load sharing
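A small sketch of routing through a cluster head, assuming the head keeps one aggregated signature per node and forwards a query only to nodes whose signature matches. The single-hash bitmap used as the signature, the node names, and the round-robin rotation are simplifications chosen for illustration; the slide does not specify these details.

```python
import hashlib
from itertools import cycle

NUM_BITS = 256   # size of each per-node summary bitmap (arbitrary)


def bit_for(key):
    """Map a key to one bit position (a one-hash, Bloom-style signature)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BITS


def summarize(keys):
    sig = 0
    for key in keys:
        sig |= 1 << bit_for(key)
    return sig


class ClusterHead:
    """Holds aggregated signatures and decides which nodes receive a query."""

    def __init__(self):
        self.summaries = {}   # node name -> aggregated signature

    def update(self, node, keys):
        self.summaries[node] = summarize(keys)

    def route(self, key):
        # Nodes whose signature lacks the bit cannot hold the key and are skipped.
        bit = 1 << bit_for(key)
        return [node for node, sig in self.summaries.items() if sig & bit]


nodes = ["monitor-1", "monitor-2", "monitor-3"]   # hypothetical node names
head = ClusterHead()
head.update("monitor-1", ["10.0.0.1", "10.0.0.2"])
head.update("monitor-2", ["10.0.0.5", "10.0.0.6"])
head.update("monitor-3", ["10.0.0.9"])

targets = head.route("10.0.0.5")
print("query sent to:", targets)

# Rotate the cluster-head role round-robin so no single node carries the load.
head_rotation = cycle(nodes)
current_head = next(head_rotation)
next_head = next(head_rotation)
print("heads in order:", current_head, next_head)
```

The node that actually holds the key is always among the targets; other nodes appear only when their signature collides on the same bit.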
Implementation
  Components:
    StreamFS
    Index
    Capture
    RPC, query & index distribution
    Query API
    Linux OS
    Python framework
  [Figure: Hyperion components, showing Linux kernel, capture, StreamFS, index, RPC/query/index distribution, and query API]
Outline of talk
  Introduction and Motivation
  Design
  Implementation
  Results
  Conclusions
Experimental Setup
  Hardware:
    Linux cluster
    Dual 2.4 GHz Xeon CPUs
    1 GB memory
    4 x 10K RPM SCSI disks
    Syskonnect SK98xx + U. Cambridge driver
  Test data:
    Packet traces from UMass Internet gateway*
    400 Mbit/s, 100K pkts/s
  *
StreamFS: write performance
  Tested configurations:
    NetBSD / LFS
    Linux / XFS (SGI)
    StreamFS
  Workload:
    Multiple streams, rates
    Logfile rotation (used for LFS, XFS)
  Results:
    50% boost in worst-case throughput
    Fast enough to store 1,000,000 packet hdrs/s
StreamFS: read/write
  Workload:
    Continuous writes
    Random reads
  StreamFS: sustained write throughput
  XFS: throughput collapse
  StreamFS can handle stream read+write traffic without data loss. XFS cannot.
Index Performance
  Calculation benchmark: 250,000 pkts/sec
  Query:
    380M packet headers
    26 GB data
    Selective query (1 pkt returned)
  Query results:
    13 MB data fetched to query 26 GB data (1:2000)
  [Figure: data fetched (MB) vs. index size]
System Performance
  Workload:
    Trace replay
    Simultaneous queries
    Speed: K pkts/s
  Packet loss measured: #transmitted - #received
  Results:
    Up to 175K pkts/s with negligible packet loss
  [Figure: loss rate vs. packets/s]
Conclusions
  Hyperion: packet monitoring with retrospective queries
  Key components:
    Storage: 50% improvement over general-purpose file systems
    Index: insert at 250K pkts/sec; interactive query over 100s of millions of pkts
    System: capture, index, and query at 175K pkts/sec
Questions?