Master’s Thesis (30 credits)
By: Morten Lindeberg
Supervisors: Vera Goebel and Jarle Søberg
Design, Implementation, and Evaluation of Network Monitoring Tasks for the Borealis Stream Processing Engine
Slide no. 2 Outline
–Problem description
–Application domains
–Data stream management system (DSMS)
–Borealis
–Design
–Experiment setup
–Implementation
–Evaluation
–Conclusion
–Future work
[Slide label: Network monitoring tasks]
Slide no. 3 Problem Description
Design, Implementation, and Evaluation of Network Monitoring Tasks for the Borealis Stream Processing Engine
Network Monitoring Tasks:
–Task-1: Verify the Borealis load shedding mechanisms.
–Task-2: Measure the average packet count and network load per second over a one-minute interval.
–Task-3: How many packets have been sent to certain ports during the last five minutes?
–Task-4: How many bytes have been exchanged on each connection during the last ten seconds?
–Task-5: Identify possible SYN flood attacks.
Slide no. 4 Application Domains
Network monitoring (controlling and measuring the Internet or parts of it)
–Challenges
  Traffic volumes
  Getting relevant data
  Privacy
–On-line network measurements
  Passive: Our network tasks
  Active: E.g. Traceroute and Ping
–Off-line network measurements
  Passive: E.g. InTraBase (Siekkinen, 2006)
  Active: Pandora FMS (Pandora, 2007)
[Figure labels: N.M, Private network, DB, "Looks at all passing packets", Push-based]
Slide no. 5 Cont. Application Domains
Sensor networks
–TinyDB
Financial tickers
–Traderbot
[Slide labels: Pull-based, Push-based]
Slide no. 6 DSMS
Stream Data Model
–Definition: A data stream is a real-time, continuous, ordered sequence of items (Golab, 2003)
Slide no. 7 Cont. DSMS
Requirements
–Continuous query language
–Data reduction techniques
  Sampling
  Load shedding
  Aggregations with window techniques
  Without sliding windows, an aggregation would be a blocking operator, since one never sees the whole stream at once
–Adaptive
–Integration with a traditional database
–Low latency and high throughput
Window techniques:
–Hopping windows
–Tumbling windows
–Overlapping windows
Windows are either time-based or tuple-based.
Streaming tuples should only be kept in main memory, never written to disk (too slow).
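A minimal C++ sketch of a time-based tumbling window aggregate, to illustrate why a window makes the aggregation non-blocking: a result is emitted as soon as a window closes. It assumes tuples arrive ordered by timestamp. The names PacketTuple, WindowResult and tumbling_aggregate are hypothetical and not part of Borealis.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical tuple: arrival time in seconds and packet length in bytes.
struct PacketTuple {
    std::uint64_t timestamp_s;
    std::uint32_t length_bytes;
};

// One result per closed window.
struct WindowResult {
    std::uint64_t window_start_s;
    std::uint64_t packet_count;
    std::uint64_t total_bytes;
};

// Time-based tumbling window: consecutive, non-overlapping windows of
// window_size_s seconds. A window is closed when a tuple belonging to a
// different window arrives, so only one running aggregate is held in memory.
// Assumption: tuples arrive ordered by timestamp.
std::vector<WindowResult> tumbling_aggregate(const std::vector<PacketTuple>& stream,
                                             std::uint64_t window_size_s) {
    assert(window_size_s > 0);
    std::vector<WindowResult> results;
    bool open = false;
    WindowResult current{0, 0, 0};
    for (const PacketTuple& t : stream) {
        const std::uint64_t start = (t.timestamp_s / window_size_s) * window_size_s;
        if (open && start != current.window_start_s) {
            results.push_back(current);  // close the previous window
            open = false;
        }
        if (!open) {
            current = WindowResult{start, 0, 0};
            open = true;
        }
        current.packet_count += 1;
        current.total_bytes += t.length_bytes;
    }
    if (open) {
        results.push_back(current);  // flush the last window of this finite example
    }
    return results;
}
```

With a 60 second window, dividing packet_count and total_bytes by 60 gives per-second averages of the kind Task 2 asks for. A hopping or overlapping window would keep several such running aggregates open at once.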
Slide no. 8 Cont. DSMS
Existing systems:
Name                           Language
TelegraphCQ (Berkeley Uni.)    SQL-like
STREAM (Stanford Uni.)         SQL-like
Aurora (Brown, M.I.T++)        Boxes and arrows
Medusa (Brown, M.I.T++)        Boxes and arrows
Borealis (Brown, M.I.T++)      Boxes and arrows
Gigascope ($ AT&T)             SQL-like
Slide no. 9 Borealis
Stream processing engine (SPE)
–Academic research / Public domain
–Distributed queries
–General purpose
  Multi-player first person shooter game
  Network monitoring
Continuous query language
–Operator boxes and stream arrows
–XML + GUI
–E.g., operators: Map, Aggregate, Join, Filter, Random Drop and operators for integration with statically stored tables
[Figure: distributed query over nodes n1 to n6; labels: Data stream, Result tuples, High Availability]
Slide no. 10 Design
Task 2 - Version 1
–Average load and packet count
Task 1 - Version 1
–Mapping
Slide no. 11 Cont. Design
Task 3 - Version 2
–Port destination count
Task 4 - Version 2
–Exchanged bytes
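The two tasks above are both group-by aggregations over the tuples of one window. The following C++ sketch shows the idea outside Borealis. The HeaderTuple fields and all function names are hypothetical, and treating both directions of a connection as one key is an assumption about what "exchanged" means, not something the slides state.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical packet header tuple; field names are illustrative only.
struct HeaderTuple {
    std::uint32_t src_ip;
    std::uint32_t dst_ip;
    std::uint16_t src_port;
    std::uint16_t dst_port;
    std::uint32_t length_bytes;
};

// Task 3 style: number of packets per destination port within one window.
std::map<std::uint16_t, std::uint64_t> count_by_dst_port(const std::vector<HeaderTuple>& window) {
    std::map<std::uint16_t, std::uint64_t> counts;
    for (const HeaderTuple& t : window) {
        counts[t.dst_port] += 1;
    }
    return counts;
}

using Endpoint = std::pair<std::uint32_t, std::uint16_t>;  // (ip, port)
using ConnectionKey = std::pair<Endpoint, Endpoint>;       // smaller endpoint first

// Assumption: both directions belong to the same connection, so the two
// endpoints are ordered before they are used as the grouping key.
ConnectionKey connection_key(const HeaderTuple& t) {
    const Endpoint a{t.src_ip, t.src_port};
    const Endpoint b{t.dst_ip, t.dst_port};
    return a < b ? ConnectionKey{a, b} : ConnectionKey{b, a};
}

// Task 4 style: bytes exchanged per connection within one window.
std::map<ConnectionKey, std::uint64_t> bytes_by_connection(const std::vector<HeaderTuple>& window) {
    std::map<ConnectionKey, std::uint64_t> bytes;
    for (const HeaderTuple& t : window) {
        bytes[connection_key(t)] += t.length_bytes;
    }
    return bytes;
}
```

In a boxes-and-arrows query the same grouping would be expressed as an aggregate operator with a group-by attribute and a window size of five minutes (Task 3) or ten seconds (Task 4).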
Slide no. 12 Cont. Design
Task 5 - Version 1
–SYN flood attack: several hosts initiate half-open connections to a server so that it has to deny service to others.
–Identifies the relation between the count of SYN packets and the count of normal (non-SYN) packets. Joins the aggregated tuples if the SYN count is twice or more the normal packet count.
Slide no. 13 Cont. Design
<parameter name="predicate" value = "left.count * 2 < right.count and left.count > 0" />
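The join predicate can be read as a plain boolean function. The C++ sketch below mirrors it term by term. It assumes that the left stream carries the non-SYN count and the right stream the SYN count, which fits the Task 5 description but is not stated on the slide. CountTuple and syn_flood_suspected are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical aggregated tuple: the packet count of one window.
struct CountTuple {
    std::uint64_t count;
};

// Mirrors the join predicate on the slide:
//   left.count * 2 < right.count and left.count > 0
// Assumption: `left` is the non-SYN count, `right` is the SYN count.
bool syn_flood_suspected(const CountTuple& left, const CountTuple& right) {
    return left.count * 2 < right.count && left.count > 0;
}
```

Read this way, the predicate is strict: a SYN count of exactly twice the normal count does not match, and a window with zero normal packets never matches because of the left.count > 0 term.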
Slide no. 14 Experiment Setup
Scripts execute the different stages of each experiment.
–TG: Generates traffic.
–fyaf: Filters packet headers from the NIC. Counts the number of packets retrieved by the C.A.
–C.A: Transforms the packet headers into tuples. I/O to the Q.P.
–Q.P: Performs the query on the tuples retrieved from the C.A.
System resource consumption is logged by the execution scripts.
fyaf calculates the number of lost packets.
TG controls the amount of generated traffic per second.
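The slides do not show how fyaf computes packet loss. One plausible reading, sketched below in C++, is the difference between the packets captured from the NIC and the packets retrieved by the C.A. Both counters and the names LossReport and packet_loss are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct LossReport {
    std::uint64_t lost_packets;
    double loss_ratio;  // fraction of captured packets that were lost, in [0, 1]
};

// Assumption: `captured` is the number of packets seen on the NIC and
// `retrieved` is the number of packets handed to the client application.
LossReport packet_loss(std::uint64_t captured, std::uint64_t retrieved) {
    assert(retrieved <= captured);
    const std::uint64_t lost = captured - retrieved;
    const double ratio = captured == 0
        ? 0.0
        : static_cast<double>(lost) / static_cast<double>(captured);
    return LossReport{lost, ratio};
}
```

Plotting loss_ratio against the load generated by TG gives a curve of lost packets per network load.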
Slide no. 15 Borealis Implementation
Client application main-method:
int main( int argc, const char *argv[] )
{
    ...
    sock = get_connection();
    NOTICE << "Socket opened: " << sock;

    status = marshal.open();
    if ( status )
    {
        WARN << "Could not deploy the network.";
    }
    else
    {
        // Start the timer.
        timer = Time::now();
        // Send the first batch of tuples. Queue up the next round with a delay.
        marshal.sentPacket();
        // Run the client event loop. Return only on an exception.
        marshal.runClient();
    }
    ...
}
[Figure labels: fyaf, Client application, Query processor, Data stream, Results]
Slide no. 16 Evaluation
Results for Task 1 (the map task): CPU maximums
–The drop box can lead to increased CPU utilization.
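For reference, a random drop step can be sketched in C++ as below. This is not the Borealis Random Drop operator, only an illustration of the technique under the assumption that each tuple is dropped independently with a fixed probability. The function name random_drop and its parameters are hypothetical.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Drops each tuple independently with probability drop_rate and returns
// the tuples that were kept. The drop decision is made for every incoming
// tuple, so the step itself still costs work per tuple.
template <typename Tuple>
std::vector<Tuple> random_drop(const std::vector<Tuple>& input,
                               double drop_rate,
                               std::mt19937& rng) {
    assert(drop_rate >= 0.0 && drop_rate <= 1.0);
    std::bernoulli_distribution drop(drop_rate);
    std::vector<Tuple> kept;
    for (const Tuple& t : input) {
        if (!drop(rng)) {
            kept.push_back(t);
        }
    }
    return kept;
}
```

A drop rate of 0.0 keeps every tuple and 1.0 drops every tuple. Values in between trade result accuracy for a lower downstream load.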
Slide no. 17 Cont. Evaluation
Results for Task 2 (the simple task)
[Figure: lost packets at different network loads; annotation: 40 Mbit/s]
Slide no. 18 Cont. Evaluation
Results for Task 2 (the simple task)
[Figure: task result, measured load; annotations: Ac 98%, Ac 93%, Ac 96%]
Slide no. 19 Cont. Evaluation
Results for Task 3 - Memory Consumption
–Low memory consumption (31 Mbyte).
–No change when the load increases.
–Static tables cause a slight increase in memory consumption.
Slide no. 20 Cont. Evaluation
Task      Network Load      Memory Consumption
Task 1    30, 40 Mbit/s     31 Mbyte
Task 2    40 Mbit/s         31 Mbyte
Task 3    10, 30 Mbit/s     31, 33 Mbyte
Task 4    20 Mbit/s         31 Mbyte
Task 5    20 Mbit/s         30, 50+ Mbyte
Slide no. 21 Conclusion
Borealis supports complex network monitoring queries
Borealis can handle network loads:
–40 Mbit/s for simple tasks
– Mbit/s for complex tasks
–10 Mbit/s when comparing input packets with several thousand statically stored tuples
Load shedding
–Not fully working; does not identify overload situations
–The random_drop box does not significantly increase the supported network load
Low memory consumption
–System code parameters might affect performance
Slide no. 22 Future Work
–Distribution of queries
–Expand client application (fyaf and load shedding)
–Optimization of source code system parameters
–New version of Borealis (Winter 2007)
–Comparison with results from TelegraphCQ (Søberg, 2006) and STREAM (Hernes, 2006)
Slide no. 23 Bibliography
(Søberg, 2006) - Design, implementation, and evaluation of network monitoring tasks with the TelegraphCQ data stream management system, Master’s Thesis
(Hernes, 2006) - Design, implementation, and evaluation of network monitoring tasks with the STREAM data stream management system, Master’s Thesis
(Siekkinen, 2006) - Root Cause Analysis of TCP Throughput: Methodology, Techniques, and Applications, Dr. Scient. Thesis
(Golab, 2003) - Issues in Data Stream Management, Lukasz Golab and M. Tamer Özsu, 2003
(Pandora, 2007) -