Design, Implementation, and Evaluation of Network Monitoring Tasks with the TelegraphCQ Data Stream Management System INF5100, Autumn 2006 Jarle Søberg.

Slides:



Advertisements
Similar presentations
Mobile and Wireless Computing Institute for Computer Science, University of Freiburg Western Australian Interactive Virtual Environments Centre (IVEC)
Advertisements

MapReduce Online Tyson Condie UC Berkeley Slides by Kaixiang MO
Analysis of : Operator Scheduling in a Data Stream Manager CS561 – Advanced Database Systems By Eric Bloom.
MapReduce Online Veli Hasanov Fatih University.
CCNA – Network Fundamentals
CISCO NETWORKING ACADEMY PROGRAM (CNAP)
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
OpenFlow-Based Server Load Balancing GoneWild
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
Hands-On Microsoft Windows Server 2003 Networking Chapter 7 Windows Internet Naming Service.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 11 Database Performance Tuning and Query Optimization.
© Lethbridge/Laganière 2001 Chap. 3: Basing Development on Reusable Technology 1 Let’s get started. Let’s start by selecting an architecture from among.
Design and Implementation of a Server Director Project for the LCCN Lab at the Technion.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
1.1 CAS CS 460/660 Introduction to Database Systems File Organization Slides from UC Berkeley.
WXES2106 Network Technology Semester /2005 Chapter 8 Intermediate TCP CCNA2: Module 10.
Check Disk. Disk Defragmenter Using Disk Defragmenter Effectively Run Disk Defragmenter when the computer will receive the least usage. Educate users.
CIS679: RTP and RTCP r Review of Last Lecture r Streaming from Web Server r RTP and RTCP.
Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.
Offline Performance Monitoring for Linux Abhishek Shukla.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Database Performance Tuning and Query Optimization.
1 Semester 2 Module 10 Intermediate TCP/IP Yuda college of business James Chen
An Integration Framework for Sensor Networks and Data Stream Management Systems.
By Lecturer / Aisha Dawood 1.  You can control the number of dispatcher processes in the instance. Unlike the number of shared servers, the number of.
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
IP Forwarding.
Module 10: Monitoring ISA Server Overview Monitoring Overview Configuring Alerts Configuring Session Monitoring Configuring Logging Configuring.
Master’s Thesis (30 credits) By: Morten Lindeberg Supervisors: Vera Goebel and Jarle Søberg Design, Implementation, and Evaluation of Network Monitoring.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Vladimír Smotlacha CESNET Full Packet Monitoring Sensors: Hardware and Software Challenges.
March 16 & 21, Csci 2111: Data and File Structures Week 9, Lectures 1 & 2 Indexed Sequential File Access and Prefix B+ Trees.
FALL 2005CSI 4118 – UNIVERSITY OF OTTAWA1 Part 2.5 Internetworking Chapter 25 (Transport Protocols, UDP and TCP, Protocol Port Numbers)
CS1Q Computer Systems Lecture 17 Simon Gay. Lecture 17CS1Q Computer Systems - Simon Gay2 The Layered Model of Networks It is useful to think of networks.
ECE 526 – Network Processing Systems Design Packet Processing I: algorithms and data structures Chapter 5: D. E. Comer.
Chapter 12 Transmission Control Protocol (TCP)
EECB 473 DATA NETWORK ARCHITECTURE AND ELECTRONICS PREPARED BY JEHANA ERMY JAMALUDDIN Basic Packet Processing: Algorithms and Data Structures.
Lecture 15- Parallel Databases (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
An Example Data Stream Management System: TelegraphCQ INF5100, Autumn 2009 Jarle Søberg.
Intradomain Traffic Engineering By Behzad Akbari These slides are based in part upon slides of J. Rexford (Princeton university)
Processes CSCI 4534 Chapter 4. Introduction Early computer systems allowed one program to be executed at a time –The program had complete control of the.
Eddies: Continuously Adaptive Query Processing Ross Rosemark.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 3: Process-Concept.
1 MSRBot Web Crawler Dennis Fetterly Microsoft Research Silicon Valley Lab © Microsoft Corporation.
Query Processing CS 405G Introduction to Database Systems.
Chapter 9: Web Services and Databases Title: NiagaraCQ: A Scalable Continuous Query System for Internet Databases Authors: Jianjun Chen, David J. DeWitt,
#16 Application Measurement Presentation by Bobin John.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
TCP/IP1 Address Resolution Protocol Internet uses IP address to recognize a computer. But IP address needs to be translated to physical address (NIC).
1 Chapter 24 Internetworking Part 4 (Transport Protocols, UDP and TCP, Protocol Port Numbers)
S. Sudarshan CS632 Course, Mar 2004 IIT Bombay
Module 11: File Structure
Large-scale file systems and Map-Reduce
Database Management System
SOFTWARE DESIGN AND ARCHITECTURE
Packet Switching Datagram Approach Virtual Circuit Approach
Introduction to Networking
Chapter 12: Query Processing
Software Architecture in Practice
Database Performance Tuning and Query Optimization
Disk Storage, Basic File Structures, and Buffer Management
Streaming Sensor Data Fjord / Sensor Proxy Multiquery Eddy
CS4470 Computer Networking Protocols
Practical Database Design and Tuning
Introduction to Database Systems
Selected Topics: External Sorting, Join Algorithms, …
Chapter 2: Operating-System Structures
Chapter 11 Database Performance Tuning and Query Optimization
Chapter 2: Operating-System Structures
Chapter 13: I/O Systems.
Presentation transcript:

Design, Implementation, and Evaluation of Network Monitoring Tasks with the TelegraphCQ Data Stream Management System INF5100, Autumn 2006 Jarle Søberg

INF5100, Autumn 2006 © Jarle Søberg Content Introduction Streaming Applications Data Stream Management Systems TelegraphCQ Query Design System Implementation Performance Evaluation Conclusion

INF5100, Autumn 2006 © Jarle Søberg Introduction Network monitoring is important – On-line – Real-time – Large amounts of data – A lot of programs exist ZoneRanger SiteAlive InMon and so on … Source:

INF5100, Autumn 2006 © Jarle Søberg Introduction Different solutions based on the protocol standards – Cumbersome? – Too specific? – How about new protocols? – Add-ons? – Proprietary solutions? – Written badly?

INF5100, Autumn 2006 © Jarle Søberg Introduction Declarative languages – Database management systems (SQL) – The web (HTML) – Prolog – What is solved? Use declarative languages to perform network monitoring!

INF5100, Autumn 2006 © Jarle Søberg Introduction A DBMS approach to network traffic analysis – Declarative SQL queries – Done after the traces are collected – Real-time analysis is not possible “We need something that is declarative AND real-time!!”

INF5100, Autumn 2006 © Jarle Søberg Introduction Data stream management systems (DSMSs) – E.g. SQL syntax – Easy to learn – Don’t have to be a programming expert – Several DSMSs available (look at earlier INF5100 lecture) STREAM Gigascope Borealis Aurora, Medusa TelegraphCQ

INF5100, Autumn 2006 © Jarle Søberg Introduction We have chosen to evaluate the performance of TelegraphCQ in an on-line network monitoring scenario – Build a general understanding/give a reminder of Network monitoring DSMSs TelegraphCQ

INF5100, Autumn 2006 © Jarle Søberg Introduction – Create a set of network monitoring tasks – Create a framework for the network monitoring – Evaluate the performance of TelegraphCQ on the set of tasks provided – Discuss and show results – Conclude the performance evaluation

INF5100, Autumn 2006 © Jarle Søberg Content Introduction Streaming Applications Data Stream Management Systems TelegraphCQ Query Design System Implementation Performance Evaluation Conclusion

INF5100, Autumn 2006 © Jarle Søberg Streaming Applications Revisited The data stream model – A contrast to the traditional stored data records E.g. in databases – On-line – Un-ordered – Un-bounded in size – Append-only

INF5100, Autumn 2006 © Jarle Søberg Streaming Applications Overview Streaming Applications Pull-Based Sensor Networks Push-Based Network Monitoring Network Traffic Measurement Active Passive Network Traffic Analysis Others Streaming Applications Push-Based Network Monitoring Network Traffic Measurement Active Passive Network Traffic Analysis

INF5100, Autumn 2006 © Jarle Søberg Pull-Based Applications Sensor networks revisited – Pull data from environment at certain intervals – Wireless – Minimize power usage – Aggregate data before sending – Locate and communicate with other sensors – Send data streams to access points where the data is further analyzed

INF5100, Autumn 2006 © Jarle Søberg Push-Based Applications Network monitoring – Data packets Protocol headers – Header fields Nice match for DSMS? – Let’s see!

INF5100, Autumn 2006 © Jarle Søberg Content Introduction Streaming Applications Data Stream Management Systems TelegraphCQ Query Design System Implementation Performance Evaluation Conclusion

INF5100, Autumn 2006 © Jarle Søberg Data Stream Management Systems (DSMSs) Revisited Introduction Database management systems (DBMSs) versus DSMSs – A comparison Issues in DSMS – Continuous queries and windows – See earlier lecture for: Approximation and optimization Query languages

INF5100, Autumn 2006 © Jarle Søberg DSMSs: Introduction Pose queries on a stream of data Extract information in real-time Compared to DBMSs for simplicity – A data packet can be viewed as a tuple – The header(s’) fields can be viewed as attributes – Still, there are some critical differences between DBMSs and DSMSs

INF5100, Autumn 2006 © Jarle Søberg DBMSs versus DSMSs Persistent storage Transient queries Random access and finite data sets Unbounded storage Correct answers Optimize once Transient streams Persistent queries Sequential access and possibly unlimited data streams Memory limitations Throw away tuples and aggregate at high loads Adaptive

INF5100, Autumn 2006 © Jarle Søberg Transient and persistent Continuous queries – Run query at each instance of time Investigate each tuple – Monotonic and append-only due to sequential access Give me all the messages that have not been replied to – Blocking of the data stream!! Must be avoided Stream is stopped and tuples are shedded

INF5100, Autumn 2006 © Jarle Søberg Introducing Windows Give me all messages that have not been given a reply within five minutes after it has been sent Partition data stream into smaller parts Use windows to unblock Different types: – Sliding – Jumping – Tumbling

INF5100, Autumn 2006 © Jarle Søberg Windows Sliding Jumping Tumbling window

INF5100, Autumn 2006 © Jarle Søberg Storage? Disk I/O is expensive! – Can not store tuples to disk – Can only watch tuples once Dependent on memory and window size Query processing must be simple – Windows solve the blocking problem

INF5100, Autumn 2006 © Jarle Søberg Content Introduction Streaming Applications Data Stream Management Systems TelegraphCQ Query Design System Implementation Performance Evaluation Conclusion

INF5100, Autumn 2006 © Jarle Søberg TelegraphCQ Introduction and overview Description of concepts – Wrappers – Fjords – Eddies – SteMs – CACQ – Other features A practical overview Limitations

INF5100, Autumn 2006 © Jarle Søberg TelegraphCQ: Introduction Developed at Berkeley Written in C – Open source GNU license Based on the PostgreSQL DBMS Current version: 2.1 on PostgreSQL code base – Each group has a running copy on dmms-lab107 Closed down Summer 2006 – Still, many interesting and important features to discuss

INF5100, Autumn 2006 © Jarle Søberg Client TelegraphCQ: Overview Shared memory buffer pool Wrapper clearing house Disk Shared memory queues Back end Front end Planner Parser Listener Fjords Eddies SteMs CACQ Client PostmasterServer

INF5100, Autumn 2006 © Jarle Søberg TelegraphCQ: Overview Based on modules – Query processing – Adaptive routing – Ingress and caching Communicate via Fjords – Push and pull data in pipeline fashion – Reduce overhead by non-blocking behavior

INF5100, Autumn 2006 © Jarle Søberg Wrappers Transform data to Datum items Push or pull Several formats – Comma separated format (CSV) is used by TelegraphCQ Contacted via TCP Wrapper clearing house (WCH) – Many connections possible Store streams to database if needed

INF5100, Autumn 2006 © Jarle Søberg Wrappers Shedded tuples, Data Triage – Support for dropping tuples Look at Vera’s presentation about methods – Periodically summarize tuple information – Runs “shadow” queries on shedded tuples Described later Shared Memory Buffer Pool shed

INF5100, Autumn 2006 © Jarle Søberg Eddies DBMSs – Query plan created once – E.g. joined on some attributes may give this plan: – Ok, as long as data set is finite and pulled

INF5100, Autumn 2006 © Jarle Søberg Eddies How about pushed data? Blocking or throwing away tuples is unavoidable!

INF5100, Autumn 2006 © Jarle Søberg Eddies A reconfiguration is necessary Might be much changes in the different streams Reconfiguration may take long time Not dynamic enough

INF5100, Autumn 2006 © Jarle Søberg Eddies An alternative is to use an eddy: eddy Dynamic on a tuple-per-tuple basis Adaptive to changes in the stream

INF5100, Autumn 2006 © Jarle Søberg Eddies: Details Bitmap per tuple represents each operator – ready and done bits – The ready bits specifies the operators the tuple should visit – Tuple is ready for output when all done bits are set – Manipulate bits to set a route for a tuple – On creation of new tuples due to e.g. joins: OR the bitmaps 10tuple

INF5100, Autumn 2006 © Jarle Søberg Eddies: Routing policy Priority scheme – Tuples coming from an operator = high priority Prevents starvation Originally: Back-pressure – Self regulating due to queuing – Naïve, hence not optimal Extended to lottery scheduling

INF5100, Autumn 2006 © Jarle Søberg Eddies: Lottery scheduling Each operator has ticket account – Credited for each arriving tuple – Debited for each leaving tuple Lottery among available operators – Empty in-queue: Fast operators – High number of tickets: Low selectivity operators

INF5100, Autumn 2006 © Jarle Søberg Eddies: Lottery scheduling Low selectivity operators – Win even if the operator is slowing down – Expand with a window scheme Banked tickets Escrow tickets window operator

INF5100, Autumn 2006 © Jarle Søberg Eddies Works for single query environments Simple and adaptive May still not be optimal with respect to dynamic changes over e.g. a single join Extend the eddy’s strength by introducing state modules (SteMs)

INF5100, Autumn 2006 © Jarle Søberg SteMs Split joins in two – Dynamic – Send build tuples Build hash tables – Send probe tuples Look for matches RS T eddy R S T

INF5100, Autumn 2006 © Jarle Søberg SteMs RS Any possible problems? – Two equal intermediate tuples! Solved by globally unique sequence number – Only youngest tuples allowed to match

INF5100, Autumn 2006 © Jarle Søberg SteMs: Issues SteMs are implemented using hash tables – Only equi-joins work properly Alternatively, use B-trees – Can correctly express more: “<>”, “>>”, “<=”, … – Not fast enough for conference show-off?

INF5100, Autumn 2006 © Jarle Søberg Eddies and SteMs Still single-query environment DSMSs aim to support many concurrent queries This feature needs to be adaptive and manage creation and deletion of queries in real-time Optimization is proven NP-hard

INF5100, Autumn 2006 © Jarle Søberg Introducing CACQ Continuously adaptive continuous queries Heuristics – Adding more information to the tuples – Creating even more meta information Avoid sending same singleton and intermediate tuples to same operators First of all: Use grouped filters!

INF5100, Autumn 2006 © Jarle Søberg CACQ: Grouped Filters Module for early filtering of selection predicates – For example: SELECT * FROM stream WHERE stream.a = 7 – All tuples without stream.a = 7 are not sent to the eddy – Includes “>”, “<”, and “ ”, as well

INF5100, Autumn 2006 © Jarle Søberg The CACQ Tuple Extended the eddy tuple to include bitmaps for queriesCompleted and sourceId The queriesCompleted bitmap – Represents the queries – Shows a lineage of the tuple The sourceId bitmap – Source when queries do not share tuples

INF5100, Autumn 2006 © Jarle Søberg Eddies, SteMs, and CACQ: Issues Bitmap statically configured – Faster, but not dynamic Much overhead experienced by the developers – Tuple-by-tuple processing takes time – Batching tuples are suggested Static for shorter periods

INF5100, Autumn 2006 © Jarle Søberg Continuous Queries in TelegraphCQ Windowing supports sliding, hopping, and jumping behavior – Aggregations are important for correct results – Output does not start until window is reached when aggregations are used SELECT stream.color, COUNT(*) FROM stream [RANGE BY ‘9’ SLIDE BY ‘1’] GROUP BY stream.color window START OUPUT!

INF5100, Autumn 2006 © Jarle Søberg Other Information Pros: – Introspective streams – Sub-queries, to some extent – Shadow queries for Data Triage tuples Cons: – OR is not understood – Only istreams, and not dstreams – Only six ANDs between SteMs – TelegraphCQ is very unstable at high pressure

INF5100, Autumn 2006 © Jarle Søberg Content Introduction Streaming Applications Data Stream Management Systems TelegraphCQ Query Design System Implementation Performance Evaluation Conclusion

INF5100, Autumn 2006 © Jarle Søberg Query Design and Implementation The IP/TCP header definition Three tasks for evaluation – Two simple – One complex and large – Show how we think when creating these tasks

INF5100, Autumn 2006 © Jarle Søberg The IP/TCP Stream All TCP/IP header fields are included Use TelegraphCQ’s data types – Try to save space – Change most fields into integer values smallint (16), int (32), and bigint (64) (all signed) – Use the cidr data type for IP addresses – Option fields as text – Control bits as char(1) Eddy and CACQ bits are added as well

INF5100, Autumn 2006 © Jarle Søberg Task 1 Measure the average load of packets and network load per second over a one minute interval – Two results simultaneously – Create two different queries Sub-query Non-sub-query

INF5100, Autumn 2006 © Jarle Søberg Task 1: Sub-query (task_1.1) WITH streams.task1 1 AS ( SELECT COUNT(*), SUM(totalLength), wtime(*) FROM streams.iptcp [RANGE BY ’1 second’ SLIDE BY ’1 second’] ) (SELECT AVG(totalNum), AVG(totalLength)*8 FROM streams.task1 1 [RANGE BY ’1 minute’ SLIDE BY ’1 second’]);

INF5100, Autumn 2006 © Jarle Søberg Task 1: Non-sub-query (task_1.2) SELECT COUNT(*)/60, AVG(s.totalLength) FROM streams.iptcp AS s [RANGE BY ’1 minute’ SLIDE BY ’1 second’];

INF5100, Autumn 2006 © Jarle Søberg Task 2 How many packets have been sent to certain ports during the last five minutes? – Join between table and stream – Which ports are interesting?

INF5100, Autumn 2006 © Jarle Søberg Task 2’s query SELECT wtime(*), streams.iptcp.destPort, COUNT(*) FROM streams.iptcp [RANGE BY ’5 minutes’ SLIDE BY ’1 second’], ports WHERE ports.port = streams.iptcp.destPort GROUP BY streams.iptcp.destPort;

INF5100, Autumn 2006 © Jarle Søberg Task 3: A Theoretical Approach How many bytes have been exchanged on each connection during the last ten seconds? – Identify connections We try to complicate the connection identification by looking at connection establishment – Investigate payload’s size Overall task: Investigate protocol requirement

INF5100, Autumn 2006 © Jarle Søberg Task 3: Connection establishment The 3-way handshake 310 Client Server Sequence number Acknowledgement number SYN ACK Closed Listen SYN-sent SYN-received Established

INF5100, Autumn 2006 © Jarle Søberg Express establishment with query Most ACK packets! Ohoh, what to do? SYN SYN/ACK Merge 3-way handshake Time window lasting for a given period

INF5100, Autumn 2006 © Jarle Søberg Task 3: Challenges All connections in first interval are not registered There are time limitations for how long a connection can be stored No possibility of storing a connection in TelegraphCQ and remove when connection is closed TelegraphCQ only supports six SteMs concurrently, needing to reduce the number of conditions

INF5100, Autumn 2006 © Jarle Søberg Content Introduction Streaming Applications Data Stream Management Systems TelegraphCQ Query Design System Implementation Performance Evaluation Conclusion

INF5100, Autumn 2006 © Jarle Søberg System Implementation: fyaf Simple filter for transforming binary packets to comma separated values (CSVs) understood by TelegraphCQ Use the pcap interface Monitors the stream and itself – Harder to implement such behavior using Linux programs like sed and awk – More control and less possible sources of failure Runs two threads to stop after a time-to-live after the final tuple – Precise with regard to start and stop Batch monitor Higher throughput than the DSMSs

INF5100, Autumn 2006 © Jarle Søberg Experiment Setup Computer A fyaf DSMS TG NIC TG Computer B Payload Header

INF5100, Autumn 2006 © Jarle Søberg Content Introduction Streaming Applications Data Stream Management Systems TelegraphCQ Query Design System Implementation Performance Evaluation Conclusion

INF5100, Autumn 2006 © Jarle Søberg Performance evaluation Metrics – Relative throughput How much data the DSMS receives versus how much it manages to compute – Accuracy Are the results correct? – Consumption Memory CPU

INF5100, Autumn 2006 © Jarle Søberg Performance evaluation Factors – Network load Evaluation technique – Measurements

INF5100, Autumn 2006 © Jarle Søberg Performance evaluation Workload selection – TCP packets The TG traffic generator – Manages to create accurate loads up to approx. 10 Mbits/s (according to sar output) Packet size of 576 bytes How is this for the eddy with respect to load shedding?

INF5100, Autumn 2006 © Jarle Søberg TelegraphCQ Monitors Use shadow queries for shedded tuples – Both count kept and dropped tuples Gives relative throughput – Use introspective queries TelegraphCQ Wrapper

INF5100, Autumn 2006 © Jarle Søberg TelegraphCQ Monitors Use introspective streams – Queries about inner life in TelegraphCQ Queues Modules – Not very much implemented Hard to obtain any useful information

INF5100, Autumn 2006 © Jarle Søberg Simple Filtering Task Three tasks projecting “*”, “sourceip, sourceport, destip, destport”, “destport” Using both introspective queries and not – Intro: called task_101, task_101.1, and task_101.2 – Non-intro: called task_101.3, task_101.4, and task_101.5

INF5100, Autumn 2006 © Jarle Søberg Relative Throughput: Simple filtering

INF5100, Autumn 2006 © Jarle Søberg Load Shedding Accuracy See how accurate TelegraphCQ is – fyaf should not drop any packets since TelegraphCQ supports load shedding – We investigated log files from both TelegraphCQ and fyaf on the filtering task For both introspective (lines) and non-introspective tasks (linespoints) – Found relation by dividing between fyaf ’s and TelegraphCQ’s results

INF5100, Autumn 2006 © Jarle Søberg Load Shedding Accuracy

INF5100, Autumn 2006 © Jarle Søberg Conclusion, Filtering Projecting few attributes using introspective queries seems to be most accurate – So we do this for the remaining tasks

INF5100, Autumn 2006 © Jarle Søberg Task 1 Measure the average load of packets and network load per second over a one minute interval task_1.1 with sub-query task_1.2 without sub-query

INF5100, Autumn 2006 © Jarle Søberg Task 1: Relative throughput

INF5100, Autumn 2006 © Jarle Søberg Task 1: Relative throughput task_1.1 probably faster because of the aggregation Still, not that much faster – If one eddy, the eddy still has to manage a lot of tuples – Eddy versus SteMs – Remember how and where the shedding is performed… streams.iptcpstreams.task_1.1 Q1Q2

INF5100, Autumn 2006 © Jarle Søberg Task 1: Accuracy

INF5100, Autumn 2006 © Jarle Søberg Task 1 TelegraphCQ manages to perform simple aggregations to investigate a stream Low relative throughput – Can possibly use shedded tuple information in real-time to adjust for failure – Can use a sub-query to aggregate smaller partitions of the stream Still uncertain how large this window should be

INF5100, Autumn 2006 © Jarle Søberg Task 2 How many packets have been sent to certain ports during the last five minutes? Run at three different table sizes – 65536, 1024, and 1 – SteMs use hash lookups at O(1) task_2.1: ports task_2.2: 1025 ports task_2.3: 1 port

INF5100, Autumn 2006 © Jarle Søberg Task 2: Relative Throughput

INF5100, Autumn 2006 © Jarle Søberg Task 2: Relative throughput, task_2.1

INF5100, Autumn 2006 © Jarle Søberg Task 2: Accuracy, task_2.1

INF5100, Autumn 2006 © Jarle Søberg Task 2: Accuracy Use sar to verify accuracy for 1 Mbits/s What do we see? – Memory and CPU? – Internal adaptation? – Some kind of other adaptation? Run for one hour to investigate patterns

INF5100, Autumn 2006 © Jarle Søberg Task 2: Equilibrium

INF5100, Autumn 2006 © Jarle Søberg Task 2: Equilibrium At 5 Mbits/s – Converge to approx 188 packets each second About 1 Mbits/s – Analytical modeling may help predicting the future behavior of the relative throughput Future work: Look at aggregations! – Though, remember throughput from other tasks…

INF5100, Autumn 2006 © Jarle Søberg Content Introduction Streaming Applications Data Stream Management Systems TelegraphCQ Query Design System Implementation Performance Evaluation Conclusion

INF5100, Autumn 2006 © Jarle Søberg Conclusions about TelegraphCQ Ok support for basic requirements – Problems expressing protocols E.g. connection establishment Not that fast for high traffic monitoring – Mbits/s just for filtering in our current setup – Has to perform heavy aggregations Loose fine granularity information, e.g. single packets Accurate Unstable

INF5100, Autumn 2006 © Jarle Søberg Conclusion about DSMSs and network monitoring Interesting application for DSMSs – Simple way of describing data streams – Still interesting to look at several other DSMSs for applicability, expressiveness, and accuracy Ideas: – Declarative protocol descriptions in dynamic and self aware and self configuring networks?

INF5100, Autumn 2006 © Jarle Søberg Questions? ?