DataLines: a framework for building streaming data applications. Mike Haberman, Senior Software/Network Engineer

The Problem
Data deluge: routers, switches, IDS, servers (web, mail, logs, etc.), software (tcpdump, web100, SNMP, tarpit, etc.), sensors, taps, … (help me)

The Problem (continued)
Disparate data formats
Software (sometimes) to manage each
Tweaking to get what you want (custom software)
Correlating data (more custom software)

DataLines
Can we build a framework that removes all (or most) of the tedium of working with all these disparate data formats?

DataLines
A framework designed to manage and build streaming data processing applications

DataLines Framework
Manage: we would like one tool to handle all these different data sources.

DataLines Framework
Build: a uniform way of creating a data processing application.

DataLines Framework
Streaming data:
A never-ending stream of ‘manageable’ chunks of data
No random access, no blocking operators
One look at each item; linear or sub-linear algorithms and data operations
Each data item (a tuple in DataLines) is an independent entity
Many tools were not designed for streaming data
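The "one look" constraint can be illustrated with a single-pass statistic. The following is a minimal Python sketch, not DataLines code: `RunningStats` and `summarize` are hypothetical names, and the technique shown is Welford's method for a running mean and variance, which sees each item once and keeps constant memory.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """One-pass mean/variance (Welford's method); uses O(1) memory."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 until at least one item has been seen.
        return self.m2 / self.count if self.count else 0.0


def summarize(stream: Iterable[float]) -> RunningStats:
    """Consume the stream once, item by item, never storing past items."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats
```

Because `summarize` only iterates forward, it works equally well on a list, a generator, or a socket reader that yields values as they arrive.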

DataLines Framework
Processing: something you want to do to the data (e.g. reading, writing, parsing, event generation, filtering, statistics, reports, data synopsis, …)

DataLines
Creating a DataLines application: XML → “compile” → DataLines application

DataLines
The XML file defines 3 major components:
– Data Processors: what one does with the data
– Processing Order: the order in which the processors will operate on the data
– Event Management: what to do when a processor generates an event

DataLines Processors
Data processors are the heart of DataLines
– I/O: socket, file
– Filters: inline, dispatch
– Collectors: binning, windowing (with operators)
– GUI: charts, picture taking
– Converters: binary to tuple
– Misc: printers, counters, iterators, timers, data generators, gates, delays
Processors can generate events
Processors can drop, mutate, or mutilate the tuple being processed, and can generate new tuples
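To make the processor idea concrete, here is a small Python sketch under stated assumptions: tuples are modelled as plain dicts, and `inline_filter` and `window_sum` are hypothetical stand-ins for a filter and a windowing collector, not the real DataLines processors. The filter drops tuples; the collector generates new ones.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, float]  # stand-in for a DataLines tuple (assumption)


def inline_filter(tuples: Iterable[Record],
                  keep: Callable[[Record], bool]) -> Iterator[Record]:
    """Filter processor: drops every tuple for which keep(t) is false."""
    for t in tuples:
        if keep(t):
            yield t


def window_sum(tuples: Iterable[Record], field: str,
               size: int) -> Iterator[Record]:
    """Collector processor: emits one NEW tuple per full window of `size` items.

    A trailing partial window is discarded in this sketch.
    """
    total, n = 0.0, 0
    for t in tuples:
        total += t[field]
        n += 1
        if n == size:
            yield {"count": n, "sum_" + field: total}
            total, n = 0.0, 0
```

Both functions are generators, so they look at each tuple once and never need the whole stream in memory.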

DataLines Pipelines
Control tuple movement among processors
Can connect either processors or other pipelines
Two paths within a pipeline: binary and tuple
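The point that a pipeline can connect processors or other pipelines can be sketched as simple composition. This is a hypothetical Python illustration: the `Pipeline` class and the two stages are invented for the example, and the separate binary and tuple paths are not modelled.

```python
from typing import Callable, Iterable, Iterator

Stage = Callable[[Iterable[dict]], Iterator[dict]]


class Pipeline:
    """Chains stages in order. A Pipeline is itself callable like a stage,
    so it can be nested inside another Pipeline."""

    def __init__(self, *stages: Stage) -> None:
        self.stages = list(stages)

    def __call__(self, tuples: Iterable[dict]) -> Iterator[dict]:
        stream = iter(tuples)
        for stage in self.stages:
            stream = stage(stream)  # each stage wraps the previous stream
        return stream


def drop_small(tuples: Iterable[dict]) -> Iterator[dict]:
    """Example stage: drop tuples smaller than 1024 bytes."""
    for t in tuples:
        if t["bytes"] >= 1024:
            yield t


def add_kb(tuples: Iterable[dict]) -> Iterator[dict]:
    """Example stage: emit a copy of each tuple with a derived 'kb' field."""
    for t in tuples:
        yield {**t, "kb": t["bytes"] / 1024}
```

Usage: `Pipeline(Pipeline(drop_small), add_kb)` treats the inner pipeline exactly like a processor.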

Event Management
Allow processors to signal an event
– timers, open/close, client connects, etc.
Allow the user to tie in domain logic
Allow the user to call a processor-specific API
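The three roles above (a processor signals, the user attaches domain logic, the handler calls a processor-specific API) might look like the following Python sketch. `EventBus`, `CounterProcessor`, the `"threshold"` event name, and `reset()` are all hypothetical; the slides do not show how DataLines wires events.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    """Minimal publish/subscribe hub keyed by event name."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def on(self, event_name: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_name].append(handler)

    def emit(self, event_name: str, **details) -> None:
        for handler in self._handlers[event_name]:
            handler(details)


class CounterProcessor:
    """Counts tuples and signals a 'threshold' event when the limit is reached."""

    def __init__(self, bus: EventBus, limit: int) -> None:
        self.bus, self.limit, self.count = bus, limit, 0

    def process(self, t: dict) -> dict:
        self.count += 1
        if self.count == self.limit:
            self.bus.emit("threshold", count=self.count)
        return t  # the tuple passes through unchanged

    def reset(self) -> None:
        """Processor-specific API that an event handler may call."""
        self.count = 0
```

A handler registered with `bus.on("threshold", ...)` is where domain logic would live, for example logging an alert and then calling `counter.reset()`.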

DataLines Data
The generalization of data is a DlTuple
A tuple is just a set of values
DlTuple is the interface processors use
– String[] <-- getFieldNames()
– DlValue <-- getValue(fieldname)
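As a rough Python analogue of that interface, the sketch below mirrors the two accessors with hypothetical snake_case names. `TupleLike`, `CsvTuple`, and `describe` are assumptions made for illustration; the real `DlTuple` and `DlValue` types are not shown in the slides.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class TupleLike(ABC):
    """Python stand-in for the DlTuple interface (names are assumptions)."""

    @abstractmethod
    def get_field_names(self) -> List[str]: ...

    @abstractmethod
    def get_value(self, field_name: str) -> Any: ...


class CsvTuple(TupleLike):
    """One concrete tuple: built by a converter from a comma-separated line."""

    def __init__(self, header: List[str], line: str) -> None:
        self._values = dict(zip(header, line.split(",")))

    def get_field_names(self) -> List[str]:
        return list(self._values)

    def get_value(self, field_name: str) -> Any:
        return self._values[field_name]


def describe(t: TupleLike) -> str:
    """A processor-style function that relies only on the interface."""
    return ", ".join(f"{name}={t.get_value(name)}" for name in t.get_field_names())
```

Because `describe` touches only the two interface methods, it would work unchanged on tuples built from any other data source.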

DataLines Data
Tuples can have virtual fields
– calculated values, static values
Tuples can have composite fields
The creation of the tuple is left to the processor in charge of conversion
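Virtual fields can be pictured as fields that are not stored in the raw data but are still visible through the tuple interface. The Python sketch below is an assumption-laden illustration: `VirtualFieldTuple` is a hypothetical class, and calculated fields are given as Python callables rather than DataLines expressions.

```python
from typing import Any, Callable, Dict, List, Optional


class VirtualFieldTuple:
    """Tuple with stored values plus virtual (static or calculated) fields."""

    def __init__(self,
                 values: Dict[str, Any],
                 calculated: Optional[Dict[str, Callable[["VirtualFieldTuple"], Any]]] = None,
                 static: Optional[Dict[str, Any]] = None) -> None:
        self._values = dict(values)
        self._calculated = dict(calculated or {})
        self._static = dict(static or {})

    def get_field_names(self) -> List[str]:
        # Stored fields first, then static, then calculated.
        return list(self._values) + list(self._static) + list(self._calculated)

    def get_value(self, name: str) -> Any:
        if name in self._values:
            return self._values[name]
        if name in self._static:
            return self._static[name]
        # Calculated fields are evaluated on demand from the other fields.
        return self._calculated[name](self)
```

For instance, a calculated field `total` defined as `lambda t: t.get_value("A") + t.get_value("B")` behaves like any stored field from a processor's point of view.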

XML Syntax … run away!

Data Example

Data Example ${A} + ${B}

DataLines Tutorial
Fast forward past a painful 3-hour tutorial covering each of those sections in detail (tuples, processors, pipelines, event management, configurations)
You have seen all the XML though!

DataLines Distilled
A library of data processors that operate on “Tuples”
– one of the processors takes the raw data and creates the tuple
An XML compiler that takes the XML file and the library, and creates an application
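The "XML file plus library yields an application" idea can be sketched in a few lines of Python. Everything here is hypothetical: the `<application>`/`<processor>` schema, the `LIBRARY` registry, and `compile_app` are invented for illustration and do not reflect the actual DataLines XML syntax or compiler.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, Iterator

Stage = Callable[[Iterable[dict]], Iterator[dict]]


def make_filter(attrs: Dict[str, str]) -> Stage:
    """Library entry: keep tuples whose `field` is at least `min`."""
    field, minimum = attrs["field"], float(attrs["min"])
    return lambda tuples: (t for t in tuples if t[field] >= minimum)


def make_counter(attrs: Dict[str, str]) -> Stage:
    """Library entry: add a running sequence number under the `into` field."""
    def stage(tuples: Iterable[dict]) -> Iterator[dict]:
        for n, t in enumerate(tuples, start=1):
            yield {**t, attrs["into"]: n}
    return stage


# Hypothetical processor library: type name -> factory taking XML attributes.
LIBRARY: Dict[str, Callable[[Dict[str, str]], Stage]] = {
    "filter": make_filter,
    "counter": make_counter,
}


def compile_app(xml_text: str, library) -> Stage:
    """Read the (made-up) XML and chain the named processors in document order."""
    root = ET.fromstring(xml_text.strip())
    stages = [library[el.get("type")](el.attrib) for el in root.iter("processor")]

    def app(tuples: Iterable[dict]) -> Iterator[dict]:
        stream = iter(tuples)
        for stage in stages:
            stream = stage(stream)
        return stream
    return app


APP_XML = """
<application>
  <processor type="filter" field="bytes" min="100"/>
  <processor type="counter" into="seq"/>
</application>
"""
```

Calling `compile_app(APP_XML, LIBRARY)` returns a function that can be applied to any iterable of dict tuples; changing the application then means editing the XML rather than the code.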

DataLines Example

DataLines in Use
DataLines does make it easier to hit the ground running.
Much of the tedious work you need to do is taken care of.
For highly specific needs, you still need to write code, but that code then becomes part of the DataLines library that others can build on.

Balance Sheet
Positive
– Flexible (vendor neutral, data, debugging)
– Reusable (pipelines, processors)
– Fast development time
– “Easy” to change the client (CLI, desktop, web page)
Negative
– May need to write domain-specific code
– Learning curve: processor configuration, data expectations, events

DataLines in Action
Network Engineering group
– Monitor router, tar pit, IDS, packet sampling, L2/L3 mappings
Security Group
– Network forensics
Intergroup Wiring
– Use DataLines to share data between groups/projects

DataLines in Action
Network Research group
– Monitor cluster network activity from the MPI layer
– Data mining
– Misc. NSF data-oriented projects

Future
Open Source
More Info: (a work in progress)