DataLines: a framework for building streaming data applications. Mike Haberman, Senior Software/Network Engineer

1 DataLines: a framework for building streaming data applications. Mike Haberman, Senior Software/Network Engineer, mikeh@ncsa.edu

2 The Problem: a data deluge from routers, switches, IDS, servers (web, mail, logs, etc.), software (tcpdump, web100, SNMP, tarpit, etc.), sensors, taps, … (help me!)

3 The Problem (continued): disparate data formats; (sometimes) separate software to manage each one; tweaking to get what you want (custom software); correlating the data (more custom software).

4 DataLines: Can we build a framework that removes all (or most) of the tedium of working with all these disparate data formats?

5 DataLines Framework: designed to manage and build streaming data processing applications

7 DataLines Framework, "manage": we would like one tool to handle all these different data sources.

8 DataLines Framework, "build": a uniform way of creating a data processing application.

9 DataLines Framework, "streaming data": a never-ending stream of ‘manageable’ chunks of data. There is no random access and there are no blocking operators; each item gets one look, so algorithms and data operations must be linear or sub-linear. Each data item (a tuple in DataLines) is an independent entity. Many tools were not designed for streaming data.
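
The "one look" rule means an operator must fold each item into a small summary and then discard it. The following is a minimal sketch of that idea, assuming numeric values arrive one at a time; the class `RunningStats` is hypothetical and is not part of DataLines. It keeps count, mean (incremental update), min, and max in constant memory, so it never needs random access or a blocking pass over the whole stream.

```java
// Hypothetical one-pass accumulator illustrating the streaming constraints:
// each value is seen exactly once, nothing is buffered, memory use is constant.
final class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    // Fold one value into the summary; the value itself is not stored.
    void accept(double x) {
        count++;
        mean += (x - mean) / count;   // incremental mean, no running sum kept
        min = Math.min(min, x);
        max = Math.max(max, x);
    }

    long count() { return count; }
    double mean() { return mean; }
    double min() { return min; }
    double max() { return max; }
}
```

The per-item cost is O(1), so the whole stream is processed in linear time regardless of how long it runs.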

10 DataLines Framework, "processing": something you want to do to the data (e.g., reading, writing, parsing, event generation, filtering, statistics, reports, data synopses, …).

11 DataLines: creating a DataLines application. An XML file is “compiled” into a DataLines application.

12 DataLines: the XML file defines 3 major components:
– Data Processors: what one does with the data
– Processing Order: the order in which the processors operate on the data
– Event Management: what to do when a processor generates an event

13 DataLines Processors: data processors are the heart of DataLines.
– I/O: socket, file
– Filters: inline, dispatch
– Collectors: binning, windowing (with operators)
– GUI: charts, picture taking
– Converters: binary to tuple
– Misc: printers, counters, iterators, timers, data generators, gates, delays
Processors can generate events. Processors can drop, mutate, or mutilate the tuple being processed, and can generate new tuples.
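
Since a processor may drop, pass, mutate, or generate tuples, one way to capture all of these in a single signature is to return zero, one, or many tuples per input. The following is a minimal sketch under that assumption, sketched in Java with tuples modeled as `Map<String, String>` for brevity; `Processor`, `DropFilter`, and `Counter` are hypothetical names, not the DataLines API, and events are reduced to strings passed to a callback.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical processor contract: for each input tuple, return
//   an empty list  -> the tuple is dropped,
//   one tuple      -> passed through (possibly mutated),
//   several tuples -> new tuples were generated.
interface Processor {
    List<Map<String, String>> process(Map<String, String> tuple);
}

// Inline filter: keeps only tuples that match the predicate.
final class DropFilter implements Processor {
    private final Predicate<Map<String, String>> keep;

    DropFilter(Predicate<Map<String, String>> keep) { this.keep = keep; }

    @Override
    public List<Map<String, String>> process(Map<String, String> tuple) {
        if (keep.test(tuple)) {
            return List.of(tuple);
        }
        return List.of();
    }
}

// Counter: passes every tuple through unchanged and raises an event
// (a string sent to a callback) every `every` tuples.
final class Counter implements Processor {
    private final long every;
    private final Consumer<String> events;
    private long seen = 0;

    Counter(long every, Consumer<String> events) {
        this.every = every;
        this.events = events;
    }

    @Override
    public List<Map<String, String>> process(Map<String, String> tuple) {
        seen++;
        if (seen % every == 0) {
            events.accept("count=" + seen);
        }
        return List.of(tuple);
    }

    long seen() { return seen; }
}
```

Returning a list keeps every processor interchangeable: a downstream stage does not need to know whether the previous one filtered, counted, or expanded the data.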

14 DataLines Pipelines: pipelines control tuple movement among processors. A pipeline can connect either processors or other pipelines. There are two paths within a pipeline: binary and tuple.
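
The nesting described above (a pipeline can connect processors or other pipelines) maps naturally onto the composite pattern: if a pipeline is itself a processor, a stage can be either. A minimal sketch under that assumption, covering only the tuple path (the binary path is omitted); `Processor` and `Pipeline` are hypothetical names, and the interface is repeated here so the block stands alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical processor contract: zero, one, or many tuples out per tuple in.
interface Processor {
    List<Map<String, String>> process(Map<String, String> tuple);
}

// Hypothetical pipeline: applies its stages in order. Because a Pipeline is
// itself a Processor, any stage may be a single processor or a nested pipeline.
final class Pipeline implements Processor {
    private final List<Processor> stages;

    Pipeline(List<Processor> stages) { this.stages = List.copyOf(stages); }

    @Override
    public List<Map<String, String>> process(Map<String, String> tuple) {
        List<Map<String, String>> current = List.of(tuple);
        for (Processor stage : stages) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> t : current) {
                next.addAll(stage.process(t));   // a dropped tuple contributes nothing
            }
            if (next.isEmpty()) {
                return next;                      // nothing left to pass downstream
            }
            current = next;
        }
        return current;
    }
}
```

The processing order from the XML file would correspond to the order of `stages`; reusing a pipeline elsewhere is then just adding it as a stage of another one.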

15 Event Management: lets processors signal an event (timers, open/close, client connects, etc.), lets the user tie in domain logic, and lets the user call a processor-specific API.
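
The three roles above (processors signal, user logic reacts, user logic calls into a processor's own API) can be sketched as a small dispatcher keyed by event name. This is a minimal sketch only; `EventBus`, `Gate`, and event names such as `"timer"` are hypothetical illustrations, not the DataLines event mechanism.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical event dispatcher: processors fire named events, and user code
// registered for that name runs in response.
final class EventBus {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();

    // Register domain logic for an event name.
    void on(String eventName, Consumer<String> handler) {
        handlers.computeIfAbsent(eventName, k -> new ArrayList<>()).add(handler);
    }

    // Called by a processor; `detail` carries event-specific information.
    // Events with no registered handler are silently ignored.
    void fire(String eventName, String detail) {
        for (Consumer<String> h : handlers.getOrDefault(eventName, List.of())) {
            h.accept(detail);
        }
    }
}

// Hypothetical "gate" processor exposing its own API (open/close) that an
// event handler can call; while closed, it would drop tuples.
final class Gate {
    private boolean open = true;

    void open() { open = true; }
    void close() { open = false; }
    boolean passes() { return open; }
}
```

For example, a handler registered as `bus.on("timer", d -> gate.close())` ties a timer event to the gate's processor-specific `close()` call without either processor knowing about the other.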

16 DataLines Data: the generalization of data is a DlTuple. A tuple is just a set of values; DlTuple is the interface processors use:
– String[] getFieldNames()
– DlValue getValue(fieldname)
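
The two accessors on the slide are enough to write processors that never depend on the raw format. The sketch below uses those two method names; everything else is assumed. The slide names `DlValue` but not its methods, so the `DlValue` here is a hypothetical wrapper, and `MapTuple` is a hypothetical map-backed implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for DlValue: the slide names the type but not its
// methods, so this version simply wraps an object.
final class DlValue {
    private final Object raw;

    DlValue(Object raw) { this.raw = raw; }

    Object raw() { return raw; }
    double asDouble() { return ((Number) raw).doubleValue(); }

    @Override
    public String toString() { return String.valueOf(raw); }
}

// The two accessors listed on the slide.
interface DlTuple {
    String[] getFieldNames();
    DlValue getValue(String fieldName);
}

// Hypothetical map-backed tuple; insertion order is kept so field names
// come back in the order they were added.
final class MapTuple implements DlTuple {
    private final Map<String, DlValue> fields = new LinkedHashMap<>();

    MapTuple put(String name, Object value) {
        fields.put(name, new DlValue(value));
        return this;
    }

    @Override
    public String[] getFieldNames() {
        return fields.keySet().toArray(new String[0]);
    }

    @Override
    public DlValue getValue(String fieldName) {
        return fields.get(fieldName);   // null if the field is absent
    }
}
```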

17 DataLines Data: tuples can have virtual fields (calculated values, static values) and composite fields. Creating the tuple is left to the processor in charge of conversion.
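
A virtual field can be read like any other field but is computed when accessed, for instance a sum in the spirit of the `${A} + ${B}` expression in the data example. The following is a minimal sketch assuming numeric fields only; `VirtualTuple`, `derive`, and `constant` are hypothetical names, and composite fields are not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical tuple with two kinds of fields:
//  - stored fields, filled in by the converter that built the tuple;
//  - virtual fields, computed on demand from other fields (or constant).
final class VirtualTuple {
    private final Map<String, Double> stored = new LinkedHashMap<>();
    private final Map<String, Function<VirtualTuple, Double>> virtual = new LinkedHashMap<>();

    VirtualTuple set(String name, double value) {
        stored.put(name, value);
        return this;
    }

    // Calculated virtual field, e.g. the equivalent of ${A} + ${B}.
    VirtualTuple derive(String name, Function<VirtualTuple, Double> formula) {
        virtual.put(name, formula);
        return this;
    }

    // Static virtual field: the same value for every read.
    VirtualTuple constant(String name, double value) {
        return derive(name, t -> value);
    }

    double get(String name) {
        if (stored.containsKey(name)) return stored.get(name);
        Function<VirtualTuple, Double> f = virtual.get(name);
        if (f == null) throw new IllegalArgumentException("no field " + name);
        return f.apply(this);   // evaluated at access time, nothing cached
    }

    // Stored fields first, then virtual fields, each in insertion order.
    List<String> fieldNames() {
        List<String> names = new ArrayList<>(stored.keySet());
        names.addAll(virtual.keySet());
        return names;
    }
}
```

Computing at access time keeps the tuple small and avoids work for fields no processor ever reads; caching would be a reasonable alternative when a formula is expensive.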

18 XML Syntax … run away!

19 Data Example

20 Data Example ${A} + ${B}

21 DataLines Tutorial: fast-forward past a painful 3-hour tutorial covering each of those sections in detail (tuples, processors, pipelines, event management, configurations). You have seen all the XML, though!

22 DataLines Distilled: a library of data processors that operate on “tuples” (one of the processors takes the raw data and creates the tuple), plus an XML compiler that takes the XML file and the library and creates an application.
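
The converter is the only processor that has to understand the raw format; everything after it works on tuples. The following is a minimal sketch assuming the raw data is whitespace-separated text records with a fixed column layout; `LineConverter` and the field names in the usage line are hypothetical, and a real converter for binary input would decode bytes instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical converter: turns one raw text record (whitespace-separated
// columns) into a tuple keyed by configured field names. Downstream
// processors only ever see the tuple, never the raw format.
final class LineConverter {
    private final String[] fieldNames;

    LineConverter(String... fieldNames) { this.fieldNames = fieldNames.clone(); }

    // Empty result when the record does not have the expected column count,
    // so a malformed line is dropped rather than stopping the stream.
    Optional<Map<String, String>> convert(String rawLine) {
        String[] cols = rawLine.trim().split("\\s+");
        if (cols.length != fieldNames.length) return Optional.empty();
        Map<String, String> tuple = new LinkedHashMap<>();
        for (int i = 0; i < cols.length; i++) {
            tuple.put(fieldNames[i], cols[i]);
        }
        return Optional.of(tuple);
    }
}
```

For example, `new LineConverter("time", "src", "dst", "bytes")` would map a four-column record to a tuple with those four fields; swapping data sources then means swapping only this one processor.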

23 DataLines Example

24 DataLines in Use: DataLines does make it easier to hit the ground running; much of the tedious work is taken care of. For highly specific needs you still need to write code, but that code then becomes part of the DataLines library, which others can build on.

25 Balance Sheet
Positive:
– Flexible (vendor neutral, data, debugging)
– Reusable (pipelines, processors)
– Fast development time
– “Easy” to change the client (CLI, desktop, web page)
Negative:
– May need to write domain-specific code
– Learning curve: processor configuration, data expectations, events

26 DataLines in Action
– Network Engineering group: monitor routers, tar pit, IDS, packet sampling, L2/L3 mappings
– Security group: network forensics
– Intergroup wiring: use DataLines to share data between groups/projects

27 DataLines in Action: Network Research group
– Monitor cluster network activity from the MPI layer
– Data mining
– Misc. NSF data-oriented projects

28 Future: open source. More info: mikeh@ncsa.edu, http://datalines.ncsa.uiuc.edu (a work in progress)

