Parallelizing Incremental Bayesian Segmentation (IBS)


1 Parallelizing Incremental Bayesian Segmentation (IBS)
Joseph Hastings (in collaboration with Sid Sen)

2 IBS Incremental Bayesian Segmentation [1] is an on-line machine learning algorithm designed to segment time-series data into a set of distinct clusters
It models the time series as a concatenation of processes, each generated by a distinct Markov probability distribution, and attempts to find the most likely break points between the processes

3 Training Process During the training phase of the algorithm, IBS builds a set of Markov matrices that it believes are most likely to describe the set of processes responsible for generating the time series
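As a rough illustration of what a "Markov matrix" built from a time series looks like, the sketch below counts observed symbol-to-symbol transitions and row-normalizes the counts into estimated probabilities. It is not the training procedure from [1]: the names `countTransitions` and `toProbabilities` are hypothetical, and the pseudo-count `alpha` is an assumed smoothing parameter that the slides do not mention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Counts = std::vector<std::vector<long>>;
using Matrix = std::vector<std::vector<double>>;

// Count transitions series[t] -> series[t+1] over symbols 0..k-1.
Counts countTransitions(const std::vector<int>& series, std::size_t k) {
    Counts counts(k, std::vector<long>(k, 0));
    for (std::size_t t = 0; t + 1 < series.size(); ++t) {
        assert(series[t] >= 0 && static_cast<std::size_t>(series[t]) < k);
        assert(series[t + 1] >= 0 && static_cast<std::size_t>(series[t + 1]) < k);
        const std::size_t from = static_cast<std::size_t>(series[t]);
        const std::size_t to = static_cast<std::size_t>(series[t + 1]);
        ++counts[from][to];
    }
    return counts;
}

// Convert integer histograms to row-wise probability estimates.
// alpha is a hypothetical pseudo-count added to every cell (0 = raw frequencies).
Matrix toProbabilities(const Counts& counts, double alpha) {
    const std::size_t k = counts.size();
    Matrix probs(k, std::vector<double>(k, 0.0));
    for (std::size_t i = 0; i < k; ++i) {
        assert(counts[i].size() == k);
        double rowTotal = alpha * static_cast<double>(k);
        for (long c : counts[i]) rowTotal += static_cast<double>(c);
        if (rowTotal <= 0.0) continue;  // state never seen and no smoothing: row stays zero
        for (std::size_t j = 0; j < k; ++j)
            probs[i][j] = (static_cast<double>(counts[i][j]) + alpha) / rowTotal;
    }
    return probs;
}
```

Each row of the result depends only on the matching row of the count matrix, which is what makes this conversion a candidate for parallel execution.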

4 Project Proposal Currently, Joseph is attempting to use IBS to detect computer networking abnormalities (M.Eng. thesis)
Underlying most of the computations of the IBS algorithm are matrix calculations that we believe can be rewritten to work in parallel
The matrices involved are up to 250 by 250 elements in size; the computations involve double-precision probability calculations

5 Parallelizable Operations
Entropy and relative entropy calculations
Generating the marginal likelihood that a particular sequence of transitions would be observed given a Markov probability distribution
Matrix addition, conversion from histograms (integers) to estimated probabilities (doubles), KL-distance between pairs of matrices
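A minimal serial sketch of some of these operations may help show why they parallelize well: each is a sum of independent per-row or per-cell terms. The function names are hypothetical. Two simplifications are assumed and not taken from the slides: `klMatrix` averages the row-wise KL divergences with equal weight, and `logLikelihood` is the plain log-likelihood of the counts under a fixed matrix rather than a Bayesian marginal likelihood.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Shannon entropy (in nats) of one probability row; 0 * log 0 counts as 0.
double entropy(const std::vector<double>& p) {
    double h = 0.0;
    for (double x : p)
        if (x > 0.0) h -= x * std::log(x);
    return h;
}

// Relative entropy KL(p || q) of one row; requires q[j] > 0 wherever p[j] > 0.
double klRow(const std::vector<double>& p, const std::vector<double>& q) {
    assert(p.size() == q.size());
    double d = 0.0;
    for (std::size_t j = 0; j < p.size(); ++j) {
        if (p[j] > 0.0) {
            assert(q[j] > 0.0);
            d += p[j] * std::log(p[j] / q[j]);
        }
    }
    return d;
}

// Assumption: distance between two matrices = unweighted mean of row-wise KL.
double klMatrix(const Matrix& a, const Matrix& b) {
    assert(a.size() == b.size() && !a.empty());
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) total += klRow(a[i], b[i]);
    return total / static_cast<double>(a.size());
}

// Log-likelihood of transition counts under fixed probabilities:
// sum over cells of n_ij * log p_ij (negative infinity if an observed
// transition has probability zero).
double logLikelihood(const std::vector<std::vector<long>>& counts, const Matrix& probs) {
    assert(counts.size() == probs.size());
    double ll = 0.0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        assert(counts[i].size() == probs[i].size());
        for (std::size_t j = 0; j < counts[i].size(); ++j) {
            if (counts[i][j] == 0) continue;
            if (probs[i][j] <= 0.0) return -std::numeric_limits<double>::infinity();
            ll += static_cast<double>(counts[i][j]) * std::log(probs[i][j]);
        }
    }
    return ll;
}
```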

6 Project Plan

7 MPI Use MPI to parallelize relevant matrix operations
Some amount of communication will be required even after data has been distributed (the operations depend upon knowledge of the time-series itself)
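One plausible way to distribute a matrix of up to 250 rows is by contiguous row blocks, with each process reducing its own rows and a final combine step. The sketch below only simulates that pattern in serial C++ and makes no MPI calls; `rowRange`, `localSum`, and `distributedSum` are hypothetical names, and the loop over ranks stands in for the combine that a collective such as `MPI_Reduce` would perform in a real MPI program.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Half-open row range [first, last) owned by `rank` out of `numRanks`.
// Leftover rows are spread one each over the lowest-numbered ranks.
std::pair<std::size_t, std::size_t> rowRange(std::size_t rows,
                                             std::size_t numRanks,
                                             std::size_t rank) {
    assert(numRanks > 0 && rank < numRanks);
    const std::size_t base = rows / numRanks;
    const std::size_t extra = rows % numRanks;
    const std::size_t first = rank * base + (rank < extra ? rank : extra);
    const std::size_t count = base + (rank < extra ? 1 : 0);
    return std::make_pair(first, first + count);
}

// Work a single rank would do on the rows it owns.
double localSum(const Matrix& m, std::size_t first, std::size_t last) {
    assert(first <= last && last <= m.size());
    double s = 0.0;
    for (std::size_t i = first; i < last; ++i)
        for (double x : m[i]) s += x;
    return s;
}

// Serial stand-in for "every rank computes localSum, then the partial
// results are combined"; in MPI the combine would be a reduction.
double distributedSum(const Matrix& m, std::size_t numRanks) {
    double total = 0.0;
    for (std::size_t rank = 0; rank < numRanks; ++rank) {
        const std::pair<std::size_t, std::size_t> r = rowRange(m.size(), numRanks, rank);
        total += localSum(m, r.first, r.second);
    }
    return total;
}
```

With 250 rows and 8 ranks, the first two ranks own 32 rows each and the remaining six own 31, so the load imbalance is at most one row.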

8 Cilk Originally developed by the Supercomputing Technologies Group at the MIT Laboratory for Computer Science (Sid’s current work)
Cilk is a language for multithreaded parallel programming based on ANSI C that is very effective for exploiting highly asynchronous parallelism [3] (which can be difficult to write using message-passing interfaces like MPI)

9 Cilk First step is to convert the C++ program to Cilk (very easy)
The real intelligence is in the Cilk runtime system, which handles load balancing, paging, and communication protocols between running threads
Plan to make the runtime system adaptively parallel by intelligently determining how many threads/processors to use and how to distribute these threads
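To give a sense of why the conversion can be easy, here is a sketch of a divide-and-conquer routine written as plain serial C++, with comments marking where the Cilk keywords `spawn` and `sync` would be added. The routine itself (`sumRowEntropies`) and the cutoff `kGrain` are hypothetical and not from the project code; the sketch assumes each matrix row is a probability distribution.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical cutoff below which the work is done in a plain loop.
const std::size_t kGrain = 4;

// Sum of the Shannon entropies (in nats) of rows [lo, hi).
// In Cilk this would be declared as a `cilk` procedure.
double sumRowEntropies(const Matrix& m, std::size_t lo, std::size_t hi) {
    assert(lo <= hi && hi <= m.size());
    if (hi - lo <= kGrain) {
        double h = 0.0;
        for (std::size_t i = lo; i < hi; ++i)
            for (double p : m[i])
                if (p > 0.0) h -= p * std::log(p);
        return h;
    }
    const std::size_t mid = lo + (hi - lo) / 2;
    // Cilk version: left = spawn sumRowEntropies(m, lo, mid);
    const double left = sumRowEntropies(m, lo, mid);
    // Cilk version: right = spawn sumRowEntropies(m, mid, hi);
    const double right = sumRowEntropies(m, mid, hi);
    // Cilk version: sync; (wait for both spawned children before using results)
    return left + right;
}
```

The two recursive calls touch disjoint row ranges, so they can run concurrently, and deciding which processor runs which half is left to the runtime system.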

10 Comparison of Results Compare speed/performance on:
C++/MPI code
Cilk code (using released version of Cilk)
Cilk’ code (using modified version of Cilk, with adaptive parallelism)

11 Progress Checkpoint Completed tasks:
Original code (Java, LISP, Perl)
Initial porting to C++ (conversion of data structures, classes, and some mathematical functions)
Understanding the source code of Cilk; looking up appropriate system calls to provide information about processors and their state

12 References
[1] Paola Sebastiani and Marco Ramoni. Incremental Bayesian Segmentation of Categorical Temporal Data
[2] Wenke Lee and Salvatore J. Stolfo. Data Mining Approaches for Intrusion Detection
[3] Cilk Reference Manual. Supercomputing Technologies Group, MIT Lab for Computer Science. November 9, Available online:

