Parallelizing Incremental Bayesian Segmentation (IBS)


Joseph Hastings (in collaboration with Sid Sen)

IBS

Incremental Bayesian Segmentation (IBS) [1] is an on-line machine learning algorithm designed to segment time-series data into a set of distinct clusters. It models the time series as a concatenation of processes, each generated by a distinct Markov probability distribution, and attempts to find the most likely break points between the processes.
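As a rough sketch of the generative model (the notation is ours, not taken verbatim from [1]): given a categorical series x_1, ..., x_T and break points 0 = tau_0 < tau_1 < ... < tau_k = T, the likelihood factors segment by segment, and IBS searches for the segmentation that makes the (marginalized) likelihood largest:

    \[
    P(x_{1:T} \mid \tau_1, \dots, \tau_{k-1})
      = \prod_{s=1}^{k} \; \prod_{t=\tau_{s-1}+1}^{\tau_s}
        P_s\!\left(x_t \mid x_{t-1}\right),
    \]

where P_s is the Markov transition matrix governing segment s.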

Training Process

During the training phase of the algorithm, IBS builds a set of Markov matrices that it believes are most likely to describe the set of processes responsible for generating the time series.
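As an illustrative sketch (not the thesis code: the 250-symbol alphabet and the add-one smoothing are our assumptions), building one such matrix from a categorical sequence looks roughly like this in C:

    #define K 250  /* assumed alphabet size; the proposal's matrices are up to 250x250 */

    /* Accumulate a transition-count histogram from a categorical sequence.
       The caller must zero-initialize counts. */
    void count_transitions(const int *seq, int len, int counts[K][K])
    {
        int t;
        for (t = 1; t < len; t++)
            counts[seq[t - 1]][seq[t]]++;
    }

    /* Convert integer counts into estimated transition probabilities,
       smoothing each row with a uniform prior so no entry is zero. */
    void to_probabilities(int counts[K][K], double probs[K][K])
    {
        int i, j;
        for (i = 0; i < K; i++) {
            double row_total = 0.0;
            for (j = 0; j < K; j++)
                row_total += counts[i][j] + 1.0;
            for (j = 0; j < K; j++)
                probs[i][j] = (counts[i][j] + 1.0) / row_total;
        }
    }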

Project Proposal

Joseph is currently attempting to use IBS to detect computer-network anomalies (M.Eng. thesis). Underlying most of the computations of the IBS algorithm are matrix calculations that we believe can be rewritten to run in parallel. The matrices involved are up to 250 by 250 elements in size, and the computations involve double-precision probability calculations.

Parallelizable Operations

- Entropy and relative-entropy calculations
- Computing the marginal likelihood that a particular sequence of transitions would be observed given a Markov probability distribution
- Matrix addition, conversion from histograms (integer counts) to estimated probabilities (doubles), and the KL distance between pairs of matrices (sketched below)
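For example, the KL distance between two row-stochastic matrices P and Q is D(P||Q) = sum_{i,j} P[i][j] * log(P[i][j] / Q[i][j]), and each row contributes independently, which is what makes these operations good candidates for parallelization. A minimal serial sketch in C (the 250x250 dimension comes from the proposal; skipping zero-probability entries is our convention):

    #include <math.h>

    #define K 250

    /* KL distance between two row-stochastic matrices. Rows are
       independent, so an outer parallel loop over i is the natural
       decomposition. Entries with P[i][j] == 0 contribute nothing. */
    double kl_distance(double P[K][K], double Q[K][K])
    {
        double d = 0.0;
        int i, j;
        for (i = 0; i < K; i++)
            for (j = 0; j < K; j++)
                if (P[i][j] > 0.0)
                    d += P[i][j] * log(P[i][j] / Q[i][j]);
        return d;
    }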

Project Plan

MPI

Use MPI to parallelize the relevant matrix operations. Some amount of communication will be required even after the data has been distributed, because the operations depend on knowledge of the time series itself.
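A minimal sketch of the intended style (the strided row partitioning and the entropy kernel are illustrative assumptions, not the project's actual decomposition): each rank computes the entropy contribution of its share of rows, and MPI_Reduce combines the partial sums.

    #include <mpi.h>
    #include <math.h>
    #include <stdio.h>

    #define K 250  /* matrix dimension from the proposal */

    int main(int argc, char **argv)
    {
        static double P[K][K];
        double local = 0.0, total = 0.0;
        int rank, size, i, j;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Placeholder data: uniform rows. In the real application the
           estimated transition probabilities would be filled in here. */
        for (i = 0; i < K; i++)
            for (j = 0; j < K; j++)
                P[i][j] = 1.0 / K;

        /* Each rank sums -p log p over a strided share of the rows. */
        for (i = rank; i < K; i += size)
            for (j = 0; j < K; j++)
                if (P[i][j] > 0.0)
                    local -= P[i][j] * log(P[i][j]);

        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("entropy = %f\n", total);

        MPI_Finalize();
        return 0;
    }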

Cilk

Cilk was originally developed by the Supercomputing Technologies Group at the MIT Laboratory for Computer Science (where Sid currently works). It is a language for multithreaded parallel programming based on ANSI C that is very effective at exploiting highly asynchronous parallelism [3], which can be difficult to express with message-passing interfaces such as MPI.

Cilk

The first step is to convert the C++ program to Cilk (very easy). The real intelligence is in the Cilk runtime system, which handles load balancing, paging, and the communication protocols between running threads. We plan to make the runtime system adaptively parallel by intelligently determining how many threads/processors to use and how to distribute these threads.
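For flavor, here is roughly what one of the matrix kernels might look like after conversion (Cilk-5 syntax, i.e. ANSI C plus the cilk, spawn, and sync keywords; the divide-and-conquer decomposition and the KL kernel are our assumptions). The runtime's work-stealing scheduler balances the two spawned halves across however many processors it is given:

    #include <math.h>

    #define K 250

    /* Sum the KL contributions of rows [lo, hi) by divide and conquer,
       exposing parallelism for the Cilk work-stealing scheduler. */
    cilk double kl_rows(double (*p)[K], double (*q)[K], int lo, int hi)
    {
        if (hi - lo == 1) {
            double d = 0.0;
            int j;
            for (j = 0; j < K; j++)
                if (p[lo][j] > 0.0)
                    d += p[lo][j] * log(p[lo][j] / q[lo][j]);
            return d;
        } else {
            int mid = lo + (hi - lo) / 2;
            double left, right;
            left = spawn kl_rows(p, q, lo, mid);
            right = spawn kl_rows(p, q, mid, hi);
            sync;
            return left + right;
        }
    }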

Comparison of Results

Compare speed and performance of:
- C++/MPI code
- Cilk code (using the released version of Cilk)
- Cilk' code (using our modified version of Cilk, with adaptive parallelism)

Progress Checkpoint

Completed tasks:
- Reviewed the original code (Java, LISP, Perl)
- Initial port to C++ (conversion of data structures, classes, and some mathematical functions)
- Studied the Cilk source code and looked up the appropriate system calls for obtaining information about processors and their state

References

[1] Paola Sebastiani and Marco Ramoni. Incremental Bayesian Segmentation of Categorical Temporal Data. 2000.
[2] Wenke Lee and Salvatore J. Stolfo. Data Mining Approaches for Intrusion Detection. 1998.
[3] Cilk 5.3.2 Reference Manual. Supercomputing Technologies Group, MIT Laboratory for Computer Science, November 9, 2001. Available online: http://supertech.lcs.mit.edu/manual-5.3.2.pdf