Distributed Indexed Outlier Detection Algorithm Status Update as of March 11, 2014

A Parallel iOrca Algorithm

iOrca is a very efficient serial algorithm for outlier detection in data. Three key points drive iOrca's performance:

1. Indexing the data prior to analysis to front-load potential outliers
   - Select a random point R from the data and order the data by decreasing distance from R.
   - Given that R is much more likely to be an inlier, distance from R is a good predictor of outliers.
2. Breaking the indexed data into blocks for processing
   - Each data point in the block being processed is compared to points in the full data set, navigating in a spiral fashion until k neighbors are located closer than the current cutoff.
   - Spiraling finds neighbors more quickly, while the fast increase in the cutoff threshold (from indexing relative to R) means fewer comparisons are required to identify k neighbors.
3. A global early-termination function
   - Once any point is processed where (the distance from that point to R) + (the distance from R to R's k-th neighbor) < the current cutoff, all processing is complete.

The new DIO/iOrca algorithm applies these same techniques, but distributes the analysis of blocks of data over many processors.
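To make the indexing step and the termination test concrete, here is a minimal Python sketch. It assumes numpy, and the helper names build_index and can_terminate are invented for illustration; this is not the iOrca implementation itself.

```python
import numpy as np

def build_index(data, rng=None):
    # Hypothetical helper: order the data by DECREASING distance from a
    # randomly chosen reference point R, as iOrca's index does.
    if rng is None:
        rng = np.random.default_rng(0)
    R = data[rng.integers(len(data))]
    dist_to_R = np.linalg.norm(data - R, axis=1)
    order = np.argsort(-dist_to_R)            # farthest from R first
    return data[order], dist_to_R[order], R

def can_terminate(dist_point_to_R, kth_dist_of_R, cutoff):
    # Global early-termination test from the slide: by the triangle
    # inequality, d(x, R) + d_k(R) bounds x's k-NN outlier score, and
    # every later point in the index is even closer to R.
    return dist_point_to_R + kth_dist_of_R < cutoff

# Example on data shaped like the test set described later
# (rows of 10 random floating-point values):
data = np.random.default_rng(1).random((1000, 10))
indexed, dists, R = build_index(data)
```

Because the index is sorted by decreasing distance from R, the first point that satisfies the termination test proves that every remaining point is also below the cutoff.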

DIO Parallel Algorithm Overview

DIO follows iOrca's data-indexing and block-processing concepts, but adds a central control function that assigns blocks of data to available worker nodes:

- Each worker node evaluates each data point in its block until a sufficient number of neighbors closer than the cutoff are found, or the early-termination condition is recognized.
- The worker returns outlier candidates and requests another block of data.
- The controller maintains a running list of outliers and the threshold, and passes the current threshold to workers along with data assignments.
- The job ends at data EOF or when the global termination condition occurs; the controller then outputs the final list of outliers.

Indexed Distributed Method at Controller Node

initialization (open files, validate parameters)
send (synchronous) to each worker node: index to data block and cutoff threshold
loop until data EOF or early termination
    on message from worker
        receive (asynch) from worker: count of new potential outliers
        acknowledge receipt, wait for data
        receive (synchronous) from worker: details of each new outlier
        send (synchronous) to worker: new data index and cutoff threshold
        sort new candidates into master outlier list and recalculate threshold
    end message processing
end loop
process final outlier list and print output
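The loop above translates fairly directly into MPI code. Below is a minimal sketch using Python's mpi4py for readability; the actual DIO code is C against OpenMPI. The message tags, the (start, stop) block representation, and the (score, point) candidate format are illustrative assumptions, and early-termination handling is omitted for brevity.

```python
from mpi4py import MPI

WORK, COUNT, CAND, ACK, STOP = range(5)  # illustrative message tags

def controller(comm, blocks, n_outliers):
    # blocks: iterator of (start, stop) data-block indices;
    # assumes at least one block per worker is available.
    outliers, cutoff = [], 0.0
    block_iter = iter(blocks)
    # Initialization: hand every worker a first block and the starting cutoff.
    for w in range(1, comm.Get_size()):
        comm.send((next(block_iter), cutoff), dest=w, tag=WORK)
    active = comm.Get_size() - 1
    status = MPI.Status()
    while active:
        # A worker announces how many new candidates it has.
        comm.recv(source=MPI.ANY_SOURCE, tag=COUNT, status=status)
        w = status.Get_source()
        comm.send(None, dest=w, tag=ACK)           # acknowledge receipt
        cands = comm.recv(source=w, tag=CAND)      # details of each candidate
        nxt = next(block_iter, None)
        if nxt is None:                            # data EOF: retire this worker
            comm.send(None, dest=w, tag=STOP)
            active -= 1
        else:                                      # new data index + cutoff
            comm.send((nxt, cutoff), dest=w, tag=WORK)
        # Sort new candidates into the master list, recalculate the threshold.
        outliers = sorted(outliers + cands, key=lambda c: c[0],
                          reverse=True)[:n_outliers]
        if len(outliers) == n_outliers:
            cutoff = outliers[-1][0]
    return outliers  # final outlier list, ready to print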

Indexed Distributed Method at Worker Node

initialization (open files, set parameters; done only once per worker process)
loop until finished
    receive (synchronous) from MPI control process: index to data block & current cutoff threshold
    check for EOF
    check for global early-termination condition
    for each data point: find neighbors, determine outlier candidates
    sort outliers, calculate new cutoff threshold
    if potential outliers are found
        send (asynch) to MPI controller: count of new outliers
        wait for receive signal from MPI controller
        send (synchronous) to MPI controller: outlier candidates with scores and neighbors
    end if
end loop (when the data block index is at or past EOF, or the early-termination condition is observed)
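A matching worker sketch, with the same caveats: mpi4py and the tag constants are assumptions carried over from the controller sketch, and the brute-force knn_score is a stand-in for iOrca's spiral search. One deliberate simplification: the slide sends candidate details only when some were found, while this sketch always replies so the controller's blocking loop stays simple.

```python
import numpy as np
from mpi4py import MPI

WORK, COUNT, CAND, ACK, STOP = range(5)  # tags as in the controller sketch

def knn_score(point, data, k):
    # Stand-in scorer: brute-force distance to the k-th nearest neighbor.
    # The real worker spirals outward through the index and stops as soon
    # as k neighbors closer than the cutoff are found.
    d = np.linalg.norm(data - point, axis=1)
    return np.partition(d, k)[k]       # index k skips the point itself (d = 0)

def worker(comm, data, k):
    # data: the full indexed data set, opened once at initialization.
    status = MPI.Status()
    while True:
        msg = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == STOP:   # EOF or global early termination
            return
        (start, stop), cutoff = msg    # index to data block + current cutoff
        cands = [(knn_score(p, data, k), p) for p in data[start:stop]]
        cands = [c for c in cands if c[0] > cutoff]
        cands.sort(key=lambda c: c[0], reverse=True)
        comm.send(len(cands), dest=0, tag=COUNT)   # "asynch" notification
        comm.recv(source=0, tag=ACK)               # wait for receive signal
        comm.send(cands, dest=0, tag=CAND)         # candidates with scores
```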

DIO/iOrca – Process Overview

CSV input data -> dprep application -> indexed data files -> iOrca application (or dio application) -> output files -> results parsing script -> CSV output files

- DIO works as a drop-in replacement for iOrca.
- File formats, pre-processing, and post-processing procedures are unchanged.
- (In the original diagram, blue = existing iOrca programs and scripts.)

Performance Results - Overview

- Serial iOrca was tested against the parallel DIO/iOrca.
- Test data: 1 million rows, each containing 10 random floating-point numbers.
- Settings: detect 1% of the input as outliers; process blocks of 150 rows.
- All testing was done on hotel.futuregrid.org (IBM iDataplex, RHEL 5.9).
- MPI used for testing: OpenMPI (gnu-4.1 compiler).
- The largest run attempted, on 96 processor cores, ran in 1.09% of the time required for the serial iOrca job (12.5 minutes vs. 19.1 hours).

Performance Results (detail)

Processor Cores      Seconds to Completion   Time Relative to Serial   Efficiency per Core
1 (serial iOrca)     68,… (19.1 hours)       100.00%                   100.00%
8                    8,…                     …                         97.23%
16                   4,…                     …                         102.76%
24                   2,…                     …                         103.62%
32                   2,…                     …                         103.66%
40                   1,…                     …                         102.85%
48                   1,…                     …                         102.47%
56                   1,…                     …                         101.08%
64                   1,…                     …                         99.99%
72                   …                       …                         99.13%
80                   …                       …                         97.98%
88                   …                       …                         96.69%
96                   750 (12.5 minutes)      1.09%                     95.49%
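The columns in the table relate through simple arithmetic: relative time = runtime / serial runtime, speedup is its inverse, and efficiency per core = speedup / cores. A quick check against the 96-core figures from the overview slide (12.5 minutes vs. 19.1 hours; the table's 95.49% reflects the unrounded runtimes):

```python
def scaling_stats(cores, runtime_s, serial_s):
    relative = runtime_s / serial_s    # "Time Relative to Serial"
    speedup = serial_s / runtime_s
    efficiency = speedup / cores       # "Efficiency per Core"
    return relative, speedup, efficiency

rel, sp, eff = scaling_stats(96, 12.5 * 60, 19.1 * 3600)
print(f"{rel:.2%} of serial time, {sp:.1f}x speedup, {eff:.2%} efficiency per core")
# -> 1.09% of serial time, 91.7x speedup, 95.50% efficiency per core
```

The per-core efficiencies slightly above 100% in the 16- to 56-core range indicate mildly superlinear scaling at those sizes.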

Performance Results – Near Linear Scaling

Future Work?

- Dynamically add additional controller node(s) as needed, to allow scaling to much larger applications.
- Develop DIO as a free-standing, open-source application (the current version runs within the iOrca framework).