Parallel ICA Algorithm and Modeling Hongtao Du March 25, 2004

Outline
– Review
  – Independent Component Analysis
  – FastICA
  – Parallel ICA
– Parallel Computing Laws
– Parallel Computing Models
– Model for pICA

Independent Component Analysis (ICA)
A linear transformation that minimizes the higher-order statistical dependence between components.
ICA model: X = AS, where X is the observed (mixed) signal, A is the unknown mixing matrix, and S contains the source signals.
What is independence? Source signals S:
– statistically independent
– not more than one is Gaussian distributed
Weight matrix (unmixing matrix) W: estimates the inverse of A, so that Y = WX recovers the independent components.

Methods to minimize statistical dependence
– Mutual information (InfoMax)
– K-L divergence or relative entropy (Output Divergence)
– Nongaussianity (FastICA)

FastICA Algorithm
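The algorithm details on this slide were in a figure that did not survive the transcript. As a rough sketch (not the authors' implementation), the following shows the one-unit FastICA fixed-point iteration with the tanh nonlinearity, assuming the data Z has already been centered and whitened; the function and variable names are illustrative.

```python
import numpy as np

def fastica_one_unit(Z, max_iter=200, tol=1e-6):
    """Estimate one unmixing weight vector w on whitened data Z (components x samples).

    Fixed-point update: w <- E[z g(w^T z)] - E[g'(w^T z)] w, then normalize,
    with g = tanh (a common FastICA nonlinearity).
    """
    n, _ = Z.shape
    w = np.random.randn(n)
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wz = w @ Z                                  # projections of all samples onto w
        g = np.tanh(wz)
        g_prime = 1.0 - g ** 2
        w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:         # converged (up to sign)
            return w_new
        w = w_new
    return w
```

In the parallel ICA scheme, several such one-unit estimations run concurrently and the resulting weight vectors are then decorrelated against each other.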

Parallel ICA
– Internal decorrelation
– External decorrelation

Performance Comparison (4 Processors)

Parallel Computing
Classified by instruction delivery mechanism and data stream (Flynn's taxonomy):

                         Single Instruction Flow | Multiple Instruction Flow
  Single Data Stream     SISD                    | MISD (pipeline)
  Multiple Data Stream   SIMD (MPI, PVM)         | MIMD (distributed)

– SISD: do it yourself, no help
– SIMD: rowing, one master and several slaves
– MISD: assembly line in car manufacturing
– MIMD: distributed sensor network
The pICA algorithm for hyperspectral image analysis (a high-volume data set) is SIMD.

Parallel Computing Laws and Models
– Amdahl's Law
– Gustafson's Law
– BSP Model
– LogP Model

Amdahl's Law
The first law for parallel computing (1967). It limits the speedup achievable by parallel applications:

Speedup = 1 / (s + p / N)

where
– N: number of processors
– s: serial fraction
– p: parallel fraction (s + p = 1)

Speedup boundary: 1/s, so the serial part should be limited and very fast.
Problem: a parallel computer must also be a fast sequential computer.
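A quick numeric illustration (not from the slides): with a serial fraction of 10%, the speedup approaches but never exceeds 1/s = 10, no matter how many processors are added.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Amdahl's law: speedup = 1 / (s + (1 - s) / N)."""
    s = serial_fraction
    return 1.0 / (s + (1.0 - s) / n_procs)

# s = 0.1: ~3.08 on 4 processors, ~5.26 on 10, ~9.17 on 100, and never above 10.
for n in (4, 10, 100, 10_000):
    print(n, round(amdahl_speedup(0.1, n), 2))
```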

Gustafson's Law
An improvement on Amdahl's law that takes data size into account: in a parallel program, as the quantity of data increases, the sequential fraction decreases.
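For contrast, a minimal sketch of Gustafson's scaled speedup, S(N) = N - s(N - 1), where s is the serial fraction measured on the parallel run; the numbers below are made up for illustration.

```python
def gustafson_speedup(serial_fraction, n_procs):
    """Gustafson's law: scaled speedup = N - s * (N - 1)."""
    s = serial_fraction
    return n_procs - s * (n_procs - 1)

# With s = 0.1 the scaled speedup keeps growing with N (3.7 on 4 processors,
# 9.1 on 10, 90.1 on 100), because the problem size grows with the machine.
for n in (4, 10, 100):
    print(n, gustafson_speedup(0.1, n))
```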

Parallel Computing Models
The Amdahl and Gustafson laws define limits without considering the properties of the computer architecture, so they cannot predict the real performance of a parallel application. Parallel computing models integrate the computer architecture and the application architecture.

Purpose:
– predicting computing cost
– evaluating the efficiency of programs
Impacts on performance:
– computing node (processor, memory)
– communication network
T_app = T_comp + T_comm

Centralized vs. Distributed
Parallel Random Access Machine (PRAM)
– synchronous processors
– shared memory
Distributed-memory parallel computer
– distributed processors and memory
– interconnected by a communication network
– each processor has fast access to its own memory and slow access to remote memory
(Diagram: processors P1-P4 attached to one shared memory vs. processor-memory pairs P1/M1 through P4/M4 connected by a network.)

Bulk Synchronous Parallel (BSP)
For distributed-memory parallel computers.
Assumptions:
– N identical processors, each with its own memory
– interconnected by a predictable network
– each processor can perform synchronization
Applications are composed of supersteps separated by global synchronizations. Each superstep includes:
– a computation step
– a communication step
– a synchronization step

T_superstep = w + g * h + l
– w: maximum computation time over all processors
– g: cost per unit of communication (1 / network bandwidth)
– h: amount of data (messages) transferred
– l: synchronization time
An algorithm can be described by w and h.
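A small sketch of the superstep cost written as a function; the parameter names mirror the slide, and the example values are placeholders rather than measurements.

```python
def bsp_superstep_cost(w, g, h, l):
    """BSP cost of one superstep: T = w + g * h + l.

    w: maximum computation time over all processors
    g: cost per unit of communication (1 / network bandwidth)
    h: amount of data transferred in the superstep
    l: cost of the barrier synchronization
    """
    return w + g * h + l

# Placeholder numbers: 10 ms of compute, 1000 words at 0.002 ms/word,
# 0.5 ms barrier -> 12.5 ms for the superstep.
print(bsp_superstep_cost(w=10.0, g=0.002, h=1000, l=0.5))
```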

LogP Model
An improvement on the BSP model that decomposes the communication cost (g) into three parts:
– Latency (L): time for a message to cross the network
– Overhead (o): time lost in send/receive I/O
– Gap (g): minimum gap between two consecutive messages

T_superstep = w + (L + 2 * o) * h + l
The execution time is the time of the slowest process. The total time for a message to be transferred from processor A to processor B is L + 2 * o.
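The same superstep cost written with the LogP communication parameters; again a sketch with placeholder values, not measured data.

```python
def logp_superstep_cost(w, L, o, h, l):
    """Superstep cost with LogP communication: T = w + (L + 2*o) * h + l.

    L: network latency per message, o: send/receive overhead (paid twice per message),
    h: number of messages, w: computation time, l: synchronization time.
    """
    return w + (L + 2 * o) * h + l

# A single message from A to B costs L + 2*o; with L = 5 us and o = 1 us that is 7 us.
print(logp_superstep_cost(w=0.0, L=5.0, o=1.0, h=1, l=0.0))  # -> 7.0
```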

(Figure: message timing between processors P1 and P2. When g > o, the sender must wait for the gap g between consecutive messages; when g < o, the per-message overhead o dominates.)

Given the finite capacity of the network, at most ceil(L/g) messages can be in transit to or from any processor at a time.
Drawbacks:
– It does not address data size; it behaves as if all messages are very small.
– It does not consider the global capacity of the network.

Model for pICA
Features:
– SIMD
– high-volume data set transfer at the first stage
– low-volume data transfer at the other stages
Combine the BSP and LogP models:
– Stage 1:
  – Pipeline: hyperspectral image transfer, one-unit (weight vector) estimations
  – Parallel: internal decorrelations in sub-matrices
– Other stages:
  – Parallel: external decorrelations

T = T_stage1 + T_stage2 + ... + T_stagek, with the number of stages (layers) k = log2(P)

T_stage1 = (w_one-unit + w_internal-decorrelation) + (L + 2 * o) * h_hyperspectral-image + g * h_weight-vectors + l_stage1

T_stagei = w_external-decorrelation + g * h_weight-vectors + l_stagei,   i = 2, ..., k
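Putting the stage formulas together, a rough sketch of the total-time estimate. The parameter names merely transliterate the slide's symbols, a single synchronization cost l is reused for every stage for simplicity, and all values in the example call are placeholders.

```python
import math

def pica_total_time(P, w_one_unit, w_internal, w_external,
                    L, o, g, h_image, h_weights, l_sync):
    """Estimated pICA run time: T = T_stage1 + sum of T_stage_i for i = 2..k,
    with k = log2(P) stages of decorrelation (one l_sync reused for all stages)."""
    k = int(math.log2(P))
    t_stage1 = (w_one_unit + w_internal) + (L + 2 * o) * h_image \
               + g * h_weights + l_sync
    t_other_stages = (k - 1) * (w_external + g * h_weights + l_sync)
    return t_stage1 + t_other_stages

# Placeholder numbers for a 4-processor run (k = 2 stages).
print(pica_total_time(P=4, w_one_unit=100.0, w_internal=20.0, w_external=10.0,
                      L=5.0, o=1.0, g=0.002, h_image=1e6, h_weights=200,
                      l_sync=0.5))
```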

Another Topic
Optimization of parallel computing:
– heterogeneous parallel computing network
– minimize overall time
– tradeoff between computation (individual computer properties) and communication (network)

References
A. Hyvärinen and E. Oja, "A fast fixed-point algorithm for independent component analysis," Neural Computation, vol. 9, pp. 1483-1492, 1997.
P. Comon, "Independent component analysis, a new concept?" Signal Processing, vol. 36, no. 3, pp. 287-314, April 1994, Special Issue on Higher-Order Statistics.
A.J. Bell and T.J. Sejnowski, "An information maximisation approach to blind separation and blind deconvolution," Neural Computation, vol. 7, no. 6, pp. 1129-1159, 1995.
S. Amari, A. Cichocki, and H. Yang, "A new learning algorithm for blind signal separation," Advances in Neural Information Processing Systems, vol. 8, 1996.
T.-W. Lee, M. Girolami, A.J. Bell, and T.J. Sejnowski, "A unifying information-theoretic framework for independent component analysis," International Journal on Mathematical and Computer Modeling, 1998.