X-Informatics Clustering to Parallel Computing, February 18 and 20, 2013, Geoffrey Fox

Presentation transcript:

X-Informatics Clustering to Parallel Computing, February 18 and 20, 2013. Geoffrey Fox, Associate Dean for Research and Graduate Studies, School of Informatics and Computing, Indiana University Bloomington

The Course in One Sentence: Study Clouds running Data Analytics processing Big Data to solve problems in X-Informatics

k-means Clustering
Choose k = 2 to 10 million (or more) clusters according to some intuition
Initialize k cluster centers (this only works in real vector spaces)
Then iterate:
– Assign each point to the cluster whose center it is nearest to
– Calculate the new center position as the centroid (average in each component) of the points assigned to it
Converge when the center positions change by less than a cutoff
Lots of variants are better, but k-means is used because it is simple and fast
– There are also methods for when only some distances are available and there is no vector space
Sometimes it gets the wrong answer – trapped in a local minimum
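The slides describe the algorithm in words; a minimal k-means sketch in Python/NumPy makes the two alternating steps concrete. The data array, the choice of k, and the convergence cutoff below are illustrative placeholders, not values from the lecture.

```python
import numpy as np

def kmeans(points, k, cutoff=1e-4, max_iter=100, seed=None):
    """Minimal k-means: assign each point to the nearest center, then move
    each center to the centroid of the points assigned to it."""
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct points at random
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: centroid (per-component average) of each cluster
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        # Converge when the centers move less than the cutoff
        if np.linalg.norm(new_centers - centers) < cutoff:
            centers = new_centers
            break
        centers = new_centers
    return centers, labels

# Example: 300 random 2-D points grouped into k = 3 clusters
pts = np.random.default_rng(0).normal(size=(300, 2))
centers, labels = kmeans(pts, k=3, seed=0)
```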

1) k initial "means" (in this case k=3) are randomly generated within the data domain (shown in color). 2) k clusters are created by associating every observation with the nearest mean. The partitions here represent the Voronoi diagram generated by the means. Wikipedia

3) The centroid of each of the k clusters becomes the new mean. 4) Steps 2 and 3 are repeated until convergence has been reached. But the red and green centers (circles) are clearly in the wrong place!!! (the stars are about right) Wikipedia

Clustering v User Rating

Two views of a 3D Clustering I

Two views of a 3D Clustering II

Getting the Wrong Answer
Remember all things in life – including clustering – are optimization problems
Let's say it's a minimization (change the sign if maximizing)
k-means always ends at a minimum, but maybe a local minimum
[Figure: "Unhappiness" (objective function) versus center positions, showing the real minimum and a local minimum (the wrong answer) reached from the initial position]

Annealing
Physical systems find their true lowest energy state if you anneal, i.e. you equilibrate at each temperature as you cool
– Allow atoms to move around
Corresponds to fuzzy algorithms

Annealing
[Figure: objective function versus configuration (the center positions Y(k)) at temperatures T1, T2, T3 with T3 < T2 < T1; at a fixed temperature the system can be trapped in false minima, while annealing finds the correct minimum]
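The slides do not give an annealing algorithm; a minimal simulated-annealing sketch in Python (with a generic objective function and a made-up cooling schedule, both assumptions for illustration) shows the idea of escaping local minima by occasionally accepting uphill moves while the temperature is high.

```python
import math
import random

def anneal(objective, x0, step=0.5, t_start=1.0, t_end=1e-3, cooling=0.95, iters_per_t=100):
    """Simulated annealing: equilibrate at each temperature, then cool.
    Uphill moves are accepted with probability exp(-delta / T), so the
    search can escape local minima while T is still high."""
    x, fx = x0, objective(x0)
    best_x, best_f = x, fx
    t = t_start
    while t > t_end:
        for _ in range(iters_per_t):          # "equilibrate" at this temperature
            cand = x + random.uniform(-step, step)
            fc = objective(cand)
            delta = fc - fx
            if delta < 0 or random.random() < math.exp(-delta / t):
                x, fx = cand, fc              # accept a (possibly uphill) move
                if fx < best_f:
                    best_x, best_f = x, fx
        t *= cooling                          # cool down
    return best_x, best_f

# Illustrative 1-D objective with a shallow local minimum near x ~ -1.4
# and its true minimum near x = 2 (purely made-up numbers).
f = lambda x: 0.1 * (x - 2) ** 2 - 0.5 * math.exp(-8 * (x + 1.5) ** 2)
x_best, f_best = anneal(f, x0=-1.5)
```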

Greedy Algorithms
Many algorithms like k-means are greedy
They consist of iterations
Each iteration makes the step that most obviously minimizes the function
That gives a set of short-term optima, not always the global optimum
The same is true of life (politics, Wall Street)

Parallel Computing I
Our applications are running in a cloud with up to 100K servers in it
– A million or so cores, each of which is independent
When you, or rather your phone/tablet/laptop, accesses the cloud, you take a core which handles your request
– So we can easily handle a million requests at the same time
This is the simplest form of parallel computing – parallelism over jobs or users
But other forms are needed to
– Access giant databases
– Calculate model-based recommendations linking items together
– Do a large clustering
– Answer a search question (similar to accessing a giant database)
These use "data parallelism", also used on supercomputers to calculate complex models of the universe, fusion, batteries, the deformation of a car in a crash, or the evolution of molecules in protein dynamics

Parallel Computing II
Instead of dividing cores by users, we (also) divide (some of) the cores so each has a fraction of the data
In a study of air flow over a plane, you divide the air into a mesh and determine pressure, density and velocity at each mesh point
– Weather forecasting is similar
Here the data is the mesh points
For search, the data is web pages
For eCommerce, the data is items and/or people
For k-means, the data is the points to be clustered

Update on the Mesh: 14 by 14 Internal Mesh

Parallelism is Straightforward
If one has 16 processors, then decompose the geometrical area into 16 equal parts
Each processor updates 9, 12 or 16 grid points independently
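The slides show this pictorially; a small sketch in Python/NumPy (assuming a 16 by 16 grid including boundary, i.e. the 14 by 14 internal mesh, and a 4 by 4 processor layout — choices made here for illustration) shows how each "processor" owns a block of grid points and updates them with the Jacobi average of the four neighbours.

```python
import numpy as np

n = 16                      # full grid including boundary; 14x14 internal mesh
grid = np.zeros((n, n))
grid[0, :] = 1.0            # fixed boundary values (illustrative)

# Split the 14 internal rows and columns among a 4x4 "processor" layout:
# blocks get 3 or 4 rows/columns each, hence 9, 12 or 16 points per processor.
row_blocks = np.array_split(np.arange(1, n - 1), 4)
col_blocks = np.array_split(np.arange(1, n - 1), 4)

def jacobi_step(grid):
    """One Jacobi sweep done block by block, as the 16 processors would."""
    new = grid.copy()
    for rows in row_blocks:
        for cols in col_blocks:
            for i in rows:
                for j in cols:
                    # Each point needs its four neighbours; points on a block
                    # edge need values owned by a neighbouring processor,
                    # which is where communication would come in.
                    new[i, j] = 0.25 * (grid[i - 1, j] + grid[i + 1, j] +
                                        grid[i, j - 1] + grid[i, j + 1])
    return new

for _ in range(100):
    grid = jacobi_step(grid)
```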

Irregular 2D Simulation – Flow over an Airfoil
The Laplace grid points become finite element mesh nodal points arranged as triangles filling space
All the action (triangles) is near the wing boundary
Use domain decomposition, but no longer equal area – instead equal triangle count

Heterogeneous Problems
Simulation of a cosmological cluster (say 10 million stars)
Lots of work per star where stars are very close together (may need a smaller time step)
Little work per star where the force changes slowly and can be well approximated by a low-order multipole expansion

Further Decomposition Strategies
Not all decompositions are quite the same
In defending against missile attacks, you track each missile on a separate node – geometric again
In playing chess, you decompose the chess tree – an abstract, not geometric, space
[Figure: Computer Chess Tree – Current Position (node in Tree), First Set of Moves, Opponent's Counter Moves; "California gets its independence"]

Optimal v. Stable Scattered Decompositions
Consider a set of locally interacting particles simulated on a 4-processor system
[Figure: the optimal overall decomposition]

Parallel Computing III
Note if I chop the points up in k-means, then a given cluster is likely to need points in multiple cores
As a user, I want search results from all web pages
As an item, I want to be linked to all other items in eCommerce
In parallel chess, I need to find the best move whichever core discovered it, and the fancy optimal search algorithm (alpha-beta search) needs a shared database and can have very different time complexity on each core
– Need an equal number of good moves on each core
The things in red are decomposed across cores, and communication between cores is needed to either synchronize or send information
– If the cores are on the same CPU they can use "shared memory"
The fundamental problem in parallel computing is to efficiently coordinate the cores and
– Minimize synchronization and communication overhead
– Make the cores do roughly equal amounts of work
A famous cloud parallel computing approach called MapReduce was invented by Google, with the sweetly named software Hadoop from Yahoo
Science uses software called MPI – the Message Passing Interface
A data-parallel k-means step in this style is sketched below.
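To illustrate the data-parallel pattern, here is a minimal sketch in Python/NumPy of one k-means iteration organised MapReduce-style: the points are split into chunks (one per "core"), each chunk is mapped to partial per-cluster sums, and a reduce step combines them to update the centers. The chunking and data here are placeholders, not from the slides; a real MapReduce or MPI implementation would distribute the chunks across machines rather than looping over them in one process.

```python
import numpy as np

def map_chunk(points, centers):
    """Map: for one core's chunk of points, compute per-cluster
    sums and counts (no other core's data is needed)."""
    k, d = centers.shape
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    sums = np.zeros((k, d))
    counts = np.zeros(k)
    for c in range(k):
        mask = labels == c
        sums[c] = points[mask].sum(axis=0)
        counts[c] = mask.sum()
    return sums, counts

def reduce_partials(partials, centers):
    """Reduce: combine partial sums/counts from all cores and compute
    the new centers (this is the communication/synchronization step)."""
    total_sums = sum(s for s, _ in partials)
    total_counts = sum(c for _, c in partials)
    new_centers = centers.copy()
    nonempty = total_counts > 0
    new_centers[nonempty] = total_sums[nonempty] / total_counts[nonempty][:, None]
    return new_centers

# One data-parallel iteration over 4 simulated "cores"
rng = np.random.default_rng(1)
points = rng.normal(size=(1000, 2))
centers = points[rng.choice(1000, size=3, replace=False)]
chunks = np.array_split(points, 4)                           # decompose the data
partials = [map_chunk(chunk, centers) for chunk in chunks]   # map phase
centers = reduce_partials(partials, centers)                 # reduce phase
```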

Parallel Irregular Finite Elements
Here is a cracked plate; calculating stresses with an equal-area decomposition leads to terrible results
– All the work is near the crack
[Figure: equal-area decomposition of the cracked plate among processors]

Irregular Decomposition for Crack
Concentrating processors near the crack leads to good workload balance: equal nodal points, not equal area
To minimize communication, the nodal points assigned to a particular processor are contiguous
This is an NP-complete (exponentially hard) optimization problem, but in practice there are many ways of getting good, though not exact, decompositions
[Figure: region assigned to one processor; the work load is not perfect!]
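The slides name the problem but not a method; one standard heuristic for this kind of partitioning is recursive coordinate bisection, sketched here in Python/NumPy with made-up point data (the crack-like distribution and the part count are assumptions for illustration, not the lecture's example).

```python
import numpy as np

def rcb(points, n_parts):
    """Recursive coordinate bisection: repeatedly split the largest part in
    half along its longest axis, so parts get roughly equal numbers of
    nodal points while staying spatially contiguous."""
    parts = [np.arange(len(points))]
    while len(parts) < n_parts:
        parts.sort(key=len, reverse=True)
        idx = parts.pop(0)                    # split the largest current part
        coords = points[idx]
        axis = np.argmax(coords.max(axis=0) - coords.min(axis=0))  # longest extent
        order = np.argsort(coords[:, axis])
        half = len(order) // 2
        parts += [idx[order[:half]], idx[order[half:]]]
    return parts

# Example: most points concentrated near a "crack" along y = 0
rng = np.random.default_rng(2)
pts = np.concatenate([rng.uniform(-1, 1, size=(200, 2)),
                      rng.normal(0, 0.05, size=(800, 2))])
regions = rcb(pts, n_parts=8)   # 8 contiguous regions with ~equal point counts
```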

Parallel Processing in Society: It's all well known ……

Divide the problem into parts; one part for each processor
[Figure: an 8-person parallel processor]

Amdahl's Law of Parallel Processing
Speedup S(N) is the ratio Time(1 Processor)/Time(N Processors); we want S(N) ≥ 0.8 N
Amdahl's law said no problem could get a speedup greater than about 10
It is misleading, as it was obtained by looking at small or non-parallelizable problems (such as existing software)
For Hadrian's wall, S(N) satisfies our goal as long as l is at least about 60 meters, if l_overlap is about 6 meters
If l is roughly the same size as l_overlap, then we have the "too many cooks spoil the broth" syndrome
– One needs large problems to get good parallelism, but only large problems need large-scale parallelism
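For reference, the standard form of Amdahl's law (not spelled out on the slide) for a program in which a fraction f of the work can be parallelized is:

```latex
S(N) \;=\; \frac{T(1)}{T(N)} \;=\; \frac{1}{(1-f) + f/N},
\qquad \lim_{N\to\infty} S(N) \;=\; \frac{1}{1-f}
```

With f = 0.9, for example, the speedup can never exceed 10 no matter how many processors are used, which is the kind of bound the slide argues is misleading for genuinely large, highly parallel problems.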

The Web is also just message passing
[Figure: a neural network]

1984 Slide – today replace "hypercube" by "cluster"

Parallelism inside a CPU is called Inner Parallelism; parallelism between CPUs is called Outer Parallelism

Key to MapReduce
