
1 István Lőrentz¹, Mihaela Malita², Răzvan Andonie³ (presenter)
¹ Electronics and Computers Department, Transylvania University of Brasov, Romania
² Computer Science Department, Saint Anselm College, Manchester, NH, USA
³ Computer Science Department, Central Washington University, Ellensburg, WA, USA
MAICS 2011: The 22nd Midwest Artificial Intelligence and Cognitive Science Conference

2
- The Connex Architecture (for more, see Prof. Gheorghe M. Stefan)
- Evolutionary Algorithms (EA)
- Parallelizing EA on Connex
- Example problems
- Results
- Conclusions

3
- The Connex Array:
  ◦ Many-core data-parallel area of 1024 Processing Cells (PCs)
  ◦ Area: ~50 mm² for the 1024-PC array, including 1 MB of memory and the two controllers
  ◦ Clock speed: 400 MHz
- Also on the chip:
  ◦ Multi-core area: 4 MIPS cores
  ◦ Speculative parallel pipe of 8 PEs
- Interfaces:
  ◦ DDR, PCI
  ◦ Video and audio interfaces for 2 HDTV channels
- Total power: ~5 W
- Total area: 82 mm²
- 65 nm implementation
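A back-of-the-envelope peak rate for the array follows from the figures above, under the assumption (not stated on the slide) that every processing cell completes one integer operation per clock cycle:

$$1024 \ \text{cells} \times 400 \times 10^{6} \ \tfrac{\text{cycles}}{\text{s}} = 4.096 \times 10^{11} \ \text{cell operations per second}$$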

4
- Sequencer
  ◦ Issues, in each cycle (on a 2-stage pipeline), one instruction for the Connex Array and one instruction for itself
- I/O Controller
  ◦ Controls a 6.4 GB/s I/O channel
  ◦ Works in parallel with the code running on the Connex Array
- Processing Cell
  ◦ Integer unit
  ◦ Data memory
  ◦ Boolean (predicate) unit

5
- Chromosomes are represented in Connex as vectors of integer components.
- Maximum chromosome length: 1024 elements.
- The population forms a matrix (one chromosome per vector).
- The processing blocks (crossover, mutation, evaluation) are parallelized.
Flowchart: Initialize population randomly → Crossover → Mutation → Evaluation → Convergence or limit? Yes: STOP. No: Select new generation, then repeat from Crossover.
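The flowchart can be sketched as a sequential C program. This is an illustration under stated assumptions, not the authors' code: the toy fitness (sum of squared genes, to be minimized), the names `POP`, `LEN`, `Individual` and `run_ga`, and the elitist selection (keep the best `POP` of parents plus offspring) are all hypothetical choices. Each `genes` row is one chromosome, i.e. one row of the population matrix; on Connex the inner loops over `j` would each be a single parallel instruction across the processing cells.

```c
#include <assert.h>
#include <stdlib.h>

#define POP 8   /* hypothetical population size */
#define LEN 16  /* hypothetical chromosome length (Connex allows up to 1024) */

typedef struct {
    int genes[LEN];  /* one chromosome = one row of the population matrix */
    long fit;        /* cached fitness value */
} Individual;

/* Hypothetical toy fitness to minimize: sum of squared genes (optimum 0). */
static long fitness(const int *c) {
    long s = 0;
    for (int j = 0; j < LEN; j++) s += (long)c[j] * c[j];
    return s;
}

/* qsort comparator: ascending fitness, so the best individual comes first. */
static int by_fitness(const void *a, const void *b) {
    long fa = ((const Individual *)a)->fit, fb = ((const Individual *)b)->fit;
    return (fa > fb) - (fa < fb);
}

/* Follows the flowchart; returns the best fitness found. */
long run_ga(unsigned seed, int max_generations) {
    Individual pool[2 * POP];  /* parents in [0, POP), offspring in [POP, 2*POP) */
    assert(max_generations >= 0);
    srand(seed);
    for (int p = 0; p < POP; p++) {  /* initialize population randomly */
        for (int j = 0; j < LEN; j++) pool[p].genes[j] = rand() % 201 - 100;
        pool[p].fit = fitness(pool[p].genes);
    }
    qsort(pool, POP, sizeof pool[0], by_fitness);
    /* stop on convergence (optimum reached) or when the generation limit is hit */
    for (int g = 0; g < max_generations && pool[0].fit > 0; g++) {
        for (int k = 0; k < POP; k++) {
            const Individual *x = &pool[rand() % POP], *y = &pool[rand() % POP];
            Individual *c = &pool[POP + k];
            int cut = rand() % LEN;  /* 1-point crossover */
            for (int j = 0; j < LEN; j++) c->genes[j] = j < cut ? x->genes[j] : y->genes[j];
            c->genes[rand() % LEN] += rand() % 3 - 1;  /* mutation: one gene by -1, 0 or +1 */
            c->fit = fitness(c->genes);                /* evaluation */
        }
        /* select new generation: keep the best POP of parents + offspring */
        qsort(pool, 2 * POP, sizeof pool[0], by_fitness);
    }
    return pool[0].fit;
}
```

Because parents and offspring compete together, the best fitness never gets worse from one generation to the next; `run_ga(42, 200)` therefore returns a value no larger than `run_ga(42, 0)`.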

6
- Similar algorithm to GA.
- Both the population and the mutation parameters (step sizes) are encoded in vectors.
- Recombination forms a new individual from multiple parents.
- Mutation adds a Gaussian-distributed random variable to each vector component.
- Deterministic selection of the new generation, based on fitness ranking.
Flowchart: Initialize population randomly → Recombination → Mutation → Evaluation → Convergence or limit? Yes: STOP. No: Select new parent generation, then repeat from Recombination.
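The mutation step described above can be sketched in C, sequentially. The names `gauss_approx` and `es_mutate` are hypothetical, and the Gaussian sample is approximated with the Irwin-Hall method (the sum of 12 uniform draws minus 6 has mean 0 and variance 1) rather than an exact generator; this avoids transcendental functions, which may matter on integer-only cells, but it is our substitution, not necessarily what the authors used. Each component `x[j]` has its own step size `sigma[j]`, matching the idea that mutation parameters are stored in vectors too.

```c
#include <assert.h>
#include <stdlib.h>

/* Approximately standard-normal sample (Irwin-Hall): the sum of 12
   independent U(0,1) draws has mean 6 and variance 1. */
double gauss_approx(void) {
    double s = 0.0;
    for (int k = 0; k < 12; k++) s += rand() / (RAND_MAX + 1.0);
    return s - 6.0;
}

/* ES-style mutation: component j is perturbed with its own step size sigma[j].
   On Connex, all components would be updated in one parallel step. */
void es_mutate(double *x, const double *sigma, int n) {
    assert(x != NULL && sigma != NULL && n > 0);
    for (int j = 0; j < n; j++) x[j] += sigma[j] * gauss_approx();
}
```

With all step sizes set to zero the individual is left unchanged; larger `sigma[j]` values widen the search along component `j`.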

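7
- Combines the genes of two individuals (parents).
- Example: 1-point crossover at a random position, in Vector-C:

    vector crossover(vector X, vector Y) {
        vector C;                          // offspring
        int position = rand(VECTOR_SIZE);  // random cut point, presumably in [0, VECTOR_SIZE)
        where (i < position)               // i: index of each processing cell
            C = X;                         // cells left of the cut copy parent X
        elsewhere
            C = Y;                         // all other cells copy parent Y
        return C;
    }

- Uses Connex's parallel-if construct: where (cond) {…} elsewhere {…}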
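The effect of `where`/`elsewhere` can be emulated sequentially in C: every cell evaluates the condition, and the condition selects which parent it copies. In this sketch the cut point is passed in (instead of drawn with `rand`) so the result is reproducible. The uniform-crossover function, which replaces the `i < position` test with a per-cell random mask, is one plausible formulation of the "Uniform Crossover" operation listed in the results; the function names and the mask parameter are hypothetical.

```c
#include <assert.h>

#define VECTOR_SIZE 1024  /* one gene per processing cell */

/* 1-point crossover: cells with index i < position take parent x
   (the where branch), all other cells take parent y (elsewhere). */
void crossover_1point(const int *x, const int *y, int *c, int position) {
    assert(position >= 0 && position <= VECTOR_SIZE);
    for (int i = 0; i < VECTOR_SIZE; i++)
        c[i] = (i < position) ? x[i] : y[i];
}

/* Uniform crossover: a per-cell mask (e.g. random bits) replaces the
   i < position test, so each gene independently comes from x or y. */
void crossover_uniform(const int *x, const int *y, int *c, const unsigned char *mask) {
    for (int i = 0; i < VECTOR_SIZE; i++)
        c[i] = mask[i] ? x[i] : y[i];
}
```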

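8 A single position is selected at random:

    vector mutate(vector X) {
        int pos = rand(VECTOR_SIZE);  // randomly selected gene position
        float amount = rand11();      // random amount, presumably in [-1, 1]
        where (i == pos)              // only the cell with index pos is active
            X += amount;
        return X;
    }

The operation affects only the selected position.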
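A sequential C emulation of `mutate` makes the predication explicit: all cells test `i == pos`, but only one of them updates its gene. The `rand11` below is a hypothetical stand-in that returns a uniform value in [-1, 1], which is our reading of the name; the position and amount are passed in so the behavior can be checked.

```c
#include <assert.h>
#include <stdlib.h>

#define VECTOR_SIZE 1024

/* Hypothetical stand-in for rand11(): uniform value in [-1, 1]. */
double rand11(void) { return 2.0 * rand() / RAND_MAX - 1.0; }

/* Emulation of mutate(): every cell evaluates (i == pos),
   and only the selected cell adds the random amount. */
void mutate_at(double *x, int pos, double amount) {
    assert(pos >= 0 && pos < VECTOR_SIZE);
    for (int i = 0; i < VECTOR_SIZE; i++)
        if (i == pos) x[i] += amount;
}
```

A full mutation step would then be `mutate_at(x, rand() % VECTOR_SIZE, rand11());`.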

9 The fitness functions that can be evaluated efficiently on Connex are those composed of:
1. a data-parallel stage (local computation on each PC), followed by
2. a parallel reduction (sum).
For example:
- Sum of squared differences
- Knapsack problem: sum of weighted items
- Travelling salesman problem: sum of distances between the cities in a route
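The two-stage pattern can be sketched in C for the first example, the sum of squared differences. Stage 1 is independent per cell; stage 2 is shown as a pairwise tree sum, which finishes in log2(N) steps because all additions within a step are independent. The tree is one common way to implement a parallel reduction, assumed here for illustration; how Connex performs the reduction in hardware is not described on the slide. `VEC_N` and the function names are hypothetical.

```c
#include <assert.h>

#define VEC_N 8  /* small power-of-two vector size for illustration; Connex uses 1024 */

/* Stage 1 (data-parallel): each cell computes its local term on its own. */
static void local_sq_diff(const int *x, const int *t, long *v) {
    for (int k = 0; k < VEC_N; k++) v[k] = (long)(x[k] - t[k]) * (x[k] - t[k]);
}

/* Stage 2 (reduction): pairwise tree sum; the additions inside one
   stride are independent, so a parallel machine can do them at once. */
static long tree_sum(long *v) {
    for (int stride = VEC_N / 2; stride > 0; stride /= 2)
        for (int k = 0; k < stride; k++) v[k] += v[k + stride];
    return v[0];
}

/* Fitness = sum of squared differences between x and a target vector. */
long fitness_ssd(const int *x, const int *target) {
    long v[VEC_N];
    local_sq_diff(x, target, v);
    return tree_sum(v);
}
```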

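10 Rosenbrock function: a standard benchmark problem for optimization
f(x) = Σ_{i=1}^{N−1} [100 (x_{i+1} − x_i²)² + (1 − x_i)²]
Vector-C implementation:

    X2 = 0;                      // cells outside the where block must add 0 to the sum
    where (i < N)
        Xsh = rotateLeft(X, 1);  // presumably Xsh[i] = X[i+1]
    where (i < (N - 1)) {
        X2 = X * X;              // x_i^2
        Xsh -= X2;               // x_{i+1} - x_i^2
        Xsh *= Xsh * 100;        // 100 (x_{i+1} - x_i^2)^2
        X2 = 1 - X;
        X2 = X2 * X2;            // (1 - x_i)^2
        X2 += Xsh;               // term i of the sum
    }
    return sumv(X2);             // parallel reduction over all cells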

11 The problem: given a set of distance measurements between atoms, determine their Cartesian coordinates.
- Formulated as a global optimization problem, minimize:
- Not all distances are known.
- Some distances can be given as upper and lower bounds.

12
- Each given distance d(i,j) is mapped to one processing cell.
- Some PCs share vertices; PCs that hold the same vertex also share its random-generator seed, so every copy of that vertex is mutated identically.
- No inter-processor communication is needed (except for the parallel reduction).
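Why shared seeds remove the need for communication can be shown with a small C sketch. It uses Marsaglia's xorshift128 generator (the results table lists an "xorshift 128" operation, presumably this family); the per-vertex seeding rule, the ±1 mutation and all function names are hypothetical. Two cells that hold copies of the same vertex start from the same generator state, draw the same numbers, and therefore keep identical coordinates without exchanging data.

```c
#include <assert.h>
#include <stdint.h>

/* State of Marsaglia's xorshift128 generator (must not be all zero). */
typedef struct { uint32_t x, y, z, w; } Xorshift128;

static uint32_t xorshift128_next(Xorshift128 *s) {
    uint32_t t = s->x ^ (s->x << 11);
    s->x = s->y; s->y = s->z; s->z = s->w;
    s->w = s->w ^ (s->w >> 19) ^ t ^ (t >> 8);
    return s->w;
}

/* Hypothetical seeding rule: the state depends only on the vertex id,
   so every cell holding a copy of vertex v draws the same sequence. */
static Xorshift128 seed_for_vertex(uint32_t v) {
    Xorshift128 s = { 123456789u ^ v, 362436069u, 521288629u, 88675123u };
    return s;
}

/* Hypothetical mutation of one coordinate copy: add -1, 0 or +1,
   driven by the generator that belongs to that vertex. */
static int32_t mutate_coord(int32_t coord, Xorshift128 *vertex_rng) {
    return coord + (int32_t)(xorshift128_next(vertex_rng) % 3) - 1;
}
```

Cells holding the edges (0,1) and (1,2) would each call `mutate_coord` on their own copy of vertex 1 with a generator from `seed_for_vertex(1)`, and the two copies stay equal.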

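13 Evaluate distances
- Xi, Yi and Xj, Yj = coordinates of the two vertices i and j of the distance held by each cell
- D = vector of known distances

    int evaluateDist(vector Xi, vector Yi, vector Xj, vector Yj, vector D) {
        vector Dx, Dy;
        Dx = Xi - Xj;              // x-difference, one distance per cell
        Dy = Yi - Yj;              // y-difference
        Dx *= Dx;
        Dy *= Dy;
        Dx += Dy;                  // squared distance between the two vertices
        return sumAbsDiff(Dx, D);  // parallel reduction of |Dx - D|; D presumably holds squared distances
    }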
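A sequential C reference for `evaluateDist` helps when testing the parallel version. It assumes, as the comparison with `Dx` suggests, that the known distances are stored squared (`d2`), and that `sumAbsDiff` sums |Dx − D| over all cells; the function name and array layout (one array entry per cell) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Distance k joins the points (xi[k], yi[k]) and (xj[k], yj[k]);
   d2[k] is its known squared length. Returns the total error
   sum_k | dx^2 + dy^2 - d2[k] |, i.e. what sumAbsDiff reduces. */
long evaluate_dist(const int *xi, const int *yi, const int *xj, const int *yj,
                   const int *d2, int m) {
    assert(m > 0);
    long err = 0;
    for (int k = 0; k < m; k++) {
        long dx = xi[k] - xj[k], dy = yi[k] - yj[k];
        err += labs(dx * dx + dy * dy - d2[k]);
    }
    return err;
}
```

For the right triangle (0,0), (3,0), (0,4) with squared distances 9, 16 and 25, the error is 0; any wrong distance makes it positive.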

14 Results

Operation            T_par   T_seq   Speedup
A += B                   1    1024   N
xorshift128             13   13312   N
sumAbsDiffs              7    4096   0.5 N
1-Point Crossover        3    2048   0.6 N
Uniform Crossover       15   14350   0.9 N
Uniform Mutation        33   21172   0.6 N
HS Mutation            107   71506   0.6 N
Rosenbrock              14   14325   N
evaluateDist            13   10240   0.7 N

Summary of operations: parallel instruction counts (T_par), sequential instruction counts (T_seq) and speedups, where N = 1024 is the vector size.
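The speedup column is the ratio of sequential to parallel instruction counts. The hypothetical helper below expresses that ratio as a fraction of N = 1024, which makes the table's entries easy to check.

```c
#include <assert.h>

#define N 1024  /* vector size used in the results table */

/* Speedup = T_seq / T_par, returned in units of N
   (1.0 means a speedup equal to the vector size). */
double speedup_in_units_of_n(long t_seq, long t_par) {
    assert(t_par > 0 && t_seq >= 0);
    return (double)t_seq / t_par / N;
}
```

For example, sumAbsDiffs gives 4096 / 7 ≈ 585 ≈ 0.57 N and 1-point crossover gives 2048 / 3 ≈ 683 ≈ 0.67 N, so the one-decimal values in the table appear to be rounded down.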

15
- The Connex chip is well suited to parallelizing evolutionary algorithms through vectorization.
- With horizontal data mapping, optimization problems whose fitness function is a data-parallel stage followed by a sum can benefit from the parallel reduction.

