1
István Lőrentz (1), Mihaela Malita (2), Răzvan Andonie (3) (presenter)
(1) Electronics and Computers Department, Transylvania University of Brasov, Romania
(2) Computer Science Department, Saint Anselm College, Manchester, NH, USA
(3) Computer Science Department, Central Washington University, Ellensburg, WA, USA
MAICS 2011: The 22nd Midwest Artificial Intelligence and Cognitive Science Conference
2
The Connex Architecture (more in Prof. Gheorghe M. Stefan)
Evolutionary Algorithms (EA)
Parallelizing EA on Connex
Example problems
Results
Conclusions
3
The Connex Array:
◦ Many-core data-parallel area of 1024 Processing Cells (PC)
◦ Area: ~50 mm² for the 1024-PC array, including 1 MB of memory and the two controllers
◦ Clock speed: 400 MHz
Also on the chip:
◦ Multi-core area: 4 MIPS cores
◦ Speculative parallel pipe of 8 PE
Interfaces:
◦ DDR, PCI
◦ Video and audio interfaces for 2 HDTV channels
Total power: ~5 W
Total area: 82 mm²
65 nm implementation
4
Sequencer
◦ Issues in each cycle (on a 2-stage pipe) one instruction for the Connex Array and one instruction for itself
I/O Controller
◦ Controls a 6.4 GB/s I/O channel
◦ Works in parallel with code running on the Connex Array
Processing Cell
◦ Integer unit
◦ Data memory
◦ Boolean (predicate) unit
5
Chromosomes are represented as vectors of integer components in Connex
Maximum chromosome length: 1024 elements
The population forms a matrix
Processing blocks are parallelized
Flowchart: Initialize population randomly → Crossover → Mutation → Evaluation → Convergence or limit? (Yes: STOP; No: Select new generation and repeat)
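The loop in the flowchart can be sketched sequentially in Python. This is only an illustration of the control flow, not the authors' Connex code: the function name `evolve`, the gene range, the population size, and the elitist truncation used for selection are all assumptions made for the example.

```python
import random


def evolve(fitness, length, pop_size=20, generations=100, seed=0):
    """Minimise `fitness` over integer vectors with a simple generational GA.

    Hypothetical helper: parameters and selection scheme are assumptions.
    """
    rng = random.Random(seed)
    # The population is a matrix: one row per chromosome, one integer per gene.
    population = [[rng.randint(-10, 10) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            x, y = rng.sample(population, 2)      # pick two distinct parents
            cut = rng.randrange(length)           # 1-point crossover
            child = x[:cut] + y[cut:]
            pos = rng.randrange(length)           # mutate a single gene
            child[pos] += rng.choice((-1, 1))
            offspring.append(child)
        ranked = sorted(population + offspring, key=fitness)  # evaluation
        if fitness(ranked[0]) == 0:               # convergence test
            return ranked[0]
        population = ranked[:pop_size]            # select new generation
    return min(population, key=fitness)           # generation limit reached
```

On Connex, each row operation inside the inner loop would act on all genes of a chromosome at once; the Python version walks through them one by one.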
6
Similar algorithm to GA
Population and mutation parameters are encoded in vectors
Recombination forms a new individual from multiple parents
Mutation adds a Gaussian-distributed random variable to each vector component
Deterministic selection of the new generation, based on fitness ranking
Flowchart: Initialize population randomly → Recombination → Mutation → Evaluation → Convergence or limit? (Yes: STOP; No: Select new parent generation and repeat)
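The three steps on this slide (recombination, Gaussian mutation, rank-based selection) match what is usually called an evolution strategy. A minimal sequential sketch of one generation follows. The name `es_generation`, the choice of averaging two parents, the fixed step size `sigma`, and selecting only among offspring are assumptions for illustration; the slide says mutation parameters are encoded in the vectors, which this sketch does not model.

```python
import random


def es_generation(parents, fitness, n_offspring, sigma, rng):
    """One generation of a simple evolution strategy (minimisation).

    Hypothetical helper: needs at least two parents; `sigma` is fixed here.
    """
    mu, dim = len(parents), len(parents[0])
    offspring = []
    for _ in range(n_offspring):
        # Recombination: average two distinct parents component by component.
        a, b = rng.sample(parents, 2)
        child = [(a[k] + b[k]) / 2.0 for k in range(dim)]
        # Mutation: add a Gaussian-distributed value to every component.
        child = [g + rng.gauss(0.0, sigma) for g in child]
        offspring.append(child)
    # Deterministic selection: rank by fitness and keep the mu best offspring.
    return sorted(offspring, key=fitness)[:mu]
```

A call such as `es_generation(parents, fitness, 12, 0.1, random.Random(1))` returns a new parent list of the same size as `parents`, ordered from best to worst.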
7
Combines genes of two individuals (parents)
Example: 1-point crossover at a random position, in Vector-C:
vector crossover(vector X, vector Y) {
    vector C;
    int position = rand(VECTOR_SIZE);   // random cut point
    where (i < position)                // i: presumably the index of each processing cell
        C = X;
    elsewhere
        C = Y;
    return C;
}
Uses Connex's parallel-if construct: where (cond) {…} elsewhere {…}
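For readers without access to Vector-C, the where/elsewhere selection can be mimicked in plain Python by testing the element index against the cut point. This is a sequential sketch of the same selection rule, with the cut point passed in explicitly so the result is reproducible; it is not Connex code.

```python
def crossover(x, y, position):
    """1-point crossover: take x where index < position, y elsewhere."""
    return [xi if i < position else yi
            for i, (xi, yi) in enumerate(zip(x, y))]


# Example: cut after the first two genes.
child = crossover([1, 2, 3, 4], [9, 8, 7, 6], 2)
print(child)  # [1, 2, 7, 6]
```

On the Connex array the index test would be evaluated in every processing cell at the same time, so the whole selection takes a constant number of instructions regardless of vector length.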
8
A single position is selected at random:
vector mutate(vector X) {
    int pos = rand(VECTOR_SIZE);
    float amount = rand11();
    where (i == pos)
        X += amount;
    return X;
}
The operation affects only the selected position.
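The same single-position mutation, written as a sequential Python sketch. The position and the amount are passed as arguments here (an assumption made so the example is deterministic); in the slide's code they come from `rand` and `rand11`.

```python
def mutate(x, pos, amount):
    """Add `amount` to the element at index `pos`; leave the rest unchanged."""
    return [xi + amount if i == pos else xi for i, xi in enumerate(x)]


print(mutate([1.0, 2.0, 3.0], 1, 0.5))  # [1.0, 2.5, 3.0]
```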
9
The fitness functions that can be evaluated efficiently on Connex are those composed of:
1. a data-parallel stage (local computation on each PC), followed by
2. a parallel reduction (sum)
For example:
- Sum of squared differences
- Knapsack problem: sum of weighted items
- Travelling salesman problem: sum of distances between cities in a route
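The two-stage shape (local computation, then a sum) can be sketched in Python using the first example, the sum of squared differences. The pairwise `tree_sum` is only one way to picture a parallel reduction; the slides do not say how Connex implements its reduction, so treat that part as an assumption. Both function names are made up for the example.

```python
def tree_sum(values):
    """Sum by repeatedly adding neighbouring pairs (a log-depth reduction)."""
    vals = list(values)
    while len(vals) > 1:
        if len(vals) % 2:          # pad odd lengths with the neutral element
            vals.append(0)
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0] if vals else 0


def sum_squared_differences(x, target):
    # Stage 1: data-parallel, each element is computed independently.
    local = [(a - b) ** 2 for a, b in zip(x, target)]
    # Stage 2: reduction of the local results to a single fitness value.
    return tree_sum(local)


print(sum_squared_differences([1, 2, 3], [1, 0, 5]))  # 8
```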
10
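Rosenbrock function: benchmark problem for optimization
Vector-C implementation:
where (i < N)
    Xsh = rotateLeft(X, 1);      // Xsh[i] = X[i+1]
where (i < (N-1)) {
    X2 = X * X;
    Xsh -= X2;                   // X[i+1] - X[i]^2
    Xsh *= Xsh * 100;            // 100 * (X[i+1] - X[i]^2)^2
    X2 = 1 - X;
    X2 = X2 * X2;                // (1 - X[i])^2
    X2 += Xsh;                   // per-cell term
}
return sumv(X2);                 // parallel reduction (sum)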
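The vector code appears to compute the standard Rosenbrock function, f(x) = sum over i = 1..N-1 of 100 (x[i+1] - x[i]^2)^2 + (1 - x[i])^2, which is also the "Rosenbrock" entry in the results table. A sequential Python reference that follows the same steps can be useful for checking a vectorized version; `rotate_left` is a stand-in written for this sketch, not the Connex primitive.

```python
def rotate_left(x, k=1):
    """Rotate a list left by k positions, so result[i] == x[i + k] (wrapping)."""
    return x[k:] + x[:k]


def rosenbrock(x):
    n = len(x)
    xsh = rotate_left(x, 1)                      # xsh[i] holds x[i + 1]
    terms = [100 * (xsh[i] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
             for i in range(n - 1)]              # only the first N-1 cells
    return sum(terms)                            # reduction step


print(rosenbrock([1.0] * 5))   # 0.0, the global minimum
print(rosenbrock([0.0, 0.0]))  # 1.0
```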
11
The problem: given a set of distance measurements between atoms, determine their Cartesian coordinates
Formulated as a global optimization problem, minimize: [objective function not preserved in this transcript]
Not all distances are known
Some distances may be given as upper and lower bounds
12
Each given distance d(i,j) is mapped to a processing element
Some PCs share vertices
Shared vertices also share random generator seeds
No interprocessor communication (except the parallel reduction)
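One way to read this mapping: every known distance gets its own slot holding copies of both endpoint coordinates, and copies of the same vertex stay identical because each copy draws its random perturbation from the same seed. The Python sketch below illustrates that reading in 2D. The function names, the string-based seed format, and the step size are assumptions for the example, not details taken from the slides.

```python
import random


def layout_edges(coords, edges):
    """Replicate endpoint coordinates so that slot k holds edge k's two vertices."""
    xi = [coords[i][0] for i, _ in edges]
    yi = [coords[i][1] for i, _ in edges]
    xj = [coords[j][0] for _, j in edges]
    yj = [coords[j][1] for _, j in edges]
    return xi, yi, xj, yj


def vertex_noise(vertex_id, generation, base_seed=0, step=0.1):
    """Perturbation for one vertex; every slot holding that vertex gets the same value."""
    rng = random.Random(f"{base_seed}:{generation}:{vertex_id}")
    return rng.uniform(-step, step), rng.uniform(-step, step)


coords = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
edges = [(0, 1), (1, 2)]            # vertex 1 is shared by both slots
print(layout_edges(coords, edges))
```

Because `vertex_noise(1, g)` returns the same pair in both slots that hold vertex 1, the copies can be updated locally without exchanging data between slots.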
13
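Evaluate distances
Xi, Yi, Xj, Yj = coordinates of the two vertices of each distance
D = vector of known distances (compared against squared distances here, so presumably stored squared)
int evaluateDist(vector Xi, vector Yi, vector Xj, vector Yj, vector D) {
    vector Dx, Dy;
    Dx = Xi - Xj;               // coordinate differences, per cell
    Dy = Yi - Yj;
    Dx *= Dx;                   // squared differences
    Dy *= Dy;
    Dx += Dy;                   // squared distance between the two vertices
    return sumAbsDiff(Dx, D);   // parallel reduction: sum of |Dx - D|
}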
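A sequential Python sketch of this evaluation, under the assumption that the reference vector holds squared distances (the slide's code compares squared coordinate differences against D directly). The argument names are chosen for this example.

```python
def evaluate_dist(xi, yi, xj, yj, d_squared):
    """Sum of |computed squared distance - known squared distance| over all edges."""
    # Local stage: one squared distance per edge slot.
    squared = [(a - c) ** 2 + (b - e) ** 2
               for a, b, c, e in zip(xi, yi, xj, yj)]
    # Reduction stage: sum of absolute differences.
    return sum(abs(s - t) for s, t in zip(squared, d_squared))


# Two edges: (0,0)-(3,4) and (3,4)-(3,0), with exact squared lengths 25 and 16.
print(evaluate_dist([0, 3], [0, 4], [3, 3], [4, 0], [25, 16]))  # 0
```

A result of zero means the candidate coordinates reproduce every known distance exactly; larger values indicate a worse fit.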
14
Results
Operation            T par   T seq   Speedup
A += B                   1    1024   N
xorshift 128            13   13312   N
sumAbsDiffs              7    4096   0.5 N
1-Point Crossover        3    2048   0.6 N
Uniform Crossover       15   14350   0.9 N
Uniform Mutation        33   21172   0.6 N
HS Mutation            107   71506   0.6 N
Rosenbrock              14   14325   N
evaluateDist            13   10240   0.7 N
Summary of operations: parallel instruction counts (T par), sequential instruction counts (T seq), and speedups, where N = 1024 is the vector size.
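The Speedup column can be checked as T seq / T par, expressed in units of N. The short script below does that arithmetic. The split of the fused digit runs into T par and T seq (for example 13 and 13312 for xorshift 128) is a reading of the extracted slide text, chosen because it is the split that makes the ratios agree with the stated speedups.

```python
N = 1024  # vector size stated on the slide

# (operation, T_par, T_seq), as read from the results slide
ROWS = [
    ("A += B", 1, 1024),
    ("xorshift 128", 13, 13312),
    ("sumAbsDiffs", 7, 4096),
    ("1-Point Crossover", 3, 2048),
    ("Uniform Crossover", 15, 14350),
    ("Uniform Mutation", 33, 21172),
    ("HS Mutation", 107, 71506),
    ("Rosenbrock", 14, 14325),
    ("evaluateDist", 13, 10240),
]


def speedup_in_units_of_n(t_par, t_seq, n=N):
    """Speedup T_seq / T_par divided by the vector size n."""
    return (t_seq / t_par) / n


for name, t_par, t_seq in ROWS:
    print(f"{name:18s} {speedup_in_units_of_n(t_par, t_seq):.2f} N")
```

The computed ratios land close to the table's figures, which supports this reading of the columns.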
15
- The Connex chip is well suited to parallelizing evolutionary algorithms through vectorization
- With horizontal data mapping, a certain class of optimization problems can benefit from the parallel reduction