István Lőrentz (1), Mihaela Malita (2), Răzvan Andonie (3, presenter)
(1) Electronics and Computers Department, Transylvania University of Brasov, Romania
(2) Computer Science Department, Saint Anselm College, Manchester, NH, USA
(3) Computer Science Department, Central Washington University, Ellensburg, WA, USA
MAICS 2011: The 22nd Midwest Artificial Intelligence and Cognitive Science Conference
The Connex Architecture (more in the work of Prof. Gheorghe M. Stefan)
Evolutionary Algorithms (EA)
Parallelizing EA on Connex
Example problems
Results
Conclusions
The Connex Array:
◦ Many-core data-parallel area of 1024 Processing Cells (PCs)
◦ Area: ~50 mm² for the 1024-PC array, including 1 MB of memory and the two controllers
◦ Clock speed: 400 MHz
Also on the chip:
◦ Multi-core area: 4 MIPS cores
◦ Speculative parallel pipe of 8 PEs
Interfaces:
◦ DDR, PCI
◦ Video and audio interfaces for 2 HDTV channels
Total power: ~5 W
Total area: 82 mm² (65 nm implementation)
Sequencer: issues in each cycle (on a 2-stage pipe) one instruction for the Connex Array and one instruction for itself.
I/O Controller: controls a 6.4 GB/s I/O channel; works in parallel with code running on the Connex Array.
Processing Cell: integer unit, data memory, Boolean (predicate) unit.
Chromosomes are represented as vectors of integer components in Connex.
Maximum chromosome length: 1024 elements.
The population forms a matrix.
The processing blocks are parallelized: crossover, mutation, evaluation (see the sketch below).
[Flowchart: initialize population randomly → crossover → mutation → evaluation → convergence or limit? If no, select the new generation and loop; if yes, stop.]
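A minimal sequential C sketch of this generational loop. The population size, the OneMax-style toy fitness, and the steady-state replacement scheme below are illustrative assumptions, not the paper's settings; on Connex, each chromosome is a vector processed by the PC array in one step.

#include <stdlib.h>

#define POP_SIZE  8     /* assumed population size */
#define CHROM_LEN 16    /* up to 1024 elements on Connex */
#define MAX_GEN   1000

/* Toy fitness: count of 1-genes (OneMax). On Connex this is a local
   computation per PC followed by a parallel sum reduction. */
static int evaluate(const int *c) {
    int s = 0;
    for (int j = 0; j < CHROM_LEN; j++) s += c[j];
    return s;
}

int main(void) {
    int pop[POP_SIZE][CHROM_LEN], child[CHROM_LEN];

    for (int i = 0; i < POP_SIZE; i++)           /* initialize randomly */
        for (int j = 0; j < CHROM_LEN; j++)
            pop[i][j] = rand() % 2;

    for (int gen = 0; gen < MAX_GEN; gen++) {
        /* 1-point crossover of two random parents */
        int a = rand() % POP_SIZE, b = rand() % POP_SIZE;
        int cut = rand() % CHROM_LEN;
        for (int j = 0; j < CHROM_LEN; j++)
            child[j] = (j < cut) ? pop[a][j] : pop[b][j];

        child[rand() % CHROM_LEN] ^= 1;          /* single-position mutation */

        /* selection: the child replaces the worst individual if better */
        int worst = 0;
        for (int i = 1; i < POP_SIZE; i++)
            if (evaluate(pop[i]) < evaluate(pop[worst])) worst = i;
        if (evaluate(child) > evaluate(pop[worst]))
            for (int j = 0; j < CHROM_LEN; j++) pop[worst][j] = child[j];

        if (evaluate(child) == CHROM_LEN) break; /* convergence: optimum reached */
    }
    return 0;
}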
Similar in structure to the GA.
Population and mutation parameters are encoded in vectors.
Recombination forms a new individual from multiple parents.
Mutation adds a Gaussian-distributed random variable to each vector component.
Deterministic selection of the new generation, based on fitness ranking.
[Flowchart: initialize population randomly → recombination → mutation → evaluation → convergence or limit? If no, select the new parent generation and loop; if yes, stop.]
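A C sketch of this Gaussian mutation step. The Box-Muller transform below is one standard way to draw the normal samples; the per-component step sizes sigma stand in for the mutation parameters the slide says are encoded in vectors.

#include <math.h>
#include <stdlib.h>

#define TWO_PI 6.283185307179586

/* Standard normal sample via the Box-Muller transform. */
static double gauss(void) {
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);  /* avoid log(0) */
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(TWO_PI * u2);
}

/* Mutation: every component receives Gaussian noise, scaled by the
   per-component step size sigma[i] carried with the individual. */
void mutate_gaussian(double *x, const double *sigma, int n) {
    for (int i = 0; i < n; i++)
        x[i] += sigma[i] * gauss();
}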
Combines the genes of two individuals (parents).
Example: 1-point crossover at a random position, in Vector-C:

vector crossover(vector X, vector Y) {
    vector C;
    int position = rand(VECTOR_SIZE);  // random cut point
    where (i < position) C = X;        // genes before the cut come from X
    elsewhere C = Y;                   // the rest come from Y
    return C;
}

Uses Connex's parallel-if construct: where (cond) {...} elsewhere {...}
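For reference, a plain C equivalent of the same 1-point crossover (VECTOR_SIZE and the int gene type are carried over from the snippet above; on Connex the two masked copies happen in parallel across the PCs).

#include <stdlib.h>
#include <string.h>

#define VECTOR_SIZE 1024

/* Sequential equivalent of the where/elsewhere construct: the child takes
   X's genes before the cut point and Y's genes from the cut point onward. */
void crossover_1pt(int *c, const int *x, const int *y) {
    int position = rand() % VECTOR_SIZE;               /* random cut point */
    memcpy(c, x, position * sizeof *c);                /* i <  position: from X */
    memcpy(c + position, y,
           (VECTOR_SIZE - position) * sizeof *c);      /* i >= position: from Y */
}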
A single position is selected at random:

vector mutate(vector X) {
    int pos = rand(VECTOR_SIZE);   // position to mutate
    float amount = rand11();       // random mutation amount
    where (i == pos) X += amount;  // masked update
    return X;
}

The operation affects only the selected position.
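A plain C equivalent. The slide does not define rand11(); it is assumed here to return a uniform value in [-1, 1].

#include <stdlib.h>

#define VECTOR_SIZE 1024

/* Assumed semantics of rand11(): uniform random float in [-1, 1]. */
static float rand11(void) {
    return 2.0f * (float)rand() / (float)RAND_MAX - 1.0f;
}

/* Sequential equivalent of the masked update: the where(i == pos) predicate
   leaves every PC idle except the one holding the selected gene. */
void mutate_1pos(float *x) {
    int pos = rand() % VECTOR_SIZE;
    x[pos] += rand11();
}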
The class of fitness functions that can be evaluated efficiently on Connex are those composed of:
1. a data-parallel stage (local computation on each PC), followed by
2. a parallel reduction (sum)
For example:
- sum of squared differences
- knapsack problem: sum of weighted items
- travelling salesman problem: sum of distances between the cities in a route
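A C sketch of this two-stage pattern for the first example, the sum of squared differences. On Connex, the loop body is the per-PC local stage and the accumulation is done by the hardware parallel sum.

/* Fitness as map + reduce: each PC computes its own (x[i]-y[i])^2 term
   (data-parallel stage); the terms are then summed by a parallel reduction. */
double fitness_ssd(const double *x, const double *y, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double d = x[i] - y[i];   /* local computation on each PC */
        sum += d * d;             /* reduction (parallel sum on Connex) */
    }
    return sum;
}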
The Rosenbrock function: a standard benchmark problem for optimization algorithms.
Vector-C implementation:

float rosenbrock(vector X) {
    vector Xsh, X2;
    where (i < N) Xsh = rotateLeft(X, 1);  // Xsh[i] = X[i+1]
    where (i < (N-1)) {
        X2 = X * X;        // x_i^2
        Xsh -= X2;         // x_{i+1} - x_i^2
        Xsh *= Xsh * 100;  // 100 * (x_{i+1} - x_i^2)^2
        X2 = 1 - X;        // 1 - x_i
        X2 = X2 * X2;      // (1 - x_i)^2
        X2 += Xsh;         // full term for index i
    }
    return sumv(X2);       // parallel sum of all terms
}
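A sequential C reference for checking the vector code, written directly from the formula f(x) = Σ_{i=0}^{N-2} [100 (x_{i+1} - x_i²)² + (1 - x_i)²].

/* Sequential Rosenbrock: f(x) = sum over i = 0 .. n-2 of
   100*(x[i+1] - x[i]^2)^2 + (1 - x[i])^2. */
double rosenbrock_seq(const double *x, int n) {
    double sum = 0.0;
    for (int i = 0; i < n - 1; i++) {
        double a = x[i + 1] - x[i] * x[i];
        double b = 1.0 - x[i];
        sum += 100.0 * a * a + b * b;
    }
    return sum;
}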
The problem: given a set of distance measurements between atoms, determine their Cartesian coordinates.
Formulated as a global optimization problem: minimize the total mismatch between computed and measured distances, e.g. f = Σ_{(i,j)} | ‖p_i − p_j‖² − d_ij² | (the form evaluated by evaluateDist below).
Not all distances are known.
Some distances may be given only as upper and lower bounds.
Each given distance d(i,j) is mapped to a processing cell.
Some PCs share vertices.
Shared vertices also share their random-generator seeds.
No interprocessor communication is needed (except the parallel reduction).
Evaluate distances. Xi, Yi, Xj, Yj = coordinates of the endpoints of each measured pair; D = vector of known (squared) distances.

float evaluateDist(vector Xi, vector Yi, vector Xj, vector Yj, vector D) {
    vector Dx, Dy;
    Dx = Xi - Xj;              // x-coordinate differences
    Dy = Yi - Yj;              // y-coordinate differences
    Dx *= Dx;                  // squared differences
    Dy *= Dy;
    Dx += Dy;                  // squared Euclidean distance per pair
    return sumAbsDiff(Dx, D);  // sum of |computed - known| over all pairs
}
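A sequential C reference matching the kernel above (one array slot per measured pair; d2 is taken to hold squared target distances, since the kernel compares them against the computed squared distances).

#include <math.h>

/* For each measured pair k, accumulate the absolute difference between the
   computed squared distance and the known squared distance d2[k]. */
double evaluate_dist_seq(const double *xi, const double *yi,
                         const double *xj, const double *yj,
                         const double *d2, int n) {
    double sum = 0.0;
    for (int k = 0; k < n; k++) {
        double dx = xi[k] - xj[k];
        double dy = yi[k] - yj[k];
        sum += fabs(dx * dx + dy * dy - d2[k]);
    }
    return sum;
}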
Results

Summary of operations: parallel instruction counts (T_par), sequential instruction counts (T_seq), and speedups, where N = 1024 is the vector size.

Operation           T_par   T_seq   Speedup
A += B              1       1024    N
xorshift                            N
sumAbsDiffs                         N
1-Point Crossover                   N
Uniform Crossover                   N
Uniform Mutation                    N
HS Mutation                         N
Rosenbrock                          N
evaluateDist                        N
- The Connex chip is well suited to parallelizing evolutionary algorithms through vectorization.
- With horizontal data mapping, we can benefit from the parallel reduction for a certain class of optimization problems.