Programming with CUDA WS 08/09 Lecture 12 Tue, 02 Dec, 2008
Previously Optimization example: parallel reduction
Today Graded/ungraded course? Revisiting shared memory bank conflicts Final projects
Graded/ungraded All settled?
Shared Memory Devices of compute capability 1.x have 16 banks –16 KB shared memory in 16 banks, 1 KB each –Successive 32-bit words are stored in successive banks
Final Projects Timeline –Thu, 20 Nov: Float write-ups on ideas of Jens & Waqar –Tue, 25 Nov: Suggest groups and topics –Thu, 27 Nov: Groups and topics assigned –Tue, 2 Dec (today): Last chance to change groups/topics; groups and topics finalized
Final Projects There will be no lectures in the second half –Meetings with groups –Schedule will be put online
Final Projects General tips –Optimize your code –Document your code –Make your code platform independent –Don't mix C and C++
Final Exam When?
On to exercises!