1
Instructor’s Intent for this course
2
Making connections between concepts
Concept map: Cache, C, Pipelining, Locality, Code tf, Threads, Parallelism, MPI, OpenMP, Locks, Dependences
3
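Challenge Question
for (i = 0; i < 100000; i++)
    a[i + 1000] = a[i] + 1;
Dependences run between a[0], a[1000], a[2000], … and between a[1], a[1001], a[2001], …
The “dependence distance” is 1000.
First idea: restructure the loop so that the dependence distance falls outside the loop being parallelized, i.e., no two iterations of the parallel loop are 1000 apart.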
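A minimal C sketch of the serial loop, assuming the challenge reads a[i] and writes a[i + 1000], as the answers that follow do. `serial_shift`, `dependence_chain`, `iters`, and `dist` are hypothetical names introduced for illustration. Starting from an all-zero array, every element ends up equal to its index divided by the dependence distance, and elements with the same index modulo the distance form one dependence chain.

```c
#include <stdlib.h>

/* Serial reference for the challenge loop, generalized (hypothetically) to
 * `iters` iterations and dependence distance `dist`:
 *     for (i = 0; i < iters; i++) a[i + dist] = a[i] + 1;
 * Returns a zero-initialized array of iters + dist ints after running the
 * loop (caller frees), or NULL if allocation fails. */
int *serial_shift(long iters, long dist)
{
    int *a = calloc((size_t)(iters + dist), sizeof *a);
    if (a == NULL)
        return NULL;
    for (long i = 0; i < iters; i++)
        a[i + dist] = a[i] + 1;   /* reads the value written dist iterations earlier */
    return a;
}

/* Elements linked by the dependence share the same index modulo dist:
 * with dist == 1000, a[1] -> a[1001] -> a[2001] -> ... all lie in chain 1. */
long dependence_chain(long index, long dist)
{
    return index % dist;
}
```

For example, `serial_shift(100000, 1000)` leaves a[1000] through a[1999] equal to 1 and a[100999] equal to 100, and a[1], a[1001], and a[2001] all fall in chain 1.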
4
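General Example
for (i = 0; i < n; i++)
    a[i + m] = a[i] + 1;
For m > 0, there is a loop-carried dependence if m < n: iteration i writes a[i + m], and iteration i + m later reads it. If m ≥ n, no iteration reads an element that another iteration wrote.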
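The m < n rule can be checked by brute force. This sketch simulates the loop, records which iteration last wrote each element, and reports whether any iteration reads a value that an earlier iteration produced. It only looks for read-after-write dependences and assumes m ≥ 0; `has_loop_carried_dependence` and `dependence_rule` are hypothetical names.

```c
#include <stdlib.h>

/* Brute-force check for a loop-carried flow (read-after-write) dependence in
 *     for (i = 0; i < n; i++) a[i + m] = a[i] + 1;
 * Assumes n >= 0 and m >= 0. Returns 1 if some iteration reads an element
 * written by an earlier iteration, 0 if none does, -1 on allocation failure. */
int has_loop_carried_dependence(long n, long m)
{
    if (n <= 0)
        return 0;
    /* writer[k] is the iteration that last wrote a[k], or -1 if none yet. */
    long *writer = malloc((size_t)(n + m) * sizeof *writer);
    if (writer == NULL)
        return -1;
    for (long k = 0; k < n + m; k++)
        writer[k] = -1;

    int found = 0;
    for (long i = 0; i < n && !found; i++) {
        if (writer[i] != -1)   /* iteration i reads a[i], which an earlier iteration wrote */
            found = 1;
        writer[i + m] = i;     /* iteration i writes a[i + m] */
    }
    free(writer);
    return found;
}

/* Closed form of the slide's rule, with m > 0 made explicit. */
int dependence_rule(long n, long m)
{
    return m > 0 && m < n;
}
```

With n = 100000 and m = 1000 (the challenge loop) both functions report a dependence; with m = n the written range a[n..2n-1] never overlaps the read range a[0..n-1], so neither does.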
5
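Answer 1
for (i = 0; i < 100; i++) {
    #pragma omp parallel for
    for (j = i*1000; j < (i+1)*1000; j++)
        a[j+1000] = a[j] + 1;
}
Not ideal: only the inner loop is parallelized, so a parallel region is started and joined 100 times, each time with just 1000 iterations of work.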
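A runnable sketch of Answer 1, generalized to `nblocks` blocks of `dist` iterations (hypothetical parameters; the slide uses 100 and 1000). `answer1_matches_serial` is a hypothetical helper that runs the original serial loop on a copy of the same input and compares the results. Compiled without OpenMP support (no `-fopenmp`), the pragma is ignored and the code runs serially with the same result.

```c
#include <stdlib.h>

/* Answer 1: blocks run one after another; iterations inside a block run in
 * parallel. In block b the inner loop reads a[b*dist .. (b+1)*dist - 1] and
 * writes a[(b+1)*dist .. (b+2)*dist - 1]; the ranges are disjoint, so the
 * inner iterations are independent. */
void answer1_blocked(int *a, long nblocks, long dist)
{
    for (long b = 0; b < nblocks; b++) {
        /* One fork/join per block: nblocks parallel regions in total. */
        #pragma omp parallel for
        for (long j = b * dist; j < (b + 1) * dist; j++)
            a[j + dist] = a[j] + 1;
    }
}

/* Returns 1 if answer1_blocked matches the serial loop
 *     for (i = 0; i < nblocks * dist; i++) a[i + dist] = a[i] + 1;
 * on the same input, 0 if not, -1 on allocation failure. */
int answer1_matches_serial(long nblocks, long dist)
{
    long len = (nblocks + 1) * dist;          /* elements touched by the loop */
    int *ref = malloc((size_t)len * sizeof *ref);
    int *par = malloc((size_t)len * sizeof *par);
    if (ref == NULL || par == NULL) {
        free(ref);
        free(par);
        return -1;
    }
    for (long k = 0; k < len; k++)
        ref[k] = par[k] = (int)(k % 7);       /* same arbitrary input for both */

    for (long i = 0; i < nblocks * dist; i++) /* original serial loop */
        ref[i + dist] = ref[i] + 1;
    answer1_blocked(par, nblocks, dist);

    int same = 1;
    for (long k = 0; k < len; k++)
        if (ref[k] != par[k])
            same = 0;
    free(ref);
    free(par);
    return same;
}
```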
6
Answer 2
#pragma omp parallel for private(j)
for (i = 0; i < 1000; i++)
    for (j = 0; j < 100; j++)
        a[i + (j+1)*1000] = a[i + j*1000] + 1;
Not ideal: each thread “jumps” through the array in steps of 1000 elements, so cache locality is poor. It can also lead to false sharing, because the array is written “interspersed” by different threads: neighboring elements that share a cache line may be written by different threads.
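A sketch of Answer 2 in the same style, with the same hypothetical `nblocks`/`dist` parameters. Declaring `j` inside the parallel loop makes it private to each thread; if `j` were declared outside the loop with no `private` clause, all threads would share one counter and race on it. `answer2_matches_serial` is a hypothetical checking helper.

```c
#include <stdlib.h>

/* Answer 2: dist independent dependence chains, each nblocks steps long.
 * Chain i is a[i], a[i + dist], a[i + 2*dist], ...; chains never touch each
 * other's elements, so the outer loop can run in parallel. */
void answer2_chains(int *a, long nblocks, long dist)
{
    #pragma omp parallel for
    for (long i = 0; i < dist; i++)
        for (long j = 0; j < nblocks; j++)   /* j declared here, so private */
            a[i + (j + 1) * dist] = a[i + j * dist] + 1;
}

/* Returns 1 if answer2_chains matches the serial loop
 *     for (i = 0; i < nblocks * dist; i++) a[i + dist] = a[i] + 1;
 * on the same input, 0 if not, -1 on allocation failure. */
int answer2_matches_serial(long nblocks, long dist)
{
    long len = (nblocks + 1) * dist;
    int *ref = malloc((size_t)len * sizeof *ref);
    int *par = malloc((size_t)len * sizeof *par);
    if (ref == NULL || par == NULL) {
        free(ref);
        free(par);
        return -1;
    }
    for (long k = 0; k < len; k++)
        ref[k] = par[k] = (int)(k % 7);       /* same arbitrary input for both */

    for (long i = 0; i < nblocks * dist; i++) /* original serial loop */
        ref[i + dist] = ref[i] + 1;
    answer2_chains(par, nblocks, dist);

    int same = 1;
    for (long k = 0; k < len; k++)
        if (ref[k] != par[k])
            same = 0;
    free(ref);
    free(par);
    return same;
}
```

Within one thread, consecutive writes are `dist` elements apart, which is the strided access pattern the slide blames for poor locality.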
7
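False Sharing Example
Figure: a single cache line holding a[0] through a[6], with some elements written by processor 0 and others by processor 1. Although the processors write different elements, each write invalidates the line in the other processor’s cache, so the line moves back and forth between them.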
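To see which elements share a line, a small sketch can map an array index to a cache line. It assumes the array starts on a line boundary; the 64-byte line and 4-byte `int` used below are common values, not guarantees, and `cache_line_of` and `same_cache_line` are hypothetical helpers.

```c
#include <stddef.h>

/* Index of the cache line holding element `index`, assuming the array
 * begins exactly at the start of a cache line. */
size_t cache_line_of(size_t index, size_t elem_size, size_t line_size)
{
    return index * elem_size / line_size;
}

/* 1 if elements i and j fall in the same cache line, so that writes to them
 * from different processors can cause false sharing; 0 otherwise. */
int same_cache_line(size_t i, size_t j, size_t elem_size, size_t line_size)
{
    return cache_line_of(i, elem_size, line_size)
        == cache_line_of(j, elem_size, line_size);
}
```

With those sizes one line holds 16 ints, so `same_cache_line(0, 6, 4, 64)` is 1: a[0] through a[6] from the figure all sit in one line, while a[15] and a[16] land in different lines.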
8
Answer 3
#pragma omp parallel for private(j, stride)
for (i = 1; i <= 100; i++) {
    stride = i*1000;
    for (j = 0; j < 1000; j++)
        a[stride + j] = a[j] + i;
}
This maintains the “intent” of the code but completely restructures it:
For i from 1000 to 1999, a[i] is a[i % 1000] incremented by 1.
For i from 2000 to 2999, a[i] is a[i % 1000] incremented by 2, and so on.
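A sketch of Answer 3 with `stride` and `j` declared inside the loop body, so each thread automatically gets its own copy and no `private` clause is needed. The loop runs i from 1 to `nblocks` inclusive so that every block the serial loop writes is covered; `nblocks`, `dist`, and the function names are hypothetical. Every iteration reads only a[0] through a[dist - 1], which no iteration writes.

```c
#include <stdlib.h>

/* Answer 3: block i (elements i*dist .. i*dist + dist - 1) is computed
 * directly from the first block, so blocks are independent. */
void answer3_restructured(int *a, long nblocks, long dist)
{
    #pragma omp parallel for
    for (long i = 1; i <= nblocks; i++) {
        long stride = i * dist;              /* declared here, so private */
        for (long j = 0; j < dist; j++)
            a[stride + j] = a[j] + (int)i;   /* first-block value plus i */
    }
}

/* Returns 1 if answer3_restructured matches the serial loop
 *     for (i = 0; i < nblocks * dist; i++) a[i + dist] = a[i] + 1;
 * on the same input, 0 if not, -1 on allocation failure. */
int answer3_matches_serial(long nblocks, long dist)
{
    long len = (nblocks + 1) * dist;
    int *ref = malloc((size_t)len * sizeof *ref);
    int *par = malloc((size_t)len * sizeof *par);
    if (ref == NULL || par == NULL) {
        free(ref);
        free(par);
        return -1;
    }
    for (long k = 0; k < len; k++)
        ref[k] = par[k] = (int)(k % 7);       /* same arbitrary input for both */

    for (long i = 0; i < nblocks * dist; i++) /* original serial loop */
        ref[i + dist] = ref[i] + 1;
    answer3_restructured(par, nblocks, dist);

    int same = 1;
    for (long k = 0; k < len; k++)
        if (ref[k] != par[k])
            same = 0;
    free(ref);
    free(par);
    return same;
}
```

Each thread now writes a contiguous run of `dist` elements, which is the locality improvement over Answer 2.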
9
Answer 4 (additional 0.2% bonus)
#pragma omp parallel for
for (i = 1000; i < ; i++)
    a[i] = a[i%1000] + i/1000;
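A sketch of the closed form from Answer 4. The upper bound `(nblocks + 1) * dist` is this sketch's own choice, matching a serial loop of `nblocks * dist` iterations; `nblocks`, `dist`, and the function names are hypothetical. Each iteration reads only a[i % dist], which lies in the first block and is never written, so a single flat `parallel for` is safe and each thread writes a contiguous chunk.

```c
#include <stdlib.h>

/* Answer 4: every element after the first block in one flat parallel loop. */
void answer4_closed_form(int *a, long nblocks, long dist)
{
    #pragma omp parallel for
    for (long i = dist; i < (nblocks + 1) * dist; i++)
        a[i] = a[i % dist] + (int)(i / dist);
}

/* Returns 1 if answer4_closed_form matches the serial loop
 *     for (i = 0; i < nblocks * dist; i++) a[i + dist] = a[i] + 1;
 * on the same input, 0 if not, -1 on allocation failure. */
int answer4_matches_serial(long nblocks, long dist)
{
    long len = (nblocks + 1) * dist;
    int *ref = malloc((size_t)len * sizeof *ref);
    int *par = malloc((size_t)len * sizeof *par);
    if (ref == NULL || par == NULL) {
        free(ref);
        free(par);
        return -1;
    }
    for (long k = 0; k < len; k++)
        ref[k] = par[k] = (int)(k % 7);       /* same arbitrary input for both */

    for (long i = 0; i < nblocks * dist; i++) /* original serial loop */
        ref[i + dist] = ref[i] + 1;
    answer4_closed_form(par, nblocks, dist);

    int same = 1;
    for (long k = 0; k < len; k++)
        if (ref[k] != par[k])
            same = 0;
    free(ref);
    free(par);
    return same;
}
```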