Published by Orlando Marn. Modified over 10 years ago.
12
The C++ compiler is obsessed with optimization: In this case, it will auto-vectorize the loop
13
[Diagram: A[0..3] and B[0..3] are held in xmm0 and xmm1; a single “addps xmm1, xmm0” leaves A[0] + B[0], A[1] + B[1], A[2] + B[2], A[3] + B[3] in xmm1.]
Scalar loop: for (i = 0; i < 1000; i++) { C[i] = A[i]+B[i] }
Vectorized loop (pseudocode, four elements per iteration): for (i = 0; i < 1000; i+=4) { C[i:i+3] = A[i:i+3]+B[i:i+3] }
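The four-wide addition on this slide can be written by hand with SSE intrinsics, which is roughly what the auto-vectorizer does for you. The following is a minimal sketch, not the compiler's actual output: the function names add_scalar and add_sse are hypothetical, the arrays are assumed to hold float, and unaligned loads are used so no alignment is assumed. On targets without SSE the second function falls back to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// Scalar reference: one addition per iteration, as in the original loop.
void add_scalar(const float* A, const float* B, float* C, std::size_t n) {
    assert(n == 0 || (A && B && C));
    for (std::size_t i = 0; i < n; ++i) {
        C[i] = A[i] + B[i];
    }
}

// Hand-vectorized version (hypothetical name): four additions per iteration.
void add_sse(const float* A, const float* B, float* C, std::size_t n) {
    assert(n == 0 || (A && B && C));
    std::size_t i = 0;
#if SKETCH_HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 a = _mm_loadu_ps(A + i);           // A[i..i+3] into one register
        __m128 b = _mm_loadu_ps(B + i);           // B[i..i+3] into another
        _mm_storeu_ps(C + i, _mm_add_ps(a, b));   // one packed add (addps), then store
    }
#endif
    // Remainder elements, or the whole array when SSE is unavailable.
    for (; i < n; ++i) {
        C[i] = A[i] + B[i];
    }
}
```

With n = 1000 the vector loop runs 250 times and the remainder loop does nothing; the remainder loop only matters when n is not a multiple of four.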
15
info C5002: loop not vectorized due to reason ‘501’
17
info C5001: loop vectorized
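To see these messages yourself, Visual C++ accepts /Qvec-report:1 (report vectorized loops) and /Qvec-report:2 (also report loops that were not vectorized). The sketch below is an assumption about what can trigger reason 501, based on Microsoft's published list of vectorizer messages, where 501 is described as an induction variable that is not local or an upper bound that is not loop-invariant. It is not the loop from the slides; the names g_index, increment_global_index, and increment_local_index are hypothetical.

```cpp
#include <cassert>

// Hypothetical global used as the loop counter. Because it is visible outside
// the function, the compiler may decline to vectorize the loop that uses it.
int g_index = 0;

// Candidate for "loop not vectorized due to reason '501'" (assumption):
// the induction variable is a global, not a local.
void increment_global_index(int* A, int n) {
    assert(n >= 0);
    for (g_index = 0; g_index < n; ++g_index) {
        A[g_index] = A[g_index] + 1;
    }
}

// Same work with a local induction variable and a loop-invariant bound.
// This is the shape the auto-vectorizer is expected to accept.
void increment_local_index(int* A, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        A[i] = A[i] + 1;
    }
}
```

Both functions compute the same result; only the second gives the compiler a loop counter it can prove nobody else touches.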
20
Lots of triangles: we have less than 15 ms to “turn a page” in real time, so we need to parallelize this algorithm. C++ AMP is a good candidate, since the data size is pretty large.
21
We’re looping over each triangle. This set of operations is safe because it works on a single triangle at a time, so there are no races. But here we’re updating vertices that are shared between triangles, which is a race! As written, this algorithm only works on a single thread.
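The slide's code is not part of this transcript, so here is a minimal sketch of the per-triangle loop it describes. The types Vec3 and Triangle and the function name accumulateNormalsSerial are hypothetical, and the face normal is assumed to be the unnormalized cross product of two edges.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// A triangle is three indices into the vertex position array.
using Triangle = std::array<std::size_t, 3>;

// Single-threaded version: loop over triangles, add each face normal to the
// three vertices of that triangle.
std::vector<Vec3> accumulateNormalsSerial(const std::vector<Vec3>& positions,
                                          const std::vector<Triangle>& triangles) {
    std::vector<Vec3> vertexNormals(positions.size(), Vec3{0.0f, 0.0f, 0.0f});
    for (const Triangle& t : triangles) {
        assert(t[0] < positions.size() && t[1] < positions.size() && t[2] < positions.size());
        // Safe part: reads and computes for this triangle only.
        Vec3 n = cross(positions[t[1]] - positions[t[0]], positions[t[2]] - positions[t[0]]);
        // Unsafe part if triangles ran concurrently: neighbouring triangles
        // share vertices, so two threads could update the same entry at once.
        for (std::size_t v : t) {
            vertexNormals[v].x += n.x;
            vertexNormals[v].y += n.y;
            vertexNormals[v].z += n.z;
        }
    }
    return vertexNormals;
}
```

Running the outer loop in parallel without further changes would make the three += updates a data race on every shared vertex.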
22
for each triangle for each vertex
23
We use C++ AMP. Same as before, we calculate the normal for each triangle. We collect the normals into a temporary array, which stays in GPU memory.
24
We go over each vertex, so there are no races. In sumTriangleNormals, we fetch the normals from tempTriangleNormals, i.e., the temporary array we kept in GPU memory.
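The two passes can be sketched in standard C++. This is a CPU stand-in, not the C++ AMP code from the talk: each loop body below writes only to its own output element, which is the property that lets the talk run it as a GPU kernel. The names sumTriangleNormals and tempTriangleNormals come from the slides; Vec3, Triangle, buildVertexToTriangles, and the adjacency-list layout are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
using Triangle = std::array<std::size_t, 3>;

// Pass 1: one independent iteration per triangle, writing only element i.
std::vector<Vec3> computeTriangleNormals(const std::vector<Vec3>& positions,
                                         const std::vector<Triangle>& triangles) {
    std::vector<Vec3> tempTriangleNormals(triangles.size());
    for (std::size_t i = 0; i < triangles.size(); ++i) {
        const Triangle& t = triangles[i];
        assert(t[0] < positions.size() && t[1] < positions.size() && t[2] < positions.size());
        tempTriangleNormals[i] =
            cross(positions[t[1]] - positions[t[0]], positions[t[2]] - positions[t[0]]);
    }
    return tempTriangleNormals;
}

// Hypothetical helper: for each vertex, the indices of the triangles using it.
std::vector<std::vector<std::size_t>> buildVertexToTriangles(
        std::size_t vertexCount, const std::vector<Triangle>& triangles) {
    std::vector<std::vector<std::size_t>> adjacency(vertexCount);
    for (std::size_t i = 0; i < triangles.size(); ++i) {
        for (std::size_t v : triangles[i]) adjacency[v].push_back(i);
    }
    return adjacency;
}

// Sums the stored normals of the triangles adjacent to one vertex (read-only).
Vec3 sumTriangleNormals(const std::vector<std::size_t>& adjacentTriangles,
                        const std::vector<Vec3>& tempTriangleNormals) {
    Vec3 sum{0.0f, 0.0f, 0.0f};
    for (std::size_t t : adjacentTriangles) {
        sum.x += tempTriangleNormals[t].x;
        sum.y += tempTriangleNormals[t].y;
        sum.z += tempTriangleNormals[t].z;
    }
    return sum;
}

// Pass 2: one independent iteration per vertex, writing only element v.
std::vector<Vec3> computeVertexNormals(const std::vector<Vec3>& positions,
                                       const std::vector<Triangle>& triangles) {
    const std::vector<Vec3> tempTriangleNormals = computeTriangleNormals(positions, triangles);
    const auto adjacency = buildVertexToTriangles(positions.size(), triangles);
    std::vector<Vec3> vertexNormals(positions.size());
    for (std::size_t v = 0; v < positions.size(); ++v) {
        vertexNormals[v] = sumTriangleNormals(adjacency[v], tempTriangleNormals);
    }
    return vertexNormals;
}
```

The design choice is to turn a scatter (each triangle writes to three shared vertices) into a gather (each vertex reads from its adjacent triangles), so no two iterations ever write to the same location.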
31
Please submit session evals on the Build Windows 8 App or at http://aka.ms/BuildSessions
32
Microsoft Developer Division Design Research: experience development tools and features early in their design and development, and influence future design decisions. Fill it online at http://bit.ly/x6dtHt