Presentation is loading. Please wait.

Presentation is loading. Please wait.

Semi-automatic SOA / AOSOA

Similar presentations


Presentation on theme: "Semi-automatic SOA / AOSOA"— Presentation transcript:

1 Semi-automatic SOA / AOSOA
Pascal Costanza ExaScience Lab, Intel

2 Legal Notices This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. [BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Core Inside, i960, Intel, the Intel logo, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, InTru, the InTru logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Pentium, Pentium Inside, skoool, the skoool logo, Sound Mark, The Journey Inside, vPro Inside, VTune, Xeon, and Xeon Inside] are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries. Java and all Java based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. Bluetooth is a trademark owned by its proprietor and used by Intel Corporation under license. Intel Corporation uses the Palm OS® Ready mark under license from Palm, Inc. Copyright © 2010, Intel Corporation. All rights reserved. This slide has the standard copyright header text. To customize, you can remove the trademarks that don’t appear in your presentation. For additional resources: Start at > Notices & Disclaimers ( See table “General Notices” Updated broiler plate at This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. [BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Core Inside, i960, Intel, the Intel logo, Intel Atom, Intel Atom Inside, Intel Core, Intel Inside, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, InTru, the InTru logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Pentium, Pentium Inside, skoool, the skoool logo, Sound Mark, The Journey Inside, vPro Inside, VTune, Xeon, and Xeon Inside] are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries. Java and all Java based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. Bluetooth is a trademark owned by its proprietor and used by Intel Corporation under license. Intel Corporation uses the Palm OS® Ready mark under license from Palm, Inc. Copyright © 2009, Intel Corporation. All rights reserved. 2

3 ExaScience Lab Flanders Exascale Lab
Was launched in 2010 to investigate next generation High Performance Computing Is a collaboration between: Intel, Flanders, Imec, KU Leuven, U Gent, VU Brussel, U Antwerpen, U Hasselt

4 Intel Exascale Labs — Europe
Strong Commitment To Advance Computing Leading Edge: Intel collaborating with HPC community & European researchers 4 labs in Europe - Exascale computing is the central topic ExaScale Computing Research Lab, Paris Performance and scalability of Exascale applications Tools for performance characterization ExaCluster Lab, Jülich Exascale cluster scalability and reliability Comms avoiding algorithms Architectural simulation Scalable kernels and RT ExaScience Lab, Leuven Scalable RTS and tools New algorithms Intel and BSC Exascale Lab, Barcelona Started with 3 labs in 2010, in 2011 we added 4th lab in Barcelona and signed a collaboration agreement with Daresbury, on exascale apps.

5 Overview Manual AOS, SOA, and AOSOA representations
Semi-automatic SOA and AOSOA representations Preliminary results Limitations / open issues

6 Example Class struct C { double x; double y; double z; C () {} C (double x, double y, double z) : x(x), y(y), z(z) {} };

7 Different representations
Standard AOS x0, y0, z0, x1, y1, z1, x2, y2, z2, x3, y3, z3, x4, y4, z4, x5, y5, z5, x6, y6, z6, … SOA x0, x1, x2, x3, x4, x5, x6, x7, … y0, y1, y2, y3, y4, y5, y6, y7, … z0, z1, z2, z3, z4, z5, z6, z7, … AOSOA x0, x1, x2, x3, y0, y1, y2, y3, z0, z1, z2, z3 x4, x5, x6, x7, y4, y5, y6, y7, z4, z5, z6, z7

8 Standard AOS representation
auto arr = new C[len]; double result = 0; for (auto i=0; i<len; ++) { arr[i].x += arr[i].y * arr[i].z; result += arr[i].x; }

9 Manual SOA representation
struct Cv { double* x; double* y; double* z; Cv (int len) : x(new double[len]), y(new double[len]), z(new double[len]) {} ~Cv () {delete x[]; delete y[]; delete z[];} };

10 Manual SOA representation
Cv arr(len); double result = 0; for (auto i=0; i<len; ++i) { arr.x[i] += arr.y[i] * arr.z[i]; result += arr.x[i]; }

11 Manual SOA representation
The user has to write the new class. (but that may be ok, because it could even be generated) The user has to recode all accesses to array members. (arr[i].x  arr.x[i]) This requires invasive changes to the code. Cumbersome, error-prone, annoying, especially if it turns out that SOA is not better than AOS after all.

12 Semi-automatic SOA representation
struct Cr { double& x; double& y; double& z; typedef std::tuple<double&, double&, double&> reference_tuple; Cr (const reference_tuple& t) : x(std::get<0>(t)), y(std::get<1>(t)), z(std::get<2>(t)) {} };

13 Semi-automatic SOA representation
soa_array<Cr,len> array; double result = 0; for (auto i=0; i<len; ++i) { arr[i].x += arr[i].y * arr[i].z; result += arr[i].x; }

14 Semi-automatic SOA representation
The user has to write the new class. (but that may be ok, because it could even be generated) The user does not have to recode accesses to array members. (arr[i].x  arr[i].x) Only variable and parameter declarations have to be changed, or can be turned into template parameters. The accesses themselves can stay the same. This allows for switching back and forth between AOS & SOA.

15 Semi-automatic AOSOA representation
aosoa_vector<Cr,256> array(len); double result = 0; for_each(array, [&] (Cr&& element) { element.x += element.y * element.z; result += element.x; });

16 Semi-automatic AOSOA representation
template<class C, typename F> void for_each(C& a, const F& f){ auto size = a.size(); auto data = a.data(); auto blocksize = getBlocksize(a); for (auto i=0; i<(size/blocksize); ++i) { auto&& block = getBlock(data, i); for (auto j=0; j<blocksize; ++j) f(block[j]); } if (size%blocksize) { auto&& block = getBlock(data, size/blocksize); for (auto j=0; j<(size%blocksize); ++j) f(block[j]); }}

17 Flexible representation
aosoa_vector<Cr,256> array(len); // blocksize 256 double result = 0; for_each(array, [&] (Cr&& element) { element.x += element.y * element.z; result += element.x; });

18 Flexible representation
aosoa_vector<Cr,len> array(len); // SOA representation double result = 0; for_each(array, [&] (Cr&& element) { element.x += element.y * element.z; result += element.x; });

19 Flexible representation
aosoa_vector<Cr,1> array(len); // AOS representation double result = 0; for_each(array, [&] (Cr&& element) { element.x += element.y * element.z; result += element.x; });

20 Flexible representation
std::vector<C> array(len); // AOS representation double result = 0; for_each(array, [&] (Cr&& element) { element.x += element.y * element.z; result += element.x; });

21 Flexible representation
aosoa_vector<Cr,blocksize> v(len); aosoa_vector<Cr,len> v(len); aosoa_vecter<Cr,1> v(len); std::vector<C> v(len); aosoa_array<Cr,blocksize,len> a; aosoa_array<Cr,len,len> a; aosoa_array<Cr,1,len> a; std::array<C,len> a;

22 Semi-automatic AOSOA representation
The user has to write the “reference” class. (same reference class as for semi-automatic SOA) Loops over elements need to be nested, or need to use higher-order iterations (for_each, etc.). aosoa_vector & aosoa_array similar to std::vector & std::array …except for pointers into their data! The user does not have to recode accesses to array members. arr[i].x still works! But nested loops are probably better for vectorization.

23 Preliminary results (previous example on Core i7 860 @ 2.80 GHz)
len: repeat: flat AOS array: 171 sec flat SOA array: 63 sec std::array: 171 sec nested SOA array, blocksize 1: 182 sec nested SOA array, blocksize max: 63 sec nested SOA array, blocksize 256: 70 sec std::vector: 167 sec nested SOA vector, blocksize 1: 212 sec nested SOA vector, blocksize max: 64 sec nested SOA vector, blocksize 256: 70 sec

24 Limitations Adding new “reference” classes can be problematic.
Use of higher-order iterations may be a too strong commitment. Potential remedy: Special handling of C++11 range-for construct? aosoa_array<Cr,b,l> a; for (Cr&& element: a) { … } // this could be equivalent to for_each Adding new “reference” classes can be problematic. Methods and functions may need to be reimplemented… …as function templates, so they understand the different representations. …as redundant functions, if source code of third-party libraries is not available. Potential remedy in the long run: Better reflection support in C++? Note: New code can actually be expressed with just one class! A single instance would be an array of length C obj; => soa_array<Cr,1> obj;

25 Current experiments More complex data structures (composition, inheritance) Seems to require more tweaking to improve vectorization. Otherwise seems to work. Better support for standard containers, especially iterators. More iteration constructs (std::transform, etc.) Allow for using SOA in some places, and AOS in other places? Alternatively: Lift to an array-based programming approach?

26 Thank You

27 Implementation details

28 Details: tuples // create a tuple from values std::tuple<int, int, int> t0 = std::tuple<int, int, int>(1,2,3); assert(std::get<0>(t0) == 1); assert(std::get<1>(t0) == 2); assert(std::get<2>(t0) == 3);

29 Details: tuples // infer types when creating tuples auto t1 = std::make_tuple(4, 5, 6); assert(std::get<0>(t1) == 4); assert(std::get<1>(t1) == 5); assert(std::get<2>(t1) == 6);

30 Details: tuples // create a tuple from references auto x = 0, y = 0, z = 0; std::tuple<int&, int&, int&> t2 = std::tie(x, y, z); std::get<0>(t2) = 7; std::get<1>(t2) = 8; std::get<2>(t2) = 9; assert(x == 7); assert(y == 8); assert(z == 9);

31 Details: tuples // assign tuple results to variables auto const foo = []() { return std::make_tuple(10, 11, 12); }; std::tie(x, y, z) = foo(); assert(x == 10); assert(y == 11); assert(z == 12);

32 Details: tuples // concatenate tuples auto t3 = std::tuple_cat(t0, std::tuple_cat(t1, t2)); assert(std::get<0>(t3) == 1); assert(std::get<1>(t3) == 2); assert(std::get<2>(t3) == 3); assert(std::get<3>(t3) == 4); assert(std::get<4>(t3) == 5); assert(std::get<5>(t3) == 6); assert(std::get<6>(t3) == 10); assert(std::get<7>(t3) == 11); assert(std::get<8>(t3) == 12);

33 Details: soa_array template<typename T, size_t N> class _soa; template<size_t N> class _soa<std::tuple<>,N> { std::tuple<> operator[](size_t) {return std::tie(); }}; template<typename H, typename T…, size_t N> class _soa<std::tuple<H, T…>, N> : _soa<std::tuple<T…>, N> { typedef typename std::remove_reference<H>::type rH; rH a[N]; std::tuple<H,T…> operator[] (size_t pos) { return std::tuple_cat(std::tie(a[pos]), super::operator[](pos)); } };

34 Details: soa_array template<class C, size_t N> soa_array : _soa<typename C::reference_tuple, N> { C operator[] (size_t pos) {return C(super::operator[](pos));} }; ICC & gcc compile “reference” tuples away! The compiler eventually sees straightforward iterations over arrays in SOA representation that are easy to vectorize! aosoa_vector<Cr> is essentially std::vector<soa_array<Cr>>; same for aosoa_array & std::array.

35 Lambda expressions in C++11

36 while loops as functions
void _while(P predicate, B body) { if (predicate()) { body(); _while(predicate, body); } } auto i=0; _while( [&](){return i<10;}, [&](){ std::cout << i << std::endl; ++i; } );

37 for loops as functions void _for(int start, int end, B body) { for (auto i=start; i<end; ++i) body(i); } _for(0, 10, [](int i){ std::cout << i << std::endl; });

38 parallel_for loops as functions
void parallel_for(int start, int end, B body){ auto size = end-start; if (size==1) { body(start); } else if (size > 1) { auto half = size/2; auto middle = start+half; fork([=](){parallel_for(start, middle, body);}); fork([=](){parallel_for(middle, end, body);}); join(); } }

39 Lambda expressions Lambda expressions originally from Lisp 1.5: (defun while (predicate body) (when (funcall predicate) (funcall body) (while predicate body))) (let ((i 0)) (while (lambda () (< i 10)) (lambda () (print i) (setq i (+ i 1))))) [It took > 50 years to adopt them in C++.]

40 Lambda expressions “Pieces” of code
that can be passed around and receive arguments. (like function pointers) that can be defined inline and close over variables. (unlike function pointers) that can be very picky about their types. (C++ tradition)

41 Free variables auto i=0; auto n=10; _while ( [](){return i<n;}, [](){std::cout << i << std::endl; ++i;} );

42 Free variables Needs the value. auto i=0; auto n=10; _while ( [](){return i<n;}, [](){std::cout << i << std::endl; ++i;} );

43 Free variables Needs the value. auto i=0; auto n=10; _while ( [](){return i<n;}, [](){std::cout << i << std::endl; ++i;} ); Needs a reference.

44 Closing over variables
auto i=0; auto n=10; _while ( [&i, n](){return i<n;}, [&i](){std::cout << i << std::endl; ++i;} );

45 Closing over variables
capture value auto i=0; auto n=10; _while ( [&i, n](){return i<n;}, [&i](){std::cout << i << std::endl; ++i;} );

46 Closing over variables
capture value auto i=0; auto n=10; _while ( [&i, n](){return i<n;}, [&i](){std::cout << i << std::endl; ++i;} ); capture reference

47 Lambdas = syntactic sugar for classes with operator()
[&i, n](){return i<n;} class __predicate__ { int &i; int n; __predicate__(int &ii, int nn) : i(i), n(nn) {} inline bool operator()() { return i<n; } }; __predicate__ __implicit_instance__(i, n);

48 Lambda expressions in C++
[…capture list…] (…parameters…) {…body…} [](int i, int j){return i+j;} // no capture [=](){return i+j;} // capture values [=, &x]{x = i+j;} // values + 1 reference [&](int i, int j){x = i+j;} // capture references [&, i, j]{x = i+j;} // references + 2 values Note: parameter section is optional. Note: return types are typically inferred, but not always…

49 Lambda expressions in C++: explicit return types
[](int i, int j) -> int { if (i<j) return 0; else if (j>i) return 1; else return i+j; }

50 Typing of lambda expressions
In C++11, the lambda type cannot be typed! int i = 0; ?!? f = [=](int j){return i+j;};

51 Typing of lambda expressions
…but you can use auto! (good if lambda expression is locally needed) auto i = 0; auto f = [=](int j){return i+j;}; Note: auto only works with initialization. The following doesn’t make sense: auto x; // not valid!

52 Typing of lambda expressions
…or you can use templates! (good if template function uses lambda expression immediately) template<typename P, typename B> void _while(P predicate, B body) { if (predicate()) { body(); _while(predicate, body); } } template<typename B> void _for(int start, int end, B body) { for (auto i=start; i<end; ++i) body(i); }

53 Typing of lambda expressions
…or you can use std::function! (good if you need to pass lambda expressions around) #include<functional> typedef std::function<void()> thunk; class continuator { thunk continuation; … void join(thunk c) { continuation = c; } }; Note: std::function is generic, for operator(), lambdas, function pointers…

54 “Currying” example extern long foo(double x, double y, double z); function<long(double, double)> curryFoo(double y) { return [y](double x, double z){return foo(x, y, z);} } Forget about std::bind, etc. Lambda expressions are more efficient! C++11 compilers can optimize lambdas, especially inline them!

55

56


Download ppt "Semi-automatic SOA / AOSOA"

Similar presentations


Ads by Google