Faculty of Computer Science © 2008 José Nelson Amaral
MPADS: Memory-Pooling-Assisted Data Splitting
Stephen Curial - Xymbiant Systems Inc.
Peng Zhao - Intel Corporation
J. Nelson Amaral - University of Alberta
Yaoqing Gao, Shimin Cui, Raul Silvera, Roch Archambault - IBM Toronto Software Laboratory
FROM SUN MICROSYSTEMS
© 2006 Department of Computing Science ISMM 2008
Goal
What:
  –Improve spatial locality
Where:
  –Link-based data structures
How:
  –Pooling similar structures together
  –Grouping same fields from multiple objects together
Goal (cont.)
Why:
  –Because we can
  –Allow easy-to-write, easy-to-read, easy-to-maintain code to perform well
What compiler:
  –IBM XL compiler suite
Limitation:
  –Needs more precise pointer analysis to benefit from more opportunities
Most Relevant Earlier Work
Pool Allocation
  –Lattner and Adve (CGO 04, PLDI 05)
Reference Affinity
  –Zhong, Orlovich, Shen, Ding (PLDI 04)
  –Rabbah and Palem (TECS 03)
Array Reshaping
  –Zhao, Cui, Gao, Silvera, Amaral (TOPLAS 07)
A refreshing outcome
“MPADS is not the first implementation of the combination of memory pools and splitting of pointer-based data structures.”
“MPADS is still not delivering its full potential on standard benchmarks in the IBM XL compiler.”
Reviewer’s Comment: “The technique only worked for Olden, and did nothing for SPECcpu2000 (but the authors get bonus points for being honest about that.)”
The Cost of Programming Productivity
Easy-to-read and easy-to-maintain code often results in lower runtime performance.
(Figure: Student, Class, University.)
The Cost of Programming Productivity
Abstraction
Inheritance
(Figure: Student, Professor, Support Staff, Person.)
The Cost of Programming Productivity
Data Encapsulation
Person: Date of Birth, Address, Driver Lic., Citizenship, Name, Gender
Student: Faculty, Date of Adm, Department, Program, Univ. ID, Classes Enr., Grades
A possible data layout
Student: Faculty, Date of Adm, Department, Program, Univ. ID, Classes Enr., Grades
  (field sizes legible on the slide: 1 byte, 4 bytes, 1 byte, 2 bytes, 4 bytes)
Person:
  Date of Birth   4 bytes
  Address         32 bytes
  Driver Lic.     3 bytes
  Gender          1 byte
  Name            32 bytes
  Citizenship     16 bytes
Data in Memory
(Figure: by increasing memory address, consecutive Student records, each holding Univ. ID, Date of Adm., Fa., De., Progr., Classes Enr., Grades; and a Person record holding Name, Date of Birth, Address, Dr. Lic., Ge., Citizenship.)
Assume a Cache Organization
POWER5 Cache Organization
  –L1 Data Cache: 32 Kbytes, 128-byte cache lines
  –L2 Cache: 1.44 Mbytes, 128-byte cache lines
  –L3 Cache: 32 Mbytes, 512-byte cache lines
Cache Organization
(Figure: bytes laid out across cache lines.)
Example: A search through the data structures
How many Computing Science students are younger than 23 years old?
Student structure: for every 24 bytes loaded, the search reads either 1 or 5 bytes.
Person structure: for every 88 bytes loaded, the search reads 4 bytes.
(Figure: Student and Person records laid out across cache lines.)
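The arithmetic behind these two observations can be sketched in a few lines of C. The helper names below are hypothetical; the byte counts (1 or 5 of 24, 4 of 88) and the 128-byte line size come from the slides. The second helper shows how many values of one field would fit in a cache line if that field were stored contiguously.

```c
#include <assert.h>

/* Percentage of the loaded bytes that the search actually reads,
 * rounded down to a whole percent. */
unsigned useful_percent(unsigned bytes_read, unsigned bytes_loaded)
{
    assert(bytes_loaded > 0 && bytes_read <= bytes_loaded);
    return (100u * bytes_read) / bytes_loaded;
}

/* Number of whole values of one field that fit in a cache line
 * when values of that field are stored back to back. */
unsigned values_per_line(unsigned line_bytes, unsigned field_bytes)
{
    assert(field_bytes > 0);
    return line_bytes / field_bytes;
}
```

For example, useful_percent(1, 24) and useful_percent(4, 88) both come out at 4 percent, while values_per_line(128, 4) is 32: once a 4-byte field is grouped with its peers, every byte of a loaded line is a value the search wants.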
Data Reshaping for Arrays of Structures
Student *ListOfStudents;
….
ListOfStudents = (Student*)malloc(….);
(Figure: the original array stores whole Student records back to back: Univ. ID, Date of Adm., Fa., De., Progr., Classes Enr., Grades. The reshaped array stores Univ. ID, Date of Adm., Fa., De., Progr. of consecutive students back to back.)
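One reading of the figure is a split of the array into a part holding the frequently searched fields and a part holding the rest. The sketch below does that split by hand, assuming illustrative field widths and hypothetical names (StudentHot, StudentCold, reshape_students); in the talk the compiler performs the reshaping, not the programmer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Field widths are illustrative assumptions, not the slide's exact sizes. */
typedef struct {
    uint32_t univ_id;
    uint32_t date_of_adm;
    uint8_t  faculty;
    uint8_t  department;
    uint16_t program;
    uint32_t classes_enr;
    uint64_t grades;
} Student;                      /* original array-of-structures element */

typedef struct {                /* fields kept together after reshaping */
    uint32_t univ_id;
    uint32_t date_of_adm;
    uint8_t  faculty;
    uint8_t  department;
    uint16_t program;
} StudentHot;

typedef struct {                /* remaining fields, stored separately */
    uint32_t classes_enr;
    uint64_t grades;
} StudentCold;

typedef struct {
    StudentHot  *hot;
    StudentCold *cold;
    size_t       n;
} StudentArrays;

/* Copy n Student records into two parallel arrays.
 * Returns 0 on success, -1 if an allocation fails. */
int reshape_students(const Student *src, size_t n, StudentArrays *out)
{
    assert(src != NULL && out != NULL && n > 0);
    out->hot  = malloc(n * sizeof *out->hot);
    out->cold = malloc(n * sizeof *out->cold);
    if (out->hot == NULL || out->cold == NULL) {
        free(out->hot);
        free(out->cold);
        return -1;
    }
    out->n = n;
    for (size_t i = 0; i < n; i++) {
        out->hot[i].univ_id      = src[i].univ_id;
        out->hot[i].date_of_adm  = src[i].date_of_adm;
        out->hot[i].faculty      = src[i].faculty;
        out->hot[i].department   = src[i].department;
        out->hot[i].program      = src[i].program;
        out->cold[i].classes_enr = src[i].classes_enr;
        out->cold[i].grades      = src[i].grades;
    }
    return 0;
}

/* A search that touches only the hot array. */
size_t count_in_department(const StudentArrays *s, uint8_t department)
{
    size_t count = 0;
    for (size_t i = 0; i < s->n; i++)
        if (s->hot[i].department == department)
            count++;
    return count;
}

void free_students(StudentArrays *s)
{
    free(s->hot);
    free(s->cold);
    s->hot = NULL;
    s->cold = NULL;
    s->n = 0;
}
```

The search loop never loads Classes Enr. or Grades, so fewer cache lines are touched per student than in the original layout.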
Maximal Structure Splitting
(Figure: three Student records, each holding ID, Adm, Fac, Dep, Clas, Grad, are rearranged so that each field is stored contiguously: ID 1, ID 2, ID 3; Adm 1, Adm 2, Adm 3; Fac 1, Fac 2, Fac 3; Dep 1, Dep 2, Dep 3; Clas 1, Clas 2, Clas 3; Grad 1, Grad 2, Grad 3.)
Implementation of Pool Allocation
Intercept malloc calls and replace them with pool allocation: each structure layout gets its own pool.
If a pool is full, another pool can be allocated.
(Figure: pools filled with Student records, each holding ID, Adm, Fac, Dep, Clas, Grad.)
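A minimal sketch of the pool behaviour described above, assuming a simple bump allocator with a linked chain of pools. The names (Pool, pool_alloc, pool_count, pool_destroy) are hypothetical and this is not the XL compiler's runtime; it only illustrates "one pool per layout, allocate another pool when the current one is full".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Pool {
    struct Pool   *next;      /* previously filled pool, if any */
    size_t         obj_size;  /* bytes per object in this pool */
    size_t         capacity;  /* objects this pool can hold */
    size_t         used;      /* objects handed out so far */
    unsigned char *data;      /* backing storage */
} Pool;

static Pool *pool_new(size_t obj_size, size_t capacity, Pool *next)
{
    Pool *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->data = malloc(obj_size * capacity);
    if (p->data == NULL) {
        free(p);
        return NULL;
    }
    p->next = next;
    p->obj_size = obj_size;
    p->capacity = capacity;
    p->used = 0;
    return p;
}

/* Replacement for malloc(obj_size) at a transformed allocation site.
 * *head is the newest pool for this structure layout. */
void *pool_alloc(Pool **head, size_t obj_size, size_t capacity)
{
    assert(head != NULL && obj_size > 0 && capacity > 0);
    if (*head == NULL || (*head)->used == (*head)->capacity) {
        Pool *fresh = pool_new(obj_size, capacity, *head);
        if (fresh == NULL)
            return NULL;
        *head = fresh;        /* the full pool stays reachable via next */
    }
    Pool *cur = *head;
    return cur->data + (cur->used++) * obj_size;
}

/* Number of pools currently chained for this layout. */
size_t pool_count(const Pool *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

void pool_destroy(Pool *head)
{
    while (head != NULL) {
        Pool *next = head->next;
        free(head->data);
        free(head);
        head = next;
    }
}
```

Consecutive allocations from the same site land next to each other in memory, which is what later makes field splitting inside a pool possible.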
Implementing Pool Allocation
The following types of statements need to be transformed:
  –Memory allocation statements
  –Memory reference statements
Transforming Memory Allocation Statements
The pointer analysis is extended to maintain a set of allocation sites associated with each alias set.
When an alias set is selected for transformation:
  –Replace each associated allocation with a call to the pool allocation function.
Transforming Memory References
Update address calculation for loads and stores:
  –Uniform splitting: all fields are the same size
      Address calculation is simpler
      Restricts application of technique, or
      Requires memory padding
  –Non-uniform splitting: fields of different sizes
      Address calculation is more involved
      Can be applied more generally
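To see why the uniform case is simpler, assume (as a sketch, not the compiler's actual code) that a pool stores all values of field 0, then all values of field 1, and so on, and that an object pointer s addresses the object's field 0. When every field has the same width, field k of the same object sits a fixed distance from s, so no per-object index has to be computed. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address of field k for the object whose field 0 is at address s,
 * under uniform splitting: every field is field_bytes wide and each
 * field region holds structs_per_pool values. */
uintptr_t uniform_field_addr(uintptr_t s, unsigned k,
                             unsigned field_bytes,
                             unsigned structs_per_pool)
{
    assert(field_bytes > 0 && structs_per_pool > 0);
    return s + (uintptr_t)k * field_bytes * structs_per_pool;
}
```

With 4-byte fields and 100 objects per pool, field 2 of any object is always 800 bytes past that object's field 0.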
Non-Uniform Example
struct example {
    type_3 a; /* 3 bytes */
    type_7 b; /* 7 bytes */
    type_5 c; /* 5 bytes */
};
How can the compiler find the address to access s->c?
pool_base = s & 0xF…F000
index = (s - pool_base) / 3
field_base = (3+7)*num_structs_per_pool
s->c = *(s + field_base - 3*index + 5*index)
s->c = *(s + field_base + (5-3)*index)
(Figure: a pool with s, pool_base, and field_base marked.)
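The address computation above can be written out as a small C function. This is a sketch under two assumptions that the slide does not state outright: the mask ending in 000 is read as a 4096-byte, 4096-aligned pool, and num_structs_per_pool is taken to be the number of whole 15-byte objects that fit in such a pool. The function name addr_of_c is hypothetical, and addresses are handled as integers so the arithmetic is easy to check.

```c
#include <assert.h>
#include <stdint.h>

enum {
    POOL_BYTES = 4096,                       /* assumed pool size and alignment */
    SIZE_A = 3, SIZE_B = 7, SIZE_C = 5,      /* field sizes from the slide */
    STRUCTS_PER_POOL = POOL_BYTES / (SIZE_A + SIZE_B + SIZE_C)   /* 273 */
};

/* s is the address of field a of some object, that is,
 * pool_base + SIZE_A * index. Returns the address of that object's c. */
uintptr_t addr_of_c(uintptr_t s)
{
    uintptr_t pool_base  = s & ~(uintptr_t)(POOL_BYTES - 1);
    uintptr_t index      = (s - pool_base) / SIZE_A;
    uintptr_t field_base = (uintptr_t)(SIZE_A + SIZE_B) * STRUCTS_PER_POOL;

    assert((s - pool_base) % SIZE_A == 0);   /* s must address an a field */
    assert(index < STRUCTS_PER_POOL);

    /* s + field_base - 3*index + 5*index, simplified as on the slide */
    return s + field_base + (uintptr_t)(SIZE_C - SIZE_A) * index;
}
```

For the object at index 2, s is pool_base + 6 and the function returns pool_base + 2730 + 10, which is the third 5-byte slot in the region holding the c fields.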
Data Transformation Safety
How does the compiler decide whether it is safe to transform a given structure?
  –Based on the results of the pointer analysis.
Is it safe to transform a given data structure?
Structure layout: two structures have the same layout if each field has the same offset and the same length.
Build alias set
  –If a pointer P may point to the structure, then all the objects in the points-to set of the alias set of P must have the same layout.
(Figure: pointers P and Q in an alias set; Data Struct 1 and Data Struct 2 in its points-to set.)
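The layout test and the safety condition can be sketched as two small predicates. The types FieldDesc and Layout and both function names are hypothetical; a real compiler would read offsets and lengths from its own type representation and the points-to set from its pointer analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t offset;   /* byte offset of the field within the structure */
    size_t length;   /* size of the field in bytes */
} FieldDesc;

typedef struct {
    const FieldDesc *fields;
    size_t           nfields;
} Layout;

/* Two structures have the same layout if each field has the same
 * offset and the same length. */
bool same_layout(const Layout *x, const Layout *y)
{
    assert(x != NULL && y != NULL);
    if (x->nfields != y->nfields)
        return false;
    for (size_t i = 0; i < x->nfields; i++) {
        if (x->fields[i].offset != y->fields[i].offset ||
            x->fields[i].length != y->fields[i].length)
            return false;
    }
    return true;
}

/* An alias set may be transformed only if every object in its
 * points-to set has the same layout. */
bool alias_set_is_safe(const Layout *const *points_to, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (!same_layout(points_to[0], points_to[i]))
            return false;
    return true;
}
```

If P and Q share an alias set and one of the objects they may reach has a different field length, the whole alias set is rejected.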
Experimental Results - Micro Benchmarks (Speedup)
(Figure: charts for Power 4 and Power 5.)
Experimental Results - Micro Benchmarks (Instruction Count)
(Figure: charts for Power 4 and Power 5.)
Experimental Results - Micro Benchmarks (L2 Cache Misses)
(Figure: charts for Power 4 and Power 5.)
Experimental Study - Olden & LLU (Speedup)
(Figure: charts for Power 4 and Power 5; benchmarks bh, em3d, health, power, tsp, llu.)
Active Hardware Prefetch Streams
(Figure: Active Prefetching Streams from Memory to L2 (in POWER4).)
Related Work
Pool Allocation
  –Lattner & Adve - PLDI 2005
      Data Structure Analysis
Array Based Structure Splitting
  –Zhong et al. - PLDI 2004
      Reference affinity / affinity based splitting
      Memory Trace
Safe Pointer Based Structure Splitting
  –Jeon, Shin and Han - CC 2007
      Similar to non-uniform splitting
      Affinity based splitting uses static analysis
        –Regular expression framework
        –Guarantee Safety with regular expressions
Final Remarks
Our Compiler-Research Guiding Principles
  –Programming productivity
      Enables programmers to be efficient
      Enables easy-to-write/easy-to-maintain programs
  –Execution Time Performance
      Recover runtime efficiency (time, storage or energy) through
        –Code analysis
        –Improved code generation
        –Knowledge of computer architecture and memory hierarchy
Pointer Analysis Primer
The following statement:
int *a = malloc(…);
Creates: a memory object (A), a pointer (a), and a points-to relation (a,A).
(Figure: a points to A.)
Alias Analysis Primer: Andersen’s vs. Steensgaard’s (Shapiro/Horwitz, POPL 97)
Program:
a = &b;
Andersen: S = {(a,b)}
Steensgaard (unification-based): (Figure: as in the Andersen graph, a points to b.)
Alias Analysis Primer: Andersen’s vs. Steensgaard’s (Shapiro/Horwitz, POPL 97)
Program:
a = &b;
b = &c;
Andersen: S = {(a,b); (b,c)}
Steensgaard (unification-based): (Figure: as in the Andersen graph, a points to b and b points to c.)
Alias Analysis Primer: Andersen’s vs. Steensgaard’s (Shapiro/Horwitz, POPL 97)
Program:
a = &b;
b = &c;
a = &d;
Andersen: S = {(a,b); (b,c); (a,d)}
Steensgaard (unification-based), before the third statement is processed: S = {(a,b); (b,c)}
What should happen in the Steensgaard analysis?
Alias Analysis Primer: Andersen’s vs. Steensgaard’s (Shapiro/Horwitz, POPL 97)
Program:
a = &b;
b = &c;
a = &d;
Andersen: S = {(a,b); (b,c); (a,d)}
Steensgaard (unification-based): S = {(a,b); (b,c); (a,d); (d,c)}
(Figure: in the Steensgaard graph b and d are unified into one node, so a points to (b,d) and (b,d) points to c.)
Alias Analysis Primer: Andersen’s vs. Steensgaard’s (Shapiro/Horwitz, POPL 97)
Program:
a = &b;
b = &c;
a = &d;
d = &e;
Before the fourth statement is processed:
Andersen: S = {(a,b); (b,c); (a,d)}
Steensgaard (unification-based): S = {(a,b); (b,c); (a,d); (d,c)}
And now?
Alias Analysis Primer: Andersen’s vs. Steensgaard’s (Shapiro/Horwitz, POPL 97)
Program:
a = &b;
b = &c;
a = &d;
d = &e;
Andersen: S = {(a,b); (b,c); (a,d); (d,e)}
Steensgaard (unification-based): S = {(a,b); (b,c); (a,d); (d,c); (d,e); (b,e)}
(Figure: in the Steensgaard graph a points to the unified node (b,d), which points to the unified node (c,e).)
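The difference between the two analyses on this four-statement program can be reproduced with a short C sketch. It handles only statements of the form x = &y, which is all the example uses; a full analysis also handles copies, loads, and stores. All names (Steens, Andersen, steens_address_of, and so on) are hypothetical. The Steensgaard side uses union-find where each class has at most one pointee class; the Andersen side simply records each pair.

```c
#include <assert.h>
#include <stdbool.h>

enum { V_A, V_B, V_C, V_D, V_E, NVARS, NONE = -1 };

typedef struct {
    int parent[NVARS];    /* union-find parent of each variable */
    int pointee[NVARS];   /* pointee of each class representative, or NONE */
} Steens;

typedef struct {
    bool pts[NVARS][NVARS];   /* pts[x][y]: x may point to y */
} Andersen;

void steens_init(Steens *s)
{
    for (int i = 0; i < NVARS; i++) {
        s->parent[i] = i;
        s->pointee[i] = NONE;
    }
}

int steens_find(const Steens *s, int x)
{
    while (s->parent[x] != x)
        x = s->parent[x];
    return x;
}

/* Merge the classes of x and y, then merge their pointees so that
 * the merged class still has a single pointee class. */
void steens_unify(Steens *s, int x, int y)
{
    x = steens_find(s, x);
    y = steens_find(s, y);
    if (x == y)
        return;
    int px = s->pointee[x], py = s->pointee[y];
    s->parent[y] = x;
    if (px == NONE)
        s->pointee[x] = py;
    else if (py != NONE)
        steens_unify(s, px, py);
}

/* Process the statement "x = &y". */
void steens_address_of(Steens *s, int x, int y)
{
    int rx = steens_find(s, x);
    if (s->pointee[rx] == NONE)
        s->pointee[rx] = steens_find(s, y);
    else
        steens_unify(s, s->pointee[rx], y);
}

bool steens_points_to(const Steens *s, int x, int y)
{
    int p = s->pointee[steens_find(s, x)];
    return p != NONE && steens_find(s, p) == steens_find(s, y);
}

int steens_count(const Steens *s)
{
    int n = 0;
    for (int x = 0; x < NVARS; x++)
        for (int y = 0; y < NVARS; y++)
            if (steens_points_to(s, x, y))
                n++;
    return n;
}

void andersen_init(Andersen *a)
{
    for (int x = 0; x < NVARS; x++)
        for (int y = 0; y < NVARS; y++)
            a->pts[x][y] = false;
}

/* Process the statement "x = &y": only the stated pair is added. */
void andersen_address_of(Andersen *a, int x, int y)
{
    a->pts[x][y] = true;
}

int andersen_count(const Andersen *a)
{
    int n = 0;
    for (int x = 0; x < NVARS; x++)
        for (int y = 0; y < NVARS; y++)
            if (a->pts[x][y])
                n++;
    return n;
}
```

Feeding the four statements to both analyses gives four pairs for Andersen and six for Steensgaard; the two extra pairs, (d,c) and (b,e), come from unifying b with d and c with e.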