Improving Cache Performance of OCaml Programs
Case Study: MetaPRL
Alexey Nogin and Alexei Kopylov
April 15, 1999

Background Information
OCaml is a dialect of the ML functional language.
MetaPRL is the next generation of the NuPrl Proof Development System.
All measurements were done on a 400 MHz Pentium II Xeon with 512 KB L2 cache running Linux 2.2.2.

Overview
What we tried to do:
  – Collect some data
  – See whether standard techniques (developed for Java and C programs) can be applied
Why it didn't work:
  – OCaml programs (MetaPRL in particular) behave quite differently from Java and C programs with respect to the cache.

Memory Usage Statistics
Most objects are very small:
  – 60-90% of all allocated objects are 3 words (12 bytes) in size
We allocate them very quickly: … MB/sec
Only 1-10% of allocated objects survive the first garbage collection
L1 DCU miss rate: …%
L2 cache miss rate: 18-47%

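Allocation figures like these can be collected from inside an OCaml program with the standard `Gc` module. The sketch below is not the authors' measurement setup: the workload (`workload`), its parameters, and the helper names are hypothetical, and it assumes OCaml 4.03 or later for `Sys.opaque_identity`. It reports the words allocated in the minor heap and the share promoted to the major heap, which approximates how much survives the first (minor) collection.

```ocaml
(* Sketch: measure minor-heap allocation and promotion with the Gc module.
   The workload is a hypothetical stand-in, not MetaPRL. *)

(* Convert a word count to megabytes for the current platform. *)
let words_to_mb w = w *. float_of_int (Sys.word_size / 8) /. 1048576.0

(* Hypothetical workload: allocate [n] short-lived pairs and keep every
   [keep_every]-th one alive in a list. *)
let workload n keep_every =
  let kept = ref [] in
  for i = 1 to n do
    (* A pair is one header word plus two fields: 3 words.
       opaque_identity stops the compiler from optimizing it away. *)
    let p = Sys.opaque_identity (i, i + 1) in
    if i mod keep_every = 0 then kept := p :: !kept
  done;
  !kept

(* Returns (minor-heap words allocated, words promoted, CPU seconds). *)
let measure n keep_every =
  let before = Gc.quick_stat () in
  let t0 = Sys.time () in            (* CPU time, not wall-clock time *)
  let kept = workload n keep_every in
  let t1 = Sys.time () in
  let after = Gc.quick_stat () in
  ignore (Sys.opaque_identity kept);
  ( after.Gc.minor_words -. before.Gc.minor_words,
    after.Gc.promoted_words -. before.Gc.promoted_words,
    t1 -. t0 )

let () =
  let minor, promoted, secs = measure 10_000_000 20 in
  Printf.printf "allocated %.1f MB (%.0f MB per CPU second)\n"
    (words_to_mb minor) (words_to_mb minor /. max secs 1e-6);
  Printf.printf "promoted %.1f%% of allocated words\n"
    (100. *. promoted /. minor)
```

The rate here is per CPU second. Hardware cache miss rates such as the L1 and L2 figures above are not visible through `Gc`; they need a hardware performance-counter tool.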
Cache-Conscious Structure Definition
Trishul M. Chilimbi, Bob Davidson, James R. Larus

Ideas
Structure size << cache block size
  – no action
Structure size ≈ cache block size
  – split the structure into "hot" and "cold" portions
Structure size >> cache block size
  – reorder fields

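The size-based rule on this slide can be written as a small decision function. This is a hypothetical sketch: the names `action` and `suggest` are illustrative, and the factor of 2 that stands in for "<<" and ">>" is an assumed threshold, not one taken from the paper.

```ocaml
(* Sketch of the size-based decision rule. The factor of 2 used for
   "much smaller" / "much larger" is an assumption for illustration. *)

type action = No_action | Split_hot_cold | Reorder_fields

let suggest ~struct_bytes ~block_bytes =
  if 2 * struct_bytes <= block_bytes then No_action             (* size << block *)
  else if struct_bytes >= 2 * block_bytes then Reorder_fields   (* size >> block *)
  else Split_hot_cold                                           (* size ~ block *)

let string_of_action = function
  | No_action -> "no action"
  | Split_hot_cold -> "split into hot and cold portions"
  | Reorder_fields -> "reorder fields"

let () =
  (* 12-byte objects (3 words on a 32-bit machine) against an assumed
     32-byte cache line. *)
  print_endline (string_of_action (suggest ~struct_bytes:12 ~block_bytes:32))
  (* prints "no action" *)
```

Under this assumed threshold and line size, the 3-word objects that dominate the allocation statistics fall into the "no action" case.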
Structure Splitting
[Figure: a structure with fields f1 f2 f3 f4 becomes a hot portion and a cold portion, with the fields in the order f3 f1 f2 f4]
Pros:
  – pack more hot object fields per cache line
Cons:
  – cost of the additional reference from the hot to the cold portion
  – code bloat
  – more objects in memory
  – extra indirection to access fields in the cold portion

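Applied to an OCaml record, hot/cold splitting might look like the sketch below. The record, its fields, and the choice of which fields are hot are all hypothetical. The sketch shows only the change in layout; whether the hot blocks actually end up close together in memory depends on the allocator and the garbage collector.

```ocaml
(* Sketch of hot/cold splitting on an OCaml record.
   Field names and the hot/cold classification are hypothetical. *)

module Unsplit = struct
  (* One block: header + 5 fields. *)
  type t = { key : int; count : int; name : string; created : float; note : string }

  let make key =
    { key; count = 2 * key; name = string_of_int key; created = 0.0; note = "" }

  (* Hot loop: reads only [count], but the cold fields share the block,
     so fewer useful fields fit in each cache line. *)
  let total_count (xs : t array) = Array.fold_left (fun acc x -> acc + x.count) 0 xs
end

module Split = struct
  (* Cold portion: rarely accessed fields, moved to a separate block. *)
  type cold = { name : string; created : float; note : string }

  (* Hot portion: header + 3 fields, one of which is the extra
     reference to the cold portion. *)
  type t = { key : int; count : int; cold : cold }

  let make key =
    { key; count = 2 * key;
      cold = { name = string_of_int key; created = 0.0; note = "" } }

  let total_count (xs : t array) = Array.fold_left (fun acc x -> acc + x.count) 0 xs

  (* Accessing a cold field now costs one extra indirection. *)
  let name x = x.cold.name
end
```

Each `Split.t` value now needs two blocks instead of one plus an extra pointer field, which correspond to the "more objects in memory" and "additional reference" costs listed on the slide.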
Field Reordering
Fields in large structures are typically grouped logically
  – reorder fields to better match the program's access pattern
Problems:
  – C code may use pointer arithmetic to access fields
  – existing file formats and protocol specifications may fix the layout
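In OCaml, boxed record fields are laid out in declaration order, so field reordering amounts to changing the type declaration. The sketch below assumes a hypothetical access pattern in which `id` and `score` are read together in a hot loop; all module, type, and field names are illustrative.

```ocaml
(* Sketch of field reordering. OCaml lays out record fields in
   declaration order; the access pattern here is hypothetical. *)

module Logical = struct
  (* Grouped logically: identity, then bookkeeping, then statistics. *)
  type t = {
    id : int; label : string; owner : string;
    comment : string; created : string;
    score : int;
  }

  let make id score =
    { id; label = ""; owner = ""; comment = ""; created = ""; score }

  (* Hot loop reads [id] and [score], which sit at opposite ends. *)
  let hot_sum (xs : t array) = Array.fold_left (fun acc r -> acc + r.id + r.score) 0 xs
end

module Reordered = struct
  (* Same fields, declared so that [id] and [score] are adjacent. *)
  type t = {
    id : int; score : int;
    label : string; owner : string; comment : string; created : string;
  }

  let make id score =
    { id; score; label = ""; owner = ""; comment = ""; created = "" }

  (* Identical source: fields are accessed by name, not by offset. *)
  let hot_sum (xs : t array) = Array.fold_left (fun acc r -> acc + r.id + r.score) 0 xs
end
```

Because OCaml code reads record fields by name, the reorder does not break source code the way C pointer arithmetic can. Data written with `Marshal` under the old declaration, however, stores fields positionally and no longer matches the new layout, which is the OCaml counterpart of the file-format problem.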