Experiments on the Effectiveness of an Automatic Insertion of Memory Reuses into ML-like Programs

Oukseh Lee (Hanyang University)
Kwangkeun Yi (Seoul National University)
Question

Our SAS 2003 paper* presented an algorithm to replace allocations by memory reuse (or destructive update), along with some promising yet preliminary experiment numbers.

When, and by how much, is it cost-effective in space and in time? We want to know before launching it inside our nML compiler.

* Oukseh Lee, Hongseok Yang, and Kwangkeun Yi. Inserting Safe Memory Reuse Commands into ML-like Programs. In Proceedings of the Annual International Static Analysis Symposium, volume 2694 of Lecture Notes in Computer Science, pp , San Diego, California, June 2003.
Brief Overview of Our Algorithm
Example: insert

[Figure: the input list l, the call insert 5 l, and the result list]

Before:

    fun insert i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert i t
                     in h::z

After inserting free:

    fun insert i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert i t
                     in free l; h::z
With an extra flag b, so that l is freed only when b is true:

    fun insert b i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert b i t
                     in free l when b; h::z
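The effect of `free l when b` can be tried out on a toy heap. The following is a minimal sketch in Python, not the nML implementation: the `Heap` class, its free list, and the counters `fresh` and `reused` are hypothetical names introduced here for illustration, and the input list `[1, 3, 7]` is made up. It assumes that a freed cons cell is handed back by the very next allocation, which is how `free l; h::z` turns an allocation into a reuse.

```python
class Heap:
    """Toy heap of cons cells with an explicit free list (hypothetical model)."""

    def __init__(self):
        self.cells = {}       # address -> (head, tail address or None)
        self.free_list = []   # addresses released by free()
        self.next_addr = 0
        self.fresh = 0        # allocations that needed a brand-new cell
        self.reused = 0       # allocations served from the free list

    def cons(self, head, tail):
        if self.free_list:
            addr = self.free_list.pop()
            self.reused += 1
        else:
            addr = self.next_addr
            self.next_addr += 1
            self.fresh += 1
        self.cells[addr] = (head, tail)
        return addr

    def free(self, addr):
        del self.cells[addr]
        self.free_list.append(addr)

    def from_list(self, xs):
        addr = None
        for x in reversed(xs):
            addr = self.cons(x, addr)
        return addr

    def to_list(self, addr):
        out = []
        while addr is not None:
            head, addr = self.cells[addr]
            out.append(head)
        return out


def insert(heap, b, i, l):
    """Mirror of the slide's `insert b i l`; None plays the role of []."""
    if l is None:
        return heap.cons(i, None)
    h, t = heap.cells[l]
    if i < h:
        return heap.cons(i, l)      # the result shares all of l
    z = insert(heap, b, i, t)
    if b:
        heap.free(l)                # free l when b
    return heap.cons(h, z)          # reuses the cell just freed, if any


heap = Heap()
l = heap.from_list([1, 3, 7])       # 3 fresh cells
r = insert(heap, True, 5, l)
print(heap.to_list(r))              # [1, 3, 5, 7]
print(heap.fresh, heap.reused)      # 4 2
```

With `b` set to false the same call allocates three fresh cells instead of one and leaves the input list intact, which is the behaviour a caller needs when it still uses `l` afterwards.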
Analysis

    fun insert i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert i t
                     in h::z

[Figure: the code above annotated with the memory regions X1, X2, X3, X4, Z, L, L.hd, and L.tl. Legible annotations: X1; X2 ∪ L; X4 ∪ Z; Z ⊆ X3 ∪ L.tl. Summary labels "result" and "usage": X = X1 ∪ X2 ∪ X3 ∪ X4; L.hd ∪ L.tl.]
Transformation [1/3]

    fun insert i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert i t
                     in h::z

    fun insert b i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert i t
                     in h::z

When b=true, the transformed insert function deallocates the cons cells of the input list l, excluding those of the result list.
Transformation [2/3]

When is it safe to free the tail cells t not in the result z (L.tl \ Z)?

    must not be freed    when       area      overlap?   necessary condition
    the input list l     b = false  L         yes        b = true
    the result list                 X4 ∪ Z    no         none

    fun insert b i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert b i t
                     in h::z
Transformation [3/3]

When is it safe to free the head cell (L.hd)?

    must not be freed                          when       area       overlap?   necessary condition
    the input list l                           b = false  L          yes        b = true
    the cons cells freed during insert b i t   b = true   L.tl \ Z   no         none
    the result list                                       X4 ∪ Z     no         none

    fun insert b i l =
      case l of
        [] => i::[]
      | h::t => if i<h then i::l
                else let z = insert b i t
                     in free l when b; h::z
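The claim behind the three transformation steps, that with b=true the function frees exactly the input cells that are not part of the result, can be checked on small inputs. The sketch below is an illustration in Python under stated assumptions, not the analysis itself: `new_cell`, `reachable`, `build`, and the `freed` set are hypothetical helpers, and freed cells are only recorded rather than recycled so that they can be inspected afterwards.

```python
def new_cell(cells, head, tail):
    """Allocate a cons cell; addresses are never recycled in this model."""
    addr = len(cells)
    cells[addr] = (head, tail)
    return addr


def build(cells, xs):
    """Build a list from xs and return the address of its first cell."""
    addr = None
    for x in reversed(xs):
        addr = new_cell(cells, x, addr)
    return addr


def reachable(cells, addr):
    """Set of cons cells that make up the list starting at addr."""
    seen = set()
    while addr is not None:
        seen.add(addr)
        addr = cells[addr][1]
    return seen


def insert(cells, freed, b, i, l):
    """The slide's `insert b i l`, recording freed cells in `freed`."""
    if l is None:
        return new_cell(cells, i, None)
    h, t = cells[l]
    if i < h:
        return new_cell(cells, i, l)
    z = insert(cells, freed, b, i, t)
    if b:
        freed.add(l)                 # free l when b
    return new_cell(cells, h, z)


for i in [0, 5, 9]:                  # insert at the front, middle, and end
    cells, freed = {}, set()
    l = build(cells, [1, 3, 7])
    input_cells = reachable(cells, l)
    r = insert(cells, freed, True, i, l)
    result_cells = reachable(cells, r)
    # Freed cells never overlap the result and are exactly L minus the result.
    assert freed.isdisjoint(result_cells)
    assert freed == input_cells - result_cells
```

Running the same loop with b set to false leaves `freed` empty, matching the first row of the table: when the caller still needs l, nothing in L may be freed.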
Experiments
Analysis & Transformation Cost

slope = 1.46
1,500~29,000 lines/sec

[Figure: analysis & transformation cost against program size, both on logarithmic scales]
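A slope on a log-log plot can be read as the exponent of a power law. Assuming the slope of 1.46 is that of a line fitted to cost against program size (the slide does not say how it was obtained), cost grows roughly as size to the power 1.46. The short Python sketch below works out what that means for a few size factors; `cost_growth` is a hypothetical helper name.

```python
SLOPE = 1.46  # slope reported on the slide


def cost_growth(size_factor, slope=SLOPE):
    """Factor by which cost grows when program size grows by size_factor,
    assuming cost is proportional to size ** slope."""
    return size_factor ** slope


print(round(cost_growth(2), 2))    # 2.75: doubling the size
print(round(cost_growth(10), 2))   # 28.84: ten times the size
```

Under the same assumption, throughput in lines per second falls as size to the power -0.46, which is consistent with the slide reporting a range of speeds rather than a single figure.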
Reuse Ratio

3.4%~93.9% of allocations are avoided.
Low reuse ratios are due to heavy sharing.
Memory Peak Reduction

0.0%~71.9% peak reduction
much reuse = much peak reduction

[Figure: memory peak reduction against memory reuse ratio. Labelled values: 84.4%, 10.6%, 2.6%, 25.6%, 41.9%, 8.1%]
Difference in Live Cells

[Figures: one plot of live cells per program]

    program      memory reuse ratio   memory peak reduction
    sieve        84.3%                56.5%
    merge        50.0%                49.4%
    qsort        93.9%                71.9%
    msort        89.3%                55.0%
    queens        4.2%                 0.0%
    kb            3.4%                 2.3%
    nucleic      16.9%                13.8%
    k-eval       31.5%                 9.6%
    life         10.6%                25.6%
    mirage       84.4%                 2.6%
    professor    41.9%                 8.1%
GC Time & Runtime Changes (in Objective Caml system)

-6.9%~90.5% GC-time reduction
-7.3%~39.1% runtime reduction

High reuse ratio & big GC portion: runtime speedup
    Highlighted table values (row and column labels not recoverable):
    50.0% 93.9% 89.3% 16.9% 50.0% 93.9% 89.3% 16.9%
    76.0% 63.2% 59.9% 52.1% 78.2% 57.2% 55.3% 46.3%
    24.0% 39.1% 21.6% 7.2% 30.0% 28.2% 20.7% 8.8%

Low reuse ratio: flags overhead
    Highlighted table values (row and column labels not recoverable):
    4.2% 3.4% 4.2% 3.4%
    -8.4% -9.1% -6.8% -3.6%
    -5.8% -6.7% -4.7% -7.3%

Small GC portion: almost no effect
    Highlighted table values (row and column labels not recoverable):
    7.2% 5.6% 1.4% 1.9% 4.3% 4.2% 1.1% 1.3%
    -5.5% -2.6% -3.8% -2.9% 4.8% 0.1% -0.9% 0.6%
GC-time & Runtime Changes

much reuse = much GC-time reduction
much reuse & big GC-time portion = much runtime reduction

[Figure, left: GC time reduction against memory reuse ratio]
[Figure, right: runtime reduction against GC portion x memory reuse ratio]
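Why the product of GC portion and reuse matters can be seen from simple arithmetic: if only GC time changes, the share of total runtime saved is the GC share of runtime times the fraction of GC time removed. The Python sketch below is a back-of-the-envelope model, not the measurement method used in the experiments. The function name, the optional `flag_overhead` term (standing in for the cost of passing the extra flags), and all input numbers are hypothetical.

```python
def runtime_reduction(gc_portion, gc_time_reduction, flag_overhead=0.0):
    """Estimated fraction of total runtime saved.

    gc_portion        -- fraction of the original runtime spent in GC
    gc_time_reduction -- fraction of GC time removed by memory reuse
    flag_overhead     -- hypothetical extra runtime fraction for the flags
    """
    return gc_portion * gc_time_reduction - flag_overhead


# Big GC portion and large GC-time reduction: a visible speedup.
print(runtime_reduction(0.5, 0.5))                  # 0.25

# Small GC portion: little changes even if GC time is halved.
print(round(runtime_reduction(0.05, 0.5), 3))

# Little GC time removed plus some flag overhead: a slowdown.
print(runtime_reduction(0.5, 0.0, flag_overhead=0.05) < 0)   # True
```

The three calls correspond to the three cases on the preceding slide: speedup, almost no effect, and flags overhead.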
Conclusion

program -> transformation -> result program -> performance

not much sharing -> high reuse ratio -> memory peak reduction & GC time speedup
not much sharing + big GC-time portion -> runtime speedup