Runtime support for region-based memory management in Mercury

Runtime support for region-based memory management in Mercury
Quan Phan*, Zoltan Somogyi**, Gerda Janssens* Good afternoon everybody and welcome to my presentation about region-based memory management for the logic programming language Mercury. * DTAI, KU Leuven, Belgium. ** Computer Science and Software Engineering Department, University of Melbourne, Australia. ISMM 2008 Tucson, Arizona, USA

Region-based Memory Management (RBMM)
Idea: Group heap objects of the same lifetime into regions. Reclaim garbage by destroying region as a whole. Advantages: Small runtime overhead No runtime detection of garbage Often achieve good memory reuse. Good chance of better data locality Related data kept together. Its basic idea of region-based memory management is to divide the heap into separate regions, which are a bunch of memory cells. Here we have the heap memory and assume that in this example it is divided into 3 different regions. terms are distributed over the regions. We have two lists of integers, pointed to by X and Y, and they are stored as follows. The skeleton of X is in the green region, that of Y in the yellow region, and their elements in the blue region. Assume that we append the list Y to the list X and the result is the list Z. After appending, the skeleton of X is no longer used, and the cells where it is stored can be removed. To remove them the region is released as a whole.

Mercury Logic/functional programming language
developed at Melbourne Univ. declarative language aims at large-scale application development. Mercury's syntax is similar to Prolog's Explicit declarations Types, modes, determinism. Mercury is a functional/logic programming language, developed at the University of Melbourne, aiming at programming in the large. The language adopts explicit declarations for types, modes and determinism, which not only provide useful information for the programmers but also are very useful in program analyses and optimisations.

Mercury's types, modes, determinism.
Types: ~ Haskell's. list(int) ---> [] ; [int | list(int)]. Modes: instantiation of arguments of predicates. in: ground → ground, out: free → ground. a mode of a predicate: modes for its arguments → procedure. Determinism: # possible solutions of a procedure. Mercury is a functional/logic programming language, developed at the University of Melbourne, aiming at programming in the large. The language adopts explicit declarations for types, modes and determinism, which not only provide useful information for the programmers but also are very useful in program analyses and optimisations.

Mercury predicate :- pred append(list(int), list(int), list(int)).
:- mode append(in, in, out) is det. :- mode append(out, out, in) is multi. append([], Y, Y). append([Xe | Xs], Y, [Xe | Zs]) :- append(Xs, Y, Zs). Mercury is a functional/logic programming language, developed at the University of Melbourne, aiming at programming in the large. The language adopts explicit declarations for types, modes and determinism, which not only provide useful information for the programmers but also are very useful in program analyses and optimisations.

Backtracking Due to nondeterminism Disjunction:
(g1 ; ...; gi ; ...; gn) Make a choice and backtracks into the disjunction later. if g1 then g2 else g3 Semantically equivalent to the disjunction: (g1, g2); (not g1, g3). Try g1, if succeeds, execute g2. If fails, backtracks to g3 as if g1 had not been tried. To model the heap in terms of region, one important question is which variables, or to be more precise the terms to which the variables are bound, are in the same region? We defined that X, Y are in the same region if they are involved in an assigment. For recursive structures, if a term and its subterm are of the same type, they are stored in the same region. In this example, we see that X and Xs are of the same type and they are stored in the same region, then the elements of the list are also stored in the same region.

Backward execution ..., g1, (g2a; g2b), g3, ...
Forward execution containing g2a. Backtrack to g2b: backward execution. Backward liveness: live during backward execution. g1 choice point backtracks OR g2a g2b To model the heap in terms of region, one important question is which variables, or to be more precise the terms to which the variables are bound, are in the same region? We defined that X, Y are in the same region if they are involved in an assigment. For recursive structures, if a term and its subterm are of the same type, they are stored in the same region. In this example, we see that X and Xs are of the same type and they are stored in the same region, then the elements of the list are also stored in the same region. forward execution succeeds/fails

RBMM for Mercury Example: the call to append([1], [2, 3], Z) in the first mode. % (in, in, out) is det. append(X, Y, Z) :- ( X == [], Z := Y ; X => [Xe | Xs], append(Xs, Y, Zs), Z <= [Xe | Zs] ). X [|] 1 [] Y [|] 2 In Mercury, a predicate with a mode is called a procedure. When compiling, a source Mercury program is transformed into superhomogeneous form, where a multiple-clause predicate is turned into a single procedure with the introduction of explicit disjunctions, the goals are reordered and unifications are specialized based on the modes. The deterministic program append in homogeneous form is like this. Here is the type declaration, here is the modes, determinism, and two clauses are turned into an explicit disjuction and unifications are specialized into, for example, deconstruction, assignment, and construction. The programs in this form are input to our algorithm. [|] 3 [] Z [|] 1 [|]

RBMM for Mercury Program analysis and transformation
Q. Phan and G. Janssens. Static region analysis for Mercury. ICLP 2007. Regions. Region liveness. Mercury to region-annotated Mercury. Often achieve good memory reuse. S. Cherem and R. Rugina. Region analysis and transformation for Java programs. ISMM 2004. The algorithm consists of three phases: region points-to analysis that builds the memory model in terms of regions, region liveness analysis that decides about regions’ lifetime and the program transformation that introduces creation and removal instructions into the original program.

Region-annotated Mercury
([|], 2) ([|], 2) ([|], 1) ([|], 1) Z, Y, Zs R1 Xe X, Xs R2 % (in, in, out) is det. append(X, Y, Z) :- ( X == [], Z := Y ; X => [Xe | Xs], append(Xs, Y, Zs), Z <= [Xe | Zs] ). % (in, in, out) is det. append(X, Y, Z, R1, R2) :- ( X == [], remove(R1), Z := Y ; X => [Xe | Xs], append(Xs, Y, Zs, R1, R2), Z <= [Xe | Zs] in R2 ). In Mercury, a predicate with a mode is called a procedure. When compiling, a source Mercury program is transformed into superhomogeneous form, where a multiple-clause predicate is turned into a single procedure with the introduction of explicit disjunctions, the goals are reordered and unifications are specialized based on the modes. The deterministic program append in homogeneous form is like this. Here is the type declaration, here is the modes, determinism, and two clauses are turned into an explicit disjuction and unifications are specialized into, for example, deconstruction, assignment, and construction. The programs in this form are input to our algorithm.

Runtime support Basic support
Regions, region instructions, allocation into regions. Needed in any RBMM systems. Mercury: only enough for programs with no backtracking. Support for backtracking Liveness w.r.t forward execution. Backtracking causes problems. How to support backtracking with little impact on deterministic code?? Less than 5% of Mercury code is nondeterministic. The goal of region points-to analysis is to produce region points-to graph for each procedure. A region points-to graph model the heap used by a procedure in terms of region. A node in a graph represents a physical region. Each node has an associated set of variables, which are stored in the region. A directed edge represents the fact that a region contains references to another region. An edge is labelled by a type selector

Impact of backtracking: Region resurrection.
Dead in forward execution but live in backward execution. E.g., R. Destroy backward live regions in forward execution causes runtime errors. create(R) g1 choice point OR The goal of region points-to analysis is to produce region points-to graph for each procedure. A region points-to graph model the heap used by a procedure in terms of region. A node in a graph represents a physical region. Each node has an associated set of variables, which are stored in the region. A directed edge represents the fact that a region contains references to another region. An edge is labelled by a type selector g2a g2b use R remove(R) remove(R) backtracks succeeds/fails

Impact of backtracking: Instant reclaiming.
When backtracks to a choice point, allocations in the backtracked-over execution can be instantly reclaimed. Popularly used in logic programming systems. 1st case: New regions with respect to the choice point: R1 Reclaim R1 before starting the backward execution containing g2b. g1 choice point backtracks OR The goal of region points-to analysis is to produce region points-to graph for each procedure. A region points-to graph model the heap used by a procedure in terms of region. A node in a graph represents a physical region. Each node has an associated set of variables, which are stored in the region. A directed edge represents the fact that a region contains references to another region. An edge is labelled by a type selector g2a g2b create(R1) fails remove(R1)

Impact of backtracking: Instant reclaiming.
..., g1, (g2a ; g2b), g3... Instant reclaiming ... 2nd case: Allocations into existing/old regions: R2 (not R1). create(R2) g1 choice point backtracks OR The goal of region points-to analysis is to produce region points-to graph for each procedure. A region points-to graph model the heap used by a procedure in terms of region. A node in a graph represents a physical region. Each node has an associated set of variables, which are stored in the region. A directed edge represents the fact that a region contains references to another region. An edge is labelled by a type selector g2a g2b Allocate in R2 create(R1) use R2 fails Allocate in R1 remove(R1)

Old vs. new regions, region list
Maintain a global region sequence number. Region list Saved sequence number: 6 R_8 R_6 R_5 R_1 ... To model the heap in terms of region, one important question is which variables, or to be more precise the terms to which the variables are bound, are in the same region? We defined that X, Y are in the same region if they are involved in an assigment. For recursive structures, if a term and its subterm are of the same type, they are stored in the same region. In this example, we see that X and Xs are of the same type and they are stored in the same region, then the elements of the list are also stored in the same region. new regions old regions

Support for nondet disjunction: Region resurrection.
..., g1, (g2a ; g2b), g3, ... Nondet: any disjuncts may succeed. Both g2a and g2b. Backtrack from outside → backward live regions ≈ all old regions: e.g., R. create(R) g1 choice point backtracks OR The goal of region points-to analysis is to produce region points-to graph for each procedure. A region points-to graph model the heap used by a procedure in terms of region. A node in a graph represents a physical region. Each node has an associated set of variables, which are stored in the region. A directed edge represents the fact that a region contains references to another region. An edge is labelled by a type selector g2a g2b use R remove(R) remove(R) succeeds/fails

Support for nondet disjunction: Region resurrection.
Protect R at the entry to the disjunction: before g2a. Save the global sequence number. remove instruction: ignore old regions. Unprotect R at the start of the last disjunct: g2b. No longer backtrack into the disjunction Clear the saved number remove instructions become effective again. R is destroyed when the second remove is reached. create(R) g1 choice point backtracks OR The goal of region points-to analysis is to produce region points-to graph for each procedure. A region points-to graph model the heap used by a procedure in terms of region. A node in a graph represents a physical region. Each node has an associated set of variables, which are stored in the region. A directed edge represents the fact that a region contains references to another region. An edge is labelled by a type selector Unprotect R Protect R g2a g2b use R remove(R) remove(R) succeeds/fails

Support for nondet disjunction: Instant reclaiming
..., g1, (g2a ; g2b), g3, ... Instant reclaiming new regions Already save the global sequence number. When backtrack to a non-first disjunct: g2b traverse the region list reclaim regions until seeing an old one. g1 choice point backtracks OR g2a g2b The goal of region points-to analysis is to produce region points-to graph for each procedure. A region points-to graph model the heap used by a procedure in terms of region. A node in a graph represents a physical region. Each node has an associated set of variables, which are stored in the region. A directed edge represents the fact that a region contains references to another region. An edge is labelled by a type selector create(R) Region list Saved sequence number: 6 fails R_8 R_6 R_5 R_1 remove(R) ... new regions old regions

Support for nondet disjunction: Instant reclaiming
..., g1, (g2a ; g2b), ... Instant reclaiming new allocations R is an old region. Save the size of R at entry to the disjunction. Instant reclaim by restoring at start of any non-first disjunct: g2b. create(R) g1 choice point backtracks OR The goal of region points-to analysis is to produce region points-to graph for each procedure. A region points-to graph model the heap used by a procedure in terms of region. A node in a graph represents a physical region. Each node has an associated set of variables, which are stored in the region. A directed edge represents the fact that a region contains references to another region. An edge is labelled by a type selector g2a g2b Allocate in R use R

Optimized support for if-then-else
Efficient implementation. Support if-then-else without damaging its efficiency. Similar support needed. Region resurrection: protecting backward live regions. Instant reclaiming at start of the else part. The goal of region points-to analysis is to produce region points-to graph for each procedure. A region points-to graph model the heap used by a procedure in terms of region. A node in a graph represents a physical region. Each node has an associated set of variables, which are stored in the region. A directed edge represents the fact that a region contains references to another region. An edge is labelled by a type selector

Optimized support for if-then-else
Backtrack happens from inside the condition goal Only support for changes in the condition Protect backward live regions removed in the condition, Instant reclaiming new regions created in the condition, Instant reclaiming new allocations happen in the condition. These changes can be computed from region analysis information. No changes: No support added. Condition goals are often simple tests → maintaining efficiency. The goal of region points-to analysis is to produce region points-to graph for each procedure. A region points-to graph model the heap used by a procedure in terms of region. A node in a graph represents a physical region. Each node has an associated set of variables, which are stored in the region. A directed edge represents the fact that a region contains references to another region. An edge is labelled by a type selector

Runtime performance Mercury compiler that uses Boehm gc vs. Mercury compiler with RBMM. Average speedup 25%. 2 nondet programs: crypt & queens. boyer and life: substantial cost of supporting backtracking. (s) Add remove R after i Add remove R before j

Runtime performance Exclude gc time:
RBMM still better in 5 programs: better data locality → speedup due to better cache behaviour. (s) In general, our algorithm can detect the shortest lifetime of the regions in a program, but to be honest it is not always a good thing because what are in a region is more or maybe as important as the region lifetime.

Memory consumption Region page size 2k (words).
Initial RBMM size 200k, 200k/increase. Initial heap size ~ 4M words (default in Mercury). Add create R before i. words

Conclusions Our results suggest that
RBMM can be implemented with modest runtime overhead. Better data locality. → overall speedup. Related work: RBMM for Prolog: [K. Sagonas and H. ICLP 2002] Require different algorithms due to the significant difference between the two languages. Future work: Modify region analysis to take into account backward execution. Extend the supported subset of Mercury. In general, our algorithm can detect the shortest lifetime of the regions in a program, but to be honest it is not always a good thing because what are in a region is more or maybe as important as the region lifetime.

Runtime support for region-based memory management in Mercury

Similar presentations

Presentation on theme: "Runtime support for region-based memory management in Mercury"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Runtime support for region-based memory management in Mercury

Similar presentations

Presentation on theme: "Runtime support for region-based memory management in Mercury"— Presentation transcript:

Similar presentations

About project

Feedback