Garbage Collecting the World
Bernard Lang, Christian Queinnec, Jose Piquer
Presented by Yu-Jin Chia
See also: pp text
Objective
- Fault tolerant
- Processor independent
- Non-centralized
- Non-blocking
- Multiple instances can run
- No object passing
- Complete garbage reclamation
Dynamic Group Structure
- The processors in a group cooperate to garbage-collect that group.
- Collection works faster with small groups.
- Groups can reorganize dynamically.
- Scales well.
- Works well despite node failures or additions.
- Suitable for large heterogeneous networks, distributed symbolic computation, distributed file systems, and distributed databases.
Processor Level
- Each node can use any kind of local GC scheme (e.g., mark and sweep).
- Reference counters track references to remote objects.
  - Terminology: an entry item (a.k.a. ‘skeleton’) stands for a local object that other nodes refer to remotely; an exit item (a.k.a. ‘proxy’) is a local reference to an object on another node.
  - Each entry item has a reference count (RC) equal to the number of exit items referring to it.
  - Use a scheme such as Generational RC.
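A minimal sketch of the entry/exit bookkeeping above, in Python. The names `Node`, `Skeleton`, `create_proxy`, and `drop_proxy` are hypothetical; increments and decrements are applied directly instead of being sent as messages, and plain reference counting stands in for Generational RC, so the message races that GRC is designed to handle are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Skeleton:
    """Entry item: a local object that exit items on other nodes refer to."""
    obj_id: str
    rc: int = 0  # number of exit items (proxies) referring to this skeleton


@dataclass
class Node:
    name: str
    skeletons: dict = field(default_factory=dict)  # obj_id -> Skeleton
    proxies: set = field(default_factory=set)      # {(owner node name, obj_id)}


def create_proxy(holder: Node, owner: Node, obj_id: str) -> None:
    """Give `holder` an exit item for object `obj_id` living on `owner`."""
    key = (owner.name, obj_id)
    if key in holder.proxies:
        return  # assumption: at most one exit item per remote object per node
    skel = owner.skeletons.setdefault(obj_id, Skeleton(obj_id))
    skel.rc += 1  # stands in for an increment message to `owner`
    holder.proxies.add(key)


def drop_proxy(holder: Node, owner: Node, obj_id: str) -> bool:
    """Delete `holder`'s exit item; return True if the skeleton was reclaimed."""
    holder.proxies.remove((owner.name, obj_id))  # KeyError if it never existed
    skel = owner.skeletons[obj_id]
    skel.rc -= 1  # stands in for a decrement message to `owner`
    if skel.rc == 0:
        del owner.skeletons[obj_id]  # the object itself is left to the local GC
        return True
    return False
```

Once the last exit item is dropped, the skeleton disappears and the object is an ordinary local object again, so the local collector alone decides its fate.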
Group Negotiation
- Each node chooses when to join a GC group, and which group to join.
- Each node in a group is aware of the other members.
- A group lasts until the algorithm terminates (one run is a ‘GC cycle’).
- Any group formation method suffices.
- Each group and each GC cycle has a unique identifier.
Initial Marking
- Marks are assigned per group and are meaningful only to that group.
- Similar to the ‘tracing in groups’ scheme (p. 234 of the text).
- A skeleton is marked soft or hard.
- A proxy is marked soft, hard, or none.
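One way to represent these marks, sketched in Python under the assumption that each node keeps a table keyed by the group and GC-cycle identifiers; `Mark` and `MarkTable` are hypothetical names. The table rejects `none` on skeletons, matching the allowed values above.

```python
from enum import Enum


class Mark(Enum):
    NONE = "none"  # proxies only: not reached by any local trace yet
    SOFT = "soft"
    HARD = "hard"


SKELETON_MARKS = {Mark.SOFT, Mark.HARD}
PROXY_MARKS = {Mark.NONE, Mark.SOFT, Mark.HARD}


class MarkTable:
    """Marks on one node, kept separately for each (group id, GC cycle)."""

    def __init__(self):
        self._marks = {}  # (group_id, cycle, item_id) -> Mark

    def set(self, group_id, cycle, item_id, mark, is_skeleton):
        allowed = SKELETON_MARKS if is_skeleton else PROXY_MARKS
        if mark not in allowed:
            raise ValueError(f"{mark} is not a valid mark for this kind of item")
        self._marks[(group_id, cycle, item_id)] = mark

    def get(self, group_id, cycle, item_id):
        """Return the mark, or None if this group/cycle never marked the item."""
        return self._marks.get((group_id, cycle, item_id))
```

Keying on the cycle as well as the group means a stale mark from an earlier cycle is never mistaken for a current one.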
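Christopher’s Algorithm
1. On each node in the group, copy the RC of each skeleton.
2. For each proxy in the group, decrement the copied RC of the skeleton it refers to (if that skeleton is in the group).
3. When all nodes are done, skeletons with a positive remaining count are referenced from outside the group and are marked hard; all others are marked soft.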
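A sketch of the three steps in Python, run over a hypothetical snapshot: `skeleton_rc` maps `(node, object)` to that skeleton's RC, and `proxies` lists, for each node, the skeletons its exit items refer to. In the real algorithm each node copies and decrements its own counts and exchanges messages; here everything happens in one loop.

```python
def christopher(group, skeleton_rc, proxies):
    """
    group: set of node names in the GC group.
    skeleton_rc: {(node, obj): rc} for skeletons on any node.
    proxies: {node: [(owner_node, obj), ...]} exit items held by each node.
    Returns (marks, residual), both keyed by the group's skeletons.
    """
    # Step 1: copy the RC of every skeleton that lives in the group.
    residual = {key: rc for key, rc in skeleton_rc.items() if key[0] in group}
    # Step 2: each proxy held inside the group cancels one unit of its
    # skeleton's copied count (proxies to skeletons outside the group are skipped).
    for holder in group:
        for target in proxies.get(holder, []):
            if target in residual:
                residual[target] -= 1
    # Step 3: a positive residual means a reference from outside the group.
    marks = {key: "hard" if n > 0 else "soft" for key, n in residual.items()}
    return marks, residual
```

For example, with group {A, B}, a skeleton on A referenced by proxies on B and C keeps a residual of 1 (the reference from C) and is hard, while a skeleton referenced only from inside the group drops to 0 and is soft.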
Propagation (2-Phase Marking)
Local (inside each node):
- Set all proxy marks to none.
- Trace from hard skeletons and local roots, marking every proxy reached hard.
- Trace from soft skeletons, changing none to soft on every proxy reached.
[Figure: local marking inside a node; labels: Hard or Root, Hard, Soft]
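The local phase can be sketched as two reachability passes over one node's reference graph. This Python sketch assumes the node's heap is given as an adjacency dict whose vertices are objects, skeletons (each pointing to its local object), and proxies; `reachable` and `local_propagation` are hypothetical names.

```python
from collections import deque


def reachable(graph, starts):
    """Every vertex reachable from `starts` through local references."""
    seen = set(starts)
    todo = deque(starts)
    while todo:
        v = todo.popleft()
        for w in graph.get(v, ()):
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen


def local_propagation(graph, roots, skeleton_marks, proxies):
    """
    graph: {vertex: [vertex, ...]} local references on this node.
    roots: local root vertices.
    skeleton_marks: {skeleton vertex: "hard" | "soft"}.
    proxies: set of proxy vertices on this node.
    Returns {proxy: "hard" | "soft" | "none"}.
    """
    marks = {p: "none" for p in proxies}  # first, every proxy is unmarked
    # Phase 1: everything reachable from a root or a hard skeleton is hard.
    hard_starts = list(roots) + [s for s, m in skeleton_marks.items() if m == "hard"]
    for v in reachable(graph, hard_starts):
        if v in marks:
            marks[v] = "hard"
    # Phase 2: proxies reachable only from soft skeletons go from none to soft.
    soft_starts = [s for s, m in skeleton_marks.items() if m == "soft"]
    for v in reachable(graph, soft_starts):
        if marks.get(v) == "none":
            marks[v] = "soft"
    return marks
```

Running the hard pass first is what makes the order matter: a proxy reachable from both a hard and a soft skeleton ends up hard, because the soft pass only upgrades `none`.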
Global Propagation
- Propagation always stays within the group under consideration: a hard proxy makes the skeleton it refers to hard.
- Upon completion of a GC cycle, a node may start a new one once its proxies have received hard marks.
- If a new remote reference is created, its skeleton is marked hard in advance.
- Hard: referenced by a local hard item or a root.
[Figure: hard marks propagating from Node A to Node B; labels: Hard or root, Hard]
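Combining the local and global steps, the group-wide result is a fixpoint: keep hardening any soft skeleton that a hard proxy points to until nothing changes. This Python sketch simulates the whole group in one process (so no messages are ever in transit), assumes skeleton names are unique across the group, and tracks only hard marks because soft proxy marks never change a skeleton; `reach` and `global_propagation` are hypothetical names.

```python
from collections import deque


def reach(graph, starts):
    """Every vertex reachable from `starts` through local references."""
    seen, todo = set(starts), deque(starts)
    while todo:
        for w in graph.get(todo.popleft(), ()):
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen


def global_propagation(nodes, skeleton_marks):
    """
    nodes: {name: {"graph": {...}, "roots": [...], "skeletons": [...],
                   "proxies": {proxy vertex: target skeleton}}}
    skeleton_marks: {skeleton: "hard" | "soft"} for every skeleton in the group.
    Returns the marks once no hard proxy points at a soft skeleton.
    """
    marks = dict(skeleton_marks)
    changed = True
    while changed:  # marks only go soft -> hard, so this loop terminates
        changed = False
        for node in nodes.values():
            starts = list(node["roots"]) + [s for s in node["skeletons"] if marks[s] == "hard"]
            for v in reach(node["graph"], starts):
                target = node["proxies"].get(v)
                # Proxies to skeletons outside the group are not in `marks`.
                if marks.get(target) == "soft":
                    marks[target] = "hard"
                    changed = True
    return marks
```

In a two-node example where A's root reaches a proxy to B.sk1, B.sk1 in turn reaches a proxy to A.sk2, and A.sk3 and B.sk4 only reference each other, the first two skeletons become hard and the cycle A.sk3/B.sk4 stays soft.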
Stabilization
Group stability requires:
1. All nodes are ‘stable.’
2. No messages are in transit that would mark a skeleton hard.
A node is ‘stable’ when it holds no new data that would harden more skeletons in the group. Stability can be lost if a new proxy is created or a skeleton migrates.
Given no node failure and the finite number of skeletons that may be allocated, stability is eventually reached: marks only move from soft to hard, so each skeleton can change only a bounded number of times.
Dead Cycles Removal
Done individually on each node, without group synchronization:
1. Make soft skeletons reference nil.
2. Locally reclaim the proxies that are no longer reachable as a result.
3. Send decrement messages to the skeletons those proxies referred to.
4. When a skeleton’s RC reaches 0, reclaim it.
5. When GC is finished, the group may disband.
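A sketch of steps 1 to 4 in Python. It assumes each node's state is a plain dict (local reference graph, roots, skeleton RCs, and proxies mapped to their target skeletons), uses simple reachability as the local GC, and, unlike the real system where nodes act independently and messages arrive asynchronously, runs the nodes in sequence and delivers all decrements afterwards. `reach` and `remove_dead_cycles` are hypothetical names.

```python
from collections import deque


def reach(graph, starts):
    """Every vertex reachable from `starts` through local references."""
    seen, todo = set(starts), deque(starts)
    while todo:
        for w in graph.get(todo.popleft(), ()):
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen


def remove_dead_cycles(nodes, marks):
    """
    nodes: {name: {"graph": {...}, "roots": [...], "skeletons": {skeleton: rc},
                   "proxies": {proxy vertex: (owner node, skeleton)}}}
    marks: final {skeleton: "hard" | "soft"} for the group.
    Mutates `nodes`; returns the set of reclaimed skeletons.
    """
    decrements = []  # messages (owner node, skeleton)
    for node in nodes.values():
        graph = node["graph"]
        # 1. Soft skeletons stop referencing their objects (point to nil).
        for sk in node["skeletons"]:
            if marks.get(sk) == "soft":
                graph[sk] = []
        # 2. Local GC (plain reachability here): unreachable proxies are reclaimed.
        live = reach(graph, list(node["roots"]) + list(node["skeletons"]))
        for proxy in [p for p in node["proxies"] if p not in live]:
            # 3. Each reclaimed proxy sends a decrement to its skeleton.
            decrements.append(node["proxies"].pop(proxy))
    reclaimed = set()
    # 4. Deliver decrements; a skeleton whose RC reaches 0 is reclaimed.
    for owner, sk in decrements:
        skeletons = nodes[owner]["skeletons"]
        skeletons[sk] -= 1
        if skeletons[sk] == 0:
            del skeletons[sk]
            reclaimed.add(sk)
    return reclaimed
```

Hard skeletons still count as live starting points in step 2, so only structure that was reachable solely through soft skeletons is released; the objects behind reclaimed skeletons are left for the next local GC.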
Failure
- Failure detection is done separately.
- The group can either wait for the failed node to wake up, or reorganize.
- If it reorganizes, it keeps the existing hard marks and restarts at skeleton propagation.
- After multiple failures, several groups may form from the old one.
Non-Recoverable Failure
1. What do we do about objects referenced by the failed node?
2. What do we do about references to the failed node?
3. What do we do about objects on the failed node that are possibly recoverable?
#1: Objects referenced by the failed node
Assuming the skeleton RCs are up to date:
1. Run Christopher’s Algorithm on the new group (without the failed node).
2. The skeletons whose remaining counts have changed are the ones referenced by the missing node.
3. Do something about them.
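A sketch of step 2 in Python, assuming the residual counts from the last run of Christopher’s Algorithm on the old group were saved and that nothing else changed since. The failed node's proxies are never consulted (they are unavailable), because only members of the new group are iterated. `residuals` and `referenced_by_failed` are hypothetical names.

```python
def residuals(group, skeleton_rc, proxies):
    """Christopher's Algorithm, returning the residual count of each skeleton."""
    res = {key: rc for key, rc in skeleton_rc.items() if key[0] in group}
    for holder in group:
        for target in proxies.get(holder, []):
            if target in res:
                res[target] -= 1
    return res


def referenced_by_failed(previous, new_group, skeleton_rc, proxies):
    """
    previous: residuals saved from the last run on the old group.
    new_group: surviving nodes; only their proxies are consulted.
    Returns the skeletons whose residual changed, i.e. those the failed node
    referenced (each rises by the number of exit items that node held for it).
    """
    after = residuals(new_group, skeleton_rc, proxies)
    return {key for key, n in after.items() if n != previous.get(key)}
```

Removing node C from {A, B, C} leaves its references counted in the skeleton RCs but no longer cancelled by a group member's proxy, so exactly the skeletons C pointed to show a higher residual.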
A Problem
[Figure: GC groups G1, G2, and G3]
Simultaneous Group Collections
- If a node belongs to more than one GC group, we don’t want to run its local GC once per group.
- Instead, a single local GC can track the marks for each group GC.
- This is fast if the marks tend to be similar across groups.
- However, if one group is a subgroup of another, the marks at skeletons may conflict, which can slow down group GC.
- Nodes in the overlap between largely unrelated groups fare even worse: about 50% performance on average.
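Tracking the marks for each group can be sketched as one local traversal that labels every vertex with the set of groups in which it is hard. In this Python sketch, local roots count as hard in every group the node belongs to and `hard_in` (a hypothetical input) gives each skeleton's hard groups; it is a plain set-union fixpoint, not necessarily the data structure the paper uses.

```python
from collections import deque


def multi_group_trace(graph, roots, hard_in, groups):
    """
    graph: {vertex: [vertex, ...]} local references on one node.
    roots: local roots (hard in every group).
    hard_in: {skeleton: set of group ids in which it is hard}.
    groups: ids of every group this node belongs to.
    Returns {vertex: set of group ids in which the vertex is hard}.
    """
    label = {}
    todo = deque()

    def offer(v, gs):
        # Re-queue v only if it gains a group it did not have yet.
        new = gs - label.get(v, set())
        if new:
            label[v] = label.get(v, set()) | new
            todo.append(v)

    for r in roots:
        offer(r, set(groups))
    for s, gs in hard_in.items():
        offer(s, set(gs))
    while todo:
        v = todo.popleft()
        for w in graph.get(v, ()):
            offer(w, label[v])
    return label
```

When every skeleton is hard in the same groups, each vertex is labeled once; when marks conflict, vertices may be re-queued as their label sets grow, which is one way to see why dissimilar marks cost more.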
Hierarchical Group Cooperation
Definitions:
- Universal Group: the set of all nodes.
- Level Index: for each group, the number of larger groups of which it is a subgroup. This may change and must be kept up to date.
- By definition, at any time and for any node, the level indexes uniquely identify the groups to which the node belongs.
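A sketch of the level-index definition in Python, assuming groups are given as member sets that are pairwise nested or disjoint (a strict hierarchy); `level_indexes` and `groups_of` are hypothetical names. The second function checks the uniqueness property: at each level, a node belongs to at most one group.

```python
def level_indexes(groups):
    """
    groups: {group id: frozenset of node names}, including the universal group.
    Assumes any two groups are either nested or disjoint.
    Returns {group id: number of strictly larger groups containing it}.
    """
    return {
        g: sum(1 for other in groups.values() if members < other)  # other strictly contains members
        for g, members in groups.items()
    }


def groups_of(node, groups, levels):
    """Map each level index to the unique group at that level containing `node`."""
    by_level = {}
    for g, members in groups.items():
        if node in members:
            if levels[g] in by_level:
                raise ValueError("groups are not nested: level index is ambiguous")
            by_level[levels[g]] = g
    return by_level
```

With U = {A, B, C, D}, G1 = {A, B}, G2 = {A}, and G3 = {C, D}, the universal group gets level 0, G1 and G3 level 1, and G2 level 2, so node A is identified with levels 0, 1, and 2.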
Level Indexes
[Figure: nested groups at Level 0, Level 1, and Level 2]
Objective
- We want each local GC to contribute to every GC group of which its node is a part.
- By the previous definitions, if a skeleton is marked hard in one group, it can safely be marked hard in that group’s subgroups.
- Define $\mathrm{Mark}_N(x)$ as the mark of item $x$ on node $N$: the lowest level index at which $x$ is marked hard.
Multi-Phase Marking
1. Propagate Mark() values to proxies; each proxy records the lowest level (biggest group) that has a reference to it.
2. A proxy is then hard at that level and at every higher level.
3. A node is stable for a given level when no resident proxy can still have its marking level reduced to that level or lower.
4. When all groups have stabilized for a given level at a node, all resident skeletons with a higher level can be deleted.
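Steps 1 and 2 amount to a min-propagation: each vertex takes the smallest level of any hard item that reaches it, and is hard at that level and every higher one. This Python sketch assumes local roots are hard at level 0 (every group) and that `skeleton_level` holds each skeleton's current lowest hard level; `multi_level_marking` and `deletable` are hypothetical names, and stability detection (step 3) is not modeled.

```python
from collections import deque


def multi_level_marking(graph, roots, skeleton_level):
    """
    graph: {vertex: [vertex, ...]} local references on one node.
    roots: local roots, treated as hard at level 0 (every group).
    skeleton_level: {skeleton: lowest level index at which it is hard}.
    Returns {vertex: lowest level at which it is reachable from a hard item};
    vertices absent from the result are not hard at any level.
    """
    level = {}
    todo = deque()

    def relax(v, lv):
        # Keep only improvements: a lower level means hard in a bigger group.
        if lv < level.get(v, float("inf")):
            level[v] = lv
            todo.append(v)

    for r in roots:
        relax(r, 0)
    for s, lv in skeleton_level.items():
        relax(s, lv)
    while todo:
        v = todo.popleft()
        for w in graph.get(v, ()):
            relax(w, level[v])
    return level


def deletable(skeleton_level, stable_level):
    """Step 4: once `stable_level` is stable, skeletons above it can go."""
    return {s for s, lv in skeleton_level.items() if lv > stable_level}
```

A proxy reached from a level-2 skeleton and from a level-1 skeleton ends at level 1: it is hard in the level-1 group and in every level-2 subgroup of it.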
Next Cycle
- For the next cycle, reinitialize each skeleton’s marking level by checking its references.
- If it has references from outside a group, its marking level is set to that group’s level, unless it is already smaller.
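A sketch of this reinitialization for one skeleton in Python, assuming its node knows the member set of each group it belongs to (keyed by level index) and which nodes hold proxies to the skeleton; `initial_level` is a hypothetical name, and `float("inf")` stands for "not hard at any level".

```python
def initial_level(referencing_nodes, node_groups, current=float("inf")):
    """
    referencing_nodes: nodes holding a proxy to this skeleton.
    node_groups: {level index: member set} for every group the skeleton's node is in.
    current: the skeleton's existing marking level, if any.
    Returns the smallest level whose group has a referencing node outside it,
    unless `current` is already smaller.
    """
    level = current
    for lv, members in node_groups.items():
        if any(n not in members for n in referencing_nodes):
            level = min(level, lv)  # external reference: hard at this level
    return level
```

Because lower levels are bigger groups, a reference from outside the level-1 group is also outside every level-2 subgroup, so taking the minimum gives the widest group in which the skeleton starts hard.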
True?
“If a skeleton is marked hard in one group, it can be safely marked hard in its subgroups.”
- Not necessarily: low markings might be incorrect, because a link from a larger group may have disappeared since they were set.
- Jobs aren’t the only thing being outsourced…
Keeping Reference Counts
- Avoids re-running Christopher’s Algorithm if the groups don’t change.
- Useful for failure recovery.
- How? Instead of counting all references at each skeleton for every group, keep a global count for the whole network (level 0) and store per-level differences.
The Equation
For $i \ge 1$: $\mathrm{Diff}_N[i, x] = \mathrm{Count}_N[i-1, x] - \mathrm{Count}_N[i, x]$, where $N$ is the node, $i$ the level, and $x$ the skeleton.
- This is the vector of differences between the RC of each skeleton at one level and at the next lower level (group and supergroup).
- Initial marking level: the smallest $i$ such that $\mathrm{Diff}_N[i, x] \ne 0$ (or the existing level if smaller); 0 if all differences are 0.
- Fewer updates are required: each added proxy changes only the global count and one difference count.
- The counts are often smaller than conventional RCs.
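A sketch of the difference-count bookkeeping for one skeleton $x$ on node $N$, in Python. It reads $\mathrm{Count}_N[i, x]$ as the number of references to $x$ held by nodes inside the level-$i$ group containing $N$ (so level 0 is the global count); that reading is an assumption. Under it, a proxy held by a node whose deepest shared group is at level $j$ raises every count up to level $j$, so only the global count and $\mathrm{Diff}_N[j+1, x]$ change. `LevelCounts` is a hypothetical name, and `initial_mark` returns None when all differences are zero (the slide maps that case to 0).

```python
class LevelCounts:
    """
    Counts for one skeleton x on node N.
    levels: member sets of N's groups; levels[0] is the universal group and
    levels[i] is the level-i group containing N (each nested in the previous).
    """

    def __init__(self, levels):
        self.levels = levels
        self.global_count = 0            # Count_N[0, x]
        self.diff = [0] * len(levels)    # diff[i] = Diff_N[i, x] for i >= 1; diff[0] unused

    def _deepest_level(self, holder):
        # Deepest group on N's chain that also contains the proxy's holder.
        return max(i for i, members in enumerate(self.levels) if holder in members)

    def add_proxy(self, holder, delta=1):
        """Record a proxy created (delta=1) or removed (delta=-1) at `holder`."""
        self.global_count += delta
        j = self._deepest_level(holder)
        if j + 1 < len(self.levels):
            self.diff[j + 1] += delta    # the only difference that changes

    def count(self, i):
        """Recover Count_N[i, x] = Count_N[0, x] - sum of Diff_N[1..i, x]."""
        return self.global_count - sum(self.diff[1:i + 1])

    def initial_mark(self):
        """Smallest i >= 1 with a nonzero difference, or None if all are zero."""
        return next((i for i in range(1, len(self.levels)) if self.diff[i] != 0), None)
```

For N = A with groups {A, B, C, D}, {A, B}, and {A}, a proxy on C lands in $\mathrm{Diff}_N[1, x]$ and a proxy on B in $\mathrm{Diff}_N[2, x]$, giving an initial marking level of 1.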