Science  People  Locations  Timeline
Index: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Home > Garbage collection (computer science)


 

In computing, garbage collection is a system of automatic memory management which seeks to reclaim memory used by objects which will never be referenced in the future. It is commonly abbreviated as GC. The part of a system which performs garbage collection is called a garbage collector.

When a system has a garbage collector it is usually part of the language run-time system and integrated into the language. The language is said to be garbage collected. Garbage collection was invented by John McCarthy as part of the first Lisp system. When someone refers to a garbage collected language what is meant is that garbage collection is a feature specified by the language. There are also some languages (mostly formal languages) which would require a garbage collector in any practical implementation, but whose specifications do not require garbage collection, usually because they are not intended to be used as practical programming languages. The lambda calculus is an example of such a language.

The Symbolics Lisp machine was unique in having hardware support for garbage collection.

The basic principle of how a garbage collector works is:

  1. Determine what data objects in a program cannot be referenced in the future
  2. Reclaim the storage used by those objects

Although in general it's impossible to know the moment an object has been used for the last time, garbage collectors use conservative estimates that allow them to identify when an object could not possibly be referenced in the future. For example, if there are no references to an object in the system, then it can never be referenced again. This is the criterion used by most modern garbage-collection systems, but see Disadvantages of Tracing Garbage Collectors below.

While garbage collection assists the management of memory, the feature is also almost always necessary in order to make a programming language type safe, because it prevents several classes of runtime errors. For example, it prevents dangling pointer errors, where a reference to a deallocated object is used.

1 Tracing garbage collectors

Tracing garbage collectors are the most common type of garbage collector. They focus on locating reachable objects, which are objects that may be referenced in the future, and then discard all remaining objects.

1.1 Reachability of an object

More formally, objects can be reachable in only two ways:

  1. A distinguished set of objects are assumed to be reachable, these are known as the roots. In a typical system these objects will be the machine registers, the stack, the instruction pointer, the global variables; in other words, everything that a program can reach directly.
  2. Anything referenced from a reachable object is itself reachable. (transitivity)

Informally, a reachable object is one that the program could get to by starting at an object it can reach directly, and then following a chain of pointer references.

1.2 Basic algorithm

Tracing garbage collectors perform garbage collection cycles. A cycle is started when the collector decides (or is notified) that it needs to reclaim storage, which in particular happens when the system is low on memory. A cycle consists of the following steps:

  1. Create initial white, grey, and black sets; these sets will be used to maintain progress during the cycle. Initially the black set is empty, the grey set consists of specially denoted objects known as the "roots" and possibly some additional objects chosen according to the particular algorithm used, and the white set is everything else. At any time in the algorithm a particular object will be in exactly one of the three sets. The set of white objects can be thought of as the set of objects that we are trying to reclaim the storage for; in the course of the cycle the algorithm will remove many objects from the white set, leaving behind a set of objects that it can reclaim the storage for.
  2. (This step is repeated until there are no objects in the grey set.) Pick an object from the grey set. Move all the white objects that are referenced (reachable in one step of following pointers) from the selected object into the grey set. Move the selected object from the grey set to the black set.
  3. When there are no more objects in the grey set, then all the objects remaining in the white set are not reachable and the storage occupied by them can be reclaimed.

The tricolour invariant is an important property of the objects and their colours. It is simply this:

No black object points directly to a white object.

Notice that the algorithm above preserves the tricolour invariant. The initial partition has no black objects, so the invariant trivially holds. Subsequently whenever an object is made black any white objects that it references are made grey, ensuring that the invariant remains true. When the last step of the algorithm is reached, because the tricolour invariant holds, none of the objects in the black set point to any of the objects in the white set (and there are no grey objects) so the white objects must be unreachable from the roots, and the system calls their finalisers and frees their storage.

Some variations on the algorithm do not preserve the tricolour invariant but they use a modified form for which all the important properties hold.



Read more »

Non User