Garbage (computer science)


In computer science, garbage includes data, objects, or other regions of the memory of a computer system, which will not be used in any future computation by the system, or by a program running on it. Because every computer system has a finite amount of memory, and most software produces garbage, it is frequently necessary to deallocate memory that is occupied by garbage and return it to the heap, or memory pool, for reuse.

Classification

Garbage is generally classified into two types: syntactic garbage, any object or data which is within a program's memory space but unreachable from the program's root set; and semantic garbage, any object or data which is never accessed by a running program for any combination of program inputs. Objects and/or data which are not garbage are said to be live.
Casually stated, syntactic garbage is data that cannot be reached, and semantic garbage is data that will not be reached. More precisely, syntactic garbage is data that is unreachable due to the reference graph, which can be determined by many algorithms, as discussed in Tracing garbage collection, and only requires analyzing the data, not the code. Semantic garbage is data that will not be accessed, either because it is unreachable, or is reachable but will not be accessed; this latter requires analysis of the code, and is in general an undecidable problem.
Syntactic garbage is a subset of semantic garbage, as it is entirely possible for an object to hold a reference to another object without ever using that object.

Example

In the following simple stack implementation in Java, each element popped from the stack becomes semantic garbage once there are no outside references to it:

public class Stack

This is because elements still contains a reference to the object, but the object will never be accessed again through this reference, because elements is private to the class and the pop method only returns references to elements it has not already popped. However, knowing this requires analysis of the code of the class, which is undecidable in general.
If a later push call re-grows the stack to the previous size, overwriting this last reference, then the object will become syntactic garbage, because it can never be accessed again, and will be eligible for garbage collection.

Automatic garbage collection

An example of the automatic collection of syntactic garbage, by reference counting garbage collection, can be produced using the Python command-line interpreter:

>>> class Foo:
... """This is an empty testing class."""
... pass
...
>>> bar = Foo
>>> bar
<__main__.Foo object at 0x54f30>
>>> del bar

In this session, an object is created, its location in the memory is displayed, and the only reference to the object is then destroyed—there is no way to ever use the object again from this point on, as there are no references to it. This becomes apparent when we try to access the original reference:

>>> bar
Traceback :
File "", line 1, in ?
NameError: name 'bar' is not defined

As it is now impossible to refer to the object, the object has become useless; it is garbage. Since Python uses garbage collection, it automatically deallocates the memory that was used for the object so that it may be used again:

>>> class Bar:
... """This is another testing class."""
... pass
...
>>> baz = Bar
>>> baz
<__main__.Bar object at 0x54f30>

The Bar instance now resides at the memory location 0x54f30; at the same place as where our previous object, the Foo instance, was located. Since the Foo instance was destroyed, freeing up the memory used to contain it, the interpreter creates the Bar object at the same memory location as before, making good use of the available resources.

Effects

Garbage consumes heap memory, and thus one wishes to collect it.
However, collecting garbage takes time and, if done manually, requires coding overhead. Further, collecting garbage destroys objects and thus can cause calls to finalizers, executing potentially arbitrary code at an arbitrary point in the program's execution. Incorrect garbage collection, primarily due to errors in manual garbage collection, results in memory safety violations due to use of dangling pointers.
Syntactic garbage can be collected automatically, and garbage collectors have been extensively studied and developed. Semantic garbage cannot be automatically collected in general, and thus causes memory leaks even in garbage-collected languages. Detecting and eliminating semantic garbage is typically done using a specialized debugging tool called a heap profiler, which allows one to see which objects are live and how they are reachable, enabling one to remove the unintended reference.

Eliminating garbage

The problem of managing the deallocation of garbage is well-known in computer science. Several approaches are taken: