* Time spent in the malloc that triggered the collection.
* Time spent with the world stopped.
* Time spent doing the collection.
* Memory info before and after the collection: used, free, overhead and
wasted memory. Used is the memory used by the mutator, free is the
memory the mutator can request, overhead is the memory used by the
collector itself and wasted is memory that is not used by either the
mutator or collector and that can't be requested by the mutator either.
For each malloc() call:
* Time spent.
* Amount of memory requested.
* Attributes of the requested memory.
* A flag to tell if this call triggered a collection.
Statistics collection is controlled via the D_GC_OPTS environment
variable. To collect malloc statistics, use the option
malloc_stats_file; its value is the path of the file where the malloc
statistics are stored (the contents will be replaced). To collect
garbage collection statistics, use the option collect_stats_file; its
value is the path of the file where the collection statistics are
stored (the contents will be replaced). The generated files are in CSV
format and have headers that make them self-explanatory.
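As a rough illustration of what one row of the malloc statistics could
look like, here is a minimal sketch in C. The struct, the function name
and the column order are assumptions made for this example only; the
real files carry their own headers, and those are the authoritative
description of the format.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record: one entry per malloc() call, holding the four
 * values listed above. Field names are illustrative, not the GC's own. */
struct malloc_record {
    double   seconds;    /* time spent in the call */
    size_t   requested;  /* amount of memory requested, in bytes */
    unsigned attrs;      /* attributes of the requested memory */
    int      collected;  /* 1 if this call triggered a collection */
};

/* Format one record as a CSV line into buf (at most cap bytes).
 * Returns what snprintf() returns: the length the full line needs. */
int malloc_record_to_csv(char *buf, size_t cap,
                         const struct malloc_record *r)
{
    return snprintf(buf, cap, "%f,%zu,%u,%d\n",
                    r->seconds, r->requested, r->attrs, r->collected);
}
```

A real writer would emit a header line once and then append one such
line per call to the file named by malloc_stats_file.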
The GC offers a couple of options to debug memory problems, but they are
selectable only at compile-time. Since the GC is part of the compiler
runtime, it is not very common for users to recompile the GC when they
have a memory problem, so making these options always available is very
desirable.
This patch allows configuring the GC via environment variables. 4 options
are available: sentinel, mem_stomp, verbose and log file. Only the first
2 are implemented right now.
For example, to check a program using memory stomping and a sentinel, you
can run it like this (using sh):
$ D_GC_OPTS=mem_stomp=1:sentinel ./program
As you can see, the value is optional for boolean options.
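The option syntax (names separated by ':', each with an optional
'=value') can be parsed with very little code. The following is only a
sketch in C, not the parser used by the GC: the struct, the function
name, and the choice to ignore unknown options and to treat any
non-zero number as "on" are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option set: just the two boolean options named above. */
struct gc_opts {
    bool sentinel;
    bool mem_stomp;
};

/* Parse a string such as "mem_stomp=1:sentinel" into *o.
 * A name without "=value" counts as true. */
void gc_opts_parse(struct gc_opts *o, const char *s)
{
    assert(o != NULL);
    o->sentinel = false;
    o->mem_stomp = false;
    if (s == NULL)
        return;                                /* variable not set */
    while (*s != '\0') {
        size_t tok_len = strcspn(s, ":");      /* up to next ':' or end */
        const char *eq = memchr(s, '=', tok_len);
        size_t name_len = eq ? (size_t)(eq - s) : tok_len;
        bool on = eq ? (strtol(eq + 1, NULL, 10) != 0) : true;

        if (name_len == 8 && memcmp(s, "sentinel", 8) == 0)
            o->sentinel = on;
        else if (name_len == 9 && memcmp(s, "mem_stomp", 9) == 0)
            o->mem_stomp = on;
        /* Unknown names are silently ignored in this sketch. */

        s += tok_len;
        if (*s == ':')
            s++;                               /* skip the separator */
    }
}
```

In a program, the string would come from getenv("D_GC_OPTS"), which
returns NULL when the variable is not set.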
Even though free() can be called with a null pointer, the extra call might
be significant. On hard GC benchmarks, testing for null in the GC code
(i.e. avoiding the free() call) can reduce the GC time by almost 5%.
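The idea, sketched in C: check for null at the call site so the function
call is skipped entirely. The names below are made up for the example,
and the counter exists only to make the effect observable; it is not
part of the GC.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts how many times free() was actually reached (illustration only). */
static unsigned long free_calls;

/* Hypothetical stand-in for the relatively expensive free() path. */
void counted_free(void *p)
{
    free_calls++;
    free(p);
}

/* Caller-side null test: a null pointer never reaches counted_free(). */
void release(void *p)
{
    if (p != NULL)
        counted_free(p);
}
```

The behaviour is the same as calling free() unconditionally; only the
cost of the call for null pointers is saved.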
This code will be superseded by the statistics collection code, and it was
unmaintained and very probably broken (for example, the file and line
number were never filled in).
Use Tango bindings to C standard library functions
As we need to use more libraries it became less practical to maintain our
own set of bindings, and since the GC only works with Tango, it makes
sense to just use Tango bindings.
This will be an inherently concurrent GC, so having a non-threaded version
of it makes no sense. Even more, I think the non-threaded version doesn't
even compile.
This distinction is only made by Windows, and adds extra complexity
that probably isn't worth it (especially for other OSs, where this adds
a little overhead too, in both space and time).
Other OSs (like Linux) even do all the committing automatically, under the
hood, see:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;\
f=Documentation/vm/overcommit-accounting;hb=HEAD
Almost any system that supports valloc() also supports mmap(), and since
valloc() is a deprecated function, it makes little sense to maintain it
as an allocation method.
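For reference, getting page-aligned memory from mmap() instead of
valloc() looks roughly like this in C on a POSIX system. The two
function names are hypothetical and this is a sketch, not the GC's
actual allocation code.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <sys/mman.h>

/* Map nbytes of zero-filled, page-aligned, private anonymous memory.
 * Returns NULL on failure. */
void *os_mem_map(size_t nbytes)
{
    void *p = mmap(NULL, nbytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}

/* Give the memory back to the OS. Returns 0 on success, -1 on failure. */
int os_mem_unmap(void *base, size_t nbytes)
{
    return munmap(base, nbytes);
}
```

Unlike valloc(), memory obtained this way must be released with
munmap() and the same length, not with free().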
To avoid the Tango dependency, we need to write our own C-API interface.
This is done in the new gc.libc module. In the future, maybe this module
will use Tango or Phobos accordingly, but for now we stay free of
dependencies (at the expense of some extra work).
The code seemed to be broken, since the self thread ID was stored at
initialization and it was then asserted that the GC always runs from that
thread, which seems far from reality (the GC can be invoked by any thread).
The PRINTF version now doesn't print the current thread ID either.
The Concurrent D Garbage Collector (CDGC) is based on the "basic" garbage
collector from the Tango runtime. This first commit is a copy of this GC,
as it is in Tango 0.99.8.
The CDGC is designed only for Linux, at least for now.