This is the first big step towards a concurrent GC. The mark phase is ran
in a fork()ed process and the world is only stopped to do the fork()
because we need each thread to dump the CPU registers into the stack to
be scanned.
Forking is controlled via the option "no_fork" (which is false by default).
If not enough support from the underlying OS is found (i.e. no fork() or
no shared memory) or if fork() fails, the mark phase fallback to run in
the same process as the mutator (as it was done before this patch).
The mark and freebits bitmaps are shared between the two processes to
communicate the results of the mark phase. The freebits could not be
shared, but in that case the freebits should be set in the mutator process,
making pauses longer. Freebits should be revisited though.