1 Title: GC optimization for contiguous pointers to the same page
2 Tags: en, d, dgc, gc, phobos, optimization
4 This optimization had a patch__, written by Vladimir Panteleev, sitting on
5 Bugzilla__ (`issue #1923`__) for a little more than an year now. It was already
6 included in both Tango__ (`issue #982`__) and `DMD 2.x`__ but `DMD 1.x`__ was
9 __ http://d.puremagic.com/issues/attachment.cgi?id=239
10 __ http://d.puremagic.com/issues/
11 __ http://d.puremagic.com/issues/show_bug.cgi?id=1923
12 __ http://www.dsource.org/projects/tango
13 __ http://dsource.org/projects/tango/ticket/982
14 __ http://www.digitalmars.com/d/2.0/changelog.html
15 __ http://www.digitalmars.com/d/1.0/changelog.html
17 Fortunately is now included in `DMD 1.042`__, released yesterday.
19 __ http://www.digitalmars.com/d/1.0/changelog.html#new1_042
21 This optimization is best seen when you do word splitting of a big text (as
22 shown in the post__ that triggered the patch)::
24 import std.file, std.string;
26 auto txt = cast(string) read("text.txt"); // 6.3 MiB of text
27 auto words = txt.split();
30 __ http://www.digitalmars.com/d/archives/digitalmars/D/Slow_GC_67673.html
32 Now in ``words`` we have an array of slices (a contiguous area in memory filled
33 with pointers) about the same size of the original text, as explained__ by
36 __ http://www.digitalmars.com/d/archives/digitalmars/D/Slow_GC_67673.html#N67681
38 The GC heap is divided in (4KiB) pages, each page contains cells of a fixed
39 type called *bins*. There are bin sizes of 16 (``B_16``) to 4096 (``B_PAGE``),
40 incrementing in steps of power of 2 (32, 64, etc.). See `Understanding the
41 current GC`__ for more details.
43 __ /blog/blog/post/250bf643
45 For large contiguous objects (like ``txt`` in this case) multiple pages are
46 needed, and that pages contains only one bin of size ``B_PAGEPLUS``, indicating
47 that this object is distributed among several pages.
49 Now, back with the ``words`` array, we have a range of about 3 millions
50 interior pointers into the ``txt`` contiguous memory (stored in about 1600
51 pages of bins with size ``B_PAGEPLUS``). So each time the GC needs to mark the
52 heap, it has to follow this 3 millions pointers and find out where is the
53 beginning of that block to see its *mark-state* (if it's marked or not).
54 Finding the beginning of the block is not **that** slow, but when you multiply
55 it by 3 millions, it could get a little noticeable. Specially when this is done
56 several times as the dynamic array of ``words`` grow and the GC collection is
57 triggered several times, so this is kind of *exponential*.
59 The optimization consist in remembering the last page visited **if** the bin
60 size was ``B_PAGE`` or ``B_PAGEPLUS``, so if the current pointer being followed
61 points to the last visited (cached) page, we can skip this lookup (and all the
62 marking indeed, as we know we already visited that page).
64 .. vim: set et sw=4 sts=4 :