Title: Recursive vs. iterative marking
Tags: en, d, gc, dgc, cdgc, mark, recursive, iterative, performance, benchmark

After a `small (but important) step`__ towards making the D__ GC__ truly
concurrent__ (which is my main goal), I've been exploring the possibility of
making the mark phase recursive instead of iterative (as it currently is).

__ /blog/blog/post/-4c9dd5b5
__ http://www.digitalmars.com/d/
__ http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29
__ http://www.memorymanagement.org/glossary/c.html#concurrent.garbage.collection

The motivation is that the iterative algorithm makes several passes through the
**entire heap** (it doesn't need to do the **full job** on each pass, since it
only processes the nodes that became reachable in the previous pass, but to find
those newly reachable nodes it still has to iterate over the entire heap). The
number of passes equals the depth of the connectivity graph: the best case is
when the whole heap is reachable directly from the root set, and the worst is
when the heap is a single linked list. The recursive algorithm, on the other
hand, needs only a single pass but, of course, it has the problem of potentially
consuming a lot of stack space (again, the recursion depth is the same as the
connectivity graph depth), so it's not paradise either.
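
To make the difference concrete, here is a rough sketch of both approaches (the
``Node`` type and its fields are made up just for illustration; the real GC
works on memory pools and mark bits, not on an explicit node graph)::

   struct Node
   {
      bool marked;   // reachable from the root set
      bool scanned;  // its pointers were already followed
      Node*[] refs;  // pointers found when scanning this node
   }

   // Iterative marking: each pass walks the whole heap looking for nodes
   // that became reachable in the previous pass; the number of passes is
   // bounded by the depth of the connectivity graph.
   void markIterative(Node*[] heap, Node*[] roots)
   {
      foreach (root; roots)
         root.marked = true;
      bool foundNew = true;
      while (foundNew)
      {
         foundNew = false;
         foreach (node; heap) // full heap scan on every pass
         {
            if (!node.marked || node.scanned)
               continue;
            node.scanned = true;
            foreach (child; node.refs)
               if (!child.marked)
               {
                  child.marked = true;
                  foundNew = true; // forces one more pass
               }
         }
      }
   }

   // Recursive marking: a single traversal, but the call stack grows
   // with the depth of the connectivity graph.
   void markRecursive(Node* node)
   {
      if (node.marked)
         return;
      node.marked = true;
      foreach (child; node.refs)
         markRecursive(child);
   }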

To see how much of a problem the recursion depth really is, I first implemented
a fully recursive algorithm, and I found **it is** a real problem, since I got
segmentation faults because the stack (8MiB by default in Linux) overflows. So
I've implemented a hybrid approach, setting a (configurable) maximum recursion
depth for the marking phase. If the maximum depth is reached, the recursion is
stopped and the nodes that would have to be scanned deeper than that are queued
to be scanned in the next iteration.
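
Here is a rough sketch of that hybrid scheme, again using the made-up ``Node``
type from the previous sketch; the real implementation defers the deep nodes to
the next pass of the existing iterative loop, while this example uses an
explicit queue just to keep it self-contained::

   void markHybrid(Node*[] roots, size_t maxDepth)
   {
      Node*[] pending; // nodes queued because the depth limit was hit

      void mark(Node* node, size_t depth)
      {
         if (node.marked)
            return;
         node.marked = true;
         if (depth >= maxDepth)
         {
            pending ~= node; // defer scanning this node's children
            return;
         }
         foreach (child; node.refs)
            mark(child, depth + 1);
      }

      foreach (root; roots)
         mark(root, 0);

      // Nodes deferred above are picked up in further iterations, each
      // one again bounded by maxDepth.
      while (pending.length)
      {
         auto batch = pending;
         pending = null;
         foreach (node; batch)
            foreach (child; node.refs)
               mark(child, 0);
      }
   }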

Here are some results showing how the total run time is affected by the maximum
recursion depth:

.. image:: ##POST_URL##/29-recursive-dil.png

.. image:: ##POST_URL##/29-recursive-voronoi.png

The red dot shows how the purely iterative algorithm currently performs (it's
placed arbitrarily in the plot, as the X axis doesn't make sense for it).

The results are not very conclusive. Even though the hybrid approach performs
better for both Dil__ and Voronoi__ when the maximum depth is larger than 75,
the best depth is program-specific. Both have their worst case at depth 0, which
makes sense, because there we pay the extra complexity of the hybrid algorithm
without using any of its power. As soon as we leave depth 0, a big drop is seen;
for Voronoi it's big enough to outperform the purely iterative algorithm, but
not for Dil, which matches it near 60 and clearly outperforms it at 100.

__ http://code.google.com/p/dil/
__ http://codepad.org/xGDCS3KO

As usual, Voronoi defies all logic: its best depth is **31** (a consistent
result among several runs). Between 20 and 50 there is not much variation
(except for the magic number **31**), but beyond that it worsens slowly but
steadily as the depth is increased.

Note that the plots might make the performance improvement look a little bigger
than it really is. In the best-case scenario the gain is 7.5% for Voronoi and 3%
for Dil (which is probably a better measure of the real world). If I had to
choose a default, I'd probably go with 100, because that's where both get
a performance gain and it's still a small enough number to ensure that no
segmentation faults due to stack exhaustion are caused (only) by the
recursiveness of the mark phase (I guess a value of 1000 would be reasonable
too, but I'm a little scared of causing inexplicable, magical, mystery segfaults
for users). Anyway, for a value of 100, the performance gain is about 1% and
3.5% for Dil and Voronoi respectively.

So I'm not really sure if I should merge this change or not. In the best-case
scenarios (which require some work from the user to search for the best depth
for their program), the performance gain is not exactly **huge**, and for
a reasonable default value it is so small that I'm not convinced the extra
complexity of the change (because it makes the marking algorithm a little more
complex) is worth it.

Feel free to leave your opinion (I would even appreciate it if you do :).

.. vim: set et sw=3 sts=3 :