Improving raytracing performance by improving libbu

Final Report — Cezar El-Nazli

The Goal

The original goal was speeding up raytracing by replacing some data structures with more efficient ones.

First Session

The original goal was to improve raytracing performance by using better-suited data structures. Originally, it was thought that maintaining sets using a vector was inefficient (due to maintaining the uniqueness property in O(n)), but an investigation revealed that this was not really the case.

More complex data structures with better theoretical runtime performance (e.g., balanced trees with O(log N) look ups) had worse performance because N was small and the constant factor was dominant (small for iterating a vector, big for trees and hashes with expensive hash functions).

After benchmarking a number of data structures and analysing the behaviour of the raytracer on some example geometry, I changed the initial size of the vector used for backing the set from 64 to 16, which showed a performance improvement of 10%. The reason for this improvement was that the vector was being zeroed initially, and it contained at most 12 elements. Therefore, with an initial size of 64, 52 elements were zeroed but never accessed, while for 16 initial elements, only 4 were being wasted.

Second Session

In the second part of GSoC, I spent some time reading an article about CPU caches and tracking down four bugs.

  1. An infinite loop which occurred when the raytracer was run with debugging enabled. List elements were removed from the head and added to the tail, so the list was never emptied. Its presence was due to older code being removed and the logic getting out of sync.
  2. A deadlock in bu_mapped_file. It occurred when a branch was taken which did not release a semaphore. Discussion on Zulip concluded that the branch was only taken by some code which was not actually used. Nonetheless, the logic was still flawed.
  3. On the latest version of macOS (10.13 High Sierra) and possibly BSDs (I did not test on BSDs), filling struct timeval's usec member with an integer amounting to more than a second causes functions taking the struct as an argument to immediately return with errno=EINVAL. I fixed a bug related to this in fbserv.
  4. I tried to isolate a compilation bug on Debian 9.4 + GCC 6.3. It took a lot of time but was ultimately of not much use because I couldn't figure out the root cause (likely a bug in the linker) and my proposed workaround incurred an unacceptable performance penalty.

Final Session

The focus in this period was on eliminating linked lists in favour of dynamically-allocated arrays. Use of linked lists is discouraged because of their cache unfriendliness.

I touched some of the more isolated structs which used linked lists (bu_mapped_file, bu_temp, bu_observer, bu_hook, bu_redblack, bu_cmd, fb_obj) and began creating unit tests for some of them (starting with bu_hook).

My Work

Future Work

Future work would most likely focus on removing the linked list from the data structures that are left. The more central ones (such as struct partition) require more care as there are more things that can go wrong there.