Path-tracing is a technique to render realistic images by simulating the paths that light takes throughout a scene. Backwards path-tracing is a standard way of doing path-tracing where you shoot rays out from an "origin" (a camera, or a player's eye), and track their propagation through a scene until they reach a light source. A backwards path-tracer can typically be broken down into these steps:
Launching primary rays
Ray intersection
Launching secondary rays
Real time path-tracing attempts to do this in (you guessed it) real time. This means we are heavily constrained in the number of ray intersection checks we're allowed to do per-frame. For example, in my Minecraft path-tracer we can do ~5 ray-intersections per-pixel per-frame at 60 FPS. This means it's a good idea, for example, to skip the primary-ray intersection check and just use rasterization to resolve the primary ray. We also have to be very deliberate about the types of rays that we launch; typically this means we launch 1 ray for ambient light, and 1 ray for specular, and we restrict the number of bounces that are allowed.
A scene with only a primary ray. Light cannot bounce, and so only the sky is visible.
The same scene with 1 bounce of ambient light, and many accumulated frames. Skylight can directly reach a few areas.
The same scene with up to 3 bounces of ambient light, and direct sunlight rays for each of those bounces.
GPU Shading languages do not support recursive function calls. This is a problem for path-tracing, which is generally described as a recursive algorithm. Luckily, path-tracing is tail-recursive, which means it can be reframed as an iterative algorithm. The iterative version of path-tracing traverses a path-tree, using either a stack or a queue. For a single pixel with a bounce limit of 2, the path-tree might look like this:
And some naive GLSL code to trace a 2-bounce path-tree would look something like this:
vec3 Pathtrace(vec3 origin, vec3 direction, Surface rasterSurface) {
vec3 outColor = vec3(0.0);
vec3 ambientDir1 = GetAmbientSampleDir(direction, rasterSurface);
vec3 ambientAbsorbtion1 = GetAmbientBRDF(direction, ambientDir1, rasterSurface) * rasterSurface.albedo;
if (IsVisible(ambientAbsorbtion1)) {
RayIntersection ambIntersect1 = ray_intersect(origin, ambientDir1);
if (ambIntersect1.hit) {
Surface ambSurface1 = GetSurface(ambIntersect1.data);
outColor += ambSurface1.emissive * ambientAbsorbtion1;
vec3 ambientDir2 = GetAmbientSampleDir(ambientDir1, ambSurface1);
vec3 ambientAbsorbtion2 = GetAmbientBRDF(ambientDir1, ambientDir2, ambSurface1) * ambSurface1.albedo;
if (IsVisible(ambientAbsorbtion2)) {
RayIntersection ambIntersect2 = ray_intersect(ambIntersect1.origin, ambientDir2);
if (ambIntersect2.hit) {
Surface ambSurface2 = GetSurface(ambIntersect2.data);
outColor += ambSurface2.emissive * ambientAbsorbtion2;
} else {
outColor += GetSkyColor(ambIntersect1.origin, ambientDir2) * ambientAbsorbtion2;
}
}
vec3 specularDir2 = GetSpecularSampleDir(ambientDir1, ambSurface1);
vec3 specularAbsorbtion2 = GetSpecularBRDF(ambientDir1, specularDir2, ambSurface1) * ambSurface1.albedo;
if (IsVisible(specularAbsorbtion2)) {
RayIntersection specIntersect2 = ray_intersect(ambIntersect1.origin, specularDir2);
if (specIntersect2.hit) {
Surface specSurface2 = GetSurface(specIntersect2.data);
outColor += specSurface2.emissive * specularAbsorbtion2;
} else {
outColor += GetSkyColor(ambIntersect1.origin, specularDir2) * specularAbsorbtion2;
}
}
} else {
outColor += GetSkyColor(origin, ambientDir1) * ambientAbsorbtion1;
}
}
// Now do the entire thing again for the specular side of the tree
...
return outColor;
}
There are a lot of problems with this naive method. It effectively bakes a depth-first traversal of the tree statically into the code. This means that the lines of code have to be multiplied for every additional bounce we want to allow. Furthermore, this code has terrible branching capabilities, since every node is a distinct line of code, every thread in a warp has to wait on the worst-case of all the threads in that warp. Imagine trying to implement this tree using the naive method:
The escaped pixels for 1 bounce of ambient light. Almost none of the threads that computed these pixels will be able to move on to productive work.
Warps will be bottlenecked by the worst case of every tree that each thread has to execute. This is exasperated by the fact that monte-carlo sampling of ambient light demands that basically every pixel is random and de-correlated from its neighbors. It is statistically guaranteed that tons of threads will sit idle all the time. Parallelism is not being properly exploited.