Path-tracing

Path-tracing is a technique for rendering realistic images by simulating the paths that light takes through a scene. Backwards path-tracing is the standard way of doing path-tracing: you shoot rays out from an "origin" (a camera, or a player's eye), and track their propagation through the scene until they reach a light source. A backwards path-tracer can typically be broken down into these steps:

  1. Launch primary rays from the origin. Each ray typically corresponds to one pixel.
  2. Do intersection testing for each of these rays.
  3. For rays that hit a surface, calculate surface interactions, and potentially launch bounced rays. For rays that hit a light-source, or escape the scene and hit a skybox, add the light contribution to the ray's origin pixel.
  4. If there are any newly bounced rays, go back to step 2 and repeat.
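
The steps above can be sketched as a loop (illustrative Python rather than shader code; `Hit`, `ToyScene`, and their members are made-up stand-ins for a real renderer's types, with colors reduced to single floats):

```python
from dataclasses import dataclass

@dataclass
class Hit:
    emissive: float  # light emitted at the hit point
    albedo: float    # fraction of light reflected on a bounce

class ToyScene:
    """1-D toy scene: the ray hits hits[i] on its i-th segment,
    then escapes to a constant sky once the list runs out."""
    def __init__(self, hits, sky):
        self.hits, self.sky, self.i = hits, sky, 0
    def intersect(self, ray):  # step 2: intersection test
        if self.i < len(self.hits):
            hit = self.hits[self.i]
            self.i += 1
            return hit
        return None
    def bounce(self, ray, hit):  # direction sampling elided in this toy model
        return ray

def trace_pixel(scene, ray, max_bounces):
    """Steps 1-4: follow one pixel's ray until it reaches a
    light source or escapes the scene."""
    color = 0.0
    throughput = 1.0  # fraction of light surviving the bounces so far
    for _ in range(max_bounces + 1):
        hit = scene.intersect(ray)
        if hit is None:                      # escaped: add the sky's light
            color += throughput * scene.sky
            break
        color += throughput * hit.emissive   # step 3: surface interaction
        throughput *= hit.albedo
        ray = scene.bounce(ray, hit)         # step 4: launch a bounced ray
    return color

# A non-emissive wall (albedo 0.5) followed by the sky contributes
# half of the sky's light to the pixel:
trace_pixel(ToyScene([Hit(emissive=0.0, albedo=0.5)], sky=1.0),
            ray=None, max_bounces=2)  # -> 0.5
```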

Launching primary rays

Ray intersection

Launching secondary rays

Real-time path-tracing attempts to do all of this in (you guessed it) real time. This means we are heavily constrained in the number of ray-intersection checks we can afford per frame. For example, in my Minecraft path-tracer we can do roughly 5 ray-intersections per pixel per frame at 60 FPS. It's therefore a good idea, for example, to skip the primary-ray intersection check entirely and just use rasterization to resolve the primary ray. We also have to be very deliberate about the types of rays we launch; typically this means launching one ray for ambient light and one ray for specular, and restricting the number of bounces that are allowed.
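
To see how fast that budget gets spent, here's the arithmetic for one hypothetical ray plan (the plan itself is my own illustration, not necessarily how any particular tracer divides its budget): one ambient ray plus one direct-light (shadow) ray per bounce, a single specular ray, and the primary ray resolved by rasterization for free:

```python
def rays_per_pixel(ambient_bounces):
    """Ray-intersection tests per pixel per frame, assuming the primary
    ray is resolved by rasterization (so it costs no intersections)."""
    ambient = ambient_bounces  # one ambient ray per bounce
    shadow = ambient_bounces   # one direct-light ray per ambient hit
    specular = 1               # a single specular reflection ray
    return ambient + shadow + specular

# Two ambient bounces already consume a ~5-ray budget:
rays_per_pixel(2)  # -> 5
```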

A scene with only a primary ray. Light cannot bounce, and so only the sky is visible.

The same scene with 1 bounce of ambient light, and many accumulated frames. Skylight can directly reach a few areas.

The same scene with up to 3 bounces of ambient light, and direct sunlight rays for each of those bounces.

Path-Tracing in Shaders

GPU shading languages do not support recursive function calls. This is a problem for path-tracing, which is generally described as a recursive algorithm. Luckily, path-tracing is tail-recursive, which means it can be reframed as an iterative algorithm. The iterative version of path-tracing traverses a path-tree using either a stack or a queue. For a single pixel with a bounce limit of 2, the path-tree might look like this:

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/78d4a80a-6698-4cfa-86b8-18f17071ae4b/image.png
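
A direct stack-based traversal of such a tree could look like the following (again illustrative Python rather than shader code; the scene callbacks and `Hit` fields are invented for the sketch, with colors reduced to single floats):

```python
from dataclasses import dataclass

@dataclass
class Hit:
    position: object  # where the ray hit, so child rays can start there
    emissive: float
    albedo: float

def trace_path_tree(scene, origin, direction, max_bounces):
    """Iterative depth-first traversal of the path tree with an explicit
    stack. Each entry is (origin, direction, throughput, bounces_left)."""
    color = 0.0
    stack = [(origin, direction, 1.0, max_bounces)]
    while stack:
        o, d, throughput, bounces_left = stack.pop()
        hit = scene.intersect(o, d)
        if hit is None:  # escaped: add skylight for this branch
            color += throughput * scene.sky(d)
            continue
        color += throughput * hit.emissive
        if bounces_left == 0:
            continue
        # Push one child node per BRDF sample (e.g. ambient + specular);
        # each child carries the accumulated throughput of its branch.
        for sample_dir, brdf in scene.sample_brdfs(d, hit):
            stack.append((hit.position, sample_dir,
                          throughput * brdf * hit.albedo,
                          bounces_left - 1))
    return color

class ToyScene:
    """Minimal scene: the primary segment hits an emissive wall; every
    bounced ray escapes to a constant sky of brightness 1."""
    def intersect(self, o, d):
        if o == "camera":
            return Hit(position="wall", emissive=0.25, albedo=0.5)
        return None
    def sky(self, d):
        return 1.0
    def sample_brdfs(self, d, hit):
        # one ambient and one specular sample with their BRDF weights
        return [("ambient_dir", 0.5), ("specular_dir", 0.25)]

# Wall emission plus two escaping child rays: 0.25 + 0.25 + 0.125
trace_path_tree(ToyScene(), "camera", "view_dir", 1)  # -> 0.625
```

The bounce limit falls out naturally: nodes stop spawning children when `bounces_left` reaches zero, so allowing more bounces changes one number instead of duplicating code.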

And some naive GLSL code to trace a 2-bounce path-tree would look something like this:

vec3 Pathtrace(vec3 origin, vec3 direction, Surface rasterSurface) {
    vec3 outColor = vec3(0.0);
    
    // First ambient bounce off the rasterized primary surface
    vec3 ambientDir1 = GetAmbientSampleDir(direction, rasterSurface);
    vec3 ambientAbsorption1 = GetAmbientBRDF(direction, ambientDir1, rasterSurface) * rasterSurface.albedo;
    if (IsVisible(ambientAbsorption1)) {
        RayIntersection ambIntersect1 = ray_intersect(origin, ambientDir1);
        
        if (ambIntersect1.hit) {
            Surface ambSurface1 = GetSurface(ambIntersect1.data);
            outColor += ambSurface1.emissive * ambientAbsorption1;
            
            // Second ambient bounce; light from here is attenuated by
            // the first bounce's absorption as well
            vec3 ambientDir2 = GetAmbientSampleDir(ambientDir1, ambSurface1);
            vec3 ambientAbsorption2 = ambientAbsorption1 * GetAmbientBRDF(ambientDir1, ambientDir2, ambSurface1) * ambSurface1.albedo;
            
            if (IsVisible(ambientAbsorption2)) {
                RayIntersection ambIntersect2 = ray_intersect(ambIntersect1.origin, ambientDir2);
                
                if (ambIntersect2.hit) {
                    Surface ambSurface2 = GetSurface(ambIntersect2.data);
                    outColor += ambSurface2.emissive * ambientAbsorption2;
                } else {
                    outColor += GetSkyColor(ambIntersect1.origin, ambientDir2) * ambientAbsorption2;
                }
            }
            
            // Specular bounce off the first ambient hit
            vec3 specularDir2 = GetSpecularSampleDir(ambientDir1, ambSurface1);
            vec3 specularAbsorption2 = ambientAbsorption1 * GetSpecularBRDF(ambientDir1, specularDir2, ambSurface1) * ambSurface1.albedo;
            
            if (IsVisible(specularAbsorption2)) {
                RayIntersection specIntersect2 = ray_intersect(ambIntersect1.origin, specularDir2);
                
                if (specIntersect2.hit) {
                    Surface specSurface2 = GetSurface(specIntersect2.data);
                    outColor += specSurface2.emissive * specularAbsorption2;
                } else {
                    outColor += GetSkyColor(ambIntersect1.origin, specularDir2) * specularAbsorption2;
                }
            }
        } else {
            // Ray escaped the scene: add skylight
            outColor += GetSkyColor(origin, ambientDir1) * ambientAbsorption1;
        }
    }
    
    // Now do the entire thing again for the specular side of the tree
    ...
    
    return outColor;
}

There are a lot of problems with this naive method. It effectively bakes a depth-first traversal of the tree statically into the code, which means the code has to be duplicated for every additional bounce we want to allow. Furthermore, this code branches terribly: since every node of the tree is a distinct block of code, every thread in a warp has to wait on the worst case of all the threads in that warp. Imagine trying to implement this tree using the naive method:

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/00ecccc4-a8e2-418c-a46e-7e5757faafff/image.png

The escaped pixels for 1 bounce of ambient light. Almost none of the threads that computed these pixels will be able to move on to productive work.

Warps will be bottlenecked by the worst case of every tree that each thread has to execute. This is exacerbated by the fact that Monte Carlo sampling of ambient light demands that essentially every pixel is random and de-correlated from its neighbors, so it is statistically guaranteed that many threads will sit idle at any given time. Parallelism is not being properly exploited.
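
The warp bottleneck can be made concrete with a tiny cost model (purely illustrative; real GPU schedulers are more complex): in a SIMT warp, threads execute in lockstep, so the warp pays for its slowest thread rather than the average one.

```python
def warp_cost(thread_ray_counts, warp_size=32):
    """Lockstep cost model: each warp of threads pays its worst-case
    ray count. Returns (lockstep_cost, ideal_cost) in ray slots."""
    lockstep = 0
    for i in range(0, len(thread_ray_counts), warp_size):
        warp = thread_ray_counts[i:i + warp_size]
        lockstep += max(warp) * len(warp)  # every lane waits on the worst lane
    return lockstep, sum(thread_ray_counts)

# 31 threads whose paths escaped after 1 ray, plus one thread tracing 5:
# the whole warp pays the 5-ray worst case.
warp_cost([1] * 31 + [5])  # -> (160, 36)
```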