nmdist-log

Frustum Culling

Over the past week I’ve been working on frustum culling. This started with a desire to run nmdist at a consistent 60 fps on my Surface Pro 11. And… I didn’t succeed! But I’m still putting this effort down as a success. Let’s walk through it.

The idea behind frustum culling is to not draw things that aren’t within view. One way to achieve this would be to leverage the visibility data held within the map (BSP). However, not all maps will have this visibility data precomputed. So I decided to ignore the visibility data for now.

Instead, we’ll walk the BSP and determine which nodes and leaves are within our view frustum. If a leaf is within the view frustum, we’ll draw all the triangles associated with it. Our view frustum check will be pretty simple: we’ll test a bounding sphere against the 6 planes that make up our view frustum. The planes are derived from our camera transform using the technique described in Fast Extraction of Viewing Frustum Planes from the World-View-Projection Matrix by Gil Gribb and Klaus Hartmann. I’ll be calling this the “Standard” implementation.
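To make that concrete, here is a minimal sketch of the plane extraction and sphere test in Python. It is not nmdist’s actual code. It assumes a row-major 4x4 view-projection matrix applied to column vectors and an OpenGL-style clip space (z between -w and w); with a 0-to-w depth range the near plane would be the third row on its own. The Node type and its fields are hypothetical stand-ins for BSP nodes and leaves.

```python
import math
from dataclasses import dataclass, field


def extract_frustum_planes(m):
    """Return 6 normalized planes (a, b, c, d) from a 4x4 row-major matrix.

    Assumes clip = m @ (x, y, z, 1) and OpenGL-style clip space.
    A point is on the inner side of a plane when a*x + b*y + c*z + d >= 0.
    """
    r0, r1, r2, r3 = m
    raw = [
        [w + x for w, x in zip(r3, r0)],  # left
        [w - x for w, x in zip(r3, r0)],  # right
        [w + y for w, y in zip(r3, r1)],  # bottom
        [w - y for w, y in zip(r3, r1)],  # top
        [w + z for w, z in zip(r3, r2)],  # near
        [w - z for w, z in zip(r3, r2)],  # far
    ]
    planes = []
    for a, b, c, d in raw:
        # Normalize so the plane equation yields a true distance.
        length = math.sqrt(a * a + b * b + c * c)
        planes.append((a / length, b / length, c / length, d / length))
    return planes


def sphere_in_frustum(planes, center, radius):
    """True if the sphere is inside or intersects the frustum."""
    x, y, z = center
    for a, b, c, d in planes:
        if a * x + b * y + c * z + d < -radius:
            return False  # entirely behind this plane
    return True


@dataclass
class Node:
    # Hypothetical layout: a bounding sphere per node, no children on leaves.
    center: tuple
    radius: float
    children: list = field(default_factory=list)
    leaf_id: int = -1


def visible_leaves(node, planes, out):
    """Walk the tree, skipping any subtree whose sphere is outside the frustum."""
    if not sphere_in_frustum(planes, node.center, node.radius):
        return out
    if not node.children:
        out.append(node.leaf_id)
    for child in node.children:
        visible_leaves(child, planes, out)
    return out
```

The payoff of walking the tree is that rejecting a node’s sphere skips every leaf underneath it without testing them individually.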

The next method I investigated was to leverage indirect drawing. Instead of supplying the arguments to draw commands on the CPU, the GPU can draw based on commands that are in video memory. Typically you’d do this because you want to build the draw commands using a compute shader, but for now we’ll build the indirect buffer on the CPU. However, this leads to a problem. While we can record draw commands in our indirect buffer… that’s it. We can’t bind new textures between draw calls. At the time, I was binding a new texture for every face.
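As a rough illustration of what building the indirect buffer on the CPU involves, here is a sketch in Python. It assumes the common non-indexed layout of four 32-bit unsigned integers per draw (vertex count, instance count, first vertex, first instance), which is what Vulkan’s VkDrawIndirectCommand and D3D12’s D3D12_DRAW_ARGUMENTS use. Which graphics API nmdist targets isn’t stated here, and the (first_vertex, vertex_count) face ranges are a hypothetical input format.

```python
import struct

# One draw = four little-endian u32 values:
# vertex_count, instance_count, first_vertex, first_instance.
DRAW_ARGS = struct.Struct("<4I")


def build_indirect_buffer(visible_faces):
    """Pack one draw command per visible face.

    visible_faces: iterable of (first_vertex, vertex_count) pairs
    (a hypothetical format for this sketch).
    """
    buf = bytearray()
    for first_vertex, vertex_count in visible_faces:
        # A single instance per face, no instance offset.
        buf += DRAW_ARGS.pack(vertex_count, 1, first_vertex, 0)
    return bytes(buf)
```

The resulting bytes would be uploaded to a GPU buffer and consumed by a single multi-draw or a loop of indirect draws. Note that nothing in a command says which texture to use, which is exactly the limitation described above.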

The solution is to use a texture atlas. I’ve already been using a texture atlas for the lightmap data and text, so it was a natural next step. However, unlike lightmaps or text, Half-Life’s map textures are meant to tile. To preserve the tiling behavior, I don’t resolve the final atlas texture coordinates until the fragment/pixel shader, which also lets me bind just one texture for the entire map. To fix some seams, I also had to pad each texture in the atlas by repeating (pre-tiling) it around its borders for a configurable number of pixels.
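The per-pixel math is small enough to sketch. This Python version mirrors what a fragment shader might do, assuming each face carries the rectangle of its texture in normalized atlas coordinates; the rect layout and function name are made up for illustration and are not taken from nmdist.

```python
import math


def atlas_uv(u, v, rect):
    """Map a tiling (u, v) into a sub-rectangle of the atlas.

    rect = (x, y, w, h) in normalized atlas coordinates (hypothetical layout).
    """
    x, y, w, h = rect
    # Keep only the fractional part so the coordinate wraps like a repeating
    # texture would; floor handles negative coordinates correctly.
    fu = u - math.floor(u)
    fv = v - math.floor(v)
    return (x + fu * w, y + fv * h)


# A texture occupying a quarter-wide tile of the atlas, sampled at a
# coordinate that has wrapped around a few times.
print(atlas_uv(2.5, -0.25, (0.25, 0.5, 0.25, 0.25)))  # (0.375, 0.6875)
```

Because the wrap happens per pixel, the interpolated coordinates can run well past 1.0 across a large face and still land inside the right rectangle. Filtering near the rectangle’s edges can still pull in texels from neighboring textures, which is the kind of seam that border padding addresses.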

The final method was to build the draw commands on the GPU using a compute shader. This is similar to the previous method, except that I no longer walk the BSP. Instead each thread on the GPU processes a leaf directly.

So how does all of that perform? Well, a bit disappointingly, there’s barely a difference when running fullscreen. I tested on the machines below, loading c1a0 and staring at the large door (quite a bit of the level is within the view frustum from there).

| Machine Name | CPU Model | GPU Model | Resolution/Refresh |
| --- | --- | --- | --- |
| Dev Machine | AMD Ryzen 9 5950X | RTX 4080 SUPER | 2560x1440 @ 144Hz |
| Test Machine | AMD Ryzen 7 5700X | Intel Arc A380 | 1920x1080 @ 60Hz |
| Surface Pro 11 | Snapdragon X Elite | Adreno X1-85 | 2880x1920 @ 120Hz |

And these are the framerates:

| Machine Name | No Culling | Standard | Indirect/CPU | Indirect/GPU |
| --- | --- | --- | --- | --- |
| Dev Machine | 144 fps | 144 fps | 144 fps | 144 fps |
| Test Machine | 60 fps | 60 fps | 60 fps | 60 fps |
| Surface Pro 11 | 32 fps | 30 fps | 30 fps | 30 fps |

So was it all for nothing? Well, not exactly. Even though I never configured it, the game seems to be running with VSync enabled, which caps the dev and test machines at their refresh rates. Frustum culling may be buying headroom there that these numbers can’t show. What’s strange is that if I run nmdist in windowed mode on my dev machine, the NVIDIA driver seems to kick into VRR mode? There I can see a difference, but even then it’s mostly because of the atlasing rather than the frustum culling: with no culling, the framerate goes from 660 fps to 1300 fps just from atlasing.

Keep in mind, Half-Life is 25 years old at this point. The geometry in its levels should be trivial for most modern hardware. I’ll have to build some test levels to really show the difference my frustum culling makes. But for now, I’m happy with the performance win from texture atlasing, and with what I learned about indirect drawing. I’m sure I’ll revisit performance again before the end of this project.