Last time I wrote about my quest to get nmdist running on my Surface Pro 11 at a consistent 60 fps, and I noted how the performance numbers I collected were more or less useless due to VSync. I did some retooling and got proper numbers with VSync disabled, and there were some surprises! As a reminder, these are the machines I tested on:
| Machine Name | CPU Model | GPU Model | Resolution/Refresh |
|---|---|---|---|
| Dev Machine | AMD Ryzen 9 5950X | RTX 4080 SUPER | 2560x1440 @ 144Hz |
| Test Machine | AMD Ryzen 7 5700X | Intel ARC A380 | 1920x1080 @ 60Hz |
| Surface Pro 11 | Snapdragon X Elite | Adreno X1-85 | 2880x1920 @ 120Hz |
Here are the new numbers:
| Machine Name | No Culling | Standard | Indirect/CPU | Indirect/GPU |
|---|---|---|---|---|
| Dev Machine | 1000 fps | 940 fps | 510 fps | 1030 fps |
| Test Machine | 557 fps | 550 fps | 288 fps | 468 fps |
| Surface Pro 11 | 60 fps | 63 fps | 50 fps | 30 fps |
These numbers are more useful! And you might have noticed that I didn’t actually fail when I thought I had! Both the “No Culling” and “Standard” codepaths hit 60 fps! My only guess is that somewhere along the display chain, VSync is busted on this device and locked me to 30 fps before, despite the renderer being able to consistently stay at or above 60 fps. It also seems that I was right that the NVIDIA driver was forcing VRR mode when windowed, and I’m still baffled by that behavior.
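Frame rates are not linear, so it can help to read the same table as frame times. Here is a minimal Python sketch that does the conversion. The fps values are copied from the table above, while the names `RESULTS`, `frame_time_ms`, and `frame_times` are made up for illustration and are not part of nmdist.

```python
# Frame rates copied from the results table (fps, VSync disabled).
RESULTS = {
    "Dev Machine": {"No Culling": 1000, "Standard": 940,
                    "Indirect/CPU": 510, "Indirect/GPU": 1030},
    "Test Machine": {"No Culling": 557, "Standard": 550,
                     "Indirect/CPU": 288, "Indirect/GPU": 468},
    "Surface Pro 11": {"No Culling": 60, "Standard": 63,
                       "Indirect/CPU": 50, "Indirect/GPU": 30},
}


def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def frame_times(results):
    """Return the same nested mapping with fps replaced by ms per frame."""
    return {
        machine: {path: round(frame_time_ms(fps), 2) for path, fps in paths.items()}
        for machine, paths in results.items()
    }


if __name__ == "__main__":
    for machine, paths in frame_times(RESULTS).items():
        row = ", ".join(f"{path}: {ms} ms" for path, ms in paths.items())
        print(f"{machine}: {row}")
```

Read this way, the drop from 1000 fps to 510 fps on the Dev Machine costs a bit under 1 ms per frame, while the drop from 60 fps to 30 fps on the Surface Pro 11 costs about 16.7 ms, which is an entire frame budget at 60 fps.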
Another interesting finding is that on the Surface Pro 11, Indirect/CPU performed better than Indirect/GPU. I can only speculate why this is, but I’m guessing that indirect drawing is emulated on this device, and the fact that I’m writing to the indirect buffer from the CPU may be giving the Qualcomm driver an opportunity to skip GPU readback when emulating the indirect draw call.
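For anyone unfamiliar with what "writing to the indirect buffer from the CPU" means, here is a rough Python sketch of the idea. It is not nmdist's code. The record layout mirrors Vulkan's `VkDrawIndexedIndirectCommand` (five 32-bit fields), and the culling strategy shown, setting the instance count to zero for hidden batches, is just one common approach that I'm assuming for illustration. The names `DRAW_COMMAND` and `build_indirect_buffer` are hypothetical.

```python
import struct

# Five 32-bit fields in the same order as Vulkan's VkDrawIndexedIndirectCommand:
# indexCount, instanceCount, firstIndex, vertexOffset (signed), firstInstance.
DRAW_COMMAND = struct.Struct("<IIIiI")


def build_indirect_buffer(batches, visible):
    """Pack one draw record per batch into a byte buffer.

    batches: list of (index_count, first_index, vertex_offset) tuples.
    visible: list of bools, one per batch, produced by CPU-side culling.

    Assumption for illustration: a culled batch keeps its slot in the
    buffer but gets instanceCount = 0, so the GPU draws nothing for it.
    """
    buf = bytearray()
    for (index_count, first_index, vertex_offset), is_visible in zip(batches, visible):
        instance_count = 1 if is_visible else 0
        buf += DRAW_COMMAND.pack(
            index_count, instance_count, first_index, vertex_offset, 0
        )
    return bytes(buf)


if __name__ == "__main__":
    batches = [(36, 0, 0), (600, 36, 24), (1200, 636, 424)]
    visible = [True, False, True]
    data = build_indirect_buffer(batches, visible)
    for record in DRAW_COMMAND.iter_unpack(data):
        print(record)
```

In a CPU-written setup like this, the application already knows the contents of every record when it uploads the buffer. In the GPU-written case, those values only exist in GPU memory after a compute pass has run.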
Similarly, I believe that Intel’s Alchemist GPUs emulate indirect drawing. This is supported by a comment made by Intel’s Tom Petersen when interviewed by Gamers Nexus, which you can watch here. I’m very interested to get my hands on a Battlemage card, as that shouldn’t be emulating indirect drawing! Too bad they’re sold out on Newegg…
Finally, these numbers seem to back up my conclusions from the previous post. Most of the performance gains were achieved via atlasing the map textures, and frustum culling didn’t do a whole lot. That’s to be expected, given that Half-Life is a 25-year-old game that had tiny triangle budgets by today’s standards. I’m still interested in doing some testing with custom maps that try to push the nmdist engine, as well as getting my hands on an Intel ARC B580 to test with.