Right after writing the previous post on pre-pass lighting, I began running some tests to see how it compares to the old deferred renderer. The results I got were pretty interesting, so I thought I might as well share them. Also note that this post may be a bit more technical than the previous one.
The good thing about these renderers is that they both share the same basic material data, so I can use the same data for both HPL2 and HPL3. HPL3 has a few more attributes for decals, but for the tests it is easy to just skip them. When setting up the test I went with a very basic scene: just the same box model rendered several times, a floor and lights. Sometimes it is best to test with proper game scenes, but I wanted something that could be easily tweaked and gave simpler output. This means that the tests are not 100% representative of in-game performance, but even testing a level in-game is not that either, as the framerate varies a lot depending on where in the level one is. Proper benchmarking usually involves some kind of fly-through, but that is out of the scope of what I intended to do.
Note that the HPL2 test was built in Visual Studio 2003, while HPL3 uses the 2010 version. I do not think this should matter much, though, even if the optimization routines differ, simply because pretty much all of the work is done on the GPU. The graphics card I did all my testing on is a Radeon 5850 HD (other cards were tried for some tests). As a final note, all of the data is given as average frame time (in milliseconds!) and not as frames per second. As Emil Persson points out, FPS is not a very good way to compare performance: since frame rate is the reciprocal of frame time, the same saving in milliseconds shows up as very different FPS gains depending on the starting point.
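To make the frame time point concrete, here is a small Python sketch with made-up sample values (these are not measurements from the tests below). It shows that averaging per-frame FPS values overstates performance compared with converting the average frame time, and that a fixed 1 ms saving looks very different in FPS depending on the baseline. The function names are just for illustration.

```python
def average_frame_time_ms(frame_times_ms):
    """Mean of the per-frame times, in milliseconds."""
    return sum(frame_times_ms) / len(frame_times_ms)


def fps_from_ms(ms):
    """Frames per second corresponding to a frame time in milliseconds."""
    return 1000.0 / ms


def average_of_fps(frame_times_ms):
    """Naive alternative: convert each frame to FPS first, then average."""
    return sum(fps_from_ms(t) for t in frame_times_ms) / len(frame_times_ms)


samples = [10.0, 30.0]  # hypothetical: one fast frame, one slow frame

mean_ms = average_frame_time_ms(samples)
print(f"mean frame time: {mean_ms:.1f} ms ({fps_from_ms(mean_ms):.1f} FPS)")
print(f"mean of per-frame FPS: {average_of_fps(samples):.1f} FPS")

# The same 1 ms saving is worth very different FPS gains depending on the baseline.
for base_ms in (5.0, 20.0):
    gain = fps_from_ms(base_ms - 1.0) - fps_from_ms(base_ms)
    print(f"{base_ms:.0f} ms -> {base_ms - 1.0:.0f} ms: +{gain:.1f} FPS")
```

With these samples the mean frame time is 20 ms (50 FPS), while the mean of the per-frame FPS values is about 66.7 FPS. Shaving 1 ms off a 5 ms frame adds 50 FPS, but off a 20 ms frame only about 2.6 FPS, which is why frame times are the easier numbers to compare.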
Now with the setup information out of the way, let's get down to the specifics. I first started out with a scene like this:
1 x box, x-z plane floor, 1 x spot light + shadow
which gave me the following results:
This means that, given a simple scene like this, the old renderer is actually faster! This is not that strange though, since the scene does not have many lit screen pixels, most of the image being sky. Hence, the extra pass added by the pre-pass renderer matters more than any lighting speed-ups. Also, the reduction in draw buffers (three to two) in the g-buffer does not make up for the additional pass.
4000 x boxes, 1 x point light, x-z plane floor
As expected, when there are a lot of things to render, pre-pass lighting is even slower. The additional pass shows in the performance. Bear in mind though that 4000 objects is quite a lot, and an important thing for good performance on GPUs is to have as few draw calls as possible.
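A rough way to see why the extra pass hurts with many objects is to count the draw calls spent on objects. The Python sketch below is a simplified model, not how HPL2 or HPL3 actually submit geometry: it assumes one draw call per object per geometry pass (no batching or instancing) and that both renderers use the early-z pass, as in these tests. The pass names are hypothetical.

```python
# Simplified model: geometry passes that draw every object, per renderer.
# Assumptions: early-z enabled in both (as in these tests), one draw call per
# object per pass, no batching/instancing. The light pass draws light volumes
# rather than objects, so it is left out of the count.
GEOMETRY_PASSES = {
    "deferred": ["early_z", "g_buffer"],
    "pre_pass": ["early_z", "g_buffer", "material"],
}


def object_draw_calls(renderer: str, num_objects: int) -> int:
    """Draw calls spent on objects per frame under the model above."""
    return num_objects * len(GEOMETRY_PASSES[renderer])


for renderer in GEOMETRY_PASSES:
    print(f"{renderer}: {object_draw_calls(renderer, 4000)} object draw calls")
```

For 4000 boxes this model gives 8000 object draw calls for deferred shading and 12000 for pre-pass lighting, so the second geometry pass scales directly with the object count.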
1 x box, 1000 x point light, x-z plane floor
As seen, once the scene is filled with lights, pre-pass lighting is faster, but only by a slight amount, especially considering the huge number of lights. (I later realised that the actual lit screen pixels were really few, something fixed later on in test #5.)
4000 x boxes, 1000 x point light, x-z plane floor
Doing a really stressful test (the number of lights and objects is truly massive), it seems like the old deferred renderer wins out. This was actually a bit unexpected and disappointing to me, as I thought that pre-pass lighting should not be this far behind. But taking the small difference in test 3 into account, it is not that surprising. Nonetheless, these tests clearly show that pre-pass lighting is far from a giant speed-up compared to deferred shading, and it actually seems slower in most cases.
I also tried skipping the early-z pass for pre-pass lighting (I use early-z in both renderers in all other tests). This is basically a pass where the z-buffer is set up, and it makes sure later passes only draw visible pixels. From reading Crytek papers, it does not seem like the Crysis 2 engine has this, though (and the same seems true for other engines), so I did a quick and dirty test without it and got this data: 48.7 (+2.5%)
This means that even without the early-z pass, pre-pass lighting was still slower. However, I did not attempt to reduce overdraw (like sorting front to back), so there may be room for optimizations here. On the other hand, when rendering front to back there will be a lot more state switching, as you cannot sort by texture etc. as efficiently, so I wonder if the numbers might not even be worse in a more realistic scenario.
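The overdraw versus state-switching trade-off can be illustrated with a toy software depth buffer. In this Python sketch the scene, the texture names and the `Draw` type are all hypothetical; each object is a flat quad at one constant depth covering the same 2x2 pixels. It counts how many fragments get shaded for a given draw order, with and without a depth-only pre-pass, and how many texture switches that order causes.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class Draw:
    name: str
    depth: float       # one constant depth per object keeps the toy model simple
    texture: str
    pixels: frozenset  # (x, y) pixels the object covers on screen


def shaded_fragments(draws, z_prepass=False):
    """Count fragments that run the (expensive) shading work."""
    depth = {}
    if z_prepass:
        # Depth-only pass: record the nearest depth per pixel, no shading yet.
        for d in draws:
            for p in d.pixels:
                depth[p] = min(depth.get(p, INF), d.depth)
        # Shading pass with an EQUAL depth test: only visible fragments shade.
        return sum(1 for d in draws for p in d.pixels if d.depth == depth[p])
    shaded = 0
    for d in draws:  # single pass with a LESS depth test and depth writes
        for p in d.pixels:
            if d.depth < depth.get(p, INF):
                depth[p] = d.depth
                shaded += 1
    return shaded


def texture_switches(draws):
    """Count texture changes between consecutive draws."""
    return sum(1 for a, b in zip(draws, draws[1:]) if a.texture != b.texture)


# Hypothetical scene: four overlapping boxes, alternating between two textures.
quad = frozenset((x, y) for x in range(2) for y in range(2))
scene = [
    Draw("a", 1.0, "wood", quad),
    Draw("b", 2.0, "metal", quad),
    Draw("c", 3.0, "wood", quad),
    Draw("d", 4.0, "metal", quad),
]
orders = {
    "back to front": sorted(scene, key=lambda d: -d.depth),
    "front to back": sorted(scene, key=lambda d: d.depth),
    "by texture": sorted(scene, key=lambda d: (d.texture, d.depth)),
}
for label, order in orders.items():
    print(
        f"{label}: shaded={shaded_fragments(order)}, "
        f"shaded with z pre-pass={shaded_fragments(order, z_prepass=True)}, "
        f"texture switches={texture_switches(order)}"
    )
```

In this toy scene, back-to-front order shades 16 fragments and front-to-back only 4, but front-to-back needs 3 texture switches against 1 when sorting by texture. Sorting by texture shades 8 fragments without the pre-pass and 4 with it, at the cost of drawing every object one extra time. The exact depth comparison works here because the same value is written and compared; on a GPU the EQUAL test relies on both passes producing identical depth values.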
I also tried this test on a few other cards (again with full early-z testing):
Geforce 240gt: 125, 137 (+9.6%)
Geforce 320M: 240, 240 (+/- 0%)
This gave an indication that on some cards pre-pass lighting may in fact be better, and that it might not be as clear-cut as the initial tests seemed to show.
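The result pairs appear to be listed as deferred shading frame time first and pre-pass lighting second, with the percentage being the change relative to deferred shading; computing it that way reproduces the numbers given in this post. A minimal Python sketch of that calculation (the function name is my own):

```python
def relative_difference(deferred_ms: float, pre_pass_ms: float) -> float:
    """Percent change in frame time from deferred shading to pre-pass lighting.

    Positive means pre-pass lighting is slower (higher frame time).
    """
    return (pre_pass_ms - deferred_ms) / deferred_ms * 100.0


results = {
    "Geforce 240gt": (125.0, 137.0),
    "Geforce 320M": (240.0, 240.0),
}
for card, (deferred, pre_pass) in results.items():
    print(f"{card}: {relative_difference(deferred, pre_pass):+.1f}%")
```

For the two cards above this prints +9.6% and +0.0%. Because the difference is taken relative to the deferred time, a negative value means pre-pass lighting is that much faster.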
As a final variation on this test, I added illumination maps to all textures, a feature that needs an extra pass in the old engine. I also removed the height map rendering. This gave me: 50.6, 50.0 (-1.2%)
This is a very small speed-up, considering that the methods now have the same number of passes and that pre-pass lighting has faster light rendering and a smaller g-buffer.
488 x boxes, 30 x point light, x-z plane floor
Radeon 5850 HD: 7.4, 7.8 (+5.4%)
Geforce 240gt: 18, 19 (+5.5%)
Geforce 320M: 50.0, 45.5 (-9%)
Geforce 9800gtx: 9.5, 9.5 (0%)
In this test I switched to a more realistic number of lights and draw calls. I also aligned the lights so the lit pixels covered the whole screen, which I did not do above. As can be seen, on my PC (the 5850) deferred shading still wins, but on a less powerful card (the 320M) pre-pass lighting is considerably faster. This difference may be a bandwidth issue, and some cards may have trouble pushing the data amounts needed for deferred shading.
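To get a feel for the bandwidth argument, here is some back-of-the-envelope arithmetic in Python. It assumes 32-bit (4-byte) render targets, a hypothetical 1280x720 resolution, 30 lights that each cover the whole screen, and that each light reads every g-buffer target once per lit pixel (three for the deferred renderer, two for pre-pass lighting). It ignores writes to the light accumulation buffer, the pre-pass material pass and any caching or compression, so it only shows how the per-pixel data volume scales, not the real memory traffic of either renderer.

```python
BYTES_PER_TARGET = 4  # assumption: 32-bit render targets such as RGBA8


def gbuffer_write_mb(width, height, render_targets):
    """MB written when every pixel of every g-buffer target is filled once."""
    return width * height * render_targets * BYTES_PER_TARGET / 1e6


def light_pass_read_mb(lit_pixels, targets_read):
    """MB fetched by the light pass: one read per target for each lit pixel."""
    return lit_pixels * targets_read * BYTES_PER_TARGET / 1e6


WIDTH, HEIGHT = 1280, 720  # hypothetical resolution
LIGHTS = 30                # as in test 5, assuming each light covers the screen
lit_pixels = WIDTH * HEIGHT * LIGHTS

for name, targets in (("deferred", 3), ("pre-pass", 2)):
    print(
        f"{name}: g-buffer fill {gbuffer_write_mb(WIDTH, HEIGHT, targets):.1f} MB, "
        f"light reads {light_pass_read_mb(lit_pixels, targets):.1f} MB per frame"
    )
```

Under these assumptions the deferred renderer fills 11.1 MB of g-buffer and reads 331.8 MB in the light pass per frame, against 7.4 MB and 221.2 MB for pre-pass lighting. The one-third reduction is exactly the three-to-two change in targets, which is the kind of saving that would matter most on a card with limited bandwidth.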
I also tweaked this test and turned down the number of draw calls a bit:
316 x boxes, 30 x point light, x-z plane floor
Giving: 6.4, 6.6 (+3%)
This further reduced the difference, and with the hackish removal of early-z, pre-pass lighting plunged down to: 5.2 (-18%)
Even though this removal of early-z is not really realistic, the results show that I need to investigate it, something I will do as soon as I get a more proper scene up and running.
Finally, I also tried giving all the boxes illumination (and turned the early-z test back on):
6.8, 6.6 (-2.9%)
This clearly shows how you get illumination almost for free with pre-pass lighting, while it costs a bit more with the deferred shader. This is not surprising though, since the deferred shader requires an extra pass for it, but it hints that additional effects can be implemented a lot more efficiently when using pre-pass lighting.
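The pass counts behind the illumination results can be written down as a simplified model. This is my reading of the post rather than the actual HPL2/HPL3 code: early-z is on, the deferred renderer needs a separate geometry pass for illumination, and pre-pass lighting is assumed to sample the illumination map in the material pass it already has. The pass names are hypothetical, and other passes (such as the height map rendering mentioned earlier) are left out.

```python
def frame_passes(renderer: str, illumination: bool) -> list:
    """Simplified per-frame pass list (early-z on); pass names are hypothetical."""
    passes = ["early_z", "g_buffer", "lights"]
    if renderer == "deferred":
        if illumination:
            # Deferred shading adds a separate geometry pass for illumination.
            passes.append("illumination")
    elif renderer == "pre_pass":
        # The material pass re-renders all geometry anyway, so (by assumption)
        # the illumination map is sampled there without an extra pass.
        passes.append("material")
    else:
        raise ValueError(f"unknown renderer: {renderer}")
    return passes


for renderer in ("deferred", "pre_pass"):
    for illumination in (False, True):
        count = len(frame_passes(renderer, illumination))
        print(f"{renderer}, illumination={illumination}: {count} passes")
```

In this model the deferred renderer goes from 3 to 4 passes when illumination is turned on, while pre-pass lighting stays at 4 either way, which matches the direction of the results above.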
The tests clearly show that my previous assumption, that light rendering with pre-pass lighting would be significantly faster, was incorrect. It is a bit faster, but only noticeably so when really stretching the limit, and then only by a small fraction. This makes me conclude that one should not use pre-pass lighting to get faster light rendering. However, as can be seen in the test with the Geforce 320M, the pre-pass lighting approach matters a lot more on older hardware, and it may actually be of greater use there.
There are no vast differences between the methods, though, and the choice should rather be based on other merits. Since pre-pass lighting allows for so much more variety in materials, I will keep it for HPL3, but I will no longer be expecting any rise in framerate.
I hope this post will prove useful for those who are thinking of using either rendering method, and for the rest it may be an interesting insight into how testing is carried out (at least how I do it). Again, sorry for the lack of pretty images, which I promise to make up for!