Good evening, our loyal supporters. Welcome back to the same spiderchannel, at the same spidertime.
This biweek's plan was MLAA and SSAO. In addition, some remaining issues with lighting were squashed, and some unplanned effects were added, business as usual.
Selective bloom, requested by artists, was added. Now you can mark objects as causing bloom, even if their native brightness is not enough to catch them in the normal siphoning.
The test object in this picture is not the best one; you will likely see this effect used in the green crystals of the mine, and the windows of the mansion. The aforementioned windows may get something else too, but that's a secret, so don't tell anyone.
This is done by keeping a list of all such marked objects, and rendering that list with color and depth writes off, only writing the stencil. This results in 4x-8x drawing speed, as most graphics cards have that path optimized for uses such as Z-prepass.
This stencil is then used to copy the affected pixels from the picture to the bloom texture.
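For the curious, the stencil copy can be pictured with a tiny CPU-side sketch in Python. The real thing runs on the GPU; the function name `copy_marked_pixels` and the 2x3 test frame are made up for illustration and are not taken from the engine.

```python
def copy_marked_pixels(picture, stencil, bloom):
    """Copy every pixel whose stencil value is set from the picture to the bloom texture.

    All three arguments are lists of rows with identical dimensions. The bloom
    texture is modified in place, so whatever the normal brightness pass
    already put there is kept for unmarked pixels.
    """
    for y, row in enumerate(picture):
        for x, color in enumerate(row):
            if stencil[y][x]:
                bloom[y][x] = color
    return bloom


# Hypothetical 2x3 frame; colors are (r, g, b) tuples.
picture = [[(0.2, 0.9, 0.3), (0.1, 0.1, 0.1), (0.5, 0.5, 0.5)],
           [(0.0, 0.4, 0.0), (0.3, 0.3, 0.3), (0.2, 0.8, 0.2)]]
# Stencil as left behind by drawing the marked objects with color and depth writes off.
stencil = [[1, 0, 0],
           [0, 0, 1]]
black = (0.0, 0.0, 0.0)
bloom = [[black] * 3 for _ in range(2)]

copy_marked_pixels(picture, stencil, bloom)
```

Only the two marked pixels end up in the bloom texture, regardless of how dim they are.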
The bloom threshold was made configurable per-track. As you can see in the picture, the left version is clearly not bloom over-use.
The right is the default threshold; the left was altered a bit to show how glorious a bit of bloom looks.
Lightmaps were enabled in the new system. Before, after, the map.
This is a splash of older tech, and quite laborious for the artists too. However, four tracks currently have it, so it should be used there.
MLAA, or Morphological Anti-Aliasing, is a shader-based AA technique. This means it can run in situations where traditional MSAA cannot, or catch edges such as those inside transparent textures better. It can usually reach better quality at the same speed; the quoted numbers for this particular technique are MSAA 8x quality at MSAA 2x speed, but the exact specifics obviously depend on the scene.
While no longer the state of the art, Jimenez's MLAA was used due to the author's familiarity with the technique, and the very decent results it still achieves. The most common competitor, FXAA, is a bit faster, but results in inferior quality - it blurs textures noticeably, which should not happen.
A short overview of the technique follows. First, any jagged edges are detected and marked for processing using the stencil. This greatly improves the performance of the technique, since the heavy shader for the following pass is only run on the needed pixels.
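As a rough illustration of that first pass, here is a small Python sketch. It assumes edges are found by comparing luma against the left and top neighbours with a threshold of 0.1; the function names and the threshold are my assumptions for the example, not the shader used in the game.

```python
def luma(rgb):
    """Rec. 709 luma of an (r, g, b) color."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def detect_edges(image, threshold=0.1):
    """Return, per pixel, a pair (edge to the left, edge to the top)."""
    height, width = len(image), len(image[0])
    edges = [[(False, False)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            here = luma(image[y][x])
            # Pixels on the border have no neighbour, so they compare to themselves.
            left = luma(image[y][x - 1]) if x > 0 else here
            top = luma(image[y - 1][x]) if y > 0 else here
            edges[y][x] = (abs(here - left) > threshold,
                           abs(here - top) > threshold)
    return edges


def stencil_from_edges(edges):
    """Mark every pixel that has at least one edge, so later passes can skip the rest."""
    return [[1 if any(pair) else 0 for pair in row] for row in edges]


black, white = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
image = [[black, black, white],
         [black, black, white],
         [black, black, black]]
edges = detect_edges(image)
stencil = stencil_from_edges(edges)
```

In this toy picture only the right-hand column gets marked, so the heavy pass would touch three pixels out of nine.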
In the second pass, each edge is classified according to a pre-generated map of edges. This map tells the following pass where to fetch information from - essentially, which pixels to combine and the weights for them.
The final pass simply overlays the combined pixels on the picture. Before - after:
Dear reader, meet JJ.
Have no fear though, the glistering brightness that fills you with awe is a bit toned-up in this picture; sadly, the version to make it in the game will be more subtle. This comes as a disappointment to fans everywhere, and there have already been mass protests of over a thousand people outside my residence.
Levels can now choose to have cloud shadows. This completely fake system only fits levels with a bright, daytime sky with clouds, but there are a few such levels, and it improves them quite a bit.
It's a fairly subtle effect, and so hard to catch from a still picture, but when moving in the wind it's quite nice.
Itching to hear more I bet? This very fake effect is done inside the sun light, as that's the only thing that should be affected by it. A straight planar top-down mapping is created based on the pixel's world-space position; it's offset by the wind, and then used for a look-up in a specially made texture.
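The lookup described above fits in a few lines. The Python sketch below assumes Y is the up axis, nearest-texel sampling with repeat wrapping, and a texture that stores a sunlight multiplier; `cloud_shadow`, `scale` and the 2x2 texture are hypothetical names and values chosen for the example.

```python
import math


def cloud_shadow(world_pos, wind_offset, cloud_tex, scale):
    """Return the sunlight multiplier for a pixel at the given world-space position.

    world_pos   -- (x, y, z), with y assumed to be the up axis
    wind_offset -- (u, v) offset in texture space, growing over time with the wind
    cloud_tex   -- list of rows of multipliers in [0, 1]
    scale       -- world units to texture coordinates (assumed tuning value)
    """
    x, _, z = world_pos  # height is ignored: this is a straight top-down projection
    u = x * scale + wind_offset[0]
    v = z * scale + wind_offset[1]
    height, width = len(cloud_tex), len(cloud_tex[0])
    # Repeat wrapping, nearest texel.
    tx = math.floor(u * width) % width
    ty = math.floor(v * height) % height
    return cloud_tex[ty][tx]


# Hypothetical 2x2 cloud texture: 1.0 is full sun, 0.5 is under a cloud.
cloud_tex = [[1.0, 0.5],
             [0.5, 1.0]]

in_sun = cloud_shadow((0.0, 0.0, 0.0), (0.0, 0.0), cloud_tex, scale=0.25)
in_shade = cloud_shadow((2.0, 3.0, 0.0), (0.0, 0.0), cloud_tex, scale=0.25)
# Same spot as in_sun, a little later: the wind has pushed the cloud over it.
later = cloud_shadow((0.0, 0.0, 0.0), (0.5, 0.0), cloud_tex, scale=0.25)
```

The sun light's contribution would then simply be multiplied by the returned value.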
The effect is quite cheap. It does come with a downside: since there is currently no shadow mapping, the cloud shadows (just like sunlight) are visible inside buildings, caves and such. This may limit the levels where it can be used a bit for now, but with the overworld and hacienda, those areas are few and having it there is not distracting.
Finally, SSAO, or Screen-Space Ambient Occlusion.
"Mama, I want contact shadows and dark creases" "Sure thing, hon"
Another fake effect with no real-world equivalent, SSAO usually enhances scenes quite a bit. It approximates global light bouncing by making approximate ray traces around a pixel, checking if those points can possibly occlude some light from reaching this pixel.
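To give a feel for the idea, here is a heavily simplified, depth-only variant in Python. It is not the implementation used in the game: it just counts how many nearby samples sit closer to the camera than the pixel itself. The names `ambient_occlusion`, `bias` and `max_range`, and their values, are assumptions for the example.

```python
def ambient_occlusion(depth, x, y, offsets, bias=0.02, max_range=0.2):
    """Return an ambient factor in [0, 1]; 1 means nothing occludes this pixel.

    depth   -- list of rows of linear depth values (larger is further away)
    offsets -- list of (dx, dy) pixel offsets to sample around (x, y)
    """
    height, width = len(depth), len(depth[0])
    center = depth[y][x]
    occluded = 0
    for dx, dy in offsets:
        # Clamp to the screen so border pixels sample the edge.
        sx = min(max(x + dx, 0), width - 1)
        sy = min(max(y + dy, 0), height - 1)
        diff = center - depth[sy][sx]  # positive when the sample is closer to the camera
        # The bias avoids self-occlusion on flat surfaces; the range check ignores
        # occluders so far in front that they are unlikely to be related.
        if bias < diff < max_range:
            occluded += 1
    return 1.0 - occluded / len(offsets)


offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]

flat = [[0.5] * 3 for _ in range(3)]
# A floor at depth 0.5 with a wall at depth 0.4 along the right edge.
wall = [[0.5, 0.5, 0.4],
        [0.5, 0.5, 0.4],
        [0.5, 0.5, 0.4]]

open_floor = ambient_occlusion(flat, 1, 1, offsets)
next_to_wall = ambient_occlusion(wall, 1, 1, offsets)
```

One of the four samples next to the wall is occluded, so that pixel gets darkened a little, which is the contact shadow Mama promised.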
As you can probably imagine, it is expensive. Several tradeoffs were made in order to get it to run on the majority of the user base; it should still look acceptable, while the FPS hit is much smaller than with the usual implementation.
Low - high:
The low option runs at quarter resolution, taking only a single sample per step, without randomness or an edge-aware blur.
Each of the listed parts lessens its quality somewhat, while increasing speed. First of all, the lower resolution prevents small creases from being detected, limiting the effect to contact shadows only. The upside is that there is 16 times less area to process (a quarter of the resolution along each axis), directly resulting in a 16x speedup over the high version.
Second, since only a single sample can be taken per step, the information we have to work with is the normal plus a linearized, 8-bit depth. This results in some false occlusions when the occluder is outside the hemisphere; the alternative, however, making positions available, would double the cost of this effect.
The randomness in SSAO is usually implemented via a small normal texture. Its cost (beyond flicker that always results from randomness) is that it changes the step's texture read to a dependent one - named using great wit since the coordinates depend on another texture read.
A dependent texture read is a costly operation, since it prevents caches from pre-fetching the actual read. The exact cost cannot be calculated in a highly parallel pipelined system, but assuming it delays each read by 300 clock cycles (a typical cache miss), a core clock of 400 MHz, and 16 steps, each pixel is delayed by about 12 microseconds. The delay is highly variable due to inter-dependencies, so take that estimate with a pinch of salt.
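The back-of-the-envelope number above is just one multiplication and one division. The Python snippet below spells it out, treating the stall of 300 per read as clock cycles (which is what dividing by the clock rate implies); the function name is made up for the example.

```python
def dependent_read_delay_us(stall_cycles=300, core_clock_hz=400e6, steps=16):
    """Worst-case delay per pixel, in microseconds, if every step stalls fully."""
    total_cycles = stall_cycles * steps       # 300 * 16 = 4800 cycles
    seconds = total_cycles / core_clock_hz    # 4800 / 400 MHz
    return seconds * 1e6


delay = dependent_read_delay_us()
print(f"about {delay:.0f} microseconds per pixel")
```

This assumes no latency hiding at all, which a real GPU does plenty of, so it is an upper bound on the stall rather than a measurement.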
The first couple of programmable desktop GPUs could only prefetch textures based on coordinates from a varying (that is, an export from the vertex shader) or a hardcoded value. This is still the case for many mobile GPUs, but any desktop GPU from about 2004 onwards can also prefetch through a set of math operations, such as addition and multiplication, making our set of vectors completely pre-fetchable. Should the target be a mobile GPU, this limitation could be circumvented by calculating those vectors in the vertex shader and passing them as varyings.
As for using a normal Gaussian blur over an edge-aware one, it results in some of the occlusions bleeding over a bit, but is about 4x the speed of the alternative. A 2x component comes from the fact that the edge-aware blur cannot take advantage of the bilinear hardware, requiring it to do twice as many samples, and the other 2x from requiring a branch for each sample.
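The bilinear trick behind that first 2x can be shown in one dimension. Two neighbouring taps with weights w1 and w2 are replaced by a single filtered fetch placed between them, with weight w1 + w2. The Python sketch below emulates the hardware filter on the CPU; the 9-tap binomial kernel and all function names are assumptions for the example, not the kernel used in the game.

```python
import math


def texel(row, i):
    """Clamped read of a single texel."""
    return row[min(max(i, 0), len(row) - 1)]


def sample_linear(row, pos):
    """Emulate one linearly filtered texture fetch at a fractional position."""
    lo = math.floor(pos)
    frac = pos - lo
    return texel(row, lo) * (1.0 - frac) + texel(row, lo + 1) * frac


def blur_discrete(row, x, weights):
    """Plain blur: weights[0] is the center tap, weights[i] applies at x-i and x+i."""
    total = weights[0] * texel(row, x)
    for i in range(1, len(weights)):
        total += weights[i] * (texel(row, x - i) + texel(row, x + i))
    return total


def merge_taps(weights):
    """Merge side taps pairwise into (offset, weight) taps for filtered fetches.

    Expects an even number of side taps, so that they pair up cleanly.
    """
    taps = [(0.0, weights[0])]
    for i in range(1, len(weights), 2):
        w = weights[i] + weights[i + 1]
        offset = (i * weights[i] + (i + 1) * weights[i + 1]) / w
        taps.append((offset, w))
    return taps


def blur_linear(row, x, taps):
    """Same blur, but each side tap is one filtered fetch covering two texels."""
    total = taps[0][1] * texel(row, x)
    for offset, w in taps[1:]:
        total += w * (sample_linear(row, x + offset) + sample_linear(row, x - offset))
    return total


weights = [70 / 256, 56 / 256, 28 / 256, 8 / 256, 1 / 256]  # 9-tap binomial kernel
taps = merge_taps(weights)
row = [float(i * i) for i in range(11)]

nine_fetches = blur_discrete(row, 5, weights)  # 1 + 2 * 4 texture reads
five_fetches = blur_linear(row, 5, taps)       # 1 + 2 * 2 texture reads
```

Both versions give the same result up to rounding. An edge-aware blur has to inspect each texel individually before weighting it, so it cannot merge taps like this.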
With this, we end today's infotainment. Tune back next time!