The first rendering talk was very interesting: it was mostly about mobile GPUs, how they differ from classic desktop GPUs at the hardware level, and how to take advantage of those differences in the Vulkan API. I think these insights are essential for any graphics programmer working on mobile, and they are not limited to Vulkan programming, since the same features are exposed in popular game engines like Unity.
Mobile GPUs usually use tile-based rendering, which means shading is done for a small portion of the screen at a time (a tile). When a tile needs to be rendered, data is transferred from main memory (which is usually shared between the CPU and GPU) to the smaller but faster on-chip SRAM, and all rendering operations are done on-tile. When finished, the tile's pixel data is written back to main memory, and the next tile can be rendered the same way.
Tile-based renderer hardware architecture
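To get an intuition for why tiling matters, here is a toy bandwidth model in plain Python. All the numbers (resolution, overdraw, bytes per pixel) are made up for illustration and are not real hardware figures; the point is only that the tiler pays main-memory traffic once per pixel instead of once per shaded fragment:

```python
# Toy model (made-up numbers): compare main-memory traffic for an
# immediate-mode GPU, which reads/writes the framebuffer for every shaded
# fragment, versus a tiler that keeps intermediate results in on-chip SRAM.

WIDTH, HEIGHT = 1280, 720
BYTES_PER_PIXEL = 4          # RGBA8 color
OVERDRAW = 3                 # each pixel shaded 3 times on average

def immediate_mode_traffic():
    # Every blend/overwrite touches main memory: one read + one write
    # per shaded fragment.
    shaded = WIDTH * HEIGHT * OVERDRAW
    return shaded * BYTES_PER_PIXEL * 2

def tiled_traffic():
    # Per tile: one load from main memory into SRAM, all shading happens
    # on-chip, then one store back -- regardless of overdraw inside the tile.
    pixels = WIDTH * HEIGHT
    return pixels * BYTES_PER_PIXEL * 2   # one load + one store per pixel

print(immediate_mode_traffic())  # 22118400 bytes
print(tiled_traffic())           # 7372800 bytes
```

With 3x overdraw the tiler already moves 3x less data, and the next section shows how load/store operations shrink that further.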
Transfers back and forth between main memory and SRAM should be minimized. Data that doesn't need to be loaded into SRAM, or stored back into main memory, should be marked as such so that no bandwidth is wasted on it. This is done with load/store operations (LOAD_OP and STORE_OP), which are specified individually for each render texture (LOAD, CLEAR, DONT_CARE):
Load/Store operations are specified per attachment
For example, if the depth buffer is not needed after rendering, we can set its STORE_OP to DONT_CARE so it doesn't get written back to main memory. Or if the color buffer needs to be cleared before being rendered to, but its previous contents don't need to be read, we can set its LOAD_OP to CLEAR, which has a small overhead, but much less than loading the previous color values into SRAM.
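Extending the toy bandwidth model from before, we can count what each per-attachment choice costs. This is a hypothetical sketch in plain Python, not actual Vulkan API calls; only LOAD and STORE touch main memory:

```python
# Hypothetical sketch (not real Vulkan calls): model how per-attachment
# LOAD_OP / STORE_OP choices change main-memory traffic for one render pass.

PIXELS = 1280 * 720
BYTES_PER_PIXEL = 4

def pass_traffic(attachments):
    """attachments: list of (load_op, store_op) pairs, one per render texture."""
    traffic = 0
    for load_op, store_op in attachments:
        if load_op == "LOAD":          # previous contents fetched into SRAM
            traffic += PIXELS * BYTES_PER_PIXEL
        # "CLEAR" and "DONT_CARE" cost no main-memory read.
        if store_op == "STORE":        # tile results written back
            traffic += PIXELS * BYTES_PER_PIXEL
        # "DONT_CARE" skips the write-back entirely.
    return traffic

# Naive setup: load and store both the color and the depth attachment.
naive = pass_traffic([("LOAD", "STORE"), ("LOAD", "STORE")])
# Tuned setup from the text: clear color instead of loading it,
# and throw the depth buffer away after the pass.
tuned = pass_traffic([("CLEAR", "STORE"), ("CLEAR", "DONT_CARE")])
print(naive, tuned)  # 14745600 3686400
```

In this toy setup the tuned pass moves 4x less data, just by declaring what doesn't need to survive the pass.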
The talk goes into more detail, for example on how to get efficient behavior when using MSAA (write back only the resolved buffer), and on how to do post-processing or shading from the G-buffer directly on-tile, without intermediate reads/writes from/to main memory.
The talk also touches on Vulkan pipeline barriers, which help avoid pipeline stalls between draw calls. For example, there is the "Bottom to top" pipeline, where a full barrier sits between render passes (the next pass cannot start executing before the current one is completely finished):
Bottom to top pipeline causes gaps between tasks
Another pipeline type is "Frag to Frag", where each fragment shader only waits for the previous fragment shader to complete. This allows the vertex shader of the next pass to execute simultaneously with the current fragment shader:
Frag to Frag pipeline removes the gaps for better performance
As you can see, the latter is much more efficient.
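The difference is easy to see with a toy timeline model. The durations below are invented for illustration; each pass runs a vertex stage then a fragment stage, and we compare a full barrier between passes against a fragment-to-fragment barrier that lets vertex work overlap:

```python
# Toy timeline model (made-up durations): with a full "bottom of pipe ->
# top of pipe" barrier, pass N+1 cannot start until pass N fully finishes.
# With a fragment->fragment barrier, only fragment stages are serialized,
# so the next pass's vertex work overlaps the current fragment work.

passes = [(2, 5), (3, 4), (2, 6)]   # (vertex_time, fragment_time) per pass

def bottom_to_top(passes):
    # Fully serial: every stage of every pass runs back to back.
    return sum(v + f for v, f in passes)

def frag_to_frag(passes):
    time = 0          # clock at which the next vertex stage may start
    frag_done = 0     # when the previous fragment stage finished
    for v, f in passes:
        vert_done = time + v                 # vertex starts right away
        start_frag = max(vert_done, frag_done)
        frag_done = start_frag + f
        time = vert_done                     # next vertex follows this one
    return frag_done

print(bottom_to_top(passes))  # 22
print(frag_to_frag(passes))   # 17
```

The gaps the full barrier creates are exactly the vertex times that the "Frag to Frag" schedule hides under the previous fragment stage.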
Roblox is an open-world creative sandbox game that is very popular with kids and teens.
An in-game screenshot of Roblox
It might appear simplistic at first glance because of the low-fidelity graphics, but because players can create any world or mini-game inside Roblox, everything needs to be 100% dynamic and still run smoothly across a wide range of hardware. Also, everything needs to work out-of-the-box with any feature combination, since most players don't have the artistic/technical skills to fix asset authoring issues that would typically arise during the development of a video game.
For voxel terrains, they improved upon the marching cubes algorithm, which produces non-uniform topology and has non-intuitive, restrictive vertex placement. They use a dual method instead to get better uniformity:
Image of a terrain generated with their new method
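To illustrate the "dual" idea, here is my own minimal 2D toy example (not Roblox's code): instead of putting vertices on grid edges the way marching cubes/squares does, it places exactly one vertex inside each cell that contains a sign change, at the average of the zero crossings on that cell's edges. This is where the better uniformity and freer vertex placement come from:

```python
# Minimal 2D sketch of a dual method: sample a signed field on a grid and
# emit one vertex per cell that straddles the surface, positioned from the
# cell's edge zero-crossings (marching squares would instead put vertices
# on the edges themselves, constraining their placement).

def field(x, y):               # toy signed field: circle of radius 2
    return x * x + y * y - 4.0

def cell_vertex(x, y):
    """Return a dual vertex for the unit cell with min corner (x, y), or None."""
    edges = [((x, y), (x + 1, y)), ((x, y), (x, y + 1)),
             ((x + 1, y), (x + 1, y + 1)), ((x, y + 1), (x + 1, y + 1))]
    crossings = []
    for (ax, ay), (bx, by) in edges:
        fa, fb = field(ax, ay), field(bx, by)
        if (fa < 0) != (fb < 0):              # sign change along this edge
            t = fa / (fa - fb)                # linear zero crossing
            crossings.append((ax + t * (bx - ax), ay + t * (by - ay)))
    if not crossings:
        return None                           # cell is fully inside/outside
    n = len(crossings)
    return (sum(p[0] for p in crossings) / n, sum(p[1] for p in crossings) / n)

verts = [v for x in range(-3, 3) for y in range(-3, 3)
         if (v := cell_vertex(x, y)) is not None]
print(len(verts))   # 12 -- one vertex per surface-crossing cell
```

Real dual contouring also uses field gradients to position the vertex more accurately, but the one-vertex-per-cell structure is the essential difference.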
Screenshot of a terrain without de-tiling
Screenshot of a terrain with de-tiling
The general idea is to split the texture into tiles, mix them, blend the colors per texel, and then remove the remaining seams with histogram equalization.
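As a reference point for that last step, here is plain histogram equalization on a tiny grayscale example (my own minimal sketch; the talk applies the idea to textures so the blended tiles share a common value distribution, which hides the seams):

```python
# Plain histogram equalization: remap pixel values so their cumulative
# distribution becomes roughly uniform over the available range.

def equalize(pixels, levels=256):
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for count in hist:              # cumulative distribution function
        total += count
        cdf.append(total)
    n = len(pixels)
    # Classic remap: scale the CDF back into [0, levels-1].
    return [round((cdf[p] - 1) / (n - 1) * (levels - 1)) for p in pixels]

dark = [10, 10, 12, 14, 14, 16, 18, 20]   # values bunched at the low end
print(equalize(dark))                      # spread across the full range
```

The bunched-up input gets stretched over the whole 0-255 range while keeping its ordering, which is exactly the property that makes differently sourced tiles match.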
They also blend terrain layers based on a heightmap.
Screenshot of a terrain with basic blending
Heightmap-based blending:
Screenshot of a terrain with heightmap blending
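Here is my own toy version of the classic height-based blending trick in plain Python (in practice this runs per pixel in a shader, and I can't vouch for it matching Roblox's exact formula): each layer's height is added to its blend weight, and only contributions close to the tallest layer survive, so e.g. rocks poke through grass at transitions instead of fading out linearly:

```python
# Compare a plain weighted average against height-aware blending.
# Colors are grayscale stand-ins: grass = 0.2, rock = 0.8.

def linear_blend(c1, w1, c2, w2):
    return (c1 * w1 + c2 * w2) / (w1 + w2)

def height_blend(c1, h1, w1, c2, h2, w2, depth=0.2):
    # Only layers within `depth` of the tallest point contribute.
    ma = max(h1 + w1, h2 + w2) - depth
    b1 = max(h1 + w1 - ma, 0.0)
    b2 = max(h2 + w2 - ma, 0.0)
    return (c1 * b1 + c2 * b2) / (b1 + b2)

grass, rock = 0.2, 0.8
print(linear_blend(grass, 0.6, rock, 0.4))             # muddy 50/50-ish mix
print(height_blend(grass, 0.3, 0.6, rock, 0.9, 0.4))   # tall rock wins outright
```

Even though grass has the larger weight, the rock layer's height lets it dominate the pixel, which is what produces the crisp transitions in the screenshots.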
This is probably the highlight of Siggraph 2020 for me: a technique for differentiable rendering that doesn't need a derivative graph. Differentiable rendering is useful to reconstruct scene parameters from an image, as seen in this video:
The implementation previously presented by Wenzel Jakob used Mitsuba 2, in the form of a code graph that can be seen as a differentiated program generated automatically from the original code of the path-tracing renderer. This worked, but it was very slow, required a lot of memory, and was complex to implement.
The new adjoint method presented at Siggraph ditches the graph and instead computes the derivatives directly with path tracing! At runtime it needs two simulations, the primal rendering and the adjoint differential simulation, though both can also be done at the same time.
Overview of the radiative backpropagation method
This is the important part:
Close-up on the backpropagation part of the algorithm
- The adjoint radiance (difference between actual change and desired change) is propagated from the camera into the scene (the first ray).
- When hitting a surface (the table), sample the next direction from the BSDF, and estimate the incoming radiance with recursive path tracing (red). We need the incoming radiance because changes in the BSDF matter more or less depending on how much light the surface receives.
- Multiply the adjoint radiance (projected from the camera) by the incident radiance (coming from the rest of the scene).
- Backpropagate changes to BSDF parameters (pre-compiled autodiff derivative shader in OptiX).
- Repeat recursively.
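The steps above can be condensed into a tiny zero-dimensional analogue (my own toy setup, not the paper's code): one diffuse surface with an unknown albedo, lit by an emitter. The camera sees L = albedo * E; we propagate the adjoint radiance dLoss/dL from the camera to the surface, estimate the incident radiance E there with a noisy "path traced" estimate, and multiply the two to get dLoss/dalbedo without ever differentiating the renderer:

```python
import random

EMITTER = 3.0      # radiance arriving at the surface (ground truth)
TARGET = 1.2       # pixel value we want to reconstruct

def render(albedo):
    # Primal rendering: the first of the two simulations.
    return albedo * EMITTER

def adjoint_grad(albedo, samples=10000, seed=0):
    rng = random.Random(seed)
    L = render(albedo)
    # Adjoint radiance shot from the camera: dLoss/dL for Loss = (L - target)^2.
    adjoint = 2.0 * (L - TARGET)
    # Estimate the incident radiance at the hit point with a noisy
    # Monte Carlo stand-in for recursive path tracing.
    E = sum(EMITTER + rng.gauss(0.0, 0.5) for _ in range(samples)) / samples
    # d(albedo * E)/d(albedo) = E, so the parameter gradient is adjoint * E.
    return adjoint * E

albedo = 0.7
analytic = 2.0 * (render(albedo) - TARGET) * EMITTER   # exact dLoss/dalbedo
print(analytic, adjoint_grad(albedo))                  # agree up to MC noise
```

The Monte Carlo estimate converges to the analytic gradient, which is the essence of why the method needs only ordinary path tracing machinery at runtime.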
The technique boasts a 1000x speedup relative to the original version!