Lots of changes since last week, but the majority of my efforts were focused on overhauling my approach to rendering.
My basic approach is primarily inspired by Christer Ericson’s post about how he orders draw calls. Other useful sources of information were Tom Forsyth’s post about the cost of renderstate changes and Guerrilla’s presentation on Killzone 2’s renderer from Develop 2007. The Killzone 2 presentation is especially worthwhile since it describes how they take advantage of concurrency in their renderer.
Until now, the rendering code in my game has been almost entirely immediate-mode; various objects in the game world like tiles and entities had Draw methods that would utilize the game’s GraphicsDevice and SpriteBatch to render themselves, changing render states as needed and generating dynamic geometry where necessary (like for water).
Of course, if every object changes render states, you quickly end up changing states tens of thousands of times per frame, which performs poorly. I ended up adding some basic support for grouping together identical render states, but that often meant performing multiple passes over the scene to minimize state changes. Once I added multiple onscreen ‘layers’, I frequently had to make half a dozen passes over the same collections (like Level.Entities and Level.Geometry) to render things in the correct order without too many state changes. My new renderer clearly needed an inexpensive way to reduce state changes without complicating my code.
Beyond state changes, rendering itself was simply too expensive. In particular, rendering all the onscreen tiles for the 6+ layers in a level could often take 5% or more of my CPU time, because the XNA SpriteBatch class is relatively inefficient at drawing thousands of bitmaps at various locations on the screen. Using SpriteBatch also meant that I couldn’t effectively sort tiles by texture, because using a custom blending configuration with SpriteBatch requires you to disable its texture sorting support. (Unfortunately, my tiles and sprites have to be premultiplied so they look correct when scaled.) As a result, my new renderer needed to minimize the cost of individual drawing operations, and it also needed to let me sort by material and texture while using custom blending configurations.
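To make the premultiplication point concrete, here is a small numeric sketch (the Rgba and AlphaDemo types are hypothetical, not XNA types). It mimics what bilinear filtering does when a scaled sprite samples halfway between a fully transparent red texel and an opaque green one, then blends the result over black. With straight alpha, the invisible red leaks into the edge; with premultiplied alpha it does not.

```csharp
using System;

// Minimal RGBA color with float channels in [0, 1]; a stand-in for a texel,
// not XNA's Color type.
public struct Rgba
{
    public readonly float R, G, B, A;
    public Rgba(float r, float g, float b, float a) { R = r; G = g; B = b; A = a; }

    // Convert straight (non-premultiplied) alpha to premultiplied alpha.
    public Rgba Premultiply() => new Rgba(R * A, G * A, B * A, A);

    // Per-channel linear interpolation: what bilinear filtering does between
    // two neighbouring texels when a sprite is scaled.
    public static Rgba Lerp(Rgba x, Rgba y, float t) =>
        new Rgba(x.R + (y.R - x.R) * t, x.G + (y.G - x.G) * t,
                 x.B + (y.B - x.B) * t, x.A + (y.A - x.A) * t);
}

public static class AlphaDemo
{
    // Straight-alpha "over" blend onto an opaque background: src * a + dst * (1 - a).
    public static Rgba BlendStraight(Rgba src, Rgba dst) =>
        new Rgba(src.R * src.A + dst.R * (1 - src.A),
                 src.G * src.A + dst.G * (1 - src.A),
                 src.B * src.A + dst.B * (1 - src.A), 1);

    // Premultiplied "over" blend: src + dst * (1 - a).
    public static Rgba BlendPremultiplied(Rgba src, Rgba dst) =>
        new Rgba(src.R + dst.R * (1 - src.A),
                 src.G + dst.G * (1 - src.A),
                 src.B + dst.B * (1 - src.A), 1);
}
```

Filtering the straight-alpha texels gives a half-transparent yellowish color, so the edge picks up a red tint (R = 0.25 over black); premultiplying first turns the transparent texel into all zeros, so the edge stays pure green.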
One final detail that factored into the design was concurrency: Between the cost of actually rendering the game world, and the cost of waiting for vertical sync, the main thread was spending as much as 40% of its time performing rendering, which made it much harder to maintain a smooth, consistent 60fps framerate. In order to minimize the amount of time the main thread spent blocked during rendering, my new design needed to make it possible to move rendering work off the main thread without introducing thread safety issues – putting a lock around all my game state wasn’t going to cut it.
The design I ended up with roughly looks like this:
Every frame, at the beginning of my Draw method, I construct a new Render.Frame object to represent the frame being rendered.
The Render.Frame object contains a list of Render.Batch objects; each Batch has an associated Layer and Material. Layers are simple integers that allow me to explicitly order batches, so that I can create batches in any given order, and even create multiple batches at once. This also allows me to perform a single sort of the Batches array before sending the batches off to the video card, in order to minimize state changes.
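A minimal sketch of the ordering step, assuming a batch only needs a layer and a material with a comparable integer Id (both simplified stand-ins, not the actual Render types). One comparison sorts by layer first and then by material, so batches that share state end up adjacent.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for Render.Frame / Render.Batch; only the fields
// needed for ordering are modelled here.
public class Material
{
    public readonly int Id;   // assumed unique per material instance
    public Material(int id) { Id = id; }
}

public class Batch
{
    public readonly int Layer;
    public readonly Material Material;
    public Batch(int layer, Material material) { Layer = layer; Material = material; }
}

public class Frame
{
    public readonly List<Batch> Batches = new List<Batch>();

    public void Add(Batch batch) => Batches.Add(batch);

    // One sort per frame: layer first (explicit draw order), then material,
    // so batches that share render state are issued back to back.
    public void Sort() =>
        Batches.Sort((a, b) =>
        {
            int byLayer = a.Layer.CompareTo(b.Layer);
            return byLayer != 0 ? byLayer : a.Material.Id.CompareTo(b.Material.Id);
        });
}
```

Note that List<T>.Sort is not a stable sort, so batches with the same layer and material can come out in either order relative to each other.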
Any given Batch contains a list of structures representing ‘Draw Calls’. In some cases, a draw call maps directly to a hardware drawing operation – like DrawUserPrimitives – but in other cases, it maps to something more granular. For example, a BitmapBatch contains BitmapDrawCalls, where each draw call represents a single bitmap, much like the arguments you pass to SpriteBatch’s Draw method. This allows me to sort individual draw calls based on their parameters to minimize state changes within a batch.
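A sketch of sorting within a batch, assuming each draw call carries an integer TextureId in place of a Texture2D reference (a hypothetical simplification). CountTextureChanges reports how many texture switches issuing the calls in order would cost, which makes the effect of the sort easy to see.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a BitmapDrawCall: a texture handle plus the
// kind of arguments you would otherwise pass to SpriteBatch.Draw.
public struct BitmapDrawCall
{
    public int TextureId;     // stands in for a Texture2D reference
    public float X, Y;        // position
    public BitmapDrawCall(int textureId, float x, float y)
    { TextureId = textureId; X = x; Y = y; }
}

public class BitmapBatch
{
    private readonly List<BitmapDrawCall> _calls = new List<BitmapDrawCall>();
    public IReadOnlyList<BitmapDrawCall> Calls => _calls;

    public void Add(BitmapDrawCall call) => _calls.Add(call);

    // Group draw calls by texture so consecutive calls share a texture.
    public void SortByTexture() =>
        _calls.Sort((a, b) => a.TextureId.CompareTo(b.TextureId));

    // Number of texture binds needed when the calls are issued in list order.
    public int CountTextureChanges()
    {
        int changes = 0;
        for (int i = 0; i < _calls.Count; i++)
            if (i == 0 || _calls[i].TextureId != _calls[i - 1].TextureId)
                changes++;
        return changes;
    }
}
```

Sorting reorders draw calls, so it is only safe when the calls in a batch don't depend on a particular overlap order, as with non-overlapping tiles in one layer.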
Finally, the Material objects associated with a given Batch are a superset of XNA’s Effect class: they include a VertexDeclaration, an Effect, and optional delegates for configuring other rendering state like the current blending function or stencil state. Grouping all these parameters together in one object lets me sort batches cheaply by comparing material instances, while still giving me some granularity, since I can change the shader parameters of an active Effect within a batch (for example, to render multiple textures inside a single BitmapBatch).
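The material idea can be sketched roughly as follows. Effect and VertexDeclaration are modelled as strings so the example runs without XNA, and BeginState/EndState are hypothetical delegate hooks standing in for blend or stencil setup. MaterialRunner only switches state when the material instance changes, which is what makes comparing instances a cheap substitute for comparing every individual render state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Material sketch: the real thing would hold an XNA Effect and
// VertexDeclaration; plain strings keep this runnable without XNA.
public sealed class Material
{
    public readonly string EffectName;
    public readonly string VertexDeclarationName;
    public readonly Action BeginState;   // optional: e.g. set blend/stencil state
    public readonly Action EndState;     // optional: restore state afterwards

    public Material(string effect, string vertexDecl, Action begin = null, Action end = null)
    {
        EffectName = effect; VertexDeclarationName = vertexDecl;
        BeginState = begin; EndState = end;
    }

    public void Begin() => BeginState?.Invoke();
    public void End() => EndState?.Invoke();
}

public static class MaterialRunner
{
    // Issue batches (represented here by their materials) in order, switching
    // state only when the material instance changes. Returns the switch count.
    public static int Run(IEnumerable<Material> batchMaterials)
    {
        Material current = null;
        int switches = 0;
        foreach (var m in batchMaterials)
        {
            if (!ReferenceEquals(m, current))
            {
                current?.End();
                m.Begin();
                current = m;
                switches++;
            }
            // ... the batch's draw calls would be issued here ...
        }
        current?.End();
        return switches;
    }
}
```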
In addition to the more obvious performance advantages of this approach, like the ability to sort by material, one other advantage is less obvious: Since a Frame contains information on all the drawing operations that need to be performed, but doesn’t depend on any of my game state, I can safely hand that object to another thread and have that thread perform the drawing operations. This lets me move a significant portion of my rendering off the main thread, and begin performing my next Update while the previous Draw completes, without needing to add any locking or complex synchronization.
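A rough sketch of the handoff, assuming the frame is an immutable snapshot with no references to game state. It uses BlockingCollection<T> from desktop .NET 4; on runtimes without it, a hand-rolled Monitor-based queue would do the same job. The RenderThread and Frame types here are hypothetical. A bounded capacity of 1 lets the main thread run at most one frame ahead of the renderer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical frame: a read-only list of draw operations that does not
// reference game state, so it can be handed to another thread safely.
public sealed class Frame
{
    public readonly int Number;
    public readonly IReadOnlyList<string> DrawOps;
    public Frame(int number, IReadOnlyList<string> ops) { Number = number; DrawOps = ops; }
}

public sealed class RenderThread : IDisposable
{
    // Capacity 1: Submit blocks if the renderer hasn't picked up the previous frame.
    private readonly BlockingCollection<Frame> _pending = new BlockingCollection<Frame>(1);
    private readonly Thread _thread;

    // Written only by the render thread; safe to read after Dispose has joined it.
    public readonly List<int> RenderedFrames = new List<int>();

    public RenderThread()
    {
        _thread = new Thread(Run) { IsBackground = true };
        _thread.Start();
    }

    // Called by the main thread at the end of Draw; it can then start the next Update.
    public void Submit(Frame frame) => _pending.Add(frame);

    private void Run()
    {
        foreach (var frame in _pending.GetConsumingEnumerable())
        {
            // A real renderer would issue the frame's batches to the device here.
            RenderedFrames.Add(frame.Number);
        }
    }

    // Stop accepting frames and wait for the renderer to drain the queue.
    public void Dispose()
    {
        _pending.CompleteAdding();
        _thread.Join();
        _pending.Dispose();
    }
}
```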
It’s not all great, though. This approach has two big downsides:
First, I basically have to reimplement everything from scratch – SpriteBatch and SpriteFont are both completely impossible to extend, so I have to reimplement them in order to render text and bitmaps with this approach, and the same goes for any other rendering code based on directly manipulating a GraphicsDevice.
Second, this approach is inherently more dependent on the garbage collector, since most of the types have to be classes. If I want to run well on the 360, I need to use pooling and other techniques to avoid frequent allocations during frames. I also can no longer reuse a single scratch buffer when generating geometry, so every piece of geometry I render needs its own buffer, which means more allocations.
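Pooling can start out as simple as the following single-threaded sketch (Pool<T> and its reset hook are hypothetical names). Objects go back onto a free list instead of being dropped, so once the pool has warmed up, steady-state frames allocate nothing new.

```csharp
using System;
using System.Collections.Generic;

// Minimal single-threaded pool sketch. T needs a parameterless constructor;
// the optional reset hook clears an instance before it is reused.
public sealed class Pool<T> where T : class, new()
{
    private readonly Stack<T> _free = new Stack<T>();
    private readonly Action<T> _reset;

    // How many instances were actually allocated over the pool's lifetime.
    public int Created { get; private set; }

    public Pool(Action<T> reset = null) { _reset = reset; }

    public T Acquire()
    {
        if (_free.Count > 0) return _free.Pop();
        Created++;
        return new T();
    }

    public void Release(T item)
    {
        _reset?.Invoke(item);
        _free.Push(item);
    }
}
```

Per-piece geometry buffers vary in size, so a pool for them would likely need to bucket arrays by capacity rather than hand back a single type.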
So far I’m pretty pleased with this approach. There’s more work to be done (for example, I haven’t implemented pooling yet, so the garbage collector runs extremely often on the 360), but a large portion of my game is now running on the new rendering architecture, and my average framerate has already improved slightly from moving work off the main thread, even though I haven’t spent any time on optimization.
The biggest challenge that remains is porting all of my old GraphicsDevice-oriented rendering code over to using batches. Here’s a before and after example:
```csharp
public void RenderTileLayer (int index) {
    var layer = RuntimeLevel.Layers[index];
    RuntimeLayer.ItemInfo itemInfo = null;

    // Begins a SpriteBatch with this layer's render state (premultiplied alpha).
    BeginSpriteBatch(BlendModes.AlphaPremultiplied);

    // Enumerate only the tiles that intersect the camera's bounds.
    using (var e = layer.GetItemsFromBounds(Camera.Bounds))
    while (e.GetNext(out itemInfo)) {
        RenderTile(itemInfo.Item, SpriteBatch, Camera.ViewportPosition, Camera.Zoom, AnimationTimeProvider.Ticks);
    }
}

private void RenderTile (RuntimeTile tile, SpriteBatch spriteBatch, Vector2 viewportPosition, float zoom, long time) {
    var info = tile.TileInfo;
    // Viewport offset and zoom are applied on the CPU for every tile.
    var pos = (tile.Bounds.TopLeft - viewportPosition) * zoom;
    spriteBatch.Draw(
        info.Texture, pos, info.Rectangle, Color.White,
        0.0f, Vector2.Zero, zoom, info.Strip.GetSpriteEffect(), 0.0f
    );
}
```
You can see here that my approach for rendering tiles is pretty simple: get all the tiles within the screen’s boundaries and render them one by one using a SpriteBatch. I’m able to reduce the number of state changes since I know that every tile within a given layer shares the same state, but I still need to change state once per layer (using the BeginSpriteBatch function, which calls SpriteBatch.Begin and sets up my render state). I also have to go out of my way to render each layer in the right order, which means calling RenderTileLayer in multiple places so that certain tiles appear below entities while other tiles appear above them.
The new implementation looks like this:
```csharp
public void RenderTileLayer (int index) {
    var layer = RuntimeLevel.Layers[index];
    RuntimeLayer.ItemInfo itemInfo = null;
    // Layers 0 to 2 go behind entities, the rest in front; adding the index
    // keeps batches within the same band in a fixed order.
    int drawLayer = (index <= 2) ? DrawLayers.Background : DrawLayers.Foreground;

    // Creating the batch attaches it to the pending frame; disposing it marks it ready.
    using (var bitmapBatch = new Render.BitmapBatch(PendingFrame, drawLayer + index, Materials.Bitmap[BlendModes.AlphaPremultiplied]))
    using (var e = layer.GetItemsFromBounds(Camera.Bounds))
    while (e.GetNext(out itemInfo)) {
        RenderTile(itemInfo.Item, bitmapBatch);
    }
}

public void RenderTile (RuntimeTile tile, Render.BitmapBatch batch) {
    var info = tile.TileInfo;
    // No viewport or zoom math here: that now happens in the shader.
    var drawCall = new Render.BitmapDrawCall(info.Texture, tile.Bounds.TopLeft, info.Bounds);
    drawCall.Mirror(info.Strip.Mirroring.X, info.Strip.Mirroring.Y);
    batch.Add(drawCall);
}
```
One thing you’ll notice is that RenderTile now has considerably fewer arguments. Since I had to reimplement SpriteBatch from scratch, this gave me the opportunity to integrate some of my rendering calculations, like zooming and viewport positioning, directly into the shader. As a result, I don’t need to pass those values around as parameters anymore; I simply set them as shader parameters whenever they change. Using an EffectPool for all my shaders also means that I only need to set those parameters once, instead of updating each individual shader with the correct values.
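The math that moved from RenderTile into the shader is small. A CPU-side mirror of it is sketched below; Vec2 and ViewTransform are hypothetical names, and ScreenToClip shows the extra pixel-to-clip-space step a vertex shader has to perform (ignoring Direct3D 9’s half-pixel offset). With viewport position and zoom stored as shared shader parameters, every vertex gets the same treatment without any per-tile arguments.

```csharp
using System;

public struct Vec2
{
    public float X, Y;
    public Vec2(float x, float y) { X = x; Y = y; }
}

// CPU-side mirror of what a vertex shader would compute from two shared
// parameters (viewport position and zoom). Names are hypothetical.
public static class ViewTransform
{
    // The same math the old RenderTile did per tile on the CPU:
    // (worldPosition - viewportPosition) * zoom
    public static Vec2 WorldToScreen(Vec2 world, Vec2 viewportPosition, float zoom) =>
        new Vec2((world.X - viewportPosition.X) * zoom, (world.Y - viewportPosition.Y) * zoom);

    // Pixel coordinates to clip space ([-1, 1], Y up), which is what a vertex
    // shader must output. Ignores Direct3D 9's half-pixel offset.
    public static Vec2 ScreenToClip(Vec2 screen, float viewportWidth, float viewportHeight) =>
        new Vec2(screen.X / viewportWidth * 2f - 1f, 1f - screen.Y / viewportHeight * 2f);
}
```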
One other difference here is the need to explicitly choose a layer for the tiles to render on when I create a BitmapBatch. I have a small set of constants (named DrawLayers) that I use to roughly organize my layers, but I also add values to those constants so that I can organize batches within those layers. That way, I can be certain that if I draw three sets of tiles on the same layer, they are always drawn in the same order relative to each other.
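One possible shape for the layer constants, assuming coarse bands spaced far enough apart that per-batch offsets never collide with the next band (the values and the Within helper are hypothetical). Distinct layer values matter because .NET’s List<T>.Sort is not stable: two batches with the same layer have no guaranteed relative order after sorting.

```csharp
using System;

// Hypothetical layer constants: coarse bands spaced far apart so that
// offsets added within a band never spill into the next one.
public static class DrawLayers
{
    public const int BandSize = 1000;
    public const int Background = 0 * BandSize;
    public const int Entities   = 1 * BandSize;
    public const int Foreground = 2 * BandSize;

    // Compute a concrete layer inside a band, rejecting offsets large
    // enough to collide with the next band.
    public static int Within(int band, int offset)
    {
        if (offset < 0 || offset >= BandSize)
            throw new ArgumentOutOfRangeException(nameof(offset));
        return band + offset;
    }
}
```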
I also have to explicitly pass in a Material object for the BitmapBatch, instead of just passing a blend mode to the BeginSpriteBatch function. This isn’t a significant change, but it does mean that I now have to explicitly manage my materials – as a result, my game now has a LoadMaterials function that runs at startup and creates all the various permutations of parameters I need and stores them so that I can grab them at runtime.
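A sketch of what a LoadMaterials-style function might build, assuming Materials.Bitmap is a dictionary keyed by blend mode. The original code only shows the indexer, so the dictionary, the BlendModes values, and this Material type are all assumptions. Because each permutation is created once and the same instance is returned every time, batches can compare materials by reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical blend modes and material type standing in for the game's own.
public enum BlendModes { Opaque, AlphaPremultiplied, Additive }

public sealed class Material
{
    public readonly string EffectName;
    public readonly BlendModes Blend;
    public Material(string effectName, BlendModes blend) { EffectName = effectName; Blend = blend; }
}

public static class Materials
{
    // One bitmap material per blend mode, built once at startup and reused.
    public static readonly Dictionary<BlendModes, Material> Bitmap =
        new Dictionary<BlendModes, Material>();

    public static void LoadMaterials()
    {
        Bitmap.Clear();
        var modes = new[] { BlendModes.Opaque, BlendModes.AlphaPremultiplied, BlendModes.Additive };
        foreach (var mode in modes)
            Bitmap[mode] = new Material("Bitmap", mode);
    }
}
```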
You may also notice that there are no explicit drawing operations in here anywhere; I’m just creating a BitmapBatch and adding draw calls to it. Essentially, what’s going on here is that creating a batch automatically attaches it to the Frame being built. When the batch is Disposed (by the using block, in this case), the Frame is notified that the contents of the Batch (its draw calls) are ready and it stores them for later. The use of IDisposable to represent ‘readiness’ instead of disposal is a little weird, but it’s convenient in that it gives me relatively automatic batch management. This also means that if I wanted to, I could create lots of batches at once and fill them with draw calls on multiple threads, since the Frame has a straightforward way to determine whether all of its batches are ready yet.
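The readiness protocol can be sketched with a counter on the frame, assuming draw calls are plain ints and the Frame and Batch types are simplified stand-ins. The Batch constructor registers itself as pending, and Dispose marks it ready exactly once; Interlocked keeps the counter correct if batches filled on different threads finish at the same time.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class Frame
{
    private readonly List<Batch> _batches = new List<Batch>();
    private readonly object _lock = new object();
    private int _pending;

    // True once every registered batch has been disposed (i.e. is ready).
    public bool AllReady => Volatile.Read(ref _pending) == 0;
    public int BatchCount { get { lock (_lock) return _batches.Count; } }

    // Called from Batch's constructor: attached, but not yet ready.
    internal void Register(Batch batch)
    {
        Interlocked.Increment(ref _pending);
        lock (_lock) _batches.Add(batch);
    }

    // Called from Batch.Dispose: the batch's draw calls are complete.
    internal void MarkReady() => Interlocked.Decrement(ref _pending);
}

public sealed class Batch : IDisposable
{
    private readonly Frame _frame;
    private bool _ready;
    public readonly int Layer;
    public readonly List<int> DrawCalls = new List<int>(); // stand-in draw call type

    public Batch(Frame frame, int layer)
    {
        _frame = frame;
        Layer = layer;
        frame.Register(this);
    }

    public void Add(int drawCall)
    {
        if (_ready) throw new InvalidOperationException("Batch already submitted.");
        DrawCalls.Add(drawCall);
    }

    // "Dispose" here means "finished filling", not "free resources".
    public void Dispose()
    {
        if (_ready) return;
        _ready = true;
        _frame.MarkReady();
    }
}
```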
Below you can see a short video of how the renderer groups the game up into batches. If you look carefully, you may also notice that particles are being rendered behind all the batches, since they aren’t yet integrated into the renderer.

