<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>luminance &#187; performance</title>
	<atom:link href="http://www.luminance.org/tag/performance/feed" rel="self" type="application/rss+xml" />
	<link>http://www.luminance.org</link>
	<description>Programming and Game Development - Kevin Gadd's Personal Blog</description>
	<lastBuildDate>Thu, 29 Apr 2010 17:20:57 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Updating Onscreen Objects / Profiling</title>
		<link>http://www.luminance.org/gruedorf/2009/08/20/updating-onscreen-objects-profiling</link>
		<comments>http://www.luminance.org/gruedorf/2009/08/20/updating-onscreen-objects-profiling#comments</comments>
		<pubDate>Fri, 21 Aug 2009 04:24:39 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[360]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiler]]></category>
		<category><![CDATA[xna]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=742</guid>
		<description><![CDATA[One of the problems I started to run into while polishing things up for my contest entry builds was that as my levels grew larger, the game&#8217;s CPU utilization on the 360 steadily grew with them. While PC builds of my game ran smooth on basically all the machines I had access to, on the [...]]]></description>
			<content:encoded><![CDATA[<p>One of the problems I started to run into while polishing things up for my contest entry builds was that as my levels grew larger, the game&#8217;s CPU utilization on the 360 steadily grew with them. While PC builds of my game ran smoothly on basically all the machines I had access to, on the 360 the cost of updating all the level&#8217;s objects and entities became quite significant &#8211; most likely due to the 360&#8217;s feeble floating-point performance and lack of out-of-order execution.</p>
<p>The solution to this was, at least to me, relatively obvious: My levels were much larger than the camera, so it didn&#8217;t really make sense to update the entire level every frame.</p>
<p>The first thing I tried to verify this theory was simply hacking it in: Do a check before updating each object to see if it was onscreen. Interestingly, this didn&#8217;t make the game any faster on the 360. Depending on your point of view, this either confirmed or refuted my hypothesis: If the problem was simply the cost of all the floating-point operations, the cost of doing the onscreen check for each object (since the camera and object bounds were both expressed in floating-point) could have been making the problem worse. Clearly, I didn&#8217;t have enough data to be sure about the right choice to make.</p>
<p>So, I spent a day or so rigging up the necessary infrastructure to be able to profile my game on the 360. Since you can&#8217;t use tools like CLR Profiler or NProf on the 360, I ended up building a very simple frame timing system, and adding an overlay to the game that would show timing data. This let me get a good idea of how much time each subsystem in the game was using, and then I could compare the costs of individual subsystems, and try making changes and seeing how the profile data changed.</p>
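<p>A frame timing system along these lines can be quite small. The following is only a sketch under assumptions, not the profiler from the game: the names <code>FrameProfiler</code>, <code>Measure</code> and <code>GetMilliseconds</code> are hypothetical, and it assumes <code>System.Diagnostics.Stopwatch</code> is available on the target platform.</p>
<pre>    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    // Hypothetical sketch: accumulates elapsed ticks per named subsystem.
    public class FrameProfiler {
        private readonly Dictionary&lt;string, long&gt; _Ticks = new Dictionary&lt;string, long&gt;();

        // A struct, so wrapping a region in a 'using' block does not allocate.
        public struct Scope : IDisposable {
            private readonly FrameProfiler _Owner;
            private readonly string _Name;
            private readonly long _Started;

            public Scope (FrameProfiler owner, string name) {
                _Owner = owner;
                _Name = name;
                _Started = Stopwatch.GetTimestamp();
            }

            public void Dispose () {
                _Owner.Add(_Name, Stopwatch.GetTimestamp() - _Started);
            }
        }

        public Scope Measure (string name) {
            return new Scope(this, name);
        }

        private void Add (string name, long ticks) {
            long total;
            _Ticks.TryGetValue(name, out total);
            _Ticks[name] = total + ticks;
        }

        // Milliseconds accumulated for one subsystem since the last Reset.
        public double GetMilliseconds (string name) {
            long total;
            _Ticks.TryGetValue(name, out total);
            return total * 1000.0 / Stopwatch.Frequency;
        }

        // Call once per frame, after the overlay has read its numbers.
        public void Reset () {
            _Ticks.Clear();
        }
    }

    // Assumed usage inside the game loop:
    //   using (profiler.Measure("Update"))
    //       level.Update();</pre>
<p>Wrapping each subsystem in its own scope makes it possible to compare their costs side by side in an overlay, and to see how the numbers move after a change.</p>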
<p>Once I had the profiler up and running on the 360, a clear pattern emerged.</p>
<p><a href="http://www.luminance.org/wp-content/uploads/2009/08/01.png"><img class="aligncenter size-full wp-image-745" title="01" src="http://www.luminance.org/wp-content/uploads/2009/08/01.png" alt="01" width="500" height="550" /></a></p>
<p>Updates were consuming a huge amount of CPU time on the 360. On my desktop, updates accounted for no more than 1% of CPU time, but on the 360 they accounted for more CPU time than rendering. This was a bit of a surprise to me, since rendering was definitely the bottleneck at one point on the 360. It seems that somewhere along the way I solved my rendering performance issues on the 360, but didn&#8217;t notice because I had made updates so much more expensive. One mistake I plan not to repeat: I went a week or two without testing the game on the 360, since my 360 was not hooked up at the time. During that span I made a lot of changes that drastically altered the game&#8217;s performance characteristics, so it was hard to tell what had caused things to degrade.</p>
<p><span id="more-742"></span></p>
<p>Now that I knew updates were expensive, I decided to try and narrow down which object types were the problem. I added profiling markers around specific types of objects, to try and figure out which ones cost the most CPU time to update. After doing that and deploying a few builds to the 360, I found one of my culprits: Water.</p>
<p><a href="http://www.luminance.org/wp-content/uploads/2009/08/05.png"><img class="aligncenter size-full wp-image-749" title="05" src="http://www.luminance.org/wp-content/uploads/2009/08/05.png" alt="05" width="500" height="550" /></a></p>
<p>While I had suspected that water might be a problem, I was still somewhat surprised by the results: a similar piece of in-game geometry, zones, used similar update code but turned out to have virtually zero update cost at runtime, while water cost a tremendous amount of CPU for very little gameplay impact. The problem turned out to be a subtle difference in how they were implemented.</p>
<p>Zones were inefficient in that every frame, each zone did an obstruction test to see if any entities were inside &#8211; in most cases a zone would be empty, so this was wasted effort. This would have been better implemented by having every entity check for nearby zones, since there are typically far fewer entities than there are zones. However, in practice this didn&#8217;t turn out to be the problem. The problem was that water built on this logic, and then used it to perform additional work: It did an obstruction test to determine how far the water should fall, and then did another obstruction test to locate any entities inside the water and apply &#8216;flow force&#8217; to them so that the flowing water would push them in a given direction. These two obstruction tests each ended up accounting for a significant portion of the time spent updating the level.</p>
<p>To solve this, I took two steps. First, I reworked zones and water so that they both operated the way I described &#8211; each entity does a check to locate all the zones it&#8217;s within, in a single obstruction test. This reduced the number of obstruction tests I was running every frame by a large amount, and helped reduce CPU usage for water. However, that still wasn&#8217;t enough.</p>
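<p>The inverted lookup can be sketched roughly as follows. Every type here (<code>Rect</code>, <code>SketchZone</code>, <code>SketchEntity</code>, <code>ZoneQueries</code>) is a hypothetical stand-in, and the linear scan over zones stands in for whatever spatial query the real obstruction test uses.</p>
<pre>    using System.Collections.Generic;

    // Hypothetical axis-aligned rectangle used only for this sketch.
    public struct Rect {
        public float Left, Top, Right, Bottom;

        public Rect (float left, float top, float right, float bottom) {
            Left = left;
            Top = top;
            Right = right;
            Bottom = bottom;
        }

        public bool Intersects (Rect other) {
            return (Left &lt; other.Right) &amp;&amp; (other.Left &lt; Right)
                &amp;&amp; (Top &lt; other.Bottom) &amp;&amp; (other.Top &lt; Bottom);
        }
    }

    public class SketchEntity {
        public Rect Bounds;
    }

    public class SketchZone {
        public Rect Bounds;
        public readonly List&lt;SketchEntity&gt; Occupants = new List&lt;SketchEntity&gt;();
    }

    public static class ZoneQueries {
        // One query per entity, instead of one obstruction test per zone.
        public static void AssignOccupants (List&lt;SketchEntity&gt; entities, List&lt;SketchZone&gt; zones) {
            foreach (var zone in zones)
                zone.Occupants.Clear();

            foreach (var entity in entities) {
                // Stand-in for a single spatial query around the entity.
                foreach (var zone in zones) {
                    if (entity.Bounds.Intersects(zone.Bounds))
                        zone.Occupants.Add(entity);
                }
            }
        }
    }</pre>
<p>With this arrangement an empty zone costs nothing beyond clearing its list, and water can read its occupants from the same result instead of running a second test.</p>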
<p>The biggest improvement came from reworking things so that only onscreen objects get updated every frame. Instead of performing a test against every object to see if it&#8217;s onscreen, I decided to build on the partitioning scheme I use for rendering to get a list of onscreen objects, and use that as my list of objects to update. Once I had that working, my performance improved drastically and the game easily ran at 60fps in every part of my levels with CPU to spare.</p>
<p>Of course, doing this introduced bugs. One of the biggest issues is that often you will have important objects just outside the edge of the screen that need to keep updating, like moving platforms or enemies. To solve this, I added two mechanisms:</p>
<p>First, I added a margin around the screen within which objects still continued to update. This meant that an object just barely offscreen would keep updating, and solved the problem of a moving platform or enemy getting left behind as soon as he dropped offscreen.</p>
<p>Second, I built a simple &#8216;update manager&#8217; that maintains a list of all the objects that are currently onscreen. When an object leaves the screen, instead of removing it immediately, the update manager sets a timeout, which causes the object to fall &#8216;asleep&#8217; after a certain number of frames. As a result, once an object leaves the screen it has a short while to return to the screen before it falls asleep, which helps with things like moving platforms that are going to move on and off the screen regularly &#8211; since they don&#8217;t spend very long offscreen, they never have a chance to fall asleep.</p>
<p>The update manager also gives me the ability to exclude some objects from updates entirely, by flagging them as &#8216;unable to wake&#8217;, so the update manager knows to never remove them from sleep status. Likewise, it gives me the option to flag objects as &#8216;unable to sleep&#8217; so that they always stay active &#8211; for example, I do this to the player character and his companion to avoid any unintentional bugs that might result from the player remaining offscreen too long (say, during a cinematic).</p>
<p>The update manager has some other benefits, too. Here&#8217;s what it looks like:</p>
<pre>    public class UpdateManager&lt;T&gt;
        where T : class, IUpdateable, IHasBounds {

        public struct Entry {
            public readonly T Object;
            public int WakeCounter;

            public Entry (T obj, int wakeCounter) {
                Object = obj;
                WakeCounter = wakeCounter;
            }
        }

        protected Dictionary&lt;T, bool&gt; _VisibleObjects = new Dictionary&lt;T, bool&gt;(new ReferenceComparer&lt;T&gt;());
        protected UnorderedList&lt;Entry&gt; _Entries = new UnorderedList&lt;Entry&gt;();

        public int SleepTimeout = 30;
        public readonly SpatialCollection&lt;T&gt; Collection;

        public UpdateManager (SpatialCollection&lt;T&gt; collection) {
            Collection = collection;
        }

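        public void Update (Bounds liveRegion) {
            // Collect every object inside the live region. The bool value is unused;
            // the dictionary only serves as a set keyed by reference.
            using (var e = Collection.GetItemsFromBounds(liveRegion, true))
            while (e.MoveNext()) {
                _VisibleObjects[e.Current.Item] = false;
            }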

            Entry currentItem;

            using (var e = _Entries.GetEnumerator())
            while (e.GetNext(out currentItem)) {
                if (!_VisibleObjects.Remove(currentItem.Object)) {
                    // Object not visible

                    if (!currentItem.Object.AllowSleep) {
                        // Object cannot fall asleep
                    } else if (--currentItem.WakeCounter == 0) {
                        // Object fell asleep
                        e.RemoveCurrent();
                        continue;
                    } else {
                        // Object still awake
                        e.SetCurrent(ref currentItem);
                    }
                } else {
                    // Object visible

                    currentItem.WakeCounter = SleepTimeout;
                    e.SetCurrent(ref currentItem);
                }

                currentItem.Object.Update();
            }

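            // Anything still left in the set was not awake before this frame:
            // wake it (unless it is flagged as unable to wake) and update it right away.
            foreach (var newObject in _VisibleObjects.Keys) {
                if (newObject.AllowWake) {
                    _Entries.Add(new Entry(newObject, SleepTimeout));

                    newObject.Update();
                }
            }

            _VisibleObjects.Clear();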
        }
    }</pre>
<p>One of the nice things it does in addition to improving performance is that it simplifies my game code &#8211; previously, I had hand-written logic to step through the various object lists (geometry, entities, etc) and update them every frame, and that code differed in subtle ways. The update manager unifies all that, so it kills a lot of duplicated code. Also, by maintaining a unique &#8216;awake objects&#8217; list, it allows me to remove objects from the geometry/entity lists while performing an update, instead of having to wait until the end of the frame, which simplifies some of my entity code as well.</p>
<p>And despite the number of problems it solves, it ends up being quite simple and easy to use. All I have to do to rig up an update manager is point it at a collection of objects and hand it the current camera boundaries every frame.</p>
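<p>The margin around the screen described earlier can be folded into that per-frame call by growing the camera bounds before handing them over. This is a small sketch with assumed names: <code>Box</code> is a hypothetical stand-in for the game&#8217;s own bounds type, and the 64-unit margin in the usage comment is an arbitrary example value.</p>
<pre>    // Hypothetical stand-in for the game's own Bounds type.
    public struct Box {
        public readonly float Left, Top, Right, Bottom;

        public Box (float left, float top, float right, float bottom) {
            Left = left;
            Top = top;
            Right = right;
            Bottom = bottom;
        }

        // Grow the box outward by the same margin on every side.
        public Box Expand (float margin) {
            return new Box(Left - margin, Top - margin, Right + margin, Bottom + margin);
        }
    }

    // Assumed per-frame usage:
    //   updateManager.Update(cameraBounds.Expand(64f));</pre>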
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/gruedorf/2009/08/20/updating-onscreen-objects-profiling/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Threaded Renderer</title>
		<link>http://www.luminance.org/gruedorf/2009/07/20/threaded-renderer</link>
		<comments>http://www.luminance.org/gruedorf/2009/07/20/threaded-renderer#comments</comments>
		<pubDate>Tue, 21 Jul 2009 05:07:44 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[xna]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=585</guid>
		<description><![CDATA[Lots of changes since last week, but the majority of my efforts were focused on overhauling my approach to rendering.

My basic approach is primarily inspired by Christer Ericson&#8217;s post about the approach he uses for ordering draw calls. Some other useful sources of information were Tom Forsyth&#8217;s post about the cost of renderstate changes and [...]]]></description>
			<content:encoded><![CDATA[<p>Lots of changes since last week, but the majority of my efforts were focused on overhauling my approach to rendering.</p>
<div class="video"><object width="640" height="385" data="http://www.youtube-nocookie.com/v/stiPgEW6AJI&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube-nocookie.com/v/stiPgEW6AJI&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" /><param name="allowfullscreen" value="true" /></object></div>
<p>My basic approach is primarily inspired by <a href="http://realtimecollisiondetection.net/blog/?p=86">Christer Ericson&#8217;s post about the approach he uses for ordering draw calls</a>. Some other useful sources of information were <a href="http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Renderstate%20change%20costs]]">Tom Forsyth&#8217;s post about the cost of renderstate changes</a> and <a href="http://www.guerrilla-games.com/publications/dr_kz2_rsx_dev07.pdf">Guerrilla&#8217;s presentation on Killzone 2&#8217;s renderer from DEVELOP 2007</a>. The Killzone 2 presentation is especially worthwhile since it describes the way they take advantage of concurrency techniques in their renderer.</p>
<p>Until now, the rendering code in my game has been almost entirely immediate-mode; various objects in the game world like tiles and entities had Draw methods that would utilize the game&#8217;s GraphicsDevice and SpriteBatch to render themselves, changing render states as needed and generating dynamic geometry where necessary (like for water).</p>
<p><span id="more-585"></span></p>
<p>Of course, if every object changes render states, you quickly run into a situation where you&#8217;re changing states tens of thousands of times per frame, which doesn&#8217;t perform very well. As a result, I ended up having to add some basic support for grouping together identical render states. In the end, I often had to perform multiple passes over the scene in order to minimize state changes, and the addition of multiple onscreen &#8216;layers&#8217; meant that I often had to make half a dozen passes over the same collections (like Level.Entities and Level.Geometry) in order to render things in the correct order without too many state changes. My new renderer clearly needed to provide an inexpensive way to reduce state changes without complicating my code.</p>
<p>In addition to the issue of state changes, however, was the simple fact that rendering was too expensive. In particular, rendering all the onscreen tiles for the 6+ layers in the level could often add up to 5%+ of my CPU time, due to the relative inefficiency of the XNA SpriteBatch class when trying to draw thousands of bitmaps at various locations on the screen. The use of SpriteBatch also meant that I couldn&#8217;t effectively sort tiles by texture, because using a custom blending configuration with SpriteBatch requires you to disable its texture sorting support. (My tiles and sprites have to be premultiplied so they look correct when scaled, unfortunately.) As a result, my new renderer needed to minimize the cost of individual drawing operations, and it also needed to give me a way to utilize material/texture sorting with custom blending configurations.</p>
<p>One final detail that factored into the design was concurrency: Between the cost of actually rendering the game world, and the cost of waiting for vertical sync, the main thread was spending as much as 40% of its time performing rendering, which made it much harder to maintain a smooth, consistent 60fps framerate. In order to minimize the amount of time the main thread spent blocked during rendering, my new design needed to make it possible to move rendering work off the main thread without introducing thread safety issues &#8211; putting a lock around all my game state wasn&#8217;t going to cut it.</p>
<hr /><p>The design I ended up with roughly looks like this:</p>
<p>Every frame, at the beginning of my Draw method I construct a new Render.Frame object to represent the frame that&#8217;s being rendered.</p>
<p>The Render.Frame object contains a list of Render.Batch objects; each Batch has an associated Layer and Material. Layers are simple integers that allow me to explicitly order batches, so that I can create batches in any given order, and even create multiple batches at once. This also allows me to perform a single sort of the Batches array before sending the batches off to the video card, in order to minimize state changes.</p>
<p>Any given Batch contains a list of structures representing &#8216;Draw Calls&#8217;. In some cases, a draw call maps directly to a hardware drawing operation &#8211; like DrawUserPrimitives &#8211; but in other cases, it maps to something more granular. For example, a BitmapBatch contains BitmapDrawCalls, where each draw call represents a single bitmap, much like the arguments you pass to SpriteBatch&#8217;s Draw method. This allows me to sort individual draw calls based on their parameters to minimize state changes within a batch.</p>
<p>Finally, the Material objects associated with a given Batch are a superset of XNA&#8217;s Effect class &#8211; they include a VertexDeclaration, Effect, and optional delegates for configuring other rendering state like the current blending function or stencil state. Grouping all these parameters together in one object allows me to sort batches cheaply by comparing material instances, but still gives me some level of granularity since I can change the shader parameters of an active Effect within a batch, for example to support rendering multiple textures inside a single BitmapBatch.</p>
<p>In addition to the more obvious performance advantages of this approach, like the ability to sort by material, one other advantage is less obvious: Since a Frame contains information on all the drawing operations that need to be performed, but doesn&#8217;t depend on any of my game state, I can safely hand that object to another thread and have that thread perform the drawing operations. This lets me move a significant portion of my rendering off the main thread, and begin performing my next Update while the previous Draw completes, without needing to add any locking or complex synchronization.</p>
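<p>The ordering described above (layer first, then material) can be sketched as a single sort over the batch list. This is an illustration under assumptions, not the renderer&#8217;s actual code: <code>SketchMaterial</code>, <code>SketchBatch</code> and <code>SketchFrame</code> are hypothetical names, and the per-material integer id is just one cheap way to compare material instances.</p>
<pre>    using System;
    using System.Collections.Generic;

    public class SketchMaterial {
        private static int _NextId;

        // Assigned once at creation so materials can be compared cheaply.
        // Not thread-safe; assumes materials are created at startup on one thread.
        public readonly int Id = _NextId++;
    }

    public class SketchBatch : IComparable&lt;SketchBatch&gt; {
        public readonly int Layer;
        public readonly SketchMaterial Material;

        public SketchBatch (int layer, SketchMaterial material) {
            Layer = layer;
            Material = material;
        }

        // Layer decides draw order; within a layer, batches that share a
        // material end up adjacent, which minimizes state changes.
        public int CompareTo (SketchBatch other) {
            int result = Layer.CompareTo(other.Layer);
            if (result == 0)
                result = Material.Id.CompareTo(other.Material.Id);
            return result;
        }
    }

    public class SketchFrame {
        public readonly List&lt;SketchBatch&gt; Batches = new List&lt;SketchBatch&gt;();

        // Called once, before the batches are issued to the device.
        public void Prepare () {
            Batches.Sort();
        }
    }</pre>
<p>Because the sort key never touches game state, the sorted list can be handed to another thread as-is.</p>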
<hr /><p>It&#8217;s not all great, though. The biggest downsides to this approach are twofold:</p>
<p>First, I basically have to reimplement everything from scratch &#8211; SpriteBatch and SpriteFont are both completely impossible to extend, so I have to reimplement them in order to render text and bitmaps with this approach, and the same goes for any other rendering code based on directly manipulating a GraphicsDevice.</p>
<p>Second, this approach is inherently more dependent on the garbage collector, since most of the types must be classes by necessity. If I want to run well on the 360, this means I need to make use of pooling and other techniques to avoid frequent allocations during frames. I&#8217;m also no longer able to reuse a single scratch buffer when generating geometry, so every piece of geometry I render needs to have its own buffer &#8211; more allocations.</p>
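<p>For reference, the kind of pooling mentioned here can be as simple as a free list. The sketch below is a generic illustration with a hypothetical <code>Pool</code> class, not something the game already contains; it is not thread-safe, and callers are assumed to reset an object&#8217;s state before reusing it.</p>
<pre>    using System.Collections.Generic;

    // Hypothetical free-list pool: reuses released instances instead of
    // allocating new ones, to keep the garbage collector quiet during frames.
    public class Pool&lt;T&gt; where T : class, new() {
        private readonly Stack&lt;T&gt; _Free = new Stack&lt;T&gt;();

        public int FreeCount {
            get { return _Free.Count; }
        }

        // Hands back a previously released instance when one is available.
        public T Allocate () {
            return (_Free.Count &gt; 0) ? _Free.Pop() : new T();
        }

        // The caller must not keep using the item after releasing it.
        public void Release (T item) {
            _Free.Push(item);
        }
    }</pre>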
<p>So far I&#8217;m pretty pleased with this approach. There&#8217;s more work to be done &#8211; for example, I don&#8217;t have pooling implemented so my game collects extremely often on the 360 &#8211; but a large portion of my game is now running on this new rendering architecture, and my average framerate has already improved slightly from moving work off the main thread, despite the fact that I haven&#8217;t spent any time on optimizations.</p>
<hr /><p>The biggest challenge that remains is porting all of my old GraphicsDevice-oriented rendering code over to using batches. Here&#8217;s a before and after example:</p>
<pre>        public void RenderTileLayer (int index) {
            var layer = RuntimeLevel.Layers[index];
            RuntimeLayer.ItemInfo itemInfo = null;

            BeginSpriteBatch(BlendModes.AlphaPremultiplied);

            using (var e = layer.GetItemsFromBounds(Camera.Bounds))
            while (e.GetNext(out itemInfo)) {
                RenderTile(itemInfo.Item, SpriteBatch, Camera.ViewportPosition, Camera.Zoom, AnimationTimeProvider.Ticks);
            }
        }

        private void RenderTile (RuntimeTile tile, SpriteBatch spriteBatch, Vector2 viewportPosition, float zoom, long time) {
            var info = tile.TileInfo;
            // Convert the tile's world position into zoomed screen space.
            var pos = (tile.Bounds.TopLeft - viewportPosition) * zoom;
            spriteBatch.Draw(info.Texture, pos, info.Rectangle, Colors.White, 0.0f, new Vector2(0, 0), zoom, info.Strip.GetSpriteEffect(), 0.0f);
        }</pre>
<p>You can see here that my approach for rendering tiles is pretty simple: Get all the tiles within the screen&#8217;s boundaries, and render them one by one using a SpriteBatch. I&#8217;m able to reduce the number of state changes since I know that every tile within a given layer shares the same state, but I still need to change state once per layer (using the BeginSpriteBatch function, which calls SpriteBatch.Begin and sets up my render state). I also have to go out of my way to render each layer in the right order, which means calling RenderTileLayer in multiple places so that certain tiles appear below entities while other tiles appear above entities.</p>
<p>The new implementation looks like this:</p>
<pre>        public void RenderTileLayer (int index) {
            var layer = RuntimeLevel.Layers[index];
            RuntimeLayer.ItemInfo itemInfo = null;

            int drawLayer = (index &lt;= 2) ? DrawLayers.Background : DrawLayers.Foreground;

            using (var bitmapBatch = new BitmapBatch(PendingFrame, drawLayer + index, Materials.Bitmap[BlendModes.AlphaPremultiplied]))
            using (var e = layer.GetItemsFromBounds(Camera.Bounds))
            while (e.GetNext(out itemInfo)) {
                RenderTile(itemInfo.Item, bitmapBatch);
            }
        }

        public void RenderTile (RuntimeTile tile, Render.BitmapBatch batch) {
            var info = tile.TileInfo;

            var drawCall = new Render.BitmapDrawCall(info.Texture, tile.Bounds.TopLeft, info.Bounds);
            drawCall.Mirror(info.Strip.Mirroring.X, info.Strip.Mirroring.Y);

            batch.Add(drawCall);
        }</pre>
<p>One thing you&#8217;ll notice is that RenderTile now has considerably fewer arguments. Since I had to reimplement SpriteBatch from scratch, this gave me the opportunity to integrate some of my rendering calculations, like zooming and viewport positioning, directly into the shader. As a result, I don&#8217;t need to pass those values around as parameters anymore; I simply set them as shader parameters every time they change. Using an EffectPool for all my shaders also means that I only need to set those parameters once, instead of having to update all my individual shaders with the correct values.</p>
<p>One other difference here is the need to explicitly choose a layer for the tiles to render on when I create a BitmapBatch. I have a small set of constants (named DrawLayers) that I use to roughly organize my layers, but I also add values to those constants so that I can organize batches within those layers. That way, I can be certain that if I draw three sets of tiles on the same layer, they are always drawn in the same order relative to each other.</p>
<p>I also have to explicitly pass in a Material object for the BitmapBatch, instead of just passing a blend mode to the BeginSpriteBatch function. This isn&#8217;t a significant change, but it does mean that I now have to explicitly manage my materials &#8211; as a result, my game now has a LoadMaterials function that runs at startup and creates all the various permutations of parameters I need and stores them so that I can grab them at runtime.</p>
<p>You may also notice that there are no explicit drawing operations in here anywhere; I&#8217;m just creating a BitmapBatch and adding draw calls to it. Essentially, what&#8217;s going on here is that creating a batch automatically attaches it to the Frame being built. When the batch is Disposed (by the using block, in this case), the Frame is notified that the contents of the Batch (its draw calls) are ready and it stores them for later. The use of IDisposable to represent &#8216;readiness&#8217; instead of disposal is a little weird, but it&#8217;s convenient in that it gives me relatively automatic batch management. This also means that if I wanted to, I could create lots of batches at once and fill them with draw calls on multiple threads, since the Frame has a straightforward way to determine whether all of its batches are ready yet.</p>
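<p>The &#8216;Dispose means ready&#8217; idea can be sketched like this. The names <code>ReadinessFrame</code> and <code>ReadinessBatch</code> are hypothetical, and the lock is an assumption made so that batches could be finished from several threads; the real Frame and Batch classes may well work differently.</p>
<pre>    using System;
    using System.Collections.Generic;

    public class ReadinessFrame {
        private readonly object _Lock = new object();
        private readonly List&lt;ReadinessBatch&gt; _Ready = new List&lt;ReadinessBatch&gt;();
        private int _Pending;

        // True once every batch created for this frame has been disposed.
        public bool IsReady {
            get { lock (_Lock) return _Pending == 0; }
        }

        public int ReadyCount {
            get { lock (_Lock) return _Ready.Count; }
        }

        internal void OnBatchCreated () {
            lock (_Lock)
                _Pending++;
        }

        internal void OnBatchReady (ReadinessBatch batch) {
            lock (_Lock) {
                _Pending--;
                _Ready.Add(batch);
            }
        }
    }

    public class ReadinessBatch : IDisposable {
        private readonly ReadinessFrame _Frame;
        private bool _Finished;

        // Creating a batch attaches it to the frame being built.
        public ReadinessBatch (ReadinessFrame frame) {
            _Frame = frame;
            frame.OnBatchCreated();
        }

        // Dispose signals that the draw calls are complete; nothing is freed.
        public void Dispose () {
            if (_Finished)
                return;

            _Finished = true;
            _Frame.OnBatchReady(this);
        }
    }</pre>
<p>A render thread would then only need to wait until <code>IsReady</code> is true before sorting and issuing the stored batches.</p>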
<hr /><p>Below you can see a short video of how the renderer groups the game up into batches. If you look carefully, you may also notice that particles are being rendered behind all the batches, since they aren&#8217;t yet integrated into the renderer.</p>
<div class="video"><object width="640" height="385" data="http://www.youtube-nocookie.com/v/NtlDmjVpIXg&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube-nocookie.com/v/NtlDmjVpIXg&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" /><param name="allowfullscreen" value="true" /></object></div>
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/gruedorf/2009/07/20/threaded-renderer/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Cutting and tuning</title>
		<link>http://www.luminance.org/gruedorf/2009/06/12/cutting-and-tuning</link>
		<comments>http://www.luminance.org/gruedorf/2009/06/12/cutting-and-tuning#comments</comments>
		<pubDate>Sat, 13 Jun 2009 04:53:28 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[360]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[cuts]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[xna]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=446</guid>
		<description><![CDATA[As far as gameplay goes, the only major addition since last week&#8217;s post was a relatively complete implementation of player death, along with the &#8216;reunion&#8217; teleport that goes with it. Fairly simple at present, with some bugs to work out (including one related to the teleport location that you&#8217;ll see in the video below). Definitely [...]]]></description>
			<content:encoded><![CDATA[<p>As far as gameplay goes, the only major addition since last week&#8217;s post was a relatively complete implementation of player death, along with the &#8216;reunion&#8217; teleport that goes with it. Fairly simple at present, with some bugs to work out (including one related to the teleport location that you&#8217;ll see in the video below). Definitely helps get a better feeling for whether a given puzzle is too hard or too easy.</p>
<p><object width="640" height="385" data="http://www.youtube-nocookie.com/v/FXb9E23WxpQ&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube-nocookie.com/v/FXb9E23WxpQ&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" /><param name="allowfullscreen" value="true" /></object></p>
<p>Other than that, the only thing of note code-wise is the time I spent doing some performance tuning. My framerate had crept down over the past month or so, to the point that I wasn&#8217;t able to maintain a stable 60fps on the 360 anymore and my CPU utilization was approaching 30% on my desktop PC. Some of the optimizations were relatively obvious &#8211; for example, I was calling SpriteBatch.Begin/SpriteBatch.End for every on-screen object, for the sake of simplicity, which resulted in a lot of unnecessary draw calls.</p>
<p>Some simple changes to automatically begin/end batches when changing rendering settings reduced the number of draw calls per frame in most cases to around 30, which is perfectly acceptable, and reduced my CPU utilization by around 50%. After that, the rest of the optimization was pretty trivial &#8211; finding other hotspots in my profiler data and reducing the cost.</p>
<p>Well, it was trivial until I ran the game on the 360 again and noticed that the framerate hadn&#8217;t improved very much. Huh? I doubled my framerate on the PC, but on the 360, it barely moved an inch. What&#8217;s the deal?</p>
<p>Turns out, the 360&#8217;s pitiful floating-point performance was kneecapping me. Believe it or not, the primary culprit was the geometric shapes in the game&#8217;s HUD, for the circular health displays you may have seen in previous screenshots/videos. I knew the 360&#8217;s FPU was weak, but not THAT bad. Unfortunately, the only way to detect this is by manually measuring the performance cost of your code on the 360, by commenting out/toggling individual sections of your game code.</p>
<p>Tedious, at best. For now, I ended up just reducing the complexity of the geometry for the HUD elements and reducing the number of shapes I was drawing, which brought the 360 framerate much closer to what it used to be. As it happens, some design changes later in the week helped here too&#8230;</p>
<hr />
<p>The majority of my work ended up focusing on the game&#8217;s design. My schedule for this project is extremely aggressive (insanely so, really), and as such I need to have a relatively complete playable demo within mere months for submission to a couple of major game competitions. Being able to hit that deadline in my free time requires me to aggressively control the scope of the project, avoid feature creep, and do as little work as possible to get game mechanics implemented and content built.</p>
<p>Towards this end, after getting one of my main mechanics prototyped and testing it out in content I&#8217;d built, I made the hard decision to cut it. The second controllable character you&#8217;ve seen in some of my previous videos is effectively gone, though I&#8217;m going to try to reuse that design and code effort in the revised design.</p>
<p>Making a choice like this is always painful, especially when you don&#8217;t have an unchangeable deadline or overbearing boss pushing you towards it. But ultimately, I think I&#8217;ll benefit from making these cuts sooner rather than later. I wish I had started thinking hard about it a few weeks earlier, when my first prototypes were working, instead of waiting until the issues were obvious, but I&#8217;m still relatively happy with the turnaround. I was able to prototype a relatively unusual game mechanic in a matter of a couple dozen hours of programming time, decide that it wasn&#8217;t worth pursuing, and cut it. Definitely an improvement over traditional Waterfall with long cycles, but not quite true Agile yet. <img src='http://www.luminance.org/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p>
<p>For me, this underscores the importance of aggressive, early prototyping of almost every possible game mechanic and design, instead of focusing on a single section of game content or gameplay until it&#8217;s done. Previously I was used to having a rigid focus on a section of a game or application, working on it day in and day out until it was done and ready to hand off &#8211; but in many cases, this meant that I could sink days or weeks of my time into something that ultimately had to be thrown out.</p>
<p>Just like a lot of Agile proponents will tell you, it turns out that failing fast means wasting less time. The chaotic feeling and loss of productivity to context switches can be painful, and I think being successful requires setting things up to avoid having to pay those costs too many times a week, but ultimately, I think it&#8217;s the right call.</p>
<p>
<object width="640" height="385"><param name="movie" value="http://www.youtube-nocookie.com/v/AujBCa_R4tM&#038;hl=en&#038;fs=1&#038;rel=0&#038;color1=0x2b405b&#038;color2=0x6b8ab6&#038;hd=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube-nocookie.com/v/AujBCa_R4tM&#038;hl=en&#038;fs=1&#038;rel=0&#038;color1=0x2b405b&#038;color2=0x6b8ab6&#038;hd=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="385"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/gruedorf/2009/06/12/cutting-and-tuning/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More entities and optimization</title>
		<link>http://www.luminance.org/gruedorf/2009/01/30/more-entities-and-optimization</link>
		<comments>http://www.luminance.org/gruedorf/2009/01/30/more-entities-and-optimization#comments</comments>
		<pubDate>Sat, 31 Jan 2009 06:43:39 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[collision]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[platformer]]></category>
		<category><![CDATA[xna]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=233</guid>
		<description><![CDATA[Most of my work on the platformer this week has been focused on improving the Entity class and all the related code. I made a number of changes and improvements to SpatialCollection aimed at improving its performance, and completely overhauled all my collision detection code to reduce the number of passes I make over the [...]]]></description>
			<content:encoded><![CDATA[<p>Most of my work on the platformer this week has been focused on improving the <strong>Entity</strong> class and all the related code. I made a number of changes and improvements to <strong>SpatialCollection</strong> aimed at improving its performance, and completely overhauled all my collision detection code to reduce the number of passes I make over the level when doing things like computing an entity&#8217;s StandingY or testing for a collision. The efforts have mostly paid off so far; my framerate with 6 entities walking around on curved surfaces right now is about the same as it was before with a single entity walking around on sloped surfaces, even though I&#8217;ve improved my collision model significantly. The optimizations ended up bringing my game&#8217;s performance profile back to where it was before the addition of entities, with the majority of CPU time being spent doing raw number-crunching to perform intersection checks on polygons.</p>
<p style="text-align: center;"><a href="http://www.luminance.org/wp-content/uploads/2009/01/profile2.png"><img class="size-full wp-image-236 aligncenter" title="profile2" src="http://www.luminance.org/wp-content/uploads/2009/01/profile2.png" alt="" width="577" height="308" /></a></p>
<p>My previous approach to collision detection was fairly brute-force: I had a few special methods in the <strong>Game</strong> class, with names like <strong>FindObstruction</strong> and <strong>ResolveMotion</strong>, that took a bunch of parameters to control a collision detection sweep of the entire level. The addition of SpatialCollection&lt;T&gt;, as mentioned in last week&#8217;s post, made those functions faster, but that didn&#8217;t completely eliminate the performance issues &#8211; the most visible offender was <strong>ComputeStandingY</strong>. The original implementation of <strong>ComputeStandingY</strong> had to make four complete passes over the level to come up with a result, and I ended up invoking it twice per frame when updating an entity. The end result was that it accounted for a significant amount of my CPU usage no matter how efficient I made the underlying collision detection routine.</p>
<p>In order to address this problem, I set out to refactor my collision detection code. The first step was to unify the different collision test routines &#8211; they all shared many common elements, the most obvious one being the task of walking over the contents of the level. Using SpatialCollection meant that each routine had about 8 lines of boilerplate just to do iteration, in addition to all of the code for actually performing collision tests, so I factored that out into a baseline <strong>ObstructionTest</strong> function that performed the simple task of walking over the contents of the level and invoking a &#8216;Collision Visitor&#8217; delegate on each item.</p>
<pre>    public ICollidable ObstructionTest (ICollisionVisitor visitor, ObstructionFlags flags) {
        ICollidable result = null;
        var bounds = visitor.GetBounds();

        if ((flags &amp; ObstructionFlags.Geometry) == ObstructionFlags.Geometry) {
            // Reuse the bounds fetched above instead of asking the visitor again.
            using (var e = Level.Geometry.GetItemsFromBounds(bounds))
            while (e.MoveNext()) {
                var current = e.Current;

                if (!Bounds.Intersect(ref bounds, ref current.Bounds))
                    continue;

                var rg = current.Item.GetRuntimeGeometry(this);

                if (visitor.Visit(rg, ref result))
                    return result;
            }
        }

        if ((flags &amp; ObstructionFlags.Entities) == ObstructionFlags.Entities) {
            if (bounds.Intersects(Player.Bounds))
                if (visitor.Visit(Player, ref result))
                    return result;

            using (var e = Entities.GetItemsFromBounds(bounds))
            while (e.MoveNext()) {
                var current = e.Current;

                if (!Bounds.Intersect(ref bounds, ref current.Bounds))
                    continue;

                if (visitor.Visit(current.Item, ref result))
                    return result;
            }
        }

        return result;
    }</pre>
<p>After that, I refactored all the existing routines to be built on top of ObstructionTest, then iteratively refined things until I didn&#8217;t need any of the specialized routines anymore. <strong>ObstructionTest</strong> ended up operating on specialized <strong>ICollisionVisitor</strong> objects that had two &#8216;Visit&#8217; methods, one for &#8216;Collidable Objects&#8217;, and one for individual collision polygons. The pair of visit methods allowed a visitor to reject entire objects before any complex collision tests were performed, in addition to rejecting individual polygons due to non-collision. The use of visitor objects also meant that a visitor could record a list of the objects it encountered, or reject collision with an entire list of entities instead of just the entity that was performing the check. These improvements not only resulted in better performance, but also allowed me to address a bug in the <strong>Entity</strong> class that prevented entities from walking up sloped surfaces while the player was standing on them.</p>
<pre>    public class StandingSurfaceVisitor : ICollisionVisitor {
        public Entity Entity;
        public Bounds Bounds;
        public List&lt;IStandable&gt; Results;

        public StandingSurfaceVisitor (Entity entity) {
            Entity = entity;
            Results = new List&lt;IStandable&gt;();
        }

        public Bounds GetBounds () {
            return Bounds;
        }

        public CollisionState VisitCollidable (ICollidable obj) {
            if (obj == Entity)
                return new CollisionState(false, false);

            var standable = (obj as IStandable);
            if (standable != null)
                Results.Add(standable);
            return new CollisionState(false, true);
        }

        public void Reset () {
            Results.Clear();
        }

        public CollisionState VisitPolygon (Polygon poly) {
            throw new NotImplementedException();
        }
    }</pre>
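<p>To make the two-level rejection a bit more concrete, here is a hedged sketch of how a walk over the level could drive a visitor that has both an object-level and a polygon-level method. None of these types come from the game: <strong>VisitResult</strong>, <strong>Shape</strong>, <strong>ITwoPhaseVisitor</strong> and <strong>Walker</strong> are made up for the example, and the real ICollisionVisitor and CollisionState may work differently.</p>
<pre>using System.Collections.Generic;

// Hypothetical result type; the CollisionState used above may differ.
public enum VisitResult {
    Skip,       // reject this object without testing its polygons
    Descend,    // go on to test the object's polygons
    Stop        // treat the object itself as a hit and end the walk
}

// Hypothetical stand-in for a collidable object made of polygons.
public class Shape {
    public string Name;
    public int PolygonCount;
}

public interface ITwoPhaseVisitor {
    VisitResult VisitObject (Shape shape);
    // Returns true if the polygon at this index is a collision.
    bool VisitPolygon (Shape shape, int polygonIndex);
}

public static class Walker {
    // Returns the first shape the visitor reports a collision with, or null.
    public static Shape Walk (IEnumerable&lt;Shape&gt; shapes, ITwoPhaseVisitor visitor) {
        foreach (var shape in shapes) {
            var result = visitor.VisitObject(shape);

            // Cheap rejection: no per-polygon work for this object.
            if (result == VisitResult.Skip)
                continue;
            if (result == VisitResult.Stop)
                return shape;

            for (int i = 0; i &lt; shape.PolygonCount; i++)
                if (visitor.VisitPolygon(shape, i))
                    return shape;
        }

        return null;
    }
}</pre>
<p>A visitor that only wants to collect objects, like StandingSurfaceVisitor above, would never need its polygon method called under a scheme like this.</p>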
<p>One of the other things I invested some time into was an overhaul of the logic for standing on surfaces. Previously, the <strong>Player</strong> class had a <strong>StandingOn</strong> property that held one of the entities/surfaces he was currently standing on. I used this to determine whether the player was standing on a pressure plate, and to make him move with things he was standing on. The problem was that in almost all cases the player would be standing on multiple surfaces, so the nondeterministic nature of the collision detection code meant that sometimes you could be atop a pressure plate but not trigger it. I also had no easy way of determining whether entities were standing on top of each other, which meant that an entity being ridden by another entity could not walk up a sloped surface (his path would be blocked by his rider).</p>
<p>To solve those two problems, I defined a pair of simple interfaces called <strong>IRider</strong> and <strong>IRideable</strong>. These interfaces provided a simple way for me to manage all the logic around riding surfaces &#8211; entities could automatically build and update a list of rideable objects that they were currently occupying, and rideable objects were able to automatically maintain a list of rider objects on top of them, making it simple to exclude those riders from collision checks. This not only fixed pressure plates, but also made it easy for monsters to set them off. It also meant that monsters could walk up sloped surfaces while being ridden by other entities, and that you could stack entities on top of each other to create big riding chains without any major issues (though in the video below, it was still unfinished and somewhat busted):</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="385" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/isRTK0TIYGs&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;ap=%2526fmt%3D18" /><embed type="application/x-shockwave-flash" width="640" height="385" src="http://www.youtube.com/v/isRTK0TIYGs&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;ap=%2526fmt%3D18" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
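<p>For reference, the rider/rideable bookkeeping described above could look roughly like the sketch below. The post only names the two interfaces, so the member names (<strong>Riders</strong>, <strong>RidingOn</strong>) and the <strong>Riding</strong> helper class are guesses for illustration rather than the game&#8217;s actual code.</p>
<pre>using System.Collections.Generic;

// Member names here are assumptions; only the interface names come from the post.
public interface IRideable {
    List&lt;IRider&gt; Riders { get; }
}

public interface IRider {
    List&lt;IRideable&gt; RidingOn { get; }
}

public static class Riding {
    // Replaces the rider's set of occupied surfaces and keeps both lists in sync.
    // 'standingOn' must not be the rider's own RidingOn list, since that is cleared.
    public static void UpdateRiding (IRider rider, IEnumerable&lt;IRideable&gt; standingOn) {
        foreach (var old in rider.RidingOn)
            old.Riders.Remove(rider);
        rider.RidingOn.Clear();

        foreach (var surface in standingOn) {
            if (rider.RidingOn.Contains(surface))
                continue;

            rider.RidingOn.Add(surface);
            surface.Riders.Add(rider);
        }
    }

    // True if 'other' is riding on 'surface', directly or through a chain of
    // riders. Such objects can be skipped when 'surface' checks for obstructions.
    public static bool IsRiddenBy (IRideable surface, object other) {
        foreach (var rider in surface.Riders) {
            if (object.ReferenceEquals(rider, other))
                return true;

            var nested = rider as IRideable;
            if ((nested != null) &amp;&amp; IsRiddenBy(nested, other))
                return true;
        }

        return false;
    }
}</pre>
<p>The recursive check is what would let a stack of entities move together: the entity at the bottom ignores everything above it in the chain when testing whether its path is blocked.</p>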
<p>One of the other fixes I was able to make as a result of the new system for handling rideable surfaces was to the code for mantling up onto surfaces. Previously, if you tried to mantle up onto a moving surface, like a moving platform or an entity, you&#8217;d mantle up onto the space it previously occupied and typically fall right back down, making it pretty useless. With the new riding system I was easily able to adapt this so that you could mantle up onto a moving surface without being left behind by its motion or causing it to become obstructed. Since the mantling code doesn&#8217;t currently distinguish between different types of surfaces, this means you can basically crawl up on top of a crowd of enemies and run across it, which is pretty fun.</p>
<p>The video below shows how entity mantling looks, though it&#8217;s from before I implemented the riding system (so you can see it get broken in a couple places):</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="385" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/bVDWEiDRC6Y&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;ap=%2526fmt%3D18" /><embed type="application/x-shockwave-flash" width="640" height="385" src="http://www.youtube.com/v/bVDWEiDRC6Y&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;ap=%2526fmt%3D18" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/gruedorf/2009/01/30/more-entities-and-optimization/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
