<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>luminance &#187; Gruedorf</title>
	<atom:link href="http://www.luminance.org/tag/gruedorf/feed" rel="self" type="application/rss+xml" />
	<link>http://www.luminance.org/blog</link>
	<description>Programming and Game Development - Kevin Gadd&#039;s Blog</description>
	<lastBuildDate>Sun, 02 Oct 2011 00:15:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Achievements and player data I</title>
		<link>http://www.luminance.org/blog/gruedorf/2009/08/28/achievements-and-player-data-i</link>
		<comments>http://www.luminance.org/blog/gruedorf/2009/08/28/achievements-and-player-data-i#comments</comments>
		<pubDate>Fri, 28 Aug 2009 09:35:37 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[achievements]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[xna]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=814</guid>
		<description><![CDATA[One of the things I&#8217;ve been working on lately is a way to collect and store data on play sessions, so I can track achievements for players and track where players spend the most time in a level, or where they die the most. There are lots of details to get right for a system [...]]]></description>
			<content:encoded><![CDATA[<p>One of the things I&#8217;ve been working on lately is a way to collect and store data on play sessions, so I can track achievements for players and track where players spend the most time in a level, or where they die the most. There are lots of details to get right for a system like this, but I&#8217;ve at least gotten a prototype working and started experimenting with ways to visualize the data.</p>
<p>For a system like this you have a few important pieces:</p>
<ul>
<li>Your game needs to collect data during play sessions &#8211; in my case, I track important events like player/creature death, and periodically sample the player&#8217;s position to get a general idea of where players are in a given level during their session.</li>
<li>You need a way for your game to periodically report collected data to a remote server. You have to handle lots of edge cases here &#8211; for example, it&#8217;s likely some people will play without an active internet connection or suffer temporary loss of connection, so you need to batch up events to report later in this case &#8211; you definitely don&#8217;t want the game to fall over and choke without access to the internet, and if possible you want to avoid losing data too.</li>
<li>You need to build a set of services to collect data from the game. This typically means you also need services to handle things like uniquely identifying individual play sessions and computers, so that you can track achievements for individual players and perform analysis on individual sessions instead of only on a player&#8217;s entire play history.</li>
<li>You need to build an in-game frontend, to expose collected data to a player. This means turning your event data into user-friendly statistics &#8211; like a kill counter, or an achievement for killing a particular boss. One interesting challenge here is that you probably want to expose this information online, too, so that players can share their profiles and achievements.</li>
</ul>
<p>In this post, I&#8217;m going to cover the first two.</p>
<p><span id="more-814"></span></p>
<h2>Collecting Data</h2>
<p>So, first, you need a way to collect important player data. For it to be useful, you need to collect it in as consistent a format as possible; you don&#8217;t want every piece of data to have a unique set of parameters attached to it, for example, because that would make it impossible to perform any generic analysis of your data. Likewise, you want to try and keep your data &#8216;denormalized&#8217;, by storing as much relevant data together as possible. If you achieve both of these goals, you end up with a stream of &#8216;events&#8217; that represent the things you care about in a form that lets you inspect them individually or analyze them in large groups.</p>
<p>In my case, this step was actually quite straightforward. I already had a simple &#8216;event bus&#8217; integrated into the game, that I was previously using to synchronize animation and sound effects with gameplay. Given this, it was simple to create a data collector that attaches to the event bus and listens for events that it wants to collect.</p>
<p>I did, however, have to make some changes &#8211; many of my events didn&#8217;t expose the information necessary to be useful; for example, the event representing a player death didn&#8217;t contain any information to tell you who (or what) killed the player, so I had to tweak the game code to attach that to the event. Likewise, many other events didn&#8217;t include position information, which made it hard to get meaningful information about them &#8211; knowing that the player grabbed onto a ledge isn&#8217;t particularly useful if you don&#8217;t know <em>where</em> or <em>which ledge</em>. In some cases, the data collector attaches important parameters to the events so that game objects don&#8217;t have to; for example, any event coming from the Player automatically has PlayerX and PlayerY parameters attached to it (since the Player is the source of the event, it is simple to extract a position and attach it to the event).</p>
<p>One thing that required some special treatment as well was recording the player&#8217;s position over time &#8211; while attaching position information to events gives you a general idea of the player&#8217;s location, it&#8217;s not very useful for figuring out where players are getting stuck or confused. To address this, I built a simple class that periodically samples the player&#8217;s position and adds it to a list. Every time the list reaches a certain size, its contents are batched up into a single event &#8211; a &#8216;HistoryBlock&#8217; &#8211; and sent to the data collector. This allows me to get fairly high-resolution position data without having to report hundreds of individual events every minute (which would have contained lots and lots of redundant information).</p>
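<p>A minimal sketch of that kind of sampler might look like the following. It is illustrative only: the <code>PositionSampler</code> name, the interval and batch-size parameters, and the callback signature are all assumptions, not the game&#8217;s actual code.</p>
<pre>using System;
using System.Collections.Generic;

// Hypothetical sketch: samples a position at a fixed interval and hands the
// samples off as one batch once enough of them have accumulated.
public class PositionSampler {
    private readonly List&lt;int&gt; _X = new List&lt;int&gt;();
    private readonly List&lt;int&gt; _Y = new List&lt;int&gt;();
    private readonly double _Interval;
    private readonly int _BatchSize;
    private readonly Action&lt;string, int[], int[]&gt; _OnBatch;
    private double _Elapsed;

    public PositionSampler (double intervalSeconds, int batchSize, Action&lt;string, int[], int[]&gt; onBatch) {
        _Interval = intervalSeconds;
        _BatchSize = batchSize;
        _OnBatch = onBatch;
    }

    // Call once per frame with the frame's elapsed time and the player's position.
    public void Update (double elapsedSeconds, string levelName, float playerX, float playerY) {
        _Elapsed += elapsedSeconds;
        if (_Elapsed &lt; _Interval)
            return;
        _Elapsed -= _Interval;

        // Coordinates are kept in two parallel lists instead of a list of pairs.
        _X.Add((int)Math.Round(playerX));
        _Y.Add((int)Math.Round(playerY));

        if (_X.Count &gt;= _BatchSize) {
            _OnBatch(levelName, _X.ToArray(), _Y.ToArray());
            _X.Clear();
            _Y.Clear();
        }
    }
}</pre>
<p>A real implementation would probably also flush the pending samples when the level changes, so that a single batch never mixes positions from two levels.</p>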
<h2>Reporting Data</h2>
<p>Once you have your player data, you need to report it to your server (and probably store some of it locally, too). I honestly haven&#8217;t tackled the latter part yet, but here&#8217;s how I handled the former:</p>
<p>To handle periodically reporting events, I created a simple &#8216;Event Reporter&#8217; class that collaborates with the data collector. Essentially, the event reporter maintains a queue of pending events, and runs a thread that is responsible for sending batches of events to the remote server once the queue gets large enough. It automatically waits a certain amount of time between requests, and attempts to batch events into groups instead of reporting them one at a time as they occur. This allows me to get relatively low latency (if I want it) but still remain relatively efficient in my use of the network. Whenever the data collector gets an event, it adds it to the event reporter&#8217;s queue.</p>
<p>Once I have a batch of events ready to send to the server, I pack them together into a class that&#8217;s laid out optimally for network transmission. I then use .NET 3.5&#8217;s built-in JSON serializer to convert the batch into a blob of JSON data ready to send to my remote server. A typical blob looks like this:</p>
<pre>{
  "PlayerId":xxxx,
  "SessionId":yyyy,
  "Events":[
    {"Type":"ActiveControllerChanged","Time":0,"Data":{
      "IsKeyboard":true,
      "ConnectedGamepads":0,
      "ActiveController":0
    }},
    {"Type":"LevelLoaded","Time":5436,"Data":"troupe"},
    {"Type":"HistoryBlock","Time":7788,"Data":{
      "LevelName":"troupe",
      "PlayerX":[-136,-74,35,192,209,209],
      "PlayerY":[588,875,920,920,920,920]
    }}
  ]
}</pre>
<p>As you can see, events are small when possible &#8211; they simply store the event type and the timestamp at which the event occurred, along with their associated information. In cases where an event has lots of information attached, I send a dictionary, but in some cases I can just send a single value. In the case of a history block, I make sure to send the X and Y coordinates as individual lists, so that I don&#8217;t have to waste time encoding &#8216;PlayerX&#8217; and &#8216;PlayerY&#8217; along with every individual coordinate. A batch of events has a player ID and a session ID attached, which allows me to analyze individual play sessions and track achievements. (Note that both of these IDs are randomly generated on the server, so they&#8217;re anonymous.)</p>
<p>The reporter starts an HTTP POST to send the JSON to the server, and goes to sleep until it&#8217;s finished. If the send fails, the events are left on the queue and the thread sleeps for a while so it can retry sending them later. Once a send completes, the reporter checks to see if the queue is empty &#8211; if the queue is now empty, it goes to sleep until the queue contains items again. If the queue still contains items, it sleeps for a short while and prepares to send another batch &#8211; this allows the event reporter to spend the majority of its time asleep, but still handle large bursts of events without sending a huge blob of events at the server in one POST (since that would be very likely to fail or time out.)</p>
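<p>The &#8216;leave events on the queue until the send succeeds&#8217; part can be sketched roughly like this. It is a simplified illustration under stated assumptions: the <code>EventQueueSketch</code> class is hypothetical, the <code>send</code> delegate stands in for the HTTP POST, and the sleeping and threading are left to the caller.</p>
<pre>using System;
using System.Collections.Generic;

// Hypothetical sketch of the queue handling only. It assumes a single
// consumer thread calls TrySendOneBatch and sleeps between calls.
public class EventQueueSketch {
    private readonly Queue&lt;string&gt; _Pending = new Queue&lt;string&gt;();
    private readonly object _Lock = new object();
    private readonly int _MaxBatchSize;
    private readonly Func&lt;string[], bool&gt; _Send;

    public EventQueueSketch (int maxBatchSize, Func&lt;string[], bool&gt; send) {
        _MaxBatchSize = maxBatchSize;
        _Send = send;
    }

    public int Count {
        get { lock (_Lock) return _Pending.Count; }
    }

    public void Enqueue (string serializedEvent) {
        lock (_Lock)
            _Pending.Enqueue(serializedEvent);
    }

    // Attempts to send one batch. Returns false if the send failed, in which
    // case the events stay queued so that a later call can retry them.
    public bool TrySendOneBatch () {
        string[] batch;
        lock (_Lock) {
            int count = Math.Min(_MaxBatchSize, _Pending.Count);
            batch = new string[count];
            // Copy without dequeuing, so a failed send loses nothing.
            int i = 0;
            foreach (var item in _Pending) {
                if (i == count)
                    break;
                batch[i++] = item;
            }
        }

        if (batch.Length == 0)
            return true;
        if (!_Send(batch))
            return false;

        // The send succeeded, so the batch can now leave the queue.
        lock (_Lock) {
            for (int i = 0; i &lt; batch.Length; i++)
                _Pending.Dequeue();
        }
        return true;
    }
}</pre>
<p>Capping the batch size is what keeps a large backlog from turning into one huge POST; the caller just keeps calling <code>TrySendOneBatch</code>, with a short sleep in between, until <code>Count</code> reaches zero.</p>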
<p>One important detail is that the reporter needs to be able to correctly report events even if the player quits. In my case, the player can quit at any time, and I can end up with a very large number of events in the queue at the time. To deal with this, once the game has finished tearing down, the event reporter is given around 30 seconds to report any remaining events to the server. This usually allows events to get through, but prevents the game from hanging indefinitely on exit (in the event that there&#8217;s some sort of network failure, or the event queue is <strong>really</strong> full). Since this occurs after teardown, the game window is gone so the player doesn&#8217;t get the impression that the game has hung, and since the reporter spends most of its time sleeping, they won&#8217;t see any significant CPU usage either.</p>
<p>The reporter also runs in its own isolated thread and uses its own task scheduler, instead of sharing resources with the game. This increases the likelihood that the game will be able to successfully report events in the case of a bug or crash, and also means that if the reporter fails for some reason &#8211; probably a network outage &#8211; it won&#8217;t take the game down with it.</p>
<p>In my next post, I&#8217;ll cover the remaining two items and show you some of the things you can do once you&#8217;ve gathered player data. In the interim, you can <a href="http://inferus-data.luminance.org/player/view?id=1115">take a look at my player profile</a>!</p>
<hr />
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/blog/gruedorf/2009/08/28/achievements-and-player-data-i/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Updating Onscreen Objects / Profiling</title>
		<link>http://www.luminance.org/blog/gruedorf/2009/08/20/updating-onscreen-objects-profiling</link>
		<comments>http://www.luminance.org/blog/gruedorf/2009/08/20/updating-onscreen-objects-profiling#comments</comments>
		<pubDate>Fri, 21 Aug 2009 04:24:39 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[360]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiler]]></category>
		<category><![CDATA[xna]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=742</guid>
		<description><![CDATA[One of the problems I started to run into while polishing things up for my contest entry builds was that as my levels grew larger, the game&#8217;s CPU utilization on the 360 steadily grew with them. While PC builds of my game ran smoothly on basically all the machines I had access to, on the [...]]]></description>
			<content:encoded><![CDATA[<p>One of the problems I started to run into while polishing things up for my contest entry builds was that as my levels grew larger, the game&#8217;s CPU utilization on the 360 steadily grew with them. While PC builds of my game ran smoothly on basically all the machines I had access to, on the 360 the cost of updating all the level&#8217;s objects and entities became quite significant &#8211; most likely due to the 360&#8217;s feeble floating-point performance and lack of out-of-order execution.</p>
<p>The solution to this was, at least to me, relatively obvious: My levels were much larger than the camera, so it didn&#8217;t really make sense to update the entire level every frame.</p>
<p>The first thing I tried to verify this theory was simply hacking it in: Do a check before updating each object to see if it was onscreen. Interestingly, this didn&#8217;t make the game any faster on the 360. Depending on your point of view, this either confirmed or denied my hypothesis: If the problem was simply the cost of all the floating point operations, the cost of doing the onscreen check for each object (since the camera and object bounds were both expressed in floating-point) could have been making the problem worse. Clearly, I didn&#8217;t have enough data to be sure about the right choice to make.</p>
<p>So, I spent a day or so rigging up the necessary infrastructure to be able to profile my game on the 360. Since you can&#8217;t use tools like CLR Profiler or NProf on the 360, I ended up building a very simple frame timing system, and adding an overlay to the game that would show timing data. This let me get a good idea of how much time each subsystem in the game was using, and then I could compare the costs of individual subsystems, and try making changes and seeing how the profile data changed.</p>
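<p>A frame timing system along these lines can be very small. The sketch below is illustrative rather than the game&#8217;s actual profiler: the <code>FrameTimer</code> class and its method names are hypothetical, and it assumes <code>System.Diagnostics.Stopwatch</code> is available on the target platform.</p>
<pre>using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical sketch of a per-subsystem frame timer. Typical use:
//   long t = timer.Begin();
//   UpdateEntities();
//   timer.End("Update", t);
public class FrameTimer {
    private readonly Dictionary&lt;string, long&gt; _Ticks = new Dictionary&lt;string, long&gt;();
    private readonly Stopwatch _Stopwatch = Stopwatch.StartNew();

    // Returns a token; pass it to End() when the timed section finishes.
    public long Begin () {
        return _Stopwatch.ElapsedTicks;
    }

    // Adds the time elapsed since Begin() to the named section's total.
    public void End (string section, long startedAt) {
        long elapsed = _Stopwatch.ElapsedTicks - startedAt;
        long total;
        _Ticks.TryGetValue(section, out total);
        _Ticks[section] = total + elapsed;
    }

    // Milliseconds accumulated for a section since the last Reset().
    public double GetMilliseconds (string section) {
        long total;
        _Ticks.TryGetValue(section, out total);
        return total * 1000.0 / Stopwatch.Frequency;
    }

    // Call at the start of each frame, after the overlay has read the values.
    public void Reset () {
        _Ticks.Clear();
    }
}</pre>
<p>The overlay then only needs to call <code>GetMilliseconds</code> for each section name it cares about and draw the results before the next <code>Reset</code>.</p>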
<p>Once I had the profiler up and running on the 360, a clear pattern emerged.</p>
<p><a href="http://www.luminance.org/wp-content/uploads/2009/08/01.png"><img class="aligncenter size-full wp-image-745" title="01" src="http://www.luminance.org/wp-content/uploads/2009/08/01.png" alt="01" width="500" height="550" /></a></p>
<p>Updates were consuming a huge amount of CPU time on the 360. While on my desktop, updates basically accounted for no more than 1% of CPU time, on the 360 they actually accounted for more CPU time than rendering &#8211; this was actually a bit of a surprise to me since rendering was definitely the bottleneck at one point on the 360. It seems that at some point along the way, I solved my rendering performance issues on the 360, but didn&#8217;t notice because I had made updates so much more expensive &#8211; one mistake I plan not to repeat was that I went a week or two without testing the game on the 360, since my 360 was not hooked up at the time. During that span of time I made a lot of changes that drastically altered the game&#8217;s performance characteristics, so it was hard to tell what had caused things to degrade.</p>
<p><span id="more-742"></span></p>
<p>Now that I knew updates were expensive, I decided to try and narrow down which object types were the problem. I added profiling markers around specific types of objects, to try and figure out which ones cost the most CPU time to update. After doing that and deploying a few builds to the 360, I found one of my culprits: Water.</p>
<p><a href="http://www.luminance.org/wp-content/uploads/2009/08/05.png"><img class="aligncenter size-full wp-image-749" title="05" src="http://www.luminance.org/wp-content/uploads/2009/08/05.png" alt="05" width="500" height="550" /></a></p>
<p>While I had suspected that water might be a problem, I was still somewhat surprised by the results: a similar piece of in-game geometry, zones, used similar update code but turned out to have virtually zero update cost at runtime, while water cost a tremendous amount of CPU for very little gameplay impact. The problem turned out to be a subtle difference in how they were implemented.</p>
<p>Zones were inefficient in that every frame, each zone did an obstruction test to see if any entities were inside &#8211; in most cases a zone would be empty, so this was wasted effort. This would have been better implemented by having every entity check for nearby zones, since there are typically far fewer entities than there are zones. However, in practice this didn&#8217;t turn out to be the problem. The problem was that water built on this logic, and then used it to perform additional work: It did an obstruction test to determine how far the water should fall, and then did another obstruction test to locate any entities inside the water and apply &#8216;flow force&#8217; to them so that the flowing water would push them in a given direction. These two obstruction tests each ended up accounting for a significant portion of the time spent updating the level.</p>
<p>To solve this, I took two steps. First, I reworked zones and water so that they both operated the way I described &#8211; each entity does a check to locate all the zones it&#8217;s within, in a single obstruction test. This reduced the number of obstruction tests I was running every frame by a large amount, and helped reduce CPU usage for water. However, that still wasn&#8217;t enough.</p>
<p>The biggest improvement came from reworking things so that only onscreen objects get updated every frame. Instead of performing a test against every object to see if it&#8217;s onscreen, I decided to build on the partitioning scheme I use for rendering to get a list of onscreen objects, and use that as my list of objects to update. Once I had that working, my performance improved drastically and the game easily ran at 60fps in every part of my levels with CPU to spare.</p>
<p>Of course, doing this introduced bugs. One of the biggest issues is that often you will have important objects just outside the edge of the screen that need to keep updating, like moving platforms or enemies. To solve this, I added two mechanisms:</p>
<p>First, I added a margin around the screen within which objects still continued to update. This meant that an object just barely offscreen would keep updating, and solved the problem of a moving platform or enemy getting left behind as soon as he dropped offscreen.</p>
<p>Second, I built a simple &#8216;update manager&#8217; that maintains a list of all the objects that are currently awake. When an object leaves the screen, instead of removing it immediately, the update manager sets a timeout, which causes the object to fall &#8216;asleep&#8217; after a certain number of frames. As a result, once an object leaves the screen it has a second or two to return to the screen before it falls asleep, which helps with things like moving platforms that are going to move on and off the screen regularly &#8211; since they don&#8217;t spend very long offscreen, they never have a chance to fall asleep.</p>
<p>The update manager also gives me the ability to exclude some objects from updates entirely, by flagging them as &#8216;unable to wake&#8217;, so the update manager knows to never remove them from sleep status. Likewise, it gives me the option to flag objects as &#8216;unable to sleep&#8217; so that they always stay active &#8211; for example, I do this to the player character and his companion to avoid any unintentional bugs that might result from the player remaining offscreen too long (say, during a cinematic).</p>
<p>The update manager has some other benefits, too. Here&#8217;s what it looks like:</p>
<pre>    public class UpdateManager&lt;T&gt;
        where T : class, IUpdateable, IHasBounds {

        public struct Entry {
            public readonly T Object;
            public int WakeCounter;

            public Entry (T obj, int wakeCounter) {
                Object = obj;
                WakeCounter = wakeCounter;
            }
        }

        protected Dictionary&lt;T, bool&gt; _VisibleObjects = new Dictionary&lt;T, bool&gt;(new ReferenceComparer&lt;T&gt;());
        protected UnorderedList&lt;Entry&gt; _Entries = new UnorderedList&lt;Entry&gt;();

        public int SleepTimeout = 30;
        public readonly SpatialCollection&lt;T&gt; Collection;

        public UpdateManager (SpatialCollection&lt;T&gt; collection) {
            Collection = collection;
        }

        public void Update (Bounds liveRegion) {
            using (var e = Collection.GetItemsFromBounds(liveRegion, true))
            while (e.MoveNext()) {
                _VisibleObjects[e.Current.Item] = false;
            }

            Entry currentItem;

            using (var e = _Entries.GetEnumerator())
            while (e.GetNext(out currentItem)) {
                if (!_VisibleObjects.Remove(currentItem.Object)) {
                    // Object not visible

                    if (!currentItem.Object.AllowSleep) {
                        // Object cannot fall asleep
                    } else if (--currentItem.WakeCounter == 0) {
                        // Object fell asleep
                        e.RemoveCurrent();
                        continue;
                    } else {
                        // Object still awake
                        e.SetCurrent(ref currentItem);
                    }
                } else {
                    // Object visible

                    currentItem.WakeCounter = SleepTimeout;
                    e.SetCurrent(ref currentItem);
                }

                currentItem.Object.Update();
            }

            foreach (var newObject in _VisibleObjects.Keys) {
                if (newObject.AllowWake) {
                    _Entries.Add(new Entry(newObject, SleepTimeout));

                    newObject.Update();
                }
            }

            _VisibleObjects.Clear();
        }
    }</pre>
<p>One of the nice things it does in addition to improving performance is that it simplifies my game code &#8211; previously, I had hand-written logic to step through the various object lists (geometry, entities, etc) and update them every frame, and that code differed in subtle ways. The update manager unifies all that, so it kills a lot of duplicated code. Also, by maintaining a unique &#8216;awake objects&#8217; list, it allows me to remove objects from the geometry/entity lists while performing an update, instead of having to wait until the end of the frame, which simplifies some of my entity code as well.</p>
<p>And despite the number of problems it solves, it ends up actually being quite simple and easy to use. All I have to do to rig up an update manager is point it at a collection of objects and hand it the current camera boundaries every frame.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/blog/gruedorf/2009/08/20/updating-onscreen-objects-profiling/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Constant Binding</title>
		<link>http://www.luminance.org/blog/gruedorf/2009/08/13/constant-binding</link>
		<comments>http://www.luminance.org/blog/gruedorf/2009/08/13/constant-binding#comments</comments>
		<pubDate>Thu, 13 Aug 2009 10:31:28 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[constants]]></category>
		<category><![CDATA[csharp]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=729</guid>
		<description><![CDATA[One of the changes I made in the weeks leading up to my contest deadlines was to pull some of the player-specific combat logic, like that for attack chains, combos, and flinching, out into their own objects. Doing this let me apply that same combat logic to monsters and other entities in the game world, [...]]]></description>
			<content:encoded><![CDATA[<p>One of the changes I made in the weeks leading up to my contest deadlines was to pull some of the player-specific combat logic, like that for attack chains, combos, and flinching, out into their own objects. Doing this let me apply that same combat logic to monsters and other entities in the game world, which cut down on duplication considerably.</p>
<p>However, doing this made it clear that I had some architectural issues to tackle: All of these mechanics were heavily dependent on the <a href="http://www.luminance.org/gruedorf/2009/03/30/changing-constants-at-runtime">tunable constants</a> for the creature in question, which meant I couldn&#8217;t just pull methods and variables out of my entity classes into classes of their own.</p>
<p>To solve the problem of accessing an object&#8217;s constants, I came up with a solution based on reflection. I can define a helper object designed to handle an aspect of an entity&#8217;s mechanics &#8211; for example, a HealthPool object to manage the creature&#8217;s health, along with associated aspects like regeneration. The helper object can define instance variables for the constants it needs access to, like so:</p>
<pre>public class HealthPool {
    public Constant&lt;float&gt; HealthMax = null;
    public Constant&lt;float&gt; HealthPassiveRegen = null;
    public Constant&lt;float&gt; HealthRegenDelayTime = null;
    public Constant&lt;float&gt; HealthRegenRampTime = null;
    public Constant&lt;float&gt; FlinchThreshold = null;
    public Constant&lt;float&gt; FlinchThresholdDecay = null;

    public readonly RuntimeEntity Entity;
    public readonly ITimeProvider TimeProvider;
    public float Health = 0.0f;</pre>
<p>Note that these variables are the same name and type as the actual <a href="http://www.luminance.org/gruedorf/2009/03/30/changing-constants-at-runtime">tunable constants</a> &#8211; the difference is that instead of being static, they&#8217;re instance variables. Doing this allows me to pull a function out of an entity&#8217;s source code without needing to change the way it references particular constants, since the constants have the exact same names as before.</p>
<p>Of course, since these variables default to null, we need some way to fill them in with references to the actual tunable constants we want to use. To do that, we apply reflection:</p>
<pre>public static void BindConstants (object destination, params Type[] sourceTypes) {
    var genericConstant = typeof(Constant&lt;&gt;);
    var destinationType = destination.GetType();
    var destinationFields = new Dictionary&lt;string, FieldInfo&gt;();

    foreach (var field in
        destinationType.GetFields(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.FlattenHierarchy)
    ) {
        var fieldName = field.Name;
        var fieldType = field.FieldType;

        if (!fieldType.IsGenericType || fieldType.GetGenericTypeDefinition() != genericConstant)
            continue;

        destinationFields[fieldName] = field;
    }

    foreach (var sourceType in sourceTypes) {
        foreach (var field in
            sourceType.GetFields(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.FlattenHierarchy)
        ) {
            var fieldName = field.Name;
            var fieldType = field.FieldType;

            if (!fieldType.IsGenericType || fieldType.GetGenericTypeDefinition() != genericConstant)
                continue;

            FieldInfo destinationField = null;
            if (destinationFields.TryGetValue(fieldName, out destinationField)) {
                destinationField.SetValue(destination, field.GetValue(null));
                destinationFields.Remove(fieldName);
            }
        }
    }

    if (destinationFields.Count &gt; 0) {
        var constants = String.Join(", ", (from key in destinationFields.Keys select key).ToArray());
        var types = String.Join(", ", (from type in sourceTypes select type.Name).ToArray());

        throw new InvalidDataException(String.Format("Type(s) {0} do not declare the following constants:\n{1}", types, constants));
    }
}</pre>
<p>What we&#8217;re doing here is pretty simple: We accept a reference to an object that has constants requiring binding, and a list of source types to retrieve constants from. The function operates in two stages: First, we enumerate all the instance variables defined in the target object, and build a list of all the tunable constants it has that need to be bound. After that, we enumerate all the static fields of the provided source types, looking for constants that have names matching those of the instance variables on the target object, binding them where appropriate. After this, we can simply check to see if our list of constants is empty or not &#8211; if it&#8217;s empty, we successfully bound all our constants, and if it&#8217;s not, we know that one or more of the desired constants was missing.</p>
<p>We get a few useful things out of this: First, accepting a list of types allows us to do simple inheritance of constants. If we first check the most-derived type and then the base type of an entity, that allows us to define a &#8216;default value&#8217; for a particular constant, like &#8216;Maximum Health&#8217;, in a base class, and then define a new constant with the same name in the derived type. This also allows us to create &#8216;global defaults&#8217; for a given constant &#8211; for example, if we always put Game at the end of the type list, we can have global game-wide constants for things like physics parameters, and only override them in specific classes if necessary.</p>
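<p>If you want that ordering without listing the types by hand, the type list can be built by walking up the inheritance chain. This is only a sketch: <code>ConstantTypeChain</code> and <code>Build</code> are hypothetical names, and it assumes the most-derived type should be checked first with the game type as the final fallback.</p>
<pre>using System;
using System.Collections.Generic;

// Hypothetical helper: produces a source type list ordered from the
// most-derived type down to its base types, with a global fallback last.
public static class ConstantTypeChain {
    public static Type[] Build (object entity, Type fallbackType) {
        var result = new List&lt;Type&gt;();

        // Walk from the entity's runtime type up to (but not including) object.
        for (var type = entity.GetType(); type != null &amp;&amp; type != typeof(object); type = type.BaseType)
            result.Add(type);

        // Game-wide defaults go last, so anything more specific wins.
        result.Add(fallbackType);
        return result.ToArray();
    }
}</pre>
<p>The result can be passed straight to <code>BindConstants</code> as its <code>sourceTypes</code> argument; because each constant is removed from the pending list as soon as it is bound, the first type in the list that declares it is the one that gets used.</p>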
<p>Finally, to wire things up, we just need to do a little work in the constructor for our helper object:</p>
<pre>    public HealthPool (RuntimeEntity entity, ITimeProvider timeProvider) {
        Entity = entity;
        TimeProvider = timeProvider;

        ConstantManager.BindConstants(this, entity.GetType(), entity.Game.GetType());

        Health = HealthMax;
    }</pre>
<p>In this case, we&#8217;re initializing the HealthPool using the constants defined in the entity, and falling back to any constants defined in the Game when the entity doesn&#8217;t specify them. If a necessary constant is missing, we&#8217;ll get an exception thrown when constructing our helper object. Once we&#8217;ve bound the constants, we can just use them like we would otherwise &#8211; in this case, the HealthPool automatically initializes itself based on the HealthMax constant.</p>
<p>This is definitely preferable to the approach I used to use for exposing an entity&#8217;s constants &#8211; previously, for important constants like an entity&#8217;s bounding box size, I&#8217;d define an abstract property in a base class, and override it in each derived type to return the value of the constant. Now, I don&#8217;t need to use any abstract members or interfaces; I can just bind to the constants once when I initialize my helper objects.</p>
<p>One thing of note is that since this technique uses reflection, you might run into performance issues if you&#8217;re binding constants repeatedly. This is pretty trivial to solve, however; you can just cache the results of a constant binding operation based on the destination and source types, since those aren&#8217;t going to change at runtime.</p>
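<p>One way such a cache could look is sketched below. Treat it as an assumption-laden illustration: <code>BindingCache</code> is a hypothetical class, and the <code>scan</code> delegate stands in for the field-matching half of <code>BindConstants</code>, which would need to be refactored to return field pairs instead of assigning them immediately.</p>
<pre>using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sketch: remembers which static source field feeds which
// instance field, so the reflection scan runs once per type combination.
public static class BindingCache {
    public struct Binding {
        public FieldInfo Destination;
        public FieldInfo Source;
    }

    private static readonly Dictionary&lt;string, Binding[]&gt; _Cache = new Dictionary&lt;string, Binding[]&gt;();

    // The key includes the order of the source types, since order decides
    // which type's constant wins.
    private static string MakeKey (Type destinationType, Type[] sourceTypes) {
        var names = new string[sourceTypes.Length + 1];
        names[0] = destinationType.AssemblyQualifiedName;
        for (int i = 0; i &lt; sourceTypes.Length; i++)
            names[i + 1] = sourceTypes[i].AssemblyQualifiedName;
        return String.Join("|", names);
    }

    public static void Bind (object destination, Type[] sourceTypes, Func&lt;Type, Type[], Binding[]&gt; scan) {
        var destinationType = destination.GetType();
        var key = MakeKey(destinationType, sourceTypes);

        Binding[] bindings;
        lock (_Cache) {
            // Only scan with reflection the first time this combination is seen.
            if (!_Cache.TryGetValue(key, out bindings))
                _Cache[key] = bindings = scan(destinationType, sourceTypes);
        }

        // Copying the references is still done per object, but it is cheap
        // compared to enumerating and filtering fields.
        foreach (var binding in bindings)
            binding.Destination.SetValue(destination, binding.Source.GetValue(null));
    }
}</pre>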
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/blog/gruedorf/2009/08/13/constant-binding/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>One down</title>
		<link>http://www.luminance.org/blog/gruedorf/2009/08/06/one-down</link>
		<comments>http://www.luminance.org/blog/gruedorf/2009/08/06/one-down#comments</comments>
		<pubDate>Fri, 07 Aug 2009 06:44:41 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[360]]></category>
		<category><![CDATA[dbp2009]]></category>
		<category><![CDATA[lame]]></category>
		<category><![CDATA[xna]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=679</guid>
		<description><![CDATA[One to go. Thanks to some especially hard work by Troupe and Ian, we got a relatively decent build of the game in for the Dream-Build-Play 2009 deadline. Next is the Intel Level Up 2009 competition, only a few days from now. I&#8217;m too lazy to write a large blog post this time, since I&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p>One to go.</p>
<p>Thanks to some especially hard work by Troupe and Ian, we got a relatively decent build of the game in for the Dream-Build-Play 2009 deadline. Next is the Intel Level Up 2009 competition, only a few days from now.</p>
<p>I&#8217;m too lazy to write a large blog post this time, since I&#8217;ve been up for about 48 hours. Instead, enjoy this <a href="http://www.luminance.org/inferusgame">conveniently placed link</a> that allows you to download a build of the game and try it out. Feel free to mess around with the level editor, too.</p>
<p>Biggest things of note from the SVN logs this week:</p>
<ul>
<li>Added a boss fight!</li>
<li>Overhauled my physics system to address some floating point accuracy issues.</li>
<li>Overhauled the combat system to try and make it more fun. Only slightly successful.</li>
<li>Finally implemented the player character&#8217;s companion, and added support for talking to her.</li>
<li>Significantly reduced garbage generation, which means less stuttering on the 360. Hooray!</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/blog/gruedorf/2009/08/06/one-down/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Home stretch</title>
		<link>http://www.luminance.org/blog/gruedorf/2009/07/30/home-stretch</link>
		<comments>http://www.luminance.org/blog/gruedorf/2009/07/30/home-stretch#comments</comments>
		<pubDate>Fri, 31 Jul 2009 06:21:50 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[audio]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[xna]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=645</guid>
		<description><![CDATA[In the next two weeks I have deadlines for two different contests coming up, so things are getting pretty hectic. Lots of things changed in the ~100 or so commits since the last blog post, so I&#8217;ll pick a few to describe. Combo System &#38; Attack Chains Instance Limiting Overhaul Active Controller Detection Combo System [...]]]></description>
			<content:encoded><![CDATA[<p>In the next two weeks I have deadlines for two different contests coming up, so things are getting pretty hectic. Lots of things changed in the ~100 or so commits since the last blog post, so I&#8217;ll pick a few to describe.</p>
<ul>
<li><a href="http://www.luminance.org/gruedorf/2009/07/30/home-stretch#section1">Combo System &amp; Attack Chains</a></li>
<li><a href="http://www.luminance.org/gruedorf/2009/07/30/home-stretch#section2">Instance Limiting Overhaul</a></li>
<li><a href="http://www.luminance.org/gruedorf/2009/07/30/home-stretch#section3">Active Controller Detection</a></li>
</ul>
<div class="video"><object width="720" height="430" style="width:720px;"><param name="movie" value="http://www.youtube-nocookie.com/v/8UJ6-5EHRsw&#038;hl=en&#038;fs=1&#038;rel=0&#038;color1=0x2b405b&#038;color2=0x6b8ab6&#038;hd=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube-nocookie.com/v/8UJ6-5EHRsw&#038;hl=en&#038;fs=1&#038;rel=0&#038;color1=0x2b405b&#038;color2=0x6b8ab6&#038;hd=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="720" height="430" style="width:720px;"></embed></object></div>
<p><span id="more-645"></span></p>
<p><a name="section1"></a><br />
<h2>Combo System &amp; Attack Chains</h2>
<p>Previously, combat basically consisted of hitting the punch button over and over to kill monsters. This sucked.</p>
<p>I took a few major steps to address this:</p>
<p>First, I added a second attack type bound to the button I was previously going to use for the grappling hook. This attack is slower and hits heavier, with a wider attack arc, and gives you a nice alternative to the fast, lighter-hitting punch attack.</p>
<p>After the addition of the second attack type, I built on that to add &#8216;combo&#8217; variations of each attack (punch combo, slash combo). The player can combo these attacks onto a previous attack by properly timing another button press near the end of the previous attack animation. Missing the timing (either by pressing too early or too late) &#8216;botches&#8217; the combo and causes the delay before they can perform another attack to be longer than it would be otherwise. The size of the time window in which you can successfully combo an attack decreases every time you combo, so over time it gets harder. This means that simply attacking once will be slow and inefficient, but you are also prevented from comboing attacks indefinitely, which strikes a good balance. The fact that a failed combo has a longer delay than a normal attack means that a player won&#8217;t be punished for choosing to simply combo once or twice and then attack again, since the effectiveness ends up being nearly the same.</p>
<p>Finally, I added an &#8216;attack chain&#8217; system that tracks the number of hits you land on a foe in rapid succession. This allows me to delay the &#8216;flinching&#8217; animation normally played when a creature receives damage, so that the creature stays within reach of your attacks, allowing you to combo. Each successive hit extends the chain for a short period of time, allowing you to continue landing blows, and when the chain &#8216;breaks&#8217;, all the hits you landed deal their damage to the creature at once, knocking it back and possibly killing it. Chains can span across multiple creatures as well, allowing you to keep multiple enemies &#8216;locked&#8217; by your chain at once. Right now it&#8217;s a bit overpowered, but I think some careful tuning will maintain most of the positive aspects without making the game too easy.</p>
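<p>The combo timing rules described above can be sketched roughly like this. Everything in the sketch is an assumption made for illustration: the ComboTracker name, the tuning constants, and the choice to place the timing window at the very end of the attack animation.</p>
<pre>using System;

public enum ComboResult { Success, Botched }

// Hypothetical sketch of a combo window that shrinks with every successful combo.
public sealed class ComboTracker {
    // Made-up tuning values, in seconds.
    public const double InitialWindow = 0.30;
    public const double WindowShrinkFactor = 0.75;
    public const double BotchedDelay = 0.50;

    public double CurrentWindow = InitialWindow;
    public int ComboCount = 0;

    // pressTime: when the follow-up button was pressed.
    // attackEnd: when the current attack animation ends.
    // delay: extra time before the next attack is allowed.
    public ComboResult Press (double pressTime, double attackEnd, out double delay) {
        bool inWindow = (pressTime &gt;= attackEnd - CurrentWindow) &amp;&amp; (pressTime &lt;= attackEnd);

        if (inWindow) {
            // Each success makes the next window smaller, so long combos get harder.
            ComboCount += 1;
            CurrentWindow *= WindowShrinkFactor;
            delay = 0.0;
            return ComboResult.Success;
        }

        // Too early or too late: reset the combo and apply the longer delay.
        ComboCount = 0;
        CurrentWindow = InitialWindow;
        delay = BotchedDelay;
        return ComboResult.Botched;
    }
}</pre>
<p>Multiplying the window by a fixed factor is only one way to shrink it; subtracting a fixed amount with a lower bound would work just as well and may be easier to tune.</p>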
<p><a name="section2"></a><br />
<h2>Instance Limiting Overhaul</h2>
<p>Recently, while working on audio improvements, Troupe ran across a bug in XACT. PC builds of the game ran perfectly without any significant CPU usage problems, and the audio sounded great &#8211; but on the 360, as soon as enough channels of audio started playing, the game&#8217;s framerate tanked to around 15FPS and stayed there indefinitely. This was despite the fact that the game already issues most of its audio calls on a background thread to work around the prohibitively high cost of XACT&#8217;s API calls.</p>
<p>The problem turned out to be that even though I was using XACT&#8217;s built-in instance limiting support to control the number of instances playing at once, XACT was struggling to handle the number of cues I was asking it to start at once. Essentially, I was starting all the ambient loop cues for my level at once &#8211; around 30 &#8211; and expecting it to pick the 8 loudest ones to play at any given time based on the instance limit. On PC this worked perfectly without any performance issues, but for whatever reason, not so on the 360.</p>
<p>As a result I basically tore out all the existing code that relied on XACT instance limiting, and reimplemented limiting inside my engine. Luckily this only required changes to about 500 lines of code, but it was still rather frustrating to have to reimplement it when it worked perfectly on PC. On the bright side, now I have more control over how instance limiting behaves, so I at least got something out of it.</p>
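<p>For illustration, engine-side limiting along these lines can be sketched as follows. This is not the code from my engine; the AmbientCue and InstanceLimiter names are hypothetical, and the IsPlaying flag stands in for actually starting or stopping a cue through the audio API.</p>
<pre>using System;
using System.Collections.Generic;

// Hypothetical stand-in for one ambient loop in the level.
public sealed class AmbientCue {
    public readonly string Name;
    public float Volume;     // assumed to be computed by the game each frame
    public bool IsPlaying;   // a real engine would start/stop the cue instead

    public AmbientCue (string name, float volume) {
        Name = name;
        Volume = volume;
    }
}

public static class InstanceLimiter {
    // Marks the 'limit' loudest cues as playing and all other cues as stopped.
    public static void Apply (List&lt;AmbientCue&gt; cues, int limit) {
        // Sort a copy so the caller's ordering is left alone.
        var sorted = new List&lt;AmbientCue&gt;(cues);
        sorted.Sort((a, b) =&gt; b.Volume.CompareTo(a.Volume));

        for (int i = 0; i &lt; sorted.Count; i++)
            sorted[i].IsPlaying = (i &lt; limit);
    }
}</pre>
<p>The copy made on every call generates garbage, so on the 360 it would make sense to reuse a scratch list, and probably to add some hysteresis so that two cues of similar volume don&#8217;t swap in and out every frame.</p>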
<p><a name="section3"></a><br />
<h2>Active Controller Detection</h2>
<p>While doing some testing on my 360 I realized that my approach to handling the 360 controller was incorrect. While I assumed that the player might want to play with any of their four connected controllers, I neglected to notice that most of the XNA Guide APIs (storage device selection, etc) are designed to only respond to input from a single controller. This meant that I needed to detect which controller the player was currently using and make sure to only show XNA dialog boxes using that controller, and that I also needed to detect if that controller became unplugged so that I wouldn&#8217;t attempt to show a dialog box the player was unable to close.</p>
<p>I ended up spending a while changing my input framework so that it would automatically detect disconnected controllers, and inform the game code when a controller had been reconnected. I also did some work to automatically track the active controller, while still handling keyboard input correctly on the PC.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/blog/gruedorf/2009/07/30/home-stretch/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Event-driven audio</title>
		<link>http://www.luminance.org/blog/gruedorf/2009/07/24/event-driven-audio</link>
		<comments>http://www.luminance.org/blog/gruedorf/2009/07/24/event-driven-audio#comments</comments>
		<pubDate>Fri, 24 Jul 2009 19:49:39 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[audio]]></category>
		<category><![CDATA[content pipeline]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[editor]]></category>
		<category><![CDATA[xact]]></category>
		<category><![CDATA[xna]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=625</guid>
		<description><![CDATA[One of the older items on my to-do list was to give my sound designer a way to change the game&#8217;s audio without having to recompile the game in Visual Studio and start it up. Based on some of the improvements I made recently, I was finally able to knock that item off my to-do [...]]]></description>
			<content:encoded><![CDATA[<p>One of the older items on my to-do list was to give my sound designer a way to change the game&#8217;s audio without having to recompile the game in Visual Studio and start it up. Based on some of the improvements I made recently, I was finally able to knock that item off my to-do list.</p>
<p>Below, you can see a short annotated video walkthrough where I demonstrate the technique and show how it integrates with XACT.</p>
<div class="video"><object width="720" height="480" data="http://www.youtube-nocookie.com/v/faHdvcz45lU&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1;hq=1" type="application/x-shockwave-flash" style="width: 720px"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube-nocookie.com/v/faHdvcz45lU&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1;hq=1" /><param name="allowfullscreen" value="true" /></object></div>
<hr />
<p>There are a few key pieces necessary for this to work.</p>
<p><span id="more-625"></span></p>
<p>First, I need a way to get updated audio into the game. It turns out this is pretty simple &#8211; if you change the settings in your XACT project, you can get it to build output files into a folder of your choice. At that point, all you need to do is change your game&#8217;s XACT code to be able to load from that location.</p>
<p>Second, I need a way to pull information out of the XACT datafiles that I can use to attach cues to events. Since the XACT API provided by the XNA Framework is basically useless for this purpose, I ended up solving this problem by loading up the raw datafiles and pulling the names of my cues out of the datafiles. Yes, it&#8217;s disgusting. But it works! Luckily, the cue names are right at the end of the data files, and they&#8217;re null-terminated, so it&#8217;s not difficult to read them. It&#8217;s beyond my understanding why the framework developers opted not to provide the information.</p>
<p>Third, I needed a way to broadcast events from objects in my game world. The solution I ended up building for this was essentially an improved version of the event framework that I previously used for handling input events, like button presses. The framework allows me to &#8216;subscribe&#8217; to various types of events, either for a specific object, or for all objects that can broadcast that event. The ability to subscribe to all objects inexpensively gives me a straightforward way to say things like &#8216;whenever any creature gets hit by a rocket, play an explosion sound&#8217;, which is important.</p>
<p>Fourth, I needed a way to expose event information in an understandable manner, so that it would be easy to figure out what events are occurring in-game and how to attach sounds to them. I solved this by creating a simple &#8216;event overlay&#8217; that shows a list of the most recently fired events, and highlights objects when they broadcast events. This allows you to simply play the game with the overlay open and look for things that are missing sound effects &#8211; once you find something, just look at the log to find out the name of the event.</p>
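<p>To make the subscription model described above a little more concrete, here is a deliberately tiny sketch. It is not the EventBus from my library and does not match its API; TinyEventBus and every member on it are hypothetical, and a null source is used to mean &#8216;any object&#8217;.</p>
<pre>using System;
using System.Collections.Generic;

public delegate void TinyEventHandler (object source, string eventName, object args);

// Hypothetical sketch: subscribe to an event for one object or for all objects.
public sealed class TinyEventBus {
    private sealed class Subscription {
        public object Source;            // null means "any source"
        public TinyEventHandler Handler;
    }

    private readonly Dictionary&lt;string, List&lt;Subscription&gt;&gt; Subscriptions =
        new Dictionary&lt;string, List&lt;Subscription&gt;&gt;();

    public void Subscribe (object source, string eventName, TinyEventHandler handler) {
        List&lt;Subscription&gt; list;
        if (!Subscriptions.TryGetValue(eventName, out list))
            Subscriptions[eventName] = list = new List&lt;Subscription&gt;();

        list.Add(new Subscription { Source = source, Handler = handler });
    }

    // Returns the number of handlers that were invoked.
    // Note: handlers must not subscribe to the same event while it is being broadcast.
    public int Broadcast (object source, string eventName, object args) {
        List&lt;Subscription&gt; list;
        if (!Subscriptions.TryGetValue(eventName, out list))
            return 0;

        int delivered = 0;
        foreach (var subscription in list) {
            if (subscription.Source == null || ReferenceEquals(subscription.Source, source)) {
                subscription.Handler(source, eventName, args);
                delivered++;
            }
        }
        return delivered;
    }
}</pre>
<p>With something like this, &#8216;play an explosion sound whenever any creature is hit by a rocket&#8217; becomes a single Subscribe call with a null source, and an overlay can log every Broadcast as it happens.</p>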
<hr />
<p>If you&#8217;re interested, you can check out the source code and automated tests for the event system <a href="http://code.google.com/p/fracture/source/browse/trunk/Squared/Util/EventBus.cs">here, on Google Code</a>. In the future I will be releasing the rest of my audio framework as open-source for people to use. Please don&#8217;t hesitate to leave a comment if you have any questions; I&#8217;d be glad to help explain more about how this technique works.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/blog/gruedorf/2009/07/24/event-driven-audio/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Threaded Renderer</title>
		<link>http://www.luminance.org/blog/gruedorf/2009/07/20/threaded-renderer</link>
		<comments>http://www.luminance.org/blog/gruedorf/2009/07/20/threaded-renderer#comments</comments>
		<pubDate>Tue, 21 Jul 2009 05:07:44 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[xna]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=585</guid>
		<description><![CDATA[Lots of changes since last week, but the majority of my efforts were focused on overhauling my approach to rendering. My basic approach is primarily inspired by Christer Ericson&#8217;s post about the approach he uses for ordering draw calls. Some other useful sources of information were Tom Forsyth&#8217;s post about the cost of renderstate changes [...]]]></description>
			<content:encoded><![CDATA[<p>Lots of changes since last week, but the majority of my efforts were focused on overhauling my approach to rendering.</p>
<div class="video"><object width="640" height="385" data="http://www.youtube-nocookie.com/v/stiPgEW6AJI&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube-nocookie.com/v/stiPgEW6AJI&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" /><param name="allowfullscreen" value="true" /></object></div>
<p>My basic approach is primarily inspired by <a href="http://realtimecollisiondetection.net/blog/?p=86">Christer Ericson&#8217;s post about the approach he uses for ordering draw calls</a>. Some other useful sources of information were <a href="http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Renderstate%20change%20costs]]">Tom Forsyth&#8217;s post about the cost of renderstate changes</a> and <a href="http://www.guerrilla-games.com/publications/dr_kz2_rsx_dev07.pdf">Guerrilla&#8217;s presentation on Killzone 2&#8217;s renderer from DEVELOP 2007</a>. The Killzone 2 presentation is especially worthwhile since it describes the way they take advantage of concurrency techniques in their renderer.</p>
<p>Until now, the rendering code in my game has been almost entirely immediate-mode; various objects in the game world like tiles and entities had Draw methods that would utilize the game&#8217;s GraphicsDevice and SpriteBatch to render themselves, changing render states as needed and generating dynamic geometry where necessary (like for water).</p>
<p><span id="more-585"></span></p>
<p>Of course, if every object changes render states, you quickly run into a situation where you&#8217;re changing states tens of thousands of times per frame, which doesn&#8217;t perform very well. As a result, I ended up having to add some basic support for grouping together identical render states. In the end, I often had to perform multiple passes over the scene in order to minimize state changes, and the addition of multiple onscreen &#8216;layers&#8217; meant that I often had to make half a dozen passes over the same collections (like Level.Entities and Level.Geometry) in order to render things in the correct order without too many state changes. My new renderer clearly needed to provide an inexpensive way to reduce state changes without complicating my code.</p>
<p>In addition to the issue of state changes, however, was the simple fact that rendering was too expensive. In particular, rendering all the onscreen tiles for the 6+ layers in the level could often add up to 5%+ of my CPU time, due to the relative inefficiency of the XNA SpriteBatch class when trying to draw thousands of bitmaps at various locations on the screen. The use of SpriteBatch also meant that I couldn&#8217;t effectively sort tiles by texture, because using a custom blending configuration with SpriteBatch requires you to disable its texture sorting support. (My tiles and sprites have to be premultiplied so they look correct when scaled, unfortunately.) As a result, my new renderer needed to minimize the cost of individual drawing operations, and it also needed to give me a way to utilize material/texture sorting with custom blending configurations.</p>
<p>One final detail that factored into the design was concurrency: Between the cost of actually rendering the game world, and the cost of waiting for vertical sync, the main thread was spending as much as 40% of its time performing rendering, which made it much harder to maintain a smooth, consistent 60fps framerate. In order to minimize the amount of time the main thread spent blocked during rendering, my new design needed to make it possible to move rendering work off the main thread without introducing thread safety issues &#8211; putting a lock around all my game state wasn&#8217;t going to cut it.</p>
<hr />
<p>The design I ended up with looks roughly like this:</p>
<p>Every frame, at the beginning of my Draw method I construct a new Render.Frame object to represent the frame that&#8217;s being rendered.</p>
<p>The Render.Frame object contains a list of Render.Batch objects; each Batch has an associated Layer and Material. Layers are simple integers that allow me to explicitly order batches, so that I can create batches in any given order, and even create multiple batches at once. This also allows me to perform a single sort of the Batches array before sending the batches off to the video card, in order to minimize state changes.</p>
<p>Any given Batch contains a list of structures representing &#8216;Draw Calls&#8217;. In some cases, a draw call maps directly to a hardware drawing operation &#8211; like DrawUserPrimitives &#8211; but in other cases, it maps to something more granular. For example, a BitmapBatch contains BitmapDrawCalls, where each draw call represents a single bitmap, much like the arguments you pass to SpriteBatch&#8217;s Draw method. This allows me to sort individual draw calls based on their parameters to minimize state changes within a batch.</p>
<p>Finally, the Material objects associated with a given Batch are a superset of the XNA&#8217;s Effect class &#8211; they include a VertexDeclaration, Effect, and optional delegates for configuring other rendering state like the current blending function or stencil state. Grouping all these parameters together in one object allows me to sort batches cheaply by comparing material instances, but still gives me some level of granularity since I can change the shader parameters of an active Effect within a batch, for example to support rendering multiple textures inside a single BitmapBatch.</p>
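<p>As a simplified illustration of why sorting by layer and then by material helps, consider the sketch below. The BatchKey type is hypothetical, and an integer MaterialId stands in for a reference to a Material instance; the real batches carry draw calls as well.</p>
<pre>using System;
using System.Collections.Generic;

// Hypothetical stand-in for the sortable part of a batch.
public sealed class BatchKey {
    public readonly int Layer;
    public readonly int MaterialId;

    public BatchKey (int layer, int materialId) {
        Layer = layer;
        MaterialId = materialId;
    }
}

public static class BatchSorter {
    // Layer decides draw order; material only breaks ties within a layer.
    public static int Compare (BatchKey a, BatchKey b) {
        int result = a.Layer.CompareTo(b.Layer);
        if (result == 0)
            result = a.MaterialId.CompareTo(b.MaterialId);
        return result;
    }

    // Counts how often the material would have to be switched when drawing
    // the batches in list order (the first batch always counts as a switch).
    public static int CountMaterialChanges (List&lt;BatchKey&gt; batches) {
        int changes = 0;
        for (int i = 0; i &lt; batches.Count; i++) {
            if (i == 0 || batches[i].MaterialId != batches[i - 1].MaterialId)
                changes++;
        }
        return changes;
    }
}</pre>
<p>Calling batches.Sort(BatchSorter.Compare) before submitting the frame puts batches that share a layer and a material next to each other. List.Sort is not a stable sort, so batches with an identical layer and material may swap places; if their relative order matters, they need distinct layer values.</p>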
<p>In addition to the more obvious performance advantages of this approach, like the ability to sort by material, one other advantage is less obvious: Since a Frame contains information on all the drawing operations that need to be performed, but doesn&#8217;t depend on any of my game state, I can safely hand that object to another thread and have that thread perform the drawing operations. This lets me move a significant portion of my rendering off the main thread, and begin performing my next Update while the previous Draw completes, without needing to add any locking or complex synchronization.</p>
<hr />
<p>It&#8217;s not all great, though. This approach has two big downsides:</p>
<p>First, I basically have to reimplement everything from scratch &#8211; SpriteBatch and SpriteFont are both completely impossible to extend, so I have to reimplement them in order to render text and bitmaps with this approach, and the same goes for any other rendering code based on directly manipulating a GraphicsDevice.</p>
<p>Second, this approach is inherently more dependent on the garbage collector, since most of the types must be classes by necessity. If I want to run well on the 360, this means I need to make use of pooling and other techniques to avoid frequent allocations during frames. I&#8217;m also no longer able to reuse a single scratch buffer when generating geometry, so every piece of geometry I render needs to have its own buffer &#8211; more allocations.</p>
<p>So far I&#8217;m pretty pleased with this approach. There&#8217;s more work to be done &#8211; for example, I don&#8217;t have pooling implemented so my game collects extremely often on the 360 &#8211; but a large portion of my game is now running on this new rendering architecture, and my average framerate has already improved slightly from moving work off the main thread, despite the fact that I haven&#8217;t spent any time on optimizations.</p>
<hr />
<p>The biggest challenge that remains is porting all of my old GraphicsDevice-oriented rendering code over to using batches. Here&#8217;s a before and after example:</p>
<pre>        public void RenderTileLayer (int index) {
            var layer = RuntimeLevel.Layers[index];
            RuntimeLayer.ItemInfo itemInfo = null;

            BeginSpriteBatch(BlendModes.AlphaPremultiplied);

            using (var e = layer.GetItemsFromBounds(Camera.Bounds))
            while (e.GetNext(out itemInfo)) {
                RenderTile(itemInfo.Item, SpriteBatch, Camera.ViewportPosition, Camera.Zoom, AnimationTimeProvider.Ticks);
            }
        }

        private void RenderTile (RuntimeTile tile, SpriteBatch spriteBatch, Vector2 viewportPosition, float zoom, long time) {
            var info = tile.TileInfo;
            var pos = (tile.Bounds.TopLeft - viewportPosition) * zoom;
            // The sprite effect comes from the tile's strip; a bare 'strip' is not in scope here.
            spriteBatch.Draw(info.Texture, pos, info.Rectangle, Colors.White, 0.0f, new Vector2(0, 0), zoom, info.Strip.GetSpriteEffect(), 0.0f);
        }</pre>
<p>You can see here that my approach for rendering tiles is pretty simple: Get all the tiles within the screen&#8217;s boundaries, and render them one by one using a SpriteBatch. I&#8217;m able to reduce the number of state changes since I know that every tile within a given layer shares the same state, but I still need to change state once per layer (using the BeginSpriteBatch function, which calls SpriteBatch.Begin and sets up my render state). I also have to go out of my way to render each layer in the right order, which means calling RenderTileLayer in multiple places so that certain tiles appear below entities while other tiles appear above entities.</p>
<p>The new implementation looks like this:</p>
<pre>        public void RenderTileLayer (int index) {
            var layer = RuntimeLevel.Layers[index];
            RuntimeLayer.ItemInfo itemInfo = null;

            int drawLayer = (index &lt;= 2) ? DrawLayers.Background : DrawLayers.Foreground;

            using (var bitmapBatch = new BitmapBatch(PendingFrame, drawLayer + index, Materials.Bitmap[BlendModes.AlphaPremultiplied]))
            using (var e = layer.GetItemsFromBounds(Camera.Bounds))
            while (e.GetNext(out itemInfo)) {
                RenderTile(itemInfo.Item, bitmapBatch);
            }
        }

        public void RenderTile (RuntimeTile tile, Render.BitmapBatch batch) {
            var info = tile.TileInfo;

            var drawCall = new Render.BitmapDrawCall(info.Texture, tile.Bounds.TopLeft, info.Bounds);
            drawCall.Mirror(info.Strip.Mirroring.X, info.Strip.Mirroring.Y);

            batch.Add(drawCall);
        }</pre>
<p>One thing you&#8217;ll notice is that RenderTile now has considerably fewer arguments. Since I had to reimplement SpriteBatch from scratch, this gave me the opportunity to integrate some of my rendering calculations, like zooming and viewport positioning, directly into the shader. As a result, I don&#8217;t need to pass those values around as parameters anymore; I simply set them as shader parameters every time they change. Using an EffectPool for all my shaders also means that I only need to set those parameters once, instead of having to update all my individual shaders with the correct values.</p>
<p>One other difference here is the need to explicitly choose a layer for the tiles to render on when I create a BitmapBatch. I have a small set of constants (named DrawLayers) that I use to roughly organize my layers, but I also add values to those constants so that I can organize batches within those layers. That way, I can be certain that if I draw three sets of tiles on the same layer, they are always drawn in the same order relative to each other.</p>
<p>I also have to explicitly pass in a Material object for the BitmapBatch, instead of just passing a blend mode to the BeginSpriteBatch function. This isn&#8217;t a significant change, but it does mean that I now have to explicitly manage my materials &#8211; as a result, my game now has a LoadMaterials function that runs at startup and creates all the various permutations of parameters I need and stores them so that I can grab them at runtime.</p>
<p>You may also notice that there are no explicit drawing operations in here anywhere; I&#8217;m just creating a BitmapBatch and adding draw calls to it. Essentially, what&#8217;s going on here is that creating a batch automatically attaches it to the Frame being built. When the batch is Disposed (by the using block, in this case), the Frame is notified that the contents of the Batch (its draw calls) are ready and it stores them for later. The use of IDisposable to represent &#8216;readiness&#8217; instead of disposal is a little weird, but it&#8217;s convenient in that it gives me relatively automatic batch management. This also means that if I wanted to, I could create lots of batches at once and fill them with draw calls on multiple threads, since the Frame has a straightforward way to determine whether all of its batches are ready yet.</p>
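<p>The &#8216;Dispose means ready&#8217; idea can be sketched like this. DemoFrame and DemoBatch are hypothetical, stripped-down stand-ins for the real Frame and Batch types, and draw calls are represented by plain strings to keep the sketch self-contained.</p>
<pre>using System;
using System.Collections.Generic;

// Hypothetical sketch: the frame counts batches that have been created
// but not yet disposed, and collects the ones that are finished.
public sealed class DemoFrame {
    private readonly object Lock = new object();
    private readonly List&lt;DemoBatch&gt; ReadyBatches = new List&lt;DemoBatch&gt;();
    private int PendingCount = 0;

    internal void BatchCreated () {
        lock (Lock)
            PendingCount++;
    }

    internal void BatchReady (DemoBatch batch) {
        lock (Lock) {
            PendingCount--;
            ReadyBatches.Add(batch);
        }
    }

    // True once every batch attached to this frame has been disposed.
    public bool IsReady {
        get { lock (Lock) return PendingCount == 0; }
    }

    public int ReadyBatchCount {
        get { lock (Lock) return ReadyBatches.Count; }
    }
}

public sealed class DemoBatch : IDisposable {
    private readonly DemoFrame Owner;
    private bool Disposed;
    public readonly List&lt;string&gt; DrawCalls = new List&lt;string&gt;();

    // Creating a batch attaches it to the frame being built.
    public DemoBatch (DemoFrame owner) {
        Owner = owner;
        owner.BatchCreated();
    }

    public void Add (string drawCall) {
        DrawCalls.Add(drawCall);
    }

    // Dispose signals "my draw calls are complete", not "free my resources".
    public void Dispose () {
        if (Disposed)
            return;
        Disposed = true;
        Owner.BatchReady(this);
    }
}</pre>
<p>Because the frame only touches its own state under a lock, batches in a sketch like this could be filled and disposed on different threads, and the thread that submits the frame only has to wait for IsReady to become true.</p>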
<hr />
<p>Below you can see a short video of how the renderer groups the game up into batches. If you look carefully, you may also notice that particles are being rendered behind all the batches, since they aren&#8217;t yet integrated into the renderer.</p>
<div class="video"><object width="640" height="385" data="http://www.youtube-nocookie.com/v/NtlDmjVpIXg&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube-nocookie.com/v/NtlDmjVpIXg&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" /><param name="allowfullscreen" value="true" /></object></div>
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/blog/gruedorf/2009/07/20/threaded-renderer/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Camera Constraints</title>
		<link>http://www.luminance.org/blog/gruedorf/2009/07/15/camera-constraints</link>
		<comments>http://www.luminance.org/blog/gruedorf/2009/07/15/camera-constraints#comments</comments>
		<pubDate>Wed, 15 Jul 2009 10:02:36 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[camera]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[xna]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=559</guid>
		<description><![CDATA[As the level I&#8217;m currently building has gotten larger, it&#8217;s become obvious that I need a way to control the behavior of the camera &#8211; simply centering it on the player isn&#8217;t sufficient. While I already had some usable support for panning the camera to show points of interest, and locking it in place while [...]]]></description>
			<content:encoded><![CDATA[<p>As the level I&#8217;m currently building has gotten larger, it&#8217;s become obvious that I need a way to control the behavior of the camera &#8211; simply centering it on the player isn&#8217;t sufficient. While I already had some usable support for panning the camera to show points of interest, and locking it in place while the player performs an action like climbing onto a surface, I didn&#8217;t have anything in place for more advanced control of the camera, like constraining it to a region of the level.</p>
<p>So, the first approach I took was simple: I placed a rectangle around the entire level to constrain the camera. For smaller, rectangular levels, this worked well enough &#8211; all I had to do was place the rectangle so that you could see all the important parts of the level, and make sure I filled the entire space with tiles so the player didn&#8217;t see any ugly empty space while the camera panned around.</p>
<p>But to be honest, that kind of sucks. It means I have to waste time filling in the entire rectangle with tiles, and the camera doesn&#8217;t do anything to help give the player a sense of the layout of the environment. Ideally, the camera should be able to automatically position itself so that you can see the important parts of the area around you, and not show you unimportant sections of the level that you might have already passed through, or might be encountering later. Furthermore, it should avoid showing you empty/boring space whenever possible &#8211; no point in showing boring repeated tiles to the player when we could be showing interesting parts of the level instead.</p>
<p>The next approach I tried was a relatively simple one: I placed points of interest throughout the level, and then drew lines to connect them using the editor. After that, I assigned a radius to each point, controlling how far the camera would be allowed to &#8216;drift&#8217; away from the point. The end result was that I had a series of &#8216;rails&#8217; the camera could follow through the level, of varying thickness depending on the radius of each point.</p>
<p><span id="more-559"></span></p>
<p>While this approach provided very smooth, organic-feeling camera motion, it was extremely difficult to get it to do exactly what I wanted it to do. It&#8217;s hard to visualize how the camera will behave when following rounded paths through the level, and it was difficult to get the camera to transition between rails at exactly the times I wanted &#8211; if two rails were close to each other, the camera would often jump between them unpredictably, and it was hard to prevent that without introducing nasty side effects.</p>
<p>So, I moved on to a more complex approach: Instead of creating &#8216;rails&#8217; for the camera to follow, I defined a set of rectangular camera regions, covering all the parts of the level I wanted the camera to focus on. The idea was to have the camera automatically focus on the most interesting parts of the level, and as a secondary effect, have it avoid showing empty parts of the screen.</p>
<p>Once I had all the regions set up, I wrote some simple code to have the camera perform basic collision detection between its four edges (top, left, bottom, right) and the camera regions, so that it would automatically pan around to follow the player, but stop when hitting the edge of the visible regions. While I eventually got this approach to work, it was difficult to use it to hide empty parts of the level &#8211; most of the camera regions ended up being smaller than the screen, and when dealing with regions smaller than the screen, it&#8217;s impossible to get the camera to automatically show you the &#8216;right&#8217; parts of the level, since the camera algorithm has to choose between multiple equally acceptable positions for the camera.</p>
<p>The final approach I settled on was a refined version of the previous one. Instead of simply defining a set of rectangles, I built on that approach: I placed rectangles to define regions or &#8216;rooms&#8217; within the level, and assigned each room a set of constraints to apply to the camera. At runtime, the camera code selects a region based on the player&#8217;s current position, and then applies the constraints from that region to control the motion of the camera. In most cases, I only use one or two constraints on a given region, but if I want to, I can use all four of them, constraining the minimum/maximum position of the camera along both axes.</p>
<p><object width="640" height="385" data="http://www.youtube-nocookie.com/v/asuK-y5YCKs&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube-nocookie.com/v/asuK-y5YCKs&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" /><param name="allowfullscreen" value="true" /></object></p>
<p>This gives me most of what I need to make the camera behave in a pleasing way: While the player&#8217;s in a long corridor, I can lock the camera&#8217;s Y axis and make it follow the player along the corridor, but stop when it hits the edges of the corridor. When the player&#8217;s ascending a long vertical shaft, I can lock the camera horizontally but have it follow him vertically. And in cases where the player is moving along both axes, I can just lock it to a rectangular region so it attempts to follow the player, but avoids wasting onscreen space.</p>
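<p>The region-plus-constraints idea can be sketched roughly as follows. This is an illustrative sketch rather than the game&#8217;s actual code: the CameraRegion and CameraRegions types, their field names, and the use of nullable floats for optional constraints are all assumptions made for the example.</p>
<pre>using System;

// Hypothetical region type: a rectangle that selects the region, plus
// optional limits on the camera position (null means unconstrained).
public struct CameraRegion {
    public float Left, Top, Right, Bottom;
    public float? MinX, MaxX, MinY, MaxY;

    // True if the given position falls inside this region's rectangle.
    public bool Contains (float x, float y) {
        return x &gt;= Left &amp;&amp; x &lt; Right &amp;&amp; y &gt;= Top &amp;&amp; y &lt; Bottom;
    }

    // Clamps the camera position against whichever constraints are set.
    public void Apply (ref float cameraX, ref float cameraY) {
        if (MinX.HasValue) cameraX = Math.Max(cameraX, MinX.Value);
        if (MaxX.HasValue) cameraX = Math.Min(cameraX, MaxX.Value);
        if (MinY.HasValue) cameraY = Math.Max(cameraY, MinY.Value);
        if (MaxY.HasValue) cameraY = Math.Min(cameraY, MaxY.Value);
    }
}

public static class CameraRegions {
    // Starts the camera at the player's position, then applies the
    // constraints of the first region that contains the player.
    public static void Update (CameraRegion[] regions, float playerX, float playerY,
                               out float cameraX, out float cameraY) {
        cameraX = playerX;
        cameraY = playerY;
        foreach (CameraRegion region in regions) {
            if (region.Contains(playerX, playerY)) {
                region.Apply(ref cameraX, ref cameraY);
                return;
            }
        }
    }
}</pre>
<p>With a corridor region that sets MinY and MaxY to the same value and MinX/MaxX to the ends of the corridor, the camera slides horizontally with the player and stops at either end, which is the &#8216;locked Y axis&#8217; behavior described above.</p>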
<p>Of course, there&#8217;s one obvious flaw with this approach, once you prototype it: Since the player can only be in one camera region at a time, you get an abrupt jump when the player moves from one region to another. Luckily, there&#8217;s a relatively simple way to solve this problem: Interpolation. Unfortunately, the simplest possible approach here won&#8217;t work right: If you just interpolate between regions, the camera ends up bouncing around, traveling outside the constraints set by the camera regions. The solution I ended up applying works like this: When the player enters a new camera region, I slowly ramp up the effect of the new region, while leaving the previous region&#8217;s constraints operating at full strength. Once the new region&#8217;s constraints are in full effect, I then ramp down the effect of the previous region. This ensures that the camera is fully constrained at all times, but still provides a relatively smooth transition between regions. It&#8217;s not perfect, but in practice it&#8217;s relatively easy to set up regions so the camera smoothly tracks the player as he travels between them.</p>
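<p>One way to express that two-phase ramp is sketched below. The post doesn&#8217;t show the real implementation, so treat everything here as an assumption: the RegionBlend helper, the single transition parameter t running from 0 to 1, and the choice to treat a constraint&#8217;s &#8216;effect&#8217; as a blend between the unclamped and clamped camera position.</p>
<pre>using System;

public static class RegionBlend {
    static float Clamp01 (float v) {
        return Math.Max(0f, Math.Min(1f, v));
    }

    // t = 0 when the player enters the new region, t = 1 when the transition ends.
    // First half: the new region ramps up while the previous one stays at full strength.
    // Second half: the previous region ramps down while the new one stays at full strength.
    public static void Weights (float t, out float previousWeight, out float currentWeight) {
        currentWeight = Clamp01(t * 2f);
        previousWeight = Clamp01(2f - t * 2f);
    }

    // Moves a camera coordinate toward its clamped value by 'weight'
    // (0 = constraint ignored, 1 = constraint fully enforced).
    public static float ApplyWeighted (float value, float min, float max, float weight) {
        float clamped = Math.Max(min, Math.Min(max, value));
        return value + (clamped - value) * weight;
    }
}</pre>
<p>At every point in the transition at least one of the two weights is 1, which matches the requirement that the camera stays fully constrained by some region the whole time.</p>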
<p>Overall, I&#8217;m pleased with how this turned out &#8211; I only had to write around a hundred lines of code to implement the final version of my camera constraint system, even though it took me around 8 hours to arrive at my final implementation. It&#8217;s unfortunate that it took so long to get it right, but considering that I had no idea how to solve this problem when I started, things went pretty well.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/blog/gruedorf/2009/07/15/camera-constraints/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>GPU accelerated particles</title>
		<link>http://www.luminance.org/blog/gruedorf/2009/07/08/gpu-accelerated-particles</link>
		<comments>http://www.luminance.org/blog/gruedorf/2009/07/08/gpu-accelerated-particles#comments</comments>
		<pubDate>Wed, 08 Jul 2009 21:01:15 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[hlsl]]></category>
		<category><![CDATA[particles]]></category>
		<category><![CDATA[xna]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=548</guid>
		<description><![CDATA[During this last week, some of the work I did was to optimize my particle system, since it was showing up consistently on my profiles and I was adding more and more particles to my environments. There are a few basic approaches you can take when trying to optimize code like my particle system. The [...]]]></description>
			<content:encoded><![CDATA[<p>During this last week, some of the work I did was to optimize my particle system, since it was showing up consistently on my profiles and I was adding more and more particles to my environments.</p>
<div class="video">
<object width="640" height="385" data="http://www.youtube-nocookie.com/v/vIXl9xFTd08&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube-nocookie.com/v/vIXl9xFTd08&amp;hl=en&amp;fs=1&amp;rel=0&amp;color1=0x2b405b&amp;color2=0x6b8ab6&amp;hd=1" /><param name="allowfullscreen" value="true" /></object>
</div>
<p>There are a few basic approaches you can take when trying to optimize code like my particle system.</p>
<ul>
<li>The fact that you have a large number of particles all behaving in the same way means that you can easily distribute the work of updating/rendering particles across multiple cores, as long as your data structures and libraries are set up correctly to handle it &#8211; so one option is to multithread your particle system.</li>
<li>The parallel-friendly nature of a particle system also means that it&#8217;s possible to offload much of the work involved in rendering particles directly to the GPU, and do it in a shader instead of on the CPU. This is almost always faster.</li>
<li>In fact, in many cases, you can even update your particles on the GPU, by storing their state in a texture or vertex buffer and having a shader run over all the particles and write their new state to another texture/buffer. You can then take the new state and feed it into another shader as input to render your particles.</li>
<li>And of course, you can always take the standard approach of brute-force optimization, by making your particle system as efficient as possible with the same basic algorithm.</li>
</ul>
<p><span id="more-548"></span></p>
<p>While I could have attempted to shift all the work onto the GPU, or make use of multiple threads, for now I decided to simply offload rendering work onto the GPU, because my profiles showed that I was spending a considerable amount of time handing particle system state off to the XNA SpriteBatch class. Reading the source code for SpriteBatch in Reflector shows that there are lots of tiny inefficiencies in its implementation when you&#8217;re trying to use it to render particles &#8211; it does a lot of work to handle state changes, texture sorting, and other considerations that do not apply when you have large batches of particles with identical parameters.</p>
<p>As an upside, rendering particles yourself using a shader makes it easier to distribute your updating and rendering logic across threads, because you can now generate vertex information for your particles in batches on multiple threads, before handing them to the GPU. When using SpriteBatch, you&#8217;re stuck because every SpriteBatch.Draw call requires synchronization.</p>
<p>Since this was my first time writing an HLSL shader, the process of moving from SpriteBatch to my custom shader was an interesting one. I ran into lots of little snags and ended up having to change my design multiple times along the way, and spent a little while experimenting with different rendering techniques to try and figure out which one performed the best. One particularly surprising conclusion was that small batches of vertices were much faster than large batches &#8211; initially, I was updating and rendering all my particles in a single batch, and then handing them all to the GPU at once.</p>
<p>I assumed that this would allow the driver and the GPU to crunch away on all the particles in the background while I moved on to doing other work on the CPU, but in practice, generating a small batch and handing it to the GPU while I work on the next batch is consistently faster on both the PC and the Xbox 360. This is one of those cases where you might assume a code change will improve performance, but if you don&#8217;t benchmark and tune carefully, it can actually impair your game&#8217;s performance &#8211; disappointing if you just spent 8 hours hacking on something only to realize it was a bad idea.</p>
<p>The first step when implementing the shader for my particle system was to determine how to convert my particle system&#8217;s state into vertices for the shader to consume. There are a few factors that make this a bit of a challenge:</p>
<ul>
<li>In general, GPUs operate on collections of values &#8211; vectors and matrices &#8211; not individual values. This means you can&#8217;t just toss 16 uniquely named floats and integers into a vertex and get good performance; for ideal performance you need to pack groups of related values into vectors. This is straightforward for things like positions and velocities.</li>
<li>With a few exceptions, you need to send the GPU as many vertices as you want it to draw. This is a bit of a pain when dealing with a particle system, since you typically want to map one set of values (a particle) into 6 vertices for a textured quad that represents the particle. In some cases, you can utilize Point Sprite support to get the job done, but hardware point sprites are tremendously limited. This means you need to find an efficient way to transform each particle into 6 vertices.</li>
<li>Since you have to generate 6 vertices from each particle, you need to minimize redundant calculations &#8211; there are lots of calculations that the GPU is capable of doing, so you want to offload as many of them to the GPU as possible, so you can avoid doing them on the main CPU, where they cost significantly more.</li>
</ul>
<p>For my particle system, the state of a particle looks like this:</p>
<pre>    public struct Particle {
        public Vector2 Position;
        public Vector2 Velocity;
        public float Opacity;
        public float Scale;
        public float Rotation;
        public Color Color;
    }</pre>
<p>After some experimenting and thinking, the vertex format I ended up with looks like this:</p>
<pre>    public struct ParticleVertex {
        public Vector2 Position;
        public Vector3 Params; // Opacity, Scale, Rotation
        public Color Color;
        public short Corner;
        public short Unused;
    }</pre>
<p>So, to begin with, you&#8217;ll notice that I&#8217;m packing three unique values (opacity, scale, rotation) into a single vector. This is important because the vertex shader will only need to use one register to hold all three values, instead of needing a register for each individual value. I&#8217;m also separating the opacity value from the particle&#8217;s color, because combining the two values on the CPU is prohibitively expensive (mostly due to some stupid design decisions in the XNA framework, but I digress&#8230;), so I multiply out the alpha in the shader instead. The &#8216;Corner&#8217; value is used so the shader can determine which of the particle&#8217;s four corners is being shaded &#8211; this allows us to duplicate a given particle vertex six times to satisfy the video card&#8217;s desire for two triangles, by only changing the Corner. There&#8217;s also that strange looking &#8216;Unused&#8217; value there, which exists for a reason I&#8217;ll explain later.</p>
<p>Given the two formats, it&#8217;s relatively simple to write some code to transform from one to the other:</p>
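<pre>// Copy the state of particle i into a vertex template.
ParticleVertex vertex = new ParticleVertex();
Particle p = particles[i];
vertex.Position = p.Position;
vertex.Params.X = p.Opacity;
vertex.Params.Y = p.Scale;
vertex.Params.Z = p.Rotation;
vertex.Color = p.Color;</pre>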
<p>Once you have a vertex for a given particle, then all you have to do is emit the vertices for each corner:</p>
<pre>// d is the output vertex array; j is the running write index into it.
for (short k = 0; k &lt; 4; j++, k++) {
    vertex.Unused = vertex.Corner = k;
    d[j] = vertex;
}</pre>
<p>You may notice that Unused has shown up again. Here&#8217;s why: Originally, I only populated the Corner field, and the shader worked perfectly &#8211; on my PC. On the Xbox 360, it mysteriously rendered nothing. I finally realized that the Xbox 360 has a different byte ordering from my PC, since its CPU is PowerPC-based (big-endian) instead of x86 (little-endian). As a result, my shader was reading from Unused on the 360 instead of from Corner. As a simple solution, I just populate both fields, since I have to send them anyway (there&#8217;s no way to send a single byte or integer as part of a vertex).</p>
<p>You may also notice that I&#8217;m generating four vertices, not six. This is so that I can take advantage of a pre-generated index buffer and only send four vertices per particle to the GPU instead of six. The index buffer is really simple to generate:</p>
<pre>for (short i = 0, j = 0; i &lt; numVertices; i += 4, j += 6) {
    indices[j] = i;
    indices[j + 1] = (short)(i + 1);
    indices[j + 2] = (short)(i + 3);
    indices[j + 3] = (short)(i + 1);
    indices[j + 4] = (short)(i + 2);
    indices[j + 5] = (short)(i + 3);
}</pre>
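<p>For reference, here is a self-contained version of the same index pattern that can be run outside the game. The QuadIndices class and its Generate method are hypothetical names used only for this sketch. It uses the same corner ordering as the loop above, so each quad becomes the triangles (0, 1, 3) and (1, 2, 3), and it refuses quad counts whose vertex indices would not fit in a signed 16-bit index.</p>
<pre>using System;

public static class QuadIndices {
    // Builds six indices per quad, referencing four vertices per quad.
    public static short[] Generate (int quadCount) {
        if (quadCount &lt; 0 || quadCount * 4 &gt; short.MaxValue + 1)
            throw new ArgumentOutOfRangeException("quadCount");

        var indices = new short[quadCount * 6];
        for (int q = 0; q &lt; quadCount; q++) {
            int v = q * 4; // first vertex of this quad
            int j = q * 6; // first index slot of this quad
            indices[j] = (short)v;
            indices[j + 1] = (short)(v + 1);
            indices[j + 2] = (short)(v + 3);
            indices[j + 3] = (short)(v + 1);
            indices[j + 4] = (short)(v + 2);
            indices[j + 5] = (short)(v + 3);
        }
        return indices;
    }
}</pre>
<p>Because the pattern never changes, the buffer only has to be generated once for the largest batch size and can be reused for every batch after that.</p>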
<p>Once I have the vertex format set up, and I have code to generate vertices for my particles, the only hard parts remaining are to write a shader and set it up to be used by the game. The shader ends up being relatively simple &#8211; the only really complicated part is handling rotation:</p>
<pre>float2 TextureSize;
float2 Translation;
texture ParticleTexture;
float4x4 MatrixTransform;

sampler TextureSampler = sampler_state {
    Texture = (ParticleTexture);

    MinFilter = Linear;
    MagFilter = Linear;
    MipFilter = Linear;
};

const float2 Corners[] = {
    {-0.5f, -0.5f},
    { 0.5f, -0.5f},
    { 0.5f,  0.5f},
    {-0.5f,  0.5f}
};

void VertexShader(
    in float2 position : POSITION0, // x, y
    inout float4 color : COLOR0,
    in float3 params : POSITION1, // opacity, scale, rotation
    in int2 cornerIndex : BLENDINDICES0, // 0-3
    out float2 texCoord : TEXCOORD0,
    out float4 result : POSITION0
) {
    float2 corner = Corners[cornerIndex.x];
    texCoord = corner + Corners[2];
    float2 sinCos, rotatedCorner;
    corner *= TextureSize.xy;
    sincos(params.z, sinCos.x, sinCos.y);
    rotatedCorner.x = (sinCos.y * corner.x) - (sinCos.x * corner.y);
    rotatedCorner.y = (sinCos.x * corner.x) + (sinCos.y * corner.y);
    position.xy += (rotatedCorner * params.y) - Translation;
    color *= params.x;
    result = mul(float4(position.xy, 0, 1), MatrixTransform);
}

void PixelShader(
    inout float4 color : COLOR0,
    float2 texCoord : TEXCOORD0
) {
    color *= tex2D(TextureSampler, texCoord);
}

technique ParticleTechnique
{
    pass P0
    {
        vertexShader = compile vs_1_1 VertexShader();
        pixelShader = compile ps_1_1 PixelShader();
    }
}</pre>
<p>There are a few things at work here: We define a Sampler in our shader that represents the texture for our particles, and set the parameters that determine how the texture will be scaled and mipmapped. We also define some variables that can be set by the game at runtime to feed into the shader, along with a constant array that contains offsets for all four vertex corners. The array lets us map those integer corner indices to x/y coordinates easily, so we can convert four identical points into the corners of a quad.</p>
<p>The pixel shader doesn&#8217;t do anything of note, so I&#8217;ll just go over the vertex shader. First, we map the corner index into an xy coordinate pair, by looking it up in the constant array. Then, we read the rotation out of the parameters structure, and use sincos to generate a rotated version of the corner coordinate, so that the resulting quad for the particle is rotated appropriately (You could do this with a matrix multiply instead of individual arithmetic, but I&#8217;m too lazy. <img src='http://www.luminance.org/wordpress/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> ).</p>
<p>Finally, we combine the rest of the parameters: Scale the rotated corner by the scale parameter, add it to the particle&#8217;s centerpoint, and then subtract the position of the camera to translate the result.</p>
<p>Once we&#8217;ve done that, we multiply the input color by the input opacity to generate the actual color for the particle, and apply our transform matrix to generate the actual position of the particle&#8217;s vertex. Note that the translation and rotation stages could be done here if we wanted, since we&#8217;re using a 4&#215;4 transform matrix. All in all, a relatively simple shader.</p>
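<p>When debugging a shader like this, it can help to have a CPU-side reference for the same arithmetic. The sketch below mirrors the vertex shader&#8217;s corner math in plain C#, leaving out the final matrix multiply and the color handling. The ParticleCornerMath class and its parameter names are assumptions made for illustration and are not part of the particle system described here.</p>
<pre>using System;

public static class ParticleCornerMath {
    // Same corner offsets as the Corners array in the shader.
    static readonly float[,] Corners = {
        { -0.5f, -0.5f }, { 0.5f, -0.5f }, { 0.5f, 0.5f }, { -0.5f, 0.5f }
    };

    public static void TransformCorner (
        int cornerIndex, float centerX, float centerY,
        float textureWidth, float textureHeight,
        float scale, float rotation,
        float cameraX, float cameraY,
        out float x, out float y, out float u, out float v
    ) {
        float cx = Corners[cornerIndex, 0];
        float cy = Corners[cornerIndex, 1];

        // Texture coordinates: shift the -0.5..0.5 offsets into the 0..1 range.
        u = cx + 0.5f;
        v = cy + 0.5f;

        // Size the corner offset to the texture, then rotate it around the origin.
        cx *= textureWidth;
        cy *= textureHeight;
        float sin = (float)Math.Sin(rotation);
        float cos = (float)Math.Cos(rotation);
        float rotatedX = (cos * cx) - (sin * cy);
        float rotatedY = (sin * cx) + (cos * cy);

        // Scale the rotated offset, add the particle center, subtract the camera position.
        x = centerX + (rotatedX * scale) - cameraX;
        y = centerY + (rotatedY * scale) - cameraY;
    }
}</pre>
<p>For example, with no rotation, a 32&#215;16 texture, a scale of 2, a particle centered at (100, 50) and the camera at (10, 5), corner 2 ends up at (122, 61) with texture coordinates (1, 1).</p>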
<p>After adding the shader to my game&#8217;s content project and compiling it, I can load it up at runtime as an Effect, and apply it when I want to render particles. Of course, using it requires filling in the various constants with the right values so that the shader can generate particles at the right coordinates:</p>
<pre>Effect.CurrentTechnique = Effect.Techniques["ParticleTechnique"];
Effect.Parameters["TextureSize"].SetValue(new Vector2(Texture.Width, Texture.Height));
Effect.Parameters["ParticleTexture"].SetValue(Texture);
Effect.Parameters["Translation"].SetValue(Camera.ViewportPosition);
// Note: the shader's MatrixTransform parameter also needs a value; that call isn't shown here.</pre>
<p>Once we&#8217;ve set the constants, we&#8217;re ready to render some particles.</p>
<p>At this point, we have a relatively efficient GPU-accelerated particle system. There&#8217;s lots of room for improvement, but as-is this system is considerably faster than my previous SpriteBatch-based implementation. The fact that I&#8217;m generating vertices in arrays and handing them to the GPU directly also means that if I want to, I can improve my particle system to use multiple threads to do updating and rendering without much hassle, since I won&#8217;t need to add any sophisticated synchronization &#8211; I can just slice up the particle array into chunks and hand each chunk to a thread.</p>
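<p>The chunking idea can be sketched like this. It is only an illustration under simplifying assumptions: the ChunkedFill class is a hypothetical name, plain integers stand in for particles and vertices, and a new thread is started per chunk for brevity where a real game would more likely keep long-lived worker threads around. The point it demonstrates is that each thread writes to its own slice of the output array, so no locks are needed.</p>
<pre>using System;
using System.Threading;

public static class ChunkedFill {
    // Fills four output slots per item, splitting the items across threadCount threads.
    public static int[] FillInChunks (int itemCount, int threadCount) {
        if (threadCount &lt; 1)
            throw new ArgumentOutOfRangeException("threadCount");

        var output = new int[itemCount * 4];
        var threads = new Thread[threadCount];
        int chunkSize = (itemCount + threadCount - 1) / threadCount;

        for (int t = 0; t &lt; threadCount; t++) {
            // Each thread gets a disjoint range of items, and therefore
            // a disjoint range of output slots.
            int start = Math.Min(itemCount, t * chunkSize);
            int end = Math.Min(itemCount, start + chunkSize);
            threads[t] = new Thread(() =&gt; {
                for (int i = start; i &lt; end; i++)
                    for (int k = 0; k &lt; 4; k++)
                        output[(i * 4) + k] = i; // stand-in for building a ParticleVertex
            });
            threads[t].Start();
        }

        // Wait for every chunk before the output is handed off.
        foreach (Thread thread in threads)
            thread.Join();
        return output;
    }
}</pre>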
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/blog/gruedorf/2009/07/08/gpu-accelerated-particles/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Threaded EndDraw in XNA (Crouching WaitOne, Hidden Lock)</title>
		<link>http://www.luminance.org/blog/gruedorf/2009/07/01/threaded-enddraw-in-xna</link>
		<comments>http://www.luminance.org/blog/gruedorf/2009/07/01/threaded-enddraw-in-xna#comments</comments>
		<pubDate>Wed, 01 Jul 2009 23:21:29 +0000</pubDate>
		<dc:creator>Kael</dc:creator>
				<category><![CDATA[Gruedorf]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[threading]]></category>
		<category><![CDATA[xna]]></category>

		<guid isPermaLink="false">http://www.luminance.org/?p=526</guid>
		<description><![CDATA[After finishing up the materials for my Level Up 2009 entry, today I spent a little while trying out an idea I had recently: One of the problems with using vertical sync in a video game is that it eats into the available CPU time for performing game updates. The way vsync is implemented in [...]]]></description>
			<content:encoded><![CDATA[<p>After finishing up the materials for <a href="http://software.intel.com/en-us/contests/levelup2009/entry_detail.php?entryid=132339">my Level Up 2009 entry</a>, today I spent a little while trying out an idea I had recently:</p>
<p>One of the problems with using vertical sync in a video game is that it eats into the available CPU time for performing game updates. The way vsync is implemented in most graphics APIs, it causes your Present/EndDraw/SwapBuffers call to block until the card enters vertical blank and the frame is shown to the user. While this is ideal from a correctness perspective, it&#8217;s a tremendous waste since it means you can end up sitting there for up to 16 milliseconds, waiting for vertical blank. If your game spends lots of time doing both updating and drawing, all that time could be spent performing updates instead. Ouch.</p>
<p>Currently, my game spends about as much time drawing as it does updating. A significant portion of the time spent drawing (20-30%) is within the EndDraw function. Turning off vertical sync drops the amount of time spent in EndDraw considerably, but introduces tearing. So, as a potential solution, why not call EndDraw on a background thread? While the thread waits for vertical blank, I can begin performing the next frame&#8217;s Update, and in the event that I finish updating before the previous frame is visible, I simply wait for that previous EndDraw call before beginning to paint the next frame. In the optimal case, this means I can come much closer to the best possible framerate without introducing tearing, and in the worst case, the cost of rendering an individual frame is only <em>slightly</em> increased by the use of thread synchronization. The fact that I&#8217;m only doing EndDraw on another thread means that I don&#8217;t have to worry about protecting my game data with locks and other synchronization techniques, since the GraphicsDevice doesn&#8217;t use any of my game data when performing the EndDraw operation.</p>
<p>So, to test this out, I overrode my Game class&#8217;s BeginDraw and EndDraw methods. This turns out to be all we have to do to change the way drawing is performed, because the XNA Framework developers were kind enough to make both of these methods virtual.</p>
<pre>        protected override bool BeginDraw () {
            _DrawCompleteEvent.WaitOne();
            _DrawCompleteEvent.Reset();
            return base.BeginDraw();
        }

        protected override void EndDraw () {
            _DrawRequiredEvent.Set();
        }</pre>
<p>Of course, at this point, the two events used here aren&#8217;t declared anywhere, and nothing is waiting on _DrawRequiredEvent, so this code won&#8217;t work. Thus, we add a background thread to perform our painting:</p>
<pre>        AutoResetEvent _DrawRequiredEvent = new AutoResetEvent(false);
        ManualResetEvent _DrawCompleteEvent = new ManualResetEvent(true);
        Thread _DrawThread;

        public Game () {
            ...

            _DrawThread = new Thread(DrawThreadFunc);
            _DrawThread.IsBackground = true;
            _DrawThread.Start();
        }

        protected void DrawThreadFunc () {
            while (true) {
                // Sleep until the main thread has finished issuing draw calls.
                _DrawRequiredEvent.WaitOne();
                base.EndDraw();
                // Let the main thread know it can call BeginDraw again.
                _DrawCompleteEvent.Set();
            }
        }</pre>
<p>Fairly simple thread programming here: We create a thread, and set IsBackground to true so that it will stop as soon as the main thread exits. The thread spends all of its time waiting for a &#8216;required draw&#8217; signal, and then performs an EndDraw call. Once the call is complete, it sets another signal to inform the Game class that the previous draw has finished and it&#8217;s safe to perform a BeginDraw call (this lets us make sure that we never use the GraphicsDevice on the main thread while the background thread is performing an EndDraw).</p>
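<p>The same handshake can be tried in isolation, without XNA. In the sketch below the PresentHandshake class and its RunFrames method are hypothetical, and a short Thread.Sleep stands in for the blocking EndDraw call, but the two events play the same roles as in the code above.</p>
<pre>using System;
using System.Threading;

public class PresentHandshake {
    readonly AutoResetEvent _drawRequired = new AutoResetEvent(false);
    readonly ManualResetEvent _drawComplete = new ManualResetEvent(true);
    int _presented;

    void PresentLoop () {
        while (true) {
            _drawRequired.WaitOne();
            Thread.Sleep(2); // stand-in for a present call that blocks on vertical blank
            _presented++;
            _drawComplete.Set();
        }
    }

    // Runs frameCount simulated frames and returns how many were presented.
    // Intended to be called once per instance.
    public int RunFrames (int frameCount) {
        var thread = new Thread(PresentLoop);
        thread.IsBackground = true;
        thread.Start();

        for (int i = 0; i &lt; frameCount; i++) {
            // Update for this frame would run here, overlapping the previous present.

            // BeginDraw: wait until the previous frame has been presented.
            _drawComplete.WaitOne();
            _drawComplete.Reset();

            // Draw calls for this frame would be issued here.

            // EndDraw: hand the blocking present off to the background thread.
            _drawRequired.Set();
        }

        // Wait for the last frame so the count is final.
        _drawComplete.WaitOne();
        return _presented;
    }
}</pre>
<p>Because the main thread always waits for the completion signal before resetting it, the background thread can never be asked to present a second frame while the first one is still in flight.</p>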
<p>Once this is all done, I start up my game, and&#8230; the framerate isn&#8217;t any different. Huh? What&#8217;s more, my frame profiler indicates that Update is now taking <strong>ten times</strong> as long as it used to, while Draw isn&#8217;t any faster. Huh???</p>
<p>In situations like this, it&#8217;s always good to consult a profiler to see if you&#8217;re missing something important:</p>
<p><a href="http://www.luminance.org/wp-content/uploads/2009/07/profile_01.png"><img class="aligncenter size-full wp-image-527" title="profile_01" src="http://www.luminance.org/wp-content/uploads/2009/07/profile_01.png" alt="profile_01" width="550" height="255" /></a></p>
<p>So, we can see that the DrawThread is definitely doing its job &#8211; it calls EndDraw, then waits for a signal asking it to draw again. Both are taking about as much time as we&#8217;d expect. But why is Update taking so long&#8230;?</p>
<p style="text-align: center;"><a href="http://www.luminance.org/wp-content/uploads/2009/07/profile_02.png"><img class="aligncenter size-full wp-image-528" title="profile_02" src="http://www.luminance.org/wp-content/uploads/2009/07/profile_02.png" alt="profile_02" width="499" height="132" /></a></p>
<p style="text-align: left;">&#8230; oh. Oops.</p>
<p>So it turns out that I was using GraphicsDevice.Viewport.Width and GraphicsDevice.Viewport.Height in my camera code. Accessing the Viewport property caused the XNA framework to call into Direct3D to retrieve the viewport, which acquired the exact same lock being used by EndDraw, causing my main thread to stall until the draw completed. <strong>WHOOPS</strong>.</p>
<p>This is especially embarrassing since the viewport size never changes anyway, so I could have just stored the width/height into constants. After doing just that and starting the game again, the profile looks more like you&#8217;d expect:</p>
<p><a href="http://www.luminance.org/wp-content/uploads/2009/07/profile_03.png"><img class="aligncenter size-full wp-image-529" title="profile_03" src="http://www.luminance.org/wp-content/uploads/2009/07/profile_03.png" alt="profile_03" width="562" height="254" /></a></p>
<p>What&#8217;s more, this is actually an improvement: With vertical sync <strong>enabled</strong>, this results in a significant reduction in the amount of time spent inside the BeginDraw/Draw/EndDraw functions on the main thread, which means there&#8217;s more time left to perform Updates. This means that I can maintain a solid, smooth 60fps more easily on dual-core/hyperthreaded machines.</p>
<p>Even with vertical sync <strong>disabled</strong>, this is still an improvement, though not as significant &#8211; apparently other things are happening inside EndDraw (not a big surprise), so by shifting that work off onto a second thread, I&#8217;m still gaining some time to spend performing the next update. When I disable the built-in framerate balancer, this brings my framerate from ~350fps up to ~380fps. Not bad for a couple dozen lines of code!</p>
<p>Of course, it&#8217;s worth pointing out that the XNA Framework documentation doesn&#8217;t make any promises here, so it&#8217;s possible that this technique is unsafe. When it comes to concurrency, it&#8217;s very easy to do the wrong thing and get away with it &#8211; as you might have noticed here, I was doing something utterly stupid and unsafe in my Update function, and I got away with it because the DirectX developers had the foresight to put a lock in the right place. If they hadn&#8217;t, my game might have corrupted state from accessing the GraphicsDevice on two threads, and crashed intermittently.</p>
<p>Regardless, this is a handy technique &#8211; once I&#8217;ve had the chance to do lots of testing on various PC configurations (and the Xbox 360), I&#8217;ll probably be using it in my game when I ship.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.luminance.org/blog/gruedorf/2009/07/01/threaded-enddraw-in-xna/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

