One of the problems I started to run into while polishing things up for my contest entry builds was that as my levels grew larger, the game’s CPU utilization on the 360 steadily grew with them. While PC builds of my game ran smoothly on basically all the machines I had access to, on the 360 the cost of updating all the level’s objects and entities became quite significant – most likely due to the 360’s feeble floating-point performance and lack of out-of-order execution.

The solution to this was, at least to me, relatively obvious: My levels were much larger than the camera, so it didn’t really make sense to update the entire level every frame.

The first thing I tried to verify this theory was simply hacking it in: Do a check before updating each object to see if it was onscreen. Interestingly, this didn’t make the game any faster on the 360. Depending on your point of view, this either confirmed or denied my hypothesis: If the problem was simply the cost of all the floating point operations, the cost of doing the onscreen check for each object (since the camera and object bounds were both expressed in floating-point) could have been making the problem worse. Clearly, I didn’t have enough data to be sure about the right choice to make.

So, I spent a day or so rigging up the necessary infrastructure to be able to profile my game on the 360. Since you can’t use tools like CLR Profiler or NProf on the 360, I ended up building a very simple frame timing system, and adding an overlay to the game that would show timing data. This let me get a good idea of how much time each subsystem in the game was using, and then I could compare the costs of individual subsystems, and try making changes and seeing how the profile data changed.
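A frame timer of this kind can be very small. The following is a minimal sketch rather than the game's actual profiler: `FrameProfiler` and its `Begin`, `End`, `GetMilliseconds` and `Reset` methods are hypothetical names, and it assumes `System.Diagnostics.Stopwatch` is available on the target platform.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: accumulates elapsed time per named section of a frame.
public class FrameProfiler {
    private readonly Dictionary<string, long> _Ticks = new Dictionary<string, long>();
    private readonly Dictionary<string, long> _Started = new Dictionary<string, long>();
    private readonly Stopwatch _Clock = Stopwatch.StartNew();

    // Mark the start of a section such as "Update" or "Draw".
    public void Begin (string section) {
        _Started[section] = _Clock.ElapsedTicks;
    }

    // Mark the end of a section and add the elapsed ticks to its running total.
    public void End (string section) {
        long started;
        if (!_Started.TryGetValue(section, out started))
            return; // End without a matching Begin; ignore it

        long total;
        _Ticks.TryGetValue(section, out total); // total stays 0 if the section is new
        _Ticks[section] = total + (_Clock.ElapsedTicks - started);
        _Started.Remove(section);
    }

    // Total time recorded for a section since the last Reset, in milliseconds.
    public double GetMilliseconds (string section) {
        long total;
        _Ticks.TryGetValue(section, out total);
        return total * 1000.0 / Stopwatch.Frequency;
    }

    // Call once per frame, after the overlay has displayed the totals.
    public void Reset () {
        _Ticks.Clear();
        _Started.Clear();
    }
}
```

Wrapping each subsystem in a `Begin`/`End` pair and drawing the totals in an overlay is enough to compare subsystems from one build to the next. The same pair can be wrapped around the update of a single object type to narrow things down further.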

Once I had the profiler up and running on the 360, a clear pattern emerged.

Updates were consuming a huge amount of CPU time on the 360. On my desktop, updates accounted for no more than 1% of CPU time, but on the 360 they accounted for more CPU time than rendering. This was a bit of a surprise to me, since rendering had definitely been the bottleneck on the 360 at one point. It seems that somewhere along the way I solved my rendering performance issues on the 360 but didn’t notice, because I had made updates so much more expensive. One mistake I plan not to repeat: I went a week or two without testing the game on the 360, since my 360 was not hooked up at the time. During that span I made a lot of changes that drastically altered the game’s performance characteristics, so it was hard to tell what had caused things to degrade.

Now that I knew updates were expensive, I decided to try and narrow down which object types were the problem. I added profiling markers around specific types of objects, to try and figure out which ones cost the most CPU time to update. After doing that and deploying a few builds to the 360, I found one of my culprits: Water.

While I had suspected that water might be a problem, I was still somewhat surprised by the results. A similar piece of in-game geometry, zones, used similar update code but turned out to have virtually zero update cost at runtime, while water cost a tremendous amount of CPU for very little gameplay impact. The problem turned out to be a subtle difference in how they were implemented.

Zones were inefficient in that every frame, each zone did an obstruction test to see if any entities were inside – in most cases a zone would be empty, so this was wasted effort. This would have been better implemented by having every entity check for nearby zones, since there are typically far fewer entities than there are zones. However, in practice this didn’t turn out to be the problem. The problem was that water built on this logic, and then used it to perform additional work: It did an obstruction test to determine how far the water should fall, and then did another obstruction test to locate any entities inside the water and apply ‘flow force’ to them so that the flowing water would push them in a given direction. These two obstruction tests each ended up accounting for a significant portion of the time spent updating the level.

To solve this, I took two steps. First, I reworked zones and water so that they both operated the way I described – each entity does a check to locate all the zones it’s within, in a single obstruction test. This reduced the number of obstruction tests I was running every frame by a large amount, and helped reduce CPU usage for water. However, that still wasn’t enough.
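To make the inverted lookup concrete, here is a rough sketch of an entity gathering its zones once and reusing the result for the flow force. Every type and member in it (`Box`, `Zone`, `Entity`, `UpdateZones`, `FlowX`, `FlowY`) is hypothetical, and the linear scan only stands in for whatever obstruction test the real zone collection provides.

```csharp
using System.Collections.Generic;

// Hypothetical axis-aligned bounding box.
public struct Box {
    public float Left, Top, Right, Bottom;

    public Box (float left, float top, float right, float bottom) {
        Left = left; Top = top; Right = right; Bottom = bottom;
    }

    public bool Intersects (Box other) {
        return Left < other.Right && other.Left < Right &&
               Top < other.Bottom && other.Top < Bottom;
    }
}

// Hypothetical zone; water is modelled as a zone with a nonzero flow vector.
public class Zone {
    public Box Bounds;
    public float FlowX, FlowY;

    public Zone (Box bounds, float flowX, float flowY) {
        Bounds = bounds; FlowX = flowX; FlowY = flowY;
    }
}

public class Entity {
    public Box Bounds;
    public float VelocityX, VelocityY;
    public readonly List<Zone> CurrentZones = new List<Zone>();

    // One lookup per entity per frame, instead of one (or two) per zone.
    public void UpdateZones (List<Zone> allZones) {
        CurrentZones.Clear();

        // Stand-in for the single obstruction test against the zone collection.
        for (int i = 0; i < allZones.Count; i++) {
            if (allZones[i].Bounds.Intersects(Bounds))
                CurrentZones.Add(allZones[i]);
        }

        // Reuse the same result to apply flow force; no second test is needed.
        for (int i = 0; i < CurrentZones.Count; i++) {
            VelocityX += CurrentZones[i].FlowX;
            VelocityY += CurrentZones[i].FlowY;
        }
    }
}
```

With this arrangement the number of lookups scales with the number of entities rather than the number of zones, and empty zones cost nothing.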

The biggest improvement came from reworking things so that only onscreen objects get updated every frame. Instead of performing a test against every object to see if it’s onscreen, I decided to build on the partitioning scheme I use for rendering to get a list of onscreen objects, and use that as my list of objects to update. Once I had that working, my performance improved drastically and the game easily ran at 60fps in every part of my levels with CPU to spare.
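The partitioning scheme itself isn't shown in this post. As an illustration of the general idea only, here is a sketch that assumes a simple uniform grid; `UniformGrid`, `Add` and `Query` are hypothetical names and not the collection the game uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical uniform grid: each item is stored in every cell its bounds touch.
public class UniformGrid<T> {
    private readonly float _CellSize;
    private readonly Dictionary<long, List<T>> _Cells = new Dictionary<long, List<T>>();

    public UniformGrid (float cellSize) {
        _CellSize = cellSize;
    }

    // Pack two cell coordinates into one dictionary key.
    private static long Key (int cx, int cy) {
        return ((long)cx << 32) | (uint)cy;
    }

    private int ToCell (float v) {
        return (int)Math.Floor(v / _CellSize);
    }

    public void Add (T item, float left, float top, float right, float bottom) {
        for (int cy = ToCell(top); cy <= ToCell(bottom); cy++)
        for (int cx = ToCell(left); cx <= ToCell(right); cx++) {
            List<T> cell;
            if (!_Cells.TryGetValue(Key(cx, cy), out cell))
                _Cells[Key(cx, cy)] = cell = new List<T>();
            cell.Add(item);
        }
    }

    // Returns every item stored in a cell the query rectangle touches. The result
    // is cell-granular, so it can include items slightly outside the rectangle.
    public HashSet<T> Query (float left, float top, float right, float bottom) {
        var result = new HashSet<T>();
        for (int cy = ToCell(top); cy <= ToCell(bottom); cy++)
        for (int cx = ToCell(left); cx <= ToCell(right); cx++) {
            List<T> cell;
            if (_Cells.TryGetValue(Key(cx, cy), out cell))
                foreach (var item in cell)
                    result.Add(item);
        }
        return result;
    }
}
```

The cost of a query depends on how many cells the camera covers, not on how many objects the level contains, which is why this beats testing every object individually.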

Of course, doing this introduced bugs. One of the biggest issues is that often you will have important objects just outside the edge of the screen that need to keep updating, like moving platforms or enemies. To solve this, I added two mechanisms:

First, I added a margin around the screen within which objects still continued to update. This meant that an object just barely offscreen would keep updating, and solved the problem of a moving platform or enemy getting left behind as soon as it dropped offscreen.
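In code, the margin amounts to growing the camera rectangle before querying for objects. A minimal sketch, assuming a hypothetical `ScreenRect` type (the `Bounds` type used by the game is not shown in this post) and an arbitrary margin value:

```csharp
// Hypothetical rectangle type used only for this illustration.
public struct ScreenRect {
    public float Left, Top, Right, Bottom;

    public ScreenRect (float left, float top, float right, float bottom) {
        Left = left; Top = top; Right = right; Bottom = bottom;
    }

    // Returns a copy grown by the given margin on every side.
    public ScreenRect Inflate (float margin) {
        return new ScreenRect(Left - margin, Top - margin, Right + margin, Bottom + margin);
    }

    public bool Intersects (ScreenRect other) {
        return Left < other.Right && other.Left < Right &&
               Top < other.Bottom && other.Top < Bottom;
    }
}
```

For example, with a camera of `new ScreenRect(0, 0, 800, 600)`, a platform at x = 810 fails the plain `Intersects` test but passes it against `camera.Inflate(64)`, so it keeps updating.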

Second, I built a simple ‘update manager’ that maintains a list of all the objects that are currently onscreen. When an object leaves the screen, instead of removing it immediately, the update manager sets a timeout, which causes the object to fall ‘asleep’ after a certain number of frames. As a result, once an object leaves the screen it has a short grace period to return before it falls asleep (SleepTimeout frames; the listing below defaults to 30, which is roughly half a second at 60fps). This helps with things like moving platforms that are going to move on and off the screen regularly – since they don’t spend very long offscreen, they never have a chance to fall asleep.

The update manager also gives me the ability to exclude some objects from updates entirely, by flagging them as ‘unable to wake’, so the update manager knows to never remove them from sleep status. Likewise, it gives me the option to flag objects as ‘unable to sleep’ so that they always stay active – for example, I do this to the player character and his companion to avoid any unintentional bugs that might result from the player remaining offscreen too long (say, during a cinematic).
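The listing below reads `AllowSleep` and `AllowWake` from each object, but the interface that declares them isn't shown. One possible shape, purely as an assumption (the names `ISleepable`, `PlayerCharacter` and `StaticDecoration` are hypothetical), would be:

```csharp
// Hypothetical interface; the real IUpdateable may be declared differently.
public interface ISleepable {
    bool AllowSleep { get; } // false: keep updating even when offscreen
    bool AllowWake { get; }  // false: never add to the awake list
    void Update ();
}

// Always active, like the player character described above.
public class PlayerCharacter : ISleepable {
    public int UpdateCount;

    public bool AllowSleep { get { return false; } }
    public bool AllowWake { get { return true; } }

    public void Update () {
        UpdateCount++;
    }
}

// Never updated at all, for objects with no per-frame behaviour.
public class StaticDecoration : ISleepable {
    public bool AllowSleep { get { return true; } }
    public bool AllowWake { get { return false; } }

    public void Update () {
        // Intentionally empty: the manager should never call this.
    }
}
```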

The update manager has some other benefits, too. Here’s what it looks like:

    public class UpdateManager<T>
        where T : class, IUpdateable, IHasBounds {

        public struct Entry {
            public readonly T Object;
            public int WakeCounter;

            public Entry (T obj, int wakeCounter) {
                Object = obj;
                WakeCounter = wakeCounter;
            }
        }

        protected Dictionary<T, bool> _VisibleObjects = new Dictionary<T, bool>(new ReferenceComparer<T>());
        protected UnorderedList<Entry> _Entries = new UnorderedList<Entry>();

        public int SleepTimeout = 30;
        public readonly SpatialCollection<T> Collection;

        public UpdateManager (SpatialCollection<T> collection) {
            Collection = collection;
        }

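        public void Update (Bounds liveRegion) {
            // Record every object currently inside the live region. The bool value
            // is unused; the dictionary only serves as a set of visible objects.
            using (var e = Collection.GetItemsFromBounds(liveRegion, true))
            while (e.MoveNext()) {
                _VisibleObjects[e.Current.Item] = false;
            }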

            Entry currentItem;

            using (var e = _Entries.GetEnumerator())
            while (e.GetNext(out currentItem)) {
                if (!_VisibleObjects.Remove(currentItem.Object)) {
                    // Object not visible

                    if (!currentItem.Object.AllowSleep) {
                        // Object cannot fall asleep
                    } else if (--currentItem.WakeCounter <= 0) {
                        // Object fell asleep
                        e.RemoveCurrent();
                        continue;
                    } else {
                        // Object still awake
                        e.SetCurrent(ref currentItem);
                    }
                } else {
                    // Object visible

                    currentItem.WakeCounter = SleepTimeout;
                    e.SetCurrent(ref currentItem);
                }

                currentItem.Object.Update();
            }

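            // Anything still left in _VisibleObjects was not already awake: wake it
            // (if allowed) and give it its first update this frame.
            foreach (var newObject in _VisibleObjects.Keys) {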
                if (newObject.AllowWake) {
                    _Entries.Add(new Entry(newObject, SleepTimeout));

                    newObject.Update();
                }
            }

            _VisibleObjects.Clear();
        }
    }

One of the nice things it does in addition to improving performance is that it simplifies my game code – previously, I had hand-written logic to step through the various object lists (geometry, entities, etc) and update them every frame, and that code differed in subtle ways. The update manager unifies all that, so it kills a lot of duplicated code. Also, by maintaining a unique ‘awake objects’ list, it allows me to remove objects from the geometry/entity lists while performing an update, instead of having to wait until the end of the frame, which simplifies some of my entity code as well.

And despite the number of problems it solves, it ends up being quite simple and easy to use. All I have to do to rig up an update manager is point it at a collection of objects and hand it the current camera boundaries every frame.