Posts Tagged performance

Updating Onscreen Objects / Profiling

One of the problems I started to run into while polishing things up for my contest entry builds was that as my levels grew larger, the game’s CPU utilization on the 360 steadily grew with them. While PC builds of my game ran smoothly on basically all the machines I had access to, on the 360 the cost of updating all the level’s objects and entities became quite significant – most likely due to the 360’s feeble floating-point performance and lack of out-of-order execution.

The solution to this was, at least to me, relatively obvious: My levels were much larger than the camera, so it didn’t really make sense to update the entire level every frame.

The first thing I tried to verify this theory was simply hacking it in: Do a check before updating each object to see if it was onscreen. Interestingly, this didn’t make the game any faster on the 360. Depending on your point of view, this either confirmed or refuted my hypothesis: If the problem was simply the cost of all the floating-point operations, then the onscreen check itself (the camera and object bounds were both expressed in floating-point) could have been eating up whatever time the skipped updates saved. Clearly, I didn’t have enough data to be sure about the right choice to make.
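A minimal sketch of that kind of hacked-in check might look like the following. Everything here is hypothetical and made up for illustration (the `Rect` struct, `GameObject`, and `OnscreenUpdater` are not the game's actual types). Note that the test itself is four floating-point comparisons per object, which is exactly the overhead discussed above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical axis-aligned rectangle; a stand-in for the game's real bounds type.
public struct Rect {
    public float Left, Top, Right, Bottom;

    public Rect (float left, float top, float right, float bottom) {
        Left = left; Top = top; Right = right; Bottom = bottom;
    }

    // Two rectangles overlap unless one lies entirely to one side of the other.
    public bool Intersects (ref Rect other) {
        return Left <= other.Right && Right >= other.Left
            && Top <= other.Bottom && Bottom >= other.Top;
    }
}

// Hypothetical game object that just counts how often it was updated.
public class GameObject {
    public Rect Bounds;
    public int UpdateCount;

    public void Update () {
        UpdateCount += 1;
    }
}

public static class OnscreenUpdater {
    // Updates only the objects whose bounds overlap the camera rectangle.
    // Returns the number of objects that were actually updated.
    public static int UpdateVisible (List<GameObject> objects, Rect camera) {
        int updated = 0;

        foreach (var obj in objects) {
            if (!camera.Intersects(ref obj.Bounds))
                continue;

            obj.Update();
            updated += 1;
        }

        return updated;
    }
}
```

A check like this still walks every object in the level each frame; it only skips the update itself, so it helps only when an update costs noticeably more than the bounds test.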

So, I spent a day or so rigging up the necessary infrastructure to be able to profile my game on the 360. Since you can’t use tools like CLR Profiler or NProf on the 360, I ended up building a very simple frame timing system, and adding an overlay to the game that would show timing data. This let me get a good idea of how much time each subsystem in the game was using, and then I could compare the costs of individual subsystems, and try making changes and seeing how the profile data changed.
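As a rough illustration, a very simple per-subsystem frame timer can be built on `System.Diagnostics.Stopwatch`. This is only a sketch under assumptions: the `FrameProfiler` class and its method names are hypothetical, the overlay drawing is left out, and the string-keyed dictionary is used for brevity rather than speed.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical frame timer: accumulates elapsed ticks per named section.
public class FrameProfiler {
    private readonly Dictionary<string, long> _ticks = new Dictionary<string, long>();
    private readonly Stopwatch _watch = new Stopwatch();
    private string _current;

    // Starts timing a section such as "update" or "render".
    public void Begin (string section) {
        _current = section;
        _watch.Reset();
        _watch.Start();
    }

    // Stops timing and adds the elapsed ticks to the current section's total.
    public void End () {
        _watch.Stop();

        long previous;
        _ticks.TryGetValue(_current, out previous);  // previous is 0 if the key is missing
        _ticks[_current] = previous + _watch.ElapsedTicks;
    }

    // Total time recorded for a section since the last ResetFrame, in milliseconds.
    public double GetMilliseconds (string section) {
        long ticks;
        if (!_ticks.TryGetValue(section, out ticks))
            return 0.0;

        return ticks * 1000.0 / Stopwatch.Frequency;
    }

    // Call once per frame, after the overlay has displayed the numbers.
    public void ResetFrame () {
        _ticks.Clear();
    }
}
```

Wrapping each subsystem in a `Begin`/`End` pair and printing `GetMilliseconds` for each section gives the kind of side-by-side comparison described above.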

Once I had the profiler up and running on the 360, a clear pattern emerged.

Updates were consuming a huge amount of CPU time on the 360. While on my desktop updates basically accounted for no more than 1% of CPU time, on the 360 they accounted for more CPU time than rendering. This was a bit of a surprise to me, since rendering was definitely the bottleneck at one point on the 360. It seems that somewhere along the way I solved my rendering performance issues on the 360, but didn’t notice because I had made updates so much more expensive. One mistake I plan not to repeat: I went a week or two without testing the game on the 360, since my 360 was not hooked up at the time. During that span I made a lot of changes that drastically altered the game’s performance characteristics, so it was hard to tell what had caused things to degrade.

Threaded Renderer

Lots of changes since last week, but the majority of my efforts were focused on overhauling my approach to rendering.

My basic approach is primarily inspired by Christer Ericson’s post about the approach he uses for ordering draw calls. Some other useful sources of information were Tom Forsyth’s post about the cost of renderstate changes and Guerrilla’s presentation on Killzone 2’s renderer from DEVELOP 2007. The Killzone 2 presentation is especially worthwhile since it describes the way they take advantage of concurrency techniques in their renderer.

Until now, the rendering code in my game has been almost entirely immediate-mode; various objects in the game world like tiles and entities had Draw methods that would utilize the game’s GraphicsDevice and SpriteBatch to render themselves, changing render states as needed and generating dynamic geometry where necessary (like for water).

Cutting and tuning

As far as gameplay goes, the only major addition since last week’s post was a relatively complete implementation of player death, along with the ‘reunion’ teleport that goes with it. Fairly simple at present, with some bugs to work out (including one related to the teleport location that you’ll see in the video below). Definitely helps get a better feeling for whether a given puzzle is too hard or too easy.

Other than that, the only thing of note code-wise is the time I spent doing some performance tuning. My framerate had crept down over the past month or so, to the point that I wasn’t able to maintain a stable 60fps on the 360 anymore and my CPU utilization was approaching 30% on my desktop PC. Some of the optimizations were relatively obvious – for example, I was calling SpriteBatch.Begin/SpriteBatch.End for every on-screen object, for the sake of simplicity, which resulted in a lot of unnecessary draw calls.

Some simple changes to automatically begin/end batches when changing rendering settings reduced the number of draw calls per frame in most cases to around 30, which is perfectly acceptable, and reduced my CPU utilization by around 50%. After that, the rest of the optimization was pretty trivial – finding other hotspots in my profiler data and reducing the cost.
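The idea of automatically beginning and ending batches can be sketched roughly like this. It is an illustration only: `BatchSettings` and `AutoBatcher` are hypothetical names, the real `SpriteBatch` calls appear only as comments, and the counters exist purely to show the effect.

```csharp
using System;

// Hypothetical stand-in for whatever render settings a batch depends on.
public struct BatchSettings {
    public int BlendMode;
    public int SortMode;

    public BatchSettings (int blendMode, int sortMode) {
        BlendMode = blendMode;
        SortMode = sortMode;
    }

    public bool SameAs (BatchSettings other) {
        return BlendMode == other.BlendMode && SortMode == other.SortMode;
    }
}

// Keeps one batch open for as long as consecutive draws share the same settings.
public class AutoBatcher {
    private BatchSettings _current;
    private bool _open;

    public int BatchCount;   // number of Begin/End pairs issued
    public int SpriteCount;  // number of sprites drawn

    public void Draw (BatchSettings settings) {
        // Settings changed: close the open batch before starting a new one.
        if (_open && !_current.SameAs(settings))
            Flush();

        if (!_open) {
            // Real code would call SpriteBatch.Begin with these settings here.
            _current = settings;
            _open = true;
            BatchCount += 1;
        }

        // Real code would call SpriteBatch.Draw here.
        SpriteCount += 1;
    }

    // Call at the end of the frame, or before any non-sprite rendering.
    public void Flush () {
        if (!_open)
            return;

        // Real code would call SpriteBatch.End here.
        _open = false;
    }
}
```

This only merges consecutive draws, so the number of batches depends on draw order; drawing objects grouped by their settings keeps it low.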

Well, it was trivial until I ran the game on the 360 again and noticed that the framerate hadn’t improved very much. Huh? I doubled my framerate on the PC, but on the 360, it barely moved an inch. What’s the deal?

Turns out, the 360’s pitiful floating-point performance was kneecapping me. Believe it or not, the primary culprit was the geometric shapes in the game’s HUD, for the circular health displays you may have seen in previous screenshots/videos. I knew the 360’s FPU was weak, but not THAT weak. Unfortunately, the only way to detect this is by manually measuring the performance cost of your code on the 360, by commenting out/toggling individual sections of your game code.

Tedious, at best. For now, I ended up just reducing the complexity of the geometry for the HUD elements and reducing the number of shapes I was drawing, which brought the 360 framerate much closer to what it used to be. As it happens, some design changes later in the week helped here too…


The majority of my work ended up focusing on the game’s design: My schedule for this project is extremely aggressive (insanely so, really) and as such, I need to have a relatively complete playable demo within mere months for submission to a couple major game competitions. Being able to hit that deadline in my free time requires me to aggressively control the scope of the project, avoid feature creep, and do as little work as possible to get game mechanics implemented and content built. 
 
Towards this end, after getting one of my main mechanics prototyped and testing it out in content I’d built, I made the hard decision to cut the mechanic. The second controllable character you’ve seen in some of my previous videos is effectively gone, though I’m going to attempt to make use of the design and code effort for the revised design. 
 
Making a choice like this is always painful, especially when you don’t have an unchangeable deadline or overbearing boss pushing you towards it. But ultimately, I think I’ll benefit from making these cuts sooner rather than later. I wish I had started thinking hard about it a few weeks earlier, when my first prototypes were working, instead of waiting until the issues were obvious, but I’m still relatively happy with the turnaround. I was able to prototype a relatively unusual game mechanic in a matter of a couple dozen hours of programming time, and decide that it wasn’t worth pursuing, and cut it. Definitely an improvement over traditional Waterfall with long cycles, but not quite true Agile yet. :)  
 
For me, this underscores the importance of aggressive, early prototyping of almost every possible game mechanic and design, instead of focusing on a single section of game content or gameplay until it’s done. Previously I was used to having a rigid focus on a section of a game or application, working on it day in and day out until it was done and ready to hand off – but in many cases, this meant that I could sink days or weeks of my time into something that ultimately had to be thrown out. 
 
Just like a lot of Agile proponents will tell you, it turns out that failing fast means wasting less time. The chaotic feeling and loss of productivity to context switches can be painful, and I think being successful requires setting things up to avoid having to pay those costs too many times a week, but ultimately, it’s a great decision. 
 

More entities and optimization

Most of my work on the platformer this week has been focused on improving the Entity class and all the related code. I made a number of changes and improvements to SpatialCollection aimed at improving its performance, and completely overhauled all my collision detection code to reduce the number of passes I make over the level when doing things like computing an entity’s StandingY or testing for a collision. The efforts have mostly paid off so far; my framerate with 6 entities walking around on curved surfaces right now is about the same as it was before with a single entity walking around on sloped surfaces, even though I’ve improved my collision model significantly. The optimizations ended up bringing my game’s performance profile back to where it was before the addition of entities, with the majority of CPU time being spent doing raw number-crunching to perform intersection checks on polygons.

My previous approach to collision detection was fairly brute-force: I had a few special methods defined in the Game class named things like FindObstruction and ResolveMotion, that took a bunch of parameters to control a collision detection sweep of the entire level. The addition of SpatialCollection<T> as mentioned in last week’s post made those functions faster, but that didn’t completely eliminate the performance issues – the most visible offender was ComputeStandingY. The original implementation of ComputeStandingY had to make four complete passes over the level to come up with a result, and I ended up invoking it twice per frame when updating an entity. The end result was that it accounted for a significant amount of my CPU usage, no matter how efficient I made the underlying collision detection routine.

In order to address this problem, I set out to refactor my collision detection code. The first step was to unify the different collision test routines – they all shared many common elements, the most obvious one being the task of walking over the contents of the level. Using SpatialCollection meant that each routine had about 8 lines of boilerplate in order to do iteration, in addition to all of the code for actually performing collision tests, so my first step was to factor that out into a baseline ObstructionTest function that performed the simple task of walking over the contents of the level and invoking a ‘Collision Visitor’ delegate on each item.

    // Walks level geometry and/or entities near the visitor's bounds and hands
    // each candidate to the visitor. Stops early as soon as Visit returns true.
    public ICollidable ObstructionTest (ICollisionVisitor visitor, ObstructionFlags flags) {
        ICollidable result = null;
        var bounds = visitor.GetBounds();

        if ((flags & ObstructionFlags.Geometry) == ObstructionFlags.Geometry) {
            // Reuse the bounds fetched above instead of asking the visitor again.
            using (var e = Level.Geometry.GetItemsFromBounds(bounds))
            while (e.MoveNext()) {
                var current = e.Current;

                // Cheap bounding-box rejection before any real collision work.
                if (!Bounds.Intersect(ref bounds, ref current.Bounds))
                    continue;

                var rg = current.Item.GetRuntimeGeometry(this);

                if (visitor.Visit(rg, ref result))
                    return result;
            }
        }

        if ((flags & ObstructionFlags.Entities) == ObstructionFlags.Entities) {
            // The player is tested separately from the other entities.
            if (bounds.Intersects(Player.Bounds))
                if (visitor.Visit(Player, ref result))
                    return result;

            using (var e = Entities.GetItemsFromBounds(bounds))
            while (e.MoveNext()) {
                var current = e.Current;

                if (!Bounds.Intersect(ref bounds, ref current.Bounds))
                    continue;

                if (visitor.Visit(current.Item, ref result))
                    return result;
            }
        }

        return result;
    }

After that, I refactored all the existing routines to be built on top of ObstructionTest. I then iteratively refined things down until I didn’t need any of the specialized routines anymore, and ObstructionTest ended up operating on specialized ICollisionVisitor objects that had two ‘Visit’ methods, one for ‘Collidable Objects’, and one for individual collision polygons. The pair of visit methods allowed a visitor to reject entire objects before any complex collision tests were performed, in addition to rejecting individual polygons due to non-collision. The use of visitor objects also meant that a visitor could record a list of the objects it encountered, or reject collision with an entire list of entities instead of just the entity that was performing the check. These improvements not only resulted in better performance, but they allowed me to address a bug in the Entity class that prevented entities from walking up sloped surfaces while the player was standing on them.

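    using System;
    using System.Collections.Generic;

    // Visitor that records every standable object it encounters.
    public class StandingSurfaceVisitor : ICollisionVisitor {
        public Entity Entity;
        public Bounds Bounds;   // search area; set by the caller before each test
        public List<IStandable> Results;

        public StandingSurfaceVisitor (Entity entity) {
            Entity = entity;
            Results = new List<IStandable>();
        }

        public Bounds GetBounds () {
            return Bounds;
        }

        public CollisionState VisitCollidable (ICollidable obj) {
            // Skip the entity that is performing the check.
            if (obj == Entity)
                return new CollisionState(false, false);

            var standable = (obj as IStandable);
            if (standable != null)
                Results.Add(standable);
            return new CollisionState(false, true);
        }

        // Clears the recorded results so the visitor can be reused.
        public void Reset () {
            Results.Clear();
        }

        // Polygon-level visits are not implemented for this visitor.
        public CollisionState VisitPolygon (Polygon poly) {
            throw new NotImplementedException();
        }
    }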

One of the other things I invested some time into was an overhaul of the logic for standing on surfaces. Previously, the Player class had a StandingOn property that held one of the entities/surfaces he was currently standing on. I used this to determine whether the player was standing on a pressure plate, and cause him to move with things he was standing on. The problem was that in almost all cases the player would be standing on multiple surfaces, so the nondeterministic nature of the collision detection code meant that sometimes you could be atop a pressure plate but not trigger it. I also had no easy way of determining whether entities were standing on top of each other, which meant that an entity being ridden by another entity could not walk up a sloped surface (his path would be blocked by his rider).

To solve those two problems, I defined a pair of simple interfaces called IRider and IRideable. These interfaces provided a simple way for me to manage all the logic around riding surfaces – entities could automatically build and update a list of rideable objects that they were currently occupying, and rideable objects were able to automatically maintain a list of rider objects on top of them, making it simple to exclude those riders from collision checks. This not only fixed pressure plates, but also made it easy for monsters to set off pressure plates. It also meant that monsters could walk up sloped surfaces while being ridden by other entities, and that you could stack entities on top of each other to create big riding chains without any major issues (though in the video below, it was still unfinished and somewhat busted).
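One way such a pair of interfaces could be shaped is sketched below. The interface names come from the post, but the members, the `Riding` helper, and the `Monster` class are assumptions made up for illustration, not the game's actual code. The sketch also assumes riding relationships never form a cycle.

```csharp
using System;
using System.Collections.Generic;

// Something that can be stood on; tracks who is currently on top of it.
public interface IRideable {
    List<IRider> Riders { get; }
}

// Something that can stand on rideable objects; tracks what it occupies.
public interface IRider {
    List<IRideable> RidingOn { get; }
}

// Hypothetical helper that keeps both lists in sync.
public static class Riding {
    public static void Mount (IRider rider, IRideable surface) {
        if (!rider.RidingOn.Contains(surface))
            rider.RidingOn.Add(surface);
        if (!surface.Riders.Contains(rider))
            surface.Riders.Add(rider);
    }

    public static void Dismount (IRider rider, IRideable surface) {
        rider.RidingOn.Remove(surface);
        surface.Riders.Remove(rider);
    }

    // True if 'other' rides on 'surface', directly or through a chain of riders.
    // A collision check could use this to ignore everything stacked on top.
    public static bool IsRiddenBy (IRideable surface, IRider other) {
        foreach (var rider in surface.Riders) {
            if (rider == other)
                return true;

            var nested = rider as IRideable;
            if (nested != null && IsRiddenBy(nested, other))
                return true;
        }

        return false;
    }
}

// Hypothetical entity that can both ride and be ridden.
public class Monster : IRider, IRideable {
    private readonly List<IRider> _riders = new List<IRider>();
    private readonly List<IRideable> _ridingOn = new List<IRideable>();

    public List<IRider> Riders { get { return _riders; } }
    public List<IRideable> RidingOn { get { return _ridingOn; } }
}
```

With something like this, an entity at the bottom of a stack can skip collision tests against anything for which `IsRiddenBy` returns true, which is the kind of exclusion that lets a ridden monster walk up a slope without being blocked by its rider.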

One of the other fixes I was able to make as a result of the new system for handling rideable surfaces was to the code for mantling up onto surfaces. Previously, if you tried to mantle up onto a moving surface, like a moving platform or an entity, you’d mantle up onto the space it previously occupied and typically fall right back down, making it pretty useless. With the new riding system I was easily able to adapt this so that you could mantle up onto a moving surface without being left behind by its motion or causing it to become obstructed. Since the mantling code doesn’t currently distinguish between different types of surfaces, this means you can basically crawl up on top of a crowd of enemies and run across it, which is pretty fun.

The video below shows how entity mantling looks, though it’s from before I implemented the riding system (so you can see it get broken in a couple places).
