Posts Tagged Code

A Journey Into Linker Hell, And A Mistake

While doing some research into why Firefox sometimes takes a long amount of time to shut down, I discovered that builds of Firefox on my work machine could not even shut down successfully. Every time I tried to shut down, shortly into the shutdown process, I’d get a crash, with a message about heap corruption. At first, I assumed I was dealing with breakage that someone had checked into trunk. When dealing with a large enough software project, the odds are good that pulling down the latest code from trunk will bring you a few bugs to discover. Very often, you can just roll back a few revisions, or find the right person and determine that they’ve already got a fix.

So, I tried those things. I rolled back a few revisions, rebuilt, and it still crashed. I tried rolling back four weeks to a known-good revision, and it still crashed. I went through all the commits over that time period to try and identify any obvious suspects, but didn’t find anything. I tried running the browser with a new, empty profile, on the assumption that perhaps one of my add-ons was causing the problem. Still crashed.

At this point, I could have thrown out my clone of the mozilla source code, along with my build folders, in the hopes that starting over from scratch would fix the problem. I’ve done this before, and sometimes it does make problems go away. But in practice, if everyone does this every time they hit a problem, insidious bugs can creep their way into your build system (or worse, your code base), and when you have a few hundred developers hacking on one project, the time wasted by using that solution adds up. So I decided to try and figure out why FF was crashing on shutdown.

The first thing I needed to figure out was whether the heap was actually corrupted. I started up Firefox inside the Visual Studio debugger and configured it to break on error, so that I could inspect the state of the application. At the point of the crash, here’s what the state looked like, along with some other useful information:

The most immediately obvious piece of information is that the ‘crash’ in this case is actually a breakpoint, triggered by ntdll.dll!_RtlValidateHeap. Furthermore, if we walk up the stack a bit, we can see that this is from msvcr100d.dll!_free_dbg. This tells us that we’re using the debug heap provided by the Visual C++ Debug Runtime, and the validations it performs when freeing a block of memory have failed. This can happen for various reasons, but if it happens, it almost always means something is wrong.

Next, we need to figure out why this is failing. There are a few obvious candidates, so let’s try to rule them out.

Memory corruption would be a likely cause. Corrupting memory that is infrequently used will probably not cause an application to fail, but when you try to free the memory, if the bookkeeping data used by the heap was also corrupted, the free call will fail. To rule that out, I walked up the stack to find free()’s caller and figure out what kind of data I was dealing with:

  free(lib->name);
  lib->name = NULL;

Based on this, we can guess that the data is a name (and if you read more code, it’s clear that it is in fact a single-byte-per-character null terminated string). Finding the pointer being freed within the crashed process and looking at it in the debugger’s Memory window shows a well-formed string (see the Memory 1 and Locals panes in the screenshot above).

So, casual inspection suggests that this is not memory corruption, but it’s still possible. What if the library in question is corrupting memory near the string, and that’s why free is failing when we unload the library?

To rule that out, I climbed further up the stack and discovered that there’s an environment variable you can set that prevents the library from being unloaded. Interestingly enough, setting this variable causes the crash to go away… but one of Firefox’s child processes, plugin-container, then crashes in the same way, unloading a different library. This suggests that it’s at least not specific to one single library.

At this point, I decided I wanted to know where lib->name was coming from. I tried a couple quick searches with DXR (for lib->name, then ->name), but the first didn’t provide any results, and the second provided a bunch of results.

So, instead, I started Firefox up under Heap Profiler, and then attached a debugger and shut it down normally. I followed the above steps to get the address of the pointer being freed, and then opened up the Heap view of the latest snapshot I had taken in Heap Profiler. It was at this point I realized that I had completely failed to anticipate this use case when I was working on Heap Profiler, so I had to manually scroll through the heap, looking for the address I wanted. Whoops. Luckily, the address was near the beginning of the heap, so I found it easily:

Now, given the stack, I knew exactly who was responsible for allocating the string. Closer inspection even revealed why I was getting a crash: The string was being allocated by msvcr100!malloc, but freed by msvcr100d!free. Wait, how is that even possible?

I set breakpoints on the respective calls and started Firefox under the debugger, to ensure that I wasn’t being tricked by DLL load/unload operations that were occurring between the malloc and free calls. After seeing that the data from Heap Profiler matched reality, I decided to do some more digging. I opened up the Modules window to confirm that yes, both the debug and release C++ runtimes were loaded, and used the watch window to find the addresses of free and strdup, respectively (as strdup was the function responsible for calling the release malloc, in this case).

As you can see in the highlighted sections, __imp__free, the import symbol for free, points into MSVCR100D, while __imp___strdup, the import symbol for _strdup, points into MSVCR100. This is pretty confusing, and definitely confirms that the crash we’re seeing is caused by mismatched heaps.

At this point, there are two major questions:

  1. How are we loading both versions of the runtime in a debug build?
  2. And, how is this causing strdup and free to get imported from different runtimes?

This is especially troubling, because my local build of Firefox seems to run perfectly, other than this shutdown bug. You would expect mismatched heaps to cause all sorts of trouble. I did some digging around, and arrived at part of the answer to both questions:

This screenshot shows that nspr4.dll is linked against both runtimes, but only imports _putenv and _strdup from the release runtime. Everything else comes from the debug runtime! This explains why the heap mismatch is limited to the one crash I’m seeing, instead of causing widespread problems with Firefox – the effects of this linking problem are (theoretically) limited to code within NSPR, and further limited to code that uses _strdup or _putenv. Very mysterious.

After confirming that NSPR4.dll didn’t link against both libraries when compiled on my home machine, I set out to try and get verbose output from the linker, to identify just why I was getting linked against both runtime libraries. And this is where I made a mistake.

Earlier in the debugging process, I had convinced myself that the problem was reproducible, because checking out various revisions of the Firefox source code and doing full builds had caused the problem to persist. In order to get the linker to actually re-link nspr4.dll from scratch, I decided that I had to delete the contents of my build folder – I couldn’t get it to perform the link operation by simply deleting the output files (probably because I was deleting the wrong ones). So, I deleted my build folder, and performed a full build from scratch. Pouring over the log files for the build, I found the link command in question:

c:\mozilla\nsprpub\config\rules.mk:330:0$ rm -f nspr4.dll
c:\mozilla\nsprpub\config\rules.mk:345:0$ link -nologo -DLL -SUBSYSTEM:WINDOWS -DYNAMICBASE -OUT:"nspr4.dll" -DEBUG -MAP   advapi32.lib wsock32.lib winmm.lib ./prvrsion.obj io/./prfdcach.obj io/./prmwait.obj io/./prmapopt.obj io/./priometh.obj io/./pripv6.obj io/./prlayer.obj io/./prlog.obj io/./prmmap.obj io/./prpolevt.obj io/./prprf.obj io/./prscanf.obj io/./prstdio.obj threads/./prcmon.obj threads/./prrwlock.obj threads/./prtpd.obj linking/./prlink.obj malloc/./prmalloc.obj malloc/./prmem.obj md/./prosdep.obj memory/./prshm.obj memory/./prshma.obj memory/./prseg.obj misc/./pralarm.obj misc/./pratom.obj misc/./prcountr.obj misc/./prdtoa.obj misc/./prenv.obj misc/./prerr.obj misc/./prerror.obj misc/./prerrortable.obj misc/./prinit.obj misc/./prinrval.obj misc/./pripc.obj misc/./prlog2.obj misc/./prlong.obj misc/./prnetdb.obj misc/./praton.obj misc/./prolock.obj misc/./prrng.obj misc/./prsystem.obj misc/./prthinfo.obj misc/./prtpool.obj misc/./prtrace.obj misc/./prtime.obj io/./prdir.obj io/./prfile.obj io/./prio.obj io/./prsocket.obj misc/./pripcsem.obj threads/./prcthr.obj threads/./prdump.obj threads/./prmon.obj threads/./prsem.obj threads/combined/./prucpu.obj threads/combined/./prucv.obj threads/combined/./prulock.obj threads/combined/./prustack.obj threads/combined/./pruthr.obj md/windows/./ntmisc.obj md/windows/./ntsec.obj md/windows/./ntsem.obj md/windows/./ntinrval.obj md/windows/./ntgc.obj md/windows/./w95thred.obj md/windows/./w95io.obj md/windows/./w95cv.obj md/windows/./w95sock.obj md/windows/./win32_errors.obj md/windows/./w32ipcsem.obj md/windows/./w32poll.obj md/windows/./w32rng.obj md/windows/./w32shm.obj md/windows/./w95dllmain.obj  ./nspr.res

I dug around further in the log files to locate the nspr4.dll in question, deleted it, and then ran the link command with the additional -VERBOSE flag to get verbose linker output. Pouring through the verbose output from the linker, I located the two functions in question:

      Found __imp__free
        Referenced in prlog.obj
        Referenced in prlink.obj
        Referenced in prmem.obj
        Referenced in prdtoa.obj
        Loaded MSVCRTD.lib(MSVCR100D.dll)
      Found __imp__strdup
        Referenced in prlog.obj
        Referenced in prlink.obj
        Loaded OLDNAMES.lib(strdup.obi)
      Found __imp___strdup
        Referenced in OLDNAMES.lib(strdup.obi)
        Loaded MSVCRTD.lib(MSVCR100D.dll)

It was at this point that I realized my mistake: NSPR4.dll was now linking correctly. Somehow, my previous attempts at building older revisions of the code had not fixed the build, but entirely wiping out my build folder had fixed it. Now I felt like a moron.

On the other hand, the mystery grew slightly more mysterious: Somehow, strdup was different from free – the verbose linker output, even from this valid build, was different for the two! strdup, for some reason, was defined in OLDNAMES.lib, while free was defined in MSVCRTD.lib. Why the difference?

A quick search of MSDN reveals the truth: The POSIX strdup function is deprecated beginning in Visual C++ 2005, and you should use the ISO C++ conformant _strdup function instead. strdup is an alias to _strdup! This means that the way the linker treats it is different from the way the linker treats free.

Armed with this knowledge, I stumbled across a discussion forum post where someone encountered the same problem, by accidentally linking against both runtimes, and that post linked to a bug report on Microsoft’s Connect website. The bug report not only confirms that the crash in question is caused by having both heaps loaded, but describes the behavior as by design, and points out that strdup and free are being loaded from different runtimes.

And now, at the end of this long, frustrating journey, I have acquired many valuable things:

  • A further distrust for huge build systems, and linkers in particular
  • The knowledge that loading both debug and release C++ runtimes is definitely not a good idea
  • The knowledge that aliases and exported functions in libraries are treated differently by the linker
  • A Heap Profiler to-do item: Add support for searching the heap by address
  • And most importantly: Don’t delete your build folder while debugging, even if you think you can reproduce the problem!

(As an aside, while debugging this problem I also discovered that debug builds of Firefox crash on my personal machine but not my work machine – but the crash only occurs if I run firefox.exe from within a bash shell. I kind of hate computers.)

I hope this short tale of woe has given you some new ideas about how to debug the next mysterious crash you encounter. If it hasn’t, I hope it at least entertained you!

Tags: , ,

Compiler-assisted Data Binding with LINQ

When trying to write reliable, maintainable software, it’s important to try and minimize duplication. When you find yourself writing the same magic number in multiple places, you move it into a named constant to eliminate the duplication. When you find yourself writing the same expression multiple times, you move it into a named function, or perhaps a property. Sometimes, you find yourself dealing with higher level duplication – entire concepts and patterns that you find yourself writing again and again. Many modern languages provide tools to help you eliminate this kind of duplication: class inheritance allows you to share common logic between classes by inheriting that logic from a parent class, and generic types allow you to apply a generic pattern to multiple individual types. Both of these tools work with the aid of the compiler, so they interact well with debugging tools and allow you to catch most errors at compile time. Sometimes, though, you run into duplication that can’t easily be eliminated, even with the aid of the compiler.

In many graphical applications, a common pattern is the need to remember user preferences or inputs. Modern languages provide useful tools that can make this task simpler – for example, both Java and C# provide ways to automatically serialize an entire data structure into XML, which eliminates the need to manually read/write your preference values. Unfortunately, even with the aid of these features, you still find yourself with a lot of duplication – even if the runtime can automatically serialize your data structure, you still have to fill the structure out with your preferences, or read them back in manually. I’m going to demonstrate how you can use a little-known aspect of the new LINQ feature in C# 3.5 to eliminate duplication with the aid of the compiler.

As an example, we’ll be using the simple application shown above. It’s a Windows Forms dialog box, with a few different controls, set up to remember user inputs between sessions. You can download the source code for this application below:

Preferences Example (v1)

If you run the application, input some values, and click OK, you’ll find that it stores your inputs in a JSON file in the %AppData%\PreferencesExample folder. If you run it again, you’ll see that it pulls your inputs back out of the JSON file to populate the UI.

Read the rest of this entry »

Tags: , , ,

A managed WebKit library: BerkeliumSharp

I spent the past day or so hacking together a managed wrapper for Sirikata’s Berkelium library. Berkelium allows you to easily embed a WebKit-based browser into games and other applications. It’s based on Chrome and it’s really easy to get up and running in a C++ application. Unfortunately, if you’re using C# or VB.net, you’re out of luck – no way to link directly against a C++ .lib file in those languages. The addition of my managed wrapper – BerkeliumSharp – means that you can use the library in managed applications, and integrate it with Windows Forms, XNA, or even WPF.

I’ve released the source under the BSD license (just like Berkelium) and included two simple examples (one for Windows Forms, one for XNA). It’s not a Chrome competitor but it’s surprisingly easy to get a lot of complicated things working – you can watch Hulu videos if you have Flash installed, and Flash games like Dino Run work in the Windows Forms example since it implements keyboard input.

You can download pre-built demos to test it out, or grab the source code over at Google Code.

Have fun!

P.S. If you try out the examples, be aware that content that opens pop-up windows doesn’t seem to work. I think this is a bug in Berkelium.
Edit: Figured out how to build Chromium and Berkelium myself and fixed the bug. Open source is awesome!

Tags: , , , , , , ,

Constant Binding

One of the changes I made in the weeks leading up to my contest deadlines was to pull some of the player-specific combat logic, like that for attack chains, combos, and flinching, out into their own objects. Doing this let me apply that same combat logic to monsters and other entities in the game world, which cut down on duplication considerably.

However, doing this made it clear that I had some architectural issues to tackle: All of these mechanics were heavily dependent on the tunable constants for the creature in question, which meant I couldn’t just pull methods and variables out of my entity classes into classes of their own.

To solve the problem of accessing an object’s constants, I came up with a solution based on reflection. I can define a helper object designed to handle an aspect of an entity’s mechanics – for example, a HealthPool object to manage the creature’s health, along with associated aspects like regeneration. The helper object can define instance variables for the constants it needs access to, like so:

public class HealthPool {
    public Constant<float> HealthMax = null;
    public Constant<float> HealthPassiveRegen = null;
    public Constant<float> HealthRegenDelayTime = null;
    public Constant<float> HealthRegenRampTime = null;
    public Constant<float> FlinchThreshold = null;
    public Constant<float> FlinchThresholdDecay = null;

    public readonly RuntimeEntity Entity;
    public readonly ITimeProvider TimeProvider;
    public float Health = 0.0f;

Note that these variables are the same name and type as the actual tunable constants – the difference is that instead of being static, they’re instance variables. Doing this allows me to pull a function out of an entity’s source code without needing to change the way it references particular constants, since the constants have the exact same names as before.

Of course, since these variables default to null, we need some way to fill them in with references to the actual tunable constants we want to use. To do that, we apply reflection:

public static void BindConstants (object destination, params Type[] sourceTypes) {
    var genericConstant = typeof(Constant<>);
    var destinationType = destination.GetType();
    var destinationFields = new Dictionary<string, FieldInfo>();

    foreach (var field in
        destinationType.GetFields(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.FlattenHierarchy)
    ) {
        var fieldName = field.Name;
        var fieldType = field.FieldType;

        if (!fieldType.IsGenericType || fieldType.GetGenericTypeDefinition() != genericConstant)
            continue;

        destinationFields[fieldName] = field;
    }

    foreach (var sourceType in sourceTypes) {
        foreach (var field in
            sourceType.GetFields(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.FlattenHierarchy)
        ) {
            var fieldName = field.Name;
            var fieldType = field.FieldType;

            if (!fieldType.IsGenericType || fieldType.GetGenericTypeDefinition() != genericConstant)
                continue;

            FieldInfo destinationField = null;
            if (destinationFields.TryGetValue(fieldName, out destinationField)) {
                destinationField.SetValue(destination, field.GetValue(null));
                destinationFields.Remove(fieldName);
            }
        }
    }

    if (destinationFields.Count > 0) {
        var constants = String.Join(", ", (from key in destinationFields.Keys select key).ToArray());
        var types = String.Join(", ", (from type in sourceTypes select type.Name).ToArray());

        throw new InvalidDataException(String.Format("Type(s) {0} do not declare the following constants:\n{1}", types, constants));
    }
}

What we’re doing here is pretty simple: We accept a reference to an object that has constants requiring binding, and a list of source types to retrieve constants from. The function operates in two stages: First, we enumerate all the instance variables defined in the target object, and build a list of all the tunable constants it has that need to be bound. After that, we enumerate all the static fields of the provided source types, looking for constants that have names matching those of the instance variables on the target object, binding them where appropriate. After this, we can simply check to see if our list of constants is empty or not – if it’s empty, we successfully bound all our constants, and if it’s not, we know that one or more of the desired constants was missing.

We get a few useful things out of this: First, accepting a list of types allows us to do simple inheritance of constants. If we first check the most-derived type and then the base type of an entity, that allows us to define a ‘default value’ for a particular constant, like ‘Maximum Health’, in a base class, and then define a new constant with the same name in the derived type. This also allows us to create ‘global defaults’ for a given constant – for example, if we always put Game at the end of the type list, we can have global game-wide constants for things like physics parameters, and only override them in specific classes if necessary.

Finally, to wire things up, we just need to do a little work in the constructor for our helper object:

    public HealthPool (RuntimeEntity entity, ITimeProvider timeProvider) {
        Entity = entity;
        TimeProvider = timeProvider;

        ConstantManager.BindConstants(this, entity.GetType(), entity.Game.GetType());

        Health = HealthMax;
    }

In this case, we’re initializing the HealthPool using the constants defined in the entity, and falling back to any constants defined in the Game when the entity doesn’t specify them. If a necessary constant is missing, we’ll get an exception thrown when constructing our helper object. Once we’ve bound the constants, we can just use them like we would otherwise – in this case, the HealthPool automatically initializes itself based on the HealthMax constant.

This is definitely preferable to the approach I used to use for exposing an entity’s constants – previously, for important constants like an entity’s bounding box size, I’d define an abstract property in a base class, and override it in each derived type to return the value of the constant. Now, I don’t need to use any abstract members or interfaces; I can just bind to the constants once when I initialize my helper objects.

One thing of note is that since this technique uses reflection, you might run into performance issues if you’re binding constants repeatedly. This is pretty trivial to solve, however; you can just cache the results of a constant binding operation based on the destination and source types, since those aren’t going to change at runtime.

Tags: , , ,

Shipping a Brick

Computer software, and by extension, video games, are getting more complicated every day. It used to be that a game might be a few hundred thousand or maybe a million lines of code. Now, some of the third-party libraries we use in our games are approaching that size, if not already larger! Keeping that in mind, it’s quite impressive that modern games still run for the majority of users. But here’s a simple fact: Sooner or later, someone is going to be unable to play your game. What you do then determines whether or not they will remain your customer.

brick


A few years back I was doing design work on an MMORPG. Really satisfying work – come up with characters & stories, try to construct gameplay around them, and then watch the rest of the team turn it into a living, breathing part of your game world that your players are going to interact with for months afterward.

So, at one point we decided to run a weekend promotion to promote our upcoming title. We temporarily enabled access to the content from our next game, and allowed people who didn’t own the game to create characters and play for the duration of the weekend. Great idea – get people in there at no cost, try and convince them your game is fun and worth playing. Works good for existing customers, too, because now they get excited about what’s coming down the pipe because you’ve let them play it for a couple days.

I was pretty excited about it, since it was the first promotion of its kind that we’d run since I had started working there – which meant I could finally show some of my friends what I was working on, and try and convince them it was worth playing. Most of them hadn’t even looked at the game since they had to pay for it first – and who can blame them, really?

Friday rolls around and I head home late and get some rest. The rest of the team does the same.

On Saturday morning, I send a message to a couple of my friends explaining how to download our client and try out the game. They’re both pretty enthusiastic about it and get started. A few minutes later, one of them says:

‘Why can’t I use any skills?’

What? What do you mean you can’t use any skills? Are there buttons on the bar at the bottom of the screen?

‘Yeah, but they have little lock icons over them. Nothing happens when I press them.’

Well, uh, s–t. That’s not supposed to happen. I emailed one of my coworkers to ask – uh, did anyone try creating a brand new account for the promotion to see if it worked? – and the answer is depressing: Nope. Luckily, it wasn’t a total loss – our existing customers were still able to participate in the event, because they already had working accounts. But for new, potential customers – the game didn’t work. They downloaded and installed our game, and it was mostly useless to them.


Read the rest of this entry »

Tags: , , , ,

‘This application is not configured to use PerfHUD’ – how to fix it

For anyone trying to use NVPerfHud, if you get stuck trying to figure out this error message:

This application is not configured to use PerfHUD.

Consult the User’s Guide for more information.

You may have noticed that there is no reference to this error message, or ‘configuring your application’ in the User’s Guide. The solution is actually documented in the NVPerfHud Quick Tutorial‘, and involves changing the parameters you pass to CreateDevice.

Hopefully posting this here will help other people find this information, since googling the error message turned up zero results for me. Good work, NVidia!

P.S. Preventing ‘unauthorized profiling by third parties’? What the hell is this, the cold war? This is computer graphics, not espionage. Can you imagine if other software worked this way?

To prevent unauthorized pixel modification by third parties, any images you wish to view with Adobe Photoshop must contain a hand-written sample of dialogue from the second act of A Midsummer Night’s Dream, by William Shakespeare. Please note that dialogue from modern adaptations like the 1999 version featuring Kevin Kline is not sufficient. Also please note that handwriting in sufficiently obscure cursive scripts is unlikely to be recognized.

Tags: , , , ,

Threaded EndDraw in XNA (Crouching WaitOne, Hidden Lock)

After finishing up the materials for my Level Up 2009 entry, today I spent a little while trying out an idea I had recently:

One of the problems with using vertical sync in a video game is that it eats into the available CPU time for performing game updates. The way vsync is implemented in most graphics APIs, it causes your Present/EndDraw/SwapBuffers call to block until the card enters vertical blank and the frame is shown to the user. While this is ideal from a correctness perspective, it’s a tremendous waste since it means you can end up sitting there for up to 16 milliseconds, waiting for vertical blank. If your game spends lots of time doing both updating and drawing, all that time could be spent performing updates instead. Ouch.

Currently, my game spends about as much time drawing as it does updating. A significant portion of the time spent drawing (20-30%) is within the EndDraw function. Turning off vertical sync drops the amount of time spent in EndDraw considerably, but introduces tearing. So, as a potential solution, why not call EndDraw on a background thread? While the thread waits for vertical blank, I can begin performing the next frame’s Update, and in the event that I finish updating before the previous frame is visible, I simply wait for that previous EndDraw call before beginning to paint the next frame. In the optimal case, this means I can come much closer to the best possible framerate without introducing tearing, and in the worst case, the cost of rendering an individual frame is only *slightly* increased by the use of thread synchronization. The fact that I’m only doing EndDraw on another thread means that I don’t have to worry about protecting my game data with locks and other synchronization techniques, since the GraphicsDevice doesn’t use any of my game data when performing the EndDraw operation.

So, to test this out, I overrode my Game class’s BeginDraw and EndDraw methods. This turns out to be all we have to do to change the way drawing is performed, because the XNA Framework developers were kind enough to make both of these methods virtual.

        protected override bool BeginDraw () {
            _DrawCompleteEvent.WaitOne();
            _DrawCompleteEvent.Reset();
            return base.BeginDraw();
        }

        protected override void EndDraw () {
            _DrawRequiredEvent.Set();
        }

Of course, at this point, the two events used here are never set, so this code won’t work. Thus, we add a background thread to perform our painting:

        AutoResetEvent _DrawRequiredEvent = new AutoResetEvent(false);
        ManualResetEvent _DrawCompleteEvent = new ManualResetEvent(true);

        public Game () {
            ...

            _DrawThread = new Thread(DrawThreadFunc);
            _DrawThread.IsBackground = true;
            _DrawThread.Start();
        }

        protected void DrawThreadFunc () {
            while (true) {
                _DrawRequiredEvent.WaitOne();
                base.EndDraw();
                _DrawCompleteEvent.Set();
            }
        }

Fairly simple thread programming here: We create a thread, and set IsBackground to true so that it will stop as soon as the main thread exits. The thread spends all of its time waiting for a ‘required draw’ signal, and then performs an EndDraw call. Once the call is complete, it sets another signal to inform the Game class that the previous draw has finished and it’s safe to perform a BeginDraw call (this lets us make sure that we never use the GraphicsDevice on the main thread while the background thread is performing an EndDraw).

Once this is all done, I start up my game, and… the framerate isn’t any different. Huh? What’s more, my frame profiler indicates that Update is now taking ten times as long as it used to, while Draw isn’t any faster. Huh???

In situations like this, it’s always good to consult a profiler to see if you’re missing something important:

profile_01

So, we can see that the DrawThread is definitely doing its job – it calls EndDraw, then waits for a signal asking it to draw again. Both are taking about as much time as we’d expect. But why is Update taking so long…?

profile_02

… oh. Oops.

So it turns out that I was using GraphicsDevice.Viewport.Width and GraphicsDevice.Viewport.Height in my camera code. Accessing the Viewport property caused the XNA framework to call into Direct3D to retrieve the viewport, which acquired the exact same lock being used by EndDraw, causing my main thread to stall until the draw completed. WHOOPS.

This is especially embarassing since the viewport size never changes anyway, so I could have just stored the width/height into constants. After doing just that and starting the game again, the profile looks more like you’d expect:

profile_03

What’s more, this is actually an improvement: With vertical sync enabled, this results in a significant reduction in the amount of time spent inside the BeginDraw/Draw/EndDraw functions on the main thread, which means there’s more time left to perform Updates. This means that I can maintain a solid, smooth 60fps easier on dual-core/hyperthreaded machines.

Even with vertical sync disabled, this is still an improvement, though not as significant – apparently other things are happening inside EndDraw (not a big surprise), so by shifting that work off onto a second thread, I’m still gaining some time to spend performing the next update. When I disable the built in framerate balancer, this brings my framerate from ~350fps up to ~380fps. Not bad for a couple dozen lines of code!

Of course, it’s worth pointing out that the XNA Framework documentation doesn’t make any promises here, so it’s possible that this technique is unsafe. When it comes to concurrency, it’s very easy to do the wrong thing and get away with it – as you might have noticed here, I was doing something utterly stupid and unsafe in my Update function, and I got away with it because the DirectX developers had the foresight to put a lock in the right place. If they hadn’t, my game might have corrupted state from accessing the GraphicsDevice on two threads, and crashed intermittently.

Regardless, this is a handy technique – once I’ve had the chance to do lots of testing on various PC configurations (and the XBox 360), I’ll probably be using it in my game when I ship.

Tags: , , , , , ,

Collision detection and motion

Initially I implemented collision detection and motion in this platformer prototype the same way I had done it previously in Fury², my game creation system project from a while back. Entities have a position and velocity, which are both pairs of floats (x, y). Every frame, I compute the new position of the entity – pos+vel – in a given axis (x or y), and then do a check to see if it’s obstructed. If it’s obstructed, I halve the velocity and try again, and I repeat that iteratively a few times until I give up. After doing that for both axes, I have a relatively good approximation of how far the entity can move in its desired direction.

It was good enough for me, and the performance wasn’t *terrible*, so I didn’t worry about it much. But I always had this nagging doubt in my mind – this solution is obviously a hack. Isn’t there a way to do this with pure math?

Well, I’m not very good at math, but it turns out there is a way to do it with pure math. So today I threw out all the motion and collision code I had written for the platformer and decided to try a pure math approach, and see if I could resolve collisions non-iteratively. Suprisingly enough, it wasn’t actually that hard.

Here’s a summary of how it works:

I already do my collision detection using the separating axis theorem (SAT). To check for collision between a pair of convex polygons, I compute a set of axes based on the polygons (a vector perpendicular to every edge of each polygon) and then for each axis in the set, I project both polygons onto that axis and see if they overlap. As soon as I find an axis in which the polygons are not overlapping, I know that they are not intersecting – no need to check all of them.

Pretty simple, and actually quite easy to code – Around 20 lines of C# for a relatively efficient implementation of the check itself, and another 20 lines to compute the list of axes without any duplicates, since there’s no point in checking the same axis twice.

With the algorithm described above, it’s actually quite simple to build motion on top of it. Once I’ve projected both polygons onto a given axis, and confirmed that they don’t intersect, I can do another projection – I can project the velocity of the moving polygon onto that same axis, and add that to the original projection of the moving polygon to get the projection of its new, desired location. At that point, I know whether the polygon will intersect on this axis as a result of moving, and not only that, I can compute the amount of intersection by determining how much the two intervals overlap. Once I have that value, I can use it to compute a vector that will separate the two polygons.

Once I iterate over all the axes and compute separating vectors, I can pick the smallest separating vector and add it to the original velocity of the polygon. At this point I can do some checks to see if the new velocity is acceptable – for example, did the separating vector cause the polygon to change directions entirely? In that case, it’s probably not acceptable, and we can just flat-out reject movement and say that the polygon cannot move in this direction at all. Likewise, if the separating vector is enormous, that means the polygon moved all the way through the obstruction to avoid collision, which we obviously don’t want either, so we can reject it.

The end result was essentially this:

  • For every axis, project movingPolygon and stationaryPolygon onto that axis. This produces two intervals – movingInterval and stationaryInterval.
  • Check to see if movingInterval and stationaryInterval overlap. This tells us whether or not the polygons were already colliding.
  • Project the velocity of movingPolygon onto the axis, and add that projected value to movingInterval. This gives us the new interval of the moving polygon – newInterval.
  • Check to see if newInterval and stationaryInterval overlap. If they overlap, determine how much they overlap by – overlapAmount. At this point we know whether or not the two objects would have collided.
  • If the newInterval is overlapping, we can now compute a vector that will separate the two polygons, by multiplying the current axis by the amount of overlap (the axis is a unit vector, and the amount of overlap is a distance).
  • Add the separating vector to the original velocity of the polygon, to compute the new candidate velocity – newVector.
  • Check to see if Magnitude(newVector) <= Magnitude(movingPolygonVelocity). If not, reject the new vector.
  • Check to see if Normalize(newVector) ·dot· Normalize(movingPolygonVelocity) >= 0. If not, reject the new vector. (This ensures that we don’t push the object in the opposite direction to resolve the potential collision – in this case, the magnitude would have still passed the previous check)
  • If the new vector was not rejected, we know that the two objects will not be colliding.
  • Once you’ve iterated over all the axes, select the largest non-colliding vector (if any). That’s your resultVector.

At the end of this algorithm, you have four outputs:

  • areColliding (right now, before applying the velocity)
  • wouldHaveCollided (if we applied the original velocity)
  • willCollide (if you apply the result velocity)
  • resultVelocity

In the case where the collision could not be resolved, the resultVelocity will be 0. In the case where there was no collision, the resultVelocity will be the input velocity. In every other case, the resultVelocity is a vector that safely moves the polygon in the desired direction, without producing a collision.

One detail to note is that the quality of the result velocity depends on the axes that you check. If you only check axes generated from the shapes of the colliding objects, you can get strange (though valid) results – for example, two boxes will not produce any axes except X and Y to check, so if the object is moving at an angle, the algorithm will miss the ideal axis to compute the resultVelocity from. There are a few ways to solve this – you can pre-compute a set of ‘basic axes’ to check in every case, or you can derive a set of axes from the velocity of the object. Either way, though, the algorithm seems to work quite well – it works as well as the old one did in the platformer, and passes all my unit tests.

It’s always nice to go from 100 lines of code to 10 lines of code. Unfortunately, I’m not done yet. My previous implementation supported walking up and down slopes in a natural manner, and it also supported moving platforms. The new implementation supports neither, so I have some more work to do. Hopefully I can solve both of those problems using pure math, too – I’m working on sloped surfaces right now, and it looks like I can make it work with pure math by improving on my existing algorithm.

You can see the source code for my implementation here. Note that there’s some commented out bits in there from my work on sloped surfaces. :)

Tags: , , , , , , , ,

Fury² Post-post-mortem: Lessons learned

Warning: This is an incomplete draft. I haven’t found time to edit this down and fix mistakes yet.

Starting at the beginning of high school, through my few years of college, and into the first year of my professional career, I spent a significant portion of my free time building and maintaining a toolset and game engine called Fury². It was built as a way for me to quickly prototype games and ideas, to learn about game design and engine design, and most importantly, to let me build games that I didn’t think I could create using any of the tools that were already out there.
It was a long and painful development project, and by the end, I realized that I had made numerous mistakes; despite that, I also gained many valuable lessons and learned things that I might not have learned any other way, and with the clearer perspective that hindsight brings, I consider it one of the things that made it possible for me to start a career doing things that I love without having to struggle with debt or worry about how I’m going to afford my next meal. For me, this project embodies the famous Nietzsche quote, ‘What does not kill me, makes me stronger‘.

Learning how much sheer effort goes into the process of creating even a simple piece of software has changed me for the better in many ways, so I’m writing this article in hopes that I can share some of the lessons I’ve learned with others. Unfortunately, the benefit of hindsight also brings with it the fact that I don’t remember many of the things that happened during the 8 or so years I spent developing the project. I hope that despite this, you can benefit from what I’ve learned as much as I have.

Of course, I must preface the contents of this article with a warning: It’s written in fairly technical terminology, and does not have any particular narrative structure. I have tried to compile the most important issues and triumphs from the history of the project, and present them in a form that a programmer can understand. Regardless of whether you’re a programmer or not, I encourage you to respond via email or comments and let me know what parts of this you couldn’t understand, so that I can improve this article and make it worthy of your time.

- Kevin Gadd


The Fury² Game Development System: A post-post-post-mortem

This article is inspired by the format of and spirit behind the many wonderful ‘Postmortems’ published in Game Developer magazine. I will attempt to collect the most significant failures and successes that occurred during the life of my project, and try to give you a glimpse of what the project looked like and how it felt to use it.

Read the rest of this entry »

Tags: , , , , ,