Archive for category AltDevBlogADay

Copying pixels from a pointer to an XNA Texture2D

In the XNA framework, the only way to load pixel data into a texture is to provide it in the form of a .NET Array. In most cases, this isn’t a problem – you can specify an offset into the array, and the array can be of any type. However, if you’re using a library like Awesomium or Berkelium, you will end up with large blocks of pixel data in the form of a pointer. The only way to turn this into an array is to manually allocate an array of the appropriate size and copy the pixels over every time they change. Not only does this waste memory, but it’s relatively expensive.

In this post I’ll show you a (fairly evil) way to copy pixel data directly from a pointer into an XNA texture using C#, PInvoke, and some knowledge of XNA internals.

The first step is to understand how XNA’s Texture2D.SetData method works normally. I opened it up in ILSpy, a free .NET decompiler.
Looking at the three overloads of SetData reveals that they all call into a private method called CopyData, with this signature:

private unsafe void CopyData<T>(int level, Rectangle? rect, T[] data, int startIndex, int elementCount, uint options, [MarshalAs(UnmanagedType.U1)] bool isSetting) where T : struct

The method is large and complicated, and parts of it don’t decompile into readable, valid C#. But we can understand what it does and how it does it with some research and careful examination of the decompiled code.

The first part of the function spends a bunch of time validating all the provided arguments. It ensures that the texture being modified isn’t active as a render target or attached to one of the device’s texture samplers, and validates other parameters like the size of the texture and the size of the data provided.

However, you start to see rather confusing code like this in the disassembly:

	_D3DSURFACE_DESC d3DSURFACE_DESC;
	initblk(ref d3DSURFACE_DESC, 0, 32);
	IDirect3DTexture9* ptr = this.pComPtr;
	int num3 = calli(System.Int32 modopt(System.Runtime.CompilerServices.IsLong) modopt(System.Runtime.CompilerServices.CallConvStdcall)(System.IntPtr,System.UInt32,_D3DSURFACE_DESC*), ptr, level, ref d3DSURFACE_DESC, *(*(int*)ptr + 68));
	if (num3 < 0)
	{
		throw GraphicsHelpers.GetExceptionFromResult(num3);
	}

If you know enough about .NET bytecode (CIL/MSIL), it may be easier to understand what's going on here. Bytecodes that couldn't be mapped to C# constructs were basically spit out directly as if those bytecodes were functions. This is happening because XNA is mostly written in C++/CLI, not C#.

First, the initblk opcode is used to zero-initialize the contents of a new D3DSURFACE_DESC structure. This is equivalent to doing a memset in native C/C++.

Next, the calli opcode is used. Essentially, this opcode allows you to invoke a native function directly, given a function pointer and knowledge of the function’s signature. All of the decompiled code here is essentially describing the signature of the function and then providing arguments for it. You’ll probably never see this generated by C# code, but it’s used often in C++/CLI to invoke native functions – and in this case, it’s being used to invoke a method of a COM interface.

How can you tell it’s being used to invoke a method of a COM interface? There are a few clues here:
First, we notice that ptr contains a pointer of type IDirect3DTexture9 – a COM interface. Second, we can look at the argument list to the calli instruction:

System.Int32 modopt(System.Runtime.CompilerServices.IsLong) modopt(System.Runtime.CompilerServices.CallConvStdcall)(System.IntPtr,System.UInt32,_D3DSURFACE_DESC*), ptr, level, ref d3DSURFACE_DESC, *(*(int*)ptr + 68));

First we have the return type of the function – System.Int32. Next, two modifiers that notify the compiler about the nature of the function – it’s a stdcall, and the return type is a C++ ‘long‘ with the semantics that implies. This doesn’t really matter much to us at the moment, but it’s good to understand the basics.
Next, the argument types for the function are provided: System.IntPtr, System.UInt32, and a D3DSURFACE_DESC *.

Given this information, we now know the signature of the function being called, so we could write an equivalent delegate if we wanted. The decompilation doesn’t tell us the name of the function, but given the name of the interface (IDirect3DTexture9) and the argument list, we can check that interface’s documentation on MSDN and try to figure out which function it is.

Finally, let’s look at the actual arguments being passed when invoking the function:

ptr, level, ref d3DSURFACE_DESC, *(*(int*)ptr + 68)

First, we see the first three arguments are of the appropriate type for the function’s signature and their values make sense. There’s a fourth argument, though, and it looks funny – in fact, it looks like the decompiler didn’t quite make sense of it. What is that?

Well, we know that ptr is a pointer to an IDirect3DTexture9. First, the code is dereferencing the pointer. If you know enough about COM, you will realize that dereferencing a pointer to a COM interface will allow you to access that interface’s vtable. The vtable contains pointers to each function provided by the interface, which allows you to invoke those functions on a given instance.
Given that the code is dereferencing the interface pointer and then adding an offset to it, we can now infer that it’s pulling a specific function out of the interface’s VTable. Again, we can’t immediately tell which function, but we have a lot of information here that we could use to figure it out.

This makes it pretty clear that a COM function is being called on the interface, because we can see the function pointer being pulled out of the COM vtable and passed to the calli instruction, along with a function signature and argument types that seem like a perfect fit for a COM method (HRESULT return value, first argument is the interface pointer).

At this point, we could dig through the vtable and find our way to the necessary method calls to lock the texture and get a pointer to its pixels. Then we could do any necessary pixel format conversion, etc while copying from our buffer to the texture. However, there’s an easier solution!

If you read through the code for the method, in certain code paths, it makes use of a D3DX function called D3DXLoadSurfaceFromSurface. It’s a pretty useful function – it can convert pixel formats, resample textures, and even handle DXTC compression. And if you look through the documentation, there’s a version of this function that can take a pointer as a source instead of another surface – making it perfect for our needs. We just have to find a way to call that function and hand it our image data pointer and D3DX will handle any necessary pixel format conversion and copy our pixel data to the texture.

Now, to call the function. First, we need to get ourselves an interface pointer. If we’re willing to use reflection, this is quite simple – we can get ourselves a reference to the pComPtr field we see used in the disassembly and use that reference to get the interface pointer for a given texture, like so:

public static class TextureUtils {
  internal static FieldInfo pComPtr;

  static TextureUtils () {
    pComPtr = typeof(Texture2D).GetField("pComPtr", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic);
  }

  public static unsafe void* GetIDirect3DTexture9 (this Texture2D texture) {
    return Pointer.Unbox(pComPtr.GetValue(texture));
  }

Next, we can set up the necessary P/Invoke skeleton to be able to invoke the D3DX function. We need two enumerations, a struct, and finally the function itself:

// Some enumeration members omitted for readability
public enum D3DFORMAT : uint {
    UNKNOWN              =  0,

    R8G8B8               = 20,
    A8R8G8B8             = 21,
    X8R8G8B8             = 22,
    A8B8G8R8             = 32,
    X8B8G8R8             = 33,
}

[Flags]
public enum D3DX_FILTER : uint {
    DEFAULT              = 0xFFFFFFFF,
    NONE                 = 0x00000001,
    POINT                = 0x00000002,
    LINEAR               = 0x00000003,
}

[StructLayout(LayoutKind.Sequential)]
public struct RECT {
    public int Left;
    public int Top;
    public int Right;
    public int Bottom;
}

// Note that we need to be careful to use the same exact version of D3DX that XNA uses.
[DllImport("d3dx9_41.dll")]
private static unsafe extern int D3DXLoadSurfaceFromMemory (
    void* pDestSurface, void* pDestPalette, RECT* pDestRect,
    void* pSrcMemory, D3DFORMAT srcFormat, uint srcPitch,
    void* pSrcPalette, RECT* pSrcRect,
    D3DX_FILTER filter, uint colorKey
);

At this point, however, you might have realized the problem: This function takes a pointer to an IDirect3DSurface9, not an IDirect3DTexture9. How do we go from a Texture to a Surface? Well, the MSDN documentation shows that there’s a convenient method that will give us the surface pointer for a given mip level of a texture: GetSurfaceLevel. But given that all we have is a void *, how can we call it? There are a few ways, of course – you could write a tiny C++ library to do the work, or you could add a reference to the Managed DirectX libraries, or something like that. But we’re going to do it the way they do it in C – with pointers!

To make sure we’re on the right track, we can first dig through the disassembly of CopyData to find the spot where it calls GetSurfaceLevel. Since we know the argument types and return type, it’s not too hard to find:

IDirect3DSurface9* ptr3;
IDirect3DTexture9* ptr4;
num5 = calli(System.Int32 modopt(System.Runtime.CompilerServices.IsLong) modopt(System.Runtime.CompilerServices.CallConvStdcall)(System.IntPtr,System.UInt32,IDirect3DSurface9**), ptr4, 0, ref ptr3, *(*(int*)ptr4 + 72));
if (num5 >= 0) {

With what we know about COM method invocations, we can tell this is invoking a method of IDirect3DTexture9 with an integer parameter (0) and a pointer-to-pointer parameter of type IDirect3DSurface9. This is definitely GetSurfaceLevel. This not only tells us what the right signature for the function looks like, but it also tells us where in the interface’s VTable we can find a pointer to the function in order to call it. So, armed with this knowledge, we can write some terrifying unsafe code to pull the function pointer out of the VTable:

// This could be written better, probably. I'm lazy.
public static unsafe void* AccessVTable (void* pInterface, uint offsetInBytes) {
    void* pVTable = (*(void**)pInterface);
    return *((void**)((ulong)pVTable + offsetInBytes));
}

And, using that function along with an appropriate delegate type, we can put these pieces together to get a surface pointer:

internal unsafe delegate int GetSurfaceLevelDelegate (void* pTexture, uint iLevel, void** pSurface);

public static class VTables {
    public static class IDirect3DTexture9 {
        public const uint GetSurfaceLevel = 72;
    }
}

public static unsafe void* GetSurfaceLevel (this Texture2D texture, int level) {
    void* pTexture = texture.GetIDirect3DTexture9();
    void* pGetSurfaceLevel = AccessVTable(pTexture, VTables.IDirect3DTexture9.GetSurfaceLevel);
    void* pSurface;

    var getSurfaceLevel = (GetSurfaceLevelDelegate)Marshal.GetDelegateForFunctionPointer(
        new IntPtr(pGetSurfaceLevel), typeof(GetSurfaceLevelDelegate)
    );

    var rv = getSurfaceLevel(pTexture, 0, &pSurface);
    if (rv == 0)
        return pSurface;
    else
        throw new COMException("GetSurfaceLevel failed", rv);
}

We use the marshalling API to turn the function pointer into a callable delegate with the appropriate signature, and then call it in order to get ourselves an interface pointer. Now, we’re finally ready to call D3DX! Let’s wrap this magic up with a helpful function:

public static unsafe void SetData (
    this Texture2D texture, int level, void* pData,
    int width, int height, uint pitch,
    D3DFORMAT pixelFormat
) {
    var rect = new RECT {
        Top = 0,
        Left = 0,
        Right = width,
        Bottom = height
    };

    void* pSurface = GetSurfaceLevel(texture, level);

    try {
        var rv = D3DXLoadSurfaceFromMemory(pSurface, null, &rect, pData, pixelFormat, pitch, null, &rect, D3DX_FILTER.NONE, 0);
        if (rv != 0)
            throw new COMException("D3DXLoadSurfaceFromMemory failed", rv);
    } finally {
        Release(pSurface);
    }
}

You’ll note that we’re also ensuring that we decrement the reference count on the surface after we’re done with it so it doesn’t leak. To do this, we wrote a quick helper function:

internal unsafe delegate uint ReleaseDelegate (void* pObj);

public static class VTables {
    public static class IUnknown {
        public const uint Release = 8;
    }
}

public static unsafe uint Release (void* pObj) {
    void* pRelease = AccessVTable(pObj, VTables.IUnknown.Release);
    var release = (ReleaseDelegate)Marshal.GetDelegateForFunctionPointer(new IntPtr(pRelease), typeof(ReleaseDelegate));

    return release(pObj);
}

And now, after all that hard work, copying pixel data from an Awesomium WebView into an XNA texture is as simple as this:

var rb = WebView.Render();
WebViewTexture = new Texture2D(GraphicsDevice, rb.Width, rb.Height, false, SurfaceFormat.Color);
WebViewTexture.SetData(0, rb.Buffer.ToPointer(), rb.Width, rb.Height, (uint)rb.Rowspan, D3DFORMAT.A8R8G8B8);

If you’d like to use this for yourself, you can download it in a ready-to-use form from my GitHub repository. Hope it’s helpful!

Tags: , , , , ,

Performance Challenges in Compiling Code to JavaScript

The following post is cross-posted from AltDevBlogADay.

For the past few months, I’ve been working on a tool to compile .NET applications into JavaScript. While there’s a lot of work involved, a large chunk of that is caused by static language idioms that cannot easily be expressed in JavaScript. Even when coding by hand, some of these patterns are difficult to turn into JS that performs well, works correctly, and is easy to understand. In this post, I’ll begin describing some of the challenges involved.

Value Type Semantics

Value type semantics are a common feature in static languages. Most static languages allow you to define custom types that inherit the semantics of the language’s built in primitive value types, so that your struct (or whatever your language happens to call it) behaves much like a single int or float would when being manipulated. This feature makes it possible for custom numeric types (like a fixed-point value, for example) or even composite types like vectors and matrices to follow the same rules.

At first glance, the lack of this feature might not seem significant – especially if you’re used to object oriented programming, it would seem like translating code dependent on these semantics is not an intractable problem. While it’s certainly not impossible, it has a number of downsides that surprised me, especially once I started profiling real-world code in modern JavaScript implementations:

Algorithms designed for value types are slower without them

Those of you who’ve worked with Java are probably familiar with this particular problem already: If you take an algorithm designed around the presence of value types and port it to a language without them, while you can easily preserve behavior by inserting copies in the correct places, you may find that your performance has gone through the floor. The reason for this is simple: In many runtime environments, the cost of an instance of a reference type is significantly larger than the cost of an instance of a value type.

The reasons for this cost tend to vary, but one common trait is that these algorithms produce tremendous amounts of garbage when run in a garbage-collected environment. Some of the more sophisticated garbage collectors provide mechanisms to improve performance for this kind of code, like generational collection and escape analysis, but the sad fact is that even the most sophisticated collectors available still suffer from visible pauses when the collector runs. These pauses not only impair the overall performance of your application, but in games, they cause painful framerate hitching and can even impair the playability of your title.

Another common source of higher costs for instances of reference types is inheritance. In particular, virtual inheritance. Virtual inheritance as implemented in most languages requires every object instance to have a few bytes devoted to identifying its type, via vtable pointers or the like. In some cases compilers can optimize this overhead out, but the overhead is usually there regardless of whether you use it, and in some languages, the default is to make everything virtual – so suddenly even your Vector3 object has a vtable pointer.

A third source of overhead is indirection – reference types almost always live in the application heap, which means they are allocated in a chunk of space dedicated to that particular instance. Even if you allocate an array of your Vector3s, reference type semantics mean that it is probably necessary for the heap to contain an allocation for each one, so that if half of them are destroyed, those resources can be freed and reused by some other data types. Not only does this mean that the simple act of creating that array has become more expensive, but it means that an array of values is no longer guaranteed to be contiguous in memory. Sure, your Vector3[] is contiguous, but all it contains is object references. Hm, what’s that cache locality thing people were talking about again?

(Somewhat obvious pre-emptive nitpick: This last bit here doesn’t apply to C++. But you’re using C++, so you know this already, and you should probably go make sure MSVC hasn’t run out of memory while compiling that boost template instantiation. You know the one.)

Algorithms based on value types are fragile using reference types

Despite the above problems, Java developers are able to produce quality software (depending on who you ask, anyway). The same is true for JavaScript: These algorithms can, with effort, be made to work, and even work well. Unfortunately, there is a significant cost involved. You have to rework your code in order to reduce the amount of garbage it creates, and avoid patterns that naturally require the creation of garbage. For example, in a language like C++, your game probably relies on a math library, featuring structured value types with names like Vector3 and Matrix. If you’re one of those people who likes operator overloading, you’ve probably overloaded various operators for those structures, so that you can write things like this:

The above code is simple and self-explanatory to someone familiar with the basics of algebra (and those of you who abstain from operator overloading have probably written similar code using functions instead of operators). As mentioned above, the performance of such algorithms will suffer badly in a garbage collected environment, but that’s not the worst part. First of all, to correctly translate these algorithms, you must carefully locate all the implied copies contained within the code. As written, the above algorithm is correct even without value types. Let’s convert it to valid JavaScript:

Seems fine, right? But, the keen-eyed reader realizes that a copy is implied in the Add and Multiply operations, so for them to be correct, they must create and return new Vector3 instances. Let’s fix it so it doesn’t produce garbage.

Hm, we need a copy here. That sucks. We’ll just store an extra vector in the object so we don’t need to create garbage. All good, right?

A little ugly, perhaps, but not bad, right? The clarity suffers a little, but we could address that by naming variables more carefully, etc. Not horrible. But wait a second. We pulled the values position and velocity out of an object. If we just modified position and velocity, does that mean we’ve modified the state of the object it came from? Yes, yes it does. Well, we just need to emulate the semantics of value types, right?

Okay, there. It’s correct again now. But wait, garbage collection! Uh, crap. Okay, let’s remove the copies we don’t actually need… we’ll assume that the value of NextPosition isn’t being used, so we don’t need to copy it. And we’re not modifying Position so we don’t need that copy either…

Okay, life is good again. But a week later, you’re reading over commits (sneaky interns keep adding features behind your back, the nerve) and you see something that makes you spit up your coffee:

Luckily, as noted in the comment, the game isn’t using ZoomFactor yet, so it’s always 1. Nothing horrible has happened yet. But now you’re worried, and you’re starting to sweat a little: What other bugs has JDoe introduced that are just waiting to break your code once it hits production? Didn’t this guy develop payment processing systems at a bank? Are your savings in danger? Oh man, what if a bug in code he wrote allows Albanian hackers to steal that money you’ve been saving up for your little startup venture? You might be stuck working here for another 10 years!

Quality algorithms designed without value types are hard to find

So, let’s say you’ve calmed down a little bit after that panic attack earlier. JDoe’s not a bad guy, he’s just new at this. You’ll go through and review his code with him, make sure he understands what he did wrong, and then you’ll add another 5 pages to the company’s ever-growing coding standards document, right after you find a new bank. Life moves on, right?

While you’re doing the code review, though, you run across something that makes you pause. You notice that JDoe modified some code that was written by Chris Beardy, a wizardly programmer type who had already been working here for 8 years when you started out. Nobody really knows what happened to Beardy; one day he just stopped showing up. Until now, his code hadn’t been a problem, since most of it worked great, and the bits that stopped working you just threw out and rewrote (with variable names longer than two characters).

JDoe has done his best to modify the code, and while it seems to work, you realize you don’t know whether it’s correct. You could assume that JDoe messed up, but what if you’re wrong? How can you be sure the code is correct? You ask around the office to see if anyone has Beardy’s phone number, but nobody is even certain if he owned a phone. You try to contact his next of kin but all of their phone numbers bear international dialing codes and it’s 3am in Singapore. This is a problem.

After consulting with a respected colleague, you hit upon a solution: It may be true that neither of you know whether the code is correct, but you know what it was supposed to do. It turns out it’s based on some old academic paper, and the paper’s been cited a bunch of times, and other games use this same algorithm. You figure you can just read a couple of those papers and implement the algorithm yourself – hell, maybe you can find a good implementation out there on the internet that’s fit to use!

Soon after you begin reading the papers, panic begins to set in once again. You’re starting to wonder if that literature major was such a good idea, and trying to remember the stuff you learned in the few math classes you did take. You’re not entirely convinced that these symbols used for variable names exist in any alphabet, and not even your coworkers know what it means when the author writes them upside down. You abandon your paper-reading expedition, reminding yourself that code reuse is better anyway, and as a good engineer you should avoid Not Invented Here syndrome by adopting someone else’s battle-tested implementation…

Now the horror begins. You narrow down the few available implementations for your programming language, rule out the ones that depend on libraries you can’t possibly bundle in, and check with Legal to figure out which specific open-source licenses don’t cause them to flee in terror. You scan over some documentation and things don’t seem too bad. This one is enterprise ready, and you can get a support contract! This one is written using Inversion of Control and Dependency Injection, whatever those are – you think you might have heard someone say something good about them once. You’re feeling pretty okay about things until you pop open the source code. Why are half of the comments in Japanese? This class name is 50 characters long and you’re pretty sure all it’s responsible for is adding numbers. The constructor for one of the key objects creates no less than 8 abstract factories, and two of those only exist to create other abstract factories…

While you start updating your resume, you start to wonder whether you should consider another line of work…

(Now, let’s be completely fair: This one applies to pretty much any programming language, value types or not. But I’ve never been the type of person to let reality interrupt a good rant, so the disclaimer sits here at the end.)

In Conclusion

Now, if you’ve read all this you might be thinking, ‘hey, I’m really offended by that comment about Albanians. What’s wrong with this guy?’. And you’re right, that was kind of offensive. I’m sorry.

You also might be thinking: ‘Wow, this is a really negative perspective on things. Can life really be that bad writing JavaScript? I wrote a webpage once, and it was pretty okay. Google writes whole applications in it and they seem to be making tons of money’. You’re also right! Maybe not as right as that other guy, but hey. I said I’m sorry.

These problems are not insurmountable. In my particular case, since I’m writing a compiler, instead of porting by hand or writing JS from scratch, I can solve these problems using verifiable code, instead of trying to get it right in my head. Despite this, fixing some of these problems involves writing some pretty hairy code – and if you’re like me and your Computer Science background is a little – let’s be generous here and call it ‘weak’ – you may find things like performing escape analysis on functions containing goto statements a tiny bit daunting.

When I finally got my translator working on some real-world game demos after a month or two of hacking, I was really excited; the framerate was bad, but I hadn’t done any optimization, so that was to be expected. When I finally ran a demo through a profiler, though, my heart sank a little: Wait, I’m spending 35% of my CPU time in the garbage collector? And another 20% running the constructor for Vector3? I realized that while I had anticipated the challenges in translating working code into another language, I had not anticipated the challenges involved in translating performant code from one language into performant code in another language. So, I hope that these few examples I’ve provided might give you a little insight.

Tags: , , ,

Solving Problems With Asynchrony: Pipelining

The following post is cross-posted from AltDevBlogADay.

Previous posts in this series:

We’ve gone over some simple examples and looked a bit into the implementation of primitives like Future. Now, let’s take a moment to think about how we can bring these examples closer to the real world.

A comment on one of the previous posts noted that, were we to go with the simplest possible implementation of our asset loading code, we’d be doing file I/O against multiple files at the same time, and that in practice, this is much slower than simply loading from those files sequentially. This turns out to be a common problem when we’re building asynchronous systems; we’re often trying to solve problems that require use of a limited resource – whether it be the hard disk, the GPU, or network bandwidth. If we don’t carefully manage our use of those shared resources, at best, our application will run much slower than it would otherwise – and if we’re unlucky, it will work incorrectly or fail to work at all.

There are a few ways we can tackle these shared resources. One of the simplest solutions is to serialize access to a resource, using a lock or (in cases where we don’t mind a little sharing) a semaphore. This can be done easily in most languages, and as long as we’re using worker threads to access the resource, things will work okay. It’s far from optimal, though – here are a few example gotchas:

  • Using a lock/semaphore means that any work items waiting to complete will tie up a worker thread, wasting valuable resources.
  • Locks and semaphores provide no way to prioritize or order work, so work with significant latency requirements – like streaming in the next 2 seconds of background music – cannot pre-empt work with less significant latency requirements, like streaming in a higher res version of an on-screen texture.
  • Relying on thread synchronization also makes it hard to interact with our pending work items – let’s say we started loading the face texture for a character, but he just disconnected from the server. We want to cancel that work item, since it’s not useful anymore, but the work item’s thread is currently blocked, waiting to acquire the resource loader lock.

To solve the problems caused by locks, we can complicate things a little and introduce a work queue – any time we start a work item, we push it onto our queue and return a Future that we will manually complete once the work is done. This addresses the issues with the previous solution – we can pull items out of the work queue if they’re cancelled, and maintain multiple queues or a sorted queue if we want to prioritize more important work over less important work. Let’s begin by building a simple texture loader that ensures textures are loaded one at a time.

In order to make this easier, we’ll introduce a simple utility class called a ‘blocking queue’. The nature of this queue is that dequeue operations always produce a Future instead of directly producing a value, and if you dequeue from the container while it is empty, you get a future that will become complete once a value is stored into the queue. This ends up being a good fit for pipelining work.

As you can see, the blocking queue is pretty simple. It’s built out of a pair of queues – one for values awaiting a caller to Dequeue them, and another for futures awaiting a caller to Enqueue a value to fill them. Dequeueing a value pulls first from the value queue, and if it is empty, it creates a Future and pushes that onto the waiter queue instead. Enqueueing a value pulls first from the waiter queue, and if that’s empty, it pushes the value into the value queue. With this blocking queue primitive, we can build a simple pipelined texture loader:

TextureLoaderTask is left running in the background, responsible for pulling work items off the texture load queue one at a time and completing them. In this case, we assume it’s not possible to load a texture asynchronously (it rarely is), so we do the actual load operation on the thread pool. Consumers of this system call LoadTexture with a filename, and get back a Future that will contain the texture once the load is complete.

We now know we won’t be paying the cost of simultaneous I/Os, because we’ll be loading one texture at a time, and we won’t be creating a bunch of idle threads waiting to acquire a lock. However, there’s still a lot of room for improvement in the real world. Even though we’re only performing one I/O operation at a time, platter based storage devices like hard disks have two kinds of I/O: random access and sequential. Random access I/O is prohibitively expensive, because each I/O operation requires the platter to be rotated into the correct position to perform the read or write. Sequential I/O is significantly faster, because the drive doesn’t have to rotate between each read – it will already be in the right position after completing the previous I/O operation. Right now, we’re still going to pay the cost of random access on every texture load operation.

If we’re willing to incur some latency on individual texture loads, having all loads go through the queue can allow us to minimize the cost we pay for random access I/O. Instead of pulling work items off the queue one at a time, we can pull them off in groups, and add a short delay in order to batch up texture load requests that occur within a short time period, and then sort the requests based on their location on disk, like this:

Of course, this solution is unlikely to accomplish much when using the regular file system, because there is no way to guarantee that a set of files are sequential on disk, or even to prevent them from being fragmented. However, if we to store textures in an archive file format like ZIP, we can use the information stored within the ZIP file to determine the correct order in which to load our textures, and as long as the archive file is not significantly fragmented, we will pay little or no cost for disk seeks because even if our textures are not directly sequential, they will be close to each other and the disk will not have to move very far.

Batching up work like this allows us to benefit from other improvements as well – for example, if we have multiple CPU cores to utilize and texture decompression is particularly expensive, we can do all the I/O for a batch sequentially and then use multiple threads to decompress the textures in parallel, without having to invest a lot of work into threading machinery.

For one real-world example, I use this kind of batching in my Data Mangler key-value store in order to perform read operations in parallel across available CPU cores. Because individual work items still occur sequentially, and each work item is either a read or a write (types aren’t mixed), I don’t have to invest effort in complex locking schemes used by more sophisticated database systems. As a result, consumers of the library can queue up dozens of database operations at once, and then wait for the futures while the work is done in parallel across all the system’s available cores, without having to do any locking or synchronization.

At this point I’m out of good ideas for ways to go into depth on this topic, so I’d welcome any questions or suggestions from those of you who’ve been reading. Got a problem you’d like to tackle with these techniques? Curious about potential drawbacks? Let me know, and I can go into more depth in future posts.

Solving Problems With Asynchrony: Build Your Own Future

The following post is cross-posted from AltDevBlogADay.

In my previous posts I showed how we can apply the Future primitive, along with C# iterator functions, to tackle some real-world problems. Now, I’ll dive in a bit deeper and show how, given a relatively simple implementation of Future, you can build more complex systems around it. For the examples in this article, I’ll be using a stripped down version of the interface provided by my particular implementation of Future, which looks like this:

A few things are worth mentioning here: We go to great lengths to ensure that information is not lost; attempting to assign a result to a Future twice throws an exception, because the alternative could result in information being silently discarded.
Likewise, we ensure that when registering an OnComplete callback, the callback is invoked even if the Future already has been completed. When dealing with traditional event handlers or synchronization mechanisms it’s possible to ‘miss’ an event by registering your callback too late, and end up with an application that’s hung silently for no reason. By invoking any callback that’s registered after our Future has become completed, we prevent that problem.
And in cases where a naive application directly accesses a Future‘s Result property without checking to see if the operation in question has completed (or failed), we throw an exception if no result is actually available. If we were writing this class from scratch, we might have considered instead returning a default value – default(T), or null – but doing so means that it’s possible for a failed operation to be missed if the end user forgets to write error handling code.

One of the simplest ways to perform some work asynchronously is to just run it on a thread, so it’s useful to have a straightforward way of doing this. We care about being able to wait for our work to complete, and we want to be able to get the result of the work, so we want to get a Future representing the work. Let’s implement a simple version of this:

Our solution is pretty simple: we take a function, and construct a future of the appropriate type to hold the result of the function. We shove a callback onto the .NET thread pool that will invoke our function, and then store the result into the future (or, if it fails, store the exception). The design decisions made when designing Future mean that as a result, this function will always behave the way we want to do, and avoid discarding any information. (Keen-eyed readers may spot an edge case where information can in fact be lost. This can be fixed, but I omitted the fix from the example since it’s not particularly relevant.)

Now that we have a way to take some simple work and make it asynchronous, what about waiting for work? We can easily wait for a single piece of work by registering a callback, but waiting for multiple pieces of work gets complicated – we could write an iterator function and yield upon each value in sequence, but that gets kind of awkward. We could also use a thread or threadpool work item to do this – in fact, the .NET ThreadPool includes a helper method specifically designed for this purpose – but it’s pretty silly to tie up a thread just to wait for work. As it turns out, we can implement a simple waiting primitive – let’s call it WaitForX – that allows us to construct a so-called ‘Composite Future‘ from a set of Futures that will become complete once X of the set’s Futures have become complete. Using this method, we can trivially implement WaitForAll and WaitForFirst helper methods:

Basically, we fill a list with the futures we want to wait for, and then register a callback on all of those futures so that we can find out if any of them become completed. The callback checks the current count of the list, and then either removes the future that was just completed from the list, or clears the list. The first future to be completed once the list reaches the specified count value causes our Composite Future to become complete as well. This allows us to wait for an entire set of Futures, or simply wait for any one of the Futures in a set to become complete. Even better, if we’re using this mechanism to wait for the first Future in a set to complete, a reference to that Future is stored into the composite, so we can pull it out and respond to whatever work that Future represented.

Now that I’ve given you a bit of a glimpse into how these building blocks are written, in my next post, I’ll describe how we can use them to tackle the problem of scaling an application up to take advantage of multiple cores, and pipeline work so that we get good performance when doing IO.

Solving Problems With Asynchrony: Asset Loading

The following post is cross-posted from AltDevBlogADay.

In my previous post, I gave a simple introduction to concurrency and asynchrony, using a cinematic scripting language as an example. We saw how the introduction of two key tools – an object representing work, called a Future, and a way for our code to wait for work to complete – allowed us to replace the use of threads (along with all the complication they implied) in our cinematic scripts, and keep the code short and understandable.

In this post, we’ll see how the four concepts introduced in the first post – concurrency, parallelism, synchronization, and asynchrony – apply to another problem.

In almost every game, loading your assets requires some time investment. You’ve got textures, models, sound, levels, etc – and in more complex games, all sorts of secondary assets enter the picture, like scripts, configuration files, and animations. For simple games this is addressed with the addition of a loading screen; you toss that thing up and load all the assets you need in one go (hopefully you don’t forget any) and then go about your merry way. But in general, players don’t enjoy load screens, so we like to strive to avoid them, or at least keep them short. In modern games, assets are often loaded in the background as the game runs, with great effort expended towards figuring out when to load a given asset and when it is no longer needed, to keep the game running smoothly without pauses.

Looking carefully at the problem of asset loading reveals that it requires all four of our concepts:

  • Asset loading should be Concurrent, because we will often need multiple new assets at once and we do not want to stop the game itself while assets are loaded.
  • Asset loading should be Parallel, because the loading of a texture should not prevent the loading of a sound effect from occurring at the same time, and more importantly if neither of these assets are required by the game at this very moment, the game should be able to continue.
  • Asset loading should be Synchronized with the state of the game, so that an asset is always loaded before the game attempts to use it – otherwise, partially-loaded assets could be used and the game will crash or look wrong.
  • And finally, asset loading should be Asynchronous, because if it isn’t, we’re forced to throw up a load screen and make the player wait.

For our example, let’s say that we’re loading a level in a very simple 2D game. The level lives in a map file, and the map file also depends on the contents of a few other assets – a tileset containing graphical tiles we used to construct the map, some individual creatures that inhabit the map (with their own textures and sound effects), and a piece of background music. Based on what we know, we can say this:

  • The level depends on its tilesets, creatures and background music.
  • The creatures depend on their textures and sound effects.
  • The tilesets do not depend on anything.
  • The background music does not depend on anything. Furthermore, our sound library might be able to stream it for us, so once playback has begun, we no longer need to wait.
  • If possible, we want to use some of these assets before they are finished loading: We can fade the background music in while the level loads, and we could show the level before it’s finished as long as all on-screen content has loaded.

Let’s begin tackling the challenge of loading this level, with the most obvious solution:

class Level {
    // ...

    Level (string filename) {
        using (var f = File.OpenRead(filename)) {
            this.Tilesets = ReadTilesets(f);

            foreach (var tileset in this.Tilesets)
                tileset.Texture = ReadTexture(tileset.TextureName);

            this.Tiles = ReadTiles(f, this.Tilesets);

            this.Entities = ReadEntities(f);

            foreach (var entity in this.Entities) {
                entity.Texture = ReadTexture(entity.TextureName);

                foreach (var soundName in entity.SoundNames)
                    entity.Sounds[soundName] = ReadSound(soundName);
            }

            var backgroundMusicName = ReadString(f);
            this.BackgroundMusic = ReadSound(backgroundMusicName);
        }
    }

    // ...

This isn’t an awful solution, but it’s painfully sequential: First, we read the tileset information from the file. Then we read in the textures for each of the tilesets. And so on.

Let’s begin by turning this into a coroutine, using Futures. It’s pretty straightforward, because C#’s runtime library gives you a way to do asynchronous IO, so we’ll assume the presence of some simple wrapper routines to give us Futures for things like file reads. We’ll ignore some language nits for now, like that constructors can’t be coroutines…

using (var file = File.OpenRead(filename)) {
    var fTilesets = ReadTilesets(file);
    yield return fTilesets;
    this.Tilesets = fTilesets.Result;

    foreach (var tileset in this.Tilesets) {
        var fTexture = ReadTexture(tileset.TextureName);
        yield return fTexture;
        tileset.Texture = fTexture.Result;
    }

    var fTiles = ReadTiles(file);
    yield return fTiles;
    this.Tiles = fTiles.Result;

    var fEntities = ReadEntities(file);
    yield return fEntities;
    this.Entities = fEntities.Result;

    foreach (var entity in this.Entities) {
        var fTexture = ReadTexture(entity.TextureName);
        yield return fTexture;
        entity.Texture = fTexture.Result;

        foreach (var soundName in entity.SoundNames) {
            var fSound = ReadSound(soundName);
            yield return fSound;
            entity.Sounds[soundName] = fSound.Result;
        }
    }

    var fBackgroundMusicName = ReadString(file);
    yield return fBackgroundMusicName;

    var fBackgroundMusic = ReadSound(fBackgroundMusicName.Result);
    yield return fBackgroundMusic;
    this.BackgroundMusic = fBackgroundMusic.Result;
}

This is much bigger, and there’s a lot of noise – all those explicit yield returns, and the need to access .Result. Don’t worry too much; even in a rigid language like C#, you can eliminate a lot of that. Let’s take note of a couple interesting ways in which we can now improve on this:

  • Only some of the operations we perform actually depend on file. The others that don’t can, assuming they don’t depend on global state, be run in any order.
  • Furthermore, each of the distinct object types we load does not depend on the other objects: Entities do not interact with tiles or tilesets, and the background music doesn’t interact with anything.

Based on these two observations, we can split this routine up into multiple helper Tasks, enabling further improvements. For the examples that follow I’ll be using some helper routines to cut down on the noise, too.

Task LoadLevel (string filename) {
    using (var file = File.OpenRead(filename)) {
        // We need to read sequentially from the file, so we must kick off the
        //  reads in the correct order. But since they produce futures, we don't
        //  have to *wait* for them in that same order!
        var fTilesets = ReadTilesets(file);
        var fTiles = ReadTiles(file);
        var fEntities = ReadEntities(file);
        var fBackgroundMusicName = ReadString(file);

        // Kick off three helpers to finish loading data, and then wait
        //  for those three helpers, plus the tiles.
        yield return new WaitForAll(
            FinishLoadingTilesets(
                fTilesets, (tileset) => this.Tileset = tileset
            ),
            FinishLoadingEntities(
                fEntities, (entities) => this.Entities = entities
            )
            FinishLoadingMusic(
                fBackgroundMusicName, (music) => this.BackgroundMusic = music
            ),
            fTiles
        );

        // We didn't start a helper to take care of waiting for the tiles.
        // They're done now, so we can assign them.
        this.Tiles = fTiles.Result;

        // We can't close the file until everything depending on it is done,
        //  so we keep the 'using' block open until here.
    }
}

Task FinishLoadingTilesets (Future<TilesetCollection> fTilesets, Action<TilesetCollection> doneLoading) {
    yield return fTilesets;
    var tilesets = fTilesets.Result;

    foreach (var tileset in tilesets) {
        var fTexture = ReadTexture(tileset.TextureName);
        yield return fTexture;
        tileset.Texture = fTexture.Result;
    }

    doneLoading(tileset);
}

Task FinishLoadingEntities (Future<EntityCollection> fEntities, Action<EntityCollection> doneLoading) {
    yield return fEntities;
    var entities = fEntities.Result;

    foreach (var entity in entities) {
        var fTexture = ReadTexture(entity.TextureName);
        yield return fTexture;
        entity.Texture = fTexture.Result;

        foreach (var soundName in entity.SoundNames) {
            var fSound = ReadSound(soundName);
            yield return fSound;
            entity.Sounds[soundName] = fSound.Result;
        }
    }    

    doneLoading(entities);
}

Task FinishLoadingMusic (Future<string> fFilename, Action<Sound> doneLoading) {
    yield return fFilename;

    var fBackgroundMusic = ReadSound(fFilename.Result);
    yield return fBackgroundMusic;  

    doneLoading(fBackgroundMusic.Result);
}

In this new version, we’re kicking off all the file reads right at the start. Since we start them in the correct order, as long as our file implementation is written correctly, they will read from the right positions in the file, and their Futures will become completed in that order, preserving the behavior we want, while allowing us to continue.

Similarly, we kick off helper tasks to finish loading each of those pieces of data, by handing them the future for the file read along with a callback to invoke when they’ve finished their work. The callback stores their finished data into its correct place. Once we’ve kicked off all the helper tasks, we wait for them all to complete. The order in which they complete doesn’t matter to us; we just want to know that everything’s finished. Once all the work is done, we can clean up.

By simplifying the main level loading function, what we’ve done here is encapsulate each distinct piece of work into something we can wait for – a Future – and by doing that, it’s become easy for us to start a lot of work and wait for it all to finish, instead of explicitly doing things in a specific order. Thanks to this new structure, many clever optimizations are possible. Here’s one example:

Task FinishLoadingEntities (Future<EntityCollection> fEntities, Action<EntityCollection> doneLoading) {
    yield return fEntities;
    var entities = fEntities.Result;

    var fTextures = ReadTextures(
        from entity in entities
        select entity.TextureName
    );
    var fSounds = ReadSounds(
        from entity in entities
        from soundName in entity.SoundNames
        select soundName
    );

    yield return Future.WaitForAll(
        fTextures, fSounds
    );  

    var textures = fTextures.Result;
    var sounds = fSounds.Result;

    foreach (var entity in entities) {
        entity.Texture = textures[entity.TextureName];

        foreach (var soundName in entity.SoundNames)
            entity.Sounds[soundName] = sounds[soundName];
    }    

    doneLoading(entities);
}

The fact that we no longer explicitly specify the order of work has allowed us to turn a series of individual asset loads into a pair of load operations that run on multiple files at once. Thanks to this improvement, we can now do all sorts of clever things in our asset loader – loading files ordered by their position on disc to avoid seek times, loading multiple files in parallel using multiple threads, etc – without any changes to this helper function.

If the places where your code waits for work are put there explicitly, you’re free to move them around. What we’ve done here is not only move those waits around, but combined them in order to make our code easier to optimize.

So far, we don’t have everything we want out of our asset loader, but we’ve gone from a strictly sequential loader to a loader that can easily take advantage of multiple cores and load levels in the background while the game is running. Not a bad start!

I’ll continue with this example in my next post, but for now, let’s stop for a moment to discuss Future, and some of its traits that make this possible:

  • A future represents a single value of a known type that may be available in the future.

    This trait makes it possible for us to do things like silently queue up work in the background – the actual work is not required to begin when the Future is created.

  • If the operation producing the value fails, that failure is captured within the future, and anyone who accesses the future to retrieve the value is given the failure instead.

    This trait ensures that any unhandled exceptions or unexpected failures within asynchronous work are not lost. Knowing this, we can limit our writing of error handlers to a few specific tasks, allowing child tasks to fail silently and propagate their errors up to the parent task that started them. When the parent task sees the error, it can respond as it sees fit – retry the operation, notify the user, etc.

  • It is possible to ask a future to notify you when work is completed, instead of having to sit and wait for the work to complete.

    This makes it possible for us to perform asynchronous work without being concerned about the nature of the work – we don’t need to use threads to wait for work; we can simply register a callback to be invoked when work is finished. The Scheduler in our examples uses these callbacks to ensure that a Task resumes once the Future it is waiting for has a value.

  • Once a future contains a value or a failure, its contents will never change. If a future does not contain a value or a failure, a temporary failure will be returned when you access it.

    This makes it possible for us to ignore a large number of potential race conditions and synchronization issues. As long as the contents of the object you place within a Future are not modified, you need not protect the object with a lock or any other synchronization mechanism; you either have a completely valid object, or no object.

These traits are relatively simple to provide – a simple Future class can be implemented in a few dozen lines of code in all the major languages I can think of – and they enable you to combine Futures to solve a variety of complex problems without spending much time thinking about threads or synchronization.

Solving Problems With Asynchrony: Cinematics

The following post is cross-posted from AltDevBlogADay.

I’m sure many of you are familiar with concurrency. I’m going to tell you about a related concept – Asynchrony – that deserves your attention. Let’s start by defining four terms:

Concurrency is about multiple things happening over a period of time. Each piece of work proceeds towards completion without regard for the status of other work.
Parallelism is about multiple things happening at the same time. Each piece of work proceeds towards completion independently of other work, for example, in a separate thread.
Synchronization is about ordering the performing and completion of work. When you have a large set of work to perform but do not know the order in which it will complete, Synchronization is used to prevent things from interfering with each other, and to ensure that work completes in a predictable order.
Asynchrony is about being free to do other work while you wait for a thing to finish. You don’t know how long the thing will take, and it may never finish.

Let’s dive in to a real world example.

Late during development of one of my first industry projects, I was moved from content design to our ‘cinematics team’, in order to help get our cinematics back on schedule. Until that point, the work had been handled by an unlucky effects artist that somehow ended up with the job of writing scripts using C macros – and he was falling behind.

I quickly realized that not only were cinematics more complex than I had realized, but our approach to scripting them was deeply flawed. The cinematic system for this particular engine was built by our lead programmer, and while the code was well-written the design failed to address our basic needs for cinematics.

The core problem we dealt with was timing: Everything in a cinematic took time to happen, and the scripting system did not let us easily wait for those things to occur, or find out how long they would take. We coped by manually timing important events, and then padding those durations out in hopes that it would shield us from variations that occurred when the cinematic actually ran, but player freedom meant that things still went wrong. Later we were given some basic concurrency primitives, like threads, but at its core the scripting language still failed to make timing simple.

I’ll begin with a simple example of what one of those scripts looked like:

BEGIN_THREAD(Camera)
  CameraLookAt(Position_Player_Start, Interpolation_None, 0),
  Wait(1000),
  CameraLookAt(Position_Player_Move_1, Interpolation_Linear, 1000),
  // ...
END_THREAD

BEGIN_THREAD(PlayerMovement)
  CharacterTeleport(Character_Player, Position_Player_Start),
  Wait(1000),
  CharacterMoveTo(Character_Player, Position_Player_Move_1),
  // …
END_THREAD

Over time as our needs changed, the scripting system grew until we had hundreds of different commands, all with their own timing rules. Every script had a handful of these threads, each one running for hundreds of seconds, every command carefully timed to line up. Building these scripts wasn’t hard due to a lack of tools: It was hard because we were given the wrong tools. Adding more tools to our scripting language didn’t fix the problem of the design being fundamentally wrong for our problem.

We spent a lot of time moving characters around, usually one or two at a time. The paths they took were fairly predictable, and they couldn’t do anything else while they ran. Combining this with our timing problems meant that if a character was moving, he couldn’t point or wave or turn around until a little bit after he stopped moving. This didn’t just make the cinematics hard to author – it made them look stilted, too. Moving a couple characters around would look something like this:

/* God only knows where the players are, so let’s teleport them to where we expect them to be and hope that they don’t notice */
CharacterTeleport(Character_Player, Position_Player_Start),
CharacterTeleport(Character_HeroicLeader, Position_HeroicLeader_Start),
CharacterMoveTo(Character_Player, Position_Player_Destination),
CharacterMoveTo(Character_HeroicLeader, Position_HeroicLeader_Destination),
Wait(5000) /* Hand-picked value for how long it should take to move from A to B. Let’s hope the artists don’t change the level geometry! */
CharacterAnimate(Character_Player, Animation_Wave);

The first key observation is that these scripts could benefit from being real, compiled code instead of a custom scripting language – nothing they’re doing is impossible in native C, and you get a debugger and compiler validation, instead of having to reinvent them yourself. For simplicity, I’m going to present my examples in C#. Let’s begin by translating that snippet directly:

Character Player, HeroicLeader;
Position Player_Start, Player_Destination, HeroicLeader_Start, HeroicLeader_Destination;
Animation Animation_Wave;

Player.Teleport(Player_Start);
HeroicLeader.Teleport(HeroicLeader_Start);

Player.MoveTo(Player_Destination);
HeroicLeader.MoveTo(HeroicLeader_Destination);

Thread.Sleep(5000);

Player.Animate(Animation_Wave);

This is pretty straightforward to implement if you know a little bit about threading. Some use of library primitives lets us kill the hard-coded sleep and make the behavior clearer:

Player.Teleport(Player_Start);
HeroicLeader.Teleport(HeroicLeader_Start);

WaitHandle.WaitAll( new [] {
  Player.MoveTo(Player_Destination),
  HeroicLeader.MoveTo(HeroicLeader_Destination)
});

Player.Animate(Animation_Wave);

We’ve taken some implicit information – like how long a MoveTo command takes, and whether it blocks – and made it explicit by turning it into a value (in this case, a WaitHandle). While this is already an improvement on the script we started with, the idea of having your designers or artists writing threading code probably makes you nervous. Let’s step back and re-evaluate some of our assumptions to see if we can make this less complicated.

We’ve established that we need the ability to have multiple things happen at once, but doing this does not require the use of threads; the only time we ever need a thread is when we want to be performing multiple computations at the same time. In this specific case, what we want is to know when something in the game engine has finished – a character has moved from point A to point B, or an animation has finished playing. This doesn’t require threads or explicit parallelism, because our game engine is already going to be playing animations and moving characters around. What we want here is to be notified when our events of interest have occurred. Since C# has built in support for events, we can try using those instead:

void step1 () {
  Player.Teleport(Player_Start);
  HeroicLeader.Teleport(HeroicLeader_Start);

  var moved = new bool[2];
  Player.MoveTo(Player_Destination).OnComplete += () => {
    moved[0] = true;
    if (moved[1])
      step2();
  };
  HeroicLeader.MoveTo(HeroicLeader_Destination).OnComplete += () => {
    moved[1] = true;
    if (moved[0])
      step2();
  };
}

void step2 () {
  Player.Animate(Animation_Wave);
}

In this example, commands like MoveTo now return an object representing the movement command, with an OnComplete event. We can subscribe to the event in order to find out when the movement completes. We could write some utilities to automate things like registering a callback on multiple events to make this example code a bit smaller, but one problem still persists: We now have to split each ‘thread’ of our cinematic into multiple functions, and carefully wire up the callbacks so that one function always calls the next. If we make a mistake here, playback will just stop, and we won’t have any state to inspect in order to determine what happened.

One thing that would make this easier is support for co-routines. An important trait co-routines possess is that they cooperate: Execution passes explicitly from one co-routine to another, at points of the programmer’s choosing. This has roughly the same advantages for us as pure event-based programming, but it lets us write single, sequential functions instead of having to chain up callbacks. Since our cinematic script’s ‘threads’ have specific places where they want to wait, we can use co-routines to represent them, by having the co-routine give control away when it’s time to wait for an event. As a result, we get synchronization, concurrency, and asynchrony, without the use of parallelism.

In C#, you can implement simple co-routines atop the language’s support for iterator functions. In an iterator function, you have two new statements – yield return and yield break – that allow you to specify points where your iterator should be suspended, and produce a value. With the addition of a simple scheduler to manage your iterators and wire up callbacks when they wait on an event, you’re ready to go. Let’s ignore the inner workings of this system for now and look at the iterator equivalent to the above code:

Player.Teleport(Player_Start);
HeroicLeader.Teleport(HeroicLeader_Start);

yield return Future.WaitForAll(
  Player.MoveTo(Player_Destination),
  HeroicLeader.MoveTo(HeroicLeader_Destination)
);

Player.Animate(Animation_Wave);

The code is about the same length, but now we don’t need an operating system thread to run it. In this case, we’ve taken that object representing a movement command from the last example, giving it a name – Future – and written a simple helper method that constructs a single future from a group of futures. When our iterator yields on one of these Futures, the co-routine scheduler can automatically wire up event listeners in order to resume the co-routine once the Future it’s waiting for has completed.

In practice, the code necessary behind the scenes to run a script like the above is actually very simple – not much larger than the equivalent event-based code. Debugging also becomes easier, because the compiler turns your iterators into a simple state machine, and you can inspect the state of those iterators in the debugger, or log it to a file, for easy troubleshooting.

Futures also happen to be simpler to implement than most other synchronization primitives because they are one-shot signals – once they’ve been written to, they’re never written to again. This means that despite us writing our iterators as if they are single-threaded, they can actually interact safely with other threads, as long as they only interact through Futures! This frees you up to construct the innards of your game engine in any way you like, while your cinematic scripts live in a single-threaded paradise.

In this example, we’ve taken advantage of the fact that our cinematics did not require concurrency – they had no need to actually run multiple OS threads in parallel – they just depended on asynchronous events, like characters moving around. This fact allowed us to strip out unnecessary threading and synchronization baggage while still keeping the traits we cared about.

In my next post, I’m going to explain in more detail how you can implement a primitive like Future, and how you can use it to solve real-world problems, even if your language doesn’t offer a feature like iterator functions. I’ll also explain how this approach can make your software more reliable and easier to debug.