One of the things I’ve been working on lately is a way to collect and store data on play sessions, so I can track achievements for players and track where players spend the most time in a level, or where they die the most. There are lots of details to get right for a system like this, but I’ve at least gotten a prototype working and started experimenting with ways to visualize the data.

For a system like this you have a few important pieces:

  • Your game needs to collect data during play sessions – in my case, I track important events like player/creature death, and periodically sample the player’s position to get a general idea of where players are in a given level during their session.
  • You need a way for your game to periodically report collected data to a remote server. There are lots of edge cases to handle here. For example, some people will play without an active internet connection or suffer a temporary loss of connection, so you need to batch up events and report them later. You definitely don’t want the game to fall over without access to the internet, and if possible you want to avoid losing data too.
  • You need to build a set of services to collect data from the game. This typically means you also need services to handle things like uniquely identifying individual play sessions and computers, so that you can track achievements for individual players and perform analysis on individual sessions instead of only on a player’s entire play history.
  • You need to build an in-game frontend, to expose collected data to a player. This means turning your event data into user-friendly statistics – like a kill counter, or an achievement for killing a particular boss. One interesting challenge here is that you probably want to expose this information online, too, so that players can share their profiles and achievements.

In this post, I’m going to cover the first two.

Collecting Data

So, first, you need a way to collect important player data. For it to be useful, you need to collect it in as consistent a format as possible. You don’t want every piece of data to have a unique set of parameters attached to it, for example, because that would make it very hard to perform any generic analysis of your data. Likewise, you want to try to keep your data ‘denormalized’, by storing as much relevant data together as possible. If you achieve both of these goals, you end up with a stream of ‘events’ that represent the things you care about in a form that lets you inspect them individually or analyze them in large groups.

In my case, this step was actually quite straightforward. I already had a simple ‘event bus’ integrated into the game that I was using to synchronize animation and sound effects with gameplay. Given this, it was simple to create a data collector that attaches to the event bus and listens for the events it wants to collect.

I did, however, have to make some changes – many of my events didn’t expose the information necessary to be useful. For example, the event representing a player death didn’t contain any information to tell you who (or what) killed the player, so I had to tweak the game code to attach that to the event. Likewise, many other events didn’t include position information, which made it hard to get meaningful information about them – knowing that the player grabbed onto a ledge isn’t particularly useful if you don’t know *where* or *which ledge*. In some cases, the data collector attaches important parameters to the events so that game objects don’t have to; for example, any event coming from the Player automatically has PlayerX and PlayerY parameters attached to it (since the Player is the source of the event, it is simple to extract a position and attach it to the event).
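To make the collector idea concrete, here is a minimal sketch in Python. It is not the game's actual code (the game is .NET), and the names EventBus, DataCollector, Player, and Event are assumptions made up for illustration. It shows a collector that subscribes to a bus, ignores event types it was not asked for, and attaches PlayerX and PlayerY when the source of an event is the player.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Event:
    # One consistent shape for every collected event.
    type: str
    time_ms: int
    data: dict[str, Any] = field(default_factory=dict)


@dataclass
class Player:
    # Hypothetical game object; only the position matters here.
    x: int
    y: int


class EventBus:
    """Tiny publish/subscribe bus standing in for the game's real one."""

    def __init__(self) -> None:
        self._listeners: list[Callable[[str, object, dict], None]] = []

    def subscribe(self, listener: Callable[[str, object, dict], None]) -> None:
        self._listeners.append(listener)

    def publish(self, event_type: str, source: object,
                data: Optional[dict] = None) -> None:
        for listener in self._listeners:
            # Each listener gets its own copy so it can add parameters safely.
            listener(event_type, source, dict(data or {}))


class DataCollector:
    """Listens on the bus and keeps only the event types it was asked for."""

    def __init__(self, bus: EventBus, wanted, clock_ms, sink) -> None:
        self._wanted = set(wanted)
        self._clock_ms = clock_ms  # callable returning session time in ms
        self._sink = sink          # callable that receives each Event
        bus.subscribe(self._on_event)

    def _on_event(self, event_type: str, source: object, data: dict) -> None:
        if event_type not in self._wanted:
            return
        if isinstance(source, Player):
            # The collector attaches the position so game code doesn't have to.
            data.setdefault("PlayerX", source.x)
            data.setdefault("PlayerY", source.y)
        self._sink(Event(event_type, self._clock_ms(), data))
```

The sink is just a callable here, so the same collector could feed a list during testing or a reporter queue in the game.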

One thing that also required some special treatment was recording the player’s position over time – while attaching position information to events gives you a general idea of the player’s location, it’s not very useful for figuring out where players are getting stuck or confused. To address this, I built a simple class that periodically samples the player’s position and adds it to a list. Every time the list reaches a certain size, its contents are batched up into a single event – a ‘HistoryBlock’ – and sent to the data collector. This allows me to get fairly high-resolution position data without having to report hundreds of individual events every minute (which would have contained lots and lots of redundant information).
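The sampling class described above could look roughly like the following Python sketch. The class name PositionSampler, its parameters, and the choice to stamp the batch with the time of the last sample are all assumptions; the event is named HistoryBlock to match the JSON sample later in the post.

```python
class PositionSampler:
    """Samples the player's position at a fixed interval and emits batches."""

    def __init__(self, level_name, interval_ms, batch_size, emit):
        self._level_name = level_name
        self._interval_ms = interval_ms
        self._batch_size = batch_size
        self._emit = emit            # callable that receives the batch event
        self._last_sample_ms = None
        self._xs = []
        self._ys = []

    def update(self, now_ms, x, y):
        """Call once per frame; most calls return without recording anything."""
        if (self._last_sample_ms is not None
                and now_ms - self._last_sample_ms < self._interval_ms):
            return
        self._last_sample_ms = now_ms
        self._xs.append(x)
        self._ys.append(y)
        if len(self._xs) >= self._batch_size:
            self.flush(now_ms)

    def flush(self, now_ms):
        """Emit whatever has been sampled so far (e.g. on level change)."""
        if not self._xs:
            return
        self._emit({
            "Type": "HistoryBlock",
            "Time": now_ms,  # assumption: time of the last sample in the batch
            "Data": {
                "LevelName": self._level_name,
                "PlayerX": self._xs,
                "PlayerY": self._ys,
            },
        })
        # Start fresh lists so the emitted event is not mutated later.
        self._xs, self._ys = [], []
```

Calling flush when a level ends would keep a partially filled batch from being attributed to the wrong level.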

Reporting Data

Once you have your player data, you need to report it to your server (and probably store some of it locally, too). I honestly haven’t tackled the latter part yet, but here’s how I handled the former:

To handle periodically reporting events, I created a simple ‘Event Reporter’ class that collaborates with the data collector. Essentially, the event reporter maintains a queue of pending events, and runs a thread that is responsible for sending batches of events to the remote server once the queue gets large enough. It automatically waits a certain amount of time between requests, and attempts to batch events into groups instead of reporting them one at a time as they occur. This allows me to get relatively low latency (if I want it) but still remain relatively efficient in my use of the network. Whenever the data collector gets an event, it adds it to the event reporter’s queue.
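As a rough illustration of the queue side of this, here is a Python sketch of a thread-safe pending-event queue. The name PendingEvents and the min_batch, max_batch, and timeout parameters are assumptions. The game thread calls add, and the reporter thread waits until enough events have piled up (or a timeout passes, which is one way to get the low-latency behaviour), then reads a batch without removing it so nothing is lost if the send fails.

```python
import threading
from collections import deque
from itertools import islice


class PendingEvents:
    """Queue shared between the game thread and the reporter thread."""

    def __init__(self, min_batch):
        self._events = deque()
        self._cond = threading.Condition()
        self._min_batch = min_batch

    def add(self, event):
        """Called by the data collector whenever it receives an event."""
        with self._cond:
            self._events.append(event)
            if len(self._events) >= self._min_batch:
                self._cond.notify()  # wake the reporter thread

    def wait_for_batch(self, max_batch, timeout):
        """Sleep until min_batch events are queued or the timeout elapses.

        Returns up to max_batch events WITHOUT removing them; the caller
        removes them with commit() only after a successful send.
        """
        with self._cond:
            self._cond.wait_for(
                lambda: len(self._events) >= self._min_batch, timeout)
            return list(islice(self._events, max_batch))

    def commit(self, count):
        """Drop the first `count` events after they were sent successfully."""
        with self._cond:
            for _ in range(count):
                self._events.popleft()
```

Raising min_batch or the timeout trades latency for fewer, larger requests.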

Once I have a batch of events ready to send to the server, I pack them together into a class that’s laid out compactly for network transmission. I then use .NET 3.5’s built-in JSON serializer to convert the batch into a blob of JSON data ready to send to my remote server. A typical blob looks like this:

{
  "PlayerId":xxxx,
  "SessionId":yyyy,
  "Events":[
    {"Type":"ActiveControllerChanged","Time":0,"Data":{
      "IsKeyboard":true,
      "ConnectedGamepads":0,
      "ActiveController":0
    }},
    {"Type":"LevelLoaded","Time":5436,"Data":"troupe"},
    {"Type":"HistoryBlock","Time":7788,"Data":{
      "LevelName":"troupe",
      "PlayerX":[-136,-74,35,192,209,209],
      "PlayerY":[588,875,920,920,920,920]
    }}
  ]
}

As you can see, events are small when possible – they simply store the event type and the timestamp at which the event occurred, along with their associated information. In cases where an event has lots of information attached, I send a dictionary, but in some cases I can just send a single value. In the case of a history block, I make sure to send the X and Y coordinates as individual lists, so that I don’t have to waste time encoding ‘PlayerX’ and ‘PlayerY’ along with every individual coordinate. A batch of events has a player ID and a session ID attached, which allows me to analyze individual play sessions and track achievements. (Note that both of these IDs are randomly generated on the server, so they’re anonymous.)
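The saving from sending coordinates as two lists is easy to check. The Python sketch below builds the same history data both ways and serializes it; the function names are assumptions, and the per-point layout exists here only for comparison.

```python
import json


def encode_batch(player_id, session_id, events):
    """Serialize a batch in the shape shown above, without extra whitespace."""
    payload = {"PlayerId": player_id, "SessionId": session_id, "Events": events}
    return json.dumps(payload, separators=(",", ":"))


def history_block(time_ms, level_name, points):
    """Columnar layout: one list of X values and one list of Y values."""
    return {
        "Type": "HistoryBlock",
        "Time": time_ms,
        "Data": {
            "LevelName": level_name,
            "PlayerX": [x for x, _ in points],
            "PlayerY": [y for _, y in points],
        },
    }


def history_per_point(time_ms, level_name, points):
    """The layout being avoided: key names repeated for every coordinate."""
    return {
        "Type": "HistoryBlock",
        "Time": time_ms,
        "Data": {
            "LevelName": level_name,
            "Points": [{"PlayerX": x, "PlayerY": y} for x, y in points],
        },
    }
```

With the six sample points from the blob above, the per-point version repeats the two key names six times each, so its encoded form is noticeably longer, and the gap grows with every extra sample.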

The reporter starts an HTTP POST to send the JSON to the server, and goes to sleep until it’s finished. If the send fails, the events are left on the queue and the thread sleeps for a while so it can retry sending them later. Once a send completes, the reporter checks the queue: if it is now empty, the reporter goes to sleep until the queue contains items again; if it still contains items, the reporter sleeps for a short while and prepares to send another batch. This allows the event reporter to spend the majority of its time asleep, but still handle large bursts of events without sending a huge blob of events to the server in one POST (since that would be very likely to fail or time out).
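The send-and-retry decision can be sketched as a single step function, shown here in Python. The delay constants, the function names, and the use of urllib are assumptions, and the step is written single-threaded for clarity (a real reporter would guard the queue with a lock). The function returns how long the reporter thread should sleep next, or None when the queue is empty and it should sleep until new events arrive.

```python
import urllib.request
from collections import deque

RETRY_DELAY = 30.0  # assumed: seconds to wait after a failed send
BURST_DELAY = 1.0   # assumed: short pause between batches during a burst


def post_json(url, body, timeout=10.0):
    """POST a JSON string; raises OSError (URLError, timeout) on failure."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout):
        pass


def report_once(queue, send, max_batch=50):
    """Try to send one batch from `queue` (a deque) using `send(batch)`."""
    if not queue:
        return None
    batch = [queue[i] for i in range(min(max_batch, len(queue)))]
    try:
        send(batch)
    except OSError:
        # Nothing was removed, so the same events are retried later.
        return RETRY_DELAY
    for _ in batch:
        queue.popleft()
    # More events waiting: pause briefly, then send another small batch.
    return BURST_DELAY if queue else None
```

Capping each request at max_batch is what keeps a burst from turning into one huge POST.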

One important detail is that the reporter needs to be able to correctly report events even if the player quits. In my case, the player can quit at any time, and I can end up with a very large number of events in the queue at that point. To deal with this, once the game has finished tearing down, the event reporter is given around 30 seconds to report any remaining events to the server. This usually allows events to get through, but prevents the game from hanging indefinitely on exit (in the event that there’s some sort of network failure, or the event queue is really full). Since this occurs after teardown, the game window is gone so the player doesn’t get the impression that the game has hung, and since the reporter spends most of its time sleeping, they won’t see any significant CPU usage either.
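A deadline-bounded flush like this might look as follows in Python. The function name, its parameters, and the one-second retry pause are assumptions; the clock and sleep functions are injectable only so the behaviour can be tested without actually waiting.

```python
import time
from collections import deque
from itertools import islice


def flush_on_exit(queue, send, budget_seconds=30.0, max_batch=50,
                  retry_delay=1.0, clock=time.monotonic, sleep=time.sleep):
    """Send queued events until the queue is empty or the budget runs out.

    Returns the number of events that could not be sent in time.
    """
    deadline = clock() + budget_seconds
    while queue and clock() < deadline:
        batch = list(islice(queue, max_batch))
        try:
            send(batch)
        except OSError:
            # Wait a little before retrying, but never past the deadline.
            sleep(min(retry_delay, max(0.0, deadline - clock())))
            continue
        for _ in batch:
            queue.popleft()
    return len(queue)
```

Note that the loop only checks the deadline between sends, so the send itself should also use a timeout, otherwise a single stalled request could overrun the budget.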

The reporter also runs in its own isolated thread and uses its own task scheduler, instead of sharing resources with the game. This increases the likelihood that the game will be able to successfully report events in the case of a bug or crash, and also means that if the reporter fails for some reason – probably a network outage – it won’t take the game down with it.

In my next post, I’ll cover the remaining two items and show you some of the things you can do once you’ve gathered player data. In the interim, you can take a look at my player profile!