In my previous post I gave an introduction to achievements and player data collection. In this post, I’ll cover the remaining two significant pieces: Services and front-ends. Unfortunately neither of these will be quite as complete as the coverage in the previous post, since I haven’t completely finished implementing these parts of my achievement system. Whoops!
Services
Boy, that’s a generic heading. Anyway, so your game is collecting important data on gameplay events, and it’s reporting that data periodically to a web service on your server. Unfortunately, since you haven’t actually written that service yet, it doesn’t seem to be working. Odd how that goes, isn’t it?
The primary issue you have to deal with when building a service to receive collected data from your game is storing the data. There are other incidental issues – security, performance, serialization, etc. – but none of them are of any significance if you can’t find a way to reliably store and access the data. This isn’t a simple problem, but there are ways to handle it.
Essentially, the biggest issue you have to confront here is that you are going to have a lot of data. You may not have a lot of data now, but you will eventually. On one hand, you could spend weeks and weeks trying to design the perfect solution for storing all of this data – but doing that doesn’t actually get your game any closer to shipping, and doesn’t actually guarantee that you won’t have scalability issues down the road. On the other hand, if you completely ignore the problem, you risk losing some of the data you’ve already gathered by discovering significant design issues after you ship. In practice, you probably want to prioritize shipping over the longevity of your data, since (assuming you ship and your game isn’t terrible) you can always get more data.
As a starting point, I decided to build my services on Google App Engine, since it was relatively easy to get up and running and provides fairly generous quotas for free.
I built my services in Python, using the provided datastore and JSON modules to handle most of the work. For those unfamiliar with how App Engine works, essentially your process is this:
- Modify your app.yaml file to specify the URLs of your services and the python scripts that implement them.
- Create one or more Python classes to be used for storing your data in the GAE datastore.
- Implement your services by creating python WSGI modules to process incoming requests and store the resulting data in the datastore.
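As a rough illustration of the third step, here is a minimal WSGI sketch of such a service. It is not the actual service code: it is written for Python 3 using only the standard library, `STORED_EVENTS` is a hypothetical in-memory stand-in for the datastore, and `event_service` and `post_batch` are names made up for this example. The `Events` key matches the JSON batch format used later in this post.

```python
import io
import json

# Hypothetical in-memory stand-in for the datastore (illustration only).
STORED_EVENTS = []


def event_service(environ, start_response):
    """Minimal WSGI app: accept a JSON batch via POST and keep its events."""
    plain = [("Content-Type", "text/plain")]
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", plain)
        return [b"POST only"]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = json.loads(environ["wsgi.input"].read(length).decode("utf-8"))
        events = body["Events"]
        if not isinstance(events, list):
            raise ValueError("Events must be a list")
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing key, or the wrong shape all end up here.
        start_response("400 Bad Request", plain)
        return [b"malformed batch"]
    STORED_EVENTS.extend(events)
    start_response("200 OK", plain)
    return [b"ok"]


def post_batch(payload_bytes):
    """Call the app directly with a fake request; returns the status line."""
    statuses = []
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(payload_bytes)),
        "wsgi.input": io.BytesIO(payload_bytes),
    }
    event_service(environ, lambda status, headers: statuses.append(status))
    return statuses[0]
```

Calling the app directly through `post_batch` is a handy way to exercise the request handling without deploying anything; a real service would replace the list with datastore writes.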
For those more familiar with traditional databases, working with the App Engine datastore can be a challenge – its write performance borders on terrible, and the documentation is somewhat sparse. However, the actual API is very easy to use and since you’re using Python, most of the problems you’re dealing with are easy to solve. It helps that most of Django is included with the GAE APIs, since that means you have access to all sorts of useful tools like their HTML template module.
So, for the event format I showed in my previous post, the Python class looks like this:
class Event(db.Expando):
    playerId = db.IntegerProperty()
    sessionId = db.IntegerProperty()
    timeMs = db.IntegerProperty(indexed=False)
    eventType = db.StringProperty()
Fairly straightforward. Note that I’m deriving from Expando and not Model – to get an idea of what this means you’ll need to read the GAE documentation, but essentially this lets me add additional, unindexed fields to my instances as necessary. Since I only plan to do most queries based on event type and player ID, this is perfect.
Given that class definition, in my service, I can simply unpack the JSON I receive from a game client and convert it into one or more Event instances, ready to send to the datastore:
playerId = int(body["PlayerId"])
sessionId = int(body["SessionId"])
events = body["Events"]
put_list = []
for event in events:
    eventData = event["Data"]
    extraArgs = {}
    while eventData:
        if isinstance(eventData, types.DictType):
            d = eventData
            eventData = None
            for k in d.iterkeys():
                value = d[k]
                if k == "Data":
                    eventData = value
                elif isinstance(value, types.DictType):
                    continue
                else:
                    key = ("%s" % (k,)).encode("ascii")
                    extraArgs[key] = value
        else:
            extraArgs["eventData"] = eventData
            eventData = None
    evt = Event(
        playerId=playerId,
        sessionId=sessionId,
        timeMs=event["Time"],
        eventType=event["Type"],
        **extraArgs
    )
    put_list.append(evt)
db.put(put_list)
The resulting data format doesn’t allow you to answer every question you could possibly have, but it does leave you with a relatively easy-to-query volume of data in the Google datastore. It’s also very simple, which makes it relatively straightforward to debug failures and make changes.
As mentioned before, the datastore’s write performance borders on terrible. If you attempt to perform a put on multiple objects in sequence, its write performance goes from bad to earth-shatteringly horrible. However, the datastore API allows you to hand it a list of objects, and it will attempt to put them all at once, which significantly improves your performance so that it’s at least tolerable. If I were using a relational database like MySQL this would be somewhat equivalent to trying to insert all my rows in a single multi-line query to reduce round-trip time.
Also of note are the somewhat ridiculous contortions I do here to convert event data into fields for the Event instance. Essentially, if the incoming event’s Data field is just one value, I shove it into a single field inside the Event – but if it’s a dictionary, I dig through it looking for values and store them into corresponding fields on the Event instance. This ensures that all the values attached to the event end up with corresponding columns in the datastore.
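To make the flattening logic easier to poke at outside App Engine, here is a standalone sketch of the same idea in Python 3. The function name `flatten_event_data` is made up for this example, and it returns a plain dictionary instead of building an Event; otherwise it is meant to mirror the loop in the service code, including the quirk that a falsy value such as 0 or an empty string produces no fields at all.

```python
def flatten_event_data(event_data):
    """Collect the scalar values of (possibly nested) event data in one dict."""
    extra_args = {}
    while event_data:
        if isinstance(event_data, dict):
            current, event_data = event_data, None
            for key, value in current.items():
                if key == "Data":
                    # A nested "Data" payload is processed on the next pass.
                    event_data = value
                elif isinstance(value, dict):
                    # Any other nested dictionary is skipped entirely.
                    continue
                else:
                    extra_args[str(key)] = value
        else:
            # A single bare value is stored under one fixed field name.
            extra_args["eventData"] = event_data
            event_data = None
    return extra_args


single = flatten_event_data("DrownedQueen")
nested = flatten_event_data({"CreatureName": "DrownedQueen", "Data": {"X": 1, "Y": 2}})
```

Here `single` holds one `eventData` field, while `nested` ends up with `CreatureName`, `X` and `Y` side by side, which is the shape that gets passed to the Event constructor as keyword arguments.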
This also illustrates some slightly interesting design decisions: Events don’t have any global timestamp, nor do they have a sequence number – all they have is a player ID, session ID, and timestamp relative to the beginning of the session *in game time*. This is mostly a pragmatic decision; I could provide a global timestamp and get some benefits out of it, but the costs associated with that are quite significant, since accurate timestamps interact badly with batching and the general problem of internet latency.
One particularly nasty sticking point is that GAE has fairly rigid limits on how much CPU and ‘API CPU’ time you can consume in a given request, or in a given minute, or in a given hour, or in a given day. These limits are extremely easy to hit if you are using the datastore in the manner I am – so it’s important to make sure that your game isn’t sending large batches, or your requests will time out and Google will become very angry with you for wasting precious CPU milliseconds. In practice, I’ve found that batch sizes between 6 and 12 reduce network traffic without hitting the built-in GAE limits. Your mileage may vary based on the nature of your data.
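One simple way to keep batches within that range is to split the pending events into fixed-size chunks before sending them. The sketch below shows the idea in Python for consistency with the service code; a game client would do the equivalent in its own language. The name `split_into_batches` and the default of 12 are assumptions for the example, with 12 taken from the upper end of the range mentioned above.

```python
def split_into_batches(events, max_batch_size=12):
    """Split a list of events into consecutive batches of limited size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        events[start:start + max_batch_size]
        for start in range(0, len(events), max_batch_size)
    ]


# 30 pending events become three requests instead of one oversized one.
pending = [{"Type": "Example", "Time": t, "Data": None} for t in range(30)]
batches = split_into_batches(pending)
batch_sizes = [len(batch) for batch in batches]
```

Each batch can then be sent as its own request, so no single request has to store more than `max_batch_size` events.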
It’s also worth pointing out that the HistoryBlock approach I described in the previous post is a bit problematic here: You’re basically forced to store all those position values within a single event, because turning them into multiple events would absolutely destroy the datastore. Unfortunately, it appears that storing a long list of values within a single event also absolutely destroys the datastore. Oh well, what can you do? (Other than switch to another hosting provider.)
Also note that this doesn’t account for security, which means you probably can’t deploy it in production, because people like breaking your services. Oh well!
Now that we’re all familiar with how frustrating web service development is, let’s venture into the comforting realm of data analysis:
Presenting Your Data (Front-ends)
Mmm, much better.
Now that you’ve got all this data sitting around in the GAE datastore, what are you going to do with it?
Among the many things you can do, here are a few good ideas:
- You can perform high level queries against your event data to generate general ‘gameplay statistics’ for player profiles.
- You can perform statistical analysis of your event data to detect patterns in usage and player behavior.
- You can take the position values attached to your event data, and use it to generate heatmaps representing certain events or behaviors – player movement, player death, creature death, etc.
Since I’m uneducated, I’m going to skip the statistical analysis and describe the other two.
Creating gameplay statistics from your data is actually relatively easy, as long as you’re not worried about performance (and right now, you probably shouldn’t be). Even if performance is an issue, you can often solve this easily by tracking counters and statistics in a summary table, either by maintaining a count in your service when storing events, or by writing a cron job to periodically update your summaries from the most recently stored events. Doing this will allow you to maintain statistics without having to query your entire event history.
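A summary table can be as simple as a counter keyed by player and event type. The following is a minimal in-memory sketch of that idea, not datastore code: `update_summary` is a hypothetical helper, and the events are plain dictionaries using the same `playerId` and `eventType` field names as the Event class.

```python
from collections import Counter


def update_summary(summary, events):
    """Add one count per event, keyed by (player ID, event type)."""
    for event in events:
        summary[(event["playerId"], event["eventType"])] += 1
    return summary


summary = Counter()
update_summary(summary, [
    {"playerId": 7, "eventType": "PlayerKilledCreature"},
    {"playerId": 7, "eventType": "PlayerKilledCreature"},
    {"playerId": 7, "eventType": "PlayerDied"},
    {"playerId": 9, "eventType": "PlayerDied"},
])
kills_by_player_7 = summary[(7, "PlayerKilledCreature")]
```

The same function works whether it is called from the service each time a batch is stored or from a periodic job that walks over recently stored events; either way, reading a statistic becomes a single lookup instead of a scan over the event history.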
For example, given my event format, I can determine how many times a given player has killed the Drowned Queen by querying the datastore like so:
player_events = db.Query(Event).filter('playerId =', playerId)
drowned_queen_kill_events = player_events.filter(
    'eventType =', 'PlayerKilledCreature'
).filter(
    'CreatureName =', 'DrownedQueen'
)
drowned_queen_kill_count = drowned_queen_kill_events.count()
Essentially, what I’m doing here is taking my entire event data set, and iteratively filtering it down – first I start with the player ID, then I filter by the event type. Both of these columns are indexed, and each of these filters will eliminate a huge number of rows. After this I can filter by CreatureName; as long as I don’t have thousands of PlayerKilledCreature events this will be relatively cheap to do. After that it’s simple to ask the datastore how many events match my filter.
Given statistics like these, if you choose you can then come up with achievements and other information to display on a player’s profile page. Players tend to love these things, even though they’re not directly integral to a game experience. They give players a way to gauge their own skill at a game versus others, and also provide them with interesting ‘carrots’ to run after, whether it’s finding all the secrets in a game or defeating a boss without taking any damage.
So! Now for heat maps. If you’re not familiar with them, you might want to glance at the linked Wikipedia page, but they’re fairly easy to understand, at least conceptually. You essentially take a large volume of data (in our case, events) and use it to generate a bitmap that you can overlay on an image of a space that you care about. To give a real-world example, if you had a list of recent car accidents, you could generate a heat map from that list and overlay it on a map of your neighborhood, and possibly see clusters around particularly dangerous intersections. You can’t always rely on a heat map for drawing conclusions, but they make it much easier to spot patterns and trends because our brains are great at analyzing data when it’s presented in this format.
In my case, I want to take all that HistoryBlock data and convert it into a heat map, so I can figure out where players are spending most of their time. To do this, first I needed to get images of all my levels – this took some work, since I hadn’t previously done any offline rendering of my game content. In some cases you can do this sort of thing offline during your build process, but in my case since my renderer relies on Direct3D and XNA, I simply start up an instance of the game engine in a ‘batch processing mode’, and it scans through a level, using the 3D card to generate screen-size ‘tiles’ of the level, one at a time, and save them out to disk. After that I scan through the tiles and scale them down to create a ‘minimap’ that represents the level at a given magnification.
Once I’ve got images for each of the levels, I can begin to generate a heat map for them. First, I query a service that pulls down all of the HistoryBlock events from the server and returns them as JSON. This sort of thing is likely to have scalability issues, so in practice you might want to only pull down a subset of your events – the most recent ones, or a random sampling – and use them instead. Right now my dataset is small enough that this isn’t a problem.
Once I’ve got the list of positions, I can scan over them and create the heat map. Essentially, what you want to do is create a 2D bitmap that is roughly the same size as your minimap, or some even multiple of it. Then, you can go through all your positions and map them to points within that bitmap. Every time you successfully map a position to a point, you increment the value at that point, so that the most frequently reported locations have higher values.
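The accumulation step can be sketched in a few lines of Python. This is an illustration under some assumptions rather than the actual tool: positions are taken to be (x, y) pairs in level coordinates, `level_bounds` is a hypothetical (min_x, min_y, max_x, max_y) rectangle describing the area the minimap covers, and positions outside that rectangle are simply skipped.

```python
def accumulate_heat(positions, level_bounds, grid_width, grid_height):
    """Count how many reported positions land in each cell of a 2D grid."""
    min_x, min_y, max_x, max_y = level_bounds
    span_x = max_x - min_x
    span_y = max_y - min_y
    if span_x <= 0 or span_y <= 0:
        raise ValueError("level_bounds must describe a non-empty rectangle")
    grid = [[0] * grid_width for _ in range(grid_height)]
    for x, y in positions:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue  # position does not map onto the minimap
        # Scale to grid coordinates; clamp so the far edge stays in range.
        col = min(int((x - min_x) / span_x * grid_width), grid_width - 1)
        row = min(int((y - min_y) / span_y * grid_height), grid_height - 1)
        grid[row][col] += 1
    return grid


heat = accumulate_heat(
    [(5, 5), (5, 5), (55, 25), (100, 50), (200, 0)],
    level_bounds=(0, 0, 100, 50),
    grid_width=10,
    grid_height=5,
)
```

In this small example the two reports at (5, 5) share a cell, so that cell ends up with the highest count, and the out-of-bounds report at (200, 0) is ignored.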
Once you’ve done that, you can compute a minimum, maximum, etc. for the heat map, and use that to translate those ‘heat’ values into colors at each point. Then, all you have to do is layer the resulting color bitmap onto your minimap, and you have a heat map. Here are a few I just generated, based on the events I’ve been recording while testing out my services and achievement system. You may need to click on them to view them at full size:
This heat map was generated from a dozen or so play-throughs of the first level in the Level Up 2009 demo.
There are a few interesting conclusions you can draw from this heatmap – for example, you can easily spot the locations of switches/cranks, because those points are orange or red (from the player tending to stand in front of them to manipulate the object). Because of the small number of playthroughs, you can actually see faint traces of me dying by falling into the water.
The data also gives you a general idea of typical paths through the level, and shows you where players spend more time – for example, the ledge on the right side of the water is much brighter, because the player tends to stand there for a while fighting the monsters that are spawned there.
This presents some good opportunities to improve the flow of the level, address difficult platforming challenges, or make it easier for the player to find objectives (keen-eyed observers will note that there’s a secret passage in this level that I did not enter in any of my playthroughs. (: )
This one was generated from the small arena in which the player fights the boss from the demo, the Drowned Queen.
Even a quick glance at this heat map reveals some interesting patterns, despite the fact that it is based on a small amount of data.
I rarely died from falling into the water below, which indicates that the water is not particularly effective as a hazard (or that I’m particularly adept at avoiding it).
The extremely bright clusters to the left side indicate that I rarely moved around the arena while fighting the Queen, tending to instead dart back and forth within the same area. This might indicate problems with the design of the level or the boss, since you would usually expect players to be running around in an arena like this while fighting a well-designed foe.
The big bright red point up in the top left is aligned perfectly with one of the arena’s grip plates. While you’re supposed to use the grip plates to avoid one of the Queen’s special attacks, the huge amount of time spent there suggests that the boss’s AI is leading me to remain on the plate for a long period of time, looking for a safe opportunity to drop down, instead of actively fighting her and having a good time.
Keen-eyed viewers who’ve played the demo will also note that there are yellowish regions at each of the locations where a dialogue sequence occurs. This indicates that I tend to stand still while progressing through the dialogue sequence, despite the fact that the player is allowed to move freely. Interesting!
This final one is generated from the second level in the demo, the aqueducts.
Rather embarrassingly, this demonstrates the perils of trying to draw conclusions from player data: Almost all the points in this heatmap are barely visible.
Why? Well, most likely, I left the game running with the player standing in that position for a long time, which resulted in that point being far, far brighter than the rest of the level.
Of course, there are ways to address these sorts of problems with your data – but if you’re not careful, they may influence your conclusions without you even realizing it. In this case I would probably want to discard or ignore any points that have values vastly out of whack compared to the rest of the heat map, so that the heat map will demonstrate general trends instead of only showing outliers.
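One way to do that, sketched below, is to clamp the heat values at a chosen percentile before translating them into colors, so a single runaway cell can no longer set the scale for everything else. Note that this clamps outliers rather than discarding them, which is a slightly different technique from the one described above. The function name `heat_to_colors`, the yellow-to-red ramp, and the default percentile are all assumptions made for the example.

```python
def heat_to_colors(grid, clamp_percentile=0.99):
    """Map heat counts to RGBA tuples, clamping values above a percentile."""
    nonzero = sorted(value for row in grid for value in row if value > 0)
    if not nonzero:
        return [[(0, 0, 0, 0) for _ in row] for row in grid]
    # The value at the chosen percentile becomes the brightest possible heat.
    index = min(int(len(nonzero) * clamp_percentile), len(nonzero) - 1)
    ceiling = nonzero[index]
    colors = []
    for row in grid:
        color_row = []
        for value in row:
            if value == 0:
                color_row.append((0, 0, 0, 0))  # fully transparent
            else:
                t = min(value, ceiling) / ceiling
                # Ramp from yellow (low heat) to red (high heat).
                color_row.append((255, int(255 * (1 - t)), 0, 255))
        colors.append(color_row)
    return colors


colors = heat_to_colors([[0, 1, 2], [4, 1000, 2]], clamp_percentile=0.75)
```

In this example the cell with a count of 1000 is clamped to the same color as the cell with a count of 4, so the cells with counts of 1 and 2 still get clearly distinct shades instead of all fading toward the bottom of the ramp.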
I hope these posts have given you an idea of how you can go about gathering player data in your games, and how you can put that data to work – both towards improving your game, and giving your players access to data they want.





#1 by Aubrey on September 3, 2009 - 5:34 am
That’s fantastic work!
On the point about the embarrassingly bright point, you could maybe increase the opacity of each blob you draw, such that they readily hit a ceiling if overlapping more than a couple of times? Or pass the heightmap you’ve generated through a 1D striped texture to form isobars.
#2 by Kael on September 3, 2009 - 6:19 pm
Yeah, those are both great solutions. I hadn’t thought of using isobars, that’s a cool idea.