Showing posts from 2012

Visualizing Gamer Achievement Profiles using R

In this post, I'll describe how to go about visualising and interpreting gamer achievement data using R, the open source tool for statistical computing. Specifically, I'll show how you can create gamer achievement profiles based on publicly available achievement records from the Steam community API. The visualisations and data interpretation will hopefully be of interest to a general audience, but for the more technically inclined reader I've included the steps required to create the visualisations. If you're mainly interested in the analysis and interpretation, you might want to skip ahead to the Achievement Rate Distributions section. If you're not a coder, don't be put off: R really is straightforward. The following histogram, for example, can be created from a data set using just two lines of code: This histogram shows global achievement rates (in percentage points) for all Steam achievements; more on this below. Achievement Data. So wh…
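To illustrate what a histogram of achievement rates involves, here is a minimal Python sketch, not the post's two-line R version, that counts percentage rates into fixed-width bins and prints a text histogram. The function names and the sample rates are hypothetical, made up for illustration rather than taken from Steam.

```python
import math
from collections import Counter


def bin_rates(rates, width=10):
    """Count percentage rates (0-100) into bins of `width` points.

    Returns {lower_edge: count} with every bin present, including empty ones.
    A rate of exactly 100 goes into the top bin rather than a bin of its own.
    """
    n_bins = math.ceil(100 / width)
    counts = Counter()
    for rate in rates:
        # Reject anything outside 0-100 (NaN also fails this check).
        if not 0 <= rate <= 100:
            raise ValueError(f"rate out of range: {rate}")
        index = min(int(rate // width), n_bins - 1)
        counts[index * width] += 1
    return {i * width: counts[i * width] for i in range(n_bins)}


def print_histogram(bins, width=10):
    """Print one row of '#' characters per bin."""
    for edge, count in bins.items():
        upper = min(edge + width, 100)
        print(f"{edge:3d}-{upper:3d}% | {'#' * count}")


# Hypothetical global achievement rates (percent), for illustration only.
sample_rates = [2.1, 5.4, 7.9, 12.0, 15.3, 33.3, 48.7, 61.2, 88.0, 100.0]
print_histogram(bin_rates(sample_rates))
```

Keeping empty bins in the result means the printed histogram shows gaps in the distribution explicitly, much as a plotted histogram would.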

Anomalies in Steam Community data

In a recent post I introduced the Steam Community API and showed how to retrieve gamer data and perform a few simple but fun analyses. While writing that post, I came across several problems with the data that's returned. If you're thinking about using Steam Community data, it's worth bearing these anomalies in mind because of their impact on downstream processing and further analysis. Frustratingly, the quality of the data available through the Steam Community API is quite variable; in particular, there are many discrepancies between global achievement data and achievement data for individual players. I also came across several global achievement rates that were clearly invalid, and in some cases found that global achievement records for games were missing entirely. The net result: it's hard to trust the data that's returned. It is still possible to analyze returned data, but you're going to need strong…
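The anomalies described above suggest a few sanity checks worth running before any analysis. The minimal Python sketch below assumes global rates have already been parsed into a `{achievement_name: percent}` dict per game. It treats any rate that is not a finite number between 0 and 100 as invalid (an assumption about what "clearly invalid" means), flags games with no global record, and lists achievement names that appear in a player's record but not the global one, or vice versa. All function names and data are hypothetical.

```python
import math


def split_valid_rates(global_rates):
    """Split {achievement_name: percent} into (valid, invalid) dicts.

    Assumption: a rate is invalid unless it is a finite number in [0, 100].
    """
    valid, invalid = {}, {}
    for name, pct in global_rates.items():
        ok = (
            isinstance(pct, (int, float))
            and not isinstance(pct, bool)  # bool is a subclass of int
            and math.isfinite(pct)
            and 0 <= pct <= 100
        )
        (valid if ok else invalid)[name] = pct
    return valid, invalid


def games_missing_global_data(game_ids, global_by_game):
    """Return the games whose global achievement record is absent or empty."""
    return [gid for gid in game_ids if not global_by_game.get(gid)]


def name_discrepancies(global_rates, player_achievement_names):
    """Return (only_in_player, only_in_global) achievement names, sorted."""
    global_names = set(global_rates)
    player_names = set(player_achievement_names)
    return sorted(player_names - global_names), sorted(global_names - player_names)


# Hypothetical data for illustration; not real Steam records.
global_by_game = {
    "game_a": {"ACH_A": 45.2, "ACH_B": -3.0, "ACH_C": 250.0},
    "game_b": {},
}
valid, invalid = split_valid_rates(global_by_game["game_a"])
missing = games_missing_global_data(["game_a", "game_b", "game_c"], global_by_game)
only_player, only_global = name_discrepancies(global_by_game["game_a"], ["ACH_A", "ACH_D"])
```

Returning the suspect entries rather than silently dropping them makes it possible to report how much of the data was discarded, which matters when the data can't be fully trusted.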

Harvesting Data from the Steam Community API

Introduction. The Steam community API is a web service that provides public access to information about Steam users, their games, achievements, and other related data. In this blog post I'll describe some of the interesting data you can access, as well as how to model, retrieve, and process that data. I'll also show you how to generate a few fun, simple rankings and statistics for a group of Steam gamers. This is primarily a technical article, but it concludes with the results of a simple analysis of a small number of friends and acquaintances on Steam, which may interest less technically inclined readers. The examples shown here can be reproduced using the sample code found in this GitHub repository. It's a work in progress, but hopefully provides enough insight for you to either repeat the results or build your own equivalent. Accessing the API. The first thing to know is that Steam community data is accessed using a RESTful web serv…
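To make the RESTful access concrete, the Python sketch below builds a request for one game's global achievement percentages, parses the JSON response, and ranks a group of players by achievements unlocked. It assumes Valve's public Steam Web API endpoint `ISteamUserStats/GetGlobalAchievementPercentagesForApp/v0002` and its `achievementpercentages` response shape. It is not the code from the GitHub repository mentioned above, the helper names are hypothetical, and the sample response and player counts are invented.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.steampowered.com"


def global_achievements_url(app_id):
    """Build the request URL for a game's global achievement percentages.

    Assumes the ISteamUserStats/GetGlobalAchievementPercentagesForApp/v0002
    endpoint, which takes the app ID as `gameid`.
    """
    query = urlencode({"gameid": app_id, "format": "json"})
    return f"{API_ROOT}/ISteamUserStats/GetGlobalAchievementPercentagesForApp/v0002/?{query}"


def parse_global_achievements(body):
    """Turn JSON response text into a {achievement_name: percent} dict.

    Assumes {"achievementpercentages": {"achievements": [{"name", "percent"}]}};
    float() is applied in case `percent` arrives as a string.
    """
    data = json.loads(body)
    entries = data.get("achievementpercentages", {}).get("achievements", [])
    return {e["name"]: float(e["percent"]) for e in entries}


def fetch_global_achievements(app_id, timeout=10):
    """Retrieve and parse global achievement data (makes a network request)."""
    with urlopen(global_achievements_url(app_id), timeout=timeout) as resp:
        return parse_global_achievements(resp.read().decode("utf-8"))


def rank_by_unlocked(unlocked_counts):
    """Rank players by unlocked achievement count, highest first; ties by name."""
    return sorted(unlocked_counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical response body and player counts, for illustration only.
sample_body = (
    '{"achievementpercentages": {"achievements": ['
    '{"name": "ACH_ONE", "percent": 52.5}, {"name": "ACH_TWO", "percent": "3.1"}]}}'
)
rates = parse_global_achievements(sample_body)
ranking = rank_by_unlocked({"alice": 12, "bob": 30, "carol": 12})
```

Keeping parsing separate from the network call lets the parser be tested against saved responses without contacting Steam at all.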

Fresh Pickings

Welcome to The Variable Tree, a blog all about Programming and Software Engineering, with a leaning towards articles about data mining and analytics. This blog covers a range of topics, but I (that is, me, James Siddle; see Bio below) have a particular interest in data mining and related topics. That includes things like data extraction pipelines, Natural Language Processing, classification and prediction using Machine Learning, data storage techniques, and more. That said, you might also find articles about programming in general: perhaps something about new programming languages, domain modelling, observations on development processes, or the odd article about interesting applications. First up, I'll be writing a few articles about how to retrieve, process, and analyze data from the Steam community API, showing how to generate a few interesting statistics from online gaming communities. Hopefully you'll find something of interest. Enjoy reading :) Image courtesy of a…