Image Tagging in SAGE Journals

I've just published an article over on the SAGE Ocean blog, all about Image Tagging:

"In part one of this series we introduced the topic of automated image tagging and showed how Cloud Vision APIs such as Clarifai can be used to classify images into different categories. We showed examples of SAGE images and the tags assigned by different Cloud Vision APIs, then discussed use cases for this innovative technology—primarily in discoverability and accessibility. In this follow-on post, we focus on data analysis and specifically co-occurrence networks. By way of example we present a co-occurrence network derived from Clarifai image tags, which represents a kind of mental model of the SAGE journal images we processed. The following image is a visualization of the co-occurrence network that we created"

You can read the full article here.
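
As a rough illustration of the co-occurrence idea described in the quote (a minimal sketch, not the code used for the article), the edge weights of such a network can be obtained by counting how often two tags are assigned to the same image. The function name and the sample tags below are hypothetical; they are not taken from the SAGE images or from Clarifai output.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tagged_images):
    """Count how often each unordered pair of tags appears on the same image."""
    counts = Counter()
    for tags in tagged_images:
        # Deduplicate and sort so ("chart", "graph") and ("graph", "chart")
        # are counted under a single key.
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts


# Hypothetical tag lists, one per image (assumed data for illustration only).
sample_images = [
    ["people", "graph", "chart"],
    ["graph", "chart"],
    ["people", "portrait"],
]

edges = cooccurrence_counts(sample_images)
for (tag_a, tag_b), weight in sorted(edges.items()):
    print(tag_a, tag_b, weight)
```

Each tag pair becomes an edge and its count becomes the edge weight, which is the kind of edge list a network visualisation tool can usually import.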

Springer Nature Expert Communities Visualised

Introduction

A couple of years ago, the publisher Springer Nature entered into a strategic relationship with Zapnito, an “Expert Network” platform. The goal was to facilitate knowledge sharing between employees, academic experts, and other interested parties, and there are now more than twenty different Springer Nature communities hosted on Zapnito, twelve of which are “Nature Research” communities and are the focus of this article. This network of communities represents an interesting opportunity to explore the relationships between expert communities, along with the way Zapnito is facilitating knowledge sharing.
This blog post introduces the Springer Nature communities, then provides a light-touch exploration of community structure and connectivity, and its relationship to knowledge sharing and dissemination. The post is a result of a collaboration between Zapnito, Springer Nature, and myself, and is cross-posted on the Zapnito blog.

About the Visualisation

The data visualisation s…

Visualizing Rebel Alliances in the UK Government

The UK will shortly go to the polls for the 2015 General Election. However, there's currently no clear front-runner, and in fact no clear coalition on the cards for a new government. The "new normal" of hung parliaments and coalition forming in UK politics appears to be here to stay.

As such, I decided to take a look at the open dataset provided by The Public Whip project, with a view to visualizing the relationships between MPs (members of parliament) in the 2010 to 2015 UK government, using a tool called Gephi. The idea was to analyse how MPs are related through their voting patterns in the House of Commons, and in particular how they are related through agreement or rebelliousness.
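
To make the idea of "related through voting patterns" concrete, here is a minimal sketch of one way agreement between MPs could be measured: the fraction of shared divisions on which two MPs voted the same way. This is an assumption for illustration, not necessarily the measure used in the article, and the data layout, MP names, and division numbers are hypothetical rather than the actual Public Whip format.

```python
from itertools import combinations


def agreement_rates(votes):
    """Return, for each pair of MPs, the fraction of shared divisions
    on which both voted the same way.

    `votes` maps an MP name to a dict of {division_id: "aye" or "no"}.
    """
    rates = {}
    for mp_a, mp_b in combinations(sorted(votes), 2):
        shared = votes[mp_a].keys() & votes[mp_b].keys()
        if not shared:
            continue  # no divisions in common, so no edge between them
        agreed = sum(1 for d in shared if votes[mp_a][d] == votes[mp_b][d])
        rates[(mp_a, mp_b)] = agreed / len(shared)
    return rates


# Hypothetical MPs and divisions (assumed data for illustration only).
sample_votes = {
    "MP A": {1: "aye", 2: "no", 3: "aye"},
    "MP B": {1: "aye", 2: "no", 3: "no"},
    "MP C": {2: "aye", 3: "no"},
}

rates = agreement_rates(sample_votes)
for pair, rate in sorted(rates.items()):
    print(pair, round(rate, 2))
```

Pairs with a high rate tend to vote together, while a low rate between members of the same party would be one possible sign of rebelliousness. The resulting pairs and rates could serve as a weighted edge list for a tool such as Gephi.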

Also I'll admit it: I wanted to write an article with "Rebel Alliance" in the title because I like Star Wars.

In the rest of this article, I'll describe several visualizations that were created from that Public Whip dataset. These show various aspects of MP relation…

How to create your own replica of the SureChEMBL patent-chemistry dataset

Introduction: Why replicate SureChEMBL?

SureChEMBL is a patent chemistry dataset and set of web services that provides a rich source of information to the drug discovery research community. It was previously owned, developed, and sold by Macmillan, but was recently handed over to the European Bioinformatics Institute (EMBL/EBI) and is now free for everyone to use.

SureChEMBL can already be accessed online, so why would a locally hosted replica be needed?

To answer that question, I'll give the reasons provided by a pharmaceutical company that recently commissioned me to develop a SureChEMBL data replication facility:
1) Firewall restrictions can be avoided - companies involved in drug discovery are often working with substructures or other related search queries which may lead to highly lucrative discoveries. As such, researchers are often prohibited from using external web services, even secure services such as SureChEMBL, as a risk-mitigation strategy. Downloading data files - e.g…

Anatomy of an Emerging Knowledge Network: The Zapnito Graph Visualized

In this article, I take a high-level look at Zapnito, a multi-tenant "Networked Knowledge" platform designed around small, expert communities.

Zapnito is a knowledge sharing platform that allows organizations to create branded networks of experts. It's aimed at publishers, consultancies, media companies, and other corporations. Zapnito includes some social features (such as follow relationships, collaboration), but its focus is knowledge sharing rather than social networking.

As the founder puts it: "Zapnito is a white label platform that offers knowledge network capabilities for publishers. We provide both highly private and open networks, and we own neither publisher content or associated data - both of these are retained by publishers." 

The aim of this article is to show some of the interesting insights that can be gained from basic Social Network Analysis (SNA) of Zapnito. I'll be showing visualizations (such as that on the right) built from an anonymiz…

I Know Where You Were Last Summer: London's public bike data is telling everyone where you've been

This article is about a publicly available dataset of bicycle journey data that contains enough information to track the movements of individual cyclists across London, for a six-month period just over a year ago.

I'll also explore how this dataset could be linked with other datasets to identify the actual people who made each of these journeys, and the privacy concerns this kind of linking raises.


It probably won't surprise you to learn that there is a publicly available Transport for London dataset that contains records of bike journeys for London's bicycle hire scheme. What may surprise you is that this record includes unique customer identifiers, as well as the location and date/time for the start and end of each journey. The public dataset currently covers a period of six months between 2012 and 2013.
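
To see why a unique customer identifier matters, consider this minimal sketch, which groups journey records by identifier and orders each group by start time. The field names (`customer_id`, `start_time`, and so on), the station names, and the records themselves are all hypothetical; they are not the actual columns or contents of the Transport for London dataset.

```python
from collections import defaultdict


def journeys_by_customer(records):
    """Group journey records by customer identifier, ordered by start time."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["customer_id"]].append(record)
    for journeys in grouped.values():
        # ISO 8601 timestamps sort correctly as plain strings.
        journeys.sort(key=lambda r: r["start_time"])
    return dict(grouped)


# Hypothetical records (assumed data for illustration only).
sample_records = [
    {"customer_id": 101, "start_station": "Station X", "end_station": "Station Y",
     "start_time": "2012-09-03T18:05", "end_time": "2012-09-03T18:25"},
    {"customer_id": 202, "start_station": "Station Z", "end_station": "Station X",
     "start_time": "2012-09-03T09:00", "end_time": "2012-09-03T09:20"},
    {"customer_id": 101, "start_station": "Station Y", "end_station": "Station X",
     "start_time": "2012-09-03T08:10", "end_time": "2012-09-03T08:30"},
]

history = journeys_by_customer(sample_records)
for journey in history[101]:
    print(journey["start_time"], journey["start_station"], "->", journey["end_station"])
```

Even with invented data, the pattern is visible: once journeys share an identifier, a few lines of code are enough to turn isolated trips into a per-person travel history.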

What are the consequences of this? It means that someone who has access to the data can extract and analyse the journeys made by individual cyclists within London du…

London maps and bike rental communities, according to Boris Bike journey data

Every time someone in London makes a journey on a Boris Bike (officially, the Barclays Cycle Hire Scheme), the local government body Transport for London (TFL) record that journey. TFL make some of this data available for download, to allow further analysis and experimentation.

Below, you'll find maps of the most popular bike stations and routes in London, created from the TFL data using Gephi, plus a few simple data processing scripts that I threw together. The idea for these maps originated within a project group at a course on Data Visualisation, held at the Guardian last year. We're working on a more publisher-friendly form, so thank you to my course mates for giving me the go-ahead to include them here.

First, here's a map showing all bike stations and all popular journeys.

The first map shows the most popular routes and bike stations, those with more than ~150 journeys made during the six months of data that TFL make available. The size of each bike station in this m…