Anatomy of an Emerging Knowledge Network: The Zapnito Graph Visualized


In this article, I take a high-level look at Zapnito, a multi-tenant "Networked Knowledge" platform designed around small, expert communities.

Zapnito is a knowledge sharing platform that allows organizations to create branded networks of experts. It's aimed at publishers, consultancies, media companies, and other corporations. Zapnito includes some social features (such as follow relationships and collaboration), but its focus is knowledge sharing rather than social networking.

As the founder puts it: "Zapnito is a white label platform that offers knowledge network capabilities for publishers. We provide both highly private and open networks, and we own neither publisher content or associated data - both of these are retained by publishers." 

The aim of this article is to show some of the interesting insights that can be gained from basic Social Network Analysis (SNA) of Zapnito. I'll be showing visualizations (such as that on the right) built from an anonymized subset of the Zapnito database, and discussing what can be learned from these.

Note: if you're a more SNA-savvy reader, be aware that I won't be diving into metrics such as Average Path Length or Clustering Coefficients here; please look out for a future article on these topics.

Users and Followers

What does the core social network look like?

The graphs in this article were built using Gephi, a desktop application designed specifically for Social Network Analysis. All that's needed is a list of nodes (Zapnito users) and a list of edges (follow relationships) in flat file format, and you have a network.
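As a rough illustration of that flat-file step, the sketch below writes a node list and an edge list as CSV, using the column names Gephi's spreadsheet importer recognizes (Id and Label for nodes; Source, Target and Type for edges). The function name to_gephi_csv and the sample user IDs are hypothetical; this is not the processing actually applied to the Zapnito extract.

```python
import csv
import io


def to_gephi_csv(users, follows):
    """Return (nodes_csv, edges_csv) strings for Gephi's spreadsheet import.

    users   -- iterable of (user_id, label) pairs (hypothetical anonymized IDs)
    follows -- iterable of (follower_id, followed_id) pairs
    """
    nodes_buf = io.StringIO()
    node_writer = csv.writer(nodes_buf, lineterminator="\n")
    node_writer.writerow(["Id", "Label"])
    for user_id, label in users:
        node_writer.writerow([user_id, label])

    edges_buf = io.StringIO()
    edge_writer = csv.writer(edges_buf, lineterminator="\n")
    edge_writer.writerow(["Source", "Target", "Type"])
    for follower, followed in follows:
        # Each edge points from the follower to the user being followed.
        edge_writer.writerow([follower, followed, "Directed"])

    return nodes_buf.getvalue(), edges_buf.getvalue()


# Hypothetical sample data, not Zapnito users.
users = [("u1", "user-1"), ("u2", "user-2"), ("u3", "user-3")]
follows = [("u2", "u1"), ("u3", "u1"), ("u1", "u3")]
nodes_csv, edges_csv = to_gephi_csv(users, follows)
print(nodes_csv)
print(edges_csv)
```

Writing to real files instead of strings only requires passing an open file object to csv.writer in place of the StringIO buffers.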

The Zapnito team kindly provided me with an anonymized extract of their database covering several representative customers, so with a little data processing the network could be imported as a directed graph, then visualized:

The core Zapnito network

Here, each node represents a Zapnito user, and each edge represents a follow relationship. Nodes are scaled according to the number of followers, which is a typical measure of influence in a social network.
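Since each edge points from a follower to the user being followed, a user's follower count is simply their in-degree. The sketch below computes that from a list of follow pairs and maps it linearly onto a node-size range; the helper names and the 10 to 50 size range are assumptions for illustration, and Gephi's own ranking settings may interpolate differently.

```python
def follower_counts(users, follows):
    """In-degree of each user: how many follow edges point at them."""
    counts = {user: 0 for user in users}
    for _follower, followed in follows:
        counts[followed] = counts.get(followed, 0) + 1
    return counts


def node_sizes(counts, min_size=10.0, max_size=50.0):
    """Linearly map follower counts onto [min_size, max_size] (assumed range)."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {u: min_size + (c - lo) / span * (max_size - min_size)
            for u, c in counts.items()}


# Hypothetical sample data: three users follow u1, and u1 follows u2.
users = ["u1", "u2", "u3", "u4"]
follows = [("u2", "u1"), ("u3", "u1"), ("u4", "u1"), ("u1", "u2")]
counts = follower_counts(users, follows)
sizes = node_sizes(counts)
print(counts)  # {'u1': 3, 'u2': 1, 'u3': 0, 'u4': 0}
print(max(sizes, key=sizes.get))  # u1
```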

The graph is laid out using one of Gephi's built-in force-directed algorithms, which simulate a physical system (nodes repel each other, while edges pull connected nodes together) in order to find an aesthetically pleasing arrangement. A side effect of this approach is that related users tend to end up close to each other, while unrelated users are pushed further apart.
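The article doesn't say which of Gephi's layouts produced these images (Gephi ships several, including ForceAtlas 2 and Fruchterman Reingold). To show the general principle, here is a minimal Fruchterman-Reingold-style force-directed layout in plain Python: every pair of nodes repels, every edge pulls its endpoints together, and a cooling "temperature" limits how far a node can move per step. The function name, parameters and defaults are my own assumptions, not Gephi's settings.

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, k=1.0, start_temp=0.1, seed=42):
    """Minimal Fruchterman-Reingold-style force-directed layout.

    nodes -- list of node ids; edges -- list of (a, b) pairs.
    Every pair of nodes repels with force k^2/d; every edge attracts its
    endpoints with force d^2/k, so linked nodes settle near distance k.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = k * k / dist
                disp[a][0] += dx / dist * f
                disp[a][1] += dy / dist * f
                disp[b][0] -= dx / dist * f
                disp[b][1] -= dy / dist * f
        # Attraction: each edge pulls its endpoints together (direction ignored).
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            f = dist * dist / k
            disp[a][0] -= dx / dist * f
            disp[a][1] -= dy / dist * f
            disp[b][0] += dx / dist * f
            disp[b][1] += dy / dist * f
        # Move along the net force, but no further than the cooling temperature.
        t = start_temp * (1 - step / iterations)
        for n in nodes:
            length = math.hypot(*disp[n])
            if length > 0:
                scale = min(length, t) / length
                pos[n][0] += disp[n][0] * scale
                pos[n][1] += disp[n][1] * scale
    return {n: (x, y) for n, (x, y) in pos.items()}


# Two hypothetical triangles of mutually connected users, with no link between them.
triangles = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F"), ("D", "F")]
layout = spring_layout(list("ABCDEF"), triangles)
for node, (x, y) in sorted(layout.items()):
    print(node, round(x, 2), round(y, 2))
```

Linked nodes settle at roughly distance k from each other, while unlinked groups keep pushing apart, which is why clusters of mutually connected users end up visibly clumped.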

There are a couple of observations we can make about the above graph.

  • First, there's one user who appears to be central to the network, someone with numerous connections to the rest of the graph, and relationships to many less influential users. This user is in fact the founder of Zapnito, who has had a long time to build up connections and is motivated to connect with as many Zapnito users as possible to encourage use of the network.
  • Second, you may notice several clumps or clusters - there's a large one to the left, one underneath, and one or two to the right. Apart from a small amount of adjustment by me, the graph as you see it represents the output of the layout algorithm, so what's going on? 
To understand why, we need to look at Zapnito's grouping mechanism.

Group Membership

Zapnito is designed to serve the needs of expert communities, so an essential feature is the set of communities that users can be part of.

These range from invitation-only, private communities with exclusive membership, to open communities that encourage public participation around selected contributors. Examples of public communities include the LifeLabs network, and Zapnito itself.

Note that Zapnito typically uses the term network to refer to the expert groups they host; here I'll use the term community to differentiate from the social network that's being analysed.

So what happens if we partition the nodes by community?

The Zapnito network, partitioned by community

We can now start to see the reason for the clumping generated by the layout algorithm: there is fairly high cohesion between members of a community. This is a nice result, and it's interesting to see the network of networks manifested in this way.

However, Zapnito users can be members of more than one community; these multi-community users are shown above as dark grey nodes. It's important to know who these users are, because they can act as bridges in the network and may be instrumental in disseminating information between communities. Again, it's understandable that the founder is a bridge, though there are several others worth noting.
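The clumping can also be quantified rather than judged by eye. As a sketch, assuming each user's memberships are available as a set of community names (a hypothetical data shape), the code below measures cohesion as the fraction of follow edges whose endpoints share at least one community, and lists the multi-community users who may act as bridges.

```python
def community_cohesion(follows, membership):
    """Fraction of follow edges whose two ends share at least one community.

    membership -- dict mapping user_id -> set of community names (hypothetical)
    """
    if not follows:
        return 0.0
    shared = sum(
        1 for a, b in follows
        if membership.get(a, set()) & membership.get(b, set())
    )
    return shared / len(follows)


def bridge_users(membership):
    """Users who belong to more than one community."""
    return sorted(u for u, groups in membership.items() if len(groups) > 1)


# Hypothetical sample data: u1 belongs to both communities A and B.
membership = {"u1": {"A", "B"}, "u2": {"A"}, "u3": {"A"}, "u4": {"B"}, "u5": {"B"}}
follows = [("u2", "u3"), ("u3", "u2"), ("u4", "u5"),
           ("u2", "u1"), ("u4", "u1"), ("u2", "u4")]
print(bridge_users(membership))  # ['u1']
print(community_cohesion(follows, membership))  # 5 of 6 edges stay within a community
```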

As well as the bridge users, there are some interesting anomalies in the graph deserving of further analysis - but that's out of scope here.


Automatically Detected Communities

So we've seen the communities as defined by the Zapnito administrators, but there's another perspective we can take. Gephi has a built-in community detection feature based on the Louvain method, which groups nodes so as to maximize modularity: connections within each detected group are dense relative to what you'd expect by chance, while connections between groups are sparse.
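Modularity is the quantity the Louvain method tries to maximize. The sketch below computes it for a given partition, treating follow edges as undirected (a simplifying assumption for this example). The Louvain method itself greedily moves nodes between groups and then merges groups, keeping the changes that increase this score; the code below is not a full Louvain implementation.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected graph.

    Q = sum over communities c of  L_c / m - (d_c / 2m)^2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    inside = defaultdict(int)
    degree = defaultdict(int)
    for a, b in edges:
        degree[community_of[a]] += 1
        degree[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            inside[community_of[a]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two hypothetical triangles joined by a single C-D edge.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
split = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
together = {n: 0 for n in "ABCDEF"}
print(round(modularity(edges, split), 3))  # 0.357
print(modularity(edges, together))  # 0.0
```

Splitting the two triangles into their natural groups scores about 0.36, while lumping everyone together scores 0, so a modularity-maximizing method prefers the split.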

Here's what it finds in the Zapnito graph:

Automatically detected communities

Here, the algorithmically detected communities are quite similar to the real communities, but with some notable differences:

  • First, there's a distinct community around the founder. Again this simply reinforces the fact that the founder plays a central role in establishing and promoting the network.
  • Second, some smaller communities which are visible in the previous graph have been folded into the larger communities. 
This second point is worth bearing in mind if you're considering using community detection to provide social network features: you may offend your users if you assign them to the wrong group.

Of course, the opposite may be true: a user's follow relationships may reveal the truth of their interests (or allegiance), and may be a better indicator of community membership than the set of pre-configured communities on offer.
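One way to see where detected and official communities disagree is a simple cross-tabulation. The sketch below assumes each user has a single primary official community (bridge users would need a rule for picking one) and flags users whose official community differs from the majority in their detected group; the function name and community labels are hypothetical.

```python
from collections import Counter, defaultdict


def compare_partitions(official, detected):
    """Cross-tabulate official communities against detected ones.

    official, detected -- dicts mapping user_id -> a single community label.
    Returns (table, mismatched): table[d] counts official labels inside detected
    group d; mismatched lists users whose official label differs from the
    majority official label of their detected group.
    """
    table = defaultdict(Counter)
    for user, group in detected.items():
        table[group][official[user]] += 1
    majority = {g: counts.most_common(1)[0][0] for g, counts in table.items()}
    mismatched = sorted(u for u, g in detected.items() if official[u] != majority[g])
    return dict(table), mismatched


# Hypothetical data: small community C has been folded into detected group 1.
official = {"u1": "A", "u2": "A", "u3": "A", "u4": "B", "u5": "B", "u6": "C"}
detected = {"u1": 0, "u2": 0, "u3": 0, "u4": 1, "u5": 1, "u6": 1}
table, mismatched = compare_partitions(official, detected)
print(table[1])  # Counter({'B': 2, 'C': 1})
print(mismatched)  # ['u6']
```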


Contributions and Impact

So far, we've looked at the overall network structure, as well as communities within the network. But Zapnito is a content distribution system for experts, so what insights can we gain here?

The Zapnito database provides counts of video submissions, articles, and comments made by each user. By extracting this data we can highlight the users in the network who make the biggest contribution. Contributions can be counted individually by type, but it's more interesting to look at an aggregate view.

Below, users are shaded according to their overall contribution score relative to other users, where each video submission scores ten points, each article five points, and each comment one point:

Biggest contributors

Here we can see that most users have modest numbers of contributions compared to a handful of very active users. Given the nature of expert communities, this is expected: apart from a small number of prolific content producers, most users generate high-quality submissions, but infrequently.

It's also worth noting that the largest contributor is not the most influential, at least in terms of followers. This is a useful thing to know - it may be beneficial, for example, to promote this user to increase their reach.
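The contribution score itself is straightforward to compute. The sketch below applies the weights described above (ten points per video, five per article, one per comment) and sets the top contributor against the most-followed user; the helper names and the activity and follower figures are made-up sample data, not Zapnito numbers.

```python
# Weights from the article: 10 per video, 5 per article, 1 per comment.
WEIGHTS = {"videos": 10, "articles": 5, "comments": 1}


def contribution_score(counts):
    """Aggregate contribution score for one user's activity counts."""
    return sum(weight * counts.get(kind, 0) for kind, weight in WEIGHTS.items())


def contributor_vs_influencer(activity, followers):
    """Return all scores, the top contributor, and the most-followed user."""
    scores = {user: contribution_score(counts) for user, counts in activity.items()}
    top_contributor = max(scores, key=scores.get)
    most_followed = max(followers, key=followers.get)
    return scores, top_contributor, most_followed


# Hypothetical sample data.
activity = {
    "u1": {"videos": 1, "articles": 2, "comments": 30},
    "u2": {"videos": 4, "articles": 3},
    "u3": {"comments": 2},
    "u4": {},
}
followers = {"u1": 25, "u2": 3, "u3": 4, "u4": 12}

scores, top, most = contributor_vs_influencer(activity, followers)
print(scores)  # {'u1': 50, 'u2': 55, 'u3': 2, 'u4': 0}
print(top, most)  # u2 u1
```

Here u2 contributes the most but has only three followers, which makes u2 the kind of user it may be worth promoting.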

We may also want to find users who make little or no contribution, but have influence within the network. We can find these users by modifying the shading in Gephi to give more weight to users who have made at least a small contribution; this brings out the lurkers!

Lurkers (shown in orange)


Above, the highlighted nodes represent users who have made no contributions (comments, posts or otherwise). These individuals, especially those with reasonable numbers of followers, are prime targets to encourage greater participation.
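Finding lurkers amounts to filtering for users with no contributions of any kind and ranking them by reach. In this sketch, the function name and min_followers (a cut-off for what counts as a "reasonable" number of followers) are hypothetical, and the sample data is invented.

```python
def influential_lurkers(activity, followers, min_followers=1):
    """Users with no videos, articles or comments, most-followed first.

    min_followers is a hypothetical cut-off for a 'reasonable' audience.
    """
    lurkers = [
        user for user, counts in activity.items()
        if sum(counts.values()) == 0 and followers.get(user, 0) >= min_followers
    ]
    return sorted(lurkers, key=lambda user: followers.get(user, 0), reverse=True)


# Hypothetical sample data.
activity = {
    "u1": {"comments": 3},
    "u2": {},
    "u3": {"videos": 0, "articles": 0, "comments": 0},
    "u4": {},
}
followers = {"u1": 10, "u2": 7, "u3": 15, "u4": 0}
print(influential_lurkers(activity, followers))  # ['u3', 'u2']
```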

We can use a similar principle to bias the shading to the highest scoring contributors only:

Heroes (in purple)


Here we see the heroes of the system. This is an alternative view of the overall contribution score, in which the biggest hitters in terms of contribution are emphasized.
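In Gephi this kind of emphasis is adjusted through its ranking settings, and the article doesn't record the exact settings used. As a stand-in, the sketch below normalizes scores to the range 0 to 1 and applies a power curve: an exponent below 1 lifts small scores (so anyone who has contributed at all stands apart from non-contributors), while an exponent above 1 suppresses all but the top scores. The shade function and its gamma parameter are assumptions for illustration.

```python
def shade(scores, gamma=1.0):
    """Map contribution scores to shading intensities in [0, 1].

    gamma < 1 lifts small scores (the lurker view); gamma > 1 pushes all but
    the largest scores towards 0 (the heroes view). gamma must be positive.
    """
    top = max(scores.values()) or 1  # avoid dividing by zero if nobody contributed
    return {u: (s / top) ** gamma for u, s in scores.items()}


# Hypothetical contribution scores.
scores = {"a": 0, "b": 5, "c": 20, "d": 80}
print(shade(scores))  # {'a': 0.0, 'b': 0.0625, 'c': 0.25, 'd': 1.0}
print(shade(scores, gamma=0.5))  # small contributors brightened
print(shade(scores, gamma=2.0))  # only the top contributor stays bright
```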


Conclusions

There are a few key conclusions to take from the above analysis.

First, it's clear that Zapnito's founder has an important role to play in the emerging network, as a well-connected influencer and as a bridge between different communities. However, the centrality of the founder's node is mostly due to his activities in promoting Zapnito and encouraging participation by following and engaging with other Zapnito users, and it will be interesting to see how this changes as the network grows.

Next, the difference between official and detected communities suggests that group membership is not clear cut, and is likely to shift over time. This may provide opportunities in the form of emergent groups that were not originally foreseen, as well as potential issues such as split loyalties or schisms in existing communities.

Finally, the process of scoring contributions to build an aggregate score is a useful technique for identifying key contributors, and contrasting such a score with a measure of reputation or impact helps identify influential lurkers, as well as major contributors with limited reach. The former can be encouraged to contribute, while the latter can be supported in building their network of followers, both of which will support dissemination of quality content across the network.

