Visual Analytics with Linked Open Data and Social Media

by Dr. Suvodeep Mazumdar, Dr. Tomi Kauppinen and Dr. Anna Lisa Gentile

This tutorial was conducted as the VISLOD2014 event at ESWC2014.

Responding effectively to emergencies requires significant effort from both emergency workers and the general public. Emergency responders must rapidly gather information, decide where to deploy resources, and prioritise how best to deal with the emergency. This calls for quick and easy means of exploring large datasets to build an overview of the situation on the ground. Interactive visualisations can offer significant help: by presenting the data through multiple visual metaphors, they provide insight and enable users and analysts to drill down into data elements of interest.

In this tutorial, two key types of information are of interest: Linked Data and Social Media. We explore such sources using Visual Analytic techniques, and provide a high-level overview of how simple, freely available open-source tools can be used to develop basic Visual Analytic interfaces. By the end of the tutorial, we aim to show how several systems, tools and frameworks can be put together into an example system. The tutorial consists of three parts:

  • Part 1 – Extracting, indexing Social Media
  • Part 2 – Visualising Social Media
  • Part 3 – Mashing-up Social Media visualisations with Linked Data

The first step in the entire process is identifying which frameworks to use while developing the system. Several key decisions need to be taken here: which data store should be used to index Social Media, which framework should be used to collect the data, and which framework should be used to visualise the content. For this tutorial, our choices are as follows:

1. Data collection framework: node.js
Other typical solutions involve using Java or PHP backends to harvest Social Media posts. I have recently been trying out node.js and found it an excellent platform for getting a lot of tasks done with very little Javascript code.

2. Data storage: Solr
Other solutions include using triplestores such as Fuseki, Jena or Virtuoso. However, my personal experience with Solr has been excellent, and its faceted search returns results very quickly for visualisations.

3. Visualisations: Javascript/HTML based (d3.js, Highcharts, JIT)
Standalone visualisation applications can be developed in Java, but the quicker deployment and rapid development offered by browser graphics make the browser an excellent platform for such tools. This tutorial discusses several such toolkits for visualising the data.
This tutorial was organised for the Extended Semantic Web Conference 2014 at Heraklion, Crete, Greece. These pages essentially provide the content of the tutorial, with the example programmes that were shared with the tutorial attendees. The final tutorial folder is available here, and the rest of the page shows how we arrive at the final system.

Please note: this tutorial will work on the latest version of the Firefox browser. Some parts may not work on Internet Explorer, Chrome or Opera.
Part 1 – Extracting, indexing Social Media

We use Twitter as the source of a typical Social Media post. First, we set up the datastore; as discussed earlier, we use a Solr database. Follow the next steps:

1. Download and unzip Solr from here (zip file)

2. Create a new collection called ‘tweets’ (or edit the existing collection ‘Collection1’ if it is not being used elsewhere)

3. Create the schema – for our example, we use a very basic schema as follows

<?xml version="1.0" encoding="UTF-8"?>
<schema name="tweets" version="1.1">
    <fieldType name="string" class="solr.StrField"></fieldType>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"></fieldType>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"></fieldType>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"></field>
    <field name="user" type="string" indexed="true" stored="true" required="true" multiValued="false"></field>
    <field name="createdate" type="date" indexed="true" stored="true" required="true" multiValued="false"></field>
    <field name="source" type="string" indexed="true" stored="true" required="false" multiValued="false"></field>
    <field name="location" type="string" indexed="true" stored="true" required="false" multiValued="false"></field>
    <field name="hashtag" type="string" indexed="true" stored="true" required="false" multiValued="true"></field>
    <field name="mentions" type="string" indexed="true" stored="true" required="false" multiValued="true"></field>
    <field name="content" type="string" indexed="true" stored="true" required="false" multiValued="false"></field>
    <field name="_version_" type="long" indexed="true" stored="true"></field>
    <!-- catchall -->
</schema>

4. Start the Solr server, and check the URL http://localhost:8983/solr/#/tweets (assuming Solr runs on its default port, 8983)

5. Once Solr is up and running, download and install node.js from here

6. Create a local directory for the tutorial, e.g. “VisLOD”. From within the tutorial folder, download and install the following node.js modules:

npm install ntwitter
npm install solr-client
npm install find-hashtags
npm install twitter-text


7. In the newly created “node_modules” folder, create a new file, “TweetExtractor.js”. This file will contain the code to connect to Twitter’s streaming API and extract Tweets.

8. Enter the following code in the new file:

var solr = require('solr-client');
var client = solr.createClient();
var findHashtags = require('find-hashtags');
var twittertext = require('twitter-text');

client.autoCommit = true;

function streamTweets() {
    var credentials = {
        consumer_key: 'CONS_KEY',
        consumer_secret: 'CONS_SECRET',
        access_token_key: 'ACCESS_TOKEN_KEY',
        access_token_secret: 'ACCESS_TOKEN_SECRET'
    };

    var twitter = require('ntwitter');
    var t = new twitter({
        consumer_key: credentials.consumer_key,
        consumer_secret: credentials.consumer_secret,
        access_token_key: credentials.access_token_key,
        access_token_secret: credentials.access_token_secret
    });'statuses/filter',
        {track: ['flood', 'fire', 'police', 'disaster', 'earthquake', 'emergency', 'tornado']},
        function(stream) {
            stream.on('data', function(tweet) {
                var coords = tweet.coordinates;
                var coordinates = null;
                if (coords !== null)
                    coordinates = coords.coordinates[0] + "," + coords.coordinates[1];
                var htags = findHashtags(tweet.text);
                var usernames = twittertext.extractMentions(tweet.text);
                var dt = new Date(tweet.created_at);
                var doc = {
                    id: tweet.id_str, // the schema marks id as required
                    user: tweet.user.screen_name,
                    createdate: dt.toISOString(),
                    location: coordinates,
                    source: tweet.source,
                    content: tweet.text,
                    mentions: usernames,
                    hashtag: htags
                };
                client.add(doc, function(err, obj) {
                    if (err) {
                        console.log(err);
                    } else {
                        console.log("Added " + tweet.id_str);
                    }
                });
            });
        });
}

streamTweets();


9. Sign in to the Twitter developer site with your Twitter account and go to ‘My Applications’

10. Select ‘Create New App’ and enter the App details as instructed. Select the API Keys tab and generate the API keys. The interface will then provide the consumer key, consumer secret, access token key and access token secret. Copy the keys and paste them into the relevant places in TweetExtractor.js

11. Go to the “node_modules” folder, and enter:

node TweetExtractor.js

This will start streaming data relevant to our keywords (‘flood’, ‘fire’, ‘police’, ‘disaster’, ‘earthquake’, ‘emergency’, ‘tornado’) from Twitter and index it in Solr. The fields stored in Solr are ID, user, creation date, location, source, content, mentions and hashtags.

12. Go to the Solr query page (http://localhost:8983/solr/tweets/select?q=*%3A*&wt=xml&indent=true) to verify data being added (note the value of ‘numFound’)
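For reference, a minimal sketch of the response shape the query page returns, and how ‘numFound’ is read from it. The field values below are illustrative, not real data; note that numFound counts all matches, not just the docs returned.

```javascript
// Sketch of a Solr select response; values are made-up examples.
var exampleResponse = {
    responseHeader: {status: 0, QTime: 2},
    response: {
        numFound: 3,   // total matching documents in the index
        start: 0,
        docs: [
            {id: "1", user: "alice", content: "Major #flood downtown"}
        ]
    }
};

// read the total count of indexed documents from a select response
function countIndexed(solrResponse) {
    return solrResponse.response.numFound;
}

var total = countIndexed(exampleResponse);
// total is 3
```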

Part 2 – Visualising Social Media

With the previous steps resulting in a Solr datastore populated with Social Media data, we now proceed with visualising the data. The benefit of using a Solr store is its faceting feature, which allows easy and quick categorisation of data. For this example, let us try a faceted search with “hashtag” as the faceting field. An example query (matching the parameters echoed in the response below) would be:

http://localhost:8983/solr/tweets/select?q=*:*&rows=0&facet=true&facet.field=hashtag&wt=json&indent=true

The result would provide a JSON response with an object “facet_counts”, as follows (the hashtag counts themselves are omitted here):

{
    "responseHeader": {
        "status": 0,
        "QTime": 6,
        "params": {
            "facet": "true",
            "indent": "true",
            "q": "*:*",
            "_": "1402252414832",
            "facet.field": "hashtag",
            "wt": "json",
            "rows": "0"
        }
    },
    "response": {
        "numFound": 100477,
        "start": 0,
        "docs": []
    },
    "facet_counts": {
        "facet_queries": {},
        "facet_fields": {
            "hashtag": [ ... ]
        },
        "facet_dates": {},
        "facet_ranges": {}
    }
}


We use queries such as this to build our visualisations and explore the data. Before we begin, we need to create the right directories inside the tutorial folder: “js”, “lib”, “css” and “data” (these match the file paths used in the HTML code below).

The lib directory needs to contain the libraries discussed below, following the file paths mentioned in the HTML code.

For the visualisation front end, we create an HTML page (say base.html) as follows in the VisLOD tutorial folder:


<!DOCTYPE html>
<html>
    <head>
        <title>Visual Analytics with Social Media for Emergency Response</title>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <script language="javascript" type="text/javascript" src="js/jquery.js"></script>
        <script language="javascript" type="text/javascript" src="lib/Jit/jit.js"></script>
        <script language="javascript" type="text/javascript" src="lib/Highstock-1.3.1/js/highstock.js"></script>
        <script language="javascript" type="text/javascript" src="lib/d3.v3/d3.v3.min.js"></script>
        <script language="javascript" type="text/javascript" src="lib/d3.v3/topojson.v1.min.js"></script>
        <script language="javascript" type="text/javascript" src="js/control.js"></script>
        <link href="css/base.css" type="text/css" rel="stylesheet">
    </head>
    <body>
        <div id="filters"><button type="button" onclick="clearfilters()">Clear</button></div>
        <div id="container_map"></div>
        <div id="container_updates"></div>
        <table style="width:1200px">
            <tr>
                <td><div id="container1" class="plot_holder"></div></td>
                <td><div id="container2" class="plot_holder"></div></td>
                <td><div id="container3" class="plot_holder" style="height:500px;width:500px"></div></td>
                <td><div id="container4" class="plot_holder"></div></td>
            </tr>
        </table>
    </body>
</html>

We can make several observations from the code:

1. Our code makes use of the following tools: jquery, JIT, highstock, d3js, topojson

2. We use a Javascript file (control.js) to control interactions and querying processes

3. There are several div elements, which will later be populated with visualisations and displays.

Containers 1–4 and container_map are the five visualisation containers, while container_updates provides a textual display of social media posts. We proceed with building a very basic system for visualising social media data. Create a CSS file, “base.css”, in the css folder with the following content:

/* chart containers (selector reconstructed from the class used in base.html) */
.plot_holder {
    border-color: black;
    border-width: 2px;
    height: 500px;
    min-width: 500px;
    background-color: red;
    text-align: center;
}

.viztable {
}

/* country shapes on the d3 map (selector reconstructed; see control.js) */
.country {
    stroke: #fff;
    stroke-width: 1.5px;
}

#container_map {
    background: #F0F8FF;
    width: 900px;
    overflow: auto;
}

.hidden {
    display: none;
}



Step 1. Create a control.js file with the following basic code

var solrUrl = "http://localhost:8983/solr/tweets/select";
var pieField = "hashtag";
var timeField = "createdate";
var timelineFacet = "hashtag";
var barField = "user";
var networkRelation = "mentions";
var userField = "user";
var piechart;

window.onload = function() {
    // enter function calls here
    listData();
};

function visualiseData() {
}

function listData() {
    var querystring = "*:*";
    // loc is a global location filter, declared in a later step
    if (typeof loc !== "undefined" && loc !== null) {
        // escape '-' so Solr does not treat it as a negation operator
        querystring = "location:" + loc.replace(/-/g, "\\-");
    }
    var htmlstring = "<ul>";
    $.ajax({
        'url': solrUrl,
        'data': {'wt': 'json', 'q': querystring, 'rows': 200},
        'success': function(data) {
            var docs =;
            for (var docindex in docs) {
                htmlstring += "<li><span style='color:blue'>" + docs[docindex].user + "</span>: " + docs[docindex].content + "</li>";
            }
            htmlstring += "</ul>";
            document.getElementById("container_updates").innerHTML = htmlstring;
        },
        'dataType': 'jsonp',
        'jsonp': 'json.wrf'
    });
}



As can be seen, the listData() function is called when the window is loaded. This function makes a simple Ajax query to the Solr service for 200 Tweets. The results are rendered as a list of tweets, as shown here:

A list of Tweets, as built by the listData() function
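One detail of listData() worth a note: Solr treats ‘-’ as a negation operator, so the minus sign in negative coordinates must be escaped before querying. A small sketch of that step (locationQuery is our name for illustration; the tutorial code does this inline):

```javascript
// Sketch of the query-escaping step inside listData(): without escaping,
// Solr would read the '-' in a negative coordinate as a NOT operator.
function locationQuery(loc) {
    return "location:" + loc.replace(/-/g, "\\-");
}

var q = locationQuery("53.38,-1.47");
// the minus sign is now escaped with a backslash: location:53.38,\-1.47
```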


Step 2. Add a pie chart on container1. Add the following function to the control.js file

function showPieChart(field) {
    var querystring = "*:*";
    var piefield = field;
    $.ajax({'url': solrUrl,
        'data': {'wt': 'json', 'facet': 'true', 'q': querystring, 'rows': 0, 'facet.field': field, 'facet.mincount': 1, 'facet.limit': 15},
        'success': function(data) {
            var seriesdata = [];
            var allfields = data.facet_counts.facet_fields[piefield];
            for (var i = 0; i < allfields.length; i = i + 2) {
                var fieldvalue = allfields[i];
                var value = allfields[i + 1];
                seriesdata.push([fieldvalue, value]);
            }
            if (querystring.length < 5) {
                piechart = new Highcharts.Chart({
                    chart: {
                        plotBackgroundColor: null,
                        plotBorderWidth: null,
                        plotShadow: false,
                        renderTo: 'container1'
                    },
                    title: {
                        text: "plot of " + piefield
                    },
                    tooltip: {
                        pointFormat: '{}: <b>{point.percentage}%</b>',
                        percentageDecimals: 1
                    },
                    plotOptions: {
                        pie: {
                            allowPointSelect: true,
                            cursor: 'pointer',
                            dataLabels: {
                                enabled: true,
                                color: '#000000',
                                connectorColor: '#000000',
                                formatter: function() {
                                    return '<b>' + + '</b>';
                                }
                            },
                            events: {
                                click: function(e) {
                                    // filtering interactions can be added here
                                }
                            }
                        }
                    },
                    series: [{
                            type: 'pie',
                            name: 'count',
                            data: seriesdata
                    }]
                });
            }
            else {
                // a filter query is active: update the existing chart
                // (reconstructed; the original else-branch was lost)
                piechart.series[0].setData(seriesdata);
            }
        },
        'dataType': 'jsonp',
        'jsonp': 'json.wrf'
    });
}


This function initially makes a query to the Solr service: a simple faceted query whose facet field is “pieField”, which is in turn set to “hashtag”. The response of the query is a JSON object with the distribution of hashtags across all the Tweets. The remainder of the code is an updated version of the code available from the Highcharts pie chart demo.
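The facet-parsing loop at the heart of showPieChart() can be distilled into a small helper (facetPairs is our name for illustration): Solr returns facet counts as a flat array of alternating values and counts, while the Highcharts pie series wants [value, count] pairs.

```javascript
// Sketch: convert Solr's flat facet array into the [name, count] pairs
// that a Highcharts pie series expects.
function facetPairs(facetArray) {
    var pairs = [];
    for (var i = 0; i < facetArray.length; i += 2) {
        pairs.push([facetArray[i], facetArray[i + 1]]);
    }
    return pairs;
}

var series = facetPairs(["flood", 12, "fire", 7, "police", 3]);
// series is [["flood", 12], ["fire", 7], ["police", 3]]
```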

Step 3. Make a call to the function in the visualiseData() function

function visualiseData() {
    showPieChart(pieField);
}

Step 4. Follow a similar step to add a bar chart, using the following function:

function showBarChart(field) {
    var querystring = "*:*";
    var barfield = field;
    $.ajax({
        'url': solrUrl,
        'data': {'wt': 'json', 'facet': 'true', 'q': querystring, 'facet.field': field, 'facet.mincount': 1, 'facet.limit': 15},
        'success': function(data) {
            var counts = [];
            var categories = [];
            var allfields = data.facet_counts.facet_fields[barfield];
            for (var i = 0; i < allfields.length; i = i + 2) {
                categories.push(allfields[i]);
                counts.push(allfields[i + 1]);
            }
            new Highcharts.Chart({
                chart: {
                    type: 'bar',
                    renderTo: 'container2' // second visualisation container
                },
                title: {
                    text: 'No. of tweets by ' + field
                },
                xAxis: {
                    categories: categories,
                    title: {
                        text: null
                    }
                },
                yAxis: {
                    min: 0,
                    title: {
                        text: 'No. of tweets',
                        align: 'high'
                    },
                    labels: {
                        overflow: 'justify'
                    }
                },
                tooltip: {
                    valueSuffix: ''
                },
                plotOptions: {
                    bar: {
                        dataLabels: {
                            enabled: true
                        },
                        events: {
                            click: function(e) {
                                // filtering interactions can be added here
                            }
                        }
                    }
                },
                legend: {
                    layout: 'vertical',
                    align: 'right',
                    verticalAlign: 'top',
                    x: -100,
                    y: 100,
                    floating: true,
                    borderWidth: 1,
                    backgroundColor: '#FFFFFF',
                    shadow: true
                },
                credits: {
                    enabled: false
                },
                series: [{
                        name: field,
                        data: counts
                }]
            });
        },
        'dataType': 'jsonp',
        'jsonp': 'json.wrf'
    });
}

and add the function call, showBarChart(barField), in the visualiseData() function.

Following this step, we can observe three div elements being populated as shown:

The empty space on the top left is reserved for a geographical map showing the locations of origin of the Social Media posts. There are several options for showing geographical maps, such as Google Maps and MapBox, where map tile images are overlaid with visual elements representing data. Alternatively, several Javascript tools, such as Highmaps and d3.js, can load shape files to visualise locations. In our example, we use the latter.
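For intuition, the maths behind a Mercator projection such as d3.geo.mercator() can be sketched as follows. This is the textbook formula, not d3's exact internal translation and scaling; the 900x450 size matches our map container.

```javascript
// Sketch of the Mercator maths: longitude maps linearly to x, latitude is
// stretched through a log-tangent so parallels spread out towards the poles.
function mercator(lon, lat, width, height) {
    var x = (lon + 180) / 360 * width;
    var latRad = lat * Math.PI / 180;
    var y = (1 - Math.log(Math.tan(Math.PI / 4 + latRad / 2)) / Math.PI) / 2 * height;
    return [x, y];
}

var p = mercator(0, 0, 900, 450);
// p is approximately [450, 225]: the equator/prime-meridian point lands mid-map
```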

We look at this example of visualising the world map using d3.js. In the example, a world map is initially loaded (“…world-110m2.json”) via a d3.json call. In our example, we use the file that you can find within the data folder, present in the zip archive of this tutorial. Several other d3js tutorials were also followed to develop the map presented now. Add the following functions to the control.js file:

function loadMap() {
    setupMap(width, height);"resize", throttle);
}

function setupMap(width, height) {
    projection = d3.geo.mercator()
        .translate([(width / 2), (height / 2)])
        .scale(width / 2 / Math.PI);
    path = d3.geo.path().projection(projection);
    svg ="#container_map").append("svg")
        .attr("width", width)
        .attr("height", height)
        .call(zoom)        // attach the zoom behaviour declared below
        .on("click", click);
    g = svg.append("g");
    d3.json("data/world_map.json", function(error, world) {
        var countries = topojson.feature(world, world.objects.countries).features;
        topo = countries;
        draw(topo);
    });
}

function draw(topo) {
    g.append("path")
        .datum({type: "LineString", coordinates: [[-180, 0], [-90, 0], [0, 0], [90, 0], [180, 0]]})
        .attr("class", "equator")
        .attr("d", path);
    var country = g.selectAll(".country").data(topo);
    country.enter().insert("path")
        .attr("class", "country")
        .attr("d", path)
        .attr("id", function(d, i) { return; })
        .attr("title", function(d, i) { return; })
        .style("fill", function(d, i) { return "lightgrey"; });
    // offsets for tooltips
    var offsetL = document.getElementById('container_map').offsetLeft + 20;
    var offsetT = document.getElementById('container_map').offsetTop + 10;
    country
        .on("mousemove", function(d, i) {
            var mouse = d3.mouse(svg.node()).map(function(d) { return parseInt(d); });
            tooltip.classed("hidden", false)
                .attr("style", "left:" + (mouse[0] + offsetL) + "px;top:" + (mouse[1] + offsetT) + "px")
                .html(;
        })
        .on("mouseout", function(d, i) {
            tooltip.classed("hidden", true);
        });
}

function redraw() {
    width = document.getElementById('container_map').offsetWidth;
    height = width / 2;'svg').remove();
    setupMap(width, height);
}

function move() {
    var t = d3.event.translate;
    var s = d3.event.scale;
    zscale = s;
    var h = height / 4;
    t[0] = Math.min(
        (width / height) * (s - 1),
        Math.max(width * (1 - s), t[0])
    );
    t[1] = Math.min(
        h * (s - 1) + h * s,
        Math.max(height * (1 - s) - h * s, t[1])
    );
    g.attr("transform", "translate(" + t + ")scale(" + s + ")");
    // adjust the country hover stroke width based on zoom level
    d3.selectAll(".country").style("stroke-width", 1.5 / s);
    d3.selectAll(".gpoint circle").attr("r", 3.5 / s);
}

var throttleTimer;
function throttle() {
    window.clearTimeout(throttleTimer);
    throttleTimer = window.setTimeout(function() {
        redraw();
    }, 200);
}

// geo translation on mouse click in map
function click() {
    var latlon = projection.invert(d3.mouse(this));
    console.log(latlon);
}

Add the following lines to the control.js file, after line 8:

var loc;
var colorhash = {};
var zoom = d3.behavior.zoom()
        .scaleExtent([1, 15])
        .on("zoom", move);
var width = 900;
var height = width / 2;
var topo,projection,path,svg,g;
var graticule = d3.geo.graticule();
var tooltip ="#container_map").append("div").attr("class", "tooltip hidden");

Finally, call the loadMap() function in the window.onload function previously defined:

window.onload = function() {
    listData();
    visualiseData();
    loadMap();
};

The code up to this point just builds the d3js map and renders it in a div. However, in order to show the tweets, we add another function call, search(), at the end of the setupMap() function, and add the following functions:

function clearAllPoints() {
    // remove any points previously drawn on the map
    d3.selectAll(".gpoint").remove();
}

function search() {
    clearAllPoints();
    $.ajax({
        'url': solrUrl,
        'data': {'wt': 'json', '': 'map', 'q': 'location:*', 'rows': 0, 'facet': true, 'facet.query': 'location:*', 'facet.field': 'location'},
        'success': function(data) {
            // the location facet is a flat array of alternating
            // "lat,lon" strings and their counts
            var docs = data.facet_counts.facet_fields.location;
            for (var docindex = 0; docindex < docs.length; docindex += 2) {
                var latlon = docs[docindex].split(",");
                addpoint(latlon[0], latlon[1], docs[docindex + 1]);
            }
            searchLinkedData();
        },
        'dataType': 'jsonp',
        'jsonp': 'json.wrf'
    });
}

// Once the Solr search has completed, query the Linked Data endpoint.
// (searchLinkedData is our name for this reconstructed second step.)
function searchLinkedData() {
    // Prefix URIs are omitted here; the full query is in the final
    // tutorial download.
    var query = "PREFIX skos: <>" +
                "PREFIX dbo: <>" +
                "PREFIX sdmx-dimension: <>" +
                "PREFIX sdmx-measure: <>" +
                "PREFIX year: <>" +
                "PREFIX property: <> " +
                "PREFIX indicator: <> " +
                "PREFIX g-meta: <>" +
                "PREFIX g-indicators: <>" +
                "SELECT ?country ?countryURI ?percentageOfGDP " +
                "WHERE {" +
                "  GRAPH g-indicators: {" +
                "    ?observations " +
                "      property:indicator indicator:2.1.3_SHARE.HYDRO ;" +
                "      sdmx-dimension:refArea ?countryURI ;" +
                "      sdmx-dimension:refPeriod year:2008 ;" +
                "      sdmx-measure:obsValue ?percentageOfGDP ." +
                "  }" +
                "  GRAPH g-meta: {" +
                "    ?countryURI " +
                "      a dbo:Country ;" +
                "      skos:prefLabel ?country ." +
                "  }" +
                "} " +
                "ORDER BY DESC(?percentageOfGDP) " +
                "LIMIT 3000";
    $.ajax({
        'url': sparqlUrl,
        // parameter names depend on the endpoint; 'query' plus a JSON
        // output format is typical
        'data': {'query': query, 'output': 'json'},
        'success': function(data) {
            var results = data.results.bindings;
            var resultmap = {};
            for (var index in results) {
                resultmap[results[index].country.value] = results[index].percentageOfGDP.value;
            }
            // fillscale is a colour scale for the indicator values
            // (defined elsewhere in the final tutorial code)
            d3.selectAll(".country").style("fill", function(d) {
                if (resultmap[] !== null && resultmap[] !== undefined) {
                    console.log(fillscale(resultmap[]) + " " + resultmap[]);
                    return fillscale(resultmap[]);
                }
                else return "#eeeeee";
            });
        },
        'dataType': 'jsonp'
    });
}

function addpoint(lat, lon, text) {
    var offsetL = document.getElementById('container_map').offsetLeft + 20;
    var offsetT = document.getElementById('container_map').offsetTop + 10;
    var gpoint = g.append("g").attr("class", "gpoint").attr("count", text).attr("lat", lat).attr("lon", lon);
    var x = projection([lat, lon])[0];
    var y = projection([lat, lon])[1];
    gpoint.append("svg:circle")
        .attr("cx", x)
        .attr("cy", y)
        .attr("r", 3.5)
        .attr("fill", function(text) {
            // scale is a colour scale for point counts
            // (defined elsewhere in the final tutorial code)
            return scale(gpoint.attr("count"));
        });
    gpoint.on("mousemove", function(d, i) {
        var mouse = d3.mouse(svg.node()).map(function(d) { return parseInt(d); });
        tooltip.classed("hidden", false)
            .attr("style", "left:" + (mouse[0] + offsetL) + "px;top:" + (mouse[1] + offsetT) + "px")
            .html(gpoint.attr("count"));
    })
    .on("mouseout", function(d, i) {
        tooltip.classed("hidden", true);
    });
}


The search() function first queries the Solr data to find all tweets with locations and creates points on the map. Once the first search is completed, another search is performed on a Linked Data endpoint, which we define at the beginning of the control.js file as

var sparqlUrl = "";

The query is passed to the worldbank’s SPARQL endpoint. The query looks for countries and their share of hydro energy in total energy consumption. As it stands, this does not provide much emergency-related information, but it serves as an example of how pieces of information from Linked Open Data, such as population or economic indicators, can be incorporated into the map.
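The way search() turns the SPARQL JSON results into a lookup table for colouring the countries can be sketched in isolation (bindingsToMap is our name for illustration, and the indicator values below are made up):

```javascript
// Sketch: SPARQL JSON results arrive as an array of variable bindings;
// we flatten them into a country -> value map used to fill the countries.
function bindingsToMap(bindings) {
    var resultmap = {};
    for (var index in bindings) {
        resultmap[bindings[index].country.value] =
            bindings[index].percentageOfGDP.value;
    }
    return resultmap;
}

var m = bindingsToMap([
    {country: {value: "Norway"}, percentageOfGDP: {value: "12.3"}},
    {country: {value: "Brazil"}, percentageOfGDP: {value: "4.5"}}
]);
// m["Norway"] is "12.3"
```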

This completes the tutorial, and you should be able to view the final interface (without the network graph), as shown here:

The final tutorial material is downloadable here; please feel free to make changes and explore different data sources. There is no need to edit any files in the final tutorial material, and it should work directly.