Tutorial: Using SPARQL on LODUM Data

by Franz-Benjamin Mocnik

The promise of Linked Open Data technologies is to connect data items that come from different data sources. The data can be accessed in a standardized way, and by following links one gets additional “gratis data”.

As an example, the Linked Open Data University of Münster (LODUM) project aims to preserve and publish university content and scientific data of different kinds. This makes it possible both to preserve the data and to make them accessible. The goal is to achieve this in a way that keeps up with the growing variety of data, models and software as well as the increasing interlinking of scientific fields.

In this tutorial, we show how to query linked data and how you benefit from formulating your own queries based on your needs. Using the LODUM project as an example, we will learn how to

  • query a list of a researcher’s publications
  • query a list of a researcher’s coauthors
  • query a list of a researcher’s coauthors’ publications
  • embed this data in a website

The structure of SPARQL queries

To query linked data, the SPARQL query language is commonly used. SPARQL can query data and, through SPARQL Update, also modify it. As in SQL, there are different query forms (SELECT, ASK, CONSTRUCT, DESCRIBE). As SELECT is the one most often used to query data, we will focus on this query form in the tutorial.

In essence, SELECT queries are structured as follows:

SELECT variables to return WHERE {
  expression to match
}

The expression that is matched consists of triples that are joined by placing a dot between them. Such a triple can contain entities (denoted by URIs), literals (like strings or numbers) as well as variables. Variables can represent any entity or literal and are denoted by a name starting with a question mark.

In the SPARQL query, you list the variables to return. If you want to return all variables, you can use an asterisk instead.

To find a person in the LODUM data, we can use, for example, the following query:

SELECT * WHERE {
  ?person foaf:name "Kuhn, Werner"^^xsd:string
}

Note that the datatype has to be appended to the literal here, in this case xsd:string.
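The structure described above (a list of variables, followed by triples joined by dots) can be sketched in a few lines of Python. This is only an illustration: the helper name build_select and the second triple pattern with the variables ?property and ?value are assumptions made for this example and are not part of the LODUM tutorial.

```python
def build_select(variables, triples):
    """Assemble a SELECT query from variable names and triple patterns."""
    # An empty variable list is rendered as the asterisk, i.e. "return all variables".
    projection = " ".join(variables) if variables else "*"
    # Each triple is a (subject, predicate, object) tuple; triples are joined by a dot.
    body = " .\n".join("  " + " ".join(triple) for triple in triples)
    return f"SELECT {projection} WHERE {{\n{body}\n}}"


query = build_select(
    [],  # no explicit variables, so the asterisk is used
    [
        ("?person", "foaf:name", '"Kuhn, Werner"^^xsd:string'),
        # Illustrative second pattern: variables may also stand for predicates and values.
        ("?person", "?property", "?value"),
    ],
)
print(query)
```

The printed text is an ordinary SELECT query with two triples separated by a dot; passing ["?person"] instead of the empty list would return only that variable.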

When you compose a query, it is very important to know which data is stored in a dataset and which vocabulary is used to describe its entities. To help with both, most datasets provide a browsable list of entities and information about the vocabularies used.

To execute the query, one can use any SPARQL endpoint, i.e. a server that processes your query using the linked data provided on the internet. For performance reasons, it is reasonable to use an endpoint provided by the same institution as the data or (in case you are querying data from different sources) as the main part of your data.

In the case of LODUM, it is recommended to use http://data.uni-muenster.de/sparql. As you may notice, the query form there already defines a base and many prefixes, which allow a shortened notation. However, we will not discuss these prefixes in detail.
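If you prefer to send a query from a program rather than through the website, the following Python sketch shows one way to do it, assuming the endpoint accepts GET requests with a query parameter as described in the SPARQL 1.1 Protocol and can answer in JSON. Both points are assumptions about the LODUM endpoint. The sketch also assumes that the prefixes predefined in the web form are not applied automatically, so foaf and xsd are declared explicitly with their standard namespace URIs. The helper names build_request and run_query are made up for this example.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://data.uni-muenster.de/sparql"  # endpoint named in this tutorial

QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT * WHERE {
  ?person foaf:name "Kuhn, Werner"^^xsd:string
}"""


def build_request(endpoint, query):
    """Create a GET request that carries the query in the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for JSON results; whether the endpoint honours this is an assumption.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint, query):
    """Send the request and decode the JSON answer (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as response:
        return json.load(response)


# Only the request is built here; call run_query(ENDPOINT, QUERY) to actually send it.
request = build_request(ENDPOINT, QUERY)
print(request.full_url)
```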

If you replace the query that is already listed on the website of the SPARQL endpoint with the query above and run it, you should get the following results:

?person
http://data.uni-muenster.de/context/cris/person/9148
http://data.uni-muenster.de/context/csa/teacher/WernerKuhn

Congratulations! You have successfully executed your first query.
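When a query is run from a program, endpoints commonly return such a result table in the SPARQL 1.1 Query Results JSON Format. The following Python sketch reads a hand-written sample in that format which mirrors the two rows above; the sample text and the helper name rows are assumptions for illustration, not output captured from the LODUM endpoint.

```python
import json

# Hand-written sample in the SPARQL 1.1 Query Results JSON Format (not real endpoint output).
SAMPLE = """
{
  "head": {"vars": ["person"]},
  "results": {"bindings": [
    {"person": {"type": "uri",
                "value": "http://data.uni-muenster.de/context/cris/person/9148"}},
    {"person": {"type": "uri",
                "value": "http://data.uni-muenster.de/context/csa/teacher/WernerKuhn"}}
  ]}
}
"""


def rows(result):
    """Turn the bindings into a list of plain dicts, one per result row."""
    variables = result["head"]["vars"]
    return [
        # A variable may be unbound in a row, so it is only copied if present.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in result["results"]["bindings"]
    ]


for row in rows(json.loads(SAMPLE)):
    print(row["person"])
```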

Query a researcher’s publications

The next step is to query a researcher’s publications. Instead of matching the researcher’s name as a string every time we execute a query, it is more convenient to use the URI of the entity that represents the person as a researcher in the LODUM data. To find this URI, we take a look at the result of our first query. The URI is

http://data.uni-muenster.de/context/cris/person/9148

As the base that is already listed in the query window of the SPARQL endpoint is http://data.uni-muenster.de/context/, we can shorten the URI of the researcher’s entity to

<cris/person/9148>

To query the publications as well as their titles and publication dates, we use the following query:

SELECT * WHERE {
  ?pub bibo:producer <cris/person/9148> .
  ?pub dct:title ?title .
  ?pub dct:issued ?year
}

The variable ?pub matches all entities (in this case publications) that are authored by the researcher. The variables ?title and ?year match the publication’s title and year, respectively. (Note that a publication ?pub that has no title or no publication year will not match and thus will not be returned!)
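The remark in parentheses can be made concrete with a toy model in Python. The sketch below is not a SPARQL engine; it only imitates the rule that every triple of the expression has to match. The identifiers pub/1 and pub/2 and the second title are hypothetical toy data.

```python
# Hypothetical toy data: pub/2 has a title but no dct:issued triple.
TRIPLES = [
    ("pub/1", "dct:title", "Affordances as Qualities"),
    ("pub/1", "dct:issued", "2010"),
    ("pub/2", "dct:title", "A hypothetical paper without a year"),
]


def values(subject, predicate):
    """All objects stored for the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def titled_and_dated():
    """Imitates: ?pub dct:title ?title . ?pub dct:issued ?year"""
    subjects = sorted({s for s, _, _ in TRIPLES})
    return [
        (pub, title, year)
        for pub in subjects
        for title in values(pub, "dct:title")
        for year in values(pub, "dct:issued")  # no year means no row at all
    ]


print(titled_and_dated())
```

Only pub/1 is returned, because pub/2 cannot satisfy the second triple.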

The result of your query should look similar to:

?pub ?title ?year
http://data.uni.../26911 "Affordances as Qualities" "2010"
http://data.uni.../17225 "Cognitive Semantics and..." "2007"
http://data.uni.../17177 "Electronic GI Marketplaces" "2004"

To focus on the essentials, this tutorial shows only simplified query results. In particular, we show only the first rows and omit the types of literals.

Query a researcher’s coauthors

In order to query a list of a researcher’s coauthors, we can use the following query:

SELECT ?name WHERE {
  ?pub bibo:producer <cris/person/9148> .
  ?pub bibo:producer ?coauthor .
  ?coauthor foaf:name ?name
}

If we used an asterisk as in the previous queries, we would get a list of all publications by the researcher together with the names of the coauthors of these publications. This list would contain every coauthor once for each publication he or she coauthored.

As we want the result to be a list of the names of the researcher’s coauthors only, we explicitly specify the variables we want to be returned. Thus, we get the following result:

?name
"Schwering, Angela"
"Scheider, Simon"
"Scheider, Simon"

Taking a closer look at the results, you will discover that many coauthors are listed several times. Since we match every publication of the researcher and merely hide the publication entities, each coauthor appears once for every article he or she has coauthored with the researcher.

To circumvent this problem, we introduce the keyword DISTINCT in our query:

SELECT DISTINCT ?name WHERE {
  ?pub bibo:producer <cris/person/9148> .
  ?pub bibo:producer ?coauthor .
  ?coauthor foaf:name ?name
}

This query generates the desired result.
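The effect of DISTINCT can be imitated in Python on a few rows. The names are taken from the result above; the publication labels pub/a to pub/c are hypothetical placeholders, and the order-preserving behaviour is a property of this Python sketch, not something SPARQL guarantees.

```python
# Rows as they are matched for ?pub and ?name (publication labels are hypothetical).
rows = [
    ("pub/a", "Schwering, Angela"),
    ("pub/b", "Scheider, Simon"),
    ("pub/c", "Scheider, Simon"),
]

# SELECT ?name: keep only the name column, still one row per matched publication.
names = [name for _pub, name in rows]

# SELECT DISTINCT ?name: repeated rows are dropped (first-seen order kept here).
distinct_names = list(dict.fromkeys(names))

print(names)
print(distinct_names)
```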

As you may already have expected, the researcher appears as a coauthor of herself/himself! Indeed, the pattern ?pub bibo:producer ?coauthor also matches the researcher, who is a producer of each of her/his own publications. To remove the researcher from the list of her/his coauthors, we have to introduce a new concept called filters. Filters restrict the results to those that satisfy a given condition. In our case, we want to ensure that the coauthor is not the researcher herself/himself.

It may happen that one thing is represented by two or more entities in the data, e.g. when a person is represented both as a researcher and as a lecturer. To express that two entities represent the same thing, one can use the predicate owl:sameAs. If the data source uses this predicate to store these “same as” relations, all entities that represent the researcher match the expression

?coauthor owl:sameAs <cris/person/9148>

To test if the variable ?coauthor (which represents an entity in this case) matches this expression, we write

EXISTS{?coauthor owl:sameAs <cris/person/9148>}

Negating the expression and using it as a filter for the results, we get the following query:

SELECT DISTINCT ?name WHERE {
  ?pub bibo:producer <cris/person/9148> .
  ?pub bibo:producer ?coauthor .
  ?coauthor foaf:name ?name
  FILTER(NOT EXISTS{?coauthor owl:sameAs <cris/person/9148>})
}

Query a researcher’s coauthors’ publications

As a last step, we want to query the researcher’s coauthors’ publications. Using the techniques we have learned so far, we can extend the query that we used for the researcher’s coauthors:

SELECT DISTINCT ?name ?title ?year WHERE {
  ?pub bibo:producer <cris/person/9148> .
  ?pub bibo:producer ?coauthor .
  ?coauthor foaf:name ?name .
  ?pub dct:title ?title .
  ?pub dct:issued ?year
  FILTER(NOT EXISTS{?coauthor owl:sameAs <cris/person/9148>})
}

The result is the following:

?name ?title ?year
"Scheider, Simon" "Grounding Geographic Categories..." "2009"
"Keßler, Carsten" "Semantic Referencing of Geosensor..." "2011"
"Kauppinen, Tomi" "Semantic Referencing of Geosensor..." "2011"

This result list may be very long, depending on the number of the researcher’s coauthors and on how much they have published. There are different ways to make such a list more manageable. It certainly makes sense to order the list by year, title, or name of the coauthor. As in SQL, we can modify the query to order the results and even to limit their number:

SELECT DISTINCT ?name ?title ?year WHERE {
  ?pub bibo:producer <cris/person/9148> .
  ?pub bibo:producer ?coauthor .
  ?coauthor foaf:name ?name .
  ?pub dct:title ?title .
  ?pub dct:issued ?year
  FILTER(NOT EXISTS{?coauthor owl:sameAs <cris/person/9148>})
} ORDER BY DESC(?year) ?title ?name LIMIT 12
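To see what the ordering and the limit do, the same steps can be imitated in Python on the three rows shown earlier in this section (titles are kept truncated as they appear there). This is a sketch under the assumption that years are plain four-digit strings; the helper name order_and_limit is made up for this example.

```python
rows = [
    ("Scheider, Simon", "Grounding Geographic Categories...", "2009"),
    ("Keßler, Carsten", "Semantic Referencing of Geosensor...", "2011"),
    ("Kauppinen, Tomi", "Semantic Referencing of Geosensor...", "2011"),
]


def order_and_limit(rows, limit=12):
    """Imitates: ORDER BY DESC(?year) ?title ?name LIMIT 12"""
    # Newest year first; ties are broken by title, then by name, both ascending.
    ordered = sorted(rows, key=lambda r: (-int(r[2]), r[1], r[0]))
    return ordered[:limit]


for name, title, year in order_and_limit(rows):
    print(year, title, name)
```

The two publications from 2011 come first, ordered by coauthor name because their titles are equal, followed by the one from 2009.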

After having composed this query, one may ask: without SPARQL queries on linked data, could you have obtained the same result in such a short time?

Embed data in a website

You can make use of the query results in different ways. One is simply to read the list of results as it is and to look up further publications on the internet manually. Another is to provide this information to others. In this last section, we will learn how to publish the data on a website. Although there are many ways to do so, we will use one that needs only little effort and works well in most browsers.

Insert the following lines of code in the website header:

<script
  type="text/javascript"
  src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js">
</script>
<script
  type="text/javascript"
  src="http://data.uni-muenster.de/rdf-spark/jquery.spark.js">
</script>

These lines load JavaScript libraries that allow you to embed the results of a SPARQL query easily. The element that is to contain the result has to be written as follows:

<span
  class="spark"
  data-spark-format="http://data.uni-muenster.de/rdf-spark/jquery.spark.lodumlist.js"
  data-spark-param-head="title of your data"
  data-spark-query="insert your SPARQL query here">
</span>

When inserting the query into this piece of code, you should keep two things in mind:

  1. The query should return two variables: the second one is shown as text that links to the first one (which should be a URL).
  2. Some prefixes may already be preloaded. If no results are shown, try leaving some prefixes out. Debug the code in your browser by locating the response to the Ajax request.
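A further practical detail: a query that contains double quotes (such as a quoted name) would end a double-quoted HTML attribute early if it were pasted in unchanged. One way to avoid this is to generate the element with the query escaped, as in the following Python sketch. The helper name spark_span is an assumption for this example, and the sketch relies on the browser decoding the entities before the script reads the attribute, which is standard HTML behaviour.

```python
import html

FORMAT_URL = "http://data.uni-muenster.de/rdf-spark/jquery.spark.lodumlist.js"


def spark_span(title, query):
    """Render the span element with title and query escaped for use in attributes."""
    # html.escape with quote=True replaces &, <, >, " and ' by HTML entities.
    return (
        "<span\n"
        '  class="spark"\n'
        f'  data-spark-format="{FORMAT_URL}"\n'
        f'  data-spark-param-head="{html.escape(title, quote=True)}"\n'
        f'  data-spark-query="{html.escape(query, quote=True)}">\n'
        "</span>"
    )


snippet = spark_span(
    "Publications",
    'SELECT * WHERE { ?person foaf:name "Kuhn, Werner"^^xsd:string }',
)
print(snippet)
```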

For example, we can use the following query to insert a list of a researcher’s coauthors’ publications:

base <http://data.uni-muenster.de/context/>
prefix dct: <http://purl.org/dc/terms/>
prefix person: <http://data.uni-muenster.de/context/cris/person/>
SELECT DISTINCT ?pub ?title WHERE {
  ?pub bibo:producer <cris/person/9148> .
  ?pub bibo:producer ?coauthor .
  ?coauthor foaf:name ?name .
  ?pub dct:title ?title .
  ?pub dct:issued ?year
  FILTER(NOT EXISTS{?coauthor owl:sameAs <cris/person/9148>})
} ORDER BY DESC(?year) ?title ?name LIMIT 12

Finally, we get the desired result, which can be embedded on a website.



