by Tomi Kauppinen and Benedikt Gräler
Tools are major enablers of Linked Science. One crucial aspect is how to access and analyse data, and especially how to retrieve only the part of the data that is relevant to a given research question. Linked Data solves the access part, and SPARQL makes it possible to query just a subset of the data. For statistical computing there are tools like R (an open-source software environment for statistical computing).
Analyzing the processes and operations of complex systems, such as environmental and societal systems, requires 1) well-interconnected data about the system, 2) techniques for statistical computing and other types of reasoning to find new information, and 3) means to explore and visualize this information.
To bridge the two communities of statistical computing and the Semantic Web, there is now the SPARQL package for R, which brings data from Linked Data services into R for analysis. Spatial data in R is commonly represented through objects provided by the package sp, and this tutorial does the same.
This tutorial shows
- how to access Linked Spatio-Temporal Data about the complex environmental system of the Brazilian Amazon Rainforest, including deforestation statistics, and a variety of socio-economic variables,
- how to store the spatiotemporal data in a meaningful way, and
- how to plot and handle it within R.
The complete script is available for download: linkedAmazon_1
Accessing the data
First, make sure that you have recent versions of the two R packages SPARQL and sp installed. Load the two packages by calling:
library(SPARQL)
library(sp)
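Before loading, it can help to confirm that both packages are actually installed. The following is a minimal sketch using only base R; the helper name `missing_packages` is made up for this illustration and is not part of either package.

```r
# Hypothetical helper: return the names of packages that are not installed.
# Note that requireNamespace() loads a package's namespace if it is found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("SPARQL", "sp")
absent <- missing_packages(needed)
if (length(absent) > 0) {
  message("Please install first: ", paste(absent, collapse = ", "))
} else {
  message("Both packages are available.")
}
```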
Define the endpoint that will provide you with the triples by
endpoint <- "http://spatial.linkedscience.org/sparql"
To keep each XML response small, the data is queried piece-wise. The first query fetches every cell together with its row, column and border polygon:
q <- "SELECT ?cell ?row ?col ?polygon
WHERE {
?cell a <http://linkedscience.org/lsv/ns#Item> ;
<http://spatial.linkedscience.org/context/amazon/Lin> ?row ;
<http://spatial.linkedscience.org/context/amazon/Col> ?col ;
<http://linkedscience.org/lsv/ns#border> ?polygon .
}"
res <- SPARQL(url=endpoint, q)$results
and completed within a loop over all deforestation variables
for(var in c("DEFOR_2002", "DEFOR_2003", "DEFOR_2004", "DEFOR_2005", "DEFOR_2006",
             "DEFOR_2007", "DEFOR_2008")) {
  # build the query for this variable and join its result onto res by the cell URI
  tmp_q <- paste("SELECT ?cell ?",var,"\n WHERE { \n ?cell a <http://linkedscience.org/lsv/ns#Item> ;\n <http://spatial.linkedscience.org/context/amazon/",var,"> ?",var," .\n }\n",sep="")
  cat(tmp_q)
  res <- merge(res, SPARQL(endpoint, tmp_q)$results, by="cell")
}
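The loop does two things per variable: it pastes the variable name into a query template, and it joins the new column onto `res` by the shared `cell` column. Both steps can be tried offline. The sketch below uses invented cell identifiers and values in place of the endpoint's response; the helper `build_var_query` and the toy data frames are assumptions made for illustration only.

```r
# Hypothetical helper that builds the same per-variable query text as the loop.
build_var_query <- function(var) {
  paste("SELECT ?cell ?", var, "\n WHERE { \n",
        " ?cell a <http://linkedscience.org/lsv/ns#Item> ;\n",
        " <http://spatial.linkedscience.org/context/amazon/", var, "> ?", var, " .\n }\n",
        sep = "")
}
cat(build_var_query("DEFOR_2002"))

# Toy stand-ins for two query results (all values are invented).
base  <- data.frame(cell = c("c1", "c2", "c3"), row = c(1, 1, 2), col = c(1, 2, 1))
extra <- data.frame(cell = c("c3", "c1", "c2"), DEFOR_2002 = c(0.03, 0.01, 0.02))

# merge() matches rows on "cell", so the row order of each result does not matter.
joined <- merge(base, extra, by = "cell")
print(joined)
```

By default `merge()` keeps only the cells present in both inputs. If a variable were missing for some cells, those cells would drop out of the result unless `all.x = TRUE` is passed.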
Creating a SpatialPixelsDataFrame
The next lines will coerce the factors to numerical values:
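## helper function ##
# Convert a factor (or character vector) of number labels to numeric values.
# as.numeric() applied directly to a factor would return the level codes instead.
facToNum <- function(x){
  as.numeric(as.character(x))
}
amazon <- res
# negate the row index so that it can serve as the y coordinate
amazon$row <- -facToNum(res$row)
amazon$col <- facToNum(res$col)
# columns 1 to 4 are cell, row, col and polygon; the rest are the DEFOR_* variables
for(var in colnames(amazon)[-(1:4)]) {
  amazon[[var]] <- facToNum(amazon[[var]])
}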
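The helper is needed because calling `as.numeric()` directly on a factor returns the internal level codes, not the numbers that the labels spell out. A short illustration with invented values:

```r
f <- factor(c("10", "2", "33", "2"))

# Level codes: positions in the (alphabetically sorted) level set, not the values.
codes <- as.numeric(f)

# Look up the label behind each code, then convert the label to a number.
values_via_levels <- as.numeric(levels(f)[as.numeric(f)])

# Same result via as.character(); this form also accepts character input.
values_via_char <- as.numeric(as.character(f))

print(codes)
print(values_via_levels)
print(values_via_char)
```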
Assigning coordinates to a data.frame will result in a Spatial-object. Setting the type to gridded will produce a SpatialPixelsDataFrame:
coordinates(amazon) <- ~ col+row
gridded(amazon) <- TRUE
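The two steps can be followed on a small example. This is a sketch on an invented 2 x 2 grid, assuming the sp package is installed; the object name `toy` and its values are not part of the Amazon data.

```r
library(sp)

# Invented 2 x 2 grid; row is negated as in the tutorial so that
# larger row numbers end up further down (smaller y).
toy <- data.frame(col = c(1, 2, 1, 2),
                  row = -c(1, 1, 2, 2),
                  DEFOR_2002 = c(0.01, 0.02, 0.00, 0.05))

coordinates(toy) <- ~ col + row   # data.frame becomes a SpatialPointsDataFrame
class_after_coords <- class(toy)[1]

gridded(toy) <- TRUE              # regular grid of points becomes a SpatialPixelsDataFrame
class_after_grid <- class(toy)[1]

print(c(class_after_coords, class_after_grid))
print(names(toy@data))            # the coordinate columns have moved out of @data
```

Because `col` and `row` leave the attribute table once they become coordinates, the `@data` slot of `amazon` starts with `cell` and `polygon`, followed by the deforestation variables.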
Plotting and handling the data
As a first application, we produce a map showing relative deforestation per pixel during 2002 by:
spplot(amazon,"DEFOR_2002",col.regions=rev(heat.colors(17))[-1], at=(0:16)/100, main="relative deforestation per pixel during 2002")
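The `at` and `col.regions` arguments in this call belong together: `at` gives the break points of the colour scale, and `col.regions` supplies the colours for the intervals between them. A small base R check of the numbers used above (the value 0.034 is invented for illustration):

```r
breaks <- (0:16) / 100               # 17 break points: 0.00, 0.01, ..., 0.16
cols   <- rev(heat.colors(17))[-1]   # reversed palette with its first colour dropped

n_intervals <- length(breaks) - 1    # intervals between consecutive break points
n_colours   <- length(cols)          # one colour per interval

print(c(n_intervals, n_colours))

# Index of the interval that a relative deforestation of 0.034 falls into.
print(findInterval(0.034, breaks))
```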
The full data can be shown as time series of maps for the years from 2002 to 2008.
spplot(amazon, c("DEFOR_2002", "DEFOR_2003", "DEFOR_2004", "DEFOR_2005",
"DEFOR_2006", "DEFOR_2007","DEFOR_2008"),
col.regions=rev(heat.colors(26))[-1], at=(0:20)/80, as.table=TRUE,
main="relative deforestation per pixel")
Finally, we calculate the cumulative absolute deforestation per year, summed over all grid cells, and plot our findings:
# assuming grid cells of 25km x 25km
cumDefor <- apply(amazon@data[,-c(1,2)],2,function(x) sum(x)*25*25)
plot(2002:2008,cumDefor,type="b", col="blue", ylab="Deforestation [km²]", xlab="year", main="Deforestation from 2002 to 2008", ylim=c(0,26000))
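The conversion from relative to absolute deforestation is a sum of per-cell fractions multiplied by the area of one cell. A worked example with invented fractions for three cells and two years, under the same assumption of 25 km x 25 km cells:

```r
cell_area_km2 <- 25 * 25   # assumed cell size of 25 km x 25 km, i.e. 625 km² per cell

# Invented relative deforestation (fraction of each cell) for three cells.
rel <- data.frame(DEFOR_2002 = c(0.01, 0.02, 0.00),
                  DEFOR_2003 = c(0.02, 0.00, 0.04))

# Column-wise: sum the fractions, then scale by the area of one cell.
abs_km2 <- apply(rel, 2, function(x) sum(x) * cell_area_km2)
print(abs_km2)
```

For the first column the fractions sum to 0.03, so the absolute figure is 0.03 * 625 = 18.75 km²; the second column gives 0.06 * 625 = 37.5 km².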
Once you have finished part I of the tutorial, you may want to continue with part II of the SPARQL tutorial.
Credits for the script and the Linked Brazilian Amazon Rainforest Data used in this tutorial.
Linked Brazilian Amazon Rainforest Data:
- Aggregation of the spatiotemporal Brazilian Amazon Rainforest data made by Giovana M. de Espindola [1]
- Publication of the data using Linked Open Data technologies made by Dr. Tomi Kauppinen [2]
- Conversion of the coordinates to WGS84: Jim Jones and Alber Sanchez [3]
Script for R to analyze the data:
- Original version by Giovana M. de Espindola. [1]
- Linked Amazon Data enabled version that uses the SPARQL package by Dr. Tomi Kauppinen [2] and Benedikt Gräler [3]
For citing this tutorial please use the following publication:
- Tomi Kauppinen, Giovana Mira de Espindola and Benedikt Graeler. Sharing and Analyzing Remote Sensing Observation Data for Linked Science. In poster proceedings of the 9th Extended Semantic Web Conference 2012 (ESWC2012), Heraklion, Crete, Greece, May, 2012. (to appear)
[1] Earth System Science Center, National Institute for Space Research (INPE),
Av dos Astronautas 1758, 12227-010 Sao Jose dos Campos Brazil
[2] http://kauppinen.net/tomi
Institute for Geoinformatics, University of Muenster,
Weseler Strasse 253, 48151 Muenster, Germany
[3] Institute for Geoinformatics, University of Muenster,
Weseler Strasse 253, 48151 Muenster, Germany


