Tutorial part II – Calculating Deforestation per State

by Benedikt Gräler

This tutorial part II of using the SPARQL Package for R with the Brazilian Amazon Rainforest Data will illustrate the following:

  1. retrieval of the data including the pixel’s boundaries
  2. getting administrative boundaries from GADM
  3. aggregation of deforestation measures by state
  4. plotting the findings

The more advanced R users might want to immediately download the complete R-script: linkedAmazon_2.R from GitHub.

Data Retrieval

Load the libraries (SPARQL and sp)


The following few lines are identical with the first tutorial. In case you are using the same R workspace, you can skip them and continue with Building the Spatialdataframe.

Set the endpoint providing the linked data:

endpoint <- "http://spatial.linkedscience.org/sparql"

Due to the data size, it is recommended to query the triples piece-wise by initiating the data retrieval by

q <- "SELECT ?cell ?row ?col ?polygon
  ?cell a <http://linkedscience.org/lsv/ns#Item> ;
       <http://spatial.linkedscience.org/context/amazon/Lin> ?row ;
       <http://spatial.linkedscience.org/context/amazon/Col> ?col ;
       <http://observedchange.com/tisc/ns#geometry> ?polygon .
res <- SPARQL(url=endpoint, q)$results

and running a loop over all deforestation variables from 2002 to 2008:

for(var in c("DEFOR_2002", "DEFOR_2003", "DEFOR_2004", "DEFOR_2005",
             "DEFOR_2006", "DEFOR_2007", "DEFOR_2008")) {
  tmp_q <- paste("SELECT ?cell ?", var, "\n", "WHERE { \n ?cell a <http://linkedscience.org/lsv/ns#Item> ;\n <http://spatial.linkedscience.org/context/amazon/",var,"> ?",var," .\n }\n",sep="")
  cat(tmp_q) # plot the query to check its spelling
  res <- merge(res, SPARQL(endpoint, tmp_q)$results, by="cell")

Building the Spatialdataframe

Checking the variable polygon of the result object res by


shows that it is provided as a vector of strings describing a quadrangle by its closed poly-line. These strings have to be parsed and coerced to a SpatialPolygons-object from the sp package. This can be done by the following helper-function

createPolygons <- function(alistElem) {
  Polygon( matrix( unlist( lapply( strsplit( strsplit(alistElem, ";")[[1]], ","), as.numeric ) ), ncol=2, byrow=T ) )
polys <- lapply(res$polygon, createPolygons)

and a loop over all pixels (represented as rows in the data.frame res by now):

allPolys <- NULL
for ( i in 1:length(polys)) {
  allPolys <- append(allPolys, list(Polygons(list(polys[[i]]), ID=i)))

spPoly <- SpatialPolygons(allPolys, proj4string=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"))

Similar as in the preceding tutorial part I, we put everything together in a SpatialPolygonsDataFrame by:

amazon <- res
amazon$row <- -facToNum(res$row)
amazon$col <- facToNum(res$col)
for(var in colnames(amazon)[-(1:4)]) {
  amazon[[var]] <- facToNum(amazon[[var]])
amazonPoly <- SpatialPolygonsDataFrame(spPoly, data=amazon)

For testing purposes, plot the deforestation in 2002 by calling:

spplot(amazonPoly, "DEFOR_2002", col.regions=bpy.colors(),
       main="relative deforestation in 2002 by pixel")

Getting the administrative boundaries

The website www.gadm.org provides Global Administrative Areas. Fortunately, the boundaries are besides shape-files and Google’s kmz-files as well provided as RData-files containing a SpatialPolygonsDataFrame. We can load this data directly from R by calling:


The summary of the object lists all variables’ basic characteristics, the bounding box and the proj4string.  The administrative boundaries are as well as our data provided in latitude and longitude based on the WGS84 ellipsoid.

As a first application, we add the state boundary layer on top of the preceding plot using a different colour scheme:

spplot(amazonPoly, "DEFOR_2002",
       sp.layout = list("sp.polygons", gadm),
       main="realtive deforestation in 2002 by pixel")

Aggregation of pixels to states

At first, we will use the approximate of identifying pixels by their centroid. The overlay of the centroids and the state boundaries will yield desired mapping:

getCentroids <- function(spP) {
  t(sapply(spP@polygons, function(x) x@labpt))

amazonCentr <- SpatialPointsDataFrame(coords=getCentroids(amazonPoly),
proj4string(amazonCentr) <- proj4string(amazonPoly) # setting the proj4string
proj4string(gadm) <- proj4string(amazonPoly) # setting the proj4string

We calculate three aggregates, the absolute deforestation, the relative deforestation and the number of pixels for each year and state:

# estimating the absolute deforestation
addDefor <- function(x) {
  sum(x,na.rm=T)*25*25 # assuming cells of approx. 25kmx25km

# leave the meta data out (columns 1:4)
aggAmazonAdd <- aggregate(amazonCentr[,-(1:4)], gadm, addDefor)
aggAmazonRel <- aggregate(amazonCentr[,-(1:4)], gadm, mean, na.rm=T)

aggAmazonAdd$name <- gadm$HASC_1
aggAmazonRel$name <- gadm$HASC_1

# correct for some miss-aligned centroids
countCells <- function(x) sum(!is.na(x))
aggAmazonCount <- aggregate(amazonCentr[,-(1:4)], gadm, countCells)

bool <- which(apply(aggAmazonCount@data[,1:7]>50,1,prod)==1)

aggAmazonAdd <- aggAmazonAdd[bool,]
aggAmazonRel <- aggAmazonRel[bool,]

The third aggregate was only used to drop all values in states that have been assigned with less than 50 pixels.


To illustrate our findings, we will produce three different plots. At first, the absolute deforested area during 2002 per state:

spplot(aggAmazonAdd, "DEFOR_2002", col.regions=rev(heat.colors(21)),
       ylim=amazonPoly@bbox[2,], at=(0:21)*500,
       main="deforested area per state during 2002 [km]²",
       sp.layout=list("sp.text", getCentroids(aggAmazonAdd), aggAmazonAdd$name))

The relative deforested area during 2002 per state:

spplot(aggAmazonRel, "DEFOR_2002", col.regions=rev(heat.colors(20)),
       ylim=amazonPoly@bbox[2,], at=(0:15)/1000,
       main="relative deforested area per state during 2002",
       sp.layout=list("sp.text", getCentroids(aggAmazonRel), aggAmazonRel$name))

And finally a plot showing the development of the relative deforestation per year from 2002 to 2008 per state:

# transposing the dataframe
trdf <- as.data.frame(t(aggAmazonRel@data[-8]))
colnames(trdf) <- as.character(aggAmazonRel@data$name)

       data=trdf, type="l", xlab="year", ylab="relative deforestation",
       auto.key=list(space = "right", points = FALSE, lines = TRUE),
       main="relative deforestation per state during different years")

Leave a Reply