Workshop on GIScience in the Big Data Age 2012

The Workshop on GIScience in the Big Data Age 2012 (GIBDA 2012) will be held in conjunction with the Seventh International Conference on Geographic Information Science (GIScience 2012) in Columbus, Ohio, USA, on September 18, 2012.

Workshop Description and Scope

The information universe is growing rapidly, with new data created faster than we can store it. This calls for improved methods to retrieve, filter, integrate, and share data. The vision of a data-intensive science hopes that the open availability of data with a higher spatial, temporal, and thematic resolution will enable us to better address complex scientific and social questions. On the downside, understanding, sharing, and reusing these data become more challenging. Big Data is big not only because it involves a huge amount of data, but also because of the high dimensionality and inter-linkage of these data sets. The on-the-fly integration of heterogeneous data from various sources has been named one of the frontiers of Digital Earth research, Bioinformatics, the Digital Humanities, and other emerging research visions.

From a more technical perspective, a knowledge infrastructure is required to handle Big Data. Currently, the most promising approach is the Linked Data cloud. While the Web has changed with the advent of the Social Web from mostly authoritative content towards increasing amounts of user-generated content, it is essentially still about linked documents. These documents provide structure and context for the described data and ease their interpretation. In contrast, the upcoming Data Web is about linking data, not documents. Such data sets are not bound to a specific document but can be easily combined and used outside of their original context. With a growth rate of millions of new facts encoded as RDF triples per month, the Linked Data cloud allows users to answer complex queries spanning multiple sources. Because the data are decoupled from their original creation context, semantic interoperability, identity resolution, and ontologies are central methodologies to ensure consistency and meaningful results.

Space and time are fundamental ordering relations to structure such data and provide an implicit context for their interpretation. Prominent geo-related Linked Data hubs include as well as the Linked Geo Data project, which provides an RDF serialization of OpenStreetMap. Furthermore, many other Linked Data sources contain location references, e.g., observation data provided by sensors.

This full-day workshop is a follow-up to the successful first workshop on Linked Spatiotemporal Data at GIScience 2010. While that first workshop was centered around Linked Data and geo-ontologies, the GIBDA 2012 workshop takes a broader perspective by highlighting data-intensive science as the research vision and Linked Data as a promising knowledge infrastructure. We hope that the workshop will help to better define the data, knowledge representations, infrastructure, reasoning methodologies, and tools needed to link and query massive data based on their spatial and temporal characteristics.

List of Relevant Topics

Topics of interest for the Linked Spatiotemporal Data workshop include (but are not limited to):

  • Mining Big Data

    • Learning geo-ontologies out of massive data
    • Abduction-based frameworks and systems
    • Mining Location-based Social Networks
    • Studying the geo-indicativeness of massive, semi-structured data
    • Analogy-based search in Big Data
    • Semantic heterogeneity and ontology alignment
    • Semantics-enabled geo-statistics
  • Retrieving and browsing of Linked Spatiotemporal Data

    • Learning Linked Spatiotemporal Data from existing sources
    • Spatiotemporal indexing of Linked Data
    • Harvesting Linked Data from heterogeneous sources
    • Spatial extensions to query languages (e.g., GeoSPARQL)
    • Visualizing and browsing through Linked Spatiotemporal Data
  • Big Data and Volunteered Geographic Information (VGI)

    • Spatiotemporal aspects of data quality, trust, and provenance
    • Tag and vocabulary recommendations for annotating VGI
    • Maintenance of outgoing links
  • Application of Linked Spatiotemporal Data

    • Linked Data and Sensor Web Enablement (SWE)
    • Linked Data and mobile applications
    • Linked Data gazetteers and Points Of Interest
    • Linked Data in the domain of cultural heritage research
  • Integration and Interoperation of Linked Spatiotemporal Data

    • Ontologies and vocabularies to support interoperability
    • Geo-Ontology Design Patterns
    • Identity assumptions and resolution for data fusion and integration
    • The role of space and time to structure Linked Data
    • Versioning of spatiotemporal data
    • Semantic annotation and Microformats
    • Adding contextual information to Linked Data

Workshop Format and Structure

The full-day workshop will focus on intensive discussions that set a roadmap towards publishing, structuring, retrieving, and consuming Linked Spatiotemporal Data and towards understanding how GIScience can contribute to the vision of a data-intensive science. The workshop will accept three kinds of contributions: full research papers presenting new work in the indicated areas, statements of interest, and data challenge papers. Research papers will be selected based on review results adhering to classical scientific quality criteria, while statements of interest should raise questions, present visions, and point to open gaps. Statements of interest will nevertheless also be reviewed to ensure the quality and clarity of the presented ideas.

We also welcome demonstrations of existing tools, applications, and geo-ontologies. Details for the data challenge are given below. The presentation time per speaker will be restricted to 5 minutes for statements of interest and 10 minutes for full papers. Based on the presented work, all workshop participants will decide on 2–3 research topics to be discussed in breakout groups. In a final session, the breakout groups will present their findings on research topics and challenges and try to integrate them across the discussed topics.

Submissions and Proceedings

All presented papers will be made available through the workshop web page, the electronic conference proceedings of GIScience 2012, and CEUR-WS. Full research papers should be approximately 7-10 pages, while statements of interest and data challenge papers should be 5-6 pages. Selected papers may be considered for a fast-track submission to the Semantic Web journal by IOS Press.

Please upload your submission using the workshop’s EasyChair web page.

Data Challenge

The website contains a growing collection of metadata for proceedings of conferences on topics related to geographic information science. So far, it contains most of the metadata for the GIScience, COSIT, ACM GIS, and AGILE conference series. Within the GIBDA Data Challenge, we are looking for

  • innovative analyses of the data
  • interactive visualizations
  • approaches for cleaning the data up
  • pattern and topic mining
  • enrichment and interlinking with other datasets (e.g., from the Linked Data cloud)
  • insights into GIScience as a research field
  • adding social roles and aspects

The raw data can be queried via SPARQL using the SPARQL endpoint. Entries to the data challenge are to be submitted through EasyChair as a brief description of the entry, along with a link to the demo, analysis, or dataset. Entries will be evaluated by the program committee based on innovativeness and potential impact. The winner will be awarded a $250 prize and will present at the workshop.
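
For readers new to SPARQL endpoints, the following is a minimal sketch in Python of how such a query could be sent and its result read. It relies only on the standard SPARQL Protocol convention of passing the query in a `query` parameter and on the SPARQL JSON results format. The endpoint URL is a placeholder, and the use of `dcterms:title` is an assumption about the vocabulary; neither is taken from the challenge dataset.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder only: substitute the endpoint published for the data challenge.
ENDPOINT = "http://example.org/sparql"

# Assumption: titles are modelled with Dublin Core terms. Adjust to the real schema.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?paper ?title
WHERE {
  ?paper dcterms:title ?title .
}
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Unbound variables are simply absent from a binding, so skip them.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


def run_query(endpoint, query):
    """Send the query over HTTP and return the flattened rows."""
    with urlopen(build_request(endpoint, query)) as response:
        return bindings_to_rows(json.load(response))
```

Calling `run_query(ENDPOINT, QUERY)` against a real endpoint would return one dict per result row. Request building and result parsing are kept as separate functions so that each can be checked without network access.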

Important Dates

  • Submission due: June 18, 2012
  • Acceptance Notification: July 6, 2012
  • Camera-ready Copies: July 16, 2012


Programme Committee

  • TBA

Related Activities

Please feel free to contact the organizers with further questions at jano @ geog . ucsb. edu.


Tutorial on SPARQL Package for R

Tools are major enablers of Linked Science. One crucial aspect is how to access and analyze data, and especially how to retrieve only the part of the data that is of interest for a given research question. Linked Data solves the access part, and SPARQL makes it possible to query only a subset of the data. For statistical computing there are tools like R. To bridge the two communities, statistical computing and the Semantic Web, there is now a SPARQL Package for R, which brings data from Linked Data services into R for analysis. A tutorial on the SPARQL Package for R is now available. By following it you can learn:

  1. how to access and query Linked Spatiotemporal Data on deforestation statistics for the Brazilian Amazon Rainforest, and
  2. how to analyze it within R (which is a free software environment for statistical computing).
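
The tutorial itself uses R. As a rough illustration of the second step in another language, here is a minimal Python sketch that summarizes query results once they have been retrieved. The column names `year` and `rate` are hypothetical, and the numbers are invented toy values, not real deforestation figures.

```python
from collections import defaultdict
from statistics import mean


def mean_by_key(rows, key, value):
    """Group rows by `key` and average the numeric `value` column."""
    groups = defaultdict(list)
    for row in rows:
        # SPARQL JSON results deliver every value as a string, so convert explicitly.
        groups[row[key]].append(float(row[value]))
    return {k: mean(vs) for k, vs in sorted(groups.items())}


# Toy rows shaped like flattened SPARQL results (invented numbers).
rows = [
    {"year": "2007", "rate": "1.0"},
    {"year": "2007", "rate": "3.0"},
    {"year": "2008", "rate": "4.0"},
]
summary = mean_by_key(rows, "year", "rate")
print(summary)
```

The same grouping and averaging is the kind of operation that R provides out of the box once the query result is available as a data frame.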