Why to manage and share research data?

Tomi Kauppinen, a blog post for his keynote on “How to manage and share Spatiotemporal Research Data? Supporting learning and reproducibility online via Linked Open Science.” at the 3rd LEARN workshop on Research Data Management, “Make research data management policies work”, organized by the EU-funded project LEARN (Leaders Activating Research Networks), Helsinki, June 28th, 2015.



With open data taking off, along with open access (to publications), the big question remains: where is open science? I argue that for open science to really fly we need both

  1. open research data = data used or produced by scientific efforts
  2. openly accessible methods = methods in publications made reproducible

But how to do this? By whom, where and when? Essentially, first we need to answer the “why” questions – i.e. figuring out the excellent incentives – and then the other important questions (who, what, where, when, how) will naturally follow.

The “why” question calls us to think about the

  1. Incentives for a researcher to open their data. Thus: why would a researcher open his/her data for others? Is it enough that many journals (e.g. PLOS One, see our article as an example) now require data to be available?
  2. Incentives for funders and research managers to request opening of research data. Thus: why would decision-makers ask for the open data?
  3. Incentives for society to ask for open data. Thus: why is it useful to have open research data?

Learning as the key term to answer the why questions

If we look at these why-questions there is an interesting answer that covers all of them. Any answer that creates incentives for opening research data and enabling reproducibility includes one key term: learning.

Interestingly, learning largely happens via reproducing existing efforts (just think about all the textbooks and their numerous examples with enumerated steps for reproducing success).

Thus if we manage to reach the learning layer, the reproducibility will follow.

Now let us get back to our “why” questions, and start from “why” question number 3: what if we agree that society at large wants to learn about what science produces (for example, educating citizens to be well-informed about the world, educating students to be masters in their fields, or educating companies to develop new systems and explore growth options)?

Society calls for better ways to support learning, preferably online, as we now live in a connected world.

Now we also get an answer to “why” question number 2: the funders and managers act as the representatives of society and listen to its requirements. Decision-makers in many countries already require data to be managed and open (for instance the NSF in the USA with its requirement for a Data Management Plan). However, as reported just recently by an expert group for the European Open Science Cloud, there is still “an alarming lack of reproducibility of current published research”. Thus, after carefully listening to society, decision-makers should increasingly ask for the learning and reproducibility layers as a prerequisite for positive funding agreements.

Incentivizing researchers via learning and communication settings

Now to the last but not least “why” question, number 1, concerning our researcher. Clearly, the availability of funding creates an incentive for the researcher to support reproduction of the research, and thus the proper research data management that makes it possible. However, there is a bigger and better answer to the why question. Science is communication and so is learning. If we allow the researcher to move from the rather tedious task of “just research data management” to enabling others (students, citizens, company people) to learn how to actually reproduce interesting research settings, the picture suddenly looks completely different.

Indeed, many researchers are also teachers and look for excellent ways of communicating what they feel is important for students to learn about. By creating a culture shift towards online learning and reproducibility built on excellent research data, we thus create big incentives for researchers to engage in proper research data management.

Let us check some examples

One example is the LODUM (Linked Open Data University of Münster) project, where we showed how to create the data infrastructure and the learning layer. The data created as part of LODUM has been used not only by many student projects but also by newly funded projects. As an example, below is a visualization showing the number of publications by university building (Keßler and Kauppinen, 2012).

Publications analyzed by buildings depict big differences among them.

Clearly, creating useful research data management schemes by opening linked data online calls for a culture shift away from traditional paper-as-the-end-result publishing. To answer this call, Linked Open Science is an approach for interconnecting scientific assets so that reproducibility and learning can happen.

Linked Open Science?

Linked Open Science (Kauppinen and de Espindola 2011) builds on the four key elements:

  • Linked Data: Input data, results and provenance information are published and archived using the Linked Data principles.
  • Open Source and Web-based Environments: Methods are written for publication in open source environments.
  • Cloud Computing: The execution of methods and access to various resources are provided using the Cloud Computing approach.
  • Creative Commons: CC Licensing is in use to provide the legal and technical infrastructure for scientific assets.

This allows for creating better reproducibility environments where students and researchers can learn and explore new questions. In the context of complex phenomena such as the Brazilian Amazon Rainforest one can ask: How to link ecological, economic and social data? (Kauppinen et al. 2014) What related processes can we evidence about the Brazilian Amazon Rainforest by interacting with visualizations? (Bartoschek et al. 2013). For this, the tutorials of LinkedScience.org support online learning.

An example visualization built on top of the Linked Brazilian Amazon Rainforest Data depicting the relation between GDP (the heights) and deforestation rates (red=more deforestation).


How does science work?

Furthermore, by studying scientific assets that are interconnected according to the Linked Science approach, it could perhaps be possible to find interesting laws about how science itself works. For instance, we recently analyzed data on 100 000 conference participations by scientists to reveal the associative nature of conference participation (Smiljanić, Chatterjee, Kauppinen, Mitrović Dankulov 2016). See below a figure made to illustrate the idea visually, and thus to support learning about the research finding.

Storytelling via an example to illustrate the associative nature of conference participation
Here we illustrate the idea of the associative nature of conference participation via a simple example. Jim participated in a conference twice, then skipped one and participated once again, but did not participate at all after that. Tim participated the first five times and, although he skipped one conference, he then participated three times. The colors illustrate the likelihood to participate (red more probable, blue less probable).

To summarize

  • We need to focus on why-questions to find true incentives for different parties (researchers, decision-makers, citizens) to do and require proper research data management
  • As we discussed, learning is a great incentive as it requires good communication, and in essence often also reproducibility built on research data
  • Linked Open Science is an approach to interconnect scientific assets and to support reproducibility and learning
  • There is a big potential for research on understanding how science itself works by analyzing the traces left by researchers and scientific assets they produce.

Please get in touch via @LinkedScience. The slides for this LEARN keynote are available online.




The model for likelihood to participate in conferences can be used to improve communities

Originally posted at Aalto University news.

Modelling revealed that the probability of participating in the same conference again increases in relation to previous regular participation.


Researchers at Aalto University, the Institute of Physics Belgrade and the Saha Institute in Kolkata have used a computational model to show that the more often people have previously participated in a scientific conference, the more likely they are to participate again. The likelihood of participating grows regardless of the qualities of the conference, such as its location, size or specialization.

“This first result opens up a novel, very rich research field. It will be interesting to study whether the same behavior can be discovered in other types of participation as well. Furthermore, the research agenda can include studying what kinds of actions increase community feeling and thus get people to participate. Our model can be used to research and understand participation phenomena, and perhaps can be used as a basis for new community building methods”, says docent Tomi Kauppinen from Aalto University.

The researchers collaborated to analyse data from six scientific conferences of different sizes and programmes, held in different locations. The data comprised approximately 100 000 individual participation records covering a period of up to 30 years. The calculations are based on the so-called Pólya urn model, a model from probability theory that is used here for quantitative analysis of large data sets. The result of the study was recently published in the scientific journal PLOS ONE.

“Modelling revealed that the probability of a researcher participating in the same conference again increases in relation to previous regular participation, and reduces when participation is irregular,” explains the person responsible for the modelling, Marija Mitrović Dankulov from the Institute of Physics Belgrade. “The outcome is a fairly obvious one, but community inclusiveness, the common factor that we perceived, is apparent in all conference participation, and for the first time we were able to show this with the help of modelling.”
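To make the reinforcement idea concrete, here is a minimal sketch of a classic textbook Pólya urn, not the model published by the researchers. It assumes an urn that starts with one "attend" ball and one "skip" ball and gains one ball of the matching kind after every conference; the function names and the `reward` parameter are hypothetical and chosen only for illustration.

```python
import random


def attend_probability(history, reward=1):
    """Probability of attending the next event, given past attendance.

    `history` is a list of booleans (True = attended, False = skipped).
    The urn starts with one ball of each kind (an assumption of this sketch).
    """
    attend_balls = 1 + reward * sum(1 for h in history if h)
    skip_balls = 1 + reward * sum(1 for h in history if not h)
    return attend_balls / (attend_balls + skip_balls)


def simulate_attendance(n_events, reward=1, seed=0):
    """Simulate one person's attendance over `n_events` conferences."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_events):
        # Each decision is a draw from the urn built from earlier decisions.
        history.append(rng.random() < attend_probability(history, reward))
    return history


if __name__ == "__main__":
    print(attend_probability([]))            # no history yet: 0.5
    print(attend_probability([True, True]))  # two attendances: 0.75
    print(simulate_attendance(10))
```

In this sketch every attendance raises the chance of attending again and every skipped event lowers it, which mirrors the qualitative behaviour described in the quote above.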

The result is in line with the so-called power law, a statistical regularity observed in many natural phenomena, such as the sizes of earthquakes or moon craters. Man-made phenomena, such as word frequencies in most languages, also follow the power law.

Digital information gives physicists, data scientists and other researchers interested in societal phenomena immense possibilities to model them. The researchers already have thoughts about future research topics.

“It will be interesting to study whether our model can explain participation patterns of events organized both in physical places and online. Further on, by studying scientific assets that are interconnected according to the Linked Science approach, it could perhaps be possible to find interesting laws about how science works beyond these participation laws”, says Tomi Kauppinen.

Two of the co-authors, Marija Mitrović Dankulov and Arnab Chatterjee (Saha Institute), are alumni of Aalto University.

The PLOS ONE publication: A theoretical model for the associative nature of conference participation, PLOS ONE 11 (2016) e0148528

More information:

Tomi Kauppinen, Project Manager, Docent, Ph.D.
Tel: +358504315789
Aalto University School of Science
web: kauppinen.net/tomi
twitter: @LinkedScience

Marija Mitrović Dankulov, Dr.
Phone: +381 11 3713068
Institute of Physics Belgrade

ESWC 2016 Special Track on Smart Cities, Urban and Geospatial Data


ESWC is one of the key academic conferences to present research results and new developments in the area of the Semantic Web. For its 13th edition, ESWC will be back in Hersonissou, Crete, between Sunday May 29th and Thursday June 2nd 2016.

This time, ESWC will feature a special track on smart cities, urban and geospatial data:

Track Description

More than half of the world’s population is already living in urban areas today. UN projections show that this proportion will grow to 66% by 2050, adding another 2.5 billion people to our cities. Geospatial data provided by sensor networks, different remote sensing technologies, citizen scientists, social networks, as well as Open Data initiatives helps cities address these challenges and transform into smart cities.

However, in such a diversity of information, it is a fact that large amounts of valuable open data and sensor information remain unused, and aggregation of information from various sources is typically limited to specific application domains, with organizations and cities reaping the benefits often only after extensive investments. With the vast majority of the world’s information today still handled in silos, there is an enormous potential for better information management, search, discovery and reuse of heterogeneous urban data using Semantic Technologies, in order to make cities more intelligent, innovative and integrated beyond the boundaries of isolated applications.

In this track, we invite submissions that address the use of Semantic Web technologies in the context of this transformation process. Submissions to this track should contain original, unpublished research that shows how urban and smart city applications can benefit from Semantic Web technologies. Authors are strongly encouraged to include concrete application examples, ideally using real data, in their papers. Papers in this track will be evaluated on the basis of the impact of semantic technologies on society and the extent to which they address real-life problems in the context of cities. Papers are also expected to evaluate or provide deeper insight into the significant advantages of a semantic solution over state-of-the-art, common practitioner non-semantic solutions.


  • Semantic integration and processing of remotely sensed data and data from in-situ sensors
  • Semantic models for spatial-temporal change
  • The city as an API
  • Semantics of urban sensor networks
  • Semantic integration of distributed urban data
  • Semantic analysis of data streams
  • Semantic Web applications addressing urban topics such as transport, energy, building, safety, water, food, waste, or emissions
  • Semantics for citizen-centric Smart cities
  • Application of semantic technologies, sensors and semantic streams for e-Health, Life Sciences, e-Government, Environmental Monitoring, Cultural Heritage, Utility Services or Social Sensing
  • Intelligent User Interfaces and Interaction Paradigms that profit from semantics and knowledge graphs over Web Data, open government and corporate data relating to cities
  • Context- and location-aware (mobile) applications based on semantic technologies and geo-semantics
  • Provenance, access control, trust and privacy-preserving issues in smart cities
  • Semantic-based cloud applications for Smart Cities
  • Semantic reasoning, event detection, knowledge extraction and analytics for smart city platforms
  • Big data and scaling out in semantic cities. Managing real time and historical city data using knowledge representation models
  • Semantic platforms, knowledge acquisition, publishing, consumption, evolution and maintenance of city data


All deadlines are at 23:59 Hawaii Time.

Compulsory abstract submission for all papers: Friday 11th December 2015
Compulsory full paper submission: Friday 18th December 2015
Authors’ rebuttal: Friday 29th Jan – Friday 5th Feb 2016
Acceptance notification: Monday 22nd February 2016
Camera ready: Monday 7th of March 2016

Track Chairs

Carsten Kessler, Hunter College, City University of New York
Vanessa Lopez, IBM Research Ireland

Opening Reproducible Research project is hiring

The Opening Reproducible Research (ORR) project at the Institute for Geoinformatics, University of Münster, Germany has announced the following two open positions (deadline for applying October 15, 2015):

If your institute has open positions related to Open Science or Linked Science (or both!), please share news about them with us via @LinkedScience or tomi.kauppinen@aalto.fi and we will add them to LinkedScience.org/jobs.


Five papers accepted to COSIT workshop on Teaching Spatial Thinking

We accepted the following five papers to be presented at the Workshop on Teaching Spatial Thinking from Interdisciplinary Perspectives at COSIT2015:

Announcement by Tomi Kauppinen (co-chair), on behalf of the organizing committee.

Six papers accepted to Linked Science 2015

We are happy to announce that the following papers were accepted to this year’s Workshop on Linked Science organized at ISWC2015 in Bethlehem, Pennsylvania, USA on October 12th, 2015.

  • Tony Hammond and Michele Pasin. The nature.com ontologies portal
  • Da Huo, Jaroslaw Nabrzyski and Charles Vardeman. An Ontology Design Pattern towards Preservation of Computational Experiments
  • Carsten Keßler. Using the Web as a Data Source: Challenges for Linked Science
  • Tobias Kuhn. nanopub-java: A Java Library for Nanopublications
  • Paulo Pinheiro, Deborah McGuinness and Henrique Santos. Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Collection
  • Rui Yan, Brenda Praggastis, William Smith and Deborah McGuinness. Towards Cache Maintenance for Ontology Based, History-Aware Stream Reasoning

Announced by Tomi Kauppinen, Co-chair of the 5th Workshop on Linked Science 2015— Best Practices and the Road Ahead (LISC2015)

Teaching Spatial Thinking from Interdisciplinary Perspectives

Workshop on Teaching Spatial Thinking from Interdisciplinary Perspectives (SPATIALTHINKING2015)

When: October 12, 2015
Where: Santa Fe, New Mexico, USA
Collocated with Conference on Spatial Information Theory XII (COSIT 2015)
Workshop URI: http://linkedscience.org/events/spatialthinking2015/
Hashtag: #SpatialThinking2015

The “Teaching Spatial Thinking from Interdisciplinary Perspectives” (SPATIALTHINKING2015)  workshop’s goals are to:
1) Assist educators in developing interdisciplinary courses on spatial thinking.
2) Develop a repository of educational materials that educators could use to create interdisciplinary courses on spatial thinking.

Organizers of SPATIALTHINKING2015 are Heather Burte (UCSB), Tomi Kauppinen (Aalto Uni) and Mary Hegarty (UCSB).

Read more, and you are welcome to join!



Tutorial on Visual Analytics at ESWC2015

We are happy to announce that we will arrange the Tutorial on Visual Analytics with Linked Open Data and Social Media (VisLOD2015) at ESWC2015 in Portoroz, Slovenia on May 31 or June 1, 2015.

In the tutorial we will focus on mining and visualizing of interesting spatial, temporal and thematic patterns from Linked Open Data and Social Media.

The teachers of the tutorial are Dr. Suvodeep Mazumdar (Uni Sheffield), Dr. Tomi Kauppinen (Aalto Uni) and Dr. Anna Lisa Gentile (Uni Sheffield).

[more information on ViSLOD2015…]

The program of #VISUAL2014

We are happy to announce the program of VISUAL2014 (International Workshop on Visualizations and User Interfaces for Knowledge Engineering and Linked Data Analytics).

When: November 24, 2014
Where: Linköping, Sweden
Collocated with EKAW2014, 19th International Conference on Knowledge Engineering and Knowledge Management
Workshop URI: http://linkedscience.org/events/visual2014/
Hashtag: #VISUAL2014

Workshop Program

09:15 – 09:25: Opening
09:25 – 10:30: Session I: Ontology Visualization (Session Chair: Valentina Ivanova)
09:25 – 09:50: A Vision for Diagrammatic Ontology Engineering (full paper), Gem Stapleton, John Howse, Adrienne Bonnington, Jim Burton
09:50 – 10:15: OntoViBe – An Ontology Visualization Benchmark (full paper), Florian Haag, Steffen Lohmann, Stefan Negru, Thomas Ertl
10:15 – 10:30: Discussion Session I

10:30 – 11:00: Coffee Break

11:00 – 12:30: Session II: User-Oriented Ontology Alignment (Session Chair: Steffen Lohmann)
11:00 – 11:20: What Can the Ontology Describe? Visualizing Local Coverage in PURO Modeler (short paper), Marek Dudas, Tomas Hanzal, Vojtech Svatek
11:20 – 11:45: User Involvement for Large-Scale Ontology Alignment (full paper), Valentina Ivanova, Patrick Lambrix
11:45 – 12:00: Discussion Session II
12:00 – 12:30: Wrap Up Sessions I+II

12:30 – 14:00: Lunch Break

14:00 – 15:00: Session III: Visual Approaches to Linked Data (Session Chair: Valentina Ivanova)
14:00 – 14:25: Sensemaking on Wikipedia by Secondary School Students with SynerScope (full paper), Willem Robert Van Hage, Fernando Nunez-Serrano, Thomas Ploeger, Jesper Hoeksema
14:25 – 14:45: Towards a Visual Annotation Tool for End-User Semantic Content Authoring (short paper), Torgeir Lebesbye, Ahmet Soylu
14:45 – 15:00: Discussion Session III

15:00 – 15:30: Coffee Break

15:30 – 17:00: Session IV: Demo Jam (Session Chair: Steffen Lohmann)
15:30 – 16:30: Impromptu demos (everyone is invited to join and present)
16:30 – 17:00: Wrap Up Sessions III+IV
17:00 – End of Workshop

4th Workshop on Linked Science 2014— Making Sense Out of Data

4th Workshop on Linked Science 2014— Making Sense Out of Data (LISC2014)

We are proud to announce that we will have Dr Harith Alani from the Open University, UK as our keynote speaker for LISC2014.

Here once again the temporal and spatial coordinates for joining the Linked Science 2014 workshop:

When: October 19, 2014 (Full-day)
Where: Riva del Garda, Trentino, Italy
Collocated with the 13th International Semantic Web Conference (ISWC2014).
Workshop URI: http://linkedscience.org/events/lisc2014

You may also join remotely by following the hashtag #LISC2014.


09:15 – 09:30: Opening and introduction

09:30 – 10:30: Keynote: Harith Alani (Open University)

10:30 – 11:00 Coffee break

11:00 – 12:20: Paper presentation session I: Making sense out of scholarly data

11:00 – 11:20 Hajar Ghaem Sigarchian, Ben De Meester, Tom De Nies, Ruben Verborgh, Wesley De Neve, Erik Mannens and Rik Van de Walle. EPUB3 for Integrated and Customizable Representation of a Scientific Publication and its Associated Resources

11:20 – 11:40 Angelo Di Iorio, Silvio Peroni, Fabio Vitali and Jacopo Zingoni. Semantic lenses to bring digital and semantic publishing together

11:40 – 12:00 Francesco Osborne, Silvio Peroni and Enrico Motta. Clustering Citation Distributions for Semantic Categorization and Citation Prediction

12:00 – 12:20 Olga Giraldo, Alexander Garcia and Oscar Corcho. SMART Protocols: SeMAntic RepresenTation for Experimental Protocols

12:30 – 14:00 Lunch

14:00 – 15:30 Paper presentation session II: Modelling Scientific Data

14:00 – 14:20 Laleh Kazemzadeh, Maulik Kamdar, Oya Beyan, Stefan Decker and Frank Barry. LinkedPPI: Enabling Intuitive, Integrative Protein-Protein Interaction Discovery

14:20 – 14:40 Jodi Schneider, Paolo Ciccarese, Tim Clark and Richard D. Boyce. Using the Micropublications ontology and the Open Annotation Data Model to represent evidence within a drug-drug interaction knowledge base

14:40 – 15:00 Simon Jupp, James Malone and Alasdair J. G. Gray. Capturing Provenance for a Linkset of Convenience

15:00 – 15:20 Evan Patton and Deborah McGuinness. Connecting Science Data Using Semantics and Information Extraction

15:20 – 16:00 coffee break

16:00 – 17:30 Co-writing session: how can linked science techniques help with ‘sense making out of (scientific) data’ (2 × 45 minutes)

  • 1st 45 minutes: produce linked science matrix in breakout groups, correlating sense-making challenges and technologies used to address these challenges

  • 2nd 45 minutes: merging matrices into consensus view and/or paper ideas/blog post, by identifying common challenges, technology bridges and/or gaps