All posts by linkedscientist

Why to manage and share research data?

Tomi Kauppinen: a blog post accompanying his keynote “How to manage and share Spatiotemporal Research Data? Supporting learning and reproducibility online via Linked Open Science.” at the 3rd LEARN workshop on Research Data Management, “Make research data management policies work”, organized by the EU-funded project LEARN (Leaders Activating Research Networks), Helsinki, June 28th, 2015.

With open data taking off, and open access (to publications) as well, the big question remains: where is open science? I argue that for open science to really fly we need both:

  1. open research data = data used or produced by scientific efforts
  2. openly accessible methods = methods in publications made reproducible

But how to do this? By whom, where and when? Essentially, we first need to answer the “why” questions, i.e. figure out the right incentives, and then the other important questions (who, what, where, when, how) will naturally follow.

The “why” question calls us to think about:

  1. Incentives for a researcher to open their data. Thus: why would a researcher open their data for others? Is it enough that many journals (e.g. PLOS ONE, see our article as an example) now require data to be available?
  2. Incentives for funders and research managers to request opening of research data. Thus: why would decision-makers ask for open data?
  3. Incentives for society to ask for open data. Thus: why is it useful to have open research data?

Learning as the key term to answer the why questions

If we look at these why-questions, there is an interesting answer that covers all of them. The key to creating incentives for opening research data and enabling reproducibility is learning.

Interestingly, learning largely happens via reproducing existing efforts (just think about all the textbooks and their numerous examples with enumerated steps for reproducing success).

Thus if we manage to reach the learning layer, the reproducibility will follow.

Now let us get back to our “why” questions, starting with number 3: what if we agree that society at large wants to learn about what science produces (for example educating citizens to be well-informed about the world, educating students to be masters in their fields, or educating companies to develop new systems and explore growth options)?

Society calls for better ways to support learning, preferably online, as we now live in a connected world.

This also gives us an answer to “why” question number 2: funders and managers act as representatives of society and listen to its requirements. In many countries decision-makers already require data to be managed and open (for instance the NSF in the USA, with its requirement for a Data Management Plan). However, as reported just recently by an expert group for the European Open Science Cloud, there is still “an alarming lack of reproducibility of current published research”. Thus, after listening carefully to society, decision-makers should increasingly ask for the learning and reproducibility layers as a prerequisite for positive funding agreements.

Incentivizing researchers via learning and communication settings

Now to the last but not least “why” question, number 1, concerning our researcher. Clearly, the availability of funding creates an incentive for the researcher to support reproduction of the research, and thus to adopt research data management that allows it. However, there is a bigger and better answer to the why question. Science is communication, and so is learning. If we allow the researcher to move from the rather tedious task of “just research data management” to enabling others (students, citizens, company people) to learn how to actually reproduce interesting research settings, the picture suddenly looks completely different.

Indeed, many researchers are also teachers and look for excellent ways to communicate what they feel is important for students to learn. By creating a culture shift towards online learning and reproducibility built on excellent research data, we thus create big incentives for researchers to engage in proper research data management.

Let us check some examples

One example is the LODUM (Linked Open Data University of Münster) project, where we showed how to create the data infrastructure and the learning layer. The data created as part of LODUM has been used not only by many student projects but also by newly funded projects. As an example, below is a visualization showing the number of publications by university building (Keßler and Kauppinen, 2012).

Publications analyzed by building show big differences between buildings.

Clearly, creating useful research data management schemes by opening linked data online calls for a culture shift away from traditional paper-as-the-end-result publishing. To answer this call, Linked Open Science is an approach for interconnecting scientific assets so that reproducibility and learning can happen.

Linked Open Science?

Linked Open Science (Kauppinen and de Espindola 2011) builds on four key elements:

  • Linked Data: Input data, results and provenance information are published and archived using the Linked Data principles.
  • Open Source and Web-based Environments: Methods are written for publication in open source environments.
  • Cloud Computing: The execution of methods and access to various resources are provided using the Cloud Computing approach.
  • Creative Commons: CC Licensing is in use to provide the legal and technical infrastructure for scientific assets.
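
To make the first and last elements more concrete, here is a minimal sketch of how a research result, its provenance and its license could be expressed as Linked Data. It uses only the Python standard library and writes plain N-Triples lines. The `http://example.org/` namespace and the identifiers `deforestation-map`, `satellite-observations` and `analysis-run-1` are hypothetical and chosen only for illustration; the PROV and Dublin Core Terms properties are existing vocabulary terms, but their use here is an assumption, not a description of how LinkedScience.org actually models its data.

```python
# Hypothetical namespace for our own resources (illustration only).
EX = "http://example.org/"
# Existing vocabularies: W3C PROV-O and Dublin Core Terms.
PROV = "http://www.w3.org/ns/prov#"
DCT = "http://purl.org/dc/terms/"


def triple(subject, predicate, obj):
    """Serialize one statement as an N-Triples line (all three terms are IRIs)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def describe_result(result_id, input_id, run_id, license_iri):
    """Link a result to its input data, the run that produced it, and its license."""
    result = EX + result_id
    return [
        triple(result, PROV + "wasDerivedFrom", EX + input_id),
        triple(result, PROV + "wasGeneratedBy", EX + run_id),
        triple(result, DCT + "license", license_iri),
    ]


if __name__ == "__main__":
    for line in describe_result(
        "deforestation-map",
        "satellite-observations",
        "analysis-run-1",
        "https://creativecommons.org/licenses/by/4.0/",
    ):
        print(line)
```

Each printed line is one statement that other datasets can link to, which is the property that makes input data, results and provenance interconnectable.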

This allows for creating better reproducibility environments where students and researchers can learn and explore new questions. In the context of complex phenomena such as the Brazilian Amazon Rainforest one can ask: How to link ecological, economic and social data? (Kauppinen et al. 2014) What related processes can we evidence about the Brazilian Amazon Rainforest by interacting with visualizations? (Bartoschek et al. 2013). For this, the tutorials of LinkedScience.org support online learning.

An example visualization built on top of the Linked Brazilian Amazon Rainforest Data depicting the relation between GDP (the heights) and deforestation rates (red=more deforestation).

 

How does science work?

Furthermore, by studying scientific assets that are interconnected according to the Linked Science approach, it may be possible to find interesting laws about how science itself works. For instance, we recently analyzed data on 100 000 conference participations by scientists to reveal the associative nature of conference participation (Smiljanić, Chatterjee, Kauppinen, Mitrović Dankulov 2016). The figure below illustrates the idea visually, and thus supports learning about the research finding.

Storytelling via an example to illustrate the associative nature of conference participation
Here we illustrate the idea of the associative nature of conference participation via a simple example. Jim participated in a conference twice, then skipped one and participated once again, but did not participate at all after that. Tim participated the first five times and, although he skipped one conference, he then participated three times. The colors illustrate the likelihood to participate (red more probable, blue less probable).

To summarize

  • We need to focus on why-questions to find true incentives for different parties (researchers, decision-makers, citizens) to do and require proper research data management
  • As we discussed, learning is a great incentive as it requires good communication, and in essence often also reproducibility built on research data
  • Linked Open Science is an approach to interconnect scientific assets and to support reproducibility and learning
  • There is a big potential for research on understanding how science itself works by analyzing the traces left by researchers and scientific assets they produce.

Please get in touch via @LinkedScience. The slides for this LEARN keynote are available online.

 

The model for likelihood to participate in conferences can be used to improve communities

Originally posted at Aalto University news.

Modelling revealed that the probability of participating in the same conference again increases in relation to previous regular participation.

Storytelling via an example to illustrate the associative nature of conference participation
Here we illustrate the idea of the associative nature of conference participation via a simple example. Jim participated in a conference twice, then skipped one and participated once again, but did not participate at all after that. Tim participated the first five times and, although he skipped one conference, he then participated three times. The colors illustrate the likelihood to participate (red more probable, blue less probable).

Researchers at Aalto University, the Institute of Physics Belgrade and the Saha Institute in Kolkata have used a computational model to show that the more often people have previously participated in a scientific conference, the more likely they are to participate again. This likelihood grows regardless of the characteristics of the conference, such as its location, size or specialization.

“This first result opens up a novel, very rich research field. It will be interesting to study whether the same behavior can be discovered in other types of participations as well. Further on, the research agenda can include studying what kinds of actions increase community feelings and thus get people to participate. Our model can be used to research and understand participation phenomena, and perhaps can be used as a basis for new community building methods”, says docent Tomi Kauppinen from Aalto University.

The researchers collaborated to analyse data from six scientific conferences of different sizes and programmes, held in different locations. The data comprised approximately 100 000 individual participation records covering a period of up to 30 years. The calculations are based on the so-called Pólya urn model, a self-reinforcing model from probability theory, used here for quantitative analysis of a large data set. The result of the study was recently published in the journal PLOS ONE.
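
For readers unfamiliar with the Pólya urn, the following is a minimal sketch of the textbook version, not the refined model from the PLOS ONE paper. It assumes an urn that starts with one "attend" ball and one "skip" ball (the starting values are an assumption made for this illustration); after each conference, a ball matching the researcher's choice is added, so every attendance makes the next one more likely. The function names are hypothetical.

```python
import random


def next_attendance_probability(history, attend=1, skip=1):
    """Probability of attending the next conference in a classic Pólya urn.

    history is a list of booleans (True = attended, False = skipped).
    attend and skip are the assumed initial urn contents.
    """
    attended = sum(1 for was_there in history if was_there)
    return (attend + attended) / (attend + skip + len(history))


def simulate_participation(n_conferences, seed=None):
    """Simulate one researcher's attendance record, one draw per conference."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_conferences):
        p = next_attendance_probability(history)
        history.append(rng.random() < p)
    return history


if __name__ == "__main__":
    # Jim from the figure: attended twice, skipped one, attended once more.
    jim = [True, True, False, True]
    # Tim from the figure: attended five times, skipped one, attended three times.
    tim = [True] * 5 + [False] + [True] * 3
    print(next_attendance_probability(jim))  # 4/6
    print(next_attendance_probability(tim))  # 9/11
    print(simulate_participation(10, seed=1))
```

Note that this classic urn only counts past attendances and ignores their order, whereas the study reports that regularity of participation also matters; the sketch is meant only to show the self-reinforcing idea.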

“Modelling revealed that the probability of a researcher participating in the same conference again increases in relation to previous regular participation, and reduces when participation is irregular,” explains the person responsible for the modelling, Marija Mitrović Dankulov from the Institute of Physics Belgrade. “The outcome is a fairly obvious one, but community inclusiveness, the common factor that we perceived, is apparent in all conference participation, and for the first time we were able to show this with the help of modelling.”

The result is in line with a so-called power law, a statistical regularity that appears in many natural phenomena, such as the sizes of earthquakes or moon craters. Man-made phenomena, such as word frequencies in most languages, also follow power laws.
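
As a reminder of what a power law means in general terms, the generic form is given below. The symbols $C$ and $\alpha$ are generic placeholders; this post does not state the values found in the study.

```latex
p(x) = C\,x^{-\alpha}
\qquad\Longleftrightarrow\qquad
\log p(x) = \log C - \alpha \log x
```

In other words, a quantity follows a power law when its distribution appears as a straight line with slope $-\alpha$ on a log-log plot.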

Digital information gives physicists, data scientists and other researchers interested in societal phenomena immense possibilities to model them. The researchers already have ideas for future research topics.

“It will be interesting to study whether our model can explain participation patterns of events organized both in physical places and online. Further on, by studying scientific assets that are interconnected according to the Linked Science approach, it could perhaps be possible to find interesting laws about how science works beyond these participation laws”, says Tomi Kauppinen.

Two of the co-authors, Marija Mitrović Dankulov and Arnab Chatterjee (Saha Institute), are alumni of Aalto University.

The PLOS ONE publication:  A theoretical model for the associative nature of conference participation, PLOS ONE 11 (2016) e0148528

More information:

Tomi Kauppinen, Project Manager, Docent, Ph.D.
tomi.kauppinen@aalto.fi
Tel: +358504315789
Aalto University School of Science
web: kauppinen.net/tomi
twitter: @LinkedScience

Marija Mitrović Dankulov, Dr.
mitrovic@ipb.ac.rs
Phone: +381 11 3713068
Institute of Physics Belgrade

Opening Reproducible Research project is hiring

The Opening Reproducible Research (ORR) project at the Institute for Geoinformatics, University of Münster, Germany has announced the following two open positions (application deadline: October 15, 2015):

If your institute has open positions related to Open Science or Linked Science (or both!), please share them with us via @LinkedScience or tomi.kauppinen@aalto.fi and we will add them to LinkedScience.org/jobs.

 

Five papers accepted to COSIT workshop on Teaching Spatial Thinking

We accepted the following five papers to be presented at the Workshop on Teaching Spatial Thinking from Interdisciplinary Perspectives at COSIT2015:

Announcement by Tomi Kauppinen (co-chair), on behalf of the organizing committee.