An interactive R Shiny application on data.aalto.fi lectures

Tuija Sonkkila

With this tutorial you can build an interactive web application with R that fetches up-to-date lecture data from the data.aalto.fi SPARQL endpoint, renders the result both as a table and a calendar-like chart, and offers a way to download data as iCal calendar events.

Here, to deploy the application over the web, we use a Shiny beta test account hosted by RStudio. For more info, see their Shiny About page.

Shiny

Have a quick look at a basic Shiny application. It comprises of two R source files, server.R and ui.R. The latter defines the user interface – what the web page will look like – whereas server.R contains all the R code to manipulate the data, plus all Shiny server-side logic. Note that instead of ui.R, you can also build the HTML page from the scratch.

Special Shiny features are reactive expressions and values. The tutorial explains reactivity like this:

Shiny web apps are interactive. The input values can change at any time, and the output values need to be updated immediately to reflect those changes. Changing input values will cause the right parts of your R code to be reexecuted, which will in turn cause any changed outputs to be updated.

Query

The idea of our web application is to show what lectures there will be in the near future.

data.aalto.fi offers a SPARQL endpoint at http://data.aalto.fi/sparql/. To query the endpoint from within R, we use the SPARQL client.

If you have used SPARQL web endpoints, the query string looks a bit more verbose. This is because here we will include full URIs in triple patterns. You can also pass prefixes separately like in this tutorial.

The WHERE section that takes a major portion of the whole query, says what data we want to take out of the named GRAPH. The SELECT statement then declares what parts of this data we like to get as a result. Data is assigned to a number of variables.

SELECT DISTINCT ?startDate ?endDate ?Room ?Title ?SchoolCode
WHERE
{ GRAPH <http://data.aalto.fi/id/courses/noppa/> {

Lectures are connected to courses, and although lectures themselves may have a comment or summary, we are only interested in the title of the course, be it in Finnish or English.

?course <http://linkedscience.org/teach/ns#arrangedAt> ?lecture ;
        <http://linkedscience.org/teach/ns#courseTitle> ?title .
FILTER ( (lang(?title) = 'fi') || (lang(?title) = 'en') )
BIND (str(?title) AS ?Title)

From the lecture, we need to know where and when the lecture takes place.

?lecture <http://linkedscience.org/teach/ns#room> ?Room ;
         <http://www.w3.org/2002/12/cal/icaltzd#dtstart> ?startTime ;
         <http://www.w3.org/2002/12/cal/icaltzd#dtend> ?endTime .
BIND (str(?endTime) AS ?endDate)
BIND (str(?startTime) AS ?startDate)

In order to get to the School data, we will go via the department that teaches the course.

?dept <http://purl.org/vocab/aiiso/schema#teaches> ?course;
      <http://purl.org/vocab/aiiso/schema#part_of> ?school .

?school <http://purl.org/vocab/aiiso/schema#organization> ?dept ;
        <http://purl.org/vocab/aiiso/schema#code> ?SchoolCode .

Which of the lectures start today or later this month? To find out, we first split the start time of the lectures into year, month and day, respectively, and bind these values to variables. The now() function returns the current date and time.

BIND (substr(str(?startTime),1,4) AS ?Year)
BIND (substr(str(?startTime),6,2) AS ?Month)
BIND (substr(str(?startTime), 9, 2) AS ?Day)

BIND (now() AS ?currentTime)
BIND (year(?currentTime) AS ?yearNow)
BIND (month(?currentTime) AS ?monthNow)
BIND (day(?currentTime) AS ?dayNow)

Finally, we FILTER out only those triples where it is true that year is as of today, and month and day are the same or bigger than today’s day.

FILTER (
( <http://www.w3.org/2001/XMLSchema#integer>(?Year) = <http://www.w3.org/2001/XMLSchema#integer>(?yearNow) )
&& ( <http://www.w3.org/2001/XMLSchema#integer>(?Month) >= <http://www.w3.org/2001/XMLSchema#integer>(?monthNow) )
&& ( <http://www.w3.org/2001/XMLSchema#integer>(?Day) >= <http://www.w3.org/2001/XMLSchema#integer>(?dayNow) )
)

The query results are stored in a data frame , which is a common R data structure with columns as variables and rows as observations. The variables in the SELECT statement become column names.

res <- SPARQL(url=endpoint, query=q)$results

Working with the data

Next, we need to do some pruning.

For example, there are old acronyms of Aalto University units in the data. With sub() we can replace the old variants with the present ones.

You can refer to individual columns in a data frame with a $ sign followed by the column label. For example, res$SchoolCode says that there is an R object named res which is a data frame, and we want to do something with its column SchoolCode.

res$SchoolCode <- sub("TaiK", "ARTS", res$SchoolCode)
res$SchoolCode <- sub("ECON", "BIZ", res$SchoolCode)
res$SchoolCode <- sub("ERI", "OU", res$SchoolCode)

The encoding needs to be adjusted too so that the Scandinavian characters come up as they should.

We also want to sort the lectures. The most present ones need to come at the top. Let’s order the dataframe by three columns: first by startDate, then by ShoolCode if there are several lectures during the same day, and last by Title if needed.

if ( nrow(res) > 0 ) {
  
  isData <- TRUE
  
  res$SchoolCode <- sub("TaiK", "ARTS", res$SchoolCode)
  res$SchoolCode <- sub("ECON", "BIZ", res$SchoolCode)
  res$SchoolCode <- sub("ERI", "OU", res$SchoolCode)
  
  res$Title <- iconv(res$Title, "UTF-8", "ISO-8859-1")
  res$Room <- iconv(res$Room, "UTF-8", "ISO-8859-1")
  
  res <- res[order(res$startDate, res$SchoolCode, res$Title), ]
}

The bracket notation is a way to tell R how we want to handle a data frame. Inside the brackets, whatever comes before a comma, refers to rows. All that comes after a comma, refers to columns. So here we say: concentrate only to rows. Within rows, make use of order. Give the function three parameters that dictate in which order the rows will be sorted.

The if statement tries to handle exceptions. In those rare cases where there are no lectures on the horizon, the SPARQL query returns no data, and the res data frame will have zero rows. If that happens, there will be no isData variable. Its existence will be tried later on.

Working with dates is not always an easy task, whatever the programming language, and this tutorial is by no means an exception.

We need dates in two places. First, dates are rendered as is into a familiar HTML table. We want to see quickly at a glance, what the near future offers. Second, dates and times of the day will be plotted on a chart. The idea is to build a mockup calendar of the coming events.

For plotting, we will use rCharts. One of its nice features is that you can easily build tooltips. As of writing this, the R version on the hosted site is 2.15.3. The rCharts installation requires the devtools package, but if you try to install devtools from the CRAN, you will get a message saying that the package is not available for this R version. The solution is explained here. In short, download and install devtools 1.1, and if you get errors about dependencies (as I did with httr and whisker), install those packages first.

First we need to ask today’s date from the system that runs the app.

td <- Sys.Date()

Now we can start to make comparisons between today an the lecture data returned from data.aalto.fi.

Because we have an interactive app, the user can choose different number of days to look by dragging the slider. This means that the number of days is reactive. How does the app know what the use has done? In the file ui.R, we define that the inputID days will store this value.

sliderInput(inputId = "days",
                label = "Number of days from now on:",
                min = 40,
                max = 60,
                value = 40,
                step = 10),

In server.R, we then use input$days as an expression in switch. Depending on the value of input$days, one of the alternatives will be chosen. For example, if the user likes to see a little more than a month ahead from now, the value will be 40. The switch says that if the input is 40, take those rows of the res dataframe where the startDate variable is less or equal to today’s date plus 40. Because the startDate is just characters, we have to convert it first to a Date object.

When we put the switch construction inside reactive, it becomes a reactive Shiny expression, and sliderValues a reactive value. However, it returns NULL in those cases where there are no lecture data and thus the isData variable does not exist.

if ( isData ) {

   sliderValues <- reactive({
     
     switch(as.character(input$days),
            "40" = res[as.Date(sub("T", " ", substr(res$startDate, 1, 19))) <= td+40, ],
            "50" = res[as.Date(sub("T", " ", substr(res$startDate, 1, 19))) <= td+50, ],
            "60" = res[as.Date(sub("T", " ", substr(res$startDate, 1, 19))) <= td+60, ])
     
   }) } else {
     
    return(NULL)
    
   }

Whenever the user changes the position of the slider, the reactive expression fires, the reactive value changes – and the table is rendered anew.

In the user interface file ui.R, we need to define where the output(s) should go. Here we say, that both the chart and the table should be outputted in the same mainPanel area, the chart on the top. With br() we add some extra space between them.

mainPanel(

    showOutput("chart", "polycharts"),

    br(),

    tableOutput("view")

  )

Next we have to add the output logic to the server side.

In server.R, we define that the sliderValues reactive value will be rendered as a table to the output element on the web page that has the id view. By default, the table will have row numbers. We can suppress them by passing an include.rownames parameter with the boolean value FALSE.

output$view <- renderTable({

    sliderValues()

  }, include.rownames = FALSE)

Chart

Rendering the chart follows the same type of logic than with the table. Inside renderChart we construct the chart data, and then pass it in p1 to the output element with an id chart.

We cannot modify the reactive output value as such. Therefore, if we want to do some changes like adding columns, the first thing to do is to take a copy of the output.

sv <- sliderValues()

We will add a couple of columns (variables) to this new sv data frame: DaystartsAt and endsAt. The first will contain the X axis values, the second the Y axis values, and the last one will be used in tooltips.

As we saw earlier, the startDate and endDate variables are strings. They also contain extra information that prevent us – or at least me! – from easily converting the strings to proper R date/time data structures. Therefore, we will simply tidy them up with substr() and sub(), and replace the old values. From now on, we can use the new values as the base for further conversions.

Disclaimer: there certainly are means to convert ISO 8601 to POSIX* in a sensible way, I just don’t yet know how. You might.

sv$startDate <- sub("T", " ", substr(sv$startDate, 1, 19))
sv$endDate <- sub("T", " ", substr(sv$endDate, 1, 19))

Setting up the X axis of the chart with date/time value is a bit tricky. Thanks to the active Github repository users of rCharts, the solution proved to be twofold: first, we need to convert the axis values to characters. Second, to get the right amount of tick marks, we have to define the min and max dates in unix time format – that is, in seconds since 1970, 1 Jan – and calculate the number of tickmarks.

From the string format, the conversion to dates goes first via as.Date. To get to unix time, there are two more conversion steps.

Fore more info about POSIX*, see eg. this explanation on Stack Overflow.

sv$Day <- as.character(as.Date(sv$startDate))

Because the data frame was initially sorted by the day, the min date is on the first row of the data frame sv, in the column startDate. We can refer to a certain row/column combination in a data frame by putting the row number in square brackets at the end like this sv$startDate[1]. For the same sorting reason, the max date will be on the last row. nrow() returns the number of all rows in a data frame, so we can use that construction inside the brackets as a shorthand.

The Y axis is more straightforward. In our application, we only need the time when the lecture starts, but we like to render it in a certain format. format() does just that, given the data is in POSIX*.

Note that here our target is time, not date. Therefore, we cannot go via as.Date() because that would loose the time information altogether.

sv$startsAt <- format(as.POSIXct(sv$startDate), format="%H:%M")

The tooltip is defined as a JavaScript function wrapped in a string and stored in a variable. Column names from the data frame are added to the item element with the dot notation.

Try to keep the number of lines in minimum. Based on trial and error, four lines seems to be already too much. Also, write the whole string without any line breaks.

mytooltip <- "function(item){return item.Title + '\n' + 'From: ' + item.startsAt + ' To: ' + item.endsAt + '\n' + 'In: ' + item.Room}"

The chart data is now ready for plotting. Within rPlot() we define the following: X and Y axes (Day, startAt); data frame (sv); type of the plotting symbol (point); tooltip text (mytooltip).

Within lists in set and guides, you can pass extra information to the plot function, like in here, what it should do with the X axis.

p1$guides(
      x = list(
        min = mindate, 
        max = maxdate,
        numticks = nrofdays
      )
    )
    

Subset table in the sidebar

Whenever the user changes the position of the slider, the selection of days in sliderValues() changes. We want this reactive value to be reflected on another table. It will present a new interactive element: a filter in the shape of a dropdown menu. We presume that the user might be interested in lectures of one particular School, and would want to download its calendar events only.

In ui.R, we write the School names as a character vector, and define that the selection will be stored in the variable school. The table will be rendered below the menu, after an empty line break, and the output variable in this case is subset.

selectInput("school",
            "Filter by school:",
            choices = c("All",
                        "ARTS",
                        "BIZ",
                        "CHEM",
                        "ELEC",
                        "ENG",
                        "SCI",
                        "Open University (OU)")),
br(),

tableOutput("subset"),

br(),

downloadButton('downloadData', 'Download as iCal')

In the default Shiny HTML page layout, as the name implies, the sidebarPanel is not as wide as the mainPanel, making horizontal space a bit scarce. Therefore, we limit the number of columns of the new table. To do this, we store, in the character vector drops, the name of those columns we do not want to be outputted. Then, in switch(), we define – inside the square brackets, after the comma sign – that all other columns of the reactive data frame will be rendered except those whose name is among the alternatives in drops. The exclamation sign makes a negation of the expression that follows it, and %in% does pattern matching.

You might think as I first did, that as the base for this new table, you can simply use values from the output$subset. However, if you try to do that, you will get an error Reading objects from shinyoutput object not allowed.

The solution is to change the server logic a bit. When we wrap the selection inside a function – let’s name it currentSelection – we can use the table in rendering.

currentSelection <- function() {
    
    drops <- c("startDate", "endDate", "Room")
    
    switch(input$school,
           "All"  = sliderValues()[ ,!(names(sliderValues()) %in% drops)] ,
           "ARTS" = sliderValues()[sliderValues()$SchoolCode == 'ARTS', !(names(sliderValues()) %in% drops)],
           "BIZ"  = sliderValues()[sliderValues()$SchoolCode == 'BIZ', !(names(sliderValues()) %in% drops)],
           "CHEM" = sliderValues()[sliderValues()$SchoolCode == 'CHEM', !(names(sliderValues()) %in% drops)],
           "ELEC" = sliderValues()[sliderValues()$SchoolCode == 'ELEC', !(names(sliderValues()) %in% drops)],
           "ENG"  = sliderValues()[sliderValues()$SchoolCode == 'ENG', !(names(sliderValues()) %in% drops)],
           "SCI"  = sliderValues()[sliderValues()$SchoolCode == 'SCI', !(names(sliderValues()) %in% drops)],
           "Open University (OU)" = sliderValues()[sliderValues()$SchoolCode == 'OU', !(names(sliderValues()) %in% drops)])     
    
  }

However, because we will need the start/end times of the lectures in the calendar download, let’s make another function currentSelectionFull where these are intact and not dropped. Note that there may well be more elegant solutions, see the Reactivity overview chapter of the Shiny tutorial.

Calendar events

As a final part of the app, we’ll define the calendar download file.

To download data from a Shiny web app, we use downloadHandler like the tutorial explains. It takes two arguments, filename and content.

With sink(file) we define into which file the standard output of R will go from now on, until sink() says when to stop. cat() prints to the standard output, so the lines get diverted to a file on disk.

Inside a for loop, we go over all rows of the subsetted table and construct proper start and end times for the lectures. To simplify things here, we use the timezone of Europe/Helsinki.

You are already familiar with sub()gsub() is used in a similar manner, only globally, meaning it replaces all occasions of the expression. paste() concatenates strings and glues them together with the value of the sep parameter. Here, with nothing.

output$downloadData <- downloadHandler(
    filename = "lectures.ics",
    content = function(file) {
      sink(file) 
      cat("BEGIN:VCALENDAR\n")
      ...
      for (i in 1:nrow(currentSelectionFull())) {
        cat("BEGIN:VEVENT\n")
        cat(paste("SUMMARY:", currentSelectionFull()$Title[i], " (" , currentSelectionFull()$SchoolCode[i], ") ", currentSelectionFull()$Room[i], "\n", sep=""))
        cat("DTSTAMP:20130530T180000Z\n")
        cat(paste("DTSTART;TZID=Europe/Helsinki:", paste(sub(':','',gsub('-','', substr(currentSelectionFull()$startDate[i], 1, 16))), '00Z', sep="")), "\n", sep="")
        cat(paste("DTEND;TZID=Europe/Helsinki:", paste(sub(':','',gsub('-', '', substr(currentSelectionFull()$endDate[i], 1, 16))), '00Z', sep="")), "\n", sep="")
        cat("END:VEVENT\n")          
      }
     cat("END:VCALENDAR")      
      sink()               
    }                        
  )

Result

The result should look like the online Shiny application and as in the screenshot below.

R Shiny application for exploring course data of the Aalto University

To-do

Our Shiny application is more of a proof-of-concept than real life. Some possible enhancements:

  • a check-box on every table row, so that the user can choose what lecture calendar data to download.
  • the chart could be more tolerant to lectures that start simultaneously; now only one of them is plotted.
  • during busy winter terms, there are so many lectures per day that the present HTML layout will become awkward to use and data processing will take more time too. As a remedy, we could define a shorter time span, say 5 days max.
  • a slider with a range rather than min and max would give the user more freedom to choose.
  • as a general design principle, it would maybe be better to hide the big table on a separate tab and let the chart take a more prominent role. Note however, that in the present R, rCharts and Shiny environment, rCharts does not seem to be able to plot on a Shiny tabset

Leave a Reply