Introduction to plotscaper

Try clicking and dragging to select some points in the scatterplot. You should see the corresponding cases get highlighted within the barplot!

Importantly, many of these interactions can be done either manually, by interacting with the figure (“client-side”), or programmatically, by calling functions from inside a running R session (“server-side”).

The scene and the schema

To take full advantage of plotscaper, you need to understand two core functions: create_schema and render. We’ll explore them using the penguins data set (Horst, Hill, and Gorman 2020) as an example.

Creating a schema

The create_schema function initializes a schema - a kind of recipe that we can use to define the figure. As in other data visualization packages, we build up the schema step by step, by calling functions that append additional information to it:

library(plotscaper)
library(palmerpenguins)

penguins <- na.omit(penguins) # missing data is not supported yet, unfortunately
names(penguins) <- names(penguins) |> gsub("(_mm|_g)", "", x = _)

schema <- create_schema(penguins) |> 
  add_scatterplot(c("body_mass", "flipper_length")) |> 
  add_barplot(c("species")) |>
  add_fluctplot(c("species", "sex")) |>
  add_histogram(c("bill_length"))
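A small aside on the renaming step above: the pattern passed to gsub is not anchored, so it removes _mm and _g wherever they occur in a name. That is harmless for the penguins columns, but the sketch below shows how an end-anchored pattern would behave differently. The column names here are made up for illustration (avg_growth_g in particular is hypothetical and not part of the data set).

```r
# Illustrative column names; "avg_growth_g" is hypothetical
old <- c("species", "bill_length_mm", "body_mass_g", "avg_growth_g")

# Pattern from the text: strips "_mm" and "_g" anywhere in the name
unanchored <- gsub("(_mm|_g)", "", old)

# Anchored variant: strips the unit suffix only at the end of the name
anchored <- gsub("(_mm|_g)$", "", old)

print(unanchored)
print(anchored)
```

The two results differ only for the hypothetical name, where the unanchored pattern also eats the "_g" in the middle.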

The schema is then really just a list of messages:

schema
#> plotscaper schema:
#> add-plot { type: scatter, variables: c("body_mass", "flipper_length") }
#> add-plot { type: bar, variables: species }
#> add-plot { type: fluct, variables: c("species", "sex") }
#> add-plot { type: histo, variables: bill_length }
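To make the "list of messages" idea concrete, here is a toy model written in base R. It is only a sketch of the general pattern: the helpers toy_schema, toy_message, and toy_render are hypothetical names invented for this illustration and say nothing about how plotscaper is implemented internally.

```r
# Toy model only: hypothetical helpers, not plotscaper's internals

# A "schema" is plain data: the data set plus a queue of pending messages
toy_schema <- function(data) {
  list(data = data, queue = list())
}

# Calling a function on the schema only appends a message; nothing is drawn
toy_message <- function(schema, message_type, ...) {
  msg <- list(type = message_type, args = list(...))
  schema$queue <- c(schema$queue, list(msg))
  schema
}

# "Rendering" replays the queued messages in the order they were added
toy_render <- function(schema) {
  for (msg in schema$queue) {
    cat("executing:", msg$type, "\n")
  }
  invisible(length(schema$queue))
}

s <- toy_schema(mtcars) |>
  toy_message("add-plot", plot = "scatter", variables = c("wt", "mpg")) |>
  toy_message("set-scale", scale = "x", min = 0)

length(s$queue) # messages are stored, not yet executed
n_executed <- toy_render(s)
```

Because the toy schema is an ordinary list, it can be built up in loops, stored, and rendered later, which is the property the rest of this article relies on.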

Typically, you’ll use the schema to add more plots. However, you can also do other things such as select cases or set axis limits:

schema <- schema |>
  assign_cases(which(penguins$species == "Adelie")) |>
  set_scale("plot1", "x", min = 0) |>
  set_scale("plot1", "size", max = 5)

schema
#> plotscaper schema:
#> add-plot { type: scatter, variables: c("body_mass", "flipper_length") }
#> add-plot { type: bar, variables: species }
#> add-plot { type: fluct, variables: c("species", "sex") }
#> add-plot { type: histo, variables: bill_length }
#> set-assigned { cases: c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 
#> 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145), group: 1 }
#> set-scale { scale: x, min: 0 }
#> set-scale { scale: size, max: 5 }
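One detail in the output above is easy to miss: which() returns R's one-based row numbers, while the printed set-assigned message lists cases starting at 0. A plausible reading is that the indices are shifted to zero-based before being handed to the JavaScript figure, but that is an assumption based on the printout, not something stated here. The shift itself looks like this:

```r
library(palmerpenguins)

penguins <- na.omit(penguins)

# Row numbers as R reports them (one-based)
idx <- which(penguins$species == "Adelie")
range(idx)

# The same rows, shifted so that counting starts at zero
zero_based <- idx - 1
range(zero_based)
```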

In fact, most of the things you can do with the schema you can also do interactively, and vice versa. We will see that in the next section.

Rendering a scene

When the time finally comes to turn the recipe into an actual interactive figure, or scene, we can do so by calling the render function:

scene <- schema |> render()
scene

The render function takes the schema and turns it into an htmlwidgets widget that we can interact with in the RStudio viewer or embed in RMarkdown documents. However, it can also do a little bit more than that.

If you’re following along in RStudio, try this:

scene |> select_cases(1:10)

You should see some cases of the data get highlighted. Importantly, notice that the figure does not get re-rendered!

The difference between scene and schema

Whereas create_schema merely initializes a recipe, which is just a regular R object (a list), the render function renders the recipe into an htmlwidgets widget and, if inside a running R session, also launches an httpuv server for direct communication between the R session and the figure.

We can use (mostly) the same functions to modify a schema (created by a call to create_schema) and a scene (created by a call to render). The difference is that calling a function with a schema as the first argument merely appends the corresponding message to the recipe, whereas calling it with a scene immediately sends a message to the figure via the server.

That is, while the following code:

# NOT RUN
scene <- schema |> select_cases(20:30) |> render()
scene
scene

causes two full re-renders, the following code:

# NOT RUN
scene <- schema |> render()
scene |> select_cases(20:30)
scene |> select_cases(20:30)

causes only one render (if inside a running R session).

In other words, plotscaper functions such as add_*, set_selected, and set_scale behave differently based on whether we call them on scene or schema:

Schema: calling a function lazily appends to a list of messages that will get executed in the future, when the schema is rendered
Scene: calling a function immediately sends a message to the scene and mutates its state

The second method only works if we are in an interactive R session because we need a server to communicate with the figure.

This is why running the following code (while the document you’re reading is being knitted) throws an error:

interactive()
#> [1] FALSE
scene |> select_cases(1:10)
#> Error: You can only send messages to scene from within an interactive R session
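One way to write code that works both when knitting and in a live session is to branch on interactive(). The following is a minimal sketch that uses only functions shown in this article; it assumes that select_cases can be applied to a schema before rendering (as in the NOT RUN example above) and uses the original, un-renamed penguins column names.

```r
library(plotscaper)
library(palmerpenguins)

penguins <- na.omit(penguins)

schema <- create_schema(penguins) |>
  add_scatterplot(c("body_mass_g", "flipper_length_mm"))

live <- interactive()

# Without a live session, the selection has to be written into the recipe
if (!live) schema <- schema |> select_cases(1:10)

scene <- schema |> render()
scene

# With a live session, the same selection can be sent to the running figure
if (live) scene |> select_cases(1:10)
```

Either way the first ten cases end up selected; the only difference is whether the selection travels as part of the recipe or as a message to the server.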

When we knit an RMarkdown document, we generate a static HTML file. By default, we cannot communicate with this file since it’s just a big blob of HTML, CSS, and JavaScript. The only way to change the file is to rewrite it. Thus, in RMarkdown, we can really only write the schema and render it - we can’t do anything with the figure once it’s been rendered.

In contrast, inside an interactive R session, we can launch a server that will listen and respond to messages from the R session and send them to the figure (and also send messages from the figure back to the R session - this is done via WebSockets).

This also means that some functions only make sense to use inside a running R session. For example, the following functions query the selection status of the figure:

scene |> selected_cases()
scene |> assigned_cases()

That means that you can, for example, render a figure, select some cases of the data using a mouse, and then call scene |> selected_cases() to get the row indices of those cases. This doesn’t really make sense when writing a schema - the only way to select cases is via an explicit call to select_cases, so we would be querying cases that we have already specified before.

Likewise, there are functions such as pop_plot and remove_plot which can be used to remove plots from a scene. These don’t really make sense to use while writing a schema - if you know that you don’t want a specific plot, you can just delete the line which adds it to the recipe. However, they do make sense when interacting with a scene inside a running R session. Perhaps you found some interesting trend in your data and want to see if it holds in other plots, but you’re running out of space in the viewer - you can call pop_plot to remove the last plot and add_* to add a new one, all the while keeping the rest of the state of the figure intact!
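The workflow just described might look like the sketch below. It assumes that pop_plot takes the scene as its only required argument, which is inferred from the description above and not confirmed by a printed signature; the live edits are wrapped in interactive() because they need the server.

```r
library(plotscaper)
library(palmerpenguins)

penguins <- na.omit(penguins)

scene <- create_schema(penguins) |>
  add_scatterplot(c("body_mass_g", "flipper_length_mm")) |>
  add_barplot(c("species")) |>
  add_histogram(c("bill_length_mm")) |>
  render()
scene

if (interactive()) {
  # Assumed call shape: drop the most recently added plot (the histogram)
  scene |> pop_plot()
  # Put a different histogram in its place; selections and other state stay
  scene |> add_histogram(c("bill_depth_mm"))
}
```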

Troubleshooting the scene

Sometimes, when rendering multiple figures in quick succession, the figure may fail to connect to the server, or the server may fail to launch altogether. Often, the cause is that the default port number is already taken. I will try to fix this bug in the future; for the time being, you can always fix it by running either of the following functions:

start_server(random_port = TRUE) # Starts a server on a new random port

or:

httpuv::stopAllServers() # Stops all servers, now you should be able to relaunch the server
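Before stopping anything, it can help to check whether any servers are actually running. The sketch below uses httpuv's listServers() for that, together with the stopAllServers() call from above. Note that stopAllServers() stops every httpuv server in the current R process, including ones started by other packages, and that treating any listed server as the likely culprit is an assumption.

```r
library(httpuv)

# How many httpuv servers does this R process currently hold?
n_running <- length(listServers())
message(n_running, " httpuv server(s) running")

# Stop them all so that a fresh server can be launched afterwards
if (n_running > 0) stopAllServers()
```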

Bonus: Scatterplot matrix

Since the schema is lazy, we can use it to generate figures programmatically. For example, here’s how we could create an interactive scatterplot matrix (SPLOM) of the penguins data set:


schema <- create_schema(penguins)
keys <- names(penguins)[4:6]

# Loop through combinations of columns
for (i in seq_along(keys)) {
  for (j in seq_along(keys)) {
    if (i != j) {
      # Add a scatterplot if row & column no.'s are different
      schema <- schema |> add_scatterplot(c(keys[i], keys[j]))
    } else {
      # Add a histogram if row & column no.'s match
      schema <- schema |> add_histogram(c(keys[i]))
    }
  }
}

# Options to make the plots fit better within the available space
opts <- list(size = 5, axis_title_size = 0.75, axis_label_size = 0.5)
schema |> render(options = opts)
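To see which plot the loop above adds at each position without rendering anything, the same row and column logic can be replayed in base R. This is only an illustration of the loop's order; the variable names grid, plot_type, and layout are made up for this sketch.

```r
keys <- c("bill_depth", "flipper_length", "body_mass")

# Enumerate (i, j) in the same order as the nested loops: j varies fastest
grid <- expand.grid(j = seq_along(keys), i = seq_along(keys))

# Same rule as in the loop: histogram on the diagonal, scatterplot elsewhere
plot_type <- ifelse(grid$i == grid$j, "histogram", "scatterplot")

# Arrange row by row so that rows correspond to i and columns to j
layout <- matrix(plot_type, nrow = length(keys), byrow = TRUE,
                 dimnames = list(keys, keys))
layout
```

The result is a 3 x 3 grid with histograms on the diagonal and scatterplots everywhere else, nine plots in total.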

Just to reiterate the point from the previous section, we could also do this interactively, by writing out the calls to add the plots ourselves:

scene <- create_schema(penguins) |> render(options = opts)

scene |> add_histogram(c("bill_depth"))
scene |> add_scatterplot(c("bill_depth", "flipper_length"))
scene |> add_scatterplot(c("bill_depth", "body_mass"))
...

However, having to do this for all nine plots might quickly get tedious.

As such, there are different reasons why you might want to do something using a scene, a schema, or some combination of both. If you’re writing an RMarkdown document, you don’t really have a choice - you can’t do anything to a scene once it’s rendered.

Inside an interactive R session, you have more options. You can decide you first want to create a highly customized schema and only then start interacting with the figure live. Or you can just immediately fire off scene <- create_schema(data) |> render() and do everything interactively.

Each approach has its advantages and disadvantages. With the schema way, you can always re-create most of the state, so if you mess up and need to go back, you can just print scene and you’re good to go. With the interactive scene way, you’re more flexible and you get immediate feedback from seeing the figure change in front of you; however, it may be more difficult to recover some state if there are many intermediate steps.