To get started, install plotscaper with:
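A minimal sketch, assuming the package is available from CRAN under the name `plotscaper`; if it is not, the development version would need to be installed from its source repository instead:

```r
# Assumption: plotscaper is published on CRAN under this name
install.packages("plotscaper")
```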
Next, open up RStudio and run the following code:
library(plotscaper)
names(airquality) <- c("ozone", "solar radiation", "wind",
                       "temperature", "month", "day")
create_schema(airquality) |>
  add_scatterplot(c("solar radiation", "ozone")) |>
  add_barplot(c("day", "ozone"), list(reducer = "max")) |>
  add_histogram(c("wind")) |>
  add_pcoords(names(airquality)[1:4]) |>
  render()
#> Warning in create_schema(airquality): Removed 42 rows with missing values from
#> the data
Try clicking and dragging to select some points in the scatterplot. You should see the corresponding cases get highlighted within the barplot!
There are many other ways of interacting with plotscaper figures, including:
Importantly, many of these interactions can be done either manually, by interacting with the figure (“client-side”), or programmatically, by calling functions from inside a running R session (“server-side”).
To take full advantage of plotscaper, you need to understand two core functions: create_schema and render. We’ll explore them using the penguins data set (Horst, Hill, and Gorman 2020).
The create_schema function initializes a schema - a sort of recipe which we can use to define the figure. Like in other data visualization packages, we build up the schema step-by-step, by calling functions that append additional information to it:
library(palmerpenguins)
penguins <- na.omit(penguins) # missing data is not supported yet, unfortunately
names(penguins) <- names(penguins) |> gsub("(_mm|_g)", "", x = _)
schema <- create_schema(penguins) |>
  add_scatterplot(c("body_mass", "flipper_length")) |>
  add_barplot(c("species")) |>
  add_fluctplot(c("species", "sex")) |>
  add_histogram(c("bill_length"))
The schema then is really just a list of messages:
schema
#> plotscaper schema:
#> add-plot { type: scatter, variables: c("body_mass", "flipper_length") }
#> add-plot { type: bar, variables: species }
#> add-plot { type: fluct, variables: c("species", "sex") }
#> add-plot { type: histo, variables: bill_length }
Typically, you’ll use the schema to add more plots. However, you can also do other things, such as assign cases to a group or set scale limits:
schema <- schema |>
  assign_cases(which(penguins$species == "Adelie")) |>
  set_scale("plot1", "x", min = 0) |>
  set_scale("plot1", "size", max = 5)
schema
#> plotscaper schema:
#> add-plot { type: scatter, variables: c("body_mass", "flipper_length") }
#> add-plot { type: bar, variables: species }
#> add-plot { type: fluct, variables: c("species", "sex") }
#> add-plot { type: histo, variables: bill_length }
#> set-assigned { cases: c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121,
#> 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145), group: 1 }
#> set-scale { scale: x, min: 0 }
#> set-scale { scale: size, max: 5 }
In fact, most of the things you can do with the schema you can also do interactively, and vice versa. We will see that in the next section.
When the time finally comes to turn the recipe into an actual interactive figure, or scene, we can do so by calling the render function:
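A minimal sketch of that call, assuming the `schema` object built above; the name `scene` is chosen to match how the rendered figure is referred to in the rest of this document:

```r
library(plotscaper)

# Assumes `schema` was built with create_schema() as shown above.
# Render it into an interactive figure and keep a handle to the result.
scene <- schema |> render()
scene
```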
The render function takes the schema and turns it into an htmlwidgets widget that we can interact with in the RStudio viewer or embed in RMarkdown documents. However, it can also do a little bit more than that.
If you’re following along in RStudio, try this:
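A hedged example, assuming the `scene` object returned by `render` and using the same `select_cases` call that appears later in this document; the choice of the first ten rows is arbitrary:

```r
# Assumes an interactive session and a `scene` created by render().
# Highlight the first ten rows of the data in the already-rendered figure.
scene |> select_cases(1:10)
```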
You should see some cases of the data get highlighted. Importantly, notice that the figure does not get re-rendered!
Whereas create_schema merely initializes a recipe, which is just a regular R object (a list), the render function renders the recipe into an htmlwidgets widget and, if inside a running R session, also launches an httpuv server for direct communication between the R session and the figure.
We can use (mostly) the same functions to modify a schema (created by a call to create_schema) and a scene (created by a call to render). The difference is that while calling a function with a schema as the first argument merely appends the corresponding message to the recipe, calling it with a scene immediately sends a message to the figure via the server.
That is, while the following code:
causes two full re-renders, the following code:
causes only one render (if inside a running R session).
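As a hedged sketch of that contrast, assuming the `schema` object from above and an interactive R session (the particular plot added here is arbitrary):

```r
library(plotscaper)

# Two full renders: each call to render() builds the figure from scratch
schema |> render()
schema |> add_histogram(c("body_mass")) |> render()

# One render: the second call only sends a message to the existing figure
scene <- schema |> render()
scene |> add_histogram(c("body_mass"))
```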
In other words, plotscaper functions such as add_*, set_selected, and set_scale behave differently based on whether we call them on a scene or a schema:
The second method only works if we are in an interactive R session because we need a server to communicate with the figure.
This is why running the following code (while the document you’re reading is being knitted) throws an error:
interactive()
#> [1] FALSE
scene |> select_cases(1:10)
#> Error: You can only send messages to scene from within an interactive R session
When we knit an RMarkdown document, we generate a static HTML file. By default, we cannot communicate with this file since it’s just a big blob of HTML, CSS, and JavaScript. The only way to change the file is to rewrite it. Thus, in RMarkdown, we can really only write the schema and render - we can’t do anything with the figure once it’s been rendered.
In contrast, inside an interactive R session, we can launch a server that listens for messages from the R session and forwards them to the figure (and also sends messages from the figure back to the R session; this is done via WebSockets).
This also means there are some functions that only make sense to use inside a running R session. For example, the following functions query the selection status of the figure:
That means that you can, for example, render a figure, select some cases of the data using a mouse, and then call scene |> selected_cases() to get the row indices of those cases. This doesn’t really make sense when writing a schema - the only way to select cases is via an explicit call to select_cases, so we would be querying cases that we have already specified before.
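For instance, a selection made with the mouse could be pulled back into R along these lines. This is a sketch that assumes an interactive session, the `scene` and `penguins` objects from above, and that the returned indices can be used directly to subset the data frame (worth checking, since the printed schema messages above use zero-based indices):

```r
# After selecting some points in the figure with the mouse:
idx <- scene |> selected_cases()

# Assumption: `idx` holds one-based R row indices
penguins[idx, ]
```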
Likewise, there are functions such as pop_plot and remove_plot which can be used to remove plots from a scene. These don’t really make sense to use while writing a schema - if you know that you don’t want a specific plot, you can just delete the line which adds it to the recipe. However, they do make sense when interacting with a scene inside a running R session. Perhaps you found some interesting trend in your data and want to see if it holds in other plots, but you’re running out of space in the viewer - you can call pop_plot to remove the last plot and add_* to add a new plot, all the while keeping the rest of the state of the figure intact!
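A short sketch of that workflow, assuming an interactive session and the `scene` from above; calling `pop_plot` with no further arguments is an assumption based on the description of it removing the last plot:

```r
# Drop the most recently added plot to free up space in the viewer
# (assumption: pop_plot() needs no arguments besides the scene)
scene |> pop_plot()

# Add a different plot in its place
scene |> add_scatterplot(c("bill_length", "bill_depth"))
```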
Sometimes, when rendering multiple figures in quick succession, the figure may fail to connect to the server, or the server may fail to launch altogether. Often, the cause is that the default port number is already taken. I will try to fix this bug in the future; for the time being, you can work around it by running either of the following functions:
or:
Since the schema is lazy, we can use it to generate figures programmatically. For example, here’s how we could create an interactive scatterplot matrix (SPLOM) of the penguins data set:
schema <- create_schema(penguins)
keys <- names(penguins)[4:6]
# Loop through combinations of columns
for (i in seq_along(keys)) {
  for (j in seq_along(keys)) {
    if (i != j) {
      # Add a scatterplot if row & column no.'s are different
      schema <- schema |> add_scatterplot(c(keys[i], keys[j]))
    } else {
      # Add a histogram if row & column no.'s match
      schema <- schema |> add_histogram(c(keys[i]))
    }
  }
}
# Options to make the plots fit better within the available space
opts <- list(size = 5, axis_title_size = 0.75, axis_label_size = 0.5)
schema |> render(options = opts)
Just to reiterate the point from the previous section, we could also do this interactively, by writing out the calls to add the plots ourselves:
scene <- create_schema(penguins) |> render(opts)
scene |> add_histogram(c("bill_depth"))
scene |> add_scatterplot(c("bill_depth", "flipper_length"))
scene |> add_scatterplot(c("bill_depth", "body_mass"))
...
However, having to do this for all nine plots might quickly get tedious.
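One way to package the programmatic approach for reuse is to separate the grid layout from the schema-building step. The sketch below is illustrative only: `splom_cells` and `add_splom` are hypothetical helpers, not part of plotscaper, and `add_splom` assumes plotscaper is attached so that `add_histogram` and `add_scatterplot` are available.

```r
# Hypothetical helper (not part of plotscaper): describe each cell of a
# SPLOM grid, row by row, as a plot type plus the variables it shows
splom_cells <- function(keys) {
  # expand.grid varies its first column fastest, so `col` cycles within `row`
  grid <- expand.grid(col = seq_along(keys), row = seq_along(keys))
  lapply(seq_len(nrow(grid)), function(k) {
    i <- grid$row[k]
    j <- grid$col[k]
    if (i == j) {
      list(type = "histogram", variables = keys[i])
    } else {
      list(type = "scatterplot", variables = c(keys[i], keys[j]))
    }
  })
}

# Hypothetical helper: append one plot per cell to an existing schema
# (requires plotscaper to be attached when called)
add_splom <- function(schema, keys) {
  for (cell in splom_cells(keys)) {
    schema <- if (cell$type == "histogram") {
      add_histogram(schema, cell$variables)
    } else {
      add_scatterplot(schema, cell$variables)
    }
  }
  schema
}

# Usage, assuming the `penguins` data and `opts` from above:
# create_schema(penguins) |>
#   add_splom(names(penguins)[4:6]) |>
#   render(options = opts)
```

Because `add_splom` takes the schema as its first argument and returns it, it slots into a pipeline in the same way as the `add_*` functions shown earlier.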
As such, there are different reasons why you might want to do something using a scene, a schema, or some combination of both. If you’re writing an RMarkdown document, you don’t really have a choice - you can’t do anything to a scene once it’s rendered.
Inside an interactive R session, you have more options. You can decide you first want to create a highly customized schema and only then start interacting with the figure live. Or you can just immediately fire off scene <- create_schema(data) |> render() and do everything interactively.
Each approach has advantages and disadvantages. With the schema way, you can always re-create most of the state, so if you mess up and need to go back, you can just print scene and you’re good to go. With the interactive scene way, you’re more flexible and get immediate feedback from seeing the figure change in front of you; however, it may be more difficult to recover some state if there are many intermediate steps.