The ggplot
package is one of the most powerful and
widely used tool-kits available for visualizing data in R, and it’s easy
to understand why. Its well-crafted grammar of layers, aesthetics,
scales, and facets allows for extraordinary versatility and power but
with remarkable intuitiveness and delightfully tidy code. It is
personally my nearly universal go-to for generating visual expressions
of data; but when plotting drug combination data, it has some
understandable friction points. Consider the following example:
concentrations <- c(0,2^(-3:3))
surface <- data.frame(
concA = rep(rep(concentrations,each=length(concentrations)),each=3),
concB = rep(rep(concentrations,times=length(concentrations)),each=3),
replicate = rep(c(1,2,3),times=(length(concentrations)^2))
)
surface$actual <- evalBraidModel(
surface$concA,
surface$concB,
c(1, 1, 3, 3, 2, 0, 100, 100, 100)
)
surface$measure <- surface$actual + rnorm(nrow(surface),sd=7)
head(surface, 12)
#> concA concB replicate actual measure
#> 1 0 0.000 1 0.0000000 8.5931890
#> 2 0 0.000 2 0.0000000 -5.7543962
#> 3 0 0.000 3 0.0000000 12.9849676
#> 4 0 0.125 1 0.1949318 5.8142327
#> 5 0 0.125 2 0.1949318 -0.5981691
#> 6 0 0.125 3 0.1949318 -11.0448984
#> 7 0 0.250 1 1.5384615 -6.0873534
#> 8 0 0.250 2 1.5384615 4.1486735
#> 9 0 0.250 3 1.5384615 -5.6509026
#> 10 0 0.500 1 11.1111111 21.6085144
#> 11 0 0.500 2 11.1111111 -4.4634188
#> 12 0 0.500 3 11.1111111 1.2742389
This synthetic dataset reflects a a fairly typical combination study layout: both drugs have been tested at a range of serially diluted concentrations, and combined doses have been laid out in a a checkerboard so that every combination of concentration of the first drug (including 0) and the second drug (including 0) has been tested. Furthermore, each combined dose is tested in triplicate. The measured response surface has been simulated using a BRAID response surface model with a synergistic \(\kappa\) value of 2, but any smoothly varying, noisily measured function of two doses would work just as well.
How might we visualize such a response surface using
ggplot2
? A 3-D plot is right out: ggplot2
has
no standard support for such plots, nor should it, as 3-D plots are
notoriously poor at conveying accurate quantitative information. A more
straightforward approach would be to plot our measured effect as a
function of one concentration, conditioned on the level of the other.
For example:
ggplot(surface,aes(x=concA,y=measure,colour=factor(concB)))+
geom_point()+
stat_summary(geom="line",fun.data=mean_se)+
scale_x_log10()+
labs(x="Drug A",y="Effect",colour="Drug B")
This view is quite effective, and gives a clear sense of how the presence of differing amounts of drug B impact the behavior of drug B. But this plotting approach necessarily differentiates between the two drugs in how they are visualized, and is ineffective at showing the shape of their interaction.
The most intuitive way to present combination data is with the use of
a response surface heatmap: plot the two doses as the x- and y-
dimensions, and visualize their effect using a suitable color scale.
Unfortunately, our dataset includes multiple measurements for each
combination; were we to plot these using something like
geom_tile()
each replicate would be plotted over the
others, so that only the last replicates were visible.
ggplot2
does contain one way to address this
out-of-the-box, [stat_summary_2d()], but it doesn’t behave exactly as
we’d like:
ggplot(surface, aes(x=concA,y=concB))+
stat_summary_2d(aes(z=measure), fun="mean")+
scale_x_log10()+
scale_y_log10()+
scale_fill_distiller(palette="RdYlBu")+
coord_equal()+
labs(x="Drug A",y="Drug B",fill="Effect")
Not only are our evenly spaced concentrations pairs reduced to
awkwardly rounded mini-tiles, all measurements where either drug is zero
have been removed from the plot altogether. This is an intrinsic
property of nearly all ggplot2
stats and geoms: if a
transformed coordinate is infinite, it is removed from the plot. Yet
plotting serially diluted concentrations on a logarithmic scale is
incredibly intuitive, making comparison of non-zero concentrations and
zero concentrations frustratingly difficult. It is for these reasons
that we developed geom_braid()
:
ggplot(surface,aes(x=concA,y=concB))+
geom_braid(aes(fill=measure))+
scale_x_log10()+
scale_y_log10()+
scale_fill_distiller(palette="RdYlBu")+
coord_equal()+
labs(x="Drug A",y="Drug B",fill="Effect")
Like many ggplot2
extensions, geom_braid()
is in reality a Stat
, performing some useful preprocessing
of the data before passing it off to the true geom,
geom_tile()
. Duplicate concentration pairs are identified,
and averaged to give the aggregate value at each unique pair. Widths
between doses pairs are guessed or provided. Coordinates which, after
being transformed, would be infinite are instead offset from the main
body of measurements, ensuring they will be plotted along with the data.
The result is a simple and intuitive tool for quickly examining combined
action data.
As a ggplot2
extension, geom_braid()
leverages all of the customization and flexibility ggplot
objects traditionally afford, including integration with other layers,
ggplot
fill scales, and faceting:
ggplot(surface,aes(x=concA,y=concB))+
geom_braid(aes(fill=measure))+
geom_point(colour="black")+
scale_x_log10("Drug A")+
scale_y_log10("Drug B")+
scale_fill_viridis_c("Effect",option="A")+
coord_equal()+
facet_wrap(vars(replicate))
Widths and heights of tiles can (and generally should) be generated automatically, but can be passed to the stat as additional aesthetics if desired. Note that widths and heights should be expressed in the transformed coordinate space:
surface$tilewidth <- log10(2)*0.9
surface$tilewidth[surface$concA==0] <- log10(2)/2
surface$tileheight <- log10(2)*0.9
surface$tileheight[surface$concB==0] <- log10(2)/2
ggplot(surface,aes(x=concA,y=concB))+
geom_braid(aes(fill=measure,width=tilewidth,height=tileheight),space=3)+
scale_x_log10("Drug A")+
scale_y_log10("Drug B")+
scale_fill_distiller("Effect",palette="RdYlBu")+
coord_equal()
geom_braid()
is an effective tool for rendering classic
checkerboard layouts, but there is no guarantee that data will be laid
out so cleanly. Experiments might only measure a subset of a traditional
checkerboard, or might select points on an even denser arrangement. For
example, the following adjustment simulates an experiment in which
measurements from replicates 2 and 3 have increased the concentrations
of drug A and drug B respectively. This removes the traditional
“triplicate” approach in favor of a more varied, full-coverage
method:
glassSurface <- surface
glassSurface$concA[glassSurface$replicate==2] <-
glassSurface$concA[glassSurface$replicate==2]*1.25
glassSurface$concB[glassSurface$replicate==3] <-
glassSurface$concB[glassSurface$replicate==3]*1.25
glassSurface$actual <- evalBraidModel(
glassSurface$concA,
glassSurface$concB,
c(1, 1, 3, 3, -0.5, 0, 60, 100, 100)
)
glassSurface$measure <- glassSurface$actual+rnorm(nrow(glassSurface),sd=7)
head(glassSurface, 12)
#> concA concB replicate actual measure tilewidth tileheight
#> 1 0 0.00000 1 0.0000000 4.002236136 0.150515 0.150515
#> 2 0 0.00000 2 0.0000000 2.784350041 0.150515 0.150515
#> 3 0 0.00000 3 0.0000000 -0.360747637 0.150515 0.150515
#> 4 0 0.12500 1 0.1949318 -7.271100218 0.150515 0.270927
#> 5 0 0.12500 2 0.1949318 -6.950132441 0.150515 0.270927
#> 6 0 0.15625 3 0.3800201 5.750562994 0.150515 0.270927
#> 7 0 0.25000 1 1.5384615 -8.749887049 0.150515 0.270927
#> 8 0 0.25000 2 1.5384615 0.004843671 0.150515 0.270927
#> 9 0 0.31250 3 2.9613836 -0.598251267 0.150515 0.270927
#> 10 0 0.50000 1 11.1111111 16.965812489 0.150515 0.270927
#> 11 0 0.50000 2 11.1111111 -1.540629375 0.150515 0.270927
#> 12 0 0.62500 3 19.6232339 2.598331077 0.150515 0.270927
Due to its irregular spacing, plotting glassSurface
with
geom_braid()
produces a very unsatisfactory plot:
ggplot(glassSurface,aes(x=concA,y=concB))+
geom_braid(aes(fill=measure))+
geom_point(colour="black")+
scale_x_log10("Drug A")+
scale_y_log10("Drug B")+
scale_fill_distiller("Effect",palette="RdYlBu")+
coord_equal()
While this plot is technically correct, it fails to fill the
space in the way a response surface should. When dealing with such
irregular sampling, a better tool is geom_braid_glass()
which produces what we call a “stained glass” plot:
ggplot(glassSurface,aes(x=concA,y=concB))+
geom_braid_glass(aes(fill=measure))+
geom_point(colour="black")+
scale_x_log10("Drug A")+
scale_y_log10("Drug B")+
scale_fill_distiller("Effect",palette="RdYlBu")+
coord_equal()
In a stained glass plot, every point within the bounds of the plotted values is colored according to the value of the measured dose pair nearest to it, producing a mosaic of Voronoi cells that cover the full space. Values in the margins of the plot are given a height or width according the specified or inferred aesthetic, but the boundaries between them are again bisecting Voronoi boundaries.
The ability to customize the width of the resulting tiles, particularly in the margins, can be even more valuable in these plots, where the width and height default to the smallest spacing between distinct values:
While the discrete heatmaps of geom_braid
and
geom_braid_glass
are the most direct ways to visualize
combined action data, the harsh polygonal edges introduced can sometimes
mask the more important variations in shape and structure.
braidReports
also includes a ggplot
geom for
rendering smoothed surfaces, unsurprisingly named
geom_braid_smooth
:
ggplot(surface,aes(x=concA,y=concB))+
geom_braid_smooth(aes(fill=measure))+
scale_x_log10()+
scale_y_log10()+
scale_fill_distiller(palette="RdYlBu")+
coord_equal()+
labs(x="Drug A",y="Drug B",fill="Effect")
geom_braid_smooth
interpolates a regular grid of values
from the original (potentially irregular) data using a two-dimensional
Gaussian kernel, producing a smoothed surface that generally hews
extremely close to measured values at their respective points, but
produces intuitive, smoothly varying values in between them. It can be
run on both regularly laid-out checkerboard data and more irregularly
spaced measurements, though in the latter case specifying the smoothing
width and height explicitly using the width
and
height
aesthetics is often advisable:
ggplot(glassSurface,aes(x=concA,y=concB))+
geom_braid_smooth(aes(fill=measure))+
geom_point(colour="black")+
scale_x_log10("Drug A")+
scale_y_log10("Drug B")+
scale_fill_distiller("Effect",palette="RdYlBu")+
coord_equal()
Heatmaps are an intuitive and effective way of depicting the results
and shape of a combined response surface, but they still fall short when
it comes to quantitative depiction. Even the best-designed colormap
still carries considerable imprecision, making it difficult to perceive
the values and ranges at which particular numerical values are reached.
One effective tool for this task is the contour map, which markes the
boundaries of dose space at which a given effect level is crossed. To
support such plots, we have included geom_braid_contour()
which uses the same smoothing techniques as
geom_braid_smooth()
to produces an array of x-, y-, and
z-values for the built in ggplot
stat,
stat_contour()
. This allows us to visualize both the
overall shape of the surface and the boundaries of particular
effect spaces:
ggplot(surface,aes(x=concA,y=concB))+
geom_braid_smooth(aes(fill=measure))+
geom_braid_contour(aes(z=measure),breaks=10*(1:9),colour="black",linetype=2)+
scale_x_log10()+
scale_y_log10()+
scale_fill_distiller(palette="RdYlBu")+
coord_equal()+
labs(x="Drug A",y="Drug B",fill="Effect")
Note that geom_braid_contour()
uses the smoothed and
interpolated values like those produced by
geom_braid_smooth()
rather than the discrete values plotted
by geom_braid()
or geom_braid_glass()
. There
are two reasons for this: first, the underlying Stat
,
stat_contour()
requires that the data plotted be laid out
as a regular grid for it to perform its own interpolation.
Second, contours linearly interpolated between more parsely sampled data
are often disjointed and jagged, and carry much less information than
more smoothly interpolated values.
As with geom_braid_smooth()
,
geom_braid_contour()
can be applied to both regular and
irregular data, but with irregular data more care must be taken with the
smoothing widths and heights:
ggplot(glassSurface,aes(x=concA,y=concB))+
geom_braid_smooth(aes(fill=measure))+
geom_point(colour="black")+
geom_braid_contour(aes(z=measure),breaks=10*(1:9),colour="black",linetype=2)+
scale_x_log10("Drug A")+
scale_y_log10("Drug B")+
scale_fill_distiller("Effect",palette="RdYlBu")+
coord_equal()
ggplot(glassSurface,aes(x=concA,y=concB))+
geom_braid_smooth(aes(fill=measure,width=tilewidth,height=tileheight),space=2)+
geom_braid_contour(aes(z=measure,width=tilewidth,height=tileheight),space=2,
breaks=10*(1:9),colour="black",linetype=2)+
scale_x_log10("Drug A")+
scale_y_log10("Drug B")+
scale_fill_distiller("Effect",palette="RdYlBu")+
coord_equal()
It should also be noted that while we have plotted contours and smoothed surfaces together here, this is only to highlight their connection to the underlying interpolated data. BRAID contours can be plotted all on their own which can be quite effective at comparing the results of different surfaces:
surface$type <- "Synergy"
glassSurface$type <- "Antagonism"
allSurface <- rbind(surface,glassSurface)
allSurface$type <- factor(allSurface$type,c("Synergy","Antagonism"))
ggplot(allSurface,aes(x=concA,y=concB,colour=type))+
geom_point()+
geom_braid_contour(aes(z=measure,width=tilewidth,height=tileheight),
breaks=c(50,90), tight=TRUE)+
scale_x_log10()+
scale_y_log10()+
scale_color_brewer("Surface Type",palette="Set1")+
coord_equal()+
labs(x="Drug A",y="Drug B")
Astute ggplot2
observers will note that at several
points throughout this vignette, we have run standard
ggplot2
geoms such as geom_point()
on
logarithmically transformed data that result in infinite (transformed)
values. Ordinarily, this would produce a warning: and in reality, it
produces a warning here as well. We have suppressed warnings for this
vignette, because while the braid
,
braid_glass
, and braid_smooth
geoms all
explicitly handle such infinite transformed values, we have been unable
to find a way to suppress these built in warnings. The warnings result,
not when a stat is running, but before its functions are
handled, and as such are, for the time being, unavoidable. When running
a BRAID plotting function yourself, you will encounter these warnings as
well:
# With warnings enabled...
ggplot(surface,aes(x=concA,y=concB))+
geom_braid(aes(fill=measure))+
scale_x_log10()+
scale_y_log10()+
scale_fill_distiller(palette="RdYlBu")+
coord_equal()+
labs(x="Drug A",y="Drug B",fill="Effect")
#> Warning in scale_x_log10(): log-10 transformation introduced infinite values.
#> Warning in scale_y_log10(): log-10 transformation introduced infinite values.
While we feel that it is an unfortunate reality, we hope that this is a small enough inconvenience that the overall versatility and expressiveness of the BRAID plotting function outweighs it.