Plotting BRAID Surfaces

Introduction

The ggplot package is one of the most powerful and widely used tool-kits available for visualizing data in R, and it’s easy to understand why. Its well-crafted grammar of layers, aesthetics, scales, and facets allows for extraordinary versatility and power but with remarkable intuitiveness and delightfully tidy code. It is personally my nearly universal go-to for generating visual expressions of data; but when plotting drug combination data, it has some understandable friction points. Consider the following example:

concentrations <- c(0,2^(-3:3))
surface <- data.frame(
    concA = rep(rep(concentrations,each=length(concentrations)),each=3),
    concB = rep(rep(concentrations,times=length(concentrations)),each=3),
    replicate = rep(c(1,2,3),times=(length(concentrations)^2))
)
surface$actual <- evalBraidModel(
    surface$concA,
    surface$concB,
    c(1, 1, 3, 3, 2, 0, 100, 100, 100)
)
surface$measure <- surface$actual + rnorm(nrow(surface),sd=7)

head(surface, 12)
#>    concA concB replicate     actual     measure
#> 1      0 0.000         1  0.0000000   8.5931890
#> 2      0 0.000         2  0.0000000  -5.7543962
#> 3      0 0.000         3  0.0000000  12.9849676
#> 4      0 0.125         1  0.1949318   5.8142327
#> 5      0 0.125         2  0.1949318  -0.5981691
#> 6      0 0.125         3  0.1949318 -11.0448984
#> 7      0 0.250         1  1.5384615  -6.0873534
#> 8      0 0.250         2  1.5384615   4.1486735
#> 9      0 0.250         3  1.5384615  -5.6509026
#> 10     0 0.500         1 11.1111111  21.6085144
#> 11     0 0.500         2 11.1111111  -4.4634188
#> 12     0 0.500         3 11.1111111   1.2742389

This synthetic dataset reflects a a fairly typical combination study layout: both drugs have been tested at a range of serially diluted concentrations, and combined doses have been laid out in a a checkerboard so that every combination of concentration of the first drug (including 0) and the second drug (including 0) has been tested. Furthermore, each combined dose is tested in triplicate. The measured response surface has been simulated using a BRAID response surface model with a synergistic \(\kappa\) value of 2, but any smoothly varying, noisily measured function of two doses would work just as well.

How might we visualize such a response surface using ggplot2? A 3-D plot is right out: ggplot2 has no standard support for such plots, nor should it, as 3-D plots are notoriously poor at conveying accurate quantitative information. A more straightforward approach would be to plot our measured effect as a function of one concentration, conditioned on the level of the other. For example:

ggplot(surface,aes(x=concA,y=measure,colour=factor(concB)))+
    geom_point()+
    stat_summary(geom="line",fun.data=mean_se)+
    scale_x_log10()+
    labs(x="Drug A",y="Effect",colour="Drug B")

This view is quite effective, and gives a clear sense of how the presence of differing amounts of drug B impact the behavior of drug B. But this plotting approach necessarily differentiates between the two drugs in how they are visualized, and is ineffective at showing the shape of their interaction.

The most intuitive way to present combination data is with the use of a response surface heatmap: plot the two doses as the x- and y- dimensions, and visualize their effect using a suitable color scale. Unfortunately, our dataset includes multiple measurements for each combination; were we to plot these using something like geom_tile() each replicate would be plotted over the others, so that only the last replicates were visible. ggplot2 does contain one way to address this out-of-the-box, [stat_summary_2d()], but it doesn’t behave exactly as we’d like:

ggplot(surface, aes(x=concA,y=concB))+
    stat_summary_2d(aes(z=measure), fun="mean")+
    scale_x_log10()+
    scale_y_log10()+
    scale_fill_distiller(palette="RdYlBu")+
    coord_equal()+
    labs(x="Drug A",y="Drug B",fill="Effect")

Not only are our evenly spaced concentrations pairs reduced to awkwardly rounded mini-tiles, all measurements where either drug is zero have been removed from the plot altogether. This is an intrinsic property of nearly all ggplot2 stats and geoms: if a transformed coordinate is infinite, it is removed from the plot. Yet plotting serially diluted concentrations on a logarithmic scale is incredibly intuitive, making comparison of non-zero concentrations and zero concentrations frustratingly difficult. It is for these reasons that we developed geom_braid():

ggplot(surface,aes(x=concA,y=concB))+
    geom_braid(aes(fill=measure))+
    scale_x_log10()+
    scale_y_log10()+
    scale_fill_distiller(palette="RdYlBu")+
    coord_equal()+
    labs(x="Drug A",y="Drug B",fill="Effect")

Like many ggplot2 extensions, geom_braid() is in reality a Stat, performing some useful preprocessing of the data before passing it off to the true geom, geom_tile(). Duplicate concentration pairs are identified, and averaged to give the aggregate value at each unique pair. Widths between doses pairs are guessed or provided. Coordinates which, after being transformed, would be infinite are instead offset from the main body of measurements, ensuring they will be plotted along with the data. The result is a simple and intuitive tool for quickly examining combined action data.

Customizing BRAID Heatmaps

As a ggplot2 extension, geom_braid() leverages all of the customization and flexibility ggplot objects traditionally afford, including integration with other layers, ggplot fill scales, and faceting:

ggplot(surface,aes(x=concA,y=concB))+
    geom_braid(aes(fill=measure))+
    geom_point(colour="black")+
    scale_x_log10("Drug A")+
    scale_y_log10("Drug B")+
    scale_fill_viridis_c("Effect",option="A")+
    coord_equal()+
    facet_wrap(vars(replicate))

Widths and heights of tiles can (and generally should) be generated automatically, but can be passed to the stat as additional aesthetics if desired. Note that widths and heights should be expressed in the transformed coordinate space:

surface$tilewidth <- log10(2)*0.9
surface$tilewidth[surface$concA==0] <- log10(2)/2

surface$tileheight <- log10(2)*0.9
surface$tileheight[surface$concB==0] <- log10(2)/2

ggplot(surface,aes(x=concA,y=concB))+
    geom_braid(aes(fill=measure,width=tilewidth,height=tileheight),space=3)+
    scale_x_log10("Drug A")+
    scale_y_log10("Drug B")+
    scale_fill_distiller("Effect",palette="RdYlBu")+
    coord_equal()

Stained Glass Surfaces

geom_braid() is an effective tool for rendering classic checkerboard layouts, but there is no guarantee that data will be laid out so cleanly. Experiments might only measure a subset of a traditional checkerboard, or might select points on an even denser arrangement. For example, the following adjustment simulates an experiment in which measurements from replicates 2 and 3 have increased the concentrations of drug A and drug B respectively. This removes the traditional “triplicate” approach in favor of a more varied, full-coverage method:

glassSurface <- surface
glassSurface$concA[glassSurface$replicate==2] <- 
    glassSurface$concA[glassSurface$replicate==2]*1.25
glassSurface$concB[glassSurface$replicate==3] <- 
    glassSurface$concB[glassSurface$replicate==3]*1.25

glassSurface$actual <- evalBraidModel(
    glassSurface$concA,
    glassSurface$concB,
    c(1, 1, 3, 3, -0.5, 0, 60, 100, 100)
)
glassSurface$measure <- glassSurface$actual+rnorm(nrow(glassSurface),sd=7)

head(glassSurface, 12)
#>    concA   concB replicate     actual      measure tilewidth tileheight
#> 1      0 0.00000         1  0.0000000  4.002236136  0.150515   0.150515
#> 2      0 0.00000         2  0.0000000  2.784350041  0.150515   0.150515
#> 3      0 0.00000         3  0.0000000 -0.360747637  0.150515   0.150515
#> 4      0 0.12500         1  0.1949318 -7.271100218  0.150515   0.270927
#> 5      0 0.12500         2  0.1949318 -6.950132441  0.150515   0.270927
#> 6      0 0.15625         3  0.3800201  5.750562994  0.150515   0.270927
#> 7      0 0.25000         1  1.5384615 -8.749887049  0.150515   0.270927
#> 8      0 0.25000         2  1.5384615  0.004843671  0.150515   0.270927
#> 9      0 0.31250         3  2.9613836 -0.598251267  0.150515   0.270927
#> 10     0 0.50000         1 11.1111111 16.965812489  0.150515   0.270927
#> 11     0 0.50000         2 11.1111111 -1.540629375  0.150515   0.270927
#> 12     0 0.62500         3 19.6232339  2.598331077  0.150515   0.270927

Due to its irregular spacing, plotting glassSurface with geom_braid() produces a very unsatisfactory plot:

ggplot(glassSurface,aes(x=concA,y=concB))+
    geom_braid(aes(fill=measure))+
    geom_point(colour="black")+
    scale_x_log10("Drug A")+
    scale_y_log10("Drug B")+
    scale_fill_distiller("Effect",palette="RdYlBu")+
    coord_equal()

While this plot is technically correct, it fails to fill the space in the way a response surface should. When dealing with such irregular sampling, a better tool is geom_braid_glass() which produces what we call a “stained glass” plot:

ggplot(glassSurface,aes(x=concA,y=concB))+
    geom_braid_glass(aes(fill=measure))+
    geom_point(colour="black")+
    scale_x_log10("Drug A")+
    scale_y_log10("Drug B")+
    scale_fill_distiller("Effect",palette="RdYlBu")+
    coord_equal()

In a stained glass plot, every point within the bounds of the plotted values is colored according to the value of the measured dose pair nearest to it, producing a mosaic of Voronoi cells that cover the full space. Values in the margins of the plot are given a height or width according the specified or inferred aesthetic, but the boundaries between them are again bisecting Voronoi boundaries.

The ability to customize the width of the resulting tiles, particularly in the margins, can be even more valuable in these plots, where the width and height default to the smallest spacing between distinct values:

ggplot(glassSurface,aes(x=concA,y=concB))+
    geom_braid_glass(aes(fill=measure,width=tilewidth,height=tileheight),space=2)+
    scale_x_log10("Drug A")+
    scale_y_log10("Drug B")+
    scale_fill_distiller("Effect",palette="RdYlBu")+
    coord_equal()

Smoothed BRAID Response Surfaces

While the discrete heatmaps of geom_braid and geom_braid_glass are the most direct ways to visualize combined action data, the harsh polygonal edges introduced can sometimes mask the more important variations in shape and structure. braidReports also includes a ggplot geom for rendering smoothed surfaces, unsurprisingly named geom_braid_smooth:

ggplot(surface,aes(x=concA,y=concB))+
    geom_braid_smooth(aes(fill=measure))+
    scale_x_log10()+
    scale_y_log10()+
    scale_fill_distiller(palette="RdYlBu")+
    coord_equal()+
    labs(x="Drug A",y="Drug B",fill="Effect")

geom_braid_smooth interpolates a regular grid of values from the original (potentially irregular) data using a two-dimensional Gaussian kernel, producing a smoothed surface that generally hews extremely close to measured values at their respective points, but produces intuitive, smoothly varying values in between them. It can be run on both regularly laid-out checkerboard data and more irregularly spaced measurements, though in the latter case specifying the smoothing width and height explicitly using the width and height aesthetics is often advisable:

ggplot(glassSurface,aes(x=concA,y=concB))+
    geom_braid_smooth(aes(fill=measure))+
    geom_point(colour="black")+
    scale_x_log10("Drug A")+
    scale_y_log10("Drug B")+
    scale_fill_distiller("Effect",palette="RdYlBu")+
    coord_equal()


ggplot(glassSurface,aes(x=concA,y=concB))+
    geom_braid_smooth(aes(fill=measure,width=log10(2),height=log10(2)))+
    scale_x_log10("Drug A")+
    scale_y_log10("Drug B")+
    scale_fill_distiller("Effect",palette="RdYlBu")+
    coord_equal()


ggplot(glassSurface,aes(x=concA,y=concB))+
    geom_braid_smooth(aes(fill=measure,width=tilewidth,height=tileheight),space=2)+
    scale_x_log10("Drug A")+
    scale_y_log10("Drug B")+
    scale_fill_distiller("Effect",palette="RdYlBu")+
    coord_equal()

Response Surface Contours

Heatmaps are an intuitive and effective way of depicting the results and shape of a combined response surface, but they still fall short when it comes to quantitative depiction. Even the best-designed colormap still carries considerable imprecision, making it difficult to perceive the values and ranges at which particular numerical values are reached. One effective tool for this task is the contour map, which markes the boundaries of dose space at which a given effect level is crossed. To support such plots, we have included geom_braid_contour() which uses the same smoothing techniques as geom_braid_smooth() to produces an array of x-, y-, and z-values for the built in ggplot stat, stat_contour(). This allows us to visualize both the overall shape of the surface and the boundaries of particular effect spaces:

ggplot(surface,aes(x=concA,y=concB))+
    geom_braid_smooth(aes(fill=measure))+
    geom_braid_contour(aes(z=measure),breaks=10*(1:9),colour="black",linetype=2)+
    scale_x_log10()+
    scale_y_log10()+
    scale_fill_distiller(palette="RdYlBu")+
    coord_equal()+
    labs(x="Drug A",y="Drug B",fill="Effect")

Note that geom_braid_contour() uses the smoothed and interpolated values like those produced by geom_braid_smooth() rather than the discrete values plotted by geom_braid() or geom_braid_glass(). There are two reasons for this: first, the underlying Stat, stat_contour() requires that the data plotted be laid out as a regular grid for it to perform its own interpolation. Second, contours linearly interpolated between more parsely sampled data are often disjointed and jagged, and carry much less information than more smoothly interpolated values.

As with geom_braid_smooth(), geom_braid_contour() can be applied to both regular and irregular data, but with irregular data more care must be taken with the smoothing widths and heights:

ggplot(glassSurface,aes(x=concA,y=concB))+
    geom_braid_smooth(aes(fill=measure))+
    geom_point(colour="black")+
    geom_braid_contour(aes(z=measure),breaks=10*(1:9),colour="black",linetype=2)+
    scale_x_log10("Drug A")+
    scale_y_log10("Drug B")+
    scale_fill_distiller("Effect",palette="RdYlBu")+
    coord_equal()


ggplot(glassSurface,aes(x=concA,y=concB))+
    geom_braid_smooth(aes(fill=measure,width=tilewidth,height=tileheight),space=2)+
    geom_braid_contour(aes(z=measure,width=tilewidth,height=tileheight),space=2,
                       breaks=10*(1:9),colour="black",linetype=2)+
    scale_x_log10("Drug A")+
    scale_y_log10("Drug B")+
    scale_fill_distiller("Effect",palette="RdYlBu")+
    coord_equal()

It should also be noted that while we have plotted contours and smoothed surfaces together here, this is only to highlight their connection to the underlying interpolated data. BRAID contours can be plotted all on their own which can be quite effective at comparing the results of different surfaces:

surface$type <- "Synergy"
glassSurface$type <- "Antagonism"
allSurface <- rbind(surface,glassSurface)
allSurface$type <- factor(allSurface$type,c("Synergy","Antagonism"))

ggplot(allSurface,aes(x=concA,y=concB,colour=type))+
    geom_point()+
    geom_braid_contour(aes(z=measure,width=tilewidth,height=tileheight),
                       breaks=c(50,90), tight=TRUE)+
    scale_x_log10()+
    scale_y_log10()+
    scale_color_brewer("Surface Type",palette="Set1")+
    coord_equal()+
    labs(x="Drug A",y="Drug B")

A Word About Warnings

Astute ggplot2 observers will note that at several points throughout this vignette, we have run standard ggplot2 geoms such as geom_point() on logarithmically transformed data that result in infinite (transformed) values. Ordinarily, this would produce a warning: and in reality, it produces a warning here as well. We have suppressed warnings for this vignette, because while the braid, braid_glass, and braid_smooth geoms all explicitly handle such infinite transformed values, we have been unable to find a way to suppress these built in warnings. The warnings result, not when a stat is running, but before its functions are handled, and as such are, for the time being, unavoidable. When running a BRAID plotting function yourself, you will encounter these warnings as well:

# With warnings enabled...
ggplot(surface,aes(x=concA,y=concB))+
    geom_braid(aes(fill=measure))+
    scale_x_log10()+
    scale_y_log10()+
    scale_fill_distiller(palette="RdYlBu")+
    coord_equal()+
    labs(x="Drug A",y="Drug B",fill="Effect")
#> Warning in scale_x_log10(): log-10 transformation introduced infinite values.
#> Warning in scale_y_log10(): log-10 transformation introduced infinite values.

While we feel that it is an unfortunate reality, we hope that this is a small enough inconvenience that the overall versatility and expressiveness of the BRAID plotting function outweighs it.