1 Introduction
2 Characteristics of drugs and patients in the study
- 2.1 Drugs
- 2.2 Patient samples
3 Data quality control
- 3.1 Comparison of ATP luminescence of DMSO controls at timepoint 0 and 48 h
- 3.2 Reproducibility of the drug screening platform
4 Drug-induced effects on cell viability
5 Drug-drug correlation
- 5.1 Additional processing functions and parameters
- 5.2 CLL/T-PLL
  - 5.2.1 Effect of drugs with similar target
- 5.3 T-PLL
- 5.4 MCL
6 Disease-specific drug response phenotypes
- 6.1 Example: dose-response curves
- 6.2 Example: drug effects as bee swarms
7 Target profiling of AZD7762 and PF477736
8 Global overview of the drug response landscape
- 8.1 Setup
- 8.2 Define disease groups
- 8.3 Code for heatmap
- 8.4 Draw the heatmaps
  - 8.4.1 Identify places where gaps between the rows should be
9 Relative drug effects - ternary diagrams
- 9.1 Additional settings
- 9.2 Calculating the coordinates
- 9.3 Plot ternaries
10 Comparison of gene expression responses to drug treatments
- 10.1 Differential expression using Limma
11 Co-sensitivity patterns of the four response groups
- 11.1 Methodology of building groups
- 11.2 Mean of each group
- 11.3 Bee swarm plots for groups
12 Kaplan-Meier plot for groups (time from sample to next treatment)
13 Incidence of somatic gene mutations and CNVs in the four groups
- 13.1 Test gene mutations vs. groups
14 Differences in gene expression profiles between drug-response groups
- 14.1 Cytokine / chemokine expression in mTOR group
15 Response to cytokines in CLL
16 Gene set enrichment analysis on BTK, mTOR, MEK groups
- 16.1 Preprocessing RNAseq data
- 16.2 Differential gene expression
- 16.3 Gene set enrichment analysis
- 16.4 Everolimus response VS gene expression (within mTOR group)
  - 16.4.1 Correlation test
  - 16.4.2 Enrichment heatmaps for mTOR group with overlapped genes indicated
- 16.5 Ibrutinib response VS gene expression (within BTK group)
17 Single associations of drug response with gene mutation or type of disease (IGHV included)
- 17.1 Function which test associations of interest
- 17.2 Associations of ex vivo drug responses with genomic features in CLL
- 17.3 Associations of drug responses with genomic features in CLL independently of IGHV status
- 17.4 Drug response dependance on cell origin of disease
18 Effect of mutation on drug response - examples
19 Associations of drug responses with mutations in CLL (IGHV not included)
- 19.1 Additional functions
- 19.2 Data setup
- 19.3 Test for drug-gene associations
- 19.4 Plot results
  - 19.4.1 Main Figure
  - 19.4.2 Supplementary Figure (incl. pretreatment)
- 19.5 Comparison of $P$-Values
20 Association between HSP90 inhibitor response and IGHV status
21 Association between MEK/ERK inhibitor response and trisomy12
22 Expression profiling analysis of trisomy 12
- 22.1 Differential gene expression analysis using DESeq2
  - 22.1.1 Heatmap plot of differentially expressed genes
- 22.2 Gene set enrichment analysis
  - 22.2.1 Perform enrichment analysis
  - 22.2.2 Heatmap for selected gene sets
23 Drug response prediction
- 23.1 Assesment of omics capacity in explaining drug response
  - 23.1.1 Data pre-processing
  - 23.1.2 Lasso using multi-omic data
- 23.2 Lasso for drugs in a genetic-focussed model
24 Survival analysis
- 24.1 Univariate survival analysis
- 24.2 Kaplan-Meier curves
  - 24.2.1 Genetics factors
  - 24.2.2 Drug responses
- 24.3 Multivariate Cox-model
  - 24.3.1 Chemotherapies
  - 24.3.2 Targeted therapies
25 End of session

1 Introduction

Biocpkg("BloodCancerMultiOmics2017") is a multi-omic dataset comprising genome, transcriptome, DNA methylome data together with data from the ex vivo drug sensitivity screen of the primary blood tumor samples.

In this vignette we present the analysis of the Primary Blood Cancer Encyclopedia (PACE) project and source code for the paper

Drug-perturbation-based stratification of blood cancer

Sascha Dietrich*, Małgorzata Oleś*, Junyan Lu*, Leopold Sellner, Simon Anders, Britta Velten, Bian Wu, Jennifer Hüllein, Michelle da Silva Liberio, Tatjana Walther, Lena Wagner, Sophie Rabe, Sonja Ghidelli-Disse, Marcus Bantscheff, Andrzej K. Oleś, Mikołaj Słabicki, Andreas Mock, Christopher C. Oakes, Shihui Wang, Sina Oppermann, Marina Lukas, Vladislav Kim, Martin Sill, Axel Benner, Anna Jauch, Lesley Ann Sutton, Emma Young, Richard Rosenquist, Xiyang Liu, Alexander Jethwa, Kwang Seok Lee, Joe Lewis, Kerstin Putzker, Christoph Lutz, Davide Rossi, Andriy Mokhir, Thomas Oellerich, Katja Zirlik, Marco Herling, Florence Nguyen-Khac, Christoph Plass, Emma Andersson, Satu Mustjoki, Christof von Kalle, Anthony D. Ho, Manfred Hensel, Jan Dürig, Ingo Ringshausen, Marc Zapatka, Wolfgang Huber and Thorsten Zenz
J. Clin. Invest. (2018); 128(1):427–445. doi:10.1172/JCI93801.

The presented analysis was done by Małgorzata Oleś, Sascha Dietrich, Junyan Lu, Britta Velten, Andreas Mock, Vladislav Kim and Wolfgang Huber.

This vignette was put together by Małgorzata Oleś.

This vignette is build from the sub-vignettes, which each can be build separately. The parts are separated by the horizontal lines. Each part finishes with removal of all the created objects.

library("AnnotationDbi")
library("abind")
library("beeswarm")
library("Biobase")
library("biomaRt")
library("broom")
library("colorspace")
library("cowplot")
library("dendsort")
library("DESeq2")
library("doParallel")
library("dplyr")
library("foreach")
library("forestplot")
library("genefilter")
library("ggbeeswarm")
library("ggdendro")
library("ggplot2")
library("ggtern")
library("glmnet")
library("grid")
library("gridExtra")
library("gtable")
library("hexbin")
library("IHW")
library("ipflasso")
library("knitr")
library("limma")
library("magrittr")
library("maxstat")
library("nat")
library("org.Hs.eg.db")
library("BloodCancerMultiOmics2017")
library("pheatmap")
library("piano")
library("readxl")
library("RColorBrewer")
library("reshape2")
library("Rtsne")
library("scales")
library("SummarizedExperiment")
library("survival")
library("tibble")
library("tidyr")
library("tidyverse")
library("xtable")

options(stringsAsFactors=FALSE)

2 Characteristics of drugs and patients in the study

Loading the data.

data("drpar", "drugs", "patmeta", "mutCOM")

Creating vectors of patient samples and drugs within the drug screen. Within drugs, we omit the statistics for one drug combination, due to lack of possibility to assign its targets.

# PATIENTS
patM = colnames(drpar)

# DRUGS
drM = rownames(drpar)
drM = drM[!drM %in% "D_CHK"] # remove combintation of 2 drugs: D_CHK

General plotting parameters.

bwScale = c("0"="white","1"="black","N.A."="grey90")
lfsize = 16 # legend font size

2.1 Drugs

Categorize the drugs.

drugs$target_category = as.character(drugs$target_category)
drugs$group = NA
drugs$group[which(drugs$approved_042016==1)] = "FDA approved"
drugs$group[which(drugs$devel_042016==1)] = "clinical development/\ntool compound"

Show the characteristics.

                         FDA approved   clinical development/

tool compound
DNA damage response	4	12
EGFR	3	0
Other	3	4
ALK	1	1
Angiogenesis	1	0
Apoptosis (BH3, Survivin)	1	2
B-cell receptor	1	5
BCR/ABL	1	0
Epigenome	1	2
Hedgehog signaling	1	0
Immune modulation	1	0
JAK/STAT	1	2
MAPK	1	7
PI3K/AKT	1	5
Proteasome	1	1
mTOR	1	1
Cell cycle control	0	6
Cytoskeleton	0	2
HSP90	0	1
MET	0	1
Mitochondrial metabolism	0	2
NFkB	0	2
NOTCH	0	1
PIM	0	1
PKC	0	3
Reactive oxygen species	0	3
Splicing	0	1
TGF	0	1
WNT	0	1

2.2 Patient samples

Show number of samples stratified by the diagnosis.

Var1	Freq
CLL	184
T-PLL	25
MCL	10
MZL	6
AML	5
LPL	4
B-PLL	3
HCL	3
hMNC	3
HCL-V	2
Sezary	2
FL	1
PTCL-NOS	1

Within CLL group, we now show mutations with occurred in at least 4 samples.

# select CLL samples
patM = patM[patmeta[patM,"Diagnosis"]=="CLL"]

ighv = factor(setNames(patmeta[patM,"IGHV"], nm=patM), levels=c("U","M"))

mut1 = c("del17p13", "del11q22.3", "trisomy12", "del13q14_any")
mut2 = c("TP53", "ATM", "SF3B1", "NOTCH1", "MYD88")

mc = assayData(mutCOM)$binary[patM,]

## SELECTION OF MUTATIONS
# # include mutations with at least incidence of 4
mut2plot = names(which(sort(colSums(mc, na.rm=TRUE), decreasing=TRUE)>3))

# remove chromothrypsis
mut2plot = mut2plot[-grep("Chromothripsis", mut2plot)]
# divide mutations into gene mut and cnv
mut2plotSV = mut2plot[grep("[[:lower:]]", mut2plot)]
mut2plotSP = mut2plot[grep("[[:upper:]]", mut2plot)]

# remove some other things (it is quite manual thing, so be careful)
# IF YOU WANT TO REMOVE SOME MORE MUTATIONS JUST ADD THE LINES HERE!
mut2plotSV = mut2plotSV[-grep("del13q14_mono", mut2plotSV)]
mut2plotSV = mut2plotSV[-grep("del13q14_bi", mut2plotSV)]
mut2plotSV = mut2plotSV[-grep("del14q24.3", mut2plotSV)]

# rearrange the top ones to match the order in mut1 and mut2
mut2plotSV = c(mut1, mut2plotSV[!mut2plotSV %in% mut1])
mut2plotSP = c(mut2, mut2plotSP[!mut2plotSP %in% mut2])

factors = data.frame(assayData(mutCOM)$binary[patM, c(mut2plotSV, mut2plotSP)],
                     check.names=FALSE)
# change del13q14_any to del13q14
colnames(factors)[which(colnames(factors)=="del13q14_any")] = "del13q14"
mut2plotSV = gsub("del13q14_any", "del13q14", mut2plotSV)
# change it to factors
for(i in 1:ncol(factors)) {
  factors[,i] = factor(factors[,i], levels=c(1,0))
}

ord = order(factors[,1], factors[,2], factors[,3], factors[,4], factors[,5],
            factors[,6], factors[,7], factors[,8], factors[,9], factors[,10],
            factors[,11], factors[,12], factors[,13], factors[,14],
            factors[,15], factors[,16], factors[,17], factors[,18],
            factors[,19], factors[,20], factors[,21], factors[,22],
            factors[,23], factors[,24], factors[,25], factors[,26],
            factors[,27], factors[,28], factors[,29], factors[,30],
            factors[,31], factors[,32])

factorsord = factors[ord,]
patM = patM[ord]

(c(mut2plotSV, mut2plotSP))

##  [1] "del17p13"   "del11q22.3" "trisomy12"  "del13q14"   "del8p12"   
##  [6] "gain2p25.3" "gain8q24"   "del6q21"    "gain3q26"   "del9p21.3" 
## [11] "del15q15.1" "del6p21.2"  "TP53"       "ATM"        "SF3B1"     
## [16] "NOTCH1"     "MYD88"      "BRAF"       "KRAS"       "EGR2"      
## [21] "MED12"      "PCLO"       "MGA"        "ACTN2"      "BIRC3"     
## [26] "CPS1"       "FLRT2"      "KLHL6"      "NFKBIE"     "RYR2"      
## [31] "XPO1"       "ZC3H18"

Let’s now look deeper and for each mutation. We ask how many samples have (1) or don’t have (0) a particular mutation.

plotDF = meltWholeDF(factorsord)
plotDF$Mut =
  ifelse(sapply(plotDF$X,
                function(x) grep(x, list(mut2plotSV, mut2plotSP)))==1,"SV","SP")
plotDF$Status = "N.A."
plotDF$Status[plotDF$Measure==1 & plotDF$Mut=="SV"] = "1a"
plotDF$Status[plotDF$Measure==1 & plotDF$Mut=="SP"] = "1b"
plotDF$Status[plotDF$Measure==0] = "0"
plotDF$Status = factor(plotDF$Status, levels=c("1a","1b","0","N.A."))

plotDF$Y = factor(plotDF$Y, levels=patM)
plotDF$X = factor(plotDF$X, levels=rev(colnames(factorsord)))

mutPL = ggplotGrob(
  ggplot(data=plotDF, aes(x=Y, y=X, fill=Status)) + geom_tile() +
    scale_fill_manual(
      values=c("0"="white","1a"="forestgreen","1b"="navy","N.A."="grey90"),
      name="Mutation", labels=c("CNV","Gene mutation","WT","NA")) +
    ylab("") + xlab("") +
    geom_vline(xintercept=seq(0.5,length(patM)+1,5), colour="grey60") +
    geom_hline(yintercept=seq(0.5,ncol(factorsord)+1,1), colour="grey60") +
    scale_y_discrete(expand=c(0,0)) + scale_x_discrete(expand=c(0,0)) +
    theme(axis.ticks=element_blank(), axis.text.x=element_blank(),
          axis.text.y=element_text(
            size=60, face=ifelse(levels(plotDF$X) %in% mut2plotSV,
                                 "plain","italic")),
          axis.text=element_text(margin=unit(0.5,"cm"), colour="black"),
          legend.key = element_rect(colour = "black"),
          legend.text=element_text(size=lfsize),
          legend.title=element_text(size=lfsize)))

res = table(plotDF[,c("X","Measure")])
knitr::kable(res[order(res[,2], decreasing=TRUE),])

	0	1
del13q14	69	106
TP53	142	36
del11q22.3	146	28
trisomy12	146	27
del17p13	150	27
SF3B1	152	26
NOTCH1	160	19
ATM	89	11
BRAF	169	10
KRAS	136	8
del8p12	131	8
gain8q24	162	7
gain2p25.3	132	7
PCLO	94	6
MED12	94	6
EGR2	94	6
del6q21	163	6
MGA	95	5
MYD88	173	5
del15q15.1	134	5
del9p21.3	134	5
gain3q26	134	5
ZC3H18	96	4
XPO1	96	4
RYR2	96	4
NFKBIE	96	4
KLHL6	96	4
FLRT2	96	4
CPS1	96	4
BIRC3	96	4
ACTN2	96	4
del6p21.2	135	4

In the last part, we characterize samples according to metadata categories.

Age

ageDF = data.frame(Factor="Age",
                   PatientID=factor(patM, levels=patM),
                   Value=patmeta[patM,c("Age4Main")])

agePL = ggplotGrob(
  ggplot(ageDF, aes(x=PatientID, y=Factor, fill=Value)) + geom_tile() +
    scale_fill_gradient(low = "gold", high = "#3D1F00", na.value="grey92",
                        name="Age", breaks=c(40,60,80)) +
    theme(axis.ticks=element_blank(),
          axis.text=element_text(size=60, colour="black",
                                 margin=unit(0.5,"cm")),
          legend.text=element_text(size=lfsize),
          legend.title=element_text(size=lfsize)))
  
hist(ageDF$Value, col="slategrey", xlab="Age", main="")

Sex

sexDF = data.frame(Factor="Sex", PatientID=factor(patM, levels=patM),
                   Value=patmeta[patM, "Gender"])

sexPL = ggplotGrob(
  ggplot(sexDF, aes(x=PatientID, y=Factor, fill=Value)) + geom_tile() +
    scale_fill_manual(values=c("f"="maroon","m"="royalblue4","N.A."="grey90"),
                      name="Sex", labels=c("Female","Male","NA")) +
    theme(axis.ticks=element_blank(),
          axis.text=element_text(size=60, colour="black",
                                 margin=unit(0.5,"cm")),
          legend.key = element_rect(colour = "black"),
          legend.text=element_text(size=lfsize),
          legend.title=element_text(size=lfsize)))

table(sexDF$Value)

## 
##   f   m 
##  76 108

Treatment

Number of samples treated (1) or not treated (0) before sampling.

treatDF = data.frame(Factor="Treated", PatientID=factor(patM, levels=patM),
                     Value=ifelse(patmeta[patM, "IC50beforeTreatment"], 0, 1))
treatDF$Value[is.na(treatDF$Value)] = "N.A."
treatDF$Value = factor(treatDF$Value, levels=c("0","1","N.A."))

treatPL = ggplotGrob(
  ggplot(treatDF, aes(x=PatientID, y=Factor, fill=Value)) +geom_tile() +
    scale_fill_manual(values=bwScale, name="Treated",
                      labels=c("0"="No","1"="Yes","N.A."="NA")) +
    theme(axis.ticks=element_blank(),
          axis.text=element_text(size=60, colour="black",
                                 margin=unit(0.5,"cm")),
          legend.key = element_rect(colour = "black"),
          legend.text=element_text(size=lfsize),
          legend.title=element_text(size=lfsize)))

table(treatDF$Value)

## 
##    0    1 N.A. 
##  131   52    1

IGHV status

Number of samples with (1) and without (0) the IGHV mutation.

ighvDF = data.frame(Factor="IGHV", PatientID=factor(patM, levels=patM),
                    Value=patmeta[patM, "IGHV"])
ighvDF$Value = ifelse(ighvDF$Value=="M", 1, 0)
ighvDF$Value[is.na(ighvDF$Value)] = "N.A."
ighvDF$Value = factor(ighvDF$Value, levels=c("0","1","N.A."))

ighvPL = ggplotGrob(
  ggplot(ighvDF, aes(x=PatientID, y=Factor, fill=Value)) + geom_tile() +
    scale_fill_manual(values=bwScale, name="IGHV",
                      labels=c("0"="Unmutated","1"="Mutated","N.A."="NA")) +
    theme(axis.ticks=element_blank(), 
          axis.text=element_text(size=60, colour="black", margin=unit(0.5,"cm")),
          legend.key=element_rect(colour = "black"),
          legend.text=element_text(size=lfsize),
          legend.title=element_text(size=lfsize)))

table(ighvDF$Value)

## 
##    0    1 N.A. 
##   74   98   12

3 Data quality control

We performed multiple checks on the data quality. Below we show two examples.

First, we compared the values of ATP luminescence of DMSO controls at the beginning and after 48 h of incubation. Second, we assessed reproducibility of the drug screening platform.

3.1 Comparison of ATP luminescence of DMSO controls at timepoint 0 and 48 h

The ATP luminescence of the samples were measured on day 0. We compared this value with the ATP luminescence of negative control wells at 48 h, in order to assess the cell viability change without drug treatment during 48 h culturing.

Loading the data.

data("lpdAll")

Prepare table for plot.

plotTab = pData(lpdAll) %>%
  transmute(x=log10(ATPday0), y=log10(ATP48h), diff=ATP48h/ATPday0) %>%
  filter(!is.na(x))

Scatter plot to show the the correlation of ATP luminescence between day0 and 48h.

lm_eqn <- function(df){
    m <- lm(y ~ 1, df, offset = x)
    ypred <- predict(m, newdata = df)
    r2 = sum((ypred - df$y)^2)/sum((df$y - mean(df$y)) ^ 2)
    eq <- substitute(italic(y) == italic(x) + a*","~~italic(r)^2~"="~r2, 
         list(a = format(coef(m)[1], digits = 2), 
             r2 = format(r2, digits = 2)))
    as.character(as.expression(eq))                 
}

plotTab$ypred <- predict(lm(y~1,plotTab, offset = x), newdata = plotTab)
sca <- ggplot(plotTab, aes(x= x, y = y)) + geom_point(size=3) + 
  geom_smooth(data = plotTab, mapping = aes(x=x, y = ypred), method = lm, se = FALSE, formula = y ~ x) + 
  geom_text(x = 5.2, y = 6.2, label = lm_eqn(plotTab), parse = TRUE, size =8) + 
  xlab("log10(day0 ATP luminescence)") + ylab("log10(48h ATP luminescence)") +
  theme_bw() + theme(axis.title = element_text(size = 15, face = "bold"), 
                     axis.text = element_text(size=15), legend.position = "none") +
  coord_cartesian(xlim = c(4.6,6.3), ylim = c(4.6,6.3))

Histogram of the difference between day0 and 48h ATP level.

histo <- ggplot(plotTab, aes(x = diff)) + geom_histogram(col = "red", fill = "red", bins=30, alpha = 0.5) + theme_bw() +
  theme(axis.title = element_text(size = 15, face = "bold"), axis.text = element_text(size=15), legend.position = "none") +
  xlab("(48h ATP luminescence) / (day0 ATP luminescence)")

Combine plots together.

grid.arrange(sca, histo, ncol=2)

3.2 Reproducibility of the drug screening platform

Drug screening platform tested three samples twice. Moreover, the measurements were taken in the two time points: 48 h and 72 h after drug treatment. Here we compare the reproducibility of the screening platform by calculating Pearson correlation coefficients for the each pair of replicates.

Loading the data.

data("day23rep")

Arranging the data.

maxXY = 125

plottingDF = do.call(rbind, lapply(c("day2","day3"), function(day) {
  tmp = merge(
    meltWholeDF(assayData(day23rep)[[paste0(day,"rep1")]]),
    meltWholeDF(assayData(day23rep)[[paste0(day,"rep2")]]),
    by=c("X","Y"))
  colnames(tmp) = c("PatientID", "DrugID", "ViabX", "ViabY")
  tmp[,c("ViabX", "ViabY")] = tmp[,c("ViabX", "ViabY")] * 100
  tmp$Day = ifelse(day=="day2", "48h", "72h")
  tmp
}))

plottingDF$Shape =
  ifelse(plottingDF$ViabX > maxXY | plottingDF$ViabY > maxXY, "B", "A")

Calculate the Pearson correlation coefficient.

annotation = 
  do.call(rbind,
          tapply(1:nrow(plottingDF),
                 paste(plottingDF$PatientID,
                       plottingDF$Day, sep="_"),
                 function(idx) {
                   data.frame(X=110, Y=10,
                              Shape="A",
                              PatientID=plottingDF$PatientID[idx[1]],
                              Day=plottingDF$Day[idx[1]],
                              Cor=cor(plottingDF$ViabX[idx],
                                      plottingDF$ViabY[idx],
                                      method="pearson"))
}))

Plot the correlations together with coefficients (in a bottom-right corner).

#FIG# S31
ggplot(data=plottingDF, 
       aes(x=ifelse(ViabX>maxXY,maxXY,ViabX), y=ifelse(ViabY>maxXY,maxXY,ViabY),
           shape=Shape)) +
  facet_grid(Day ~ PatientID) + theme_bw() +
  geom_hline(yintercept=100, linetype="dashed",color="darkgrey") +
  geom_vline(xintercept=100, linetype="dashed",color="darkgrey") +
  geom_abline(intercept=0, slope=1, colour="grey") +
  geom_point(size=1.5, alpha=0.6) +
  scale_x_continuous(limits=c(0,maxXY), breaks=seq(0,maxXY,25)) +
  scale_y_continuous(limits=c(0,maxXY), breaks=seq(0,maxXY,25)) +
  xlab("% viability - replicate 1") + ylab("% viability - replicate 2") +
  coord_fixed() + expand_limits(x = 0, y = 0) +
  theme(axis.title.x=element_text(size = rel(1), vjust=-1),
        axis.title.y=element_text(size = rel(1), vjust=1),
        strip.background=element_rect(fill="gainsboro")) +
  guides(shape=FALSE, size=FALSE) +
  geom_text(data=annotation,
            aes(x=X, y=Y, label=format(Cor, digits=2), size=1.2),
            colour="maroon", hjust=0.2)

The reproducibility of the measurements is high (mean 0.84).

options(stringsAsFactors=FALSE)

4 Drug-induced effects on cell viability

Loading the data.

data("lpdAll")

Here we show a relative cell viability (as compared to negative control) under treatment with 64 drugs at 5 concentrations steps each.

Prepare data.

#select drug screening data on patient samples
lpd <- lpdAll[fData(lpdAll)$type == "viab", pData(lpdAll)$Diagnosis != "hMNC"]
viabTab <- Biobase::exprs(lpd)
viabTab <- viabTab[,complete.cases(t(viabTab))]
viabTab <- reshape2::melt(viabTab)
viabTab$Concentration <- fData(lpd)[viabTab$Var1,"subtype"]
viabTab <- viabTab[viabTab$Concentration %in% c("1","2","3","4","5"),]
viabTab$drugName <- fData(lpd)[viabTab$Var1,"name"]
viabTab <- viabTab[order(viabTab$Concentration),]

#order drug by mean viablitity
drugOrder <- group_by(viabTab, drugName) %>%
  summarise(meanViab = mean(value)) %>%
  arrange(meanViab)
viabTab$drugName <- factor(viabTab$drugName, levels = drugOrder$drugName)

Scatter plot for viabilities and using colors for concentrations.

The plot shows high variability of effects between different drugs, from mostly lethal (left) to mostly neutral (right), concentration dependence of effects and high variability of effects of the same drug/concentration across patients.

5 Drug-drug correlation

We compared response patterns produced by drugs using phenotype clustering within diseases (CLL, T-PLL and MCL) using Pearson correlation analysis. The results imply that the drug assays probe tumor cells’ specific dependencies on survival pathways.

Loading the data.

data("drpar", "patmeta", "drugs")

5.1 Additional processing functions and parameters

Function that return the subset of samples for a given diagnosis (or all samples if diag=NA).

givePatientID4Diag = function(pts, diag=NA) {
  pts = if(is.na(diag)) {
    names(pts)
  } else {
    names(pts[patmeta[pts,"Diagnosis"]==diag])
  }
  pts
}

Function that returns the viability matrix for given screen (for a given channel) for patients with given diagnosis.

giveViabMatrix = function(diag, screen, chnnl) {
  data = if(screen=="main") drpar
          else print("Incorrect screen name.")
  pid = colnames(data)
  if(!is.na(diag))
    pid = pid[patmeta[pid,"Diagnosis"]==diag]
  
  return(assayData(data)[[chnnl]][,pid])
}

Color scales for the heat maps.

palette.cor1 = c(rev(brewer.pal(9, "Blues"))[1:8],
                 "white","white","white","white",brewer.pal(7, "Reds"))
palette.cor2 = c(rev(brewer.pal(9, "Blues"))[1:8],
                 "white","white","white","white",brewer.pal(7, "YlOrRd"))

5.2 CLL/T-PLL

Pearson correlation coefficients were calculated based on the mean drug response for the two lowest concentration steps in the main screen and across CLL and T-PLL samples separately. Square correlation matrices were plotted together, with CLL in lower triangle and T-PLL in upper triangle. The drugs in a heat map are ordered by hierarchical clustering applied to drug responses of CLL samples.

main.cll.tpll = BloodCancerMultiOmics2017:::makeCorrHeatmap(
  mt=giveViabMatrix(diag="CLL", screen="main", chnnl="viaraw.4_5"),
  mt2=giveViabMatrix(diag="T-PLL", screen="main", chnnl="viaraw.4_5"),
  colsc=palette.cor2, concNo="one")

The major clusters in CLL include: kinase inhibitors targeting the B cell receptor, including idelalisib (PI3K), ibrutinib (BTK), duvelisib (PI3K), PRT062607 (SYK); inhibitors of redox signalling / reactive oxygen species (MIS−43, SD07, SD51); and BH3-mimetics (navitoclax, venetoclax).

5.2.1 Effect of drugs with similar target

Here we compare the effect of drugs designed to target components of the same signalling pathway.

# select the data
mtcll = as.data.frame(t(giveViabMatrix(diag="CLL",
                                       screen="main",
                                       chnnl="viaraw.4_5")))
colnames(mtcll) = drugs[colnames(mtcll),"name"]

# function which plots the scatter plot
scatdr = function(drug1, drug2, coldot, mtNEW, min){
  
  dataNEW = mtNEW[,c(drug1, drug2)]
  colnames(dataNEW) = c("A", "B")

  p = ggplot(data=dataNEW,  aes(A,  B)) + geom_point(size=3, col=coldot, alpha=0.8) +
    labs(x = drug1, y = drug2) + ylim(c(min, 1.35)) +  xlim(c(min, 1.35)) +
    theme(panel.background = element_blank(),
          axis.text = element_text(size = 15),
          axis.title = element_text(size = rel(1.5)),
          axis.line.x = element_line(colour = "black", size = 0.5),
          axis.line.y = element_line(colour = "black", size = 0.5),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank()) +
    geom_smooth(method=lm) +
    geom_text(x=1, y=min+0.1,
              label=paste0("Pearson-R = ",
                           round(cor(dataNEW$A, dataNEW$B ), 2)),
              size = 5)
  
  return(p)
}

5.3 T-PLL

Pearson correlation coefficients were calculated based on the mean drug response for the two lowest concentration steps in the main screen across T-PLL samples.

main.tpll = BloodCancerMultiOmics2017:::makeCorrHeatmap(
  mt=giveViabMatrix(diag="T-PLL", screen="main", chnnl="viaraw.4_5"),
  colsc=palette.cor1, concNo="one")

Clusters of drugs with high correlation and anti-correlation are shown by red and blue squares, respectively.

Inhibitors of redox signaling / reactive oxygen species (MIS-43, SD07, SD51) are clustering together. Otherwise, in T-PLL samples correlations are not well pronounced.

5.4 MCL

Pearson correlation coefficients were calculated based on the mean drug response for the two lowest concentration steps in the main screen across MCL samples.

main.mcl = BloodCancerMultiOmics2017:::makeCorrHeatmap(
  mt=giveViabMatrix(diag="MCL", screen="main", chnnl="viaraw.4_5"),
  colsc=palette.cor1, concNo="one")

Clusters of drugs with high correlation and anti-correlation are shown by red and blue squares, respectively.

The major clusters include: kinase inhibitors of the B cell receptor, incl. idelalisib (PI3K), ibrutinib (BTK), duvelisib (PI3K), PRT062607 (SYK); inhibitors of redox signaling / reactive oxygen species (MIS-43, SD07, SD51) and BH3 mimetics (navitoclax, venetoclax).

6 Disease-specific drug response phenotypes

Loading the data.

data(list=c("lpdAll", "conctab", "patmeta"))

Preprocessing of drug screen data.

#Select rows contain drug response data
lpdSub <- lpdAll[fData(lpdAll)$type == "viab",]

#Only use samples with complete values
lpdSub <- lpdSub[,complete.cases(t(Biobase::exprs(lpdSub)))]

#Transformation of the values
Biobase::exprs(lpdSub) <- log(Biobase::exprs(lpdSub))
Biobase::exprs(lpdSub) <- t(scale(t(Biobase::exprs(lpdSub))))

#annotation for drug ID
anno <- sprintf("%s(%s)",fData(lpdSub)$name,fData(lpdSub)$subtype)
names(anno) <- rownames(lpdSub)

Function to run t-SNE.

tsneRun <- function(distMat,perplexity=10,theta=0,max_iter=5000, seed = 1000) {
  set.seed(seed)
  tsneRes <- Rtsne(distMat, perplexity = perplexity, theta = theta, 
                   max_iter = max_iter, is_distance = TRUE, dims =2)
  tsneRes <- tsneRes$Y
  rownames(tsneRes) <- labels(distMat)
  colnames(tsneRes) <- c("x","y")
  tsneRes
}

Setting color scheme for the plot.

colDiagFill = c(`CLL` = "grey80",
    `U-CLL` = "grey80",
    `B-PLL`="grey80",
    `T-PLL`="#cc5352",
    `Sezary`="#cc5352",
    `PTCL-NOS`="#cc5352",
    `HCL`="#b29441",
    `HCL-V`="mediumaquamarine",
    `AML`="#addbaf",
    `MCL`="#8e65ca",
    `MZL`="#c95e9e",
    `FL`="darkorchid4",
    `LPL`="#6295cd",
    `hMNC`="pink")

colDiagBorder <- colDiagFill
colDiagBorder["U-CLL"] <- "black"

Sample annotation.

annoDiagNew <- function(patList, lpdObj = lpdSub) {
  Diagnosis <- pData(lpdObj)[patList,c("Diagnosis","IGHV Uppsala U/M")]
  DiagNew <- c()
  
  for (i in seq(1:nrow(Diagnosis))) {
   if (Diagnosis[i,1] == "CLL") {
      if (is.na(Diagnosis[i,2])) {
        DiagNew <- c(DiagNew,"CLL")
      } else if (Diagnosis[i,2] == "U") {
        DiagNew <- c(DiagNew,sprintf("%s-%s",Diagnosis[i,2],Diagnosis[i,1]))
      } else if (Diagnosis[i,2] == "M") {
        DiagNew <- c(DiagNew,"CLL")
      }
    } else DiagNew <- c(DiagNew,Diagnosis[i,1])
  }
  DiagNew
}

Calculate t-SNE and prepare data for plotting the result.

#prepare distance matrix
distLpd <- dist(t(Biobase::exprs(lpdSub)))

#run t-SNE
plotTab <- data.frame(tsneRun(distLpd,perplexity=25, max_iter=5000, seed=338))

#annotated patient sample
plotTab$Diagnosis <- pData(lpdSub[,rownames(plotTab)])$Diagnosis
plotTab$Diagnosis <- annoDiagNew(rownames(plotTab,lpdSub)) #consider IGHV status
plotTab$Diagnosis <- factor(plotTab$Diagnosis,levels = names(colDiagFill))

6.1 Example: dose-response curves

Here we show dose-response curve for selected drugs and patients.

First, change concentration index into real concentrations according to conctab.

lpdPlot <- lpdAll[fData(lpdAll)$type == "viab",]
concList <- c()
for (drugID in rownames(fData(lpdPlot))) {
  concIndex <- as.character(fData(lpdPlot)[drugID,"subtype"])
  concSplit <- unlist(strsplit(as.character(concIndex),":"))
  ID <- substr(drugID,1,5)
  if (length(concSplit) == 1) {
    realConc <- conctab[ID,as.integer(concSplit)]
    concList <- c(concList,realConc)
  } else {
    realConc <- sprintf("%s:%s",
                        conctab[ID,as.integer(concSplit[1])],
                        conctab[ID,as.integer(concSplit[2])])
    concList <- c(concList,realConc)
  }
}

fData(lpdPlot)$concValue <- concList
lpdPlot <- lpdPlot[,complete.cases(t(Biobase::exprs(lpdPlot)))]

Select drugs and samples.

patDiag <- c("CLL","T-PLL","HCL","MCL")
drugID <- c("D_012_5","D_017_4","D_039_3","D_040_5","D_081_4","D_083_5")

lpdBee <- lpdPlot[drugID,pData(lpdPlot)$Diagnosis %in% patDiag]

Prepare the data for plot

lpdCurve <-
  lpdPlot[fData(lpdPlot)$name %in% fData(lpdBee)$name,
          pData(lpdPlot)$Diagnosis %in% patDiag]
lpdCurve <- lpdCurve[fData(lpdCurve)$subtype %in% seq(1,5),]
dataCurve <- data.frame(Biobase::exprs(lpdCurve))
dataCurve <- cbind(dataCurve,fData(lpdCurve)[,c("name","concValue")])
tabCurve <- melt(dataCurve,
                 id.vars = c("name","concValue"), variable.name = "patID")
tabCurve$Diagnosis <- factor(pData(lpdCurve[,tabCurve$patID])$Diagnosis,
                             levels = patDiag)
tabCurve$value <- tabCurve$value
tabCurve$concValue <- as.numeric(tabCurve$concValue)

# set order
tabCurve$name <- factor(tabCurve$name, levels = fData(lpdBee)$name)

#calculate the mean and mse for each drug+cencentration in different disease
tabGroup <- group_by(tabCurve,name,concValue,Diagnosis)
tabSum <- summarise(tabGroup,meanViab = mean(value))

Finally, plot dose-response curve for each selected drug.

#FIG# 2 C
tconc = expression("Concentration [" * mu * "M]")
fmt_dcimals <- function(decimals=0){
   # return a function responpsible for formatting the 
   # axis labels with a given number of decimals 
   function(x) as.character(round(x,decimals))
}

for (drugName in unique(tabSum$name)) {
  tabDrug <- filter(tabSum, name == drugName)
  p <- (ggplot(data=tabDrug, aes(x=concValue,y=meanViab, col=Diagnosis)) +
          geom_line() + geom_point(pch=16, size=4) +
          scale_color_manual(values = colDiagFill[patDiag])
        + theme_classic() +
          theme(panel.border=element_blank(),
                axis.line.x=element_line(size=0.5,
                                         linetype="solid", colour="black"),
                axis.line.y = element_line(size = 0.5,
                                           linetype="solid", colour="black"),
                legend.position="none",
                plot.title = element_text(hjust = 0.5, size=20),
                axis.text = element_text(size=15),
                axis.title = element_text(size=20)) +
          ylab("Viability") + xlab(tconc) + ggtitle(drugName) +
          scale_x_log10(labels=fmt_dcimals(2)) +
          scale_y_continuous(limits = c(0,1.3), breaks = seq(0,1.3,0.2)))
  plot(p)
}

6.2 Example: drug effects as bee swarms

#FIG# 2 D
lpdDiag <- lpdAll[,pData(lpdAll)$Diagnosis %in% c("CLL", "MCL", "HCL", "T-PLL")]
dr <- c("D_012_5", "D_083_5", "D_081_3", "D_040_4", "D_039_3")

m <- data.frame(t(Biobase::exprs(lpdDiag)[dr, ]), diag=pData(lpdDiag)$Diagnosis)
m <- melt(m)

## Using diag as id variables

m$lable <- 1
for (i in 1:nrow(m )) {
  m[i, "lable"] <- giveDrugLabel(as.character(m[i, "variable"]), conctab, drugs)
} 

  
ggplot( m, aes(diag, value, color=factor(diag) ) ) +
  ylim(0,1.3) + ylab("Viability") + 
  xlab("") +
  geom_boxplot(outlier.shape = NA) +
  geom_beeswarm(cex=1.4, size=1.4,alpha=0.5, color="grey80") +
  scale_color_manual("diagnosis", values=c(colDiagFill["CLL"], colDiagFill["MCL"], 
                                           colDiagFill["HCL"], colDiagFill["T-PLL"])) +
  theme_bw() +
  theme(legend.position="right") +
  theme(
    panel.background =  element_blank(), 
    panel.grid.minor.x =  element_blank(),
    axis.text = element_text(size=15),
    axis.title = element_text(size=15),
    strip.text = element_text(size=15)
  ) +
  facet_wrap(~ lable, ncol=1)

7 Target profiling of AZD7762 and PF477736

Cell lysates of K562 cells were used. Binding affinity scores were determined proteome-wide using the kinobead assay (Bantscheff M, Eberhard D, Abraham Y, Bastuck S, Boesche M, Hobson S, Mathieson T, Perrin J, Raida M, Rau C, et al. Quantitative chemical proteomics reveals mechanisms of action of clinical ABL kinase inhibitors. Nat Biotechnol. 2007;25(9):1035-44.).

# AZD7762 binding affinity constants
azd = read_excel(system.file("extdata","TargetProfiling.xlsx",
                             package="BloodCancerMultiOmics2017"), sheet = 1)

# PF477736 binding affinity constants
pf = read_excel(system.file("extdata","TargetProfiling.xlsx",
                            package="BloodCancerMultiOmics2017"), sheet = 2)

# BCR tagets Proc Natl Acad Sci U S A. 2016 May 17;113(20):5688-93
pProt = read_excel(system.file("extdata","TargetProfiling.xlsx",
                               package="BloodCancerMultiOmics2017"),sheet = 3)

Join the results into one data frame.

p <- full_join(azd, pf )

## Joining, by = "gene"

p <- full_join(p, pProt )

## Joining, by = "gene"

pp <- p[p$BCR_effect=="Yes",]
pp <- data.frame(pp[-which(is.na(pp$BCR_effect)),])

#FIG# 2B
rownames(pp) <- 1:nrow(pp)
pp <- as.data.frame(pp)
pp <- melt(pp)

## Using gene, BCR_effect as id variables

colnames(pp)[3] <- "Drugs"
colnames(pp)[4] <- "Score"


ggplot(pp, aes(x= reorder(gene, Score), Score, colour=Drugs ) )+ geom_point(size=3) +
  
scale_colour_manual(values = c(makeTransparent("royalblue1", alpha = 0.75),
                               makeTransparent("royalblue4", alpha = 0.75), 
                               makeTransparent("brown1", alpha = 0.55),
                               makeTransparent("brown3", alpha = 0.35)),
                    
                    breaks = c("az10", "az2", "pf10", "pf2"),
                    labels = c("AZD7762 10 µM","AZD7762 2 µM","PF477736 10 µM","PF477736 2 µM") ) +
  
  ylab("Binding affinity") +
  
  theme_bw() + geom_hline(yintercept = 0.5) +
  
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        axis.title.x=element_blank() )

Lower scores indicate stronger physical binding. Here, the data are shown for those proteins that had a score <0.5 in at least one assay, and that were previously identified as responders to B-cell receptor stimulation with IgM in B-cell lines (Corso J, Pan KT, Walter R, Doebele C, Mohr S, Bohnenberger H, Strobel P, Lenz C, Slabicki M, Hullein J, et al. Elucidation of tonic and activated B-cell receptor signaling in Burkitt’s lymphoma provides insights into regulation of cell survival. Proc Natl Acad Sci U S A. 2016;113(20):5688-93).

The table below shows complete data of targets identified in kinobead assays for AZD7762 and PF477736 at 2 and 10 $\mu$M. A target score of <0.5 indicates a good target specificity. The column BCR indicates if the protein was identified as a B-cell receptor responsive protein after IgM stimulation in Burkitt lymphoma cell lines (Corso et al. 2016).

j <- apply(p[,c("az10", "az2", "pf10", "pf2")], 1, function (x) { min(x, na.rm=FALSE) } )

p <-  p[which(j<0.5), ]
p <-  unique(p, by = p$gene)

knitr::kable(p)

gene	az10	az2	pf10	pf2	BCR_effect
AAK1	0.25	0.39	0.23	0.23	No
ABL1	0.28	0.40	0.44	0.73	Yes
AURKA	0.89	0.92	0.48	0.65	NA
AURKB	0.50	0.52	0.43	0.52	No
AXL	0.45	0.68	0.50	0.68	NA
BCR	0.28	0.37	0.57	0.78	No
BMP2K	0.23	0.43	0.21	0.20	No
BRSK2	0.37	0.52	0.62	0.90	NA
CAMK2G	0.48	0.79	0.60	1.00	NA
CHEK1	0.28	0.26	0.29	0.26	Yes
CHEK1	0.28	0.26	0.29	0.26	No
CHEK2	0.17	0.19	0.24	0.43	No
CIT	0.56	0.80	0.40	0.44	Yes
CSK	0.19	0.32	0.66	0.93	No
EGFR	0.33	0.60	0.54	0.86	NA
EPHA7	0.28	0.54	0.77	0.93	NA
FER	0.24	0.35	0.88	1.07	No
FES	0.46	0.72	0.94	1.04	NA
FGFR2	0.87	1.02	0.44	0.65	NA
FLT4	0.48	0.52	0.55	0.61	NA
FRK	0.39	0.69	0.82	0.95	NA
FYN	0.28	0.47	0.46	0.79	No
GAK	0.23	0.31	0.21	0.25	No
HCK	0.28	0.39	0.52	0.69	Yes
IRAK4	0.25	0.40	0.56	0.88	No
KIAA0999	0.40	0.48	0.41	0.45	No
LIMK2	1.11	1.07	0.33	0.41	No
LOC100128443	0.94	0.95	0.36	0.21	No
LYN	0.33	0.53	0.40	0.58	Yes
MAP2K1	0.30	0.47	0.73	0.95	Yes
MAP2K2	0.22	0.43	0.63	0.84	Yes
MAP2K5	0.21	0.29	0.45	0.74	No
MAP3K11	0.43	0.46	0.63	0.83	No
MAP3K2	0.40	0.64	0.67	0.83	NA
MAP3K3	0.46	0.71	1.13	0.94	NA
MAP3K7	0.49	0.77	0.77	0.95	NA
MAP4K3	0.29	0.45	0.47	0.73	No
MAP4K4	0.25	0.29	0.58	0.87	No
MAP4K5	0.27	0.32	0.37	0.71	Yes
MAPK3	1.16	1.14	0.38	0.74	NA
MARK1	0.25	0.33	0.36	0.54	No
MARK2	0.23	0.35	0.39	0.54	Yes
MARK2	0.23	0.35	0.39	0.54	No
MARK3	0.26	0.30	0.37	0.53	No
MARK4	0.34	0.55	0.17	0.24	No
MERTK	0.43	0.44	0.71	0.90	No
MINK1	0.37	0.39	0.55	0.80	No
PAK4	0.22	0.27	0.30	0.33	No
PDPK1	0.20	0.30	0.38	0.58	No
PKN2	0.34	0.52	0.36	0.61	NA
PRKAA1	0.33	0.52	0.41	0.66	NA
PRKAA2	0.42	0.65	0.58	0.85	NA
PRKCD	0.28	0.47	0.68	0.95	Yes
PRKG1	0.37	0.71	0.68	0.96	NA
PRKX	0.25	0.25	0.10	0.32	No
PTK2B	0.40	0.68	0.96	1.07	NA
RPS6KA4	0.54	0.70	0.45	0.69	NA
SIK2	0.28	0.33	0.27	0.34	No
SLK	0.27	0.44	0.73	0.92	Yes
SRC	0.25	0.35	0.50	0.83	No
STK10	0.27	0.41	0.78	0.94	Yes
STK17A	0.35	0.76	0.35	0.41	No
STK3	0.28	0.37	0.65	0.87	No
STK33	0.26	0.43	0.77	1.02	No
STK4	0.20	0.21	0.65	0.94	No
SYK	0.30	0.46	0.70	0.92	Yes
TGFBR2	0.36	0.36	0.62	0.78	No
TNIK	0.37	0.41	0.41	0.70	No
TNK2	0.42	0.61	0.79	0.85	NA
YES1	0.38	0.64	0.38	0.75	NA
ACADM	0.40	0.39	1.00	1.03	No
FASN	0.23	0.24	1.27	0.99	Yes
PRDX2	0.44	0.49	1.25	1.07	No
SOD2	0.44	0.53	1.45	1.13	NA
CTSD	0.45	0.49	1.26	0.95	No
ANXA2	0.40	0.43	1.20	0.96	Yes
AP2A1	0.28	0.48	0.43	0.45	No
AP2A2	0.30	0.37	0.54	0.52	No
AP2B1	0.26	0.43	0.19	0.21	No
AP2M1	0.25	0.37	0.32	0.34	No
AP2S1	0.32	0.45	0.50	0.49	No
FDPS	0.42	0.45	1.19	1.02	No
GRB2	0.30	0.32	0.50	0.72	Yes
INCENP	0.24	0.35	0.41	0.47	No
MOBKL1B	1.03	0.98	0.45	0.71	NA
PKP1	0.31	1.36	0.72	0.48	No
PRKAB1	0.23	0.40	0.31	0.61	No
PRKAB2	0.18	0.37	0.32	0.58	No
PRKAG1	0.27	0.47	0.36	0.64	No
PRKAG2	0.32	0.51	0.43	0.61	NA
S100A7	0.28	0.29	1.56	1.02	No
S100A8	0.16	0.24	5.77	1.04	No
S100A9	0.24	0.42	2.20	0.84	No
SERPINB12	0.69	2.30	0.62	0.48	No
YJ005_HUMAN	0.36	0.46	0.48	0.50	No

8 Global overview of the drug response landscape

Load data

data("conctab", "lpdAll", "drugs", "patmeta")
lpdCLL <- lpdAll[, lpdAll$Diagnosis=="CLL"]
lpdAll = lpdAll[, lpdAll$Diagnosis!="hMNC"]

8.1 Setup

8.1.1 Additional functions

someMatch <- function(...) {
  rv <- match(...)
  if (all(is.na(rv)))
    stop(sprintf("`match` failed to match any of the following: %s",
                 paste(x[is.na(rv)], collapse=", ")))
  rv  
}

8.1.2 Preprocess `IGHV gene usage`

Find the largest categories, which we will represent by colours, and merge the others in a group other.

colnames(pData(lpdAll))

## [1] "Diagnosis"               "Gender"                 
## [3] "IGHV Uppsala gene usage" "IGHV Uppsala % SHM"     
## [5] "IGHV Uppsala U/M"        "ATPday0"                
## [7] "ATP48h"

gu    <- pData(lpdAll)$`IGHV Uppsala gene usage`
tabgu <- sort(table(gu), decreasing = TRUE)
biggestGroups <- names(tabgu)[1:5]
gu[ is.na(match(gu, biggestGroups)) & !is.na(gu) ] <- "other"
pData(lpdAll)$`IGHV gene usage` <- factor(gu, levels = c(biggestGroups, "other"))

8.1.3 Some drugs that we are particularly interested in

stopifnot(is.null(drugs$id))
drugs$id <- rownames(drugs)

targetedDrugNames <- c("ibrutinib", "idelalisib",  "PRT062607 HCl", 
   "duvelisib", "spebrutinib", "selumetinib", "MK-2206",  
   "everolimus", "encorafenib")
id1 <- safeMatch(targetedDrugNames, drugs$name)
targetedDrugs <- paste( rep(drugs[id1, "id"], each = 2), 4:5, sep="_" )

chemoDrugNames <- c("fludarabine", "doxorubicine",  "nutlin-3")
id2 <- safeMatch(chemoDrugNames, drugs$name)
chemoDrugs <- paste( rep(drugs[id2, "id"], each = 5), 3:5, sep="_" )

tzselDrugNames <- c("ibrutinib", "idelalisib", "duvelisib", "selumetinib", 
    "AZD7762", "MK-2206", "everolimus", "venetoclax", "thapsigargin", 
    "AT13387", "YM155", "encorafenib", "tamatinib", "ruxolitinib", 
    "PF 477736", "fludarabine", "nutlin-3")
id3 <- safeMatch(tzselDrugNames, drugs$name)
tzselDrugs <- unlist(lapply(seq(along = tzselDrugNames), function(i)
  paste(drugs[id3[i], "id"], 
      if (tzselDrugNames[i] %in% c("fludarabine", "nutlin-3")) 2:3 else 4:5, 
    sep = "_" )))

8.1.4 Feature weighting and score for dendrogram reordering

The weights are used for weighting the similarity metric used in heatmap clustering. weights$hclust affects the clustering of the patients. weights$score affects the dendrogram reordering of the drugs.

bcrDrugs <- c("ibrutinib", "idelalisib", "PRT062607 HCl", "spebrutinib")
everolID <- drugs$id[ safeMatch("everolimus",  drugs$name)]
bcrID    <- drugs$id[ safeMatch(bcrDrugs, drugs$name)]

is_BCR  <- 
  (fData(lpdAll)$id %in% bcrID)    & (fData(lpdAll)$subtype %in% paste(4:5))
is_mTOR <- 
  (fData(lpdAll)$id %in% everolID) & (fData(lpdAll)$subtype %in% paste(4:5))

myin <- function(x, y) as.numeric( (x %in% y) & !is.na(x) )

weights1 <- data.frame(
  hclust = rep(1, nrow(lpdAll)) + 1.75 * is_mTOR, 
  score  = as.numeric( is_BCR ), 
  row.names = rownames(lpdAll))

weights2 <- data.frame(
  row.names = tzselDrugs,
  hclust = myin(drugs$target_category[id3], "B-cell receptor") * 0.3 + 0.7,
  score  = rep(1, length(tzselDrugs)))

Remove drugs that failed quality control: NSC 74859, bortezomib.

badDrugs <- c(bortezomib = "D_008", `NSC 74859` = "D_025")
stopifnot(identical(drugs[ badDrugs, "name"], names(badDrugs)))

candDrugs <- rownames(lpdAll)[  
  fData(lpdAll)$type=="viab" & !(fData(lpdAll)$id %in% badDrugs) &
  fData(lpdAll)$subtype %in% paste(2:5)
]

Threshold parameters: drugs are accepted that for at least effectNum samples have a viability effect less than or equal to effectVal. On the other hand, the mean viability effect should not be below viab.

thresh <- list(effectVal = 0.7, effectNum = 4, viab = 0.6, maxval = 1.1)
overallMean  <- rowMeans(Biobase::exprs(lpdAll)[candDrugs, ])
nthStrongest <- apply(Biobase::exprs(lpdAll)[candDrugs, ], 1,
                      function(x) sort(x)[thresh$effectNum])

par(mfrow = c(1, 3))
hist(overallMean,  breaks = 30, col = "pink")
abline(v = thresh$viab,      col="blue")
hist(nthStrongest, breaks = 30, col = "pink")
abline(v = thresh$effectVal, col="blue")
plot(overallMean, nthStrongest)
abline(h = thresh$effectVal, v = thresh$viab, col = "blue")

seldrugs1 and d1: as in the version of Figure 3A we had in the first submission to JCI. d2: two concentrations for each drug in tzselDrugNames.

seldrugs1 <- candDrugs[ overallMean >= thresh$viab &
                        nthStrongest <= thresh$effectVal ] %>%
            union(targetedDrugs) %>% 
            union(chemoDrugs)

d1 <- Biobase::exprs(lpdAll[seldrugs1,, drop = FALSE ]) %>%
  deckel(lower = 0, upper = thresh$maxval)

d2 <- Biobase::exprs(lpdAll[tzselDrugs,, drop = FALSE ]) %>%
  deckel(lower = 0, upper = thresh$maxval)

We are going to scale the data. But was is the best measure of scale (or spread)? Let’s explore different measures of spread. We’ll see that it does not seem to matter too much which one we use. We’ll use median centering and scaling by mad.

spreads <- sapply(list(mad = mad, `Q95-Q05` = function(x)
  diff(quantile(x, probs = c(0.05, 0.95)))), function(s) apply(d1, 1, s))
plot( spreads )

jj <- names(which( spreads[, "mad"] < 0.15 & spreads[, "Q95-Q05"] > 0.7))
jj

## [1] "D_041_2"

drugs[ stripConc(jj), "name" ]

## [1] "BAY 11-7085"

medianCenter_MadScale <- function(x) {
  s <- median(x)
  (x - s) / deckel(mad(x, center = s), lower = 0.05, upper = 0.2)
}

scaleDrugResp  <- function(x) t(apply(x, 1, medianCenter_MadScale)) 

scd1 <- scaleDrugResp(d1)
scd2 <- scaleDrugResp(d2)

8.2 Define disease groups

sort(table(lpdAll$Diagnosis), decreasing = TRUE)

## 
##      CLL    T-PLL      MCL      MZL      AML      LPL    B-PLL      HCL 
##      184       25       10        6        5        4        3        3 
##    HCL-V   Sezary       FL PTCL-NOS 
##        2        2        1        1

diseaseGroups <- list(
  `CLL` = c("CLL"), 
  `MCL` = c("MCL"),
  `HCL` = c("HCL", "HCL-V"),
  `other B-cell` = c("B-PLL", "MZL", "LPL", "FL"), 
  `T-cell` = c("T-PLL", "Sezary", "PTCL-NOS"),
  `myeloid` = c("AML"))
stopifnot(setequal(unlist(diseaseGroups), unique(lpdAll$Diagnosis)))

fdg <- factor(rep(NA, ncol(lpdAll)), levels = names(diseaseGroups))
for (i in names(diseaseGroups))
  fdg[ lpdAll$Diagnosis %in% diseaseGroups[[i]] ] <- i
lpdAll$`Disease Group` <- fdg

8.3 Code for heatmap

8.3.1 Matrix row / column clustering

The helper function matClust clusters a matrix x, whose columns represent samples and whose rows represent drugs. Its arguments control how the columns are clustered:

weights: a data.frame with a weight for each row of x. The weights are used in the computation of distances between columns and thus for column sorting. The data.frame’s column hclust contains the weights for hclust(dist()). The column score contains the weights for computing the score used for dendrogram reordering. See weights1 and weights2 defined above.
colgroups: a factor by which to first split the columns before clustering
reorderrows: logical. FALSE for previous behaviour (old Fig. 3A), TRUE for reordering the row dendrogram, too.

matClust <- function(x, 
                     rowweights,
                     colgroups   = factor(rep("all", ncol(x))),
                     reorderrows = FALSE) {
 
  stopifnot(is.data.frame(rowweights),
            c("hclust", "score") %in% colnames(rowweights),
            !is.null(rownames(rowweights)),
            !is.null(rownames(x)), 
            all(rownames(x) %in% rownames(rowweights)),
            is.factor(colgroups), 
            !any(is.na(colgroups)), 
            length(colgroups) == ncol(x))

  wgt <- rowweights[ rownames(x), ]
  
  columnsClust <- function(xk) {
    score   <- -svd(xk * wgt[, "score"])$v[, 1]
    cmns    <- colSums(xk * wgt[, "score"])
    ## make sure that high score = high sensitivity  
    if (cor(score, cmns) > 0) score <- (-score)
    
    ddraw  <- as.dendrogram(hclust(dist(t(xk * wgt[, "hclust"]), 
                                        method = "euclidean"), 
                                   method = "complete"))
    dd  <- reorder(ddraw, wts = -score, agglo.FUN = mean)
    ord <- order.dendrogram(dd)
    list(dd = dd, ord = ord, score = score)
  }

  sp <- split(seq(along = colgroups), colgroups)
  cc <- lapply(sp, function(k) columnsClust(x[, k, drop=FALSE]))
  cidx <- unlist(lapply(seq(along = cc), function (i)
    sp[[i]][ cc[[i]]$ord ]))
  csc  <- unlist(lapply(seq(along = cc), function (i)
    cc[[i]]$score[ cc[[i]]$ord ]))

  rddraw <- as.dendrogram(hclust(dist(x, method = "euclidean"),
                                   method = "complete"))
  ridx  <- if (reorderrows) {
    ww <- (colgroups == "CLL")
    stopifnot(!any(is.na(ww)), any(ww))
    rowscore <- svd(t(x) * ww)$v[, 1]
    dd <- reorder(rddraw, wts = rowscore, agglo.FUN = mean)
    order.dendrogram(dd)
  } else {
    rev(order.dendrogram(dendsort(rddraw)))
  }

  res <- x[ridx, cidx]
  
  stopifnot(identical(dim(res), dim(x)))
  attr(res, "colgap") <- cumsum(sapply(cc, function(x) length(x$score)))
  res
}

8.3.2 Prepare sample annotations

I.e. the right hand side color bar. IGHV Uppsala U/M is implied by IGHV Uppsala % SHM (see sensi2.Rmd). cut(..., right=FALSE) will use intervals that are closed on the left and open on the right.

translation = list(IGHV=c(U=0, M=1),
                   Methylation_Cluster=c(`LP-CLL`=0, `IP-CLL`=1, `HP-CLL`=2))

make_pd <- function(cn, ...) {
  df <- function(...) data.frame(..., check.names = FALSE)
  
  x <- lpdAll[, cn]
  pd <- df(
    t(Biobase::exprs(x)[    c("del17p13", "TP53", "trisomy12"), , drop = FALSE]) %>% 
      `colnames<-`(c("del 17p13", "TP53", "trisomy 12")))
  
  # pd <- df(pd,
  #    t(Biobase::exprs(x)[  c("SF3B1", "del11q22.3", "del13q14_any"),, drop = FALSE]) %>% 
  #    `colnames<-`(c("SF3B1", "del11q22.3", "del13q14"))) 
  
  pd <- df(pd,
      cbind(as.integer(Biobase::exprs(x)["KRAS",] | Biobase::exprs(x)["NRAS",])) %>% 
      `colnames<-`("KRAS | NRAS")) 
  
  pd <- df(pd,
    # IGHV = Biobase::exprs(x)["IGHV Uppsala U/M", ],
    `IGHV (%)` = cut(x[["IGHV Uppsala % SHM"]],
                     breaks = c(0, seq(92, 100, by = 2), Inf), right = FALSE),
    `Meth. Cluster` = names(translation$Methylation_Cluster)[
          someMatch(paste(Biobase::exprs(x)["Methylation_Cluster", ]),
                    translation$Methylation_Cluster)],
    `Gene usage` = x$`IGHV gene usage`)
  
  if(length(unique(x$Diagnosis)) > 1)  
    pd <- df(pd, Diagnosis = x$Diagnosis)
  
  pd <- df(pd, 
    pretreated  = ifelse(patmeta[colnames(x),"IC50beforeTreatment"],"no","yes"), 
    alive       = ifelse(patmeta[colnames(x),"died"]>0, "no", "yes"),
    sex         = factor(x$Gender))
  
  rownames(pd) <- colnames(Biobase::exprs(x))
  
  for (i in setdiff(colnames(pd), "BCR score")) {
    if (!is.factor(pd[[i]]))
      pd[[i]] <- factor(pd[[i]])
    if (any(is.na(pd[[i]]))) {
      levels(pd[[i]]) <- c(levels(pd[[i]]), "n.d.")
      pd[[i]][ is.na(pd[[i]]) ] <- "n.d."   
    }
  }
  
  pd
}

8.3.3 Define the annotation colors

gucol <- rev(brewer.pal(nlevels(lpdAll$`IGHV gene usage`), "Set3")) %>% 
             `names<-`(sort(levels(lpdAll$`IGHV gene usage`)))
gucol["IGHV3-21"] <- "#E41A1C"

make_ann_colors <- function(pd) {
  bw <- c(`TRUE` = "darkblue", `FALSE` = "#ffffff")
  res <- list(
  Btk = bw, Syk = bw, PI3K = bw, MEK = bw)
  
  if ("exptbatch" %in% colnames(pd))
    res$exptbatch <- brewer.pal(nlevels(pd$exptbatch), "Set2") %>%
    `names<-`(levels(pd$exptbatch))
  
  if ("IGHV (%)" %in% colnames(pd))
    res$`IGHV (%)` <- 
       c(rev(colorRampPalette(
         brewer.pal(9, "Blues"))(nlevels(pd$`IGHV (%)`)-1)), "white") %>% 
                        `names<-`(levels(pd$`IGHV (%)`))
  
  if ("CD38" %in% colnames(pd))
    res$CD38 <- colorRampPalette(
      c("blue", "yellow"))(nlevels(pd$CD38)) %>% `names<-`(levels(pd$CD38))
  
  if("Gene usage" %in% colnames(pd)) 
    res$`Gene usage` <- gucol 

  if("Meth. Cluster" %in% colnames(pd)) 
    res$`Meth. Cluster` <- brewer.pal(9, "Blues")[c(1, 5, 9)] %>% 
        `names<-`(names(translation$Methylation_Cluster))
  
  res <- c(res, BloodCancerMultiOmics2017:::sampleColors)   # from addons.R  
  
  if("Diagnosis" %in% colnames(pd)) 
    res$Diagnosis <- BloodCancerMultiOmics2017:::colDiagS[
      names(BloodCancerMultiOmics2017:::colDiagS) %in% levels(pd$Diagnosis) ] %>% 
         (function(x) x[order(names(x))])
  
  for(i in names(res)) {
    whnd <- which(names(res[[i]]) == "n.d.")
    if(length(whnd)==1)
      res[[i]][whnd] <- "#e0e0e0" else 
        res[[i]] <- c(res[[i]], `n.d.` = "#e0e0e0")
    stopifnot(all(pd[[i]] %in% names(res[[i]])))
  }
  
  res
}

8.3.4 Heatmap drawing function

theatmap <- function(x, cellwidth = 7, cellheight = 11) {
  
  stopifnot(is.matrix(x))
  patDat <- make_pd(colnames(x))
  
  bpp <- brewer.pal(9, "Set1")
  breaks <- 2.3 * c(seq(-1, 1, length.out = 101)) %>% `names<-`(
      colorRampPalette(c(rev(brewer.pal(7, "YlOrRd")), 
                         "white", "white", "white", 
                         brewer.pal(7, "Blues")))(101))

  if (!is.null(attr(x, "colgap")))
    stopifnot(last(attr(x, "colgap")) == ncol(x))

  pheatmapwh(deckel(x, lower = first(breaks), upper = last(breaks)), 
         cluster_rows = FALSE,  
         cluster_cols = FALSE,
         gaps_col = attr(x, "colgap"),
         gaps_row = attr(x, "rowgap"),
         scale = "none",
         annotation_col = patDat,
         annotation_colors = make_ann_colors(patDat),
         color = names(breaks),
         breaks = breaks, 
         show_rownames = TRUE, 
         show_colnames = !TRUE, 
         cellwidth = cellwidth, cellheight = cellheight, 
         fontsize = 10, fontsize_row = 11, fontsize_col = 8,
         annotation_legend = TRUE, drop_levels = TRUE)
  }

8.4 Draw the heatmaps

Things we see in the plot:

separation of U-CLL and M-CLL within CLL
BCR-targeting drugs a cluster at the top
everolimus stands out in M-CLL with a separate sensitivity pattern
encorafenib clusters together and pops out in HCL

clscd1/2: clustered and scaled drug matrix

clscd1 <- matClust(scd1, rowweights = weights1, 
                  colgroups = lpdAll$`Disease Group`)
clscd2 <- matClust(scd2, rowweights = weights2, 
                  colgroups = lpdAll$`Disease Group`, reorderrows = TRUE)

8.4.1 Identify places where gaps between the rows should be

setGapPositions <- function(x, gapAt) {
  rg <- if (missing(gapAt)) c(0) else {
    s <- strsplit(gapAt, split = "--")
    stopifnot(all(listLen(s) == 2L))
    s <- strsplit(unlist(s), split = ":")
    spname <- drugs[safeMatch(sapply(s, `[`, 1), drugs$name), "id"]
    spconc <- as.numeric(sapply(s, `[`, 2))
    spi <- mapply(function(d, cc) {
      i <- which(cc == conctab[d, ])
      stopifnot(length(i) == 1)
      i
     }, spname, spconc)
    spdrug <- paste(spname, spi, sep = "_")
    mt <- safeMatch(spdrug, rownames(x))
    igp <- seq(1, length(mt), by = 2L)
    stopifnot(all( mt[igp] - mt[igp + 1] == 1))
    #stopifnot(all( mt[igp] - mt[igp + 1] == 1))
    c(mt[igp + 1], 0)
  }
  attr(x, "rowgap") <- rg
  x
}

clscd1 %<>% setGapPositions(gapAt = c(
  "PF 477736:10--idelalisib:10",
  "spebrutinib:2.5--PF 477736:2.5",
  "PRT062607 HCl:10--selumetinib:2.5",
  "selumetinib:10--MK-2206:2.5",
  "MK-2206:0.156--tipifarnib:10",
  "AT13387:0.039--encorafenib:10",
  "encorafenib:2.5--SD07:1.111",
  "doxorubicine:0.016--encorafenib:0.625",
  "encorafenib:0.156--rotenone:2.5",
  "SCH 900776:0.625--everolimus:0.625",
  "everolimus:10--afatinib:1.667",
  "arsenic trioxide:1--thapsigargin:5",
  "thapsigargin:0.313--fludarabine:0.156"
))

clscd2 %<>% setGapPositions(gapAt = c(
  "AT13387:0.039--everolimus:0.156",
  "everolimus:0.625--nutlin-3:10", 
  "fludarabine:10--thapsigargin:0.078",
  "thapsigargin:0.313--encorafenib:0.625", 
  "encorafenib:0.156--ruxolitinib:0.156"
))

#FIG# S8
rownames(clscd1) <- with(fData(lpdAll)[ rownames(clscd1),, drop = FALSE], 
  paste0(drugs[id, "name"], " ", conctab[cbind(id, paste0("c", subtype))], "uM")) 
rownames(clscd1)

##   [1] "vorinostat 0.313uM"    "BAY 11-7085 10uM"      "fludarabine 0.156uM"  
##   [4] "thapsigargin 0.313uM"  "thapsigargin 1.25uM"   "thapsigargin 5uM"     
##   [7] "arsenic trioxide 1uM"  "nutlin-3 0.156uM"      "chaetoglobosin A 5uM" 
##  [10] "nutlin-3 0.625uM"      "nutlin-3 2.5uM"        "vorinostat 1.25uM"    
##  [13] "fludarabine 2.5uM"     "fludarabine 0.625uM"   "tofacitinib 10uM"     
##  [16] "rigosertib 10uM"       "KX2-391 0.625uM"       "KX2-391 2.5uM"        
##  [19] "afatinib 1.667uM"      "everolimus 10uM"       "everolimus 0.156uM"   
##  [22] "everolimus 0.625uM"    "SCH 900776 0.625uM"    "SCH 900776 2.5uM"     
##  [25] "silmitasertib 10uM"    "chaetocin 0.031uM"     "navitoclax 0.004uM"   
##  [28] "orlistat 2.5uM"        "orlistat 10uM"         "venetoclax 0.004uM"   
##  [31] "navitoclax 0.016uM"    "thapsigargin 0.078uM"  "rotenone 0.156uM"     
##  [34] "rotenone 0.625uM"      "rotenone 2.5uM"        "encorafenib 0.156uM"  
##  [37] "encorafenib 0.625uM"   "doxorubicine 0.016uM"  "doxorubicine 0.25uM"  
##  [40] "doxorubicine 0.063uM"  "YM155 0.008uM"         "SD51 0.37uM"          
##  [43] "SD51 1.111uM"          "MIS-43 0.37uM"         "MIS-43 1.111uM"       
##  [46] "SD07 1.111uM"          "encorafenib 2.5uM"     "encorafenib 10uM"     
##  [49] "AT13387 0.039uM"       "tipifarnib 2.5uM"      "tipifarnib 10uM"      
##  [52] "MK-2206 0.156uM"       "MK-2206 0.625uM"       "MK-2206 2.5uM"        
##  [55] "selumetinib 10uM"      "selumetinib 0.156uM"   "selumetinib 0.625uM"  
##  [58] "selumetinib 2.5uM"     "PRT062607 HCl 10uM"    "BIX02188 10uM"        
##  [61] "rabusertib 10uM"       "enzastaurin 10uM"      "VE-821 10uM"          
##  [64] "sotrastaurin 2.5uM"    "sotrastaurin 10uM"     "saracatinib 2.5uM"    
##  [67] "saracatinib 10uM"      "MK-2206 10uM"          "TAE684 2.5uM"         
##  [70] "sunitinib 10uM"        "CCT241533 2.5uM"       "SGI-1776 2.5uM"       
##  [73] "SGX-523 10uM"          "KU-60019 10uM"         "NU7441 10uM"          
##  [76] "ibrutinib 10uM"        "KU-60019 2.5uM"        "NU7441 2.5uM"         
##  [79] "PF 477736 0.625uM"     "PF 477736 2.5uM"       "spebrutinib 2.5uM"    
##  [82] "spebrutinib 10uM"      "PRT062607 HCl 0.156uM" "spebrutinib 0.156uM"  
##  [85] "spebrutinib 0.625uM"   "MK-1775 2.5uM"         "idelalisib 0.156uM"   
##  [88] "idelalisib 0.625uM"    "idelalisib 2.5uM"      "ibrutinib 0.156uM"    
##  [91] "ibrutinib 0.625uM"     "ibrutinib 2.5uM"       "tamatinib 2.5uM"      
##  [94] "tamatinib 10uM"        "MK-1775 10uM"          "AZD7762 0.156uM"      
##  [97] "AZD7762 0.625uM"       "BX912 2.5uM"           "dasatinib 0.156uM"    
## [100] "PRT062607 HCl 0.625uM" "PRT062607 HCl 2.5uM"   "duvelisib 0.156uM"    
## [103] "duvelisib 0.625uM"     "duvelisib 2.5uM"       "duvelisib 10uM"       
## [106] "idelalisib 10uM"       "PF 477736 10uM"        "AZD7762 2.5uM"        
## [109] "AZD7762 10uM"          "BX912 10uM"            "dasatinib 10uM"       
## [112] "dasatinib 0.625uM"     "dasatinib 2.5uM"       "AT13387 0.156uM"      
## [115] "AT13387 0.625uM"       "AT13387 2.5uM"

theatmap(clscd1)

#FIG# 3A
rownames(clscd2) <- with(fData(lpdAll)[ rownames(clscd2),, drop = FALSE], 
  paste0(drugs[id, "name"], " ", conctab[cbind(id, paste0("c", subtype))], "uM")) 

rownames(clscd2)

##  [1] "venetoclax 0.016uM"   "venetoclax 0.004uM"   "YM155 0.031uM"       
##  [4] "YM155 0.008uM"        "ruxolitinib 0.625uM"  "ruxolitinib 0.156uM" 
##  [7] "encorafenib 0.156uM"  "encorafenib 0.625uM"  "thapsigargin 0.313uM"
## [10] "thapsigargin 0.078uM" "fludarabine 10uM"     "fludarabine 2.5uM"   
## [13] "nutlin-3 2.5uM"       "nutlin-3 10uM"        "everolimus 0.625uM"  
## [16] "everolimus 0.156uM"   "AT13387 0.039uM"      "AT13387 0.156uM"     
## [19] "PF 477736 0.156uM"    "PF 477736 0.625uM"    "MK-2206 0.156uM"     
## [22] "MK-2206 0.625uM"      "selumetinib 0.156uM"  "selumetinib 0.625uM" 
## [25] "AZD7762 0.625uM"      "AZD7762 0.156uM"      "ibrutinib 0.156uM"   
## [28] "ibrutinib 0.625uM"    "duvelisib 0.156uM"    "duvelisib 0.625uM"   
## [31] "idelalisib 0.625uM"   "idelalisib 0.156uM"   "tamatinib 0.156uM"   
## [34] "tamatinib 0.625uM"

theatmap(clscd2)

options(stringsAsFactors=TRUE)

9 Relative drug effects - ternary diagrams

Ternary diagrams are a good visualisation tool to compare the relative drug effects of three selected drugs. Here we call the drugs by their targets (ibrutinib = BTK, idelalisib = PI3K, PRT062607 HCl = SYK, everolimus = mTOR and selumetinib = MEK). We compare the drug effects within CLL samples as well as U-CLL and M-CLL separatelly.

Load the data.

data("conctab", "lpdAll", "drugs", "patmeta")

Select CLL patients.

lpdCLL <- lpdAll[, lpdAll$Diagnosis=="CLL"]

9.1 Additional settings

Function that set the point transparency.

makeTransparent = function(..., alpha=0.18) {
  
  if(alpha<0 | alpha>1) stop("alpha must be between 0 and 1")
  
  alpha = floor(255*alpha)  
  newColor = col2rgb(col=unlist(list(...)), alpha=FALSE)
  
  .makeTransparent = function(col, alpha) {
    rgb(red=col[1], green=col[2], blue=col[3], alpha=alpha, maxColorValue=255)
  }
  newColor = apply(newColor, 2, .makeTransparent, alpha=alpha)
  
  return(newColor)
}

giveColors = function(idx, alpha=1) {
  bp = brewer.pal(12, "Paired")
  makeTransparent(
    sequential_hcl(12, h = coords(as(hex2RGB(bp[idx]), "polarLUV"))[1, "H"])[1],
    alpha=alpha)
}

9.2 Calculating the coordinates

# calculate  (x+c)/(s+3c), (y+c)/(s+3c), (z+c)/(s+3c)
prepareTernaryData = function(lpd, targets, invDrugs) {

  # calculate values for ternary
  df = sapply(targets, function(tg) {
    dr = paste(invDrugs[tg], c(4,5), sep="_")
    tmp = 1-Biobase::exprs(lpd)[dr,]
    tmp = colMeans(tmp)
    pmax(tmp, 0)
  })
  df = data.frame(df, sum=rowSums(df), max=rowMax(df))
  
  tern = apply(df[,targets], 2, function(x) {
    (x+0.005) / (df$sum+3*0.005)
  })
  colnames(tern) = paste0("tern", 1:3)
  # add IGHV status
  cbind(df, tern, IGHV=patmeta[rownames(df),"IGHV"],
        treatNaive=patmeta[rownames(df),"IC50beforeTreatment"])
}

9.3 Plot ternaries

makeTernaryPlot = function(td=ternData, targets, invDrugs) {
  
  drn = setNames(drugs[invDrugs[targets],"name"], nm=targets)
  
  plot = ggtern(data=td, aes(x=tern1, y=tern2, z=tern3)) +
        #countours
        stat_density_tern(geom='polygon', aes(fill=..level..),
                          position = "identity", contour=TRUE, n=400,
                          weight = 1, base = 'identity', expand = c(1.5, 1.5)) +  
        scale_fill_gradient(low='lightblue', high='red', guide = FALSE) +
     
        #points
        geom_mask() + 
        geom_point(size=35*td[,"max"],
                   fill=ifelse(td[,"treatNaive"],"green","yellow"),
                   color="black", shape=21) +
  
        #themes
        theme_rgbw( ) + 
        theme_custom(
          col.T=giveColors(2),
          col.L=giveColors(10),
          col.R=giveColors(4),
          tern.plot.background="white", base_size = 18 ) +
    
    labs( x = targets[1], xarrow = drn[targets[1]],
              y = targets[2], yarrow = drn[targets[2]],
              z = targets[3], zarrow = drn[targets[3]] ) +
              theme_showarrows() + theme_clockwise() +
    
        # lines 
        geom_Tline(Tintercept=.5, colour=giveColors(2)) +
        geom_Lline(Lintercept=.5, colour=giveColors(10)) +
        geom_Rline(Rintercept=.5, colour=giveColors(4)) 

   plot
}

# RUN TERNARY
makeTernary = function(lpd, targets, ighv=NA) {

  # list of investigated drugs and their targets
  invDrugs = c("PI3K"="D_003", "BTK"="D_002", "SYK"="D_166",
               "MTOR"="D_063", "MEK"="D_012")
  
  ternData = prepareTernaryData(lpd, targets, invDrugs)
  if(!is.na(ighv)) ternData = ternData[which(ternData$IGHV==ighv),]
  
  print(table(ternData$treatNaive))
  ternPlot = makeTernaryPlot(ternData, targets, invDrugs)
  
  ternPlot
}

9.3.1 BCR drugs

#FIG# 3B
makeTernary(lpdCLL, c("PI3K", "BTK", "SYK"), ighv=NA)

## 
## FALSE  TRUE 
##    52   131

#FIG# 3B
makeTernary(lpdCLL, c("PI3K", "BTK", "SYK"), ighv="M")

## 
## FALSE  TRUE 
##    14    83

#FIG# 3B
makeTernary(lpdCLL, c("PI3K", "BTK", "SYK"), ighv="U")

## 
## FALSE  TRUE 
##    33    41

9.3.2 BTK & MEK & MTOR

#FIG# 3BC
makeTernary(lpdCLL, c("MTOR", "BTK", "MEK"), ighv=NA)

## 
## FALSE  TRUE 
##    52   131

#FIG# 3BC
makeTernary(lpdCLL, c("MTOR", "BTK", "MEK"), ighv="M")

## 
## FALSE  TRUE 
##    14    83

#FIG# 3BC
makeTernary(lpdCLL, c("MTOR", "BTK", "MEK"), ighv="U")

## 
## FALSE  TRUE 
##    33    41

9.3.3 PI3K & MEK & MTOR

All CLL samples included.

#FIG# S9 left
makeTernary(lpdCLL, c("MTOR", "PI3K", "MEK"), ighv=NA)

## 
## FALSE  TRUE 
##    52   131

9.3.4 SYK & MEK & MTOR

All CLL samples included.

#FIG# S9 right
makeTernary(lpdCLL, c("MTOR", "SYK", "MEK"), ighv=NA)

## 
## FALSE  TRUE 
##    52   131

10 Comparison of gene expression responses to drug treatments

12 CLL samples (6 M-CLL and 6 U-CLL) were treated with ibrutinb, idelalisib, selumetinib, everolimus and negative control. Gene expression profiling was performed after 12 hours of drug incubation using Illumina microarrays.

Load the data.

data("exprTreat", "drugs")

Do some cosmetics.

e <- exprTreat
colnames( pData(e) ) <- sub( "PatientID", "Patient", colnames( pData(e) ) )
colnames( pData(e) ) <- sub( "DrugID", "Drug", colnames( pData(e) ) )
pData(e)$Drug[ is.na(pData(e)$Drug) ] <- "none"
pData(e)$Drug <- relevel( factor( pData(e)$Drug ), "none" )
pData(e)$SampleID <- colnames(e)
colnames(e) <- paste( pData(e)$Patient, pData(e)$Drug, sep=":" )

head( pData(e) )

##            Patient  Drug   Sentrix_ID Sentrix_Position IGHV trisomy12
## H112:none     H112  none 200128470091                A    M         0
## H112:D_002    H112 D_002 200128470091                B    M         0
## H112:D_003    H112 D_003 200128470091                C    M         0
## H112:D_012    H112 D_012 200128470091                D    M         0
## H112:D_063    H112 D_063 200128470091                E    M         0
## H112:D_049    H112 D_049 200128470091                F    M         0
##            del13q14       SampleID
## H112:none         0 200128470091_A
## H112:D_002        0 200128470091_B
## H112:D_003        0 200128470091_C
## H112:D_012        0 200128470091_D
## H112:D_063        0 200128470091_E
## H112:D_049        0 200128470091_F

Remove uninteresting fData columns

fData(e) <- fData(e)[ , c( "ProbeID", "Entrez_Gene_ID", "Symbol", 
    "Cytoband", "Definition" ) ]

Here is a simple heat map of correlation between arrays.

pheatmap( cor(Biobase::exprs(e)), symm=TRUE, cluster_rows = FALSE, cluster_cols = FALSE, 
   color = colorRampPalette(c("gray10","lightpink"))(100) )

10.1 Differential expression using Limma

Construct a model matrix with a baseline expression per patient, treatment effects for all drugs, and symmetric (+/- 1/2) effects for U-vs-M differences in drug effects.

mm <- model.matrix( ~ 0 + Patient + Drug, pData(e) )
colnames(mm) <- sub( "Patient", "", colnames(mm) )
colnames(mm) <- sub( "Drug", "", colnames(mm) )
head(mm)

##            H094 H108 H109 H112 H114 H167 H169 H173 H194 H233 H234 H238 D_002
## H112:none     0    0    0    1    0    0    0    0    0    0    0    0     0
## H112:D_002    0    0    0    1    0    0    0    0    0    0    0    0     1
## H112:D_003    0    0    0    1    0    0    0    0    0    0    0    0     0
## H112:D_012    0    0    0    1    0    0    0    0    0    0    0    0     0
## H112:D_063    0    0    0    1    0    0    0    0    0    0    0    0     0
## H112:D_049    0    0    0    1    0    0    0    0    0    0    0    0     0
##            D_003 D_012 D_049 D_063
## H112:none      0     0     0     0
## H112:D_002     0     0     0     0
## H112:D_003     1     0     0     0
## H112:D_012     0     1     0     0
## H112:D_063     0     0     0     1
## H112:D_049     0     0     1     0

Run LIMMA on this.

fit <- lmFit( e, mm )
fit <- eBayes( fit )

How many genes do we get that are significantly differentially expressed due to a drug, at 10% FDR?

a <- decideTests( fit, p.value = 0.1 )
t( apply( a[ , grepl( "D_...", colnames(a) ) ], 2, 
          function(x) table( factor(x,levels=c(-1,0,1)) ) ) )

##        -1     0   1
## D_002 244 47702 161
## D_003 196 47748 163
## D_012 271 47530 306
## D_049 374 47089 644
## D_063 423 47309 375

What is the % overlap of genes across drugs?

a <-
sapply( levels(pData(e)$Drug)[-1], function(dr1) 
   sapply( levels(pData(e)$Drug)[-1], function(dr2) 
     
   100*( length( intersect(
          unique( topTable( fit, coef=dr1, p.value=0.1,
                            number=Inf )$`Entrez_Gene_ID` ),
          unique( topTable( fit, coef=dr2, p.value=0.1,
                            number=Inf )$`Entrez_Gene_ID` ) ) ) / 
                      length(unique( topTable( fit, coef=dr1, p.value=0.1,
                                               number=Inf )$`Entrez_Gene_ID`)))
     ) 
   )

rownames(a) <-drugs[ rownames(a), "name" ]
colnames(a) <-rownames(a)
a <- a[-4, -4]
a

##             ibrutinib idelalisib selumetinib everolimus
## ibrutinib   100.00000   50.60606    15.74953   11.76471
## idelalisib   46.51811  100.00000    19.35484   14.70588
## selumetinib  23.11978   30.90909   100.00000   20.16807
## everolimus   23.39833   31.81818    27.32448  100.00000

Correlate top 2000 genes with median largest lfc with each other:

For each patient and drug, compute the LFC (log fold changed) treated/untreated
Select the 2000 genes that have the highest across all patients and drugs (median absolute LFC)
Compute the average LFC for each drug across the patients, resulting in 4 vectors of length 2000 (one for each drug)
Compute the pairwise correlation between them

extractGenes = function(fit, coef) {
  tmp = topTable(fit, coef=coef, number=Inf )[, c("ProbeID", "logFC")]
  rownames(tmp) = tmp$ProbeID
  colnames(tmp)[2] = drugs[coef,"name"]
  tmp[order(rownames(tmp)),2, drop=FALSE]
}

runExtractGenes = function(fit, drs) {
  tmp = do.call(cbind, lapply(drs, function(dr) {
    extractGenes(fit, dr)
  }))
  as.matrix(tmp)
}

mt = runExtractGenes(fit, drs=c("D_002","D_003","D_012","D_063"))

mt <- cbind( mt, median=rowMedians(mt))
mt <- mt[order(mt[,"median"]), ]
mt <- mt[1:2000, ]
mt <- mt[,-5]

(mtcr = cor(mt))

##             ibrutinib idelalisib selumetinib everolimus
## ibrutinib   1.0000000  0.7007428   0.3171206  0.2265553
## idelalisib  0.7007428  1.0000000   0.5211315  0.4087302
## selumetinib 0.3171206  0.5211315   1.0000000  0.3889721
## everolimus  0.2265553  0.4087302   0.3889721  1.0000000

#FIG# 3D
pheatmap(mtcr, cluster_rows = FALSE, cluster_cols = FALSE,
         col=colorRampPalette(c("white", "lightblue","darkblue") )(100))

11 Co-sensitivity patterns of the four response groups

Load data.

data("lpdAll", "drugs")

lpdCLL <- lpdAll[ , lpdAll$Diagnosis== "CLL"]

11.1 Methodology of building groups

Here we look at the distribution of viabilities for the three drugs concerned and use the mirror method to derive, first, a measure of the background variation of the values for these drugs (ssd) and then define a cutoff as multiple (z_factor) of that. The mirror method assumes that the observed values are a mixture of two components:

a null distribution, which is symmetric about 1, and
responder distribution, which has negligible mass above 1.

The choice of z_factor is, of course, a crucial step. It determines the trade-off between falsely called responders (false positives) versus falsely called non-responders (false negatives). Under normality assumption, it is related to the false positive rate (FPR) by

\[ \text{FPR} = 1 - \text{pnorm}(z) \]

An FPR of 0.05 thus corresponds to

z_factor <- qnorm(0.05, lower.tail = FALSE)
z_factor

## [1] 1.644854

Defining drugs representing BTK, mTOR and MEK inhibition.

drugnames <- c( "ibrutinib", "everolimus", "selumetinib")
ib  <- "D_002_4:5" 
ev  <- "D_063_4:5" 
se  <- "D_012_4:5"
stopifnot(identical(fData(lpdAll)[c(ib, ev, se), "name"], drugnames))

df  <- Biobase::exprs(lpdAll)[c(ib, ev, se), lpdAll$Diagnosis=="CLL"] %>%
  t %>% as_tibble %>% `colnames<-`(drugnames)

mdf <- melt(data.frame(df))

## No id variables; using all as measure variables

grid.arrange(ncol = 2,
  ggplot(df, aes(x = 1-ibrutinib,  y = 1-everolimus )) + geom_point(),
  ggplot(df, aes(x = 1-everolimus, y = 1-selumetinib)) + geom_point()  
  )

Determine standard deviation using mirror method.

pmdf <- filter(mdf, value >= 1)
ssd  <- mean( (pmdf$value - 1) ^ 2 ) ^ 0.5
ssd

## [1] 0.05990934

Normal density.

dn <- tibble(
  x = seq(min(mdf$value), max(mdf$value), length.out = 100),
  y = dnorm(x, mean = 1, sd = ssd) * 2 * nrow(pmdf) / nrow(mdf) 
)

First, draw histogram for all three drugs pooled.

#FIG# S10 A
thresh   <- 1 - z_factor * ssd
thresh

## [1] 0.9014579

hh <- ggplot() +
  geom_histogram(aes(x = value, y = ..density..),
                 binwidth = 0.025, data = mdf) +
  theme_minimal() +
  geom_line(aes(x = x, y = y), col = "darkblue", data = dn) +
  geom_vline(col = "red", xintercept = thresh)
hh

Then split by drug.

hh + facet_grid( ~ variable)

Decision rule.

thresh

## [1] 0.9014579

df <- mutate(df,
  group = ifelse(ibrutinib < thresh, "BTK",
          ifelse(everolimus < thresh, "mTOR",
          ifelse(selumetinib < thresh, "MEK", "Non-responder")))
)

Present the decision rule in the plots.

#FIG# S10 B
mycol <- c(`BTK` = "Royalblue4",
           `mTOR` = "chartreuse4",
           `MEK` = "mediumorchid4",
           `Non-responder` = "grey61")

plots <- list(
  ggplot(df, aes(x = 1-ibrutinib,  y = 1-everolimus)),
  ggplot(filter(df, group != "BTK"), aes(x = 1-selumetinib, y = 1-everolimus))
)

plots <- lapply(plots, function(x)
  x + geom_point(aes(col = group), size = 1.5) + theme_minimal() +
    coord_fixed() + 
    scale_color_manual(values = mycol) +
    geom_hline(yintercept = 1 - thresh) + 
    geom_vline(xintercept = 1- thresh) +
    ylim(-0.15, 0.32) + xlim(-0.15, 0.32) 
    
)

grid.arrange(ncol = 2, grobs  = plots)

The above roules of stratification of patients into drug response groups is contained within defineResponseGroups function inside the package.

sel = defineResponseGroups(lpd=lpdAll)

## Using PatientID as id variables

11.2 Mean of each group

# colors
c1 = giveColors(2, 0.5)
c2 = giveColors(4, 0.85)
c3 = giveColors(10, 0.75)

# vectors
p <- vector(); d <- vector();
pMTOR <- vector(); pBTK <- vector(); pMEK <- vector(); pNONE <- vector()
dMTOR <- vector(); dBTK <- vector(); dMEK <- vector(); dNONE <- vector()
dMTOR_NONE <- vector(); pMTOR_NONE <- vector()

# groups                               
sel$grMTOR_NONE <- ifelse(sel$group=="mTOR", "mTOR", NA)
sel$grMTOR_NONE <- ifelse(sel$group=="none", "none", sel$grMTOR_NONE)

sel$grMTOR <- ifelse(sel$group=="mTOR", "mTOR", "rest")
sel$col <- ifelse(sel$group=="mTOR", c3, "grey")
sel$grBTK  <- ifelse(sel$group=="BTK",  "BTK", "rest")
sel$col <- ifelse(sel$group=="BTK",  c1, sel$col)
sel$grMEK  <- ifelse(sel$group=="MEK",  "MEK", "rest")
sel$col <- ifelse(sel$group=="MEK",  c2, sel$col)
sel$grNONE <- ifelse(sel$group=="none", "none", "rest")



for (i in 1: max(which(fData(lpdCLL)$type=="viab"))) {
  
  fit <- aov(Biobase::exprs(lpdCLL)[i, rownames(sel)] ~ sel$group)
  p[i] <- summary(fit)[[1]][["Pr(>F)"]][1]
  
  
  calc_p = function(clmn) {
    p.adjust(
      t.test(Biobase::exprs(lpdCLL)[i, rownames(sel)] ~ sel[,clmn],
             alternative = c("two.sided") )$p.value, "BH" )
  }
  
  calc_d = function(clmn) {
    diff(
      t.test(Biobase::exprs(lpdCLL)[i, rownames(sel)] ~ sel[,clmn])$estimate,
      alternative = c("two.sided") )
  }
  
  pMTOR_NONE[i] <- calc_p("grMTOR_NONE")
  dMTOR_NONE[i] <- calc_d("grMTOR_NONE")
  
  pMTOR[i] <- calc_p("grMTOR")
  dMTOR[i] <- calc_d("grMTOR")
 
  pBTK[i] <- calc_p("grBTK")
  dBTK[i] <- calc_d("grBTK")
 
  pMEK[i] <- calc_p("grMEK")
  dMEK[i] <- calc_d("grMEK")
 
  pNONE[i] <- calc_p("grNONE")
  dNONE[i] <- calc_d("grNONE")
 
  # drugnames
  d[i] <- rownames(lpdCLL)[i]
}

#FIG# 3F
#construct data frame
ps <- data.frame(drug=d, pMTOR, pBTK, pMEK, pNONE, p )
ds <- data.frame(dMTOR, dBTK, dMEK, dNONE)
rownames(ps) <- ps[,1]; rownames(ds) <- ps[,1]

# selcet only rows for singel concentrations, set non-sig to zero 
ps45 <- ps[rownames(ps)[grep(rownames(ps), pattern="_4:5")],2:5 ]
for (i in 1:4) { ps45[,i] <- ifelse(ps45[,i]<0.05, ps45[,i], 0) }

ds45 <- ds[rownames(ds)[grep(rownames(ds), pattern="_4:5")],1:4 ]
for (i in 1:4) { ds45[,i] <- ifelse(ps45[,i]<0.05, ds45[,i], 0) }

# exclude non-significant rows
selDS <- rownames(ds45)[rowSums(ps45)>0]
selPS <- rownames(ps45)[rowSums(ps45)>0]
ps45 <- ps45[selPS, ]
ds45 <- ds45[selDS, ]

groupMean = function(gr) {
  rowMeans(Biobase::exprs(lpdCLL)[rownames(ps45), rownames(sel)[sel$group==gr]])
}

MBTK <- groupMean("BTK")
MMEK <- groupMean("MEK")
MmTOR <- groupMean("mTOR")
MNONE <- groupMean("none")

# create data frame, new colnames
ms <- data.frame(BTK=MBTK, MEK=MMEK, mTOR=MmTOR, NONE=MNONE)
colnames(ms) <- c("BTK", "MEK", "mTOR", "WEAK")
rownames(ms) <- drugs[substr(selPS, 1,5), "name"]

# select rows with effect sizes group vs. rest >0.05
ms <- ms[ which(rowMax(as.matrix(ds45)) > 0.05 ) ,  ]

# exclude some drugs
ms <- ms[-c(
  which(rownames(ms) %in%
          c("everolimus", "ibrutinib", "selumetinib", "bortezomib"))),]

pheatmap(ms[, c("MEK", "BTK","mTOR", "WEAK")], cluster_cols = FALSE,
         cluster_rows =TRUE, clustering_method = "centroid",
         scale = "row",
         color=colorRampPalette(
           c(rev(brewer.pal(7, "YlOrRd")), "white", "white", "white",
             brewer.pal(7, "Blues")))(101)
        )

11.3 Bee swarm plots for groups

For selected drugs, we show differences of drug response between patient response groups.

#FIG# 3G
# drug label
giveDrugLabel3 <- function(drid) {
  vapply(strsplit(drid, "_"), function(x) {
    k <- paste(x[1:2], collapse="_")
    paste0(drugs[k, "name"])
  }, character(1))
}

groups = sel[,"group", drop=FALSE]
groups[which(groups=="none"), "group"] = "WEAK"

# beeswarm function
beeDrug <- function(xDrug) {
 
   par(bty="l", cex.axis=1.5)
   beeswarm(
     Biobase::exprs(lpdCLL)[xDrug, rownames(sel)] ~ groups$group,
     axes=FALSE, cex.lab=1.5, ylab="Viability", xlab="", pch = 16,
     pwcol=sel$col, cex=1,
     ylim=c(min( Biobase::exprs(lpdCLL)[xDrug, rownames(sel)] ) - 0.05, 1.2) )
 
   boxplot(Biobase::exprs(lpdCLL)[xDrug, rownames(sel)] ~ groups$group,  add = T,
           col="#0000ff22", cex.lab=2, outline = FALSE) 
   
   mtext(side=3, outer=F, line=0, 
       paste0(giveDrugLabel3(xDrug) ), cex=2) 
}
 

beeDrug("D_001_4:5")

beeDrug("D_081_4:5")

beeDrug("D_013_4:5")

beeDrug("D_003_4:5")

beeDrug("D_020_4:5")

beeDrug("D_165_3")

12 Kaplan-Meier plot for groups (time from sample to next treatment)

#FIG# S11 A
patmeta[, "group"] <- sel[rownames(patmeta), "group"]

c1n <- giveColors(2)
c2n <- giveColors(4)
c3n <- giveColors(10)
c4n <- "lightgrey"

survplot(Surv(patmeta[ , "T5"],
              patmeta[ , "treatedAfter"] == TRUE)  ~ patmeta$group ,
  lwd=3, cex.axis = 1.2, cex.lab=1.5, col=c(c1n, c2n, c3n, c4n), 
  data = patmeta,  
  legend.pos = 'bottomleft', stitle = 'Drug response', 
  xlab = 'Time (years)', ylab = 'Patients without treatment (%)',
)

13 Incidence of somatic gene mutations and CNVs in the four groups

Load data.

data(lpdAll)

Select CLL patients.

lpdCLL <- lpdAll[ , lpdAll$Diagnosis== "CLL"]

Build groups.

sel = defineResponseGroups(lpd=lpdAll)

## Using PatientID as id variables

Calculate total number of mutations per patient.

genes <- data.frame(
  t(Biobase::exprs(lpdCLL)[fData(lpdCLL)$type %in% c("gen", "IGHV"), rownames(sel)]),
  group = factor(sel$group)
)

genes <- genes[!(is.na(rownames(genes))), ]
colnames(genes) %<>%
  sub("del13q14_any", "del13q14", .) %>%
  sub("IGHV.Uppsala.U.M", "IGHV", .)

Nmut = rowSums(genes[, colnames(genes) != "group"], na.rm = TRUE)

mf <- sapply(c("BTK", "MEK", "mTOR", "none"), function(i) 
  mean(Nmut[genes$group==i]) 
)

barplot(mf, ylab="Total number of mutations/CNVs per patient", col="darkgreen")

Test each mutation, and each group, marginally for an effect.

mutsUse <- setdiff( colnames(genes), "group" )
mutsUse <- mutsUse[ colSums(genes[, mutsUse], na.rm = TRUE) >= 4 ]
mutationTests <- lapply(mutsUse, function(m) {
  tibble(
    mutation = m, 
    p = fisher.test(genes[, m], genes$group)$p.value)
}) %>% bind_rows  %>% mutate(pBH = p.adjust(p, "BH")) %>% arrange(p)

mutationTests <- mutationTests %>% filter(pBH < 0.1)

Number of mutations with the p-value meeting the threshold.

nrow(mutationTests)

## [1] 8

groupTests <- lapply(unique(genes$group), function(g) {
  tibble(
    group = g, 
    p = fisher.test(
      colSums(genes[genes$group == g, mutsUse], na.rm=TRUE),
      colSums(genes[genes$group != g, mutsUse], na.rm=TRUE),
      simulate.p.value = TRUE, B = 10000)$p.value)
}) %>% bind_rows %>% arrange(p)

groupTests

## # A tibble: 4 x 2
##   group         p
##   <fct>     <dbl>
## 1 none  0.0001000
## 2 MEK   0.00110  
## 3 mTOR  0.00360  
## 4 BTK   0.00860

These results show that each of the groups has an imbalanced mutation distribution, and that each of the above-listed mutations is somehow imbalanced between the groups.

13.1 Test gene mutations vs. groups

Fisher.test genes vs. groups function. Below function assumes that g is a data.frame one of whose columns is group and all other columns are numeric (i.e., 0 or 1) mutation indicators.

fisher.genes <- function(g, ref) {
  stopifnot(length(ref) == 1)
  ggg <- ifelse(g$group == ref, ref, "other")
  idx <- which(colnames(g) != "group") 
  
  lapply(idx, function(i)
    if (sum(g[, i], na.rm = TRUE) > 2) {
      ft  <- fisher.test(ggg, g[, i])
      tibble(
        p    = ft$p.value,
        es   = ft$estimate, 
        prop = sum((ggg == ref) & !is.na(g[, i]), na.rm = TRUE),
        mut1 = sum((ggg == ref) & (g[, i] != 0),  na.rm = TRUE),
        gene = colnames(g)[i])
    }  else {
      tibble(p = 1, es = 1, prop = 0, mut1 = 0, gene = colnames(g)[i])
    }
  ) %>% bind_rows
}

Calculate p values and effects using the Fisher test and group of interest vs. rest.

pMTOR <- fisher.genes(genes, ref="mTOR") 
pBTK  <- fisher.genes(genes, ref="BTK") 
pMEK  <- fisher.genes(genes, ref="MEK") 
pNONE <- fisher.genes(genes, ref="none")

p    <- cbind(pBTK$p,    pMEK$p,    pMTOR$p,    pNONE$p)
es   <- cbind(pBTK$es,   pMEK$es,   pMTOR$es,   pNONE$es)
prop <- cbind(pBTK$prop, pMEK$prop, pMTOR$prop, pNONE$prop)
mut1 <- cbind(pBTK$mut1, pMEK$mut1, pMTOR$mut1, pNONE$mut1)

Prepare matrix for heatmap.

p <- ifelse(p < 0.05, 1, 0) 
p <- ifelse(es <= 1, p, -p) 
rownames(p) <- pMTOR$gene
colnames(p) <- c("BTK", "MEK", "mTOR", "NONE")

pM <- p[rowSums(abs(p))!=0, ]
propM <- prop[rowSums(abs(p))!=0, ]

Cell labels.

N <- cbind( paste0(mut1[,1],"/",prop[,1] ),
            paste0(mut1[,2],"/",prop[,2] ), 
            paste0(mut1[,3],"/",prop[,3] ),
            paste0(mut1[,4],"/",prop[,4] )
)

rownames(N) <- rownames(p)

Draw the heatmap only for significant factors in mutUse. Selection criteria for mutUse are >=4 mutations and adjusted p.value < 0.1 for 4x2 fisher test groups (mtor, mek, btk, none) vs mutation.

#FIG# S11 B
mutationTests <-
  mutationTests[which(!(mutationTests$mutation %in%
                          c("del13q14_bi", "del13q14_mono"))), ]
pMA <- p[mutationTests$mutation, ]
pMA

##           BTK MEK mTOR NONE
## IGHV       -1   0    1    1
## trisomy12   1   0    1   -1
## del13q14   -1   0    0    1
## KLHL6       0   0    1    0
## TP53        0   1    0    0
## MED12       0   1    0    0

pheatmap(pMA, cluster_cols = FALSE, 
         cluster_rows = FALSE, legend=TRUE, annotation_legend = FALSE, 
         color = c("red", "white", "lightblue"),
         display_numbers = N[ rownames(pMA), ]
        )

14 Differences in gene expression profiles between drug-response groups

data("dds")

sel = defineResponseGroups(lpd=lpdAll)

## Using PatientID as id variables

sel$group = gsub("none","weak", sel$group)

# select patients with CLL in the main screen data 
colnames(dds) <- colData(dds)$PatID
pat <- intersect(colnames(lpdCLL), colnames(dds))
dds_CLL <- dds[,which(colData(dds)$PatID %in% pat)]

# add group label 
colData(dds_CLL)$group <- factor(sel[colnames(dds_CLL), "group"])
colData(dds_CLL)$IGHV = factor(patmeta[colnames(dds_CLL),"IGHV"])

Select genes with most variable expression levels.

vsd <- varianceStabilizingTransformation( assay(dds_CLL) )
colnames(vsd) = colData(dds_CLL)$PatID
rowVariance <- setNames(rowVars(vsd), nm=rownames(vsd))
sortedVar <- sort(rowVariance, decreasing=TRUE)
mostVariedGenes <- sortedVar[1:10000]
dds_CLL <- dds_CLL[names(mostVariedGenes), ]

Run DESeq2.

cb <- combn(unique(colData(dds_CLL)$group), 2)

gg <- list(); ggM <- list(); ggU <- list()
DE <- function(ighv) {
 for (i in 1:ncol(cb)) {
   dds_sel <- dds_CLL[,which(colData(dds_CLL)$IGHV %in% ighv)]
   dds_sel <- dds_sel[,which(colData(dds_sel)$group %in% cb[,i])]
   design(dds_sel) = ~group
   dds_sel$group <- droplevels(dds_sel$group)
   dds_sel$group <- relevel(dds_sel$group, ref=as.character(cb[2,i]) )
   dds_sel <- DESeq(dds_sel)
   res <- results(dds_sel)
   gg[[i]] <- res[order(res$padj, decreasing = FALSE), ]
   names(gg)[i] <- paste0(cb[1,i], "_", cb[2,i])
 }
return(gg)
}

ggM <- DE(ighv="M")
ggU <- DE(ighv="U")
gg <- DE(ighv=c("M", "U"))

The above code is not executed due to long running time. We load the output from the presaved object instead.

load(system.file("extdata","gexGroups.RData", package="BloodCancerMultiOmics2017"))

We use biomaRt package to map ensembl gene ids to hgnc gene symbols. The maping requires Internet connection and to omit this obstacle we load the presaved output. For completness, we provide the code used to generate the mapping.

library("biomaRt")
# extract all ensembl ids
allGenes = unique(unlist(lapply(gg, function(x) rownames(x))))
# get gene ids for ensembl ids
genSymbols = getBM(filters="ensembl_gene_id",
                   attributes=c("ensembl_gene_id", "hgnc_symbol"),
                   values=allGenes, mart=mart)
# select first id if more than one is present
genSymbols = genSymbols[!duplicated(genSymbols[,"ensembl_gene_id"]),]
# set rownames to ens id
rownames(genSymbols) = genSymbols[,"ensembl_gene_id"]

load(system.file("extdata","genSymbols.RData", package="BloodCancerMultiOmics2017"))

Correlation of IL-10 mRNA expression and response to everolimus within the mTOR subgroup.

#FIG# S14
gen="ENSG00000136634" #IL10
drug   <- "D_063_4:5" 

patsel <- intersect( rownames(sel)[sel$group %in% c("mTOR")], colnames(vsd) )

c <- cor.test( Biobase::exprs(lpdCLL)[drug, patsel], vsd[gen, patsel] )

# get hgnc_symbol for gen
# mart = useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
# genSym = getBM(filters="ensembl_gene_id", attributes="hgnc_symbol",
# values=gen, mart=mart)
genSym = genSymbols[gen, "hgnc_symbol"]

plot(vsd[gen, patsel], Biobase::exprs(lpdCLL)[drug, patsel],
     xlab=paste0(genSym, " expression"),
     ylab="Viability (everolimus)", pch=19, ylim=c(0.70, 0.92), col="purple",
     main = paste0("mTOR-group", "\n cor = ", round(c$estimate, 3),
                   ", p = ", signif(c$p.value,2 )),
     cex=1.2)
abline(lm(Biobase::exprs(lpdCLL)[drug, patsel] ~ vsd[gen, patsel]))

Set colors.

c1 = giveColors(2, 0.4)
c2 = giveColors(4, 0.7)
c3 = giveColors(10, 0.6)

Function which extracts significant genes.

sigEx <- function(real) {
  
  ggsig = lapply(real, function(x) {
    x = data.frame(x)
    x = x[which(!(is.na(x$padj))),]
    x = x[x$padj<0.1,]
    x = x[order(x$padj, decreasing = TRUE),]
    x = data.frame(x[ ,c("padj","log2FoldChange")], ensg=rownames(x) )
    x$hgnc1 = genSymbols[rownames(x), "hgnc_symbol"]
    x$hgnc2 = ifelse(x$hgnc1=="" | x$hgnc1=="T" | is.na(x$hgnc1),
                     as.character(x$ensg), x$hgnc1)
    x[-(grep(x[,"hgnc2"], pattern="IG")),]
  })
  return(ggsig)
}

barplot1 <- function(real, tit) { 
  
  # process real diff genes
  sigExPlus = sigEx(real)
  ng <- lapply(sigExPlus, function(x){ cbind(
    up=nrow(x[x$log2FoldChange>0, ]),
    dn=nrow(x[x$log2FoldChange<0, ]) ) } ) 
  ng = melt(ng)

  p <- ggplot(ng, aes(reorder(L1, -value)), ylim(-500:500)) + 
    geom_bar(data = ng, aes(y =  value, fill=Var2), stat="identity",
             position=position_dodge() ) +
    scale_fill_brewer(palette="Paired", direction = -1,
                      labels = c("up", "down")) +
    ggtitle(label=tit) +
    
    geom_hline(yintercept = 0,colour = "grey90") + 
    theme(
           panel.grid.minor = element_blank(),
           
           axis.title.x = element_blank(),
           axis.text.x = element_text(size=14, angle = 60, hjust = 1),
           axis.ticks.x = element_blank(),
           
           axis.title.y  = element_blank(),
           axis.text.y = element_text(size=14, colour="black"),
           axis.line.y = element_line(colour = "black",
                                      size = 0.5, linetype = "solid"),
           
           legend.key = element_rect(fill = "white"),
           legend.background = element_rect(fill = "white"),
           legend.title = element_blank(),
           legend.text = element_text(size=14, colour="black"),
  
           panel.background = element_rect(fill = "white", color="white")
          ) 
  
  plot(p) 
}

#FIG# S11 C
barplot1(real=gg, tit="")

14.1 Cytokine / chemokine expression in mTOR group

Here we compare expression levels of cytokines/chemokines within the mTOR group.

Set helpful functions.

# beeswarm funtion
beefun <- function(df, sym) {
 par(bty="l", cex.axis=1.5)
 beeswarm(df$x ~ df$y, axes=FALSE, cex.lab=1.5, col="grey", ylab=sym, xlab="",
          pch = 16, pwcol=sel[colnames(vsd),"col"], cex=1.3)
 
 boxplot(df$x ~ df$y, add = T, col="#0000ff22", cex.lab=1.5) 
}

Bulid response groups.

sel = defineResponseGroups(lpdCLL)

## Using PatientID as id variables

sel[,1:3] = 1-sel[,1:3]
sel$IGHV = pData(lpdCLL)[rownames(sel), "IGHV Uppsala U/M"]

c1 = giveColors(2, 0.5)
c2 = giveColors(4, 0.85)
c3 = giveColors(10, 0.75)

# add colors
sel$col <- ifelse(sel$group=="mTOR", c3, "grey")
sel$col <- ifelse(sel$group=="BTK",  c1, sel$col)
sel$col <- ifelse(sel$group=="MEK",  c2, sel$col)

For each cytokine/chemokine we compare level of expression between the response groups.

cytokines <- c("CXCL2","TGFB1","CCL2","IL2","IL12B","IL4","IL6","IL10","CXCL8",
               "TNF")
cyEN =  sapply(cytokines, function(i) {
  genSymbols[which(genSymbols$hgnc_symbol==i)[1],"ensembl_gene_id"]
})

makeEmpty = function() {
  data.frame(matrix(ncol=3, nrow=length(cyEN),
                    dimnames=list(names(cyEN), c("BTK", "MEK", "mTOR"))) )
}
p = makeEmpty()
ef = makeEmpty()

for (i in 1:length(cyEN) ) {
 
 geneID <- cyEN[i]
 df <- data.frame(x=vsd[geneID,  ], y=sel[colnames(vsd) ,"group"])
 df$y <- as.factor(df$y)

 beefun(df, sym=names(geneID))
 
 df <- within(df, y <- relevel(y, ref = "none"))
 fit <- lm(x ~y, data=df)
 
 p[i,] <- summary(fit)$coefficients[ 2:4, "Pr(>|t|)"]
 
 abtk = mean(df[df$y=="BTK", "x"], na.rm=TRUE) - mean(df[df$y=="none", "x"],
                                                      na.rm=TRUE)
 amek = mean(df[df$y=="MEK", "x"], na.rm=TRUE) - mean(df[df$y=="none", "x"],
                                                      na.rm=TRUE)
 amtor= mean(df[df$y=="mTOR", "x"], na.rm=TRUE) - mean(df[df$y=="none", "x"],
                                                       na.rm=TRUE)

 ef[i,] <-  c(as.numeric(abtk), as.numeric(amek), as.numeric(amtor)) 
 
 mtext( paste(    "pBTK=", summary(fit)$coefficients[ 2, "Pr(>|t|)"], 
                "\npMEK=", summary(fit)$coefficients[ 3, "Pr(>|t|)"], 
               "\npMTOR=", summary(fit)$coefficients[ 4, "Pr(>|t|)"], 
              side=3 ))
}

As a next step, we summarize the above comparisons in one plot.

#FIG# S11 D
# log p-values
plog <- apply(p, 2, function(x){-log(x)} )
plog_m <- melt(as.matrix(plog))
ef_m <- melt(as.matrix(ef))

# introduce effect direction
plog_m$value <- ifelse(ef_m$value>0, plog_m$value, -plog_m$value)

rownames(plog_m) <- 1:nrow(plog_m)

# fdr
fdrmin = min( p.adjust(p$mTOR, "fdr") )

### plot ####
colnames(plog_m) <- c("cytokine", "group", "p")

lev = names(sort(tapply(plog_m$p, plog_m$cytokine, function(p) min(p))))

plog_m$cytokine <- factor(plog_m$cytokine, levels=lev)


 ggplot(data=plog_m, mapping=aes(x=cytokine, y=p, color=group)) + 
    scale_colour_manual(values = c(c1, c2, c3)) +
    geom_point( size=3 ) + 
       theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.9),
       panel.grid.minor = element_blank(),
       panel.grid.major = element_blank(),
       panel.background = element_blank(),
       axis.line.x=element_line(),
       axis.line.y=element_line(),
       legend.position="none"
       ) +
    scale_y_continuous(name="-log(p-value)", breaks=seq(-3,7.5,2),
                       limits=c(-3,7.5)) +
    xlab("") +
    geom_hline(yintercept = 0) +
    geom_hline(yintercept = -log(0.004588897), color="purple", linetype="dashed") +
    geom_hline(yintercept = (-log(0.05)), color="grey", linetype="dashed")

Within the mTOR group it is only IL-10 which have significantly increased expression. The other important cytokines/chemokines were not differentially expressed.

15 Response to cytokines in CLL

In order to find out whether the cytokines have pro-survival effect on patient cells the drug sreen was performed. 18 patient samples were exposed to 6 different cytokines. The viability of the treated cells were normalized by untreated controls.

Load the drug response dataset.

data("cytokineViab")

Plot the drug response curves.

cond <- c("IL-2", "IL-4", "IL-10", "IL-21", "LPS", "IgM")

for (i in cond){  
  plot = ggplot(
    filter(cytokineViab, Duplicate%in%c("1"), Stimulation==i, Timepoint=="48h"),
    aes(x=as.factor(Cytokine_Concentration2), y=Normalized_DMSO, colour=mtor,
        group=interaction(Patient))) +
    ylab("viability") + xlab("c(stimulation)") + ylim(c(0.8, 1.4)) +
    geom_line() + geom_point() + ggtitle(i) + theme_bw() + guides(color="none")
  
    assign(paste0("p",i), plot)
}

grid.arrange(`pIL-2`,`pIL-10`,`pIL-4`,`pIL-21`,pLPS, pIgM, nrow=2)

IL-10 had a pro-survival effect on the majority of samples, but not on those in the mTOR group. IL-4 and IL-21 had pro-survival effects on most samples, including the mTOR group.

options(stringsAsFactors=FALSE)

16 Gene set enrichment analysis on BTK, mTOR, MEK groups

Based on the classification of drug response phenotypes we divided CLL samples into distinct groups driven by the increased sensitivity towards BTK, mTOR and MEK inhibition. Here we perform gene set enrichment analysis to find the causes of distinctive drug response phenotypes in the gene expression data.

Load objects.

data(list=c("dds", "lpdAll"))

gmts = list(H=system.file("extdata","h.all.v5.1.symbols.gmt",
                          package="BloodCancerMultiOmics2017"),
            C6=system.file("extdata","c6.all.v5.1.symbols.gmt",
                           package="BloodCancerMultiOmics2017"))

Divide patients into response groups.

patGroup = defineResponseGroups(lpd=lpdAll)

## Using PatientID as id variables

16.1 Preprocessing RNAseq data

Subsetting RNAseq data to include the CLL patients for which the drug screen was performed.

lpdCLL <- lpdAll[fData(lpdAll)$type=="viab",
                 pData(lpdAll)$Diagnosis %in% c("CLL")]

ddsCLL <- dds[,colData(dds)$PatID %in% colnames(lpdCLL)]

Read in group and add annotations to the RNAseq data set.

ddsCLL <- ddsCLL[,colData(ddsCLL)$PatID %in% rownames(patGroup)]

#remove genes without gene symbol annotations
ddsCLL <- ddsCLL[!is.na(rowData(ddsCLL)$symbol),]
ddsCLL <- ddsCLL[rowData(ddsCLL)$symbol != "",]

#add drug sensitivity annotations to coldata
colData(ddsCLL)$group <- factor(patGroup[colData(ddsCLL)$PatID, "group"])

Remove rows that contain too few counts.

#only keep genes that have counts higher than 10 in any sample
keep <- apply(counts(ddsCLL), 1, function(x) any(x >= 10)) 
ddsCLL <- ddsCLL[keep,]
dim(ddsCLL)

## [1] 20930   122

Remove transcripts which do not show variance across samples.

ddsCLL <- estimateSizeFactors(ddsCLL)
sds <- rowSds(counts(ddsCLL, normalized = TRUE))
sh <- shorth(sds)
ddsCLL <- ddsCLL[sds >= sh,]

Variance stabilizing transformation

ddsCLL.norm <- varianceStabilizingTransformation(ddsCLL)

16.2 Differential gene expression

Perform differential gene expression using DESeq2.

DEres <- list()
design(ddsCLL) <- ~ group

rnaRaw <- DESeq(ddsCLL, betaPrior = FALSE)

## using pre-existing size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 524 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

#extract results for different comparisons
# responders versus weak-responders
DEres[["BTKnone"]] <- results(rnaRaw, contrast = c("group","BTK","none"))
DEres[["MEKnone"]] <- results(rnaRaw, contrast = c("group","MEK","none"))
DEres[["mTORnone"]] <- results(rnaRaw, contrast = c("group","mTOR","none"))

16.3 Gene set enrichment analysis

The gene set enrichment analysis will be performed by using the MSigDB gene set collections C6 and Hallmark (http://software.broadinstitute.org/gsea/msigdb/ ). For each collection we will show the top five enriched gene sets and respective differentially expressed genes. Gene set enrichment analysis will be performed on the ranked gene lists using the Parametric Analysis of Gene Set Enrichment (PAGE).

16.3.1 Functions for enrichment analysis and plots

Define cut-off.

pCut = 0.05

Function to run GSEA or PAGE in R.

runGSEA <- function(inputTab, gmtFile, GSAmethod="gsea", nPerm=1000){
    inGMT <- loadGSC(gmtFile,type="gmt")
     #re-rank by score
    rankTab <- inputTab[order(inputTab[,1],decreasing = TRUE),,drop=FALSE] 
    if (GSAmethod == "gsea"){
        #readin geneset database
        #GSEA analysis
        res <- runGSA(geneLevelStats = rankTab,
                      geneSetStat = GSAmethod,
                      adjMethod = "fdr", gsc=inGMT,
                      signifMethod = 'geneSampling', nPerm = nPerm)
        GSAsummaryTable(res)
    } else if (GSAmethod == "page"){
        res <- runGSA(geneLevelStats = rankTab,
                      geneSetStat = GSAmethod,
                      adjMethod = "fdr", gsc=inGMT,
                      signifMethod = 'nullDist')
        GSAsummaryTable(res)
    }
}

Function which run the GSE for each response group.

runGSE = function(gmt) {
  
  Res <- list()
  for (i in names(DEres)) {
    dataTab <- data.frame(DEres[[i]])
    dataTab$ID <- rownames(dataTab)
    #filter using pvalues
    dataTab <- filter(dataTab, pvalue <= pCut) %>%
                  arrange(pvalue) %>% 
                  mutate(Symbol = rowData(ddsCLL[ID,])$symbol)
    dataTab <- dataTab[!duplicated(dataTab$Symbol),]
    statTab <- data.frame(row.names = dataTab$Symbol, stat = dataTab$stat)
    resTab <- runGSEA(inputTab=statTab, gmtFile=gmt, GSAmethod="page")
    Res[[i]] <- arrange(resTab,desc(`Stat (dist.dir)`))
  }
  Res
}

Function to get the list of genes enriched in a set.

getGenes <- function(inputTab, gmtFile){
  geneList <- loadGSC(gmtFile,type="gmt")$gsc
  enrichedUp <- lapply(geneList, function(x) 
    intersect(rownames(inputTab[inputTab[,1] >0,,drop=FALSE]),x))
  enrichedDown <- lapply(geneList, function(x)
    intersect(rownames(inputTab[inputTab[,1] <0,,drop=FALSE]),x))
  return(list(up=enrichedUp, down=enrichedDown))
}

A function to plot the heat map of intersection of genes in different gene sets.

plotSetHeatmap <- 
  function(geneTab, enrichTab, topN, gmtFile, tittle="",
           asterixList = NULL, anno=FALSE) {

    if (nrow(enrichTab) < topN) topN <- nrow(enrichTab)
    enrichTab <- enrichTab[seq(1,topN),]

    geneList <- getGenes(geneTab,gmtFile)
    
    geneList$up <- geneList$up[enrichTab[,1]]
    geneList$down <- geneList$down[enrichTab[,1]]
    
    #form a table 
    allGenes <- unique(c(unlist(geneList$up),unlist(geneList$down)))
    allSets <- unique(c(names(geneList$up),names(geneList$down)))
    plotTable <- matrix(data=NA,ncol = length(allSets),
                        nrow = length(allGenes),
                        dimnames = list(allGenes,allSets))
    for (setName in names(geneList$up)) {
      plotTable[geneList$up[[setName]],setName] <- 1
    }
    for (setName in names(geneList$down)) {
      plotTable[geneList$down[[setName]],setName] <- -1
    }

    if(is.null(asterixList)) {
      #if no correlation table specified, order by the number of
      # significant gene
      geneOrder <- rev(
        rownames(plotTable[order(rowSums(plotTable, na.rm = TRUE),
                                 decreasing =FALSE),]))
    } else {
      #otherwise, order by the p value of correlation
      asterixList <- arrange(asterixList, p)
      geneOrder <- filter(
        asterixList, symbol %in% rownames(plotTable))$symbol
      geneOrder <- c(
        geneOrder, rownames(plotTable)[! rownames(plotTable) %in% geneOrder])
    }
    
    plotTable <- melt(plotTable)
    colnames(plotTable) <- c("gene","set","value")
    plotTable$gene <- as.character(plotTable$gene)
    
    if(!is.null(asterixList)) {
      #add + if gene is positivily correlated with sensitivity, else add "-"
      plotTable$ifSig <- asterixList[
        match(plotTable$gene, asterixList$symbol),]$coef
      plotTable <- mutate(plotTable, ifSig =
                            ifelse(is.na(ifSig) | is.na(value), "",
                                   ifelse(ifSig > 0, "-", "+")))
    }
    plotTable$value <- replace(plotTable$value,
                               plotTable$value %in% c(1), "Up")
    plotTable$value <- replace(plotTable$value,
                               plotTable$value %in%  c(-1), "Down")
    
    allSymbols <- plotTable$gene
    
    geneSymbol <- geneOrder
    
    if (anno) { #if add functional annotations in addition to gene names
      annoTab <- tibble(symbol = rowData(ddsCLL)$symbol, 
                        anno = sapply(rowData(ddsCLL)$description,
                                      function(x) unlist(strsplit(x,"[[]"))[1]))
      annoTab <- annoTab[!duplicated(annoTab$symbol),]
      annoTab$combine <- sprintf("%s (%s)",annoTab$symbol, annoTab$anno)
      plotTable$gene <- annoTab[match(plotTable$gene,annoTab$symbol),]$combine
      geneOrder <- annoTab[match(geneOrder,annoTab$symbol),]$combine
      geneOrder <- rev(geneOrder)
    }
    
    plotTable$gene <- factor(plotTable$gene, levels =geneOrder)
    plotTable$set <- factor(plotTable$set, levels = enrichTab[,1])

    g <- ggplot(plotTable, aes(x=set, y = gene)) +
      geom_tile(aes(fill=value), color = "black") +
      scale_fill_manual(values = c("Up"="red","Down"="blue")) +
      xlab("") + ylab("") + theme_classic() +
      theme(axis.text.x=element_text(size=7, angle = 60, hjust = 0),
            axis.text.y=element_text(size=7),
            axis.ticks = element_line(color="white"),
            axis.line = element_line(color="white"),
            legend.position = "none") +
      scale_x_discrete(position = "top") +
      scale_y_discrete(position = "right")
    
    if(!is.null(asterixList)) {
      g <- g + geom_text(aes(label = ifSig), vjust =0.40)
    }
    
    # construct the gtable
    wdths = c(0.05, 0.25*length(levels(plotTable$set)), 5)
    hghts = c(2.8, 0.1*length(levels(plotTable$gene)), 0.05)
    gt = gtable(widths=unit(wdths, "in"), heights=unit(hghts, "in"))
    ## make grobs
    ggr = ggplotGrob(g)
    ## fill in the gtable
    gt = gtable_add_grob(gt, gtable_filter(ggr, "panel"), 2, 2)
    gt = gtable_add_grob(gt, ggr$grobs[[5]], 1, 2) # top axis
    gt = gtable_add_grob(gt, ggr$grobs[[9]], 2, 3) # right axis
    
    return(list(list(plot=gt,
                     width=sum(wdths),
                     height=sum(hghts),
                     genes=geneSymbol)))
}

Prepare stats per gene for plotting.

statTab = setNames(lapply(c("mTORnone","BTKnone","MEKnone"), function(gr) {
  dataTab <- data.frame(DEres[[gr]])
  dataTab$ID <- rownames(dataTab)
  #filter using pvalues
  dataTab <- filter(dataTab, pvalue <= pCut) %>%
    arrange(pvalue) %>%
    mutate(Symbol = rowData(ddsCLL[ID,])$symbol) %>%
    filter(log2FoldChange > 0)
  dataTab <- dataTab[!duplicated(dataTab$Symbol),]
  data.frame(row.names = dataTab$Symbol, stat = dataTab$stat)
}), nm=c("mTORnone","BTKnone","MEKnone"))

16.3.2 Geneset enrichment based on Hallmark set (H)

Perform enrichment analysis using PAGE method.

hallmarkRes = runGSE(gmt=gmts[["H"]])

## Checking arguments...done!
## Calculating gene set statistics...done!
## Calculating gene set significance...done!
## Adjusting for multiple testing...done!
## Checking arguments...done!
## Calculating gene set statistics...done!
## Calculating gene set significance...done!
## Adjusting for multiple testing...done!
## Checking arguments...done!
## Calculating gene set statistics...done!
## Calculating gene set significance...done!
## Adjusting for multiple testing...done!

16.3.3 Geneset enrichment based on oncogenic signature set (C6)

Perform enrichment analysis using PAGE method.

c6Res = runGSE(gmt=gmts[["C6"]])

## Checking arguments...done!
## Calculating gene set statistics...done!
## Calculating gene set significance...done!
## Adjusting for multiple testing...done!
## Checking arguments...done!
## Calculating gene set statistics...done!
## Calculating gene set significance...done!
## Adjusting for multiple testing...done!
## Checking arguments...done!
## Calculating gene set statistics...done!
## Calculating gene set significance...done!
## Adjusting for multiple testing...done!

16.4 Everolimus response VS gene expression (within mTOR group)

To further investiage the association between expression and drug sensitivity group at gene level, correlation test was performed to identify genes whose expressions are correlated with the sensitivity to the mTOR inhibitor (everolimus) sensitivity within the mTOR group.

16.4.1 Correlation test

Prepare drug sensitivity vector and gene expression matrix

ddsCLL.mTOR <- ddsCLL.norm[,ddsCLL.norm$group %in% "mTOR"]
viabMTOR <- Biobase::exprs(lpdCLL["D_063_4:5", ddsCLL.mTOR$PatID])[1,]
stopifnot(all(ddsCLL.mTOR$PatID == colnames(viabMTOR)))

Filtering and applying variance stabilizing transformation on RNAseq data

#only keep genes that have counts higher than 10 in any sample
keep <- apply(assay(ddsCLL.mTOR), 1, function(x) any(x >= 10)) 
ddsCLL.mTOR <- ddsCLL.mTOR[keep,]
dim(ddsCLL.mTOR)

## [1] 9775   15

Association test using Pearson correlation

corResult <- tibble(ID = rownames(ddsCLL.mTOR), 
                    symbol = rowData(ddsCLL.mTOR)$symbol,
                    coef = NA,
                    p = NA)

for (i in seq(nrow(corResult))) {
  res <- cor.test(viabMTOR, assay(ddsCLL.mTOR[i,])[1,], method = "pearson")
  corResult[i,c(3,4)] <- c(res$estimate[[1]], res$p.value)
}

corResult <- arrange(corResult, p) %>% mutate(p.adj = p.adjust(p, method="BH"))

16.4.2 Enrichment heatmaps for mTOR group with overlapped genes indicated

The genes that are positively correlated with everolimus sensitivity are labeled as “+” in the heatmap and the negatively correlated genes are labeled as “-”.

Plot for C6 gene sets

pCut = 0.05
corResult.sig <- filter(corResult, p <= pCut)
c6Plot <- plotSetHeatmap(geneTab=statTab[["mTORnone"]],
                enrichTab=c6Res[["mTORnone"]],
                topN=5, gmtFile=gmts[["C6"]],
                #add asterix in front of the overlapped genes
                asterixList = corResult.sig, 
                anno=TRUE, i)

Plot for Hallmark gene sets

hallmarkPlot <- plotSetHeatmap(geneTab=statTab[["mTORnone"]],
                enrichTab=hallmarkRes[["mTORnone"]],
                topN=5, gmtFile=gmts[["H"]],
                asterixList = corResult.sig,
                anno=TRUE, i)

16.5 Ibrutinib response VS gene expression (within BTK group)

Correlation test was performed to identify genes whose expressions are correlated with the sensitivity to the BTK inhibitor (ibrutinib) sensitivity within the BTK group.

16.5.1 Correlation test

Prepare drug sensitivity vector and gene expression matrix

ddsCLL.BTK <- ddsCLL.norm[,ddsCLL.norm$group %in% "BTK"]
viabBTK <- Biobase::exprs(lpdCLL["D_002_4:5", ddsCLL.BTK$PatID])[1,]
stopifnot(all(ddsCLL.BTK$PatID == colnames(viabBTK)))

Filtering and applying variance stabilizing transformation on RNAseq data

#only keep genes that have counts higher than 10 in any sample
keep <- apply(assay(ddsCLL.BTK), 1, function(x) any(x >= 10)) 
ddsCLL.BTK <- ddsCLL.BTK[keep,]
dim(ddsCLL.BTK)

## [1] 10416    30

Association test using Pearson correlation

corResult <- tibble(ID = rownames(ddsCLL.BTK), 
                    symbol = rowData(ddsCLL.BTK)$symbol,
                    coef = NA,
                    p = NA)

for (i in seq(nrow(corResult))) {
  res <- cor.test(viabBTK, assay(ddsCLL.BTK[i,])[1,], method = "pearson")
  corResult[i,c(3,4)] <- c(res$estimate[[1]], res$p.value)
}

corResult <- arrange(corResult, p) %>% mutate(p.adj = p.adjust(p, method="BH"))

16.5.2 Enrichment heatmaps for BTK group with overlapped genes indicated

Plot for C6 gene sets

pCut = 0.05
corResult.sig <- filter(corResult, p <= pCut)
c6Plot <- plotSetHeatmap(geneTab=statTab[["BTKnone"]],
                enrichTab=c6Res[["BTKnone"]],
                topN=5, gmtFile=gmts[["C6"]],
                #add asterix in front of the overlapped genes
                asterixList = corResult.sig, 
                anno=TRUE, i)

Plot for Hallmark gene sets

hallmarkPlot <- plotSetHeatmap(geneTab=statTab[["BTKnone"]],
                enrichTab=hallmarkRes[["BTKnone"]],
                topN=5, gmtFile=gmts[["H"]],
                asterixList = corResult.sig,
                anno=TRUE, i)

16.5.3 Selumetinib response VS gene expression (within MEK group)

Correlation test was performed to identify genes whose expressions are correlated with the sensitivity to the MEK inhibitor (selumetinib) sensitivity within the MEK group.

16.5.4 Correlation test

Prepare drug sensitivity vector and gene expression matrix

ddsCLL.MEK <- ddsCLL.norm[,ddsCLL.norm$group %in% "MEK"]
viabMEK <- Biobase::exprs(lpdCLL["D_012_4:5", ddsCLL.MEK$PatID])[1,]
stopifnot(all(ddsCLL.MEK$PatID == colnames(viabMEK)))

Filtering and applying variance stabilizing transformation on RNAseq data

#only keep genes that have counts higher than 10 in any sample
keep <- apply(assay(ddsCLL.MEK), 1, function(x) any(x >= 10)) 
ddsCLL.MEK <- ddsCLL.MEK[keep,]
dim(ddsCLL.MEK)

## [1] 10174    18

Association test using Pearson correlation

corResult <- tibble(ID = rownames(ddsCLL.MEK), 
                    symbol = rowData(ddsCLL.MEK)$symbol,
                    coef = NA,
                    p = NA)

for (i in seq(nrow(corResult))) {
  res <- cor.test(viabMEK, assay(ddsCLL.MEK[i,])[1,], method = "pearson")
  corResult[i,c(3,4)] <- c(res$estimate[[1]], res$p.value)
}

corResult <- arrange(corResult, p) %>% mutate(p.adj = p.adjust(p, method="BH"))

Within MEK group, no gene expression was correlated with Selumetinib response

16.5.5 Enrichment heatmaps for MEK group

Plot for C6 gene sets

pCut = 0.05
corResult.sig <- filter(corResult, p <= pCut)
c6Plot <- plotSetHeatmap(geneTab=statTab[["MEKnone"]],
                enrichTab=c6Res[["MEKnone"]],
                topN=5, gmtFile=gmts[["C6"]],
                anno=TRUE, i)

Plot for Hallmark gene sets

hallmarkPlot <- plotSetHeatmap(geneTab=statTab[["MEKnone"]],
                enrichTab=hallmarkRes[["MEKnone"]],
                topN=5, gmtFile=gmts[["H"]],
                asterixList = corResult.sig,
                anno=TRUE, i)

options(stringsAsFactors=FALSE)

17 Single associations of drug response with gene mutation or type of disease (IGHV included)

We univariantly tested different features (explained in detail below) for their associations with the drug response using Student t-test (two-sided, with equal variance). Each concentration was tested separately. The minimal size of the compared groups was set to 3. p-values were adjusted for multiple testing by applying the Benjamini-Hochberg procedure. Adjusted p-values were then used for setting the significance threshold.

Loading the data.

data(list=c("drpar", "patmeta", "drugs", "mutCOM", "conctab"))

17.1 Function which test associations of interest

Below is a general function with which all the tests for single associations were performed.

testFactors = function(msrmnt, factors, test="student", batch=NA) {
  
  # cut out the data
  tmp = colnames(factors)
  factors = data.frame(factors[rownames(msrmnt),], check.names=FALSE)
  colnames(factors) = tmp
  for(cidx in 1:ncol(factors))
    factors[,cidx] = factor(factors[,cidx], levels=c(0,1))
  
  # calculate the group size
  groupSizes = do.call(rbind, lapply(factors, function(tf) {
    tmp = table(tf)
    data.frame(n.0=tmp["0"], n.1=tmp["1"])
  }))
  
  # remove the factors with less then 2 cases per group
  factors = factors[,names(which(apply(groupSizes, 1,
                                       function(i) all(i>2)))), drop=FALSE]
  
  # calculate the effect
  effect = do.call(rbind, lapply(colnames(factors), function(tf) {
    tmp = aggregate(msrmnt ~ fac, data=data.frame(fac=factors[,tf]), mean)
    rownames(tmp) = paste("mean", tmp$fac, sep=".")
    tmp = t(tmp[2:ncol(tmp)])
    data.frame(TestFac=tf,
               DrugID=rownames(tmp),
               FacDr=paste(tf, rownames(tmp), sep="."),
               n.0=groupSizes[tf,"n.0"], n.1=groupSizes[tf,"n.1"],
               tmp, WM=tmp[,"mean.0"]-tmp[,"mean.1"])
  }))
  
  # do the test
  T = if(test=="student") {
    do.call(rbind, lapply(colnames(factors), function(tf) {
      tmp = do.call(rbind, lapply(colnames(msrmnt), function(dr) {
        res = t.test(msrmnt[,dr] ~ factors[,tf], var.equal=TRUE)
        data.frame(DrugID=dr, TestFac=tf,
                   pval=res$p.value, t=res$statistic,
                   conf1=res$conf.int[1], conf2=res$conf.int[2])
      }))
      tmp
    }))
  } else if(test=="anova") {
    do.call(rbind, lapply(colnames(factors), function(tf) {
      tmp = do.call(rbind, lapply(colnames(msrmnt), function(dr) {
        # make sure that the order in batch is the same as in msrmnt
        stopifnot(identical(rownames(msrmnt), names(batch)))
        res = anova(lm(msrmnt[,dr] ~ factors[,tf]+batch))
        data.frame(DrugID=dr, TestFac=tf, pval=res$`Pr(>F)`[1],
                   f=res$`F value`[1], meanSq1=res$`Mean Sq`[1],
                   meanSq2=res$`Mean Sq`[2])
      }))
      tmp
    }))
  } else {
    NA
  }
  
  enhanceObject = function(obj) {
    # give nice drug names
    obj$Drug = giveDrugLabel(obj$DrugID, conctab, drugs)
    # combine the testfac and drug id
    obj$FacDr = paste(obj$TestFac, obj$DrugID, sep=".")
    # select just the drug name
    obj$DrugID2 = substring(obj$DrugID, 1, 5)
    obj
  }
  
  list(effect=effect, test=enhanceObject(T))
}

17.2 Associations of ex vivo drug responses with genomic features in CLL

17.2.1 Prepare objects for testing

## VIABILITIES
## list of matrices; one matrix per screen/day
## each matrix contains all CLL patients
measurements=list()

### Main Screen
patM = colnames(drpar)[which(patmeta[colnames(drpar),"Diagnosis"]=="CLL")]
measurements[["main"]] =
  do.call(cbind,
          lapply(list("viaraw.1","viaraw.2","viaraw.3","viaraw.4","viaraw.5"),
                 function(viac) {
  tmp = t(assayData(drpar)[[viac]][,patM])
  colnames(tmp) = paste(colnames(tmp), conctab[colnames(tmp),
            paste0("c",substring(viac,8,8))], sep="-")
  tmp
}))

pats = sort(unique(patM))

## TESTING FACTORS
testingFactors = list()
# ighv
ighv = setNames(patmeta[pats, "IGHV"], nm=pats)
# mutations
tmp = cbind(IGHV=ifelse(ighv=="U",1,0), assayData(mutCOM)$binary[pats,])
testingFactors[["mutation"]] = tmp[,-grep("Chromothripsis", colnames(tmp))]

# BATCHES
j = which(pData(drpar)[patM, "ExpDate"] < as.Date("2014-01-01"))
k = which(pData(drpar)[patM, "ExpDate"] < as.Date("2014-08-01") &
            pData(drpar)[patM, "ExpDate"] > as.Date("2014-01-01"))
l = which(pData(drpar)[patM, "ExpDate"] > as.Date("2014-08-01"))

measurements[["main"]] = measurements[["main"]][c(patM[j], patM[k], patM[l]),]
batchvec = factor(
  setNames(c(rep(1, length(j)), rep(2, length(k)), rep(3, length(l))),
           nm=c(patM[j], patM[k], patM[l])))

# LABELS FOR GROUPING
beelabs = t(sapply(colnames(testingFactors[["mutation"]]), function(fac) {
  if(fac=="IGHV")
    c(`0`="IGHV mut", `1`="IGHV unmut")
  else if(grepl("[[:upper:]]",fac)) # if all letters are uppercase
    c(`0`=paste(fac, "wt"),`1`=paste(fac, "mt"))
  else
    c(`0`="wt",`1`=fac)
}))

17.2.2 Assesment of importance of batch effect

We first used the approach explained in the introduction section to test for associations between drug viability assay results and genomic features, which comprised: somatic mutations (aggregated at the gene level), copy number aberrations and IGHV status.

allresultsT = testFactors(msrmnt=measurements[["main"]],
                          factors=testingFactors[["mutation"]],
                          test="student", batch=NA)

resultsT = allresultsT$test
resultsT$adj.pval = p.adjust(resultsT$pval, method="BH")

However, we ware aware that the main screen was performed in three groups of batches over a time period of 1.5 years; these comprise, respectively, the samples screened in 2013, in 2014 before August and in 2014 in August and September. Therefore, to control for confounding by the different batch groups we repeated the drug-feature association tests using batch group as a blocking factor and a two-way ANOVA test.

allresultsA = testFactors(msrmnt=measurements[["main"]],
                          factors=testingFactors[["mutation"]],
                          test="anova", batch=batchvec)

resultsA = allresultsA$test
resultsA$adj.pval = p.adjust(resultsA$pval, method="BH")

We then compared the p-values from both tests.

Only one drug, bortezomib, showed discrepant p-values, and exploration of its data suggested that it lost its activity during storage. The data for this drug and NSC 74859 were discarded from further analysis.

badrugs = c("D_008", "D_025")

measurements = lapply(measurements, function(drres) {
  drres[,-grep(paste(badrugs, collapse="|"), colnames(drres))]
})

For all remaining associations, testing with and without batch as a blocking factor yielded equivalent results. Therefore, all reported p-values for associations come from the t-tests without using blocking for batch effects.

17.2.3 Associations of drug response with mutations in CLL

We tested for associations between drug viability assay results and genomic features (43 features for the pilot screen and 63 for the main screen). p-values were adjusted for multiple testing by applying the Benjamini-Hochberg procedure, separately for the main screen and for each of the two incubation times of the pilot screen.

allresults1 = lapply(measurements, function(measurement) {
  testFactors(msrmnt=measurement, factors=testingFactors[["mutation"]],
              test="student", batch=NA)
})

effects1 = lapply(allresults1, function(res) res[["effect"]])
results1 = lapply(allresults1, function(res) res[["test"]])

results1 = lapply(results1, function(res) {
  res$adj.pval = p.adjust(res$pval, method="BH")
  res
})

measurements1 = measurements
testingFactors1 = testingFactors
beelabs1 = beelabs

17.2.4 Volcano plots: summary of the results

In this section we summarize all significant associations for a given mutation in a form of volcano plots. The pink color spectrum indicates a resistant phenotype and the blue color spectrum a sensitive phenotype in the presence of the tested mutation. FDR of 10 % was used.

IGHV.

Trisomy 12.

17.3 Associations of drug responses with genomic features in CLL independently of IGHV status

To assess associations between drug effects and genomic features independently of IGHV status, we performed the analyses separately within U-CLL and M-CLL samples. These analyses were only performed if 3 or more samples carried the analyzed feature within both M-CLL and U-CLL subgroups.

Find out which factors we will be testing (with threshold >2 patients in each of the four groups).

fac2test = lapply(measurements, function(mea) {
  tf = testingFactors[["mutation"]][rownames(mea),]
  names(which(apply(tf,2,function(i) {
    if(length(table(i,tf[,1]))!=4)
      FALSE
    else
      all(table(i,tf[,1])>2)
  })))
})

Construct the table with drug responses.

measurements2 = setNames(lapply(names(measurements), function(mea) {
  ig = testingFactors[["mutation"]][rownames(measurements[[mea]]),"IGHV"]
  patU = names(which(ig==1))
  patM = names(which(ig==0))
  list(U=measurements[[mea]][patU,], M=measurements[[mea]][patM,])
}), nm=names(measurements))

Testing.

allresults2 = setNames(lapply(names(measurements2), function(mea) {
  list(U = testFactors(msrmnt=measurements2[[mea]]$U,
                       factors=testingFactors[["mutation"]][
                         rownames(measurements2[[mea]]$U),fac2test[[mea]]]),
  M = testFactors(msrmnt=measurements2[[mea]]$M,
                  factors=testingFactors[["mutation"]][
                    rownames(measurements2[[mea]]$M),fac2test[[mea]]]))
}), nm=names(measurements2))

Divide results to list of effects and list of results.

results2 = lapply(allresults2, function(allres) {
  list(U=allres[["U"]][["test"]], M=allres[["M"]][["test"]])
})

effects2 = lapply(allresults2, function(allres) {
  list(U=allres[["U"]][["effect"]], M=allres[["M"]][["effect"]])
})

p-values were adjusted for multiple testing by applying the Benjamini-Hochberg procedure to joined results for M-CLL and U-CLL for each screen separately.

results2 = lapply(results2, function(res) {
  tmp = p.adjust(c(res$U$pval,res$M$pval), method="BH")
  l = length(tmp)
  res$U$adj.pval = tmp[1:(l/2)]
  res$M$adj.pval = tmp[(l/2+1):l]
  res
})

testingFactors2 = testingFactors
beelabs2 = beelabs

As an example we show the summary of the results for trisomy 12.

Trisomy 12 - IGHV unmutated.

Trisomy 12 - IGHV mutated.

17.4 Drug response dependance on cell origin of disease

We tested for drug sensitivity differences between different disease entities. The largest group, the CLL samples, was used as the baseline for these comparisons. Here, we compared drug sensitivities across studied diseases entities against all CLL samples Only groups with 3 or more data points were considered (T-PLL, AML, MZL, MCL, B-PLL, HCL, LPL and healthy donor cells hMNC). p-values were adjusted for multiple testing by applying the Benjamini-Hochberg procedure to results for each disease entity separately.

Here we prepare the data for testing the drug response dependence on cell origin of disease.

## VIABILITIES
### main
pats = colnames(drpar)
# make the big matrix with viabilities
measureTMP = do.call(cbind,
                     lapply(list("viaraw.1","viaraw.2","viaraw.3",
                                 "viaraw.4","viaraw.5"), function(viac) {
  tmp = t(assayData(drpar)[[viac]][,pats])
  colnames(tmp) = paste(colnames(tmp),
                        conctab[colnames(tmp),
                                paste0("c",substring(viac,8,8))], sep="-")
  tmp
}))

# select diagnosis to work on
pats4diag = tapply(pats, patmeta[pats,"Diagnosis"], function(i) i)

diags = names(which(table(patmeta[pats,"Diagnosis"])>2))
diags = diags[-which(diags=="CLL")]
# there will be two lists: one with CLL and the second with other diagnosis
# (first one is passed as argument to the createObjects function)
pats4diag2 = pats4diag[diags]

# function that creates testingFactors, measurements and beelabs
createObjects = function(pats4diag1, beefix="") {
  
  measurements=list()
  testingFactors=list()
  # make the list for testing
  for(m in names(pats4diag1)) {
    for(n in names(pats4diag2)) {
      p1 = pats4diag1[[m]]
      p2 = pats4diag2[[n]]
      measurements[[paste(m,n,sep=".")]] = measureTMP[c(p1, p2),]
      testingFactors[[paste(m,n,sep=".")]] = setNames(c(rep(0,length(p1)),
                                                        rep(1,length(p2))),
                                                      nm=c(p1,p2))
    }
  }
  
  # reformat testingFactors to the df
  pats=sort(unique(c(unlist(pats4diag1),unlist(pats4diag2))))
  testingFactors = as.data.frame(
    do.call(cbind, lapply(testingFactors, function(tf) {
    setNames(tf[pats], nm=pats)
  })))
  
  # Labels for beeswarms
  beelabs = t(sapply(colnames(testingFactors), function(fac) {
    tmp = unlist(strsplit(fac, ".", fixed=TRUE))
    c(`0`=paste0(tmp[1], beefix),`1`=tmp[2])
  }))
  
  return(list(msrmts=measurements, tf=testingFactors, bl=beelabs))
}

# all CLL together
res = createObjects(pats4diag1=pats4diag["CLL"])
measurements3 = res$msrmts
testingFactors3 = res$tf
beelabs3 = res$bl

Testing.

allresults3 = setNames(lapply(names(measurements3), function(mea) {
  tmp = data.frame(testingFactors3[,mea])
  colnames(tmp) = mea
  rownames(tmp) = rownames(testingFactors3)
  testFactors(msrmnt=measurements3[[mea]], factors=tmp)
}), nm=names(measurements3))

Divide results to list of effects and list of t-test results.

results3 = lapply(allresults3, function(res) res[["test"]])
effects3 = lapply(allresults3, function(res) res[["effect"]])

Adjust p-values.

results3 = lapply(results3, function(res) {
  res$adj.pval = p.adjust(res$pval, method="BH")
  res
})

We summarize the result as a heat map.

18 Effect of mutation on drug response - examples

data(drugs, lpdAll, mutCOM, conctab)

lpdCLL = lpdAll[ , lpdAll$Diagnosis %in% "CLL"]

Here we highlight the selection of mutation-drug response associations within the different disease subtypes.

#FIG# 4D
BloodCancerMultiOmics2017:::beeF(diag="CLL", "D_010_2", "TP53", cs=T, y1=0, y2=1.2, custc=T)

BloodCancerMultiOmics2017:::beeF(diag="CLL", "D_006_3", "TP53", cs=T,y1=0, y2=1.2, custc=T)

BloodCancerMultiOmics2017:::beeF(diag="CLL", "D_063_5", "CREBBP", cs=T, y1=0, y2=1.2, custc=T)

BloodCancerMultiOmics2017:::beeF(diag="CLL", "D_056_5", "PRPF8", cs=T, y1=0, y2=1.2, custc=T)

BloodCancerMultiOmics2017:::beeF(diag="CLL", "D_012_5", "trisomy12", cs=F,y1=0.6, y2=1.2, custc=T)

#FIG# S17
par(mfrow = c(3,4), mar=c(5,4.5,5,2))

BloodCancerMultiOmics2017:::beeF(diag="CLL", drug="D_159_3", mut="TP53", cs=T, y1=0, y2=1.2, custc=T)
BloodCancerMultiOmics2017:::beeF(diag="CLL", "D_006_2", "del17p13", cs=T, y1=0, y2=1.2, custc=T)
BloodCancerMultiOmics2017:::beeF(diag="CLL", "D_159_3", "del17p13", cs=T, y1=0, y2=1.2, custc=T)
BloodCancerMultiOmics2017:::beeF(diag="CLL", "D_010_2", "del17p13", cs=T, y1=0, y2=1.2, custc=T)
BloodCancerMultiOmics2017:::beeF(diag="MCL", "D_006_2", "TP53", cs=T,  y1=0, y2=1.2, custc=T)
BloodCancerMultiOmics2017:::beeF(diag="MCL", "D_010_2", "TP53", cs=T,  y1=0, y2=1.2, custc=T)
BloodCancerMultiOmics2017:::beeF(diag=c("HCL", "HCL-V"), "D_012_3", "BRAF", cs=T, y1=0, y2=1.2,
            custc=F)
BloodCancerMultiOmics2017:::beeF(diag=c("HCL", "HCL-V"), "D_083_4", "BRAF", cs=T, y1=0, y2=1.2,
            custc=F)
BloodCancerMultiOmics2017:::beeF(diag="CLL", "D_012_5", "KRAS", cs=T, y1=0.6, y2=1.2, custc=T)
BloodCancerMultiOmics2017:::beeF(diag="CLL", "D_083_5", "KRAS", cs=T, y1=0.6, y2=1.45, custc=T)
BloodCancerMultiOmics2017:::beeF(diag="CLL", "D_081_4", "UMODL1", cs=T,  y1=0, y2=1.2, custc=T)
BloodCancerMultiOmics2017:::beeF(diag="CLL", "D_001_4", "UMODL1", cs=T,  y1=0, y2=1.2, custc=T)

Bee swarms for pretreatment.

#FIG# S18
par(mfrow = c(2,3), mar=c(5,4.5,2,2)) 

BloodCancerMultiOmics2017:::beePretreatment(lpdCLL, "D_006_1:5", y1=0.2, y2=1.3, fac="TP53",
                       val=c(0,1), name="Fludarabine")
BloodCancerMultiOmics2017:::beePretreatment(lpdCLL, "D_006_1:5", y1=0.2, y2=1.3, fac="TP53",
                       val=c(0),   name="p53 wt:  Fludarabine")
BloodCancerMultiOmics2017:::beePretreatment(lpdCLL, "D_006_1:5", y1=0.2, y2=1.3, fac="TP53",
                       val=c(1),   name="p53 mut: Fludarabine")

BloodCancerMultiOmics2017:::beePretreatment(lpdCLL, "D_002_4:5", y1=0.4, y2=1.3,
                       fac="IGHV Uppsala U/M", val=c(0,1), name="Ibrutinib")
BloodCancerMultiOmics2017:::beePretreatment(lpdCLL, "D_002_4:5", y1=0.4, y2=1.3,
                       fac="IGHV Uppsala U/M", val=c(0), name="U-CLL: Ibrutinib")
BloodCancerMultiOmics2017:::beePretreatment(lpdCLL, "D_002_4:5", y1=0.4, y2=1.3,
                       fac="IGHV Uppsala U/M", val=c(1), name="M-CLL: Ibrutinib")

options(stringsAsFactors=FALSE)

19 Associations of drug responses with mutations in CLL (IGHV not included)

In this part, we use both gene mutations and chromosome aberrations to test for gene-drug response associations. In contrast to the analysis done previously, we exclude IGHV status from testing. Additionally, we use information on patient treatment status to account for its effect on drug response screening.

19.1 Additional functions

Accessor functions:

# get drug responsee data
get.drugresp <- function(lpd) {
  drugresp = t(Biobase::exprs(lpd[fData(lpd)$type == 'viab'])) %>%
    dplyr::tbl_df() %>% dplyr::select(-ends_with(":5")) %>%
    dplyr::mutate(ID = colnames(lpd)) %>%
    tidyr::gather(drugconc, viab, -ID) %>%
    dplyr::mutate(drug = drugs[substring(drugconc, 1, 5), "name"],
           conc = sub("^D_([0-9]+_)", "", drugconc)) %>%
    dplyr::mutate(conc = as.integer(gsub("D_CHK_", "", conc)))
  
  drugresp
}

# extract mutations and IGHV status
get.somatic <- function(lpd) {
  somatic = t(Biobase::exprs(lpd[Biobase::fData(lpd)$type == 'gen' | 
                                   Biobase::fData(lpd)$type == 'IGHV']))
  ## rename IGHV Uppsala to 'IGHV' (simply)
  colnames(somatic)[grep("IGHV", colnames(somatic))] = "IGHV"
  
  ## at least 3 patients should have this mutation
  min.samples = which(Matrix::colSums(somatic, na.rm = T) > 2)
  somatic = dplyr::tbl_df(somatic[, min.samples]) %>%
    dplyr::select(-one_of("del13q14_bi", "del13q14_mono", 
                          "Chromothripsis", "RP11-766F14.2")) %>%
    dplyr::rename(del13q14 = del13q14_any) %>% 
    dplyr::mutate(ID = colnames(lpd)) %>%
    tidyr::gather(mutation, mut.value, -ID)
  somatic
}

Define the ggplot themes

t1<-theme(                              
  plot.background = element_blank(), 
  panel.grid.major = element_line(),
  panel.grid.major.x = element_line(linetype = "dotted", colour = "grey"),
  panel.grid.minor = element_blank(), 
  panel.border = element_blank(), 
  panel.background = element_blank(),
  axis.line = element_line(size=.4),
  axis.line.x = element_line(),
  axis.line.y = element_line(),
  axis.text.x  = element_text(angle=90, size=12, 
                              face="bold", hjust = 1, vjust = 0.4),
  axis.text.y = element_text(size = 14),
  axis.ticks.x = element_line(linetype = "dotted"),
  axis.ticks.length = unit(0.3,"cm"),
  axis.title.x = element_text(face="bold", size=16), 
  axis.title.y = element_text(face="bold", size=20),
  plot.title = element_text(face="bold", size=16, hjust = 0.5)
)

## theme for the legend
t.leg <-  theme(legend.title = element_text(face='bold', 
                                            hjust = 1, size=11),
                legend.position = c(0, 0.76),
                legend.key = element_blank(),
                legend.text = element_text(size=12),
                legend.background = element_rect(color = "black"))

Define the main color palette:

colors= c("#015872","#3A9C94","#99977D","#ffbf00","#5991C7","#99cc00",
          "#D5A370","#801416","#B2221C","#ff5050","#33bbff","#5c5cd6",
          "#E394BB","#0066ff","#C0C0C0")

Get pretreatment status:

get.pretreat <- function(patmeta, lpd) {
  patmeta = patmeta[rownames(patmeta) %in% colnames(lpd),]
  data.frame(ID=rownames(patmeta), pretreat=!patmeta$IC50beforeTreatment) %>% 
    mutate(pretreat = as.factor(pretreat))
  
}

Merge drug response, pretreatment information and somatic mutation data sets

make.dr <- function(resp, features, patmeta, lpd) {
  treat = get.pretreat(patmeta, lpd)
  dr = full_join(resp, features) %>% 
    inner_join(treat) 
}

Summarize viabilities using Tukey’s medpolish

get.medp <- function(drugresp) {
  tab = drugresp %>% group_by(drug, conc) %>% 
    do(data.frame(v = .$viab, ID = .$ID)) %>% spread(ID, v)
  
  med.p = foreach(n=unique(tab$drug), .combine = cbind) %dopar% {
    tb = filter(tab, drug == n) %>% ungroup() %>% dplyr::select(-(drug:conc)) %>% 
      as.matrix %>% `rownames<-`(1:5)
    mdp = stats::medpolish(tb)
    df = as.data.frame(mdp$col) + mdp$overall
    colnames(df) <- n
    df
  }
  
  medp.viab = dplyr::tbl_df(med.p) %>% dplyr::mutate(ID = rownames(med.p)) %>%
    tidyr::gather(drug, viab, -ID) 
  medp.viab
}

Process labels for the legend:

get.labels <- function(pvals) {
  lev = levels(factor(pvals$mutation))
  lev = gsub("^(gain)([0-9]+)([a-z][0-9]+)$", "\\1(\\2)(\\3)", lev)
  lev =  gsub("^(del)([0-9]+)([a-z].+)$", "\\1(\\2)(\\3)", lev)
  lev = gsub("trisomy12", "trisomy 12", lev)
  lev
}

Get order of mutations

get.mutation.order <- function(lev) {
  ord = c("trisomy 12", "TP53",
          "del(11)(q22.3)", "del(13)(q14)",
          "del(17)(p13)",
          "gain(8)(q24)",
          "BRAF", "CREBBP", "PRPF8",
          "KLHL6", "NRAS", "ABI3BP", "UMODL1")
  mut.order = c(match(ord, lev),
                grep("Other", lev), grep("Below", lev))
  
  mut.order
}

Group drugs by pathway/target

get.drug.order <- function(pvals, drugs) {
  ## determine drug order by column sums of log-p values
  dr.order = pvals %>% 
    mutate(logp = -log10(p.value)) %>% 
    group_by(drug) %>% summarise(logsum = sum(logp)) 
  
  dr.order = inner_join(dr.order, pvals %>%
                          group_by(drug) %>% 
                          summarise(n = length(unique(mutation)))) %>% 
    arrange(desc(n), desc(logsum))
  
  dr.order = inner_join(dr.order, drugs %>% rename(drug = name))
  
  dr.order = left_join(dr.order, dr.order %>% 
                         group_by(`target_category`) ) %>%
    arrange(`target_category`, drug) %>%
    filter(! `target_category` %in% c("ALK", "Angiogenesis", "Other")) %>%
    filter(!is.na(`target_category`))
  
  dr.order
}

Add pathway annotations for selected drug classes

make.annot <- function(g, dr.order) {
  # make a color palette for drug pathways
  drug.class = c("#273649", "#647184", "#B1B2C8",
                 "#A7755D", "#5D2E1C", "#38201C")
  pathways = c("BH3","B-cell receptor","DNA damage",
               "MAPK", "PI3K", "Reactive oxygen species")
  names(pathways) = c("BH3", "BCR inhibitors", "DNA damage",
                      "MAPK", "PI3K", "ROS")
  
  for (i in 1:6) {
    prange = grep(pathways[i], dr.order$`target_category`)
    path.grob <- grobTree(rectGrob(gp=gpar(fill=drug.class[i])),
                          textGrob(names(pathways)[i], 
                                   gp = gpar(cex =0.8, col = "white")))
    g = g + 
      annotation_custom(path.grob, 
                        xmin = min(prange) -0.25 - 0.1 * ifelse(i == 2, 1, 0), 
                        xmax = max(prange) + 0.25 + 0.1 * ifelse(i == 2, 1, 0), 
                        ymin = -0.52, ymax = -0.2)
  }
  g
}

Define a function for glegend

g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  legend
} ## end define

19.2 Data setup

Load the data.

data(list=c("conctab", "drugs", "lpdAll", "patmeta"))

Get drug response data.

lpdCLL <- lpdAll[ , lpdAll$Diagnosis=="CLL"]
## extract viability data for 5 different concentrations
drugresp = get.drugresp(lpdCLL)

Get somatic mutations and structural variants.

## extract somatic variants
somatic = get.somatic(lpdCLL) %>% 
  mutate(mut.value = as.factor(mut.value))

19.3 Test for drug-gene associations

Summarize drug response using median polish.

## compute median polish patient effects and recalculate p-values
medp.viab = get.medp(drugresp)
dr = make.dr(medp.viab, somatic, patmeta, lpdCLL)

## Joining, by = "ID"
## Joining, by = "ID"

Calculate $p$ values and FDR (10%).

pvals = dr %>% group_by(drug, mutation) %>%
  do(tidy(t.test(viab ~ mut.value, data = ., var.equal = T))) %>%
  dplyr::select(drug, mutation, p.value)

# compute the FDR threshold
fd.thresh = 10
padj = p.adjust(pvals$p.value, method = "BH")
fdr = max(pvals$p.value[which(padj <= fd.thresh/100)])

Remove unnecessary mutations and bad drugs.

# selected mutations
select.mutations = c("trisomy12", "TP53",
        "del11q22.3", "del13q14",
        "del17p13",
        "gain8q24",
        "BRAF", "CREBBP", "PRPF8",
        "KLHL6", "NRAS", "ABI3BP", "UMODL1")

pvals = filter(pvals, mutation != 'IGHV')
pvals = pvals %>% ungroup() %>%
  mutate(mutation = ifelse(p.value > fdr, 
                           paste0("Below ", fd.thresh,"% FDR"), mutation)) %>%
  mutate(mutation = ifelse(!(mutation %in% select.mutations) & 
                             !(mutation == paste0("Below ", fd.thresh,"% FDR")), 
                           "Other", mutation)) %>%
  filter(drug != "bortezomib" & drug != "NSC 74859")

Reshape names of genomic rearrangements.

## order of mutations
lev = get.labels(pvals)
folge = get.mutation.order(lev)

Set order of drugs.

drugs = drugs[,c("name", "target_category")]
# get the drug order
dr.order = get.drug.order(pvals, drugs)

## Joining, by = "drug"
## Joining, by = "drug"

## Joining, by = c("drug", "logsum", "n", "target_category")

19.4 Plot results

19.4.1 Main Figure

Function for generating the figure.

plot.pvalues <- function(pvals, dr.order, folge, colors, shapes) {
  g = ggplot(data = filter(pvals, drug %in% dr.order$drug)) +
  geom_point(aes(x = factor(drug, levels = dr.order$drug), y = -log10(p.value), 
                 colour = factor(mutation, levels(factor(mutation))[folge]),
                 shape  = factor(mutation, levels(factor(mutation))[folge])),
             size=5, show.legend = T)  + 
  scale_color_manual(name = "Mutations",
                     values = colors,
                     labels = lev[folge]) + 
  scale_shape_manual(name = "Mutations",
                     values = shapes,
                     labels = lev[folge]) + t1 + 
  labs(x = "", y = expression(paste(-log[10], "p")), title = "") +
    scale_y_continuous(expression(italic(p)*"-value"),
                       breaks=seq(0,10,5),
                       labels=math_format(expr=10^.x)(-seq(0,10,5))) 
  g
}

#FIG# 4A
## plot the p-values 
g = plot.pvalues(pvals, dr.order, folge, 
                 colors, shapes = c(rep(16,13), c(1,1)))

## add FDR threshold
g = g + geom_hline(yintercept = -log10(fdr),
                   linetype="dashed", size=0.3)

g = g + 
  annotation_custom(grob = textGrob(label = paste0("FDR", fd.thresh, "%"), 
                                    hjust = 1, vjust = 1, 
                                    gp = gpar(cex = 0.5,
                                              fontface = "bold",
                                              fontsize = 25)),
                    ymin = -log10(fdr) - 0.2, 
                    ymax = -log10(fdr) + 0.5, 
                    xmin = -1.3, xmax = 1.5) + 
  theme(legend.position = "none")

# generate pathway/target annotations for certain drug classes
#g = make.annot(g, dr.order)

# legend guide
leg.guides <- guides(colour = guide_legend(ncol = 1, 
                                           byrow = TRUE,
                                           override.aes = list(size = 3),
                                           title = "Mutations",
                                           label.hjust = 0,
                                           keywidth = 0.4,
                                           keyheight = 0.8), 
                      shape = guide_legend(ncol = 1, 
                                           byrow = TRUE,
                                           title = "Mutations",
                                           label.hjust = 0,
                                           keywidth = 0.4,
                                           keyheight = 0.8))

# create a legend grob
legend = g_legend(g + t.leg +  leg.guides)

## arranget the main plot and the legend
# using grid graphics
gt <- ggplot_gtable(ggplot_build(g + theme(legend.position = 'none')))
gt$layout$clip[gt$layout$name == "panel"] <- "off"

grid.arrange(gt, legend,
             ncol=2, nrow=1, widths=c(0.92,0.08))

19.4.2 Supplementary Figure (incl. pretreatment)

In the supplementary figure we use pretreatment status as a blocking factor, i.e. we model drug sensitivity - gene variant associations as: lm(viability ~ mutation + pretreatment)

## lm(viab ~ mutation + pretreatment.status)
pvals = dr %>% group_by(drug, mutation) %>%
  do(tidy(lm(viab ~ mut.value + pretreat, data = .))) %>%
  filter(term == 'mut.value1') %>%
  dplyr::select(drug, mutation, p.value)

# compute the FDR threshold
fd.thresh = 10
padj = p.adjust(pvals$p.value, method = "BH")
fdr = max(pvals$p.value[which(padj <= fd.thresh/100)])


pvals = filter(pvals, mutation != 'IGHV')
pvals = pvals %>% ungroup() %>%
  mutate(mutation = ifelse(p.value > fdr,
                           paste0("Below ", fd.thresh,"% FDR"),
                           mutation)) %>%
  mutate(mutation = ifelse(!(mutation %in% select.mutations) &
                             !(mutation == paste0("Below ",
                                                  fd.thresh,"% FDR")), 
                           "Other", mutation)) %>%
  filter(drug != "bortezomib" & drug != "NSC 74859")


lev = get.labels(pvals)
folge = get.mutation.order(lev)

# get the drug order
dr.order = get.drug.order(pvals, drugs)

## Joining, by = "drug"
## Joining, by = "drug"

## Joining, by = c("drug", "logsum", "n", "target_category")

mut.order = folge[!is.na(folge)]

After recomputing the $p$-values (using a linear model that accounts for pretreatment status), plot the figure:

#FIG# S19
## plot the p-values 
g = plot.pvalues(pvals, dr.order, mut.order, 
                 colors[which(!is.na(folge))], shapes = c(rep(16,9), c(1,1)))

## add FDR threshold
g = g + geom_hline(yintercept = -log10(fdr),
                   linetype="dashed", size=0.3)

g = g + 
  annotation_custom(grob = textGrob(label = paste0("FDR", fd.thresh, "%"), 
                                    hjust = 1, vjust = 1, 
                                    gp = gpar(cex = 0.5,
                                              fontface = "bold",
                                              fontsize = 25)),
                    ymin = -log10(fdr) - 0.2, 
                    ymax = -log10(fdr) + 0.5, 
                    xmin = -1.3, xmax = 1.5) + 
  theme(legend.position = "none")

# generate pathway/target annotations for certain drug classes
#g = make.annot(g, dr.order)

# legend guide
leg.guides <- guides(colour = guide_legend(ncol = 1, 
                                           byrow = TRUE,
                                           override.aes = list(size = 3),
                                           title = "Mutations",
                                           label.hjust = 0,
                                           keywidth = 0.4,
                                           keyheight = 0.8), 
                      shape = guide_legend(ncol = 1, 
                                           byrow = TRUE,
                                           title = "Mutations",
                                           label.hjust = 0,
                                           keywidth = 0.4,
                                           keyheight = 0.8))

# create a legend grob
legend = g_legend(g + t.leg +  leg.guides)

## arranget the main plot and the legend
# using grid graphics
gt <- ggplot_gtable(ggplot_build(g + theme(legend.position = 'none')))
gt$layout$clip[gt$layout$name == "panel"] <- "off"

grid.arrange(gt, legend,
             ncol=2, nrow=1, widths=c(0.92,0.08))

19.5 Comparison of $P$-Values

pvals.main = dr %>% group_by(drug, mutation) %>%
  do(tidy(t.test(viab ~ mut.value, data = ., var.equal = T))) %>%
  dplyr::select(drug, mutation, p.value)

p.main.adj = p.adjust(pvals.main$p.value, method = "BH")
fdr.main = max(pvals.main$p.value[which(p.main.adj <= fd.thresh/100)])

pvals.main = filter(pvals.main, mutation != "IGHV") %>%
             rename(p.main = p.value)


## lm(viab ~ mutation + pretreatment.status)
pvals.sup = dr %>% group_by(drug, mutation) %>%
  do(tidy(lm(viab ~ mut.value + pretreat, data = .))) %>%
  filter(term == 'mut.value1') %>%
  dplyr::select(drug, mutation, p.value)

p.sup.adj = p.adjust(pvals.sup$p.value, method = "BH")
fdr.sup = max(pvals.sup$p.value[which(p.sup.adj <= fd.thresh/100)])

pvals.sup = filter(pvals.sup, mutation != "IGHV") %>%
  rename(p.sup = p.value)


pvals = inner_join(pvals.main, pvals.sup)

## Joining, by = c("drug", "mutation")

pvals = mutate(pvals, signif = ifelse(p.main > fdr.main, 
                                      ifelse(p.sup > fdr.sup, 
                                             "Below 10% FDR in both models", 
                                             "Significant with pretreatment accounted"), 
                                      ifelse(p.sup > fdr.sup, 
                                             "Significant without pretreatment in the model", 
                                             "Significant in both models")))

t2<-theme(                              
  plot.background = element_blank(), 
  panel.grid.major = element_line(),
  panel.grid.major.x = element_line(),
  panel.grid.minor = element_blank(), 
  panel.border = element_blank(), 
  panel.background = element_blank(),
  axis.line = element_line(size=.4),
  axis.line.x = element_line(),
  axis.line.y = element_line(),
  axis.text.x  = element_text(size=12),
  axis.text.y = element_text(size = 12),
  axis.title.x = element_text(face="bold", size=12), 
  axis.title.y = element_text(face="bold", size=12),
  legend.title = element_text(face='bold', 
                                            hjust = 1, size=10),
  legend.position = c(0.78, 0.11),
  legend.key = element_blank(),
  legend.text = element_text(size=10),
  legend.background = element_rect(color = "black")
)

#FIG# S19
ggplot(pvals, aes(-log10(p.main), -log10(p.sup), colour = factor(signif))) + 
  geom_point() + t2 + labs(x = expression(paste(-log[10], "p, pretreatment not considered", sep = "")),
                                   y = expression(paste(-log[10], "p, accounting for pretreatment", sep = ""))) +
  coord_fixed() +
  scale_x_continuous(breaks = seq(0,9,by = 3)) + 
  scale_y_continuous(breaks = seq(0,9,by = 3)) + 
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") + 
  scale_color_manual(name = "Statistical Significance",
                     values = c("#F1BB7B","#669999", "#FD6467", "#5B1A18"))

What are the drug-mutation pairs that are significant only in one or another model (i.e. only without pretreatment or with pretreatment included)?

signif.in.one = filter(pvals, 
                       signif %in% c("Significant with pretreatment accounted",
                            "Significant without pretreatment in the model")) %>%
        arrange(signif)
kable(signif.in.one, digits = 4,
      align = c("l", "l", "c", "c", "c"),
      col.names = c("Drug", "Mutation", "P-value (Main)",
                    "P-value (Supplement)", "Statistical significance"),
      format.args = list(width = 14))

Drug	Mutation	P-value (Main)	P-value (Supplement)	Statistical significance
AT13387	LAMA1	0.0198	0.0014	Significant with pretreatment accounted
AZD7762	trisomy12	0.0065	0.0004	Significant with pretreatment accounted
MK-2206	CREBBP	0.0069	0.0021	Significant with pretreatment accounted
arsenic trioxide	del17p13	0.0150	0.0019	Significant with pretreatment accounted
spebrutinib	trisomy12	0.0096	0.0024	Significant with pretreatment accounted
AT13387	KLHL6	0.0025	0.0068	Significant without pretreatment in the model
AT13387	gain8q24	0.0009	0.0207	Significant without pretreatment in the model
AZD7762	del11q22.3	0.0005	0.0108	Significant without pretreatment in the model
BIX02188	KLHL6	0.0023	0.0035	Significant without pretreatment in the model
BX912	del11q22.3	0.0009	0.0069	Significant without pretreatment in the model
BX912	gain8q24	0.0024	0.0144	Significant without pretreatment in the model
CCT241533	del11q22.3	0.0007	0.0084	Significant without pretreatment in the model
KX2-391	del11q22.3	0.0005	0.0029	Significant without pretreatment in the model
MIS-43	trisomy12	0.0028	0.0058	Significant without pretreatment in the model
MK-1775	del11q22.3	0.0001	0.0032	Significant without pretreatment in the model
PF 477736	gain8q24	0.0012	0.0460	Significant without pretreatment in the model
SD07	KLHL6	0.0022	0.0040	Significant without pretreatment in the model
UCN-01	del11q22.3	0.0006	0.0030	Significant without pretreatment in the model
VE-821	del11q22.3	0.0013	0.0085	Significant without pretreatment in the model
doxorubicine	TP53	0.0017	0.0050	Significant without pretreatment in the model
doxorubicine	del17p13	0.0019	0.0047	Significant without pretreatment in the model
encorafenib	BRAF	0.0020	0.0069	Significant without pretreatment in the model
ibrutinib	del11q22.3	0.0019	0.0174	Significant without pretreatment in the model
navitoclax	UMODL1	0.0015	0.0029	Significant without pretreatment in the model
nutlin-3	del13q14	0.0016	0.0026	Significant without pretreatment in the model
spebrutinib	del11q22.3	0.0007	0.0049	Significant without pretreatment in the model
tamatinib	del11q22.3	0.0017	0.0110	Significant without pretreatment in the model
tipifarnib	PRPF8	0.0029	0.0028	Significant without pretreatment in the model
vorinostat	PRSS50	0.0022	0.0031	Significant without pretreatment in the model

Produce LaTeX output for the Supplement:

print(kable(signif.in.one, format = "latex", digits = 4,
       align = c("l", "l", "c", "c", "c"),
      col.names = c("Drug", "Mutation", "P-value (Main)",
                    "P-value (Supplement)", "Statistical significance")))

20 Association between HSP90 inhibitor response and IGHV status

We investigated additional HSP90 inhibitors (ganetespib, onalespib) in 120 patient samples from the original cohort (CLL), for whom IGHV status was available.

Load the additional drug response dataset.

data(list= c("validateExp","lpdAll"))

Preparing table for association test and plotting.

plotTab <- filter(validateExp, Drug %in% c("Ganetespib", "Onalespib")) %>%
  mutate(IGHV = Biobase::exprs(lpdAll)["IGHV Uppsala U/M", patientID]) %>%
  filter(!is.na(IGHV)) %>%
  mutate(IGHV = as.factor(ifelse(IGHV == 1, "M","U")),
         Concentration = as.factor(Concentration))

Association test using Student’s t-test.

pTab <- group_by(plotTab, Drug, Concentration) %>%
  do(data.frame(p = t.test(viab ~ IGHV, .)$p.value)) %>%
  mutate(p = format(p, digits =2, scientific = TRUE))

Bee swarm plot.

pList <- group_by(plotTab, Drug) %>% 
  do(plots = ggplot(., aes(x=Concentration, y = viab)) + 
       stat_boxplot(geom = "errorbar", width = 0.3,
                    position = position_dodge(width=0.6), 
                    aes(dodge = IGHV)) +
       geom_boxplot(outlier.shape = NA, position = position_dodge(width=0.6), 
                    col="black", width=0.5, aes(dodge = IGHV)) + 
       geom_beeswarm(size=1,dodge.width=0.6, aes(col=IGHV)) +
       theme_classic() +
       scale_y_continuous(expand = c(0, 0),breaks=seq(0,1.2,0.20)) +
       coord_cartesian(ylim = c(0,1.30)) +
       xlab("Concentration (µM)") + ylab("Viability") + 
       ggtitle(unique(.$Drug)) +
       geom_text(data=filter(pTab, Drug == unique(.$Drug)), y = 1.25, 
                 aes(x=Concentration, label=sprintf("p=%s",p)),
                 size = 4.5) + 
      theme(axis.line.x = element_blank(),
            axis.ticks.x = element_blank(),
             axis.text  = element_text(size=15),
            axis.title = element_text(size =15),
             legend.text = element_text(size=13),
            legend.title = element_text(size=15),
             plot.title = element_text(face="bold", hjust=0.5, size=17),
             plot.margin = unit(c(0.5,0.5,0.5,0.5), "cm"))) 
grid.arrange(grobs = pList$plots, ncol =2)

The HSP90 inhibitors had higher activity in U-CLL, consistent with the result for AT13387. These data suggest that the finding of BCR (IGHV mutation) specific effects appears to be a compound class effect and further solidifies the results.

21 Association between MEK/ERK inhibitor response and trisomy12

To further investigate the association of trisomy 12 and MEK dependence, we investigated additional MEK and ERK inhibitors (cobimetinib, SCH772984 and trametinib) in 119 patients from the original cohort, for whom trisomy 12 status was available.

Preparing table for association test and plotting.

plotTab <- filter(validateExp, Drug %in%
                    c("Cobimetinib","SCH772984","Trametinib")) %>%
  mutate(Trisomy12 = Biobase::exprs(lpdAll)["trisomy12", patientID]) %>%
  filter(!is.na(Trisomy12)) %>%
  mutate(Trisomy12 = as.factor(ifelse(Trisomy12 == 1, "present","absent")),
         Concentration = as.factor(Concentration))

Association test using Student’s t-test.

pTab <- group_by(plotTab, Drug, Concentration) %>% 
  do(data.frame(p = t.test(viab ~ Trisomy12, .)$p.value)) %>%
  mutate(p = format(p, digits =2, scientific = FALSE))

Bee swarm plot.

pList <- group_by(plotTab, Drug) %>% 
  do(plots = ggplot(., aes(x=Concentration, y = viab)) + 
       stat_boxplot(geom = "errorbar", width = 0.3,
                    position = position_dodge(width=0.6), 
                    aes(dodge = Trisomy12)) +
       geom_boxplot(outlier.shape = NA, position = position_dodge(width=0.6), 
                    col="black", width=0.5, aes(dodge = Trisomy12)) + 
       geom_beeswarm(size=1,dodge.width=0.6, aes(col=Trisomy12)) +
       theme_classic() +
       scale_y_continuous(expand = c(0, 0),breaks=seq(0,1.2,0.2)) +
       coord_cartesian(ylim = c(0,1.3)) +
       xlab("Concentration (µM)") + ylab("Viability") + 
       ggtitle(unique(.$Drug)) +
       geom_text(data=filter(pTab, Drug == unique(.$Drug)), y = 1.25, 
                 aes(x=Concentration, label=sprintf("p=%s",p)), size = 5) + 
       theme(axis.line.x = element_blank(),
             axis.ticks.x = element_blank(),
             axis.text  = element_text(size=15),
             axis.title = element_text(size =15),
             legend.text = element_text(size=13),
             legend.title = element_text(size=15),
             plot.title = element_text(face="bold", hjust=0.5, size=17),
             plot.margin = unit(c(0.5,0,0.5,0), "cm"))) 

grid.arrange(grobs = pList$plots, ncol =1)

Consistent with the data from the screen, samples with trisomy 12 showed higher sensitivity to MEK/ERK inhibitors.

22 Expression profiling analysis of trisomy 12

Load and prepare expression data set.

data(list=c("dds", "patmeta", "mutCOM"))

#load genesets
gmts = list(
  H=system.file("extdata","h.all.v5.1.symbols.gmt",
                package="BloodCancerMultiOmics2017"),
  C6=system.file("extdata","c6.all.v5.1.symbols.gmt",
                 package="BloodCancerMultiOmics2017"),
  KEGG=system.file("extdata","c2.cp.kegg.v5.1.symbols.gmt",
                   package="BloodCancerMultiOmics2017"))

Choose CLL samples with trisomy 12 annotation from the gene expression data set.

#only choose CLL samples
colData(dds)$Diagnosis <- patmeta[match(dds$PatID,rownames(patmeta)),]$Diagnosis
ddsCLL <- dds[,dds$Diagnosis %in% "CLL"]

#add trisomy 12 and IGHV information
colData(ddsCLL)$trisomy12 <-
  factor(assayData(mutCOM[ddsCLL$PatID,])$binary[,"trisomy12"])
colData(ddsCLL)$IGHV <- factor(patmeta[ddsCLL$PatID,]$IGHV)

#remove samples that do not have trisomy 12 information
ddsCLL <- ddsCLL[,!is.na(ddsCLL$trisomy12)]

#how many genes and samples we have?
dim(ddsCLL)

## [1] 63677   131

Remove transcripts that do not have gene symbol annotations, show low counts or do not show variance across samples.

#remove genes without gene symbol annotations
ddsCLL <- ddsCLL[!is.na(rowData(ddsCLL)$symbol),]
ddsCLL <- ddsCLL[rowData(ddsCLL)$symbol != "",]

#only keep genes that have counts higher than 10 in any sample
keep <- apply(counts(ddsCLL), 1, function(x) any(x >= 10)) 
ddsCLL <- ddsCLL[keep,]

#Remove transcripts do not show variance across samples
ddsCLL <- estimateSizeFactors(ddsCLL)
sds <- rowSds(counts(ddsCLL, normalized = TRUE))
sh <- shorth(sds)
ddsCLL <- ddsCLL[sds >= sh,]

#variance stabilization
ddsCLL.norm <- varianceStabilizingTransformation(ddsCLL, blind=TRUE)

#how many genes left
dim(ddsCLL)

## [1] 13816   131

22.1 Differential gene expression analysis using DESeq2

DESeq2 was used to identify genes that are differentially expressed between wild-type CLL samples and samples with trisomy 12.

Run DESeq2

design(ddsCLL) <- ~ trisomy12
ddsCLL <- DESeq(ddsCLL, betaPrior = FALSE)

## using pre-existing size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 688 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

DEres <- results(ddsCLL)
DEres.shr <- lfcShrink(ddsCLL, contrast = c("trisomy12","1","0"), res = DEres)

## using 'normal' for LFC shrinkage, the Normal prior from Love et al (2014).
## additional priors are available via the 'type' argument, see ?lfcShrink for details

Plot gene dosage effect.

#FIG# S23 A
plotTab <- as.data.frame(DEres)
plotTab$onChr12 <- rowData(ddsCLL)$chromosome == 12
dosePlot <- ggplot(plotTab) +
  geom_density(aes(x=log2FoldChange, col=onChr12, fill=onChr12), alpha=0.4) +
  xlim( -3, 3 )
dosePlot

The distributions of the logarithmic (base 2) fold change between samples with and without trisomy 12 are shown separately for the genes on chromosome 12 (green) and on other chromosomes (red). The two distributions are shifted with respected to each by an amount that is consistent with log2(3/2) ~ 0.58 and thus with gene dosage effects.

22.1.1 Heatmap plot of differentially expressed genes

A heat map plot was used to show the normalized expression value (Z-score) of the differentially expressed genes in samples with and without trisomy 12.

Prepare matrix for heat map plot.

#filter genes
fdrCut <- 0.1
fcCut <- 1.5

allDE <- data.frame(DEres.shr) %>%
  rownames_to_column(var = "ID") %>% 
  mutate(Symbol = rowData(ddsCLL[ID,])$symbol,
         Chr = rowData(ddsCLL[ID,])$chromosome) %>% 
  filter(padj <= fdrCut & abs(log2FoldChange) > fcCut) %>% 
  arrange(pvalue) %>% filter(!duplicated(Symbol)) %>%
  mutate(Chr12 = ifelse(Chr == 12, "yes", "no"))

#get the expression matrix
plotMat <- assay(ddsCLL.norm[allDE$ID,])
colnames(plotMat) <- ddsCLL.norm$PatID
rownames(plotMat) <- allDE$Symbol

#sort columns of plot matrix based on trisomy 12 status
plotMat <- plotMat[,order(ddsCLL.norm$trisomy12)]

#calculate z-score and scale
plotMat <- t(scale(t(plotMat)))
plotMat[plotMat >= 4] <- 4
plotMat[plotMat <= -4] <- -4

Plot the heat map.

#FIG# S23 B
#prepare colums and row annotations
annoCol <- data.frame(row.names=ddsCLL.norm$PatID, Tris12=ddsCLL.norm$trisomy12)
levels(annoCol$Tris12) <- list(wt = 0, mut =1)
annoRow <- data.frame(row.names = allDE$Symbol, Chr12 = allDE$Chr12)
annoColor <- list(Tris12 = c(wt = "grey80", mut = "black"),
                  Chr12 = c(yes="red", no = "grey80"))


pheatmap(plotMat,
         color=colorRampPalette(rev(brewer.pal(n=7, name="RdBu")))(100),
         cluster_cols = FALSE,
         annotation_row = annoRow, annotation_col = annoCol,
         show_colnames = FALSE, fontsize_row = 3,
         breaks = seq(-4,4, length.out = 101),
         annotation_colors = annoColor, border_color = NA)

According to the gene expression heat map, the samples with trisomy 12 show distinct expression pattern. 84 genes are significantly up-regulated in trisomy 12 samples and 37 genes are down-regulated in trisomy 12 samples (FDR =0.1 and log2FoldChange > 1.5). Among the 84 up-regulated genes, only 12 genes are from chromosome 12, suggested the distinct expression pattern of trisomy 12 samples can not be merely explained by gene dosage effect.

22.2 Gene set enrichment analysis

Gene set enrichment analysis using PAGE (Parametric Analysis of Gene Set Enrichment) was used to unravel the pathway activity changes underlying trisomy 12.

22.2.1 Perform enrichment analysis

Function to run PAGE in R.

runGSEA <- function(inputTab, gmtFile, GSAmethod="gsea", nPerm=1000){
    inGMT <- loadGSC(gmtFile,type="gmt")
    #re-rank by score
    rankTab <- inputTab[order(inputTab[,1],decreasing = TRUE),,drop=FALSE] 
    if (GSAmethod == "gsea"){
        #readin geneset database
        #GSEA analysis
        res <- runGSA(geneLevelStats = rankTab,
                      geneSetStat = GSAmethod,
                      adjMethod = "fdr", gsc=inGMT,
                      signifMethod = 'geneSampling', nPerm = nPerm)
        GSAsummaryTable(res)
    } else if (GSAmethod == "page"){
        res <- runGSA(geneLevelStats = rankTab,
                      geneSetStat = GSAmethod,
                      adjMethod = "fdr", gsc=inGMT,
                      signifMethod = 'nullDist')
        GSAsummaryTable(res)
    }
}

Function for plotting enrichment bar.

plotEnrichmentBar <- function(resTab, pCut=0.05, ifFDR=FALSE,
                              setName="Signatures") {
  pList <- list()
  rowNum <- c()
  for (i in names(resTab)) {
    plotTab <- resTab[[i]]
    if (ifFDR) {
      plotTab <- dplyr::filter(
        plotTab, `p adj (dist.dir.up)` <= pCut | `p adj (dist.dir.dn)` <= pCut)
    } else {
      plotTab <- dplyr::filter(
        plotTab, `p (dist.dir.up)` <= pCut | `p (dist.dir.dn)` <= pCut)
    }
    if (nrow(plotTab) == 0) {
      print("No sets passed the criteria")
      next
    } else {
      #firstly, process the result table
      plotTab <- apply(plotTab, 1, function(x) {
        statSign <- as.numeric(x[3])
        data.frame(Name = x[1],
                   p = as.numeric(ifelse(statSign >= 0, x[4], x[6])),
                   geneNum = ifelse(statSign >= 0, x[8], x[9]),
                   Direction = ifelse(statSign > 0, "Up", "Down"),
                   stringsAsFactors = FALSE)
      }) %>% do.call(rbind,.)

      plotTab$Name <- sprintf("%s (%s)",plotTab$Name,plotTab$geneNum)
      plotTab <- plotTab[with(plotTab,order(Direction, p, decreasing=TRUE)),]
      plotTab$Direction <- factor(plotTab$Direction, levels = c("Down","Up"))
      plotTab$Name <- factor(plotTab$Name, levels = plotTab$Name)
      #plot the barplot
      pList[[i]] <- ggplot(data=plotTab, aes(x=Name, y= -log10(p),
                                             fill=Direction)) +
        geom_bar(position="dodge",stat="identity", width = 0.5) +
        scale_fill_manual(values=c(Up = "blue", Down = "red")) +
        coord_fixed(ratio = 0.5) + coord_flip() + xlab(setName) +
        ggtitle(i) + theme_bw() + theme(
          plot.title = element_text(face = "bold", hjust =0.5),
          axis.title = element_text(size=15))
      rowNum <-c(rowNum,nrow(plotTab))
    }
  }

  if (length(pList) == 0) {
    print("Nothing to plot")
  } else {
    rowNum <- rowNum
    grobList <- lapply(pList, ggplotGrob)
    grobList <- do.call(rbind,c(grobList,size="max"))
    panels <- grobList$layout$t[grep("panel", grobList$layout$name)]
    grobList$heights[panels] <- unit(rowNum, "null")
  }
  return(grobList)
}

Prepare input table for gene set enrichment analysis. A cut-off of raw p value < 0.05 was used to select genes for the analysis.

pCut <- 0.05

dataTab <- data.frame(DEres)
dataTab$ID <- rownames(dataTab)

#filter using raw pvalues
dataTab <- filter(dataTab, pvalue <= pCut) %>%
  arrange(pvalue) %>%
  mutate(Symbol = rowData(ddsCLL[ID,])$symbol)
dataTab <- dataTab[!duplicated(dataTab$Symbol),]
statTab <- data.frame(row.names = dataTab$Symbol, stat = dataTab$stat)

Gene set enrichment analysis using Hallmarks gene set from MsigDB.

hallmarkRes <- list()

#run PAGE
resTab <- runGSEA(statTab, gmts$H ,GSAmethod = "page")

## Checking arguments...done!
## Calculating gene set statistics...done!
## Calculating gene set significance...done!
## Adjusting for multiple testing...done!

#remove the HALLMARK_
resTab$Name <- gsub("HALLMARK_","",resTab$Name)

hallmarkRes[["Gene set enrichment analysis"]] <- 
  arrange(resTab,desc(`Stat (dist.dir)`))

hallBar <- plotEnrichmentBar(hallmarkRes, pCut = 0.01, ifFDR = TRUE,
                             setName = "Hallmark gene sets")

Gene set enrichment analysis using kegg gene set from MsigDB.

keggRes <- list()

resTab <- runGSEA(statTab,gmts$KEGG,GSAmethod = "page")

## Checking arguments...done!
## Calculating gene set statistics...done!
## Calculating gene set significance...done!
## Adjusting for multiple testing...done!

#remove the KEGG_
resTab$Name <- gsub("KEGG_","",resTab$Name)

keggRes[["Gene set enrichment analysis"]] <- resTab

keggBar <- plotEnrichmentBar(keggRes, pCut = 0.01, ifFDR = TRUE,
                             setName = "KEGG gene sets")

22.2.2 Heatmap for selected gene sets

Heatmap plots were used to show the expression values of differentially expressed genes from KEGG_CHEMOKINE_SIGNALING_PATHWAY gene set

Prepare the matrix for heatmap plot.

#select differentially expressed genes
fdrCut <- 0.05
cytoDE <- data.frame(DEres) %>% rownames_to_column(var = "ID") %>% 
  mutate(Symbol = rowData(ddsCLL[ID,])$symbol,
         Chr=rowData(ddsCLL[ID,])$chromosome) %>% 
  filter(padj <= fdrCut, log2FoldChange > 0) %>% 
  arrange(pvalue) %>% filter(!duplicated(Symbol)) %>%
  mutate(Chr12 = ifelse(Chr == 12, "yes", "no"))

#get the expression matrix
plotMat <- assay(ddsCLL.norm[cytoDE$ID,])
colnames(plotMat) <- ddsCLL.norm$PatID
rownames(plotMat) <- cytoDE$Symbol

#sort columns of plot matrix based on trisomy 12 status
plotMat <- plotMat[,order(ddsCLL.norm$trisomy12)]

#calculate z-score and sensor
plotMat <- t(scale(t(plotMat)))
plotMat[plotMat >= 4] <- 4
plotMat[plotMat <= -4] <- -4

annoCol <- data.frame(row.names = ddsCLL.norm$PatID,
                      Tris12 = ddsCLL.norm$trisomy12)
levels(annoCol$Tris12) <- list(wt = 0, mut =1)
annoRow <- data.frame(row.names = cytoDE$Symbol, Chr12 = cytoDE$Chr12)

Heatmap for genes from KEGG_CHEMOKINE_SIGNALING_PATHWAY geneset.

gsc <- loadGSC(gmts$KEGG)
geneList <- gsc$gsc$KEGG_CHEMOKINE_SIGNALING_PATHWAY
plotMat.chemo <- plotMat[rownames(plotMat) %in% geneList,]
keggHeatmap <- pheatmap(plotMat.chemo,
                        color = colorRampPalette(
                          rev(brewer.pal(n=7, name="RdBu")))(100),
         cluster_cols = FALSE, clustering_method = "ward.D2",
         annotation_row = annoRow, annotation_col = annoCol,
         show_colnames = FALSE, fontsize_row = 8,
         breaks = seq(-4,4, length.out = 101), treeheight_row = 0,
         annotation_colors = annoColor, border_color = NA,
         main = "CHEMOKINE_SIGNALING_PATHWAY",silent = TRUE)$gtable

Combine enrichment plot and heatmap plot.

#FIG# S24 ABC
ggdraw() + 
  draw_plot(hallBar, 0, 0.7, 0.5, 0.3) + 
  draw_plot(keggBar, 0.5, 0.7, 0.5, 0.3) +
  draw_plot(keggHeatmap, 0.1, 0, 0.8, 0.65) +
  draw_plot_label(c("A","B","C"), c(0, 0.5, 0.1), c(1, 1, 0.7), 
                  fontface = "plain", size=20)

Based on the gene set enrichment analysis results, genes from PI3K_ATK_MTOR pathway are significantly up-regulated in the samples with trisomy 12, which partially explained the increased sensitivity of trisomy 12 samples to PI3K and MTOR inhibitors. In addition, genes that are up-regulated in trisomy 12 are enrichment in chemokine signaling pathway.

options(stringsAsFactors=FALSE)

23 Drug response prediction

Drug response heterogeneity is caused by the unique deregulations in biology of the tumor cell. Those deregulations leave trace on the different molecular levels and have a various impact on cell’s drug sensitivity profile. Here we use multivariate regression to integrate information from the multi-omic data in order to predict drug response profiles of the CLL samples.

Loading the data.

data(list=c("conctab", "drpar", "drugs", "patmeta", "lpdAll", "dds", "mutCOM",
"methData"))

23.1 Assesment of omics capacity in explaining drug response

23.1.1 Data pre-processing

Filtering steps and transformations.

e<-dds
colnames(e)<-colData(e)$PatID


#only consider CLL patients
CLLPatients<-rownames(patmeta)[which(patmeta$Diagnosis=="CLL")]

#Methylation Data
methData = t(assay(methData)) 


#RNA Data
eCLL<-e[,colnames(e) %in% CLLPatients]
###
#filter out genes without gene namce
AnnotationDF<-data.frame(EnsembleId=rownames(eCLL),stringsAsFactors=FALSE)
AnnotationDF$symbol <- mapIds(org.Hs.eg.db,
                     keys=rownames(eCLL),
                     column="SYMBOL",
                     keytype="ENSEMBL",
                     multiVals="first")

## 'select()' returned 1:many mapping between keys and columns

eCLL<-eCLL[AnnotationDF$EnsembleId[!is.na(AnnotationDF$symbol)],]

#filter out low count genes
###
minrs <- 100
rs  <- rowSums(assay(eCLL))
eCLL<-eCLL[ rs >= minrs, ]
#variance stabilize the data
#(includes normalizing for library size and dispsersion estimation) 
vstCounts<-varianceStabilizingTransformation(eCLL)
vstCounts<-assay(vstCounts)
#no NAs in data
any(is.na(vstCounts))

## [1] FALSE

#filter out low variable genes
ntop<-5000
vstCountsFiltered<-vstCounts[order(apply(vstCounts, 1, var, na.rm=T),
                                   decreasing = T)[1:ntop],]
eData<-t(vstCountsFiltered)
#no NAs
any(is.na(eData))

## [1] FALSE

#genetics
#remove features with less than 5 occurences
mutCOMbinary<-channel(mutCOM, "binary")
mutCOMbinary<-mutCOMbinary[featureNames(mutCOMbinary) %in% CLLPatients,]
genData<-Biobase::exprs(mutCOMbinary)
idx <- which(colnames(genData) %in% c("del13q14_bi", "del13q14_mono"))
genData <- genData[,-idx]
colnames(genData)[which(colnames(genData)=="del13q14_any")] = "del13q14"
minObs <- 5
#remove feutes with less than 5 occurecnes
genData<-genData[,colSums(genData, na.rm=T)>=minObs]

#IGHV
translation <- c(`U` = 0, `M` = 1)
stopifnot(all(patmeta$IGHV %in% c("U","M", NA)))
IGHVData <- matrix(translation[patmeta$IGHV], 
                   dimnames = list(rownames(patmeta), "IGHV"), ncol = 1)
IGHVData<-IGHVData[rownames(IGHVData) %in% CLLPatients,,drop=F]
#remove patiente with NA IGHV status
IGHVData<-IGHVData[!is.na(IGHVData), ,drop=F]
any(is.na(IGHVData))

## [1] FALSE

#demographics (age and sex)
patmeta<-subset(patmeta, Diagnosis=="CLL")
gender <- ifelse(patmeta[,"Gender"]=="m",0,1)


# impute missing values in age by mean
ImputeByMean <- function(x) {x[is.na(x)] <- mean(x, na.rm=TRUE); return(x)}
age<-ImputeByMean(patmeta[,"Age4Main"])


demogrData <- cbind(age=age,gender=gender)
rownames(demogrData) <- rownames(patmeta)

#Pretreatment
pretreated<-patmeta[,"IC50beforeTreatment", drop=FALSE]

##### drug viabilites
summaries <- c(paste("viaraw", 1:5, sep=".") %>% `names<-`(paste(1:5)), 
               `4:5` = "viaraw.4_5", `1:5` = "viaraw.1_5")
a <- do.call( abind, c( along=3, lapply( summaries, 
                                         function(x) assayData(drpar)[[x]]))) 
dimnames(a)[[3]] <- names(summaries)
names(dimnames(a)) <- c( "drug", "patient", "summary" )
viabData <- acast( melt(a), patient ~ drug + summary )
rownames(viabData)<-c(substr(rownames(viabData),1,4)[1:3],
                      substr(rownames(viabData),1,5)[4:nrow(viabData)])

Check overlap of data and take care of missing values present in methylation and genetic data.

# common patients 
Xlist<-list(RNA=eData, meth=methData, gen=genData, IGHV=IGHVData,
            demographics=demogrData, drugs=viabData, pretreated=pretreated)
PatientsPerOmic<-lapply(Xlist, rownames)
sapply(PatientsPerOmic, length)

##          RNA         meth          gen         IGHV demographics 
##          136          196          200          188          200 
##        drugs   pretreated 
##          249          200

allPatients<-Reduce(union, PatientsPerOmic)
PatientOverview<-sapply(Xlist, function(M) allPatients %in% rownames(M))
Patients <- (1:nrow(PatientOverview))
Omics <- (1:ncol(PatientOverview))
image(Patients,Omics, PatientOverview*1, axes=F, col=c("white", "black"),
      main="Sample overview across omics")
axis(2, at = 1:ncol(PatientOverview), labels=colnames(PatientOverview), tick=F)

commonPatients<-Reduce(intersect, PatientsPerOmic)
length(commonPatients)

## [1] 112

XlistCommon<-lapply(Xlist, function(data) data[commonPatients,, drop=F])

#Take care of missing values (present in  genetic data)
ImputeByMean <- function(x) {x[is.na(x)] <- mean(x, na.rm=TRUE); return(x)}

#NAs in genetic
#remove feauters with less 90% completeness
RarlyMeasuredFeautres<-
  which(colSums(is.na(XlistCommon$gen))>0.1*nrow(XlistCommon$gen))
XlistCommon$gen<-XlistCommon$gen[,-RarlyMeasuredFeautres]
#remove patients with less than 90% of genetic feautres measured
IncompletePatients<-
  rownames(XlistCommon$gen)[
    (rowSums(is.na(XlistCommon$gen))>0.1*ncol(XlistCommon$gen))]
commonPatients<-commonPatients[!commonPatients %in% IncompletePatients]
XlistCommon<-lapply(XlistCommon, function(data) data[commonPatients,, drop=F])
#replace remaining NA by mean and round to 0 or 1
XlistCommon$gen<-round(apply(XlistCommon$gen, 2, ImputeByMean))

#NAs in methylation
#remove feauters with less 90% completeness
XlistCommon$meth<-
  XlistCommon$meth[,colSums(is.na(XlistCommon$meth))<0.1*nrow(methData)]
#impute remainin missing values by mean for each feautre across patients
XlistCommon$meth<-(apply(XlistCommon$meth, 2, ImputeByMean))

#final dimensions of the data
sapply(XlistCommon, dim)

##       RNA meth gen IGHV demographics drugs pretreated
## [1,]  102  102 102  102          102   102        102
## [2,] 5000 5000  11    1            2   448          1

Use top 20 PCs of methylation and expression as predictors.

pcaMeth<-prcomp(XlistCommon$meth, center=T, scale. = F)
XlistCommon$MethPCs<-pcaMeth$x[,1:20]
colnames(XlistCommon$MethPCs)<-
  paste("meth",colnames(XlistCommon$MethPCs), sep="")

pcaExpr<-prcomp(XlistCommon$RNA, center=T, scale. = F)
XlistCommon$RNAPCs<-pcaExpr$x[,1:20]
colnames(XlistCommon$RNAPCs)<-paste("RNA",colnames(XlistCommon$RNAPCs), sep="")

Choose drug viabilites of interest as response variables.

DOI <- c("D_006_1:5", "D_010_1:5", "D_159_1:5","D_002_4:5", "D_003_4:5",
         "D_012_4:5", "D_063_4:5", "D_166_4:5")
drugviab<-XlistCommon$drugs
drugviab<-drugviab[,DOI, drop=F]
colnames(drugviab) <- drugs[substr(colnames(drugviab),1,5),"name"]

Construct list of designs used and scale all predictors to mean zero and unit variance.

ZPCs<-list(expression=XlistCommon$RNAPCs,
        genetic=XlistCommon$gen, 
        methylation= XlistCommon$MethPCs,
        demographics=XlistCommon$demographics, 
        IGHV=XlistCommon$IGHV,
        pretreated = XlistCommon$pretreated)
ZPCs$all<-do.call(cbind, ZPCs)
ZPCsunscaled<-ZPCs
ZPCsscaled<-lapply(ZPCs, scale)
lapply(ZPCsscaled, colnames)

## $expression
##  [1] "RNAPC1"  "RNAPC2"  "RNAPC3"  "RNAPC4"  "RNAPC5"  "RNAPC6"  "RNAPC7" 
##  [8] "RNAPC8"  "RNAPC9"  "RNAPC10" "RNAPC11" "RNAPC12" "RNAPC13" "RNAPC14"
## [15] "RNAPC15" "RNAPC16" "RNAPC17" "RNAPC18" "RNAPC19" "RNAPC20"
## 
## $genetic
##  [1] "del6q21"    "gain8q24"   "del11q22.3" "trisomy12"  "del13q14"  
##  [6] "del17p13"   "BRAF"       "MYD88"      "NOTCH1"     "SF3B1"     
## [11] "TP53"      
## 
## $methylation
##  [1] "methPC1"  "methPC2"  "methPC3"  "methPC4"  "methPC5"  "methPC6" 
##  [7] "methPC7"  "methPC8"  "methPC9"  "methPC10" "methPC11" "methPC12"
## [13] "methPC13" "methPC14" "methPC15" "methPC16" "methPC17" "methPC18"
## [19] "methPC19" "methPC20"
## 
## $demographics
## [1] "age"    "gender"
## 
## $IGHV
## [1] "IGHV"
## 
## $pretreated
## [1] "IC50beforeTreatment"
## 
## $all
##  [1] "expression.RNAPC1"    "expression.RNAPC2"    "expression.RNAPC3"   
##  [4] "expression.RNAPC4"    "expression.RNAPC5"    "expression.RNAPC6"   
##  [7] "expression.RNAPC7"    "expression.RNAPC8"    "expression.RNAPC9"   
## [10] "expression.RNAPC10"   "expression.RNAPC11"   "expression.RNAPC12"  
## [13] "expression.RNAPC13"   "expression.RNAPC14"   "expression.RNAPC15"  
## [16] "expression.RNAPC16"   "expression.RNAPC17"   "expression.RNAPC18"  
## [19] "expression.RNAPC19"   "expression.RNAPC20"   "genetic.del6q21"     
## [22] "genetic.gain8q24"     "genetic.del11q22.3"   "genetic.trisomy12"   
## [25] "genetic.del13q14"     "genetic.del17p13"     "genetic.BRAF"        
## [28] "genetic.MYD88"        "genetic.NOTCH1"       "genetic.SF3B1"       
## [31] "genetic.TP53"         "methylation.methPC1"  "methylation.methPC2" 
## [34] "methylation.methPC3"  "methylation.methPC4"  "methylation.methPC5" 
## [37] "methylation.methPC6"  "methylation.methPC7"  "methylation.methPC8" 
## [40] "methylation.methPC9"  "methylation.methPC10" "methylation.methPC11"
## [43] "methylation.methPC12" "methylation.methPC13" "methylation.methPC14"
## [46] "methylation.methPC15" "methylation.methPC16" "methylation.methPC17"
## [49] "methylation.methPC18" "methylation.methPC19" "methylation.methPC20"
## [52] "demographics.age"     "demographics.gender"  "IGHV"                
## [55] "IC50beforeTreatment"

Define colors.

set1 <- brewer.pal(9,"Set1")
colMod<-c(paste(set1[c(4,1,5,3,2,7)],"88",sep=""), "grey")
names(colMod) <-
  c("demographics", "genetic", "IGHV","expression", "methylation", "pretreated",
    "all")

23.1.2 Lasso using multi-omic data

Fit a linear model using Lasso explaining drug response by each one of the omic set separately as well as all together. As measure of explained variance use R2 from linear models for unpenalized models (IGHV) and fraction of variance explained, i.e. 1- cross-validated mean squared error/total sum of squares for others.

To ensure fair treatment of all features they are standardized to mean 0 and unit variance. To study robustness the cross-validation is repeated 100-times to obtain the mean and standard deviation shown in the figure.

Fit model and show resulting omic-prediction profiles.

Optionally, the model can also be fit for all drugs in the study.

nfolds<-10
nrep<-100

DOI <-
  grepl("1:5",colnames(XlistCommon$drugs)) |
  grepl("4:5",colnames(XlistCommon$drugs))
drugviabAll<-XlistCommon$drugs
drugviabAll<-drugviabAll[,DOI]
colnames(drugviabAll) <- 
  paste0(drugs[substr(colnames(drugviabAll),1,5),"name"],
         substr(colnames(drugviabAll),6,9))

R2ForPenReg(Zscaled, drugviabAll, nfold=nfolds, alpha=1, nrep=nrep,
            Parmfrow=c(4,4), ylimMax=0.6)

23.2 Lasso for drugs in a genetic-focussed model

We perform regression analysis by adaptive LASSO using genetic features, IGHV status (coded as 0 -1 for mutated/unmutated), pretreatment status (coded as 0 -1) and methylation cluster (coded as 0 for lowly-programmed (LP), 0.5 for intermediately-programmed (IP) and 1 for highly-programmed (HP)). For each model we select the optimal penalization parameter of the second step Lasso fit of the adaptive Lasso by repeated cross-validation to get robust results.

As output bar plots showing the coefficients of the selected predictors are produced.

23.2.1 General definitions

We use the following abbreviations for the different data types.

	data type
M	methylation cluster
G	mutations
V	viability
I	ighv
P	pretreatment

Color definitions for the groups.

23.2.2 Data pre-processing

Subselect CLL patients.

lpdCLL = lpdAll[ , lpdAll$Diagnosis=="CLL"]

Prepare the data and limit the number of features by subselecting only those which include at least 5 recorded incidences. List the predictors.

lpdCLL = lpdAll[ , lpdAll$Diagnosis=="CLL"]
lpdCLL = BloodCancerMultiOmics2017:::prepareLPD(lpd=lpdCLL, minNumSamplesPerGroup=5)
(predictorList = BloodCancerMultiOmics2017:::makeListOfPredictors(lpdCLL))

## $predictorsM
## [1] "Methylation_Cluster"
## 
## $predictorsG
##  [1] "del6q21"    "gain8q24"   "del11q22.3" "trisomy12"  "del13q14"  
##  [6] "del17p13"   "BRAF"       "NOTCH1"     "SF3B1"      "TP53"      
## 
## $predictorsI
## [1] "IGHV"
## 
## $predictorsP
## [1] "Pretreatment"

23.2.3 Drug response prediction

The prediction will be made for the following drugs and concentrations.

	drug name	conc
D_159	doxorubicine	1-5
D_006	fludarabine	1-5
D_010	nutlin-3	1-5
D_166	PRT062607 HCl	4-5
D_003	idelalisib	4-5
D_002	ibrutinib	4-5
D_012	selumetinib	4-5
D_063	everolimus	4-5

drs = list("1:5"=c("D_006", "D_010", "D_159"),
           "4:5"=c("D_002", "D_003", "D_012", "D_063", "D_166"))

predvar = unlist(BloodCancerMultiOmics2017:::makePredictions(drs=drs,
                                 lpd=lpdCLL,
                                 predictorList=predictorList,
                                 lim=0.15, std=FALSE, adaLasso=TRUE,
                                 colors=coldef),
                 recursive=FALSE)

## [1] "Prediction for: D_006_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_010_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_159_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_002_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_003_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_012_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_063_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_166_4:5; #samples: 168; #features: 13"

Plot the legends.

legends = BloodCancerMultiOmics2017:::makeLegends(legendFor=c("G","I","M", "P"),
                                                  coldef)

Additionaly we plot the prediction for rotenone.

drs_rot = list("4:5"=c("D_067"))
predvar_rot = unlist(BloodCancerMultiOmics2017:::makePredictions(drs=drs_rot,
                                 lpd=lpdCLL,
                                 predictorList=predictorList,
                                 lim=0.23, std=FALSE, adaLasso=TRUE,
                                 colors=coldef),
                 recursive=FALSE)

## [1] "Prediction for: D_067_4:5; #samples: 168; #features: 13"

In a same way the prediction for all the drugs can be made.

alldrs = unique(fData(lpdCLL)[fData(lpdCLL)$type=="viab","id"])
drs = list("1:5"=alldrs, "4:5"=alldrs)

predvar2 = BloodCancerMultiOmics2017:::makePredictions(drs=drs,
                                        lpd=lpdCLL,
                                        predictorList=predictorList,
                                        lim=0.23,
                                        colors=coldef)

## [1] "Prediction for: D_001_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_002_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_003_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_004_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_006_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_007_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_008_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_010_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_011_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_012_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_013_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_015_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_017_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_020_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_021_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_023_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_024_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_025_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_029_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_030_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_032_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_033_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_034_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_035_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_036_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_039_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_040_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_041_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_043_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_045_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_048_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_049_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_050_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_053_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_054_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_056_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_060_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_063_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_066_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_067_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_071_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_074_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_075_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_077_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_078_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_079_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_081_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_082_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_083_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_084_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_127_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_141_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_149_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_159_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_162_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_163_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_164_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_165_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_166_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_167_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_168_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_169_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_172_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_CHK_1:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_001_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_002_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_003_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_004_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_006_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_007_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_008_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_010_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_011_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_012_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_013_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_015_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_017_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_020_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_021_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_023_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_024_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_025_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_029_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_030_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_032_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_033_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_034_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_035_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_036_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_039_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_040_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_041_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_043_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_045_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_048_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_049_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_050_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_053_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_054_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_056_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_060_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_063_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_066_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_067_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_071_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_074_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_075_4:5; #samples: 168; #features: 13"
## [1] "No (0) predictors for given parameters!"
## [1] "Prediction for: D_077_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_078_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_079_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_081_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_082_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_083_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_084_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_127_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_141_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_149_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_159_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_162_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_163_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_164_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_165_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_166_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_167_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_168_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_169_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_172_4:5; #samples: 168; #features: 13"
## [1] "Prediction for: D_CHK_4:5; #samples: 168; #features: 13"

23.2.4 Effect of pre-treatment

In order to find out the effect of pre-treatment on the predictions of drug response we provide the following overview. The summary is made for all drugs, separating however, on ranges of drug concentrations (mean drug effect of: all five (1-5) and two lowest (4-5) concentrations of the drugs).

givePreatreatSum = function(predNum) {
  
  idx = sapply(predvar2[[predNum]], function(x) length(x)==1)
  predvar2[[predNum]] = predvar2[[predNum]][!idx]
  # get model coefficients and reshape
  coeffs <- do.call(cbind,lapply(predvar2[[predNum]], "[[", 'coeffs'))
  coeffs <- coeffs[-1,]
  coeffs <- as.matrix(coeffs)
  # colnames(coeffs) <- unlist(drs["1:5"])
  colnames(coeffs) = names(predvar2[[predNum]])
  colnames(coeffs) <- drugs[colnames(coeffs),"name"]
  coeffDF <- melt(as.matrix(coeffs))
  colnames(coeffDF) <- c("predictor", "drug", "beta")
  coeffDF$selected <- coeffDF$beta !=0
  
  #sort by times being selected
  coeffDF$predictor <- factor(coeffDF$predictor, level=)
  
  # number of drugs a predictor is chosen for
  gg1 <- coeffDF %>% group_by(predictor) %>% 
    dplyr::summarize(selectedSum = sum(selected)) %>%
    mutate(predictor = factor(predictor,
                              levels=predictor[order(selectedSum)])) %>%
    ggplot(aes(x=predictor, y=selectedSum)) + 
    geom_bar(stat="identity")+ylab("# drugs selected for") +
    coord_flip()
  
  # boxplots of non-zero coeffients
  orderMedian <- filter(coeffDF, selected) %>% group_by(predictor) %>% 
    dplyr::summarize(medianBeta = median(abs(beta)))
  coeffDF$predictor <- factor(
    coeffDF$predictor,
    levels=orderMedian$predictor[order(orderMedian$medianBeta)] )
  gg2 <- ggplot(filter(coeffDF, selected), aes(x=predictor, y=abs(beta))) +
    geom_boxplot() +
    coord_flip() + ggtitle("Distribution of non-zero coefficients")
  gridExtra::grid.arrange(gg1,gg2, ncol=1)
  
  # coefficeints per drug
  ggplot(filter(coeffDF, selected), 
         aes(x= drug, y=abs(beta), col= predictor=="Pretreatment")) +
    geom_point() +
    coord_flip()
  
  #drugs pretreatment is selected for
  as.character(filter(coeffDF, predictor=="Pretreatment" & beta!=0)$drug)
  PselDrugs <- as.character(
    filter(coeffDF, predictor=="Pretreatment" & beta!=0)$drug)
  length(PselDrugs)
  # length(drs[[1]])
}

Conc. 1-5

## [1] 26

Conc. 4-5

## [1] 11

options(stringsAsFactors=FALSE)

24 Survival analysis

Load the data.

data(lpdAll, patmeta, drugs)

Prepare survival data.

lpdCLL <- lpdAll[ , lpdAll$Diagnosis=="CLL"   ]

# data rearrangements
survT = patmeta[colnames(lpdCLL),]
survT[which(survT[,"IGHV"]=="U") ,"IGHV"] = 0
survT[which(survT[,"IGHV"]=="M") ,"IGHV"] = 1
survT$IGHV = as.numeric(survT$IGHV)

colnames(survT) = gsub("Age4Main", "age", colnames(survT))

survT$ibr45  <- 1-Biobase::exprs(lpdCLL)[ "D_002_4:5", rownames(survT)  ]  
survT$ide45  <- 1-Biobase::exprs(lpdCLL)[ "D_003_4:5", rownames(survT)  ] 
survT$prt45  <- 1-Biobase::exprs(lpdCLL)[ "D_166_4:5", rownames(survT)  ] 
survT$selu45 <- 1-Biobase::exprs(lpdCLL)[ "D_012_4:5", rownames(survT)  ]
survT$ever45 <- 1-Biobase::exprs(lpdCLL)[ "D_063_4:5", rownames(survT)  ]

survT$nut15 <- 1-Biobase::exprs(lpdCLL)[ "D_010_1:5", rownames(survT)  ] 
survT$dox15 <- 1-Biobase::exprs(lpdCLL)[ "D_159_1:5", rownames(survT)  ] 
survT$flu15 <- 1-Biobase::exprs(lpdCLL)[ "D_006_1:5", rownames(survT)  ] 

survT$SF3B1      <- Biobase::exprs(lpdCLL)[ "SF3B1",      rownames(survT)  ]
survT$NOTCH1     <- Biobase::exprs(lpdCLL)[ "NOTCH1",     rownames(survT)  ]
survT$BRAF       <- Biobase::exprs(lpdCLL)[ "BRAF",       rownames(survT)  ]
survT$TP53       <- Biobase::exprs(lpdCLL)[ "TP53",       rownames(survT)  ]
survT$del17p13   <- Biobase::exprs(lpdCLL)[ "del17p13",   rownames(survT)  ]
survT$del11q22.3 <- Biobase::exprs(lpdCLL)[ "del11q22.3", rownames(survT)  ]
survT$trisomy12 <-  Biobase::exprs(lpdCLL)[ "trisomy12", rownames(survT)  ]
survT$IGHV_cont <- patmeta[ rownames(survT) ,"IGHV Uppsala % SHM"]


# competinting risk endpoint fpr 
survT$compE <- ifelse(survT$treatedAfter == TRUE, 1, 0)
survT$compE <- ifelse(survT$treatedAfter == FALSE & survT$died==TRUE,
                      2, survT$compE )
survT$T7  <- ifelse(survT$compE == 1, survT$T5, survT$T6 )

24.1 Univariate survival analysis

24.1.1 Define forest functions

forest <- function(Time, endpoint, title, sdrugs, split, sub) {  
  stopifnot(is.character(Time), is.character(title), is.character(split),
            is.character(endpoint), 
            all(c(Time, split, endpoint) %in% colnames(survT)),
            is.logical(survT[[endpoint]]),
            is.character(sdrugs), !is.null(names(sdrugs)))
  
  clrs <- fpColors(box="royalblue",line="darkblue", summary="royalblue")

  res <- lapply(sdrugs, function(g) { 
    drug <- survT[, g] * 10

    suse <- if (identical(sub, "none")) 
      rep(TRUE, nrow(survT)) 
    else 
      (survT[[split]] == sub)
    stopifnot(sum(suse, na.rm = TRUE) > 1)
    
    surv <- coxph(Surv(survT[,Time], survT[,endpoint]) ~ drug, subset=suse)  
    sumsu <- summary(surv) 
    c(p      = sumsu[["coefficients"]][, "Pr(>|z|)"], 
      coef   = sumsu[["coefficients"]][, "exp(coef)"], 
      lower  = sumsu[["conf.int"]][, "lower .95"], 
      higher = sumsu[["conf.int"]][, "upper .95"])
  })
  
  s <- do.call(rbind, res)
  rownames(s) <- names(sdrugs)

  tabletext <- list(c(NA, rownames(s)), append(list("p-value"),
                                               sprintf("%.4f", s[,"p"])))

  forestplot(tabletext, 
           rbind(
             rep(NA, 3), 
             s[, 2:4]), 
           page = new,
           clip = c(0.8,20), 
           xlog = TRUE, xticks = c(0.5,1, 1.5), title = title,
           col = clrs, 
           txt_gp = fpTxtGp(ticks = gpar(cex=1) ),
           new_page = TRUE)
}

Combine OS and TTT in one plot.

com <- function( Time, endpoint, scaleX, sub, d, split, drug_names) {  
  
  res <- lapply(d, function(g)  { 
  
  drug <- survT[,g] * scaleX
  ## all=99, M-CLL=1, U-CLL=0
  if(sub==99) { surv <- coxph(Surv(survT[,paste0(Time)],
                                   survT[,paste0(endpoint)] == TRUE) ~ drug)} 
  if(sub<99)  { surv <- coxph(Surv(survT[,paste0(Time)],
                                   survT[,paste0(endpoint)] == TRUE) ~ drug,
                              subset=survT[,paste0(split)]==sub)}    
  
  c(summary(surv)[[7]][,5], summary(surv)[[7]][,2], 
                summary(surv)[[8]][,3], 
                summary(surv)[[8]][,4])
 })
 s <- do.call(rbind, res)
 colnames(s) <- c("p", "HR", "lower", "higher")
 rownames(s) <- drug_names
 s
}


fp <- function( sub, title, d, split, drug_names, a, b, scaleX) {  
   ttt <- com(Time="T5", endpoint="treatedAfter", sub=sub, d=d,
              split=split, drug_names=drug_names, scaleX=scaleX)
   rownames(ttt) <- paste0(rownames(ttt), "_TTT")
   
   os <-  com(Time="T6", endpoint="died", sub=sub, d=d, split=split,
              drug_names=drug_names, scaleX=scaleX)
   rownames(os) <- paste0(rownames(os), "_OS")
  
   n <- c( p=NA, HR=NA, lower=NA, higher=NA )
   nn <- t( data.frame( n ) )
   for (i in 1:(nrow(ttt)-1) ) { nn <-rbind(nn, n )  }
   rownames(nn) <- drug_names
   
   od <- order( c(seq(nrow(nn)), seq(nrow(ttt)), seq(nrow(os)) ))
      
   s <- data.frame( rbind(nn, ttt, os)[od,  ] )
   s$Name <- rownames(s)
   s$x <- 1:nrow(s)
   s$col <- rep(c("white", "black", "darkgreen"), nrow(ttt) )
   s$Endpoint <- factor( c(rep("nn", nrow(nn) ), rep("TTT", nrow(ttt) ),
                           rep("OS", nrow(os) ) )[od] )
   s$features <- "";  s[ which(s$Endpoint=="OS"),"features"] <- drug_names
   s[which(s$Endpoint=="nn"), "Endpoint"] <- "" 
   s <- rbind(s, rep(NA, 8))

   p <- ggplot(data=s ,aes(x=x, y=HR, ymin=lower, ymax=higher,
                           colour=Endpoint)) +  geom_pointrange() + 
     theme(legend.position="top", legend.text = element_text(size = 20) ) +
     scale_x_discrete(limits=s$x, labels=s$features ) +
     expand_limits(y=c(a,b)) +
     scale_y_log10(breaks=c(0.01,0.1,0.5,1,2,5,10),
                   labels=c(0.01,0.1,0.5,1,2,5,10)) +
     theme(
           panel.grid.minor = element_blank(),
           axis.title.x = element_text(size=16),
           axis.text.x = element_text(size=16, colour="black"),
           axis.title.y  = element_blank(),
           axis.text.y = element_text(size=12, colour="black"),
           legend.key = element_rect(fill = "white"),
           legend.background = element_rect(fill = "white"),
           legend.title = element_blank(),
           panel.background = element_rect(fill = "white", color="black"),
           panel.grid.major = element_blank(),
           axis.ticks.y = element_blank() 
          ) +
     coord_flip() + 
     scale_color_manual(values=c("OS"="darkgreen", "TTT"="black"),
                        labels=c("OS", "TTT", "")) +
     geom_hline(aes(yintercept=1), colour="black", size=1.5,
                linetype="dashed", alpha=0.3)  +
     annotate("text", x = 1:nrow(s)+0.5, y = s$HR+0.003,
              label = ifelse( s$p<0.001, paste0("p<","0.001"), 
                paste0("p=", round(s$p,3) ) ), colour=s$col)
   plot(p)
}

24.1.2 Forest plot for genetic factors

#FIG# S27
d <- c("SF3B1", "NOTCH1", "BRAF", "TP53", "del17p13", "del11q22.3",
       "trisomy12", "IGHV")
drug_names <- c("SF3B1", "NOTCH1", "BRAF", "TP53", "del17p13", "del11q22.3",
                "Trisomy12" ,"IGHV")

fp(sub=99, d=d, drug_names=drug_names, split="IGHV", title="", a=0, b=10,
   scaleX=1)

24.1.3 Forest plot for drug responses

#FIG# 6A
d <- c("flu15", "nut15", "dox15", "ibr45", "ide45", "prt45", "selu45",
       "ever45")
drug_names <- c("Fludarabine",  "Nutlin-3", "Doxorubicine", "Ibrutinib",
                "Idelalisib", "PRT062607 HCl", "Selumetinib" ,"Everolimus")

fp(sub=99, d=d, drug_names=drug_names, split="TP53", title="", a=0, b=5,
   scaleX=10)

24.2 Kaplan-Meier curves

24.2.1 Genetics factors

#FIG# S27 left (top+bottom)
par(mfcol=c(1,2))

for (fac in paste(c("IGHV", "TP53"))) {
survplot( Surv(survT$T5, survT$treatedAfter == TRUE)  ~ as.factor(survT[,fac]), 
   snames=c("wt", "mut"),
   lwd=1.5, cex.axis = 1, cex.lab=1, col= c("darkmagenta", "dodgerblue4"),
   show.nrisk = FALSE,
   legend.pos = FALSE, stitle = "", hr.pos= "topright",
   main = paste(fac), 
   xlab = 'Time (Years)', ylab = 'Time to treatment')
}

#FIG# 6B left
#FIG# S27 right (top+bottom)
par(mfcol=c(1,2))

for (fac in paste(c("IGHV", "TP53"))) {
survplot( Surv(survT$T6, survT$died == TRUE)  ~ as.factor(survT[,fac]), 
   snames=c("wt", "mut"),
   lwd=1.5, cex.axis = 1.0, cex.lab=1.0, col= c("darkmagenta", "dodgerblue4"),
   show.nrisk = FALSE,
   legend.pos = FALSE, stitle = "", hr.pos= "bottomleft",
   main = paste(fac), 
   xlab = 'Time (Years)', ylab = 'Overall survival')
}

24.2.2 Drug responses

Drug responses were dichotomized using maximally selected rank statistics. The analysis is also perforemd within subgroups: TP53 wt/ mut. and IGHV wt/ mut.

Time to next treatment (maxstat).

par(mfrow=c(1,3), mar=c(5,5,2,0.9))
km(drug = "D_006_1:5", split = "TP53", t="TTT",
   title=c("(TP53", "wt)", "mut)"),  hr="tr", c="maxstat")

## Fludarabine cutpoint for TTT:  0.65

km(drug = "D_159_1:5", split = "TP53", t="TTT",
   title=c("(TP53", "wt)", "mut)"), hr="tr",  c="maxstat")

## Doxorubicine cutpoint for TTT:  0.56

km(drug = "D_010_1:5", split = "TP53", t="TTT",
   title=c("(TP53", "wt)", "mut)"), hr="tr",  c="maxstat")

## Nutlin-3 cutpoint for TTT:  0.85

km(drug = "D_002_4:5", split = "IGHV", t="TTT",
   title=c("(IGHV",  "wt)" , "mut)"), hr="tr", c="maxstat" )

## Ibrutinib cutpoint for TTT:     1

km(drug = "D_003_4:5", split = "IGHV", t="TTT",
   title=c("(IGHV",  "wt)" , "mut)"), hr="tr", c="maxstat" )

## Idelalisib cutpoint for TTT:  0.99

km(drug = "D_166_4:5", split = "IGHV", t="TTT",
   title=c("(IGHV",  "wt)" , "mut)"), hr="tr", c="maxstat" )

## PRT062607 HCl cutpoint for TTT:  0.98

Overall survival (maxstat).

par(mfrow=c(1,3), mar=c(5,5,2,0.9))
km(drug = "D_006_1:5", split = "TP53", t="OS",
   title=c("(TP53", "wt)", "mut)"), hr="bl", c="maxstat")

## Fludarabine cutpoint for OS:   0.6

#FIG# 6B right
#FIG# 6C
km(drug = "D_159_1:5", split = "TP53", t="OS", # doxorubicine
   title=c("(TP53", "wt)", "mut)"), hr="bl", c="maxstat" )

## Doxorubicine cutpoint for OS:  0.53

#FIG# 6B middle
km(drug = "D_010_1:5", split = "TP53", t="OS", # nutlin-3
   title=c("(TP53", "wt)", "mut)"), hr="bl", c="maxstat" )

## Nutlin-3 cutpoint for OS:  0.89

km(drug = "D_002_4:5", split = "IGHV", t="OS",
   title=c("(IGHV",  "wt)" , "mut)"), hr="bl", c="maxstat" )

## Ibrutinib cutpoint for OS:  0.95

km(drug = "D_003_4:5", split = "IGHV", t="OS",
   title=c("(IGHV",  "wt)" , "mut)"), hr="bl", c="maxstat" )

## Idelalisib cutpoint for OS:   0.9

## Warning in fitter(X, Y, strats, offset, init, control, weights = weights, :
## Loglik converged before variable 1 ; beta may be infinite.

km(drug = "D_166_4:5", split = "IGHV", t="OS",
   title=c("(IGHV",  "wt)" , "mut)"), hr="bl", c="maxstat" )

## PRT062607 HCl cutpoint for OS:  0.87

24.3 Multivariate Cox-model

extractSome <- function(x) {
  sumsu <- summary(x)
  data.frame(
    `p-value`      = 
      sprintf("%6.3g", sumsu[["coefficients"]][, "Pr(>|z|)"]),
    `HR`           = 
      sprintf("%6.3g", signif( sumsu[["coefficients"]][, "exp(coef)"], 2) ), 
    `lower 95% CI` = 
      sprintf("%6.3g", signif( sumsu[["conf.int"]][, "lower .95"], 2) ),
    `upper 95% CI` = 
      sprintf("%6.3g", signif( sumsu[["conf.int"]][, "upper .95"], 2),
              check.names = FALSE) )
}

Define covariates and effects.

24.3.1 Chemotherapies

24.3.1.1 Fludarabine

surv1 <- coxph(
  Surv(T6, died) ~  
    age +
    as.factor(IC50beforeTreatment) +
    as.factor(trisomy12) +
    as.factor(del11q22.3) +
    as.factor(del17p13) +
    as.factor(TP53) +
    IGHVwt +
    flu15,       # continuous
    #dox15 +     # continuous
    #flu15:TP53,
    #TP53:dox15,
  data = survT )
extractSome(surv1)

##    p.value     HR lower.95..CI upper.95..CI
## 1    0.403    1.2         0.81          1.7
## 2 0.000285   0.11        0.033         0.36
## 3   0.0196    5.3          1.3           22
## 4    0.784    1.2         0.38          3.6
## 5    0.962      1          0.3          3.6
## 6     0.47   0.63         0.18          2.2
## 7   0.0516    2.8         0.99          7.8
## 8    0.085   0.77         0.57            1

cat(sprintf("%s patients considerd in the model; number of events %1g\n", 
            summary(surv1)$n, summary(surv1)[6] ) )

## 156 patients considerd in the model; number of events 24

24.3.1.2 Doxorubicine

surv2 <- coxph(
  Surv(T6, died) ~   #as.factor(survT$TP53) , data=survT )
    age +
    as.factor(IC50beforeTreatment) +
    as.factor(trisomy12) +
    as.factor(del11q22.3) +
    as.factor(del17p13) +
    as.factor(TP53) +
    IGHVwt +
    #flu15 +    # continuous
    dox15 ,     # continuous
    #flu15:TP53 ,
    #TP53:dox15,
  data = survT )
extractSome(surv2)

##    p.value     HR lower.95..CI upper.95..CI
## 1    0.121    1.3         0.92            2
## 2 0.000186   0.11        0.036         0.36
## 3   0.0123    6.1          1.5           25
## 4    0.992      1         0.34            3
## 5     0.92    1.1          0.3          3.8
## 6    0.716   0.81         0.26          2.5
## 7   0.0393    2.9          1.1          8.1
## 8   0.0347   0.52         0.28         0.95

cat(sprintf("%s patients considerd in the model; number of events %1g\n", 
            summary(surv2)$n, summary(surv2)[6] ) )

## 156 patients considerd in the model; number of events 24

24.3.2 Targeted therapies

24.3.2.1 Ibrutinib TTT

surv4 <- coxph(
  Surv(T5, treatedAfter) ~ 
    age +
    as.factor(IC50beforeTreatment) +
    as.factor(trisomy12) +
    as.factor(del11q22.3) +
    as.factor(del17p13) +
    IGHVwt +
    ibr45 +
    IGHVwt:ibr45,
  data = survT )

extractSome(surv4)

##    p.value     HR lower.95..CI upper.95..CI
## 1    0.284   0.91         0.76          1.1
## 2 9.61e-08   0.22         0.13         0.39
## 3    0.246    1.5         0.76            3
## 4    0.778   0.91         0.48          1.7
## 5     0.31    1.4         0.74          2.6
## 6   0.0111    2.2          1.2          3.9
## 7   0.0289    1.6          1.1          2.5
## 8   0.0417    0.6         0.37         0.98

cat(sprintf("%s patients considerd in the model; number of events %1g\n", 
            summary(surv4)$n, summary(surv4)[6] ) )

## 152 patients considerd in the model; number of events 83

24.3.2.2 Idelalisib TTT

surv6 <- coxph(
  Surv(T5, treatedAfter) ~ 
    age +
    as.factor(IC50beforeTreatment) +
    as.factor(trisomy12) +
    as.factor(del11q22.3) +
    as.factor(del17p13) +
    IGHVwt +
    ide45 +
    IGHVwt:ide45,
  data = survT )

extractSome(surv6)

##    p.value     HR lower.95..CI upper.95..CI
## 1    0.302   0.91         0.76          1.1
## 2 3.44e-08   0.21         0.12         0.37
## 3    0.214    1.5         0.78            3
## 4    0.667   0.87         0.46          1.7
## 5    0.308    1.4         0.74          2.6
## 6  0.00789    2.4          1.3          4.7
## 7   0.0421    1.6            1          2.6
## 8    0.067    0.6         0.35            1

cat(sprintf("%s patients considerd in the model; number of events %1g\n",
            summary(surv6)$n, summary(surv6)[6] ) )

## 152 patients considerd in the model; number of events 83

24.3.2.3 PRT062607 HCl TTT

surv8 <- coxph(
  Surv(T5, treatedAfter) ~ 
    age +
    as.factor(IC50beforeTreatment) +
    as.factor(trisomy12) +
    as.factor(del11q22.3) +
    as.factor(del17p13) +
    IGHVwt +
    prt45 +
    IGHVwt:prt45,
  data = survT )

extractSome(surv8)

##    p.value     HR lower.95..CI upper.95..CI
## 1    0.402   0.93         0.78          1.1
## 2 2.19e-08    0.2         0.12         0.36
## 3    0.397    1.4         0.67          2.7
## 4    0.584   0.83         0.43          1.6
## 5    0.415    1.3         0.69          2.4
## 6  0.00446    2.7          1.4          5.4
## 7    0.012    1.6          1.1          2.4
## 8    0.018   0.58         0.37         0.91

cat(sprintf("%s patients considerd in the model; number of events %1g\n", 
            summary(surv8)$n, summary(surv8)[6] ) )

## 152 patients considerd in the model; number of events 83

25 End of session

sessionInfo()

## R version 3.5.1 Patched (2018-07-12 r74967)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.8-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.8-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] grid      stats4    parallel  stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] bindrcpp_0.2.2                  xtable_1.8-3                   
##  [3] scales_1.0.0                    Rtsne_0.13                     
##  [5] RColorBrewer_1.1-2              readxl_1.1.0                   
##  [7] piano_1.22.0                    pheatmap_1.0.10                
##  [9] org.Hs.eg.db_3.7.0              nat_1.8.11                     
## [11] rgl_0.99.16                     maxstat_0.7-25                 
## [13] limma_3.38.0                    knitr_1.20                     
## [15] ipflasso_0.1                    survival_2.43-1                
## [17] IHW_1.10.0                      hexbin_1.27.2                  
## [19] gtable_0.2.0                    gridExtra_2.3                  
## [21] glmnet_2.0-16                   Matrix_1.2-14                  
## [23] ggtern_3.0.0                    ggdendro_0.1-20                
## [25] ggbeeswarm_0.6.0                genefilter_1.64.0              
## [27] forestplot_1.7.2                checkmate_1.8.5                
## [29] magrittr_1.5                    doParallel_1.0.14              
## [31] iterators_1.0.10                foreach_1.4.4                  
## [33] dendsort_0.3.3                  cowplot_0.9.3                  
## [35] colorspace_1.3-2                broom_0.5.0                    
## [37] biomaRt_2.38.0                  beeswarm_0.2.3                 
## [39] abind_1.4-5                     AnnotationDbi_1.44.0           
## [41] forcats_0.3.0                   stringr_1.3.1                  
## [43] dplyr_0.7.7                     purrr_0.2.5                    
## [45] readr_1.1.1                     tidyr_0.8.2                    
## [47] tibble_1.4.2                    tidyverse_1.2.1                
## [49] ggplot2_3.1.0                   reshape2_1.4.3                 
## [51] DESeq2_1.22.0                   SummarizedExperiment_1.12.0    
## [53] DelayedArray_0.8.0              BiocParallel_1.16.0            
## [55] matrixStats_0.54.0              GenomicRanges_1.34.0           
## [57] GenomeInfoDb_1.18.0             IRanges_2.16.0                 
## [59] S4Vectors_0.20.0                Biobase_2.42.0                 
## [61] BiocGenerics_0.28.0             BloodCancerMultiOmics2017_1.2.0
## [63] BiocStyle_2.10.0               
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.1.4              proto_1.0.0             tidyselect_0.2.5       
##   [4] RSQLite_2.1.1           htmlwidgets_1.3         devtools_2.0.1         
##   [7] munsell_0.5.0           codetools_0.2-15        miniUI_0.1.1.1         
##  [10] withr_2.1.2             energy_1.7-5            highr_0.7              
##  [13] rstudioapi_0.8          robustbase_0.93-3       bayesm_3.1-0.1         
##  [16] labeling_0.3            slam_0.1-43             GenomeInfoDbData_1.2.0 
##  [19] lpsymphony_1.10.0       bit64_0.9-7             rprojroot_1.3-2        
##  [22] xfun_0.4                sets_1.0-18             R6_2.3.0               
##  [25] locfit_1.5-9.1          manipulateWidget_0.10.0 fgsea_1.8.0            
##  [28] bitops_1.0-6            assertthat_0.2.0        promises_1.0.1         
##  [31] nnet_7.3-12             processx_3.2.0          rlang_0.3.0.1          
##  [34] splines_3.5.1           lazyeval_0.2.1          acepack_1.4.1          
##  [37] BiocManager_1.30.3      yaml_2.2.0              modelr_0.1.2           
##  [40] crosstalk_1.0.0         backports_1.1.2         httpuv_1.4.5           
##  [43] Hmisc_4.1-1             relations_0.6-8         tensorA_0.36.1         
##  [46] tools_3.5.1             usethis_1.4.0           bookdown_0.7           
##  [49] nabor_0.5.0             gplots_3.0.1            sessioninfo_1.1.0      
##  [52] latex2exp_0.4.0         Rcpp_0.12.19            plyr_1.8.4             
##  [55] nat.utils_0.5.1         base64enc_0.1-3         progress_1.2.0         
##  [58] zlibbioc_1.28.0         RCurl_1.95-4.11         ps_1.2.0               
##  [61] prettyunits_1.0.2       rpart_4.1-13            haven_1.1.2            
##  [64] cluster_2.0.7-1         exactRankTests_0.8-29   fs_1.2.6               
##  [67] data.table_1.11.8       mvtnorm_1.0-8           pkgload_1.0.2          
##  [70] hms_0.4.2               mime_0.6                evaluate_0.12          
##  [73] XML_3.98-1.16           testthat_2.0.1          compiler_3.5.1         
##  [76] KernSmooth_2.23-15      crayon_1.3.4            htmltools_0.3.6        
##  [79] later_0.7.5             Formula_1.2-3           geneplotter_1.60.0     
##  [82] filehash_2.4-1          lubridate_1.7.4         DBI_1.0.0              
##  [85] MASS_7.3-51.1           boot_1.3-20             compositions_1.40-2    
##  [88] cli_1.0.1               marray_1.60.0           gdata_2.18.0           
##  [91] igraph_1.2.2            bindr_0.1.1             pkgconfig_2.0.2        
##  [94] foreign_0.8-71          xml2_1.2.0              annotate_1.60.0        
##  [97] vipor_0.4.5             webshot_0.5.1           XVector_0.22.0         
## [100] rvest_0.3.2             callr_3.0.0             digest_0.6.18          
## [103] fastmatch_1.1-0         rmarkdown_1.10          cellranger_1.1.0       
## [106] htmlTable_1.12          gtools_3.8.1            shiny_1.1.0            
## [109] nlme_3.1-137            jsonlite_1.5            desc_1.2.0             
## [112] fansi_0.4.0             pillar_1.3.0            lattice_0.20-35        
## [115] httr_1.3.1              DEoptimR_1.0-8          pkgbuild_1.0.2         
## [118] glue_1.3.0              remotes_2.0.2           fdrtool_1.2.15         
## [121] bit_1.1-14              stringi_1.2.4           blob_1.1.1             
## [124] caTools_1.17.1.1        latticeExtra_0.6-28     memoise_1.1.0

BloodCancerMultiOmics2017 - complete analysis

1 November 2018

Contents

1 Introduction

2 Characteristics of drugs and patients in the study

2.1 Drugs

2.2 Patient samples

3 Data quality control

3.1 Comparison of ATP luminescence of DMSO controls at timepoint 0 and 48 h

3.2 Reproducibility of the drug screening platform

4 Drug-induced effects on cell viability

5 Drug-drug correlation

5.1 Additional processing functions and parameters

5.2 CLL/T-PLL

5.2.1 Effect of drugs with similar target

5.3 T-PLL

5.4 MCL

6 Disease-specific drug response phenotypes

6.1 Example: dose-response curves

6.2 Example: drug effects as bee swarms

7 Target profiling of AZD7762 and PF477736

8 Global overview of the drug response landscape

8.1 Setup

8.1.1 Additional functions

8.1.2 Preprocess IGHV gene usage

8.1.3 Some drugs that we are particularly interested in

8.1.4 Feature weighting and score for dendrogram reordering

8.2 Define disease groups

8.3 Code for heatmap

8.3.1 Matrix row / column clustering

8.3.2 Prepare sample annotations

8.3.3 Define the annotation colors

8.3.4 Heatmap drawing function

8.4 Draw the heatmaps

8.4.1 Identify places where gaps between the rows should be

9 Relative drug effects - ternary diagrams

9.1 Additional settings

9.2 Calculating the coordinates

9.3 Plot ternaries

9.3.1 BCR drugs

9.3.2 BTK & MEK & MTOR

9.3.3 PI3K & MEK & MTOR

9.3.4 SYK & MEK & MTOR

10 Comparison of gene expression responses to drug treatments

10.1 Differential expression using Limma

11 Co-sensitivity patterns of the four response groups

11.1 Methodology of building groups

11.2 Mean of each group

11.3 Bee swarm plots for groups

12 Kaplan-Meier plot for groups (time from sample to next treatment)

13 Incidence of somatic gene mutations and CNVs in the four groups

13.1 Test gene mutations vs. groups

14 Differences in gene expression profiles between drug-response groups

14.1 Cytokine / chemokine expression in mTOR group

15 Response to cytokines in CLL

16 Gene set enrichment analysis on BTK, mTOR, MEK groups

16.1 Preprocessing RNAseq data

16.2 Differential gene expression

16.3 Gene set enrichment analysis

16.3.1 Functions for enrichment analysis and plots

16.3.2 Geneset enrichment based on Hallmark set (H)

16.3.3 Geneset enrichment based on oncogenic signature set (C6)

16.4 Everolimus response VS gene expression (within mTOR group)

16.4.1 Correlation test

16.4.2 Enrichment heatmaps for mTOR group with overlapped genes indicated

16.5 Ibrutinib response VS gene expression (within BTK group)

16.5.1 Correlation test

16.5.2 Enrichment heatmaps for BTK group with overlapped genes indicated

16.5.3 Selumetinib response VS gene expression (within MEK group)

16.5.4 Correlation test

16.5.5 Enrichment heatmaps for MEK group

17 Single associations of drug response with gene mutation or type of disease (IGHV included)

17.1 Function which test associations of interest

17.2 Associations of ex vivo drug responses with genomic features in CLL

17.2.1 Prepare objects for testing

17.2.2 Assesment of importance of batch effect

17.2.3 Associations of drug response with mutations in CLL

17.2.4 Volcano plots: summary of the results

17.3 Associations of drug responses with genomic features in CLL independently of IGHV status

17.4 Drug response dependance on cell origin of disease

8.1.2 Preprocess `IGHV gene usage`