In this short tutorial we showcase a simple pipeline to create a bulkAnalyseR app using a publicly available dataset from the Gene Expression Omnibus (GEO). No prerequisites are needed: both the installation of bulkAnalyseR and the download of the data are covered below.
The example app described in this vignette can be found here.
First, install the latest version of bulkAnalyseR, starting with the CRAN and Bioconductor dependencies:
packages.cran <- c(
  "ggplot2", "shiny", "shinythemes", "gprofiler2", "stats", "ggrepel",
  "utils", "RColorBrewer", "circlize", "shinyWidgets", "shinyjqui",
  "dplyr", "magrittr", "ggforce", "rlang", "glue", "matrixStats",
  "noisyr", "tibble", "ggnewscale", "ggrastr", "visNetwork", "shinyLP",
  "grid", "DT", "scales", "shinyjs", "tidyr", "UpSetR", "ggVennDiagram"
)
new.packages.cran <- packages.cran[!(packages.cran %in% installed.packages()[, "Package"])]
if (length(new.packages.cran))
  install.packages(new.packages.cran)

packages.bioc <- c(
  "edgeR", "DESeq2", "preprocessCore", "GENIE3", "ComplexHeatmap"
)
new.packages.bioc <- packages.bioc[!(packages.bioc %in% installed.packages()[, "Package"])]
if (length(new.packages.bioc)) {
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  BiocManager::install(new.packages.bioc)
}

install.packages("bulkAnalyseR")
We start by downloading and reading in the expression matrix. Rows represent genes/features and columns represent samples (note that you need an internet connection to run the code below). The matrix is from a 2022 study on the Stem Cell transcriptional response to Microglia-Conditioned Media. For illustrative purposes, we use only four of the samples in the study: two control and two microglia samples.
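# file.path() inserts the separator that paste0() would omit
download_path <- file.path(tempdir(), "expression_matrix.csv.gz")
download.file(
  "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE178620&format=file&file=GSE178620%5Fraw%5Fabundances%2Ecsv%2Egz",
  download_path,
  mode = "wb"  # binary mode keeps the gzip file intact on Windows
)
# Keep columns 1, 2, 19 and 20: two control and two microglia samples
exp <- as.matrix(read.csv(download_path, row.names = 1))[, c(1, 2, 19, 20)]
head(exp)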
##> control_G322_G322_1 control_G322_G322_2 microglia_067MG_G322_1
##> ENSG00000223972 0 0 0
##> ENSG00000227232 51 45 25
##> ENSG00000278267 6 0 0
##> ENSG00000243485 0 0 0
##> ENSG00000284332 0 0 0
##> ENSG00000237613 0 0 0
##> microglia_067MG_G322_2
##> ENSG00000223972 0
##> ENSG00000227232 40
##> ENSG00000278267 0
##> ENSG00000243485 0
##> ENSG00000284332 0
##> ENSG00000237613 0
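Before going further, it can help to confirm that the matrix has the shape the rest of the pipeline expects. The following is a minimal sketch in base R, not part of bulkAnalyseR: `check_counts` is a hypothetical helper, and `toy` is a small stand-in built from the first three rows printed above so that the block runs without a download.

```r
# Toy stand-in for `exp`, copied from the first three rows of the output above
toy <- matrix(
  c( 0,  0,  0,  0,
    51, 45, 25, 40,
     6,  0,  0,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("ENSG00000223972", "ENSG00000227232", "ENSG00000278267"),
    c("control_G322_G322_1", "control_G322_G322_2",
      "microglia_067MG_G322_1", "microglia_067MG_G322_2")
  )
)

# Hypothetical helper: stops with an error if the matrix is not a clean
# numeric matrix of non-negative values with unique row and column names
check_counts <- function(m) {
  stopifnot(
    is.matrix(m), is.numeric(m),
    !anyNA(m),
    all(m >= 0),
    !anyDuplicated(rownames(m)),
    !anyDuplicated(colnames(m))
  )
  invisible(TRUE)
}

check_counts(toy)

# Per-sample totals give a quick view of library size
lib_sizes <- colSums(toy)
lib_sizes
```

With the real data, the same call would be `check_counts(exp)`.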
We use a very simple metadata table with just the main condition in the experiment. Detailed metadata is available for all GEO datasets and can be downloaded and used instead.
meta <- data.frame(
  name = colnames(exp),
  condition = sapply(colnames(exp), USE.NAMES = FALSE, function(nm){
    strsplit(nm, "_")[[1]][1]
  })
)
meta
##> name condition
##> 1 control_G322_G322_1 control
##> 2 control_G322_G322_2 control
##> 3 microglia_067MG_G322_1 microglia
##> 4 microglia_067MG_G322_2 microglia
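The condition label is simply the text before the first underscore in each sample name. As a sketch of an equivalent, vectorised way to derive it, and of a check that the metadata rows line up with the matrix columns, the block below uses the four sample names shown above; `sample_names` and `meta_check` are illustrative names, not objects from the tutorial.

```r
# Sample names as printed in the metadata table above
sample_names <- c("control_G322_G322_1", "control_G322_G322_2",
                  "microglia_067MG_G322_1", "microglia_067MG_G322_2")

# Drop everything from the first underscore onwards
condition <- sub("_.*$", "", sample_names)

meta_check <- data.frame(name = sample_names, condition = condition)

# The metadata must list the samples in the same order as the matrix columns
stopifnot(identical(meta_check$name, sample_names))

# Number of samples per condition
table(meta_check$condition)
```

With the real objects, the order check would read `stopifnot(identical(meta$name, colnames(exp)))`.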
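We can now denoise and normalise the data using bulkAnalyseR. As the log below shows, preprocessExpressionMatrix first runs the noisyR counts-based pipeline to remove noisy genes and then applies quantile normalisation.
exp.proc <- bulkAnalyseR::preprocessExpressionMatrix(exp, output.plot = TRUE)
##> >>> noisyR counts approach pipeline <<<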
##> The input matrix has 60671 rows and 4 cols
##> number of genes: 60671
##> number of samples: 4
##> Calculating the number of elements per window
##> the number of elements per window is 6067
##> the step size is 303
##> the selected similarity metric is correlation_pearson
##> Working with sample 1
##> Working with sample 2
##> Working with sample 3
##> Working with sample 4
##> Calculating noise thresholds for 4 samples...
##> similarity.threshold = 0.25
##> method.chosen = Boxplot-IQR
##> Denoising expression matrix...
##> removing noisy genes
##> adjusting matrix
##> >>> Done! <<<
##> Performing quantile normalisation...
##> Done!
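To illustrate what the quantile normalisation step in the log does, here is a minimal base R sketch on a toy matrix. It is not the routine bulkAnalyseR calls, and `quantile_normalise` is a hypothetical helper; in particular, ties are broken by order of appearance here, which dedicated implementations may handle differently.

```r
# Hypothetical helper: force every column onto the same distribution
quantile_normalise <- function(m) {
  # Rank of each value within its column (ties broken by position)
  ranks <- apply(m, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest values across columns
  reference <- rowMeans(apply(m, 2, sort))
  # Replace each value by the reference value at its rank
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(m)
  out
}

# Toy matrix: 4 features (rows) by 3 samples (columns), filled column-wise
toy <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8), ncol = 3)

qn <- quantile_normalise(toy)
qn

# After normalisation, every column holds the same set of values
apply(qn, 2, sort)
```

The within-column ordering of features is preserved; only the values they map to are shared across samples.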
Finally, we can create a shiny app. This example app can be found here.
bulkAnalyseR::generateShinyApp(
  shiny.dir = "shiny_GEO",
  app.title = "Shiny app for visualisation of GEO data",
  modality = "RNA",
  expression.matrix = exp.proc,
  metadata = meta,
  organism = "hsapiens",
  org.db = "org.Hs.eg.db"
)
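Once the app has been generated, it can be launched locally. The sketch below assumes that the call above wrote a runnable app into the `shiny_GEO` directory and that the shiny package is installed. The guard means nothing happens in a non-interactive session or if the directory does not exist.

```r
# Directory passed as `shiny.dir` above
app_dir <- "shiny_GEO"

# Launch only in an interactive session and only if the app was generated
if (interactive() && dir.exists(app_dir)) {
  shiny::runApp(app_dir)
}
```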
sessionInfo()
##> R version 4.2.2 (2022-10-31 ucrt)
##> Platform: x86_64-w64-mingw32/x64 (64-bit)
##> Running under: Windows 10 x64 (build 22621)
##>
##> Matrix products: default
##>
##> locale:
##> [1] LC_COLLATE=C
##> [2] LC_CTYPE=English_United Kingdom.utf8
##> [3] LC_MONETARY=English_United Kingdom.utf8
##> [4] LC_NUMERIC=C
##> [5] LC_TIME=English_United Kingdom.utf8
##>
##> attached base packages:
##> [1] stats graphics grDevices utils datasets methods base
##>
##> loaded via a namespace (and not attached):
##> [1] tidyselect_1.2.0 xfun_0.35 bslib_0.4.1
##> [4] lattice_0.20-45 splines_4.2.2 colorspace_2.0-3
##> [7] vctrs_0.5.1 generics_0.1.3 htmltools_0.5.4
##> [10] yaml_2.3.6 mgcv_1.8-41 utf8_1.2.2
##> [13] noisyr_1.0.0 rlang_1.0.6 jquerylib_0.1.4
##> [16] pillar_1.8.1 later_1.3.0 glue_1.6.2
##> [19] withr_2.5.0 DBI_1.1.3 foreach_1.5.2
##> [22] lifecycle_1.0.3 stringr_1.5.0 munsell_0.5.0
##> [25] gtable_0.3.1 codetools_0.2-18 evaluate_0.19
##> [28] labeling_0.4.2 knitr_1.41 fastmap_1.1.0
##> [31] httpuv_1.6.7 fansi_1.0.3 highr_0.9
##> [34] preprocessCore_1.60.0 Rcpp_1.0.9 xtable_1.8-4
##> [37] scales_1.2.1 promises_1.2.0.1 cachem_1.0.6
##> [40] jsonlite_1.8.4 bulkAnalyseR_1.1.0 farver_2.1.1
##> [43] mime_0.12 ggplot2_3.4.0 digest_0.6.31
##> [46] stringi_1.7.8 dplyr_1.0.10 shiny_1.7.3
##> [49] grid_4.2.2 cli_3.4.1 tools_4.2.2
##> [52] magrittr_2.0.3 philentropy_0.7.0 sass_0.4.4
##> [55] tibble_3.1.8 pkgconfig_2.0.3 Matrix_1.5-1
##> [58] ellipsis_0.3.2 assertthat_0.2.1 rmarkdown_2.18
##> [61] rstudioapi_0.14 iterators_1.0.14 R6_2.5.1
##> [64] nlme_3.1-160 compiler_4.2.2