Interpreting the residual variance

In general, I recommend against interpreting the fraction of variance explained by residuals. This fraction is driven by:

  1. the particulars of the study design
  2. measurement precision (e.g. high read counts give more precise measurements)
  3. biological variability
  4. technical variability (e.g. batch effects).

If you have additional variables that explain variation in measured gene expression, you should include them to avoid confounding with your variable of interest. But a particular residual fraction is not ‘good’ or ‘bad’, and it is not a good metric for deciding whether more variables should be included.
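To illustrate why the residual fraction is hard to interpret, here is a minimal simulation sketch in base R. It is not how this package computes variance fractions: it uses a simple sum-of-squares decomposition from `lm()` and `anova()`, and the names `residual_fraction`, `frac_precise` and `frac_noisy`, as well as all effect sizes, are hypothetical choices for illustration. The group effect and the biological variability are held fixed, and only the measurement noise changes.

```r
# Illustrative sketch only: base R, one simulated gene, hypothetical parameters.
set.seed(1)

n <- 200
group <- factor(rep(c("A", "B"), each = n / 2))

# Held fixed across both scenarios
signal <- ifelse(group == "A", 0, 1)   # true group effect
biological <- rnorm(n, sd = 0.5)       # biological variability

# Fraction of the total sum of squares left in the residuals
# after fitting expression ~ group, for a given measurement noise level
residual_fraction <- function(noise_sd) {
  y <- signal + biological + rnorm(n, sd = noise_sd)
  ss <- anova(lm(y ~ group))[["Sum Sq"]]  # c(group, Residuals)
  ss[2] / sum(ss)
}

frac_precise <- residual_fraction(noise_sd = 0.1)  # precise measurements
frac_noisy   <- residual_fraction(noise_sd = 2)    # noisy measurements

print(c(precise = frac_precise, noisy = frac_noisy))
```

In this sketch the noisy scenario should show a larger residual fraction even though the underlying biology and the model are identical, so the number by itself says little about whether a variable is missing from the model.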

Current GitHub issues

See the GitHub page for up-to-date responses to users’ questions.

Session Info

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /media/volume/teran2_disk/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB             
##  [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.49         cachem_1.1.0     
##  [6] knitr_1.49        htmltools_0.5.8.1 rmarkdown_2.29    lifecycle_1.0.4   cli_3.6.3        
## [11] sass_0.4.9        jquerylib_0.1.4   compiler_4.4.1    tools_4.4.1       evaluate_1.0.1   
## [16] bslib_0.8.0       yaml_2.3.10       rlang_1.1.4       jsonlite_1.8.9