This repository contains the R code and package for the mi4p methodology (Multiple Imputation for Proteomics), proposed by Marie Chion, Christine Carapito and Frédéric Bertrand (2021) in Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics, https://arxiv.org/abs/2108.07086.
The following material is available on the Github repository of the package https://github.com/mariechion/mi4p/.
The Functions
folder contains all the functions used
for the workflow.
The Simulation-1
, Simulation-2
and
Simulation-3
folders contain all the R scripts and data
used to conduct simulated experiments and evaluate our
methodology.
The Arabidopsis_UPS
and Yeast_UPS
folders contain all the R scripts and data used to challenge our
methodology on real proteomics datasets. Raw experimental data were
deposited with the ProteomeXchange Consortium via the PRIDE partner
repository with the dataset identifiers PXD003841 and
PXD027800.
This website and these examples were created by M. Chion, C. Carapito and F. Bertrand.
You can install the released version of mi4p from CRAN with:
You can install the development version of mi4p from github with:
set.seed(4619)
datasim <- protdatasim()
str(datasim)
#> 'data.frame': 200 obs. of 11 variables:
#> $ id.obs: int 1 2 3 4 5 6 7 8 9 10 ...
#> $ X1 : num 99.6 99.9 100.2 99.8 100.4 ...
#> $ X2 : num 97.4 101.3 100.3 100.2 101.7 ...
#> $ X3 : num 100.3 100.9 99.1 101.2 100.6 ...
#> $ X4 : num 99.4 99.2 98.5 99.1 99.5 ...
#> $ X5 : num 98.5 99.7 100 100.2 100.7 ...
#> $ X6 : num 200 199 199 200 199 ...
#> $ X7 : num 200 200 202 199 199 ...
#> $ X8 : num 202 199 200 199 201 ...
#> $ X9 : num 200 200 199 201 200 ...
#> $ X10 : num 200 198 200 201 199 ...
#> - attr(*, "metadata")='data.frame': 10 obs. of 3 variables:
#> ..$ Sample.name: chr [1:10] "X1" "X2" "X3" "X4" ...
#> ..$ Condition : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
#> ..$ Bio.Rep : int [1:10] 1 2 3 4 5 6 7 8 9 10
MV1pct.impMLE.mi4limma.res <- mi4limma(qData = apply(MV1pct.impMLE,1:2,mean),
sTab = attr(datasim, "metadata"),
VarRubin = sqrt(MV1pct.impMLE.VarRubin.S2))
MV1pct.impMLE.mi4limma.res
(simplify2array(MV1pct.impMLE.mi4limma.res)$P_Value.A_vs_B_pval)[1:10]
(simplify2array(MV1pct.impMLE.mi4limma.res)$P_Value.A_vs_B_pval)[11:200]<=0.05
True positive rate
False positive rate
MV1pct.impMLE.dapar.res <-limmaCompleteTest.mod(qData = apply(MV1pct.impMLE,1:2,mean), sTab = attr(datasim, "metadata"))
MV1pct.impMLE.dapar.res
Simulate a list of 100 datasets.
set.seed(4619)
norm.200.m100.sd1.vs.m200.sd1.list <- lapply(1:100, protdatasim)
metadata <- attr(norm.200.m100.sd1.vs.m200.sd1.list[[1]],"metadata")
100 datasets with parallel comuting support. Quite long to run even with parallel computing support.
MV1pct.impMLE.VarRubin.S2 <- lapply(1:length(MV1pct.impMLE.VarRubin.Mat), function(id.dataset){
print(paste("Dataset", id.dataset, "out of",length(MV1pct.impMLE.VarRubin.Mat), Sys.time()))
as.numeric(lapply(MV1pct.impMLE.VarRubin.Mat[[id.dataset]], function(aaa){
DesMat = mi4p::make.design(metadata)
return(max(diag(aaa)%*%t(DesMat)%*%DesMat))
}))
})
MV1pct.impMLE.mi4limma.res <- foreach(iforeach = 1:100, .errorhandling = 'stop', .verbose = T) %dopar%
mi4limma(qData = apply(MV1pct.impMLE[[iforeach]],1:2,mean),
sTab = metadata,
VarRubin = sqrt(MV1pct.impMLE.VarRubin.S2[[iforeach]]))
MV1pct.impMLE.dapar.res <- foreach(iforeach = 1:100, .errorhandling = 'stop', .verbose = T) %dopar%
limmaCompleteTest.mod(qData = apply(MV1pct.impMLE[[iforeach]],1:2,mean),
sTab = metadata)