Workflow4Metabolomics (W4M) provides tools to help people to process metabolomic datasets. Its preferred Chanel is the web-based interface Galaxy, particularly suited for analysts with no skill regarding software programing nor programing language in general. Galaxy also has the advantage to be compatible with a large variety of languages, enabling to combine tools coded in various programing languages, which is a huge advantage regarding to the diversity of steps (and thus tools) needed for comprehensive metabolomic data processing.
Nonetheless, a significant part of the tools provided by W4M are coded using R. Due to the diversity of tools, W4M chose to adopt common standards across its tools, in particular regarding data table formats and basics data checks. It also intents to gather utility functions to help the construction of R-based Galaxy tools using W4M standards, along with functions to easily handle basic actions given the W4M table data format.
The W4MRUtils
package is meant to gather such functions, to enable W4M R-based Galaxy tools to have easy access to these standardised approaches.
Some functions are closely linked to Galaxy integration as the parse_args
function for example, but a large variety of functions can be helpful independently of Galaxy. To use these functions, it is useful to understand the choices made by W4M, in particular regarding the formats, as these will be often used as input parameters for functions.
The main format aspect to consider is the W4M 3-tables format. Metabolomic data, similarly to other types of ’omics data, can be divided in 3 main types of information:
sampleMetadata
in short): it corresponds to information regarding the samples that have been analysed with an analytical device; this covers any information about the samples that is not the analytical measures themselves (examples: studied biological groups, batches of analysis)dataMatrix
in short): it corresponds to the intensities measured using the analytical device used; it is a table with intensity values for each sample and for each analytical variablevariableMetadata
in short): it corresponds to information regarding the variables that have been measured through the analytical device; this covers any information about the variables that is not the analytical measures themselves (examples: m/z ratios for MS data, chemical shift for NMR data)These 3 types of information can be stocked in 3 distinct tables. In the package, the chosen format is the following:
These 3 tables are standard inputs and/or outputs for several of the functions found in this package. Consequently, when you need to specify sample-wise or variable-wise information, you will generally be invited to stock it in one of these tables (same goes for outputs of functions, where to find the generated information).
There are several types of functions in the package. Globally, you will find functions enabling the following:
logger$info("Parsing parameters...")
#> [32m[ info-A Test Tool-16:31:35] - Parsing parameters...[m
args <- optparse_parameters(
input = optparse_character(),
output = optparse_character(),
threshold = optparse_numeric(default = 10),
args = commandArgs()
)
logger$info("Parsing parameters OK.")
#> [32m[ info-A Test Tool-16:31:35] - Parsing parameters OK.[m
show_galaxy_header(
TOOL_NAME,
"1.2.0",
args = args,
logger = logger,
show_sys = FALSE
)
#> [32m[ info-A Test Tool-16:31:35] - Job starting time:[m
#> [32m[ info-A Test Tool-16:31:35] - ven. 08 sept. 2023 16:31:35[m
#> [32m[ info-A Test Tool-16:31:35] - ------------------------------------------------------------------------------[m
#> [32m[ info-A Test Tool-16:31:35] - Parameters used in A Test Tool - version 1.2.0:[m
#> [32m[ info-A Test Tool-16:31:35] - [m
#> [32m[ info-A Test Tool-16:31:35] - List of 4[m
#> [32m[ info-A Test Tool-16:31:35] - $ help : logi FALSE[m
#> [32m[ info-A Test Tool-16:31:35] - $ input : chr "/tmp/RtmpsJ5uhs/./input.csv"[m
#> [32m[ info-A Test Tool-16:31:35] - $ output : chr "/tmp/RtmpsJ5uhs/./output.csv"[m
#> [32m[ info-A Test Tool-16:31:35] - $ threshold: num 2[m
#> [32m[ info-A Test Tool-16:31:35] - ------------------------------------------------------------------------------[m
check_parameters <- function(args, logger) {
logger$info("Checking parameters...")
check_one_character(args$input)
check_one_character(args$output)
check_one_numeric(args$threshold)
if (args$threshold < 1) {
logger$errorf(
"The threshold is too low (%s). Cannot continue", args$threshold
)
stopf("Threshold = %s is tool low.", args$threshold)
}
if (args$threshold < 3) {
logger$warningf(
paste(
"The threshold is very low (%s).",
"This may lead to erroneous results."
),
args$threshold
)
}
logger$info("Parameters OK.")
}
read_input <- function(args, logger) {
logger$infof("Reading file in %s ...", args$input)
return(read.csv(args$input))
}
process_input <- function(args, logger, table) {
filter <- table[1, ] > args$threshold
logger$infof(
"%s values filtered (%s%%)",
length(filter),
round(length(filter) / nrow(table) * 100, 2)
)
if (length(filter) == nrow(table)) {
logger$warning("All values have been filtered.")
}
return(table[which(table[, 1] > args$threshold), ])
}
write_output <- function(args, logger, output) {
logger$info("Writing output file...")
write.table(output, args$output)
}
# check_parameters(args, logger$sublogger("Param Checking"))
# input <- read_input(args, logger$sublogger("Inputs Reader"))
# print(input)
# output <- process_input(args, logger$sublogger("Inputs Processing"), input)
# print(output)
# write_output(args, logger$sublogger("Output Writter"), output)
# show_galaxy_footer(TOOL_NAME, "1.2.0", logger = logger, show_packages = FALSE)