The strength of RaMS is its simple data format.
Table-like data structures are common in most programming languages, and
they can always be converted to the nigh-universal matrix format. The
goal of this vignette is to illustrate this strength by exporting MS
data to several formats that can be used outside of R.
As with all rectangular data, RaMS objects can be easily
exported to CSV files with base R functions. This works best with a few
chromatograms at a time, as the millions of data points found in most MS
files can overwhelm common file readers.
library(RaMS)
# Locate an MS file
single_file <- system.file("extdata", "LB12HL_AB.mzML.gz", package = "RaMS")
# Grab the MS data
msdata <- grabMSdata(single_file, grab_what = "everything")
##
## Reading file LB12HL_AB.mzML.gz... 0.21 secs
## Reading MS1 data...0.25 secs
## Reading MS2 data...0.02 secs
## Reading BPC...0.08 secs
## Reading TIC...0.08 secs
## Reading file metadata...0.13 secs
## Binding files together into single data.table
## Total time: 0.77 secs
# Write out MS1 data to .csv file
write.csv(x = msdata$MS1, file = "MS1_data.csv")
# Clean up afterward
file.remove("MS1_data.csv")
## [1] TRUE
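Because a full MS1 table can hold millions of rows, it often helps to subset to a single chromatogram before writing. The sketch below is only an illustration: `toy_ms1` is a made-up data frame with the same `rt`, `mz`, `int`, and `filename` columns as `msdata$MS1`, and the m/z window and temporary file are assumptions chosen for the example. Passing `row.names = FALSE` keeps R's row index out of the file, so other tools see only the four data columns.

```r
# Toy stand-in for msdata$MS1 (hypothetical values, for illustration only)
toy_ms1 <- data.frame(
  rt = c(4.0, 4.0, 4.1, 4.1),
  mz = c(118.0865, 139.0503, 118.0866, 148.0967),
  int = c(1200, 1800550, 1350, 206310),
  filename = "toy_file.mzML.gz"
)

# Keep only the points inside a narrow m/z window (one chromatogram)
eic <- toy_ms1[toy_ms1$mz > 118.086 & toy_ms1$mz < 118.087, ]

# Write without row names so the file holds only the data columns
csv_path <- tempfile(fileext = ".csv")
write.csv(eic, file = csv_path, row.names = FALSE)

# Read the file back to confirm the round trip
eic_roundtrip <- read.csv(csv_path)
print(eic_roundtrip)

# Clean up afterward
file.remove(csv_path)
```

The same subsetting step can be applied to `msdata$MS1` before the `write.csv()` call when the full table is too large for a spreadsheet program to open.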
Excel workbooks are a common format because of their intuitive GUI
and widespread adoption. They can also encode more information than CSV
files due to their multiple “sheets” within a single workbook - perfect
for encoding both MS1 and MS2 information in one place. This vignette
uses the openxlsx package, although several alternatives offer similar
functionality.
library(openxlsx)
# Locate an MS2 file
MS2_file <- system.file("extdata", "S30657.mzML.gz", package = "RaMS")
# Grab the MS1 and MS2 data
msdata <- grabMSdata(MS2_file, grab_what=c("MS1", "MS2"))
##
## Reading file S30657.mzML.gz... 0.35 secs
## Reading MS1 data...0.19 secs
## Reading MS2 data...0.06 secs
## Binding files together into single data.table
## Total time: 0.61 secs
# Write out MS data to Excel file
# openxlsx writes each object in a list to a unique sheet
# Produces one sheet for MS1 and one for MS2
write.xlsx(msdata, file = "MS2_data.xlsx")
# Clean up afterward
file.remove("MS2_data.xlsx")
## [1] TRUE
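To check that a workbook really contains one sheet per table, the file can be read back sheet by sheet. The following is a minimal sketch using a made-up named list (`toy_msdata`) in place of real RaMS output; the MS2 column names (`premz`, `fragmz`) are assumed to mirror the RaMS layout, and the example is skipped if openxlsx is not installed.

```r
# Toy stand-ins for the MS1 and MS2 tables (hypothetical values)
toy_msdata <- list(
  MS1 = data.frame(rt = c(4.0, 4.1), mz = c(118.0865, 118.0866),
                   int = c(1200, 1350)),
  MS2 = data.frame(rt = 4.05, premz = 118.0865, fragmz = 59.07,
                   int = 300)
)

if (requireNamespace("openxlsx", quietly = TRUE)) {
  xlsx_path <- tempfile(fileext = ".xlsx")

  # A named list is written as one sheet per element, named after the element
  openxlsx::write.xlsx(toy_msdata, file = xlsx_path)

  # List the sheets, then read a single sheet back by name
  sheet_names <- openxlsx::getSheetNames(xlsx_path)
  print(sheet_names)
  ms2_back <- openxlsx::read.xlsx(xlsx_path, sheet = "MS2")
  print(ms2_back)

  # Clean up afterward
  file.remove(xlsx_path)
} else {
  message("openxlsx is not installed; skipping the Excel example")
}
```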
For more robust data processing and storage, or to work with
larger-than-memory data sets, SQL databases are an excellent choice.
This vignette will demo the RSQLite
package’s engine, although several other database engines have similar
functionality.
library(DBI)
# Get data from multiple files to show off
mzml_files <- system.file(c("extdata/LB12HL_AB.mzML.gz",
                            "extdata/LB12HL_CD.mzML.gz"),
                          package = "RaMS")
msdata <- grabMSdata(mzml_files)
##   |======================================================================| 100%
## Total time: 1.34 secs
# Create the sqlite database and connect to it
MSdb <- dbConnect(RSQLite::SQLite(), "MSdata.sqlite")
# Export MS1 and MS2 data to sqlite tables
dbWriteTable(MSdb, "MS1", msdata$MS1)
dbWriteTable(MSdb, "MS2", msdata$MS2)
dbListTables(MSdb)
## [1] "MS1" "MS2"
# Perform a simple query to ensure data was exported correctly
dbGetQuery(MSdb, 'SELECT * FROM MS1 LIMIT 3')
## rt mz int filename
## 1 4.009 139.0503 1800550.12 LB12HL_AB.mzML.gz
## 2 4.009 148.0967 206310.81 LB12HL_AB.mzML.gz
## 3 4.009 136.0618 71907.15 LB12HL_AB.mzML.gz
# Perform EIC extraction in SQL rather than in R
EIC_query <- 'SELECT * FROM MS1 WHERE mz BETWEEN :lower_bound AND :upper_bound'
query_params <- list(lower_bound=118.086, upper_bound=118.087)
EIC <- dbGetQuery(MSdb, EIC_query, params = query_params)
# Append with additional files
extra_file <- system.file("extdata", "LB12HL_EF.mzML.gz", package = "RaMS")
extra_msdata <- grabMSdata(extra_file, grab_what = "everything")
##
## Reading file LB12HL_EF.mzML.gz... 0.2 secs
## Reading MS1 data...0.13 secs
## Reading MS2 data...0.02 secs
## Reading BPC...0.08 secs
## Reading TIC...0.1 secs
## Reading file metadata...0.14 secs
## Binding files together into single data.table
## Total time: 0.67 secs
## COUNT(*)
## 1 42313
## [1] 22124
## filename
## 1 LB12HL_AB.mzML.gz
## 2 LB12HL_CD.mzML.gz
## 3 LB12HL_EF.mzML.gz
## COUNT(*)
## 1 64437
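The counts printed above (42313 rows, then 22124, then 64437) are consistent with counting the rows in `MS1`, appending the new file's rows, and counting again, since 42313 + 22124 = 64437. The sketch below shows one way such a sequence could look. It is a reconstruction under stated assumptions rather than the vignette's own code: it uses small made-up data frames (`first_batch`, `extra_batch`) and an in-memory SQLite database, and it is skipped if DBI or RSQLite is not installed.

```r
# Toy stand-ins for the existing and the additional MS1 tables (hypothetical)
first_batch <- data.frame(
  rt = c(4.0, 4.0, 4.1),
  mz = c(118.0865, 139.0503, 118.0866),
  int = c(1200, 1800550, 1350),
  filename = c("file_A", "file_A", "file_B")
)
extra_batch <- data.frame(
  rt = c(4.0, 4.1),
  mz = c(118.0864, 136.0618),
  int = c(900, 71907),
  filename = "file_C"
)

if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  # An in-memory database avoids leaving a file behind
  toy_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(toy_db, "MS1", first_batch)
  rows_before <- DBI::dbGetQuery(toy_db, "SELECT COUNT(*) AS n FROM MS1")$n

  # dbAppendTable() adds rows to an existing table and returns how many
  rows_added <- DBI::dbAppendTable(toy_db, "MS1", extra_batch)

  rows_after <- DBI::dbGetQuery(toy_db, "SELECT COUNT(*) AS n FROM MS1")$n
  files_in_db <- DBI::dbGetQuery(toy_db, "SELECT DISTINCT filename FROM MS1")
  print(c(before = rows_before, added = rows_added, after = rows_after))
  print(files_in_db)

  # Close the connection when finished
  DBI::dbDisconnect(toy_db)
} else {
  message("DBI/RSQLite not installed; skipping the SQLite example")
}
```

With a file-backed database such as `MSdata.sqlite`, the connection should likewise be closed with `dbDisconnect()` before the file is removed.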
reticulate
R and Python are commonly used together, and the reticulate
package makes this even easier by enabling a Python interpreter within
R. RStudio, in which this vignette was written, supports both R and
Python code chunks as shown below.
# Locate a couple MS files
data_dir <- system.file("extdata", package = "RaMS")
file_paths <- list.files(data_dir, pattern = "HL.*mzML", full.names = TRUE)
msdata <- grabMSdata(files = file_paths, grab_what = "BPC")$BPC
##   |======================================================================| 100%
## Total time: 0.44 secs