RaMS and friends

Table of contents:

The strength of RaMS is its simple data format. Table-like data structures are common in most programming languages, and they can always be converted to the nigh-universal matrix format. The goal of this vignette is to illustrate this strength by exporting MS data to several formats that can be used outside of R.

Standard export to CSV

As with all rectangular data, RaMS objects can be easily exported to CSV files with base R functions. This works best with a few chromatograms at a time, as the millions of data points found in most MS files can overwhelm common file readers.

library(RaMS)

# Locate an MS file
single_file <- system.file("extdata", "LB12HL_AB.mzML.gz", package = "RaMS")

# Grab the MS data
msdata <- grabMSdata(single_file, grab_what = "everything")
## 
## Reading file LB12HL_AB.mzML.gz... 0.21 secs 
## Reading MS1 data...0.25 secs 
## Reading MS2 data...0.02 secs 
## Reading BPC...0.08 secs 
## Reading TIC...0.08 secs 
## Reading file metadata...0.13 secs 
## Binding files together into single data.table
## Total time: 0.77 secs
# Write out MS1 data to .csv file
write.csv(x = msdata$MS1, file = "MS1_data.csv")

# Clean up afterward
file.remove("MS1_data.csv")
## [1] TRUE

Fancier export to Excel

Excel workbooks are a common format because of their intuitive GUI and widespread adoption. They can also encode more information than CSV files due to their multiple “sheets” within a single workbook - perfect for encoding both MS1 and MS2 information in one place. This vignette uses the openxlsx package, although there are several alternatives with identical functionality.

library(openxlsx)

# Locate an MS2 file
MS2_file <- system.file("extdata", "S30657.mzML.gz", package = "RaMS")

# Grab the MS1 and MS2 data
msdata <- grabMSdata(MS2_file, grab_what=c("MS1", "MS2"))
## 
## Reading file S30657.mzML.gz... 0.35 secs 
## Reading MS1 data...0.19 secs 
## Reading MS2 data...0.06 secs 
## Binding files together into single data.table
## Total time: 0.61 secs
# Write out MS data to Excel file
# openxlsx writes each object in a list to a unique sheet
# Produces one sheet for MS1 and one for MS2
write.xlsx(msdata, file = "MS2_data.xlsx")

# Clean up afterward
file.remove("MS2_data.xlsx")
## [1] TRUE

Exporting to SQL database

For more robust data processing and storage, or to work with larger-than-memory data sets, SQL databases are an excellent choice. This vignette will demo the RSQLite package’s engine, although several other database engines have similar functionality.

library(DBI)
# Get data from multiple files to show off
mzml_files <- system.file(c("extdata/LB12HL_AB.mzML.gz", 
                            "extdata/LB12HL_CD.mzML.gz"), 
                          package = "RaMS")
msdata <- grabMSdata(mzml_files)
##   |                                                                              |                                                                      |   0%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================================================| 100%
## Total time: 1.34 secs
# Create the sqlite database and connect to it
MSdb <- dbConnect(RSQLite::SQLite(), "MSdata.sqlite")

# Export MS1 and MS2 data to sqlite tables
dbWriteTable(MSdb, "MS1", msdata$MS1)
dbWriteTable(MSdb, "MS2", msdata$MS2)
dbListTables(MSdb)
## [1] "MS1" "MS2"
# Perform a simple query to ensure data was exported correctly
dbGetQuery(MSdb, 'SELECT * FROM MS1 LIMIT 3')
##      rt       mz        int          filename
## 1 4.009 139.0503 1800550.12 LB12HL_AB.mzML.gz
## 2 4.009 148.0967  206310.81 LB12HL_AB.mzML.gz
## 3 4.009 136.0618   71907.15 LB12HL_AB.mzML.gz
# Perform EIC extraction in SQL rather than in R
EIC_query <- 'SELECT * FROM MS1 WHERE mz BETWEEN :lower_bound AND :upper_bound'
query_params <- list(lower_bound=118.086, upper_bound=118.087)
EIC <- dbGetQuery(MSdb, EIC_query, params = query_params)

# Append with additional files
extra_file <- system.file("extdata", "LB12HL_EF.mzML.gz", package = "RaMS")
extra_msdata <- grabMSdata(extra_file, grab_what = "everything")
## 
## Reading file LB12HL_EF.mzML.gz... 0.2 secs 
## Reading MS1 data...0.13 secs 
## Reading MS2 data...0.02 secs 
## Reading BPC...0.08 secs 
## Reading TIC...0.1 secs 
## Reading file metadata...0.14 secs 
## Binding files together into single data.table
## Total time: 0.67 secs
dbGetQuery(MSdb, 'SELECT COUNT(*) FROM MS1') # Initial number of rows
##   COUNT(*)
## 1    42313
dbAppendTable(MSdb, "MS1", extra_msdata$MS1)
## [1] 22124
# Confirm three different files exist in DB
dbGetQuery(MSdb, 'SELECT DISTINCT filename FROM MS1')
##            filename
## 1 LB12HL_AB.mzML.gz
## 2 LB12HL_CD.mzML.gz
## 3 LB12HL_EF.mzML.gz
# Confirm new rows have been added
dbGetQuery(MSdb, 'SELECT COUNT(*) FROM MS1')
##   COUNT(*)
## 1    64437
# Disconnect after export
dbDisconnect(MSdb)

# Clean up afterward
unlink("MSdata.sqlite")

Interfacing with Python via reticulate

R and Python are commonly used together, and the reticulate package makes this even easier by enabling a Python interpreter within R. RStudio, in which this vignette was written, supports both R and Python code chunks as shown below.

R code chunk: {r}

# Locate a couple MS files
data_dir <- system.file("extdata", package = "RaMS")
file_paths <- list.files(data_dir, pattern = "HL.*mzML", full.names = TRUE)

msdata <- grabMSdata(files = file_paths, grab_what = "BPC")$BPC
##   |                                                                              |                                                                      |   0%  |                                                                              |=======================                                               |  33%  |                                                                              |===============================================                       |  67%  |                                                                              |======================================================================| 100%
## Total time: 0.44 secs

Python code chunk: {python}

# Not run to pass R CMD check on GitHub
# Make sure python, matplotlib, and seaborn are installed

import seaborn as sns
import matplotlib.pyplot as plt

sns.relplot(data=r.msdata, kind="line", x="rt", y="int", hue="filename")
plt.show()