Galaxy is an open, web-based platform for data-intensive biomedical research. It provides an easy-to-use web interface and can expose bioinformatics workflows written in any programming language.
Normally, in order to expose new functionality (a tool in Galaxy parlance) in a Galaxy instance, you have to manually create an XML file with information about the function, and modify an additional XML file.
The RGalaxy
package automates this process, pulling most of the necessary
information from the function itself and its manual page (you provide the
remaining information as arguments to the galaxy
function).
Let's say you want to create a Galaxy tool that adds two numbers.
First, load RGalaxy
:
library(RGalaxy)
Then write a function like this:
addTwoNumbers <-
function(
number1=GalaxyNumericParam(required=TRUE),
number2=GalaxyNumericParam(required=TRUE),
sum=GalaxyOutput("sum", "txt"))
{
cat(number1 + number2, file=sum)
}
There are a few things to notice about this function:
R
's numeric
type, we are using a special
class called GalaxyNumericParam
. This is because Galaxy
(unlike R
) needs to know the type of each parameter,
as well as other informationrequired=TRUE
tells Galaxy not to allow empty values.numeric
values:t <- tempfile()
addTwoNumbers(2, 2, t)
readLines(t, warn=FALSE)
## [1] "4"
We're almost ready to tell Galaxy about our function, but first we
need to document it with a manual page. RGalaxy
will use information
in this page to create the Galaxy tool, and the man page will also
be useful to anyone who wants to run your function in R
.
The man page might look like this:
\name{addTwoNumbers}
\alias{addTwoNumbers}
\title{Add two numbers}
\description{
An example function that can be made into a Galaxy tool.
Takes two numbers, adds them, and returns a file containing
the result.
}
\usage{
addTwoNumbers(number1=GalaxyNumericParam(required=TRUE),
number2=GalaxyNumericParam(required=TRUE),
sum=GalaxyOutput("sum", "txt"))
}
\arguments{
\item{number1}{
The first number to add.
}
\item{number2}{
The second number to add.
}
\item{sum}{
Where the result of the addition should be written.
}
}
\value{
invisible(NULL)
}
\seealso{
\code{\link{galaxy}},
\code{\link{GalaxyConfig}}, \code{\link{GalaxyOutput}},
\code{\link{addTwoNumbers}}
}
\examples{
t <- tempfile()
addTwoNumbers(2, 2, t)
readLines(t, warn=FALSE)
}
Some things to note about this man page:
name
, alias
, description
, title
, usage
, and
arguments
sections are required. The details
section
is not required but its use is encouraged (RGalaxy
will
notify you if this section is missing).examples
section, this section
is only useful to people running your function via R
.
This section is not used by RGalaxy
.Before we can tell Galaxy about our function, we have to install Galaxy.
The Galaxy Installation page
gives full instructions, but in a nutshell, you can install
Galaxy as follows (you may need to install
Mercurial, which provides the hg
command):
hg clone https://bitbucket.org/galaxy/galaxy-dist/
The directory where you just installed Galaxy (a full path ending in
galaxy-dist
) is your “Galaxy Home” directory, represented by
galaxyHome
in the following code snippet.
Now we point Galaxy to the function we just wrote:
galaxy("addTwoNumbers",
galaxyConfig=
GalaxyConfig(galaxyHome, "mytool", "Test Section",
"testSectionId")
)
## Warning: Not enough information to create a functional test.
## Note: Did not find section 'Details' in man page.
## [1] "/tmp/RtmpW01bNp/Rbuild74406e82df89/RGalaxy/vignettes/tools/mytool/addTwoNumbers.xml"
Notice the warning about functional tests. We'll cover that later in the vignette.
The galaxy
function notifies you that the details
section
of the man page is empty. It also returns the path to the
XML tool wrapper it created.
To start Galaxy, open a command window and change to your Galaxy home directory (defined earlier). Then issue this command:
./run.sh –reload
If Galaxy is already running, you should stop it
(with control-C) and restart it with the command above.
Galaxy should always be restarted after running the
galaxy
function.
You can now access Galaxy at http://localhost:3000.
If you click on “Test Section” and then “Add Two Numbers”, you should see something like Figure 1.
Things to notice about this:
RGalaxy
has generated a tool in which each parameter
has some explanatory text that comes from our man page.name
to galaxy()
).GalaxyNumericParam
in our
function.required=TRUE
.If we enter 10 and 5, then click “Execute”, Galaxy will run and when finished will show 'sum.txt' in the History Pane at the right. Clicking on it should show something like Figure 2. You can download the result or send it to another Galaxy tool.
We just ran Galaxy and made sure our tool worked.
It would be nice to automate this procedure so we
can know that for inputs x
and y
, our tool
will always produce output d
.
With a couple of small additions, we can accomplish this. Our function will have a self-contained test.
Also, when submitting tools to the public Galaxy instance, functional tests like this are required.
Here is our addTwoNumbers
function again, this time with
a functional test:
addTwoNumbersWithTest <-
function(
number1=GalaxyNumericParam(required=TRUE, testValues=5L),
number2=GalaxyNumericParam(required=TRUE, testValues=5L),
sum=GalaxyOutput("sum", "txt"))
{
cat(number1 + number2, file=sum)
}
The only visible difference is that we've added a testValues
argument to each input parameter.
Another, subtler difference is that we have added
a file in our package called inst/functionalTests/addTwoNumbersWithTest/sum
,
which contains the expected output of the function.
By using this convention, we ensure RGalaxy can find the file.
Does the function pass its functional test?
runFunctionalTest(addTwoNumbersWithTest)
## [1] TRUE
Note that this just runs the function in R
, it does not test it
inside a running Galaxy. But because the functional test infrastructure is
present in the XML file generated by RGalaxy
, you can do that from your
Galaxy home directory as follows:
./run_functional_tests.sh -id addTwoNumbersWithTest
The output of the test will be written to run_functional_tests.html
.
Note that R
doesn't always produce the same output each time,
even though the files may look identical. The pdf
function
in particular may produce different files. You can use the png
function as a workaround.
We've glossed over it so far, but the addTwoNumbers()
function
and its man page live a package (the RGalaxy
package in this case).
It is possible to expose in Galaxy a function that does not
live in a package, but you have to provide a lot of extra information.
We recommend that the functions you expose live in a package
(and be exported in your NAMESPACE file).
library()
within your function.gstop()
. Error messages will be seen
by the Galaxy user. Also use gwarning()
and gmessage()
for
warnings and informational messages.Galaxy runs tools by invoking scripts in various languages
at the command line. These scripts are generally self-contained.
Sometimes it can take a long time for the script to load its
dependencies. Sometimes this takes longer than the actual
work that the script is supposed to do. We can stop
waiting for the script to load its dependencies if
the script does its work on a remote instance of R
where
the dependencies have already been loaded. We accomplish this
using the
Rserve
package.
To use Rserve
, create an Rserv.conf
file that contains statements
like this:
eval library(LongLoadingPackage1)
eval library(LongLoadingPackage2)
Replace the package names with the packages your function uses that take a long time to load.
Start Rserve as follows:
R CMD Rserve --vanilla --RS-conf Rserv.conf
Re-run Galaxy on your function, specifying that Rserv should be used:
galaxy("addTwoNumbersWithTest",
galaxyConfig=
GalaxyConfig(galaxyHome, "mytool", "Test Section",
"testSectionId"),
RserveConnection=RserveConnection()
)
Install the RSclient package:
source("http://bioconductor.org/biocLite.R")
biocLite("RSclient", siteRepos="http://www.rforge.net")
Restart Galaxy if it is already running. Your function should be much faster.
You can run Rserve on a different machine (and on a different port)
by passing this information to the RserveConnection()
function:
RserveConnection(host="mymachine", port=2012L)
## An object of class "RserveConnection"
## Slot "host":
## [1] "mymachine"
##
## Slot "port":
## [1] 2012
Note that the other machine should have shared disk space with the machine where you are running Galaxy.
Suppose you have some Affymetrix probe IDs and you want to look up the PFAM and SYMBOL names for them. It's quite easy to write a function to expose this in Galaxy:
probeLookup <-
function(
probe_ids=GalaxyCharacterParam(
required=TRUE,
testValues="1002_f_at 1003_s_at"),
outputfile=GalaxyOutput("probeLookup", "csv"))
{
suppressPackageStartupMessages(library(hgu95av2.db))
ids <- strsplit(probe_ids, " ")[[1]]
results <- select(hgu95av2.db, keys=ids, columns=c("SYMBOL","PFAM"),
keytype="PROBEID")
write.csv(results, file=outputfile)
}
Behind the scenes, we've also written a man page for the
function, and put a test fixture in our package (which can be
found at
inst/functionalTests/probeLookup/outputfile
).
Let's run it and make sure it works:
runFunctionalTest(probeLookup)
## 'select()' returned 1:1 mapping between keys and columns
## [1] TRUE