Important note:
This is a work-in-progress project to update the sevenbridges2 package. Accordingly, this vignette will also change as new features are implemented.
sevenbridges2 is an R package that provides an interface for the Seven Bridges public API. The supported platforms include the Seven Bridges Platform, Cancer Genomics Cloud (CGC), BioData Catalyst (BDC) and CAVATICA.
Learn more from our documentation on the Seven Bridges Platform, Cancer Genomics Cloud (CGC), BioData Catalyst (BDC) and CAVATICA.
Unlike the current sevenbridges
package that is built on
top of Reference classes,
the sevenbridges2
package is based on more modern and
lightweight R6 classes.
However, the basic idea and way of constructing API requests is largely
preserved.
To use the sevenbridges2 package, users must first authenticate by creating an Auth object and providing the necessary credentials. You can read more about the authentication types in the following sections.
The sevenbridges2
package only supports v2+ versions of
the API, since versions prior to v2 are not compatible with the Common
Workflow Language (CWL). This package provides a simple interface for
accessing and trying out various methods.
The sevenbridges2 package is available on CRAN and in the Seven Bridges GitHub repository.
To install it from CRAN, simply use:
# Install package from CRAN
install.packages("sevenbridges2")
To install the development version from the develop branch on our GitHub, use the remotes package:
# Install package from GitHub
remotes::install_github(
"sbg/sevenbridges2",
build_vignettes = TRUE, dependencies = TRUE
)
If you have trouble with pandoc
and do not want to
install it, set build_vignettes = FALSE
to avoid the
vignettes build.
There are two ways of constructing API calls. The first is to use low-level API calls, which take arguments like path, query, and body. These are documented in the API reference libraries for the Seven Bridges Platform and the CGC. An example of a low-level request to “list all projects” is shown below. In this request, you can also pass query and body as a list.
# Load the package
library("sevenbridges2")
# Authenticate
a <- Auth$new(token = "<your_token>", platform = "aws-us")
# List all projects with raw api() function
a$api(path = "projects", method = "GET")
(Advanced user option) The second way of constructing an API request is to directly use the httr2 package to make your API calls.
The sevenbridges2
package is organized by main resources
from the Seven Bridges API reference. There we have groups of endpoints
to work with projects, files, apps, tasks, invoices, volumes, etc. For
each group of resources, there is a set of operations such as
query()
, get()
and delete()
which
are common, as well as other custom operations.
Before we start, keep in mind the following:
offset and limit
Almost every API call accepts two arguments named offset and limit. By default, offset is set to 0 and limit is set to 50. As such, your API request returns the first 50 items when you list items or search for items by name. To search and list all items, use complete = TRUE if you are using the core api() function in your API request, or the all() operation within the Collection object you’ve received as the result.
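To make the windowing concrete, here is a minimal base R sketch of how offset and limit select a slice of results, and how fetching everything amounts to requesting page after page. It works on a local vector rather than on the API; get_page() and get_all() are hypothetical helpers written for this illustration and are not part of sevenbridges2.

```r
# Hypothetical helper: return one "page" of a vector, mimicking how
# offset and limit select a window of results (not part of sevenbridges2).
get_page <- function(items, offset = 0, limit = 50) {
  if (offset >= length(items)) {
    return(items[0])
  }
  first <- offset + 1
  last <- min(offset + limit, length(items))
  items[first:last]
}

# Hypothetical helper: keep requesting pages until an empty one comes back,
# which is roughly what fetching all results amounts to.
get_all <- function(items, limit = 50) {
  collected <- items[0]
  offset <- 0
  repeat {
    page <- get_page(items, offset = offset, limit = limit)
    if (length(page) == 0) break
    collected <- c(collected, page)
    offset <- offset + limit
  }
  collected
}

# Made-up data standing in for 120 items on the platform
all_items <- paste0("file_", 1:120)

first_page <- get_page(all_items)               # default window: items 1 to 50
third_page <- get_page(all_items, offset = 100) # last, shorter window
everything <- get_all(all_items)                # three pages stitched together
```

With 120 items and the default limit of 50, collecting everything takes three non-empty pages, which is why the number of API calls grows with the size of the result set.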
Collection
Every API call that returns a list of items (usually the output from query() operations), like fetching projects, files, apps etc., wraps the results into a general Collection class object containing the items field, from which users may access the items returned. The Collection class also lets you navigate between pages of results, for example, to load the next or previous page of results by calling the next_page() and prev_page() methods.
Moreover, users can fetch all results using the Collection’s all() method, which is a shortcut that sends an API call for each subsequent page and collects all results. Keep in mind the limit used, as well as the API rate limit.
# Create a collection of files
public_files <- a$files$query(project = "admin/sbg-public-data")
# Load next 50 results
public_files$next_page()
# Load previous 50 results
public_files$prev_page()
# Load all results
public_files$all()
Lastly, printing Collection
objects will print the first
10 items (if there are more than 10 items in the results) by default,
but this can be changed with the n
parameter in its
print()
function:
# Create a collection of files
public_files <- a$files$query(project = "admin/sbg-public-data")
# Default print
public_files
# Print 20 items
public_files$print(n = 20)
Search by ID
When searching by ID (usually it’s the resource’s get()
operation), your request will return your exact resource as it is
unique. Therefore, you do not have to set offset
and
limit
manually. It is good practice to find your resources
by their ID and pass this ID as an input to your task. You can find a
resource’s ID in the final part of the URL in the visual interface or
via API requests to list resources or get a resource’s details.
Search by name
Searching by name in the query() operations of resources returns all exact or partial matches, depending on the resource.
For example, to list all public files, set the project query parameter to admin/sbg-public-data. To find a specific file by name, set its name parameter to the exact value (partial search by name is not possible for files).
# Search all public files
public_files <- a$files$query(project = "admin/sbg-public-data")
# Search files by name
file_1000G_omni <- a$files$query(
project = "admin/sbg-public-data",
name = "1000G_omni2.5.b37.vcf"
)
On the other hand, partial search by name works for Projects and Apps
resources. You can set the corresponding name
or
query_terms
parameters for this use case. In order to query
public apps, set the visibility
parameter to ‘public’.
# Search all public apps containing the STAR term
public_star_apps <- a$apps$query(
visibility = "public",
query_terms = list("STAR")
)
# Search all projects that contain "demo" in the name
demo_projs <- a$projects$query(name = "demo")
Auth Object
Before you can access your account via the API, you have to provide your credentials. You can obtain your credentials in the form of an “authentication token” from the Developer Tab under Account Settings in the visual interface. Once you’ve obtained this, create an Auth object, so it remembers your authentication token and the path for the API. All subsequent requests will use these two pieces of information.
Let’s load the package first:
# Load package
library("sevenbridges2")
You have three different ways to provide your token. Choose from one of the methods below:
Direct authentication. Here you should provide your developer token and a base URL for the platform of interest as function call arguments to Auth$new(). Alternatively, you can provide the name of the platform instead of the URL; the available options are cgc, aws-us, aws-eu, ali-cn, cavatica and f4c, and the default platform is aws-us. This will create the platform authentication object and temporarily set up your token and platform base URL as the environment variables SB_AUTH_TOKEN and SB_API_ENDPOINT. This way, your token will not be directly stored in the Auth object, but you will still be able to access it by calling the get_token() method. Keep in mind that these environment variables are session-specific and are deleted when the session ends.
Authentication via system environment variables. By default, this will read the credential information from two existing system environment variables, SB_API_ENDPOINT and SB_AUTH_TOKEN, assuming that you have previously set them. Alternatively, you can specify the names of the system environment variables you want to be loaded using the sysenv_token and sysenv_url arguments.
Authentication via the user configuration file. This file, by default $HOME/.sevenbridges/credentials, provides an organized way to collect and manage all your API authentication information for Seven Bridges platforms.
If you need to be logged into multiple accounts at the same time (which can also be for different platforms), please use either the second or the third method.
Method 1: Direct authentication
This is the most common method to construct the Auth
object. For example:
# Authenticate with direct method
a <- Auth$new(platform = "aws-us", token = "<your-token>")
Method 2: Environment variables
To set the two environment variables in your system, you could use the function sbg_set_env(). For example:
# Set environment variables
sevenbridges2:::sbg_set_env(
url = "https://api.sbgenomics.com/v2/",
token = "<your_token>"
)
Note that these environment variables are session-specific.
Create an Auth
object:
# Authenticate using environment variables
a <- Auth$new(from = "env")
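The two variables can also be set and inspected with base R alone, which is useful for checking what a later Auth$new(from = "env") call would find. The following is a minimal sketch that uses only Sys.setenv() and Sys.getenv(); whether sbg_set_env() does the same internally is an assumption, and the token shown is a placeholder.

```r
# Set the two variables for the current R session only
# (the token here is a placeholder, not a real credential)
Sys.setenv(
  SB_API_ENDPOINT = "https://api.sbgenomics.com/v2",
  SB_AUTH_TOKEN = "<your_token>"
)

# Read them back; unset = NA makes a missing variable easy to detect
endpoint <- Sys.getenv("SB_API_ENDPOINT", unset = NA)
token <- Sys.getenv("SB_AUTH_TOKEN", unset = NA)

if (is.na(endpoint) || is.na(token)) {
  stop("SB_API_ENDPOINT and SB_AUTH_TOKEN must both be set")
}

# Avoid printing the token itself; confirm only that it is present
message("Endpoint: ", endpoint)
message("Token set: ", nzchar(token))
```

Variables set this way disappear when the R session ends. To keep them across sessions, they would have to be defined outside the session, for example in a startup file.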
Method 3: User configuration file
Assume we have already created the configuration file named
credentials
under the directory
$HOME/.sevenbridges/
:
[aws-us-<username>]
api_endpoint = https://api.sbgenomics.com/v2
auth_token = token_for_this_user
# another user on the same platform
[aws-us-rosalind-franklin]
api_endpoint = https://api.sbgenomics.com/v2
auth_token = token_for_this_user
[cgc]
api_endpoint = https://cgc-api.sbgenomics.com/v2
auth_token = token_for_this_user
[bdc]
api_endpoint = https://api.sb.biodatacatalyst.nhlbi.nih.gov/v2/
auth_token = token_for_this_user
To load the user profile aws-us-<username>
from
this configuration file, simply use:
# Load aws-us-<username> profile for authentication
a <- Auth$new(
from = "file",
profile_name = "aws-us-<username>"
)
If profile_name is not specified, we will try to load the profile named [default]:
# Load default profile
a <- Auth$new(from = "file")
The option based on the use of a configuration file also enables simultaneous authentication from multiple accounts. Assuming that we have a configuration file like the one listed above, and that we want to create authentication objects for two profiles (default and aws-us-<username>), we can achieve this in the following way:
# Create Auth object with 'default' account
a <- Auth$new(from = "file", profile_name = "default")
# Create Auth object with 'aws-us-<username>' account
b <- Auth$new(from = "file", profile_name = "aws-us-<username>")
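To make the structure of the credentials file concrete, the sketch below parses a file in the same layout (profile names in square brackets, followed by key = value pairs) into a named list using base R only. read_profiles() is a hypothetical helper written for this illustration; it is not part of sevenbridges2 and is not necessarily how the package reads the file.

```r
# Hypothetical helper (not part of sevenbridges2): parse an INI-style
# credentials file into a named list of profiles.
read_profiles <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  # Drop blank lines and comment lines
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  profiles <- list()
  current <- NULL
  for (line in lines) {
    if (grepl("^\\[.*\\]$", line)) {
      # A section header such as [cgc] starts a new profile
      current <- sub("^\\[(.*)\\]$", "\\1", line)
      profiles[[current]] <- list()
    } else if (!is.null(current) && grepl("=", line, fixed = TRUE)) {
      # key = value pair; split on the first "=" only
      key <- trimws(sub("=.*$", "", line))
      value <- trimws(sub("^[^=]*=", "", line))
      profiles[[current]][[key]] <- value
    }
  }
  profiles
}

# Write a small example file to a temporary location and parse it
tmp <- tempfile()
writeLines(c(
  "[cgc]",
  "api_endpoint = https://cgc-api.sbgenomics.com/v2",
  "auth_token = token_for_this_user",
  "# another platform",
  "[bdc]",
  "api_endpoint = https://api.sb.biodatacatalyst.nhlbi.nih.gov/v2/",
  "auth_token = token_for_this_user"
), tmp)

profiles <- read_profiles(tmp)
names(profiles)
profiles[["cgc"]]$api_endpoint
```

Each profile carries its own endpoint and token, which is what allows several Auth objects for different accounts or platforms to coexist in one session.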
Note: API paths (base URLs) differ for each Seven Bridges environment. Be sure to provide the correct path for the environment you are using. API paths for some of the environments are:
Platform Name | API Base URL | Short Name |
---|---|---|
Seven Bridges Platform (US) | https://api.sbgenomics.com/v2 | "aws-us" |
Seven Bridges Platform (EU) | https://eu-api.sbgenomics.com/v2 | "aws-eu" |
Seven Bridges Platform (China) | https://api.sevenbridges.cn/v2 | "ali-cn" |
Cancer Genomics Cloud (CGC) | https://cgc-api.sbgenomics.com/v2 | "cgc" |
Cavatica | https://cavatica-api.sbgenomics.com/v2 | "cavatica" |
BioData Catalyst Powered by Seven Bridges | https://api.sb.biodatacatalyst.nhlbi.nih.gov/v2 | "f4c" |
Please check
vignette("Authentication_and_Billing", package = "sevenbridges2")
for more technical details about all available authentication
methods.
This call returns information about your account.
# Get currently authenticated user info
a$user()
── User ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• country: United States
• affiliation: SBG
• last_name: Test
• first_name: User
• email: <user>@sbgenomics.com
• username: <username>
• href: https://api.sbgenomics.com/v2/users/<user>
Get information about a user
This call returns information about the specified user. Note that currently you can view only your own user information, so this call is equivalent to the call to get information about your account.
# Get user info
a$user(username = "<username>")
Please check
vignette("Authentication_and_Billing", package = "sevenbridges2")
for more technical details about getting user information.
This call returns information about your current rate limit. This is the number of API calls you can make in five minutes. This call also returns information about your current instance limit.
# Get rate limit info
a$rate_limit()
── Rate Limit ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• rate
• limit: 1000
• remaining: 1000
• reset: 2022-12-26 11:31:01 CET
• instance
• limit: 25
• remaining: 25
Please check
vignette("Authentication_and_Billing", package = "sevenbridges2")
for more technical details about rate limit information.
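When sending many requests in a row, the remaining and reset values can be used to decide whether to pause. The sketch below is one simple way to do this; it operates on a plain list whose field names follow the printed output above, with made-up values. That the object returned by a$rate_limit() can be indexed in exactly this way is an assumption, and seconds_to_wait() is a hypothetical helper.

```r
# Fixed reference time so the example is reproducible
now <- as.POSIXct("2022-12-26 11:29:01", tz = "UTC")

# Plain list standing in for rate limit information; the field names
# follow the printed output above, the values are made up.
rate_info <- list(
  rate = list(
    limit = 1000,
    remaining = 3,
    reset = now + 120
  )
)

# Hypothetical helper: seconds to wait before sending `n_calls` more requests
seconds_to_wait <- function(info, n_calls, now = Sys.time()) {
  if (info$rate$remaining >= n_calls) {
    return(0)
  }
  wait <- as.numeric(difftime(info$rate$reset, now, units = "secs"))
  max(0, ceiling(wait))
}

wait_small <- seconds_to_wait(rate_info, n_calls = 2, now = now)  # enough calls left
wait_large <- seconds_to_wait(rate_info, n_calls = 10, now = now) # wait for the reset
```

In a real script the result could be passed to Sys.sleep() before continuing with the next group of requests.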
Each project must have a Billing Group associated with it. This Billing Group pays for the storage and computation in the project.
For example, your first project(s) were created with the free funds from the Pilot Funds Billing Group assigned to each user at sign-up.
To get information about your billing groups:
# Check your billing info
a$billing_groups$query()
This call lists all your billing groups, including groups that are pending or have been disabled.
To get information about your invoices:
# Check your invoices
a$invoices$query()
The call returns information about all your available invoices,
unless you use the billing_group_id
query parameter to
specify the ID of a particular billing group, in which case it will
return the invoice incurred by that billing group only.
To get detailed information for a specific billing group, use the get() method of the billing_groups resource with the billing group ID. The information returned includes the billing group owner, the total balance, and the status of the billing group (pending, disabled, …).
# Get a single billing group
a$billing_groups$get(id = "<billing_group_id>")
── Billing group info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• disabled: FALSE
• pending: FALSE
• type: regular
• name: My billing group
• owner: <bg_owner's_username>
• id: <billing_group_id>
• href: https://api.sbgenomics.com/v2/billing/groups/<billing_group_id>
• balance
• currency: USD
• amount: 221
Please check
vignette("Authentication_and_Billing", package = "sevenbridges2")
for more technical details about billing information.
Projects are the core building blocks of the platform. Each project corresponds to a distinct scientific investigation, serving as a container for its data, analysis tools, results, and collaborators.
In order to query and explore all projects, use the
projects
resource path and the query()
method.
You can also filter the projects by several criteria, such as the project’s name and tags. The search by name is partial and case-insensitive.
# List first 5 projects
my_projects <- a$projects$query(limit = 5)
my_projects
# Load next page of results
my_projects$next_page()
# Return all projects that contain the term "demo"
demo_projects <- a$projects$query(name = "demo")
# Return all projects tagged with "demo"
tagged_projects <- a$projects$query(tags = list("demo"))
Note that the output is a Collection object and the results (a list of Project objects) can be found within the items field.
Please check
vignette("Projects_and_Tasks_execution", package = "sevenbridges2")
for more technical details about projects.
Create a new project called “API testing” with the billing group
id
obtained above.
# List all available billing groups for currently logged in user
a$billing_groups$query()
# Set the billing group for the new project
bid <- "<billing_group_id>"
# Create a new project
p <- a$projects$create(
name = "API testing", billing_group = bid,
description = "This project has been created using the sevenbridges2 R API library."
)
── Project ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• category: PRIVATE
• root_folder: <root_folder_id>
• type: v2
• description: This project has been created using the sevenbridges2 R API library.
• billing_group: <billing_group_id>
• name: API testing
• id: <your_username_or_division>/api-testing
• href: https://api.sbgenomics.com/v2/projects/<your_username_or_division>/api-testing
• settings
• locked: FALSE
• controlled: FALSE
• location: aws:us-east-1
• use_interruptible_instances: TRUE
• use_memoization: FALSE
• intermediate_files: list(duration = 24, retention = "LIMITED")
• allow_network_access: TRUE
• use_elastic_disk: FALSE
The new project is created on the platform. Notice also that the variable p is an R6 object with fields that contain information about the platform project. The object also has several methods that allow you to perform basic platform operations on the project.
Please check
vignette("Projects_and_Tasks_execution", package = "sevenbridges2")
for more technical details about projects.
Use the get()
method and provide the full ID of the
project you would like to fetch.
# Get a single project by ID
a$projects$get(id = "<your_username_or_division>/api-testing")
── Project ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
• category: PRIVATE
• root_folder: <root_folder_id>
• type: v2
• description: This project has been created using the sevenbridges2 R API library.
• billing_group: <billing_group_id>
• name: API testing
• id: <your_username_or_division>/api-testing
• href: https://api.sbgenomics.com/v2/projects/<your_username_or_division>/api-testing
• settings
• locked: FALSE
• controlled: FALSE
• location: aws:us-east-1
• use_interruptible_instances: TRUE
• use_memoization: FALSE
• intermediate_files: list(duration = 24, retention = "LIMITED")
• allow_network_access: TRUE
• use_elastic_disk: FALSE
Please check
vignette("Projects_and_Tasks_execution", package = "sevenbridges2")
for more technical details about apps.
Seven Bridges maintains workflows and tools available to all of its users in the Public Apps repository.
To find out more about public apps, you can use the sevenbridges2 package to find them, as shown below.
# Search by name matching, with limit 10
public_apps <- a$apps$query(
visibility = "public",
limit = 10,
query_terms = list("STAR")
)
# Search by ID
star_app <- a$apps$get(
id = "admin/sbg-public-data/rna-seq-alignment-star/0"
)
Now, copy the App into your project with a new name, following this logic.
# Copy app into the project
a$apps$copy(
app = star_app,
project = "<username_or_division>/api-testing",
name = "New copy of STAR"
)
# Check if it is copied
p <- a$projects$get(id = "<username_or_division>/api-testing")
# List the apps you have in your project
p$list_apps()
The short name is changed to newcopyofstar.
== App ==
id : <username_or_division>/api-testing/newcopyofstar/0
name : RNA-seq Alignment - STAR
project : <username_or_division>/api-testing
revision : 0
Alternatively, you can copy it from the App
object.
# Get public app RNA Sequencing alignment - STAR
star_app <- a$apps$get(
id = "admin/sbg-public-data/rna-seq-alignment-star/0"
)
# Copy it into a project
star_app$copy(
project = "<username_or_division>/api-testing",
name = "Copy of STAR"
)
Next, we would like to run a task with this app. Let’s see what is required.
Please check
vignette("Projects_and_Tasks_execution", package = "sevenbridges2")
for more technical details about tasks.
Once you have copied the public app admin/sbg-public-data/rna-seq-alignment-star/0 into your project, <username>/api-testing, the app id in your current project is <username>/api-testing/newcopyofstar. Alternatively, you can use another app you already have in your project for this Quickstart.
To draft a new task, you need to specify, among other things, the id of the workflow you are executing.
You can always check the App details on the visual interface for task input requirements. However, there is also a function on the App objects to get basic information about the app’s inputs and outputs. To find the required inputs with R, you need to get an App object first.
Let’s check which inputs this app requires by calling the
input_matrix()
function and bring them into our
project.
# Fetch copied app
copied_star_app <- a$apps$get(
id = "<username_or_division>/api-testing/newcopyofstar/0"
)
# Preview its inputs
copied_star_app$input_matrix()
Locate the IDs of the required inputs. Note that task inputs need to match the expected data type and name. In the above example, we see two required fields, which the task drafted below fills with the fastq and genomeFastaFiles inputs. We also want to provide a gene feature file, passed below as the sjdbGTFfile input.
Among the possible input types, file inputs need the most care: some inputs accept a single file (File), while other inputs take more than one file (File arrays, FilesList, or ‘File...’). Such an input requires you to pass a single File object (for a single file input) or a list of File objects (for inputs which accept more than one file). You can search for your file by id or by name, as shown in the example below.
# Get reads (fastq) files and copy them into a project
reads_1 <- a$files$get(id = "641c48c425ed1842bd0bf799") # file id
reads_1$copy_to(project = p)
reads_2 <- a$files$get(id = "641c48c425ed1842bd0bf835") # file id
reads_2$copy_to(project = p)
# Get a single reference (fasta) file and copy it into a project
fasta_in <- a$files$get(id = "641c48c525ed1842bd0bf86a") # file id
fasta_in$copy_to(project = p)
# Get gtf file and copy into a project
gtf_in <- a$files$get(id = "641c48c425ed1842bd0bf825") # file id
gtf_in$copy_to(project = p)
# Get copied files
input_files <- p$list_files()$items
# Add new tasks
taskName <- paste0("STAR-alignment ", date())
tsk <- p$create_task(
name = taskName,
description = "STAR test",
app = copied_star_app,
inputs = list(
"fastq" = c(input_files[[1]], input_files[[2]]),
"genomeFastaFiles" = input_files[[3]],
"sjdbGTFfile" = list(input_files[[4]])
)
)
# Preview task
tsk$print()
As with inputs, you can also preview the structure of the expected outputs of the task or workflow. You can get details about each output’s name, description and type using output_matrix(). This function can be called from the App object.
# Get app's outputs details
copied_star_app$output_matrix()
Please check
vignette("Projects_and_Tasks_execution", package = "sevenbridges2")
for more technical details about tasks.
Now, we are ready to run our task.
# Run your task
tsk$run()
Before you run your task, you can adjust your draft task if you have any final modifications.
# Update task
tsk$update(description = "New RNA SEQ Alignment - STAR task")
After you run a task, you can track its status by refreshing the object with the reload() function.
# Reload task
tsk$reload()
tsk$status
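Checking the status repeatedly by hand can be replaced with a small polling loop. The sketch below shows the idea with a mock status source so that it runs without a platform connection; wait_until_done() is a hypothetical helper, and the status names used (QUEUED, RUNNING, COMPLETED, FAILED, ABORTED) are an assumption about the values that tsk$status can take.

```r
# Hypothetical helper (not part of sevenbridges2): call `get_status` until it
# returns a finished state or `max_checks` is reached.
wait_until_done <- function(get_status,
                            done = c("COMPLETED", "FAILED", "ABORTED"),
                            interval = 30,
                            max_checks = 20) {
  status <- NA_character_
  for (i in seq_len(max_checks)) {
    status <- get_status()
    if (status %in% done) break
    # Pause between checks to stay well within the API rate limit
    Sys.sleep(interval)
  }
  status
}

# Mock status source used only for this illustration
mock_states <- c("QUEUED", "RUNNING", "RUNNING", "COMPLETED")
check_count <- 0
mock_status <- function() {
  check_count <<- check_count + 1
  mock_states[min(check_count, length(mock_states))]
}

final_status <- wait_until_done(mock_status, interval = 0)
```

For a real task, the status function would reload the object and return its status field, for example function() { tsk$reload(); tsk$status }, with an interval long enough to respect the rate limit.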
You can also abort the task execution if needed:
# Abort your task
tsk$abort()
If you want to rerun your task without any modifications, you can use the rerun() function, which will clone the current task for you and start the execution immediately.
# Rerun your task
tsk$rerun()
On the other hand, if you want to update your task first and then re-run it, you should clone the current task, update it and then run it, as demonstrated below:
# First clone existing task
cloned_task <- tsk$clone_task()
# Then, update GTF input file in the cloned task
cloned_task$update(inputs = list(sjdbGTFfile = "<some new file>"))
cloned_task$run()
Alternatively, you can delete the draft task if you no longer wish to run it.
# # not run
# tsk$delete()
Please check
vignette("Projects_and_Tasks_execution", package = "sevenbridges2")
for more technical details about running tasks.
Running tasks with spot instances can considerably reduce computational cost. This option can be controlled on the project level or the task level on Seven Bridges platforms. Our package follows the same logic as our platform’s web interface (the current default setting for spot instances is on).
For example, when we create a project using the Projects resource’s create() method, we can set use_interruptible = FALSE to use on-demand instances (non-interruptible but more expensive) instead of the spot instances (interruptible but cheaper):
# Create project with disabled spot instances
p <- a$projects$create(
name = "spot-disabled-project", billing_group = bid,
description = "spot disabled project",
use_interruptible = FALSE
)
Then all the new tasks created under this project will use on-demand instances to run by default, unless the use_interruptible_instances argument is specifically set to TRUE when drafting the new task using the Tasks resource’s create() method.
For example, if p is the above spot-disabled project, you can draft a task that will use spot instances as follows:
# Create task and set usage of interruptible instances to TRUE
tsk <- p$create_task(
name = paste0("spot enabled task in a spot disabled project"),
description = "spot enabled task",
app = copied_star_app,
inputs = list(
"fastq" = c(input_files[[1]], input_files[[2]]),
"genomeFastaFiles" = input_files[[3]],
"sjdbGTFfile" = list(input_files[[4]])
),
use_interruptible_instances = TRUE
)
Conversely, you can have a spot instance enabled project, but draft
and run specific tasks using on-demand instances, by setting
use_interruptible_instances = FALSE
in
create_task()
explicitly.
Please check
vignette("Projects_and_Tasks_execution", package = "sevenbridges2")
for more technical details about running tasks using spot instances.
During workflow development and benchmarking, sometimes we need to view and make adjustments to the computational resources needed for a task to run more efficiently. Also, if a task fails due to resource deficiency, we often want to define a larger instance for the task re-run without editing the app itself. This is particularly important in cases where there is not enough disk space.
The Seven Bridges API allows setting specific task execution parameters by using execution_settings. It includes the instance type (instance_type) and the maximum number of parallel instances (max_parallel_instances):
# Create task with setting instance type and number of parallel instances
tsk <- p$create_task(
...,
execution_settings = list(
instance_type = "c4.2xlarge;ebs-gp2;2000",
max_parallel_instances = 2
)
)
For details about execution_settings, please check “Create a new draft task” in the API documentation.
Please check
vignette("Projects_and_Tasks_execution", package = "sevenbridges2")
for more technical details about execution hints.
Now let’s do a batch with 4 files in 2 groups, which is batched by
metadata sample_id
. We will assume each file has this
metadata field entered. Since these files can be evenly grouped into 2,
we will have a single parent batch task with 2 child tasks.
# Add two more fastq files that will be used in our task inputs
# and copy them into our API testing project
reads_3 <- a$files$get(id = "641c48c425ed1842bd0bf7b6") # file id
reads_3$copy_to(project = p)
reads_4 <- a$files$get(id = "641c48c425ed1842bd0bf7a5") # file id
reads_4$copy_to(project = p)
# Get all project files
input_files <- p$list_files()$items
taskName <- paste0("STAR-alignment ", date())
# Create task with batch criteria
tsk <- p$create_task(
name = taskName,
description = "Batch Star Test",
app = copied_star_app,
batch = TRUE,
batch_input = "fastq",
batch_by = list(
type = "CRITERIA",
criteria = list("metadata.sample_id")
),
inputs = list(
"fastq" = c(
input_files[[1]],
input_files[[2]],
input_files[[3]],
input_files[[4]]
),
"genomeFastaFiles" = input_files[[5]],
"sjdbGTFfile" = list(input_files[[6]])
)
)
# Run batch task
tsk$run()
Now you have a draft batch task. Please check it out in the visual interface. Your response body should inform you of any errors or warnings.
You can also check the status of the parent task’s children with the list_batch_children() method, and then get the execution details for each child:
# List parent task children and their execution details
child_tasks <- tsk$list_batch_children()
child1_details <- child_tasks$items[[1]]$get_execution_details()
child2_details <- child_tasks$items[[2]]$get_execution_details()
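The grouping behind this batch can be previewed locally before any task is drafted. The following base R sketch uses a made-up table of four files with a sample_id column and groups them with split(); it only illustrates the idea of batching by a metadata criterion and does not call the API. The file names and sample IDs are invented for the example.

```r
# Made-up file table: four FASTQ files belonging to two samples
files <- data.frame(
  name = c("s1_R1.fastq", "s1_R2.fastq", "s2_R1.fastq", "s2_R2.fastq"),
  sample_id = c("sample_1", "sample_1", "sample_2", "sample_2"),
  stringsAsFactors = FALSE
)

# Group file names by the batching criterion; each group corresponds to
# what one child task would receive as its "fastq" input
groups <- split(files$name, files$sample_id)

# Number of child tasks to expect from this grouping
n_children <- length(groups)
groups
```

Two distinct sample_id values give two groups, matching the single parent task with two child tasks described above. A file with a missing sample_id would not fall into any group here, so it is worth checking the metadata before batching.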
Please check
vignette("Projects_and_Tasks_execution", package = "sevenbridges2")
for more technical details about running batch tasks.