Get started with Nextflow on Verily Workbench
Prior reading: Workflows in Verily Workbench: Cromwell, dsub, and Nextflow
Purpose: This document provides detailed instructions for configuring and running Nextflow pipelines in Verily Workbench.
Introduction
Be aware
The Cloud Life Sciences API is deprecated and will shut down on July 8, 2025.
Workbench users can use the Google Batch API instead. Currently, the Batch API has limited support on Verily Workbench GCE VMs; as a result, workflow logs will not be available. Nextflow does not yet support logging directly to a bucket.
Nextflow is a framework for creating data-driven computational pipelines. It allows you to create scalable and reproducible scientific workflows using software containers.
To get set up, you will:
- Create a Workbench workspace
- Create resources in the workspace
- Create a cloud app in the workspace on which to run Nextflow
The following sections walk you through that setup, then show how wb makes it easy to configure and run a Nextflow pipeline. This tutorial has two examples: one runs an example pipeline from the nextflow-io GitHub org, and the other runs an nf-core pipeline. The nf-core project provides a community-generated, curated collection of analysis pipelines built using Nextflow.
Both examples include configurations for the Google Cloud Life Sciences API and the Google Batch API as the Nextflow pipeline process.executor. This allows the pipelines to run at scale, with Nextflow processes executed on separate cloud virtual machines.
Note
On Verily Workbench, the easiest way to get started with Nextflow is via an app, where Nextflow and the Workbench CLI (command-line interface) are already installed for you. This tutorial walks through that process. If you prefer, you can instead use a local installation of the Workbench CLI.
1. Create a workspace
If you don't already have a Workbench workspace that you want to use, you can create one via either the Workbench CLI or the web UI.
To create a workspace via the web UI, see the instructions here.
First, check wb status. If no workspace is set, or you are not logged in, log in and set the workspace that you want to work in. Otherwise, create a new workspace.
wb status
wb auth login # if need be
wb workspace list
To create a new workspace:
wb workspace create --name=<workspace-name>
To set the Workbench CLI to use an existing workspace:
wb workspace set --id=<workspace-id>
2. Create workspace resources: GitHub repos and a Cloud Storage bucket
If you haven't already, you'll need to create a Cloud Storage bucket resource, which will be used by Nextflow for staging and logging.
We'll also create a Git resource that points to the Nextflow example repo. Any notebook instances that you subsequently create in your workspace will automatically clone that repo for you.
To create a Cloud Storage bucket resource via the web UI, see the instructions here.
Note the name of this resource, which you'll need below. E.g., name it nf_files.
Then, create referenced resources for the example Git repositories, as described here. The repository URLs to use are:
- https://github.com/nextflow-io/rnaseq-nf.git. Give it the name rnaseq-nf-repo.
- https://github.com/nf-core/configs.git. Give it the name nf-core-configs.
These repositories are public, so for this example, you don't need to set up the Workbench SSH keys.
If you do not already have a bucket resource that you want to use, you can create one as follows. The name of this resource will be nf_files.
wb resource create gcs-bucket --id=nf_files \
--description="Bucket for Nextflow run logs and output."
Then, create referenced resources to the Git repositories we'll use for these examples:
wb resource add-ref git-repo --id=rnaseq-nf-repo --repo-url=https://github.com/nextflow-io/rnaseq-nf.git
wb resource add-ref git-repo --id=nf-core-configs --repo-url=https://github.com/nf-core/configs.git
You can list your newly created resources:
wb resource list
3. Create an app to run Nextflow
Next, create a Workbench notebook app on which to run the Nextflow examples.
Your app will have Nextflow pre-installed, and any workspace Git repo resources, such as the ones we just defined, will be automatically cloned. However, if you want to run the example on your local machine, you can install nextflow yourself.
To create a cloud app via the web UI, see the instructions here.
Once your app is running, you can click the link next to it to bring up JupyterLab on the new app instance. To reduce costs, you can STOP the instance from its 'three-dot' menu when you're not using it, and restart it again later.
Create a new app:
wb app create gcp --app-config=<config_type> \
--id=<notebook_resource_id> \
--description=<description>
After your notebook resource is created, you can see its details via:
wb resource describe --id <notebook_resource_id>
Included in that description is a Proxy URL. Visit that URL in your browser (logged in with your Workbench user account) to bring up JupyterLab on your new app.
Tip: The info in the resource description also indicates whether your app is RUNNING or TERMINATED. You can stop your app when you're not using it, and then restart it, via:
wb app stop --id <notebook_resource_id>
and
wb app start --id <notebook_resource_id>
Using Workbench environment variables in Nextflow config files
Workbench supports running Nextflow via a 'passthrough' command, e.g. wb nextflow .... When you use this construct, you can add Workbench-specific environment variables to Nextflow configuration files. For example, you can use the $WORKBENCH_<bucket_resource_name> construct, and it will be expanded to gs://<underlying_GCS_bucket>. In addition, in a Workbench app, variables like $GOOGLE_SERVICE_ACCOUNT_EMAIL and $GOOGLE_CLOUD_PROJECT will be set.
In the examples below, we'll leverage this capability when we create the Nextflow config files.
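To make the expansion concrete, here is a minimal shell sketch of the naming convention. This simulates it locally: the exported variable's value is a made-up placeholder, not a real bucket URL (in a Workbench app, wb sets these variables for you).

```shell
#!/usr/bin/env bash
# Simulation only: in a real Workbench app, wb exports one variable per
# bucket resource, named WORKBENCH_<resource_name>. We fake one here.
export WORKBENCH_nf_files="gs://example-underlying-bucket"  # placeholder URL

resource_name="nf_files"
var_name="WORKBENCH_${resource_name}"
# Bash indirect expansion: look up the variable whose name we just built.
echo "workDir would resolve to: ${!var_name}/nf"
# prints: workDir would resolve to: gs://example-underlying-bucket/nf
```

The same lookup is what happens conceptually when wb rewrites a config file before handing it to Nextflow.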
Configure and run the example 'rnaseq-nf' Nextflow pipeline
Because you created a Git repo resource, the rnaseq-nf example should be automatically cloned into your new app, and you should see its directory, rnaseq-nf, at the top level of your file system. (If it is not there, you can run: wb git clone --resource=rnaseq-nf-repo.)
In the JupyterLab Terminal window, change to the rnaseq-nf directory and check out the tagged v2.1 version.
cd rnaseq-nf
git checkout v2.1
Edit the Nextflow configuration file
You can determine how a workflow is run by specifying the executor. Configurations for two executors are given below.
In the rnaseq-nf directory, edit the nextflow.config file. Replace the entry corresponding to your chosen job executor with the snippet given in the following sections. Edit the workDir line to replace <your_bucket_resource_name> with the name of the Cloud Storage bucket resource you created.
As you'll see below, you will run the Nextflow pipeline via wb, and wb will substitute the correct values for the environment variables in the config before it runs.
If you’ve forgotten the name of the bucket resource you created, you can find it via:
wb resource list # this shows the resource names
or in the Resources tab of your workspace.
Google Cloud Life Sciences API (deprecated)
Update the gls entry in nextflow.config.
gls {
// Workflow params
params.transcriptome = 'gs://rnaseq-nf/data/ggal/transcript.fa'
params.reads = 'gs://rnaseq-nf/data/ggal/gut_{1,2}.fq'
params.multiqc = 'gs://rnaseq-nf/multiqc'
// Google Life Sciences config
process.executor = 'google-lifesciences'
process.container = 'nextflow/rnaseq-nf:latest'
// Edit the following line for your bucket resource
workDir = "$WORKBENCH_<your_bucket_resource_name>/nf"
google.location = 'us-central1'
google.region = 'us-central1'
google.project = "$GOOGLE_CLOUD_PROJECT"
google.lifeSciences.usePrivateAddress = true
google.lifeSciences.network = 'network'
google.lifeSciences.subnetwork = 'subnetwork'
google.lifeSciences.serviceAccountEmail = "$GOOGLE_SERVICE_ACCOUNT_EMAIL"
}
In a terminal window, change to the parent directory of rnaseq-nf (~/repos if you followed the instructions above), and sanity-check your config changes.
cd ..
wb nextflow config rnaseq-nf/main.nf -profile gls
You should see output that shows instantiated values for your workspace project, Cloud Storage bucket, and service account email.
Google Batch API
Update the google-batch entry in nextflow.config.
'google-batch' {
// Workflow params
params.transcriptome = 'gs://rnaseq-nf/data/ggal/transcript.fa'
params.reads = 'gs://rnaseq-nf/data/ggal/gut_{1,2}.fq'
params.multiqc = 'gs://rnaseq-nf/multiqc'
// Google Batch config
process.executor = 'google-batch'
process.container = 'nextflow/rnaseq-nf:latest'
// Edit the following line for your bucket resource
workDir = "$WORKBENCH_<your_bucket_resource_name>/scratch"
google.region = 'us-east1'
google.project = "$GOOGLE_CLOUD_PROJECT"
google.batch.serviceAccountEmail = "$GOOGLE_SERVICE_ACCOUNT_EMAIL"
google.batch.usePrivateAddress = true
google.batch.network = 'global/networks/network'
google.batch.subnetwork = 'regions/us-east1/subnetworks/subnetwork'
}
In a terminal window, change to the parent directory of rnaseq-nf (~/repos if you followed the instructions above), and sanity-check your config changes.
cd ..
wb nextflow config rnaseq-nf/main.nf -profile google-batch
You should see output that shows instantiated values for your workspace project, Cloud Storage bucket, and service account email.
Run the Nextflow example workflow via wb
After you check your config, you're ready to run the Nextflow example. Select the appropriate job executor as the profile. The following command selects the Google Batch API. For the Cloud Life Sciences API, specify -profile gls.
In the parent directory of rnaseq-nf, run:
wb nextflow run rnaseq-nf/main.nf -profile google-batch
The workflow will take about 10 minutes to complete.
Configure and run a nf-core example
The nf-core project provides a curated set of analysis pipelines built using Nextflow. The nf-core pipelines adhere to strict guidelines, so if one works for you, any of them should. Once your config file is set up, you should be able to test any nf-core pipeline.
Edit the nf-core google.config
configuration file
In the left-hand File navigator, navigate to the ~/repos/nf-core-configs/conf directory. Find the config file corresponding to your chosen job executor. (If you gave the repo a different name in Step #2, find its folder instead under repos.)
In the config file, edit the google_bucket line to replace <your_bucket_name> with the name of the Cloud Storage bucket resource you will be using. E.g., if you used the suggested bucket name in Step #2, <your_bucket_name> would be replaced with nf_files.
As you'll see in the following step, you will run the Nextflow job via wb, and wb will substitute the correct values for the environment variables in the config before it runs.
Google Cloud Life Sciences API (deprecated)
The configuration file is google.config.
// Nextflow config file for running on Google Cloud Life Sciences
// Edit the 'google_bucket' param before using.
params {
config_profile_description = 'Google Cloud Life Sciences Profile'
config_profile_contact = 'Evan Floden, Seqera Labs (@evanfloden)'
config_profile_url = 'https://cloud.google.com/life-sciences'
google_zone = 'us-central1-c'
google_bucket = "$WORKBENCH_<your_bucket_name>/nf-core"
google_debug = true
google_preemptible = true
boot_disk = '100 GB'
workers_service_account = "$GOOGLE_SERVICE_ACCOUNT_EMAIL"
project_id = "$GOOGLE_CLOUD_PROJECT"
}
process.executor = 'google-lifesciences'
google.zone = params.google_zone
google.project = params.project_id
google.lifeSciences.serviceAccountEmail = params.workers_service_account
google.lifeSciences.usePrivateAddress = true
google.lifeSciences.debug = params.google_debug
workDir = params.google_bucket
google.lifeSciences.preemptible = params.google_preemptible
google.lifeSciences.network = 'network'
google.lifeSciences.subnetwork = 'subnetwork'
if (google.lifeSciences.preemptible) {
process.errorStrategy = { task.exitStatus in [8,10,14] ? 'retry' : 'terminate' }
process.maxRetries = 5
}
process.machineType = { task.memory > task.cpus * 6.GB ? ['custom', task.cpus, task.cpus * 6656].join('-') : null }
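The process.machineType closure above requests a GCP custom machine type only when a task asks for more than 6 GB of memory per CPU; the resulting string has the form custom-<cpus>-<memory_MB>, allocating 6656 MB (6.5 GB) per CPU. A sketch of that arithmetic in bash, with illustrative cpus/memory values (not from any real task):

```shell
#!/usr/bin/env bash
# Mirrors the machineType closure's logic for one task (values illustrative).
cpus=4
mem_gb=32   # stands in for task.memory, in GB

if [ "$mem_gb" -gt "$((cpus * 6))" ]; then
  # Over 6 GB per CPU: request a custom type with 6656 MB per CPU.
  echo "custom-${cpus}-$((cpus * 6656))"   # prints: custom-4-26624
else
  # Otherwise return null in the closure, i.e. use the default selection.
  echo "(default machine type)"
fi
```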
Google Batch API
The configuration file is googlebatch.config.
// Nextflow config file for running on Google Batch API
// Edit the 'google_bucket' param before using.
params {
config_profile_description = 'Google Cloud Batch API Profile'
config_profile_contact = 'Hatem Nawar @hnawar'
config_profile_url = 'https://cloud.google.com/batch'
google_location = 'us-central1'
google_zone = 'us-central1-c'
google_bucket = "$WORKBENCH_<your_bucket_name>/nf-core"
google_debug = true
google_preemptible = true
//networking
use_private_ip = true
// Custom VPC should be in this format 'global/networks/[custom_VPC]'
custom_vpc = 'global/networks/network'
//Custom subnet should be in this format 'regions/[GCP_Region]/subnetworks/[custom_subnet]'
custom_subnet = 'regions/us-central1/subnetworks/subnetwork'
boot_disk = '100 GB'
workers_service_account = "$GOOGLE_SERVICE_ACCOUNT_EMAIL"
project_id = "$GOOGLE_CLOUD_PROJECT"
}
workDir = params.google_bucket
google {
zone = params.google_zone
location = params.google_location
project = params.project_id
batch.network = params.custom_vpc
batch.subnetwork = params.custom_subnet
batch.usePrivateAddress = params.use_private_ip
batch.debug = params.google_debug
batch.serviceAccountEmail = params.workers_service_account
batch.bootDiskSize = params.boot_disk
batch.preemptible = params.google_preemptible
}
process.executor = 'google-batch'
if (google.batch.preemptible) {
process.errorStrategy = { task.exitStatus in [8,10,14] ? 'retry' : 'terminate' }
process.maxRetries = 5
}
process.machineType = { task.memory > task.cpus * 6.GB ? ['custom', task.cpus, task.cpus * 6656].join('-') : null }
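The preemptible error strategy in both configs retries a task only when its exit status is one of the codes this config treats as preemption (8, 10, or 14), and terminates the run otherwise. That decision can be sketched in shell as follows (the exit status value is illustrative):

```shell
#!/usr/bin/env bash
# Decide retry vs. terminate for one task exit status (value illustrative).
exit_status=10
case "$exit_status" in
  8|10|14) echo "retry" ;;      # treated as preemption; retried up to maxRetries=5
  *)       echo "terminate" ;;  # any other failure stops the workflow
esac
# prints: retry
```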
Run an nf-core pipeline via wb
When you choose an nf-core pipeline to run, the pipeline definition will automatically be fetched (and stored under ~/.nextflow/assets/nf-core). For every pipeline, the test profile can be used in conjunction with the google profile (or any other config) to run the pipeline with some test data.
For this example, you'll run the viralrecon pipeline, which does assembly and intrahost/low-frequency variant calling for viral samples.
You can first confirm your config by running the following command in the Terminal. This will check out the given pipeline from its repo and make it available locally if need be. Run this command from the nf-core-configs (repository checkout) directory.
Choose the appropriate job executor for the profile. The following commands select the Google Batch API. For the Cloud Life Sciences API, specify -profile test,google.
cd ~/repos/nf-core-configs
wb nextflow config nf-core/viralrecon -profile test,google-batch
Then, run the pipeline, still in the nf-core-configs repo checkout directory in the Terminal.
Before you run the following command, edit it to replace <your_bucket_name> with your bucket resource name in the --outdir param. If you used the suggested bucket name in Step #2, <your_bucket_name> would be replaced with nf_files. The outdir holds run results, so each time you run the pipeline, use a different outdir path.
# Edit the 'outdir' bucket name first
wb nextflow run nf-core/viralrecon -profile test,google-batch --outdir '$WORKBENCH_<your_bucket_name>'/viralrecon_outdir1
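Because each run needs a fresh outdir, one option is to build a timestamped path in the shell before invoking wb. A minimal sketch, assuming the suggested bucket resource name nf_files; the $WORKBENCH_ prefix stays single-quoted so that wb, not the shell, expands it:

```shell
#!/usr/bin/env bash
# Build a unique outdir path per run using a timestamp suffix.
run_suffix="viralrecon_$(date +%Y%m%d_%H%M%S)"
outdir='$WORKBENCH_nf_files'/"${run_suffix}"
echo "$outdir"   # e.g. $WORKBENCH_nf_files/viralrecon_20250620_120000

# Then, in a Workbench app:
#   wb nextflow run nf-core/viralrecon -profile test,google-batch --outdir "$outdir"
```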
Summary
This tutorial showed two examples of running Nextflow pipelines on Workbench. One was an example from https://github.com/nextflow-io/, and the other showed how to set up and use an nf-core config file for any nf-core pipeline.
Workbench makes it easy to set up config files that need minimal editing to work in any Workbench app and to run pipeline tasks scalably in the cloud.
Last Modified: 20 June 2025