Get started with dsub on Verily Workbench
Prior reading: Workflows in Verily Workbench: Cromwell, dsub, and Nextflow
Purpose: This document provides details for configuring and running a dsub job in a Workbench app.
Introduction
dsub is a command-line tool that allows you to write and run batch computing scripts in the cloud.
dsub is a good option if you're interested in running single-stage workflows and feel comfortable running shell commands.
Configure dsub
dsub comes preinstalled in new Workbench cloud apps. The dsub command is already on the PATH, so you can run dsub in a Python virtual environment.
To start the venv, run dsub_activate in the JupyterLab terminal. Once the venv is activated, you can install additional packages into it. To exit the venv, run deactivate.
You can view the path to the dsub venv by running echo ${DSUB_VENV_PATH}.
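Before submitting a job, it can help to confirm that the dsub command is reachable from the current shell. The following is a minimal sketch of such a check; check_tool is a hypothetical helper name introduced here for illustration and is not part of dsub or Workbench.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of dsub or Workbench):
# report whether a command can be found on the PATH.
check_tool() {
  local tool="$1"
  if command -v "${tool}" >/dev/null 2>&1; then
    echo "${tool}: found at $(command -v "${tool}")"
  else
    echo "${tool}: not found on PATH"
    return 1
  fi
}

# Check for dsub and print a hint if it is missing.
check_tool dsub || echo "dsub is not available in this shell."
```

If the check fails in an app created before dsub was preinstalled, creating a new app is the documented fix.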
Note
Cloud apps created before June 6, 2025 will not have dsub automatically installed. Please create a new app to use dsub.
Example dsub workflow
The following command creates an index (BAI file) from a BAM file of DNA sequences from the 1000 Genomes Project. For the BUCKET_WORKING_DIR variable, replace <your-gcs-bucket> with the full path to a GCS bucket you have write access to (e.g., gs://full_bucket_name).
BUCKET_WORKING_DIR=<your-gcs-bucket>/dsub_example
dsub \
--provider google-batch \
--project "${GOOGLE_CLOUD_PROJECT}" \
--logging "${BUCKET_WORKING_DIR}/logs" \
--service-account "${PET_SA_EMAIL}" \
--network "projects/${GOOGLE_CLOUD_PROJECT}/global/networks/network" \
--subnetwork "projects/${GOOGLE_CLOUD_PROJECT}/regions/us-central1/subnetworks/subnetwork" \
--use-private-address \
--input BAM=gs://genomics-public-data/1000-genomes/bam/HG00114.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam \
--output BAI="${BUCKET_WORKING_DIR}/HG00114.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai" \
--image quay.io/cancercollaboratory/dockstore-tool-samtools-index \
--command 'samtools index ${BAM} ${BAI}' \
--wait
It can take a few minutes for the job to finish. Once it completes, you should see a SUCCESS message in the terminal. You can confirm the BAI file has been generated by running the following command:
gcloud storage ls "${BUCKET_WORKING_DIR}"
You should see the following output:
${BUCKET_WORKING_DIR}/HG00114.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai
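The check above can also be scripted rather than read by eye. The sketch below derives the expected index URI from the BAM URI and tests whether a listing contains it. The helper names bai_uri_for and listing_contains are hypothetical, introduced only for this example, and the listing is assumed to print one full object URI per line.

```shell
#!/usr/bin/env bash

# Hypothetical helpers (not part of dsub or gcloud).

# Build the index URI for a BAM file: <out_dir>/<BAM file name>.bai
# A trailing slash on out_dir is tolerated.
bai_uri_for() {
  local bam_uri="$1" out_dir="$2"
  echo "${out_dir%/}/$(basename "${bam_uri}").bai"
}

# Read a listing on stdin; succeed only if some line
# is exactly equal to the expected URI.
listing_contains() {
  grep -Fxq -- "$1"
}

# Example usage (requires gcloud and a completed job):
#   BAM_URI=gs://genomics-public-data/1000-genomes/bam/HG00114.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam
#   expected="$(bai_uri_for "${BAM_URI}" "${BUCKET_WORKING_DIR}")"
#   gcloud storage ls "${BUCKET_WORKING_DIR}" | listing_contains "${expected}" && echo "index found"
```

Matching whole lines with grep -Fx avoids false positives from objects whose names merely begin with the expected path.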
Further reading
Learn more about dsub in the dsub GitHub repository.
Last Modified: 16 June 2025