Get started with dsub on Verily Workbench
Prior reading: Workflows in Verily Workbench: Cromwell, dsub, and Nextflow
Purpose: This document provides details for configuring and running a dsub job in a Workbench app.
Introduction
dsub is a command-line tool that allows you to write and run batch computing scripts in the cloud.
dsub is a good option if you're interested in running single-stage workflows and feel comfortable running shell commands.
Configure dsub
dsub comes preinstalled in new Workbench cloud apps, where it runs in a Python virtual environment (venv). The dsub command is already on the PATH.
To activate the venv, run dsub_activate in the JupyterLab terminal. Once it is activated, you can install additional packages into the virtual environment. To exit the venv, run deactivate.
You can view the path to the dsub venv by running echo ${DSUB_VENV_PATH}.
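Before submitting a job, it can help to confirm that the terminal actually sees dsub. The following is a minimal sketch rather than part of the Workbench tooling: check_dsub_env is a hypothetical helper name, and the sketch assumes DSUB_VENV_PATH is set in the terminal as described above.

```shell
# Hypothetical helper: reports whether dsub resolves on the PATH and
# whether DSUB_VENV_PATH is set. Returns 1 if dsub cannot be found.
check_dsub_env() {
  if command -v dsub >/dev/null 2>&1; then
    echo "dsub found at: $(command -v dsub)"
  else
    echo "dsub not found on PATH"
    return 1
  fi

  # Assumption: the app exports DSUB_VENV_PATH, as the text above implies.
  if [ -n "${DSUB_VENV_PATH:-}" ]; then
    echo "dsub venv path: ${DSUB_VENV_PATH}"
  else
    echo "DSUB_VENV_PATH is not set"
  fi
}
```

Running check_dsub_env in the JupyterLab terminal prints one line per check. If dsub is not found, the app may predate the automatic installation.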
Note
Cloud apps created before June 6, 2025 do not have dsub automatically installed. To use dsub, create a new app.
Example dsub workflow
The following command creates an index (BAI file) from a BAM file of DNA sequences from the 1000 Genomes Project. For the BUCKET_WORKING_DIR variable, replace <your-gcs-bucket> with the full path to a Google Cloud Storage (GCS) bucket that you have write access to (e.g., gs://full_bucket_name).
BUCKET_WORKING_DIR=<your-gcs-bucket>/dsub_example
dsub \
--provider google-batch \
--project "${GOOGLE_CLOUD_PROJECT}" \
--logging "${BUCKET_WORKING_DIR}/logs" \
--service-account "${PET_SA_EMAIL}" \
--network "projects/${GOOGLE_CLOUD_PROJECT}/global/networks/network" \
--subnetwork "projects/${GOOGLE_CLOUD_PROJECT}/regions/us-central1/subnetworks/subnetwork" \
--use-private-address \
--input BAM=gs://genomics-public-data/1000-genomes/bam/HG00114.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam \
--output BAI="${BUCKET_WORKING_DIR}/HG00114.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai" \
--image quay.io/cancercollaboratory/dockstore-tool-samtools-index \
--command 'samtools index "${BAM}" "${BAI}"' \
--wait
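The command above depends on several shell variables, and a submission can fail if one of them is unset or if the bucket path is incomplete. As a rough sketch, a preflight check along these lines could be run before submitting. preflight_check is a hypothetical helper name, and the sketch assumes GOOGLE_CLOUD_PROJECT and PET_SA_EMAIL are already defined in the app terminal, as the command implies.

```shell
# Hypothetical preflight check for the variables used by the dsub command.
# Returns 0 if everything looks ready, 1 otherwise.
preflight_check() {
  preflight_rc=0

  if [ -z "${GOOGLE_CLOUD_PROJECT:-}" ]; then
    echo "GOOGLE_CLOUD_PROJECT is not set" >&2
    preflight_rc=1
  fi

  if [ -z "${PET_SA_EMAIL:-}" ]; then
    echo "PET_SA_EMAIL is not set" >&2
    preflight_rc=1
  fi

  # The working directory must be a full gs:// path with the placeholder
  # replaced by a real bucket name.
  case "${BUCKET_WORKING_DIR:-}" in
    *'<'*|*'>'*)
      echo "BUCKET_WORKING_DIR still contains the placeholder" >&2
      preflight_rc=1
      ;;
    gs://?*)
      ;;
    *)
      echo "BUCKET_WORKING_DIR must start with gs://" >&2
      preflight_rc=1
      ;;
  esac

  return "${preflight_rc}"
}
```

Calling preflight_check before dsub prints any problems to standard error. It only inspects the variables and does not verify that the bucket exists or is writable.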
It can take a few minutes for the job to finish. Once it completes, you should see a SUCCESS message in the terminal. You can confirm the BAI file has been generated by running the following command:
gcloud storage ls "${BUCKET_WORKING_DIR}"
The output should include the following line, with ${BUCKET_WORKING_DIR} expanded to your bucket path:
${BUCKET_WORKING_DIR}/HG00114.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai
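To script this check instead of reading the listing by eye, something like the following sketch could be used. listing_has_index is a hypothetical helper name. It reads a bucket listing on standard input and succeeds only if one line exactly matches the expected object path.

```shell
# Hypothetical helper: succeeds if the listing on stdin contains a line
# that exactly matches the expected object URI passed as the first argument.
listing_has_index() {
  if [ -z "${1:-}" ]; then
    echo "usage: listing_has_index <expected-uri>" >&2
    return 2
  fi
  # -F: treat the URI as a fixed string; -x: match the whole line.
  grep -q -F -x -- "$1"
}

# Example usage (requires gcloud and the variables from the steps above):
#   gcloud storage ls "${BUCKET_WORKING_DIR}" \
#     | listing_has_index "${BUCKET_WORKING_DIR}/HG00114.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai" \
#     && echo "index found"
```

An exact-line match avoids false positives from other objects whose names merely contain the index file name.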
Further reading
Learn more about dsub in the dsub GitHub repository.
Last Modified: 16 June 2025