Get started with dsub on Verily Workbench

How to run a dsub job in Workbench

Prior reading: Workflows in Verily Workbench: Cromwell, dsub, and Nextflow

Purpose: This document provides details for configuring and running a dsub job in a Workbench app.



Introduction

dsub is a command-line tool that allows you to write and run batch computing scripts in the cloud.

dsub is a good option if you're interested in running single-stage workflows and feel comfortable running shell commands.

Configure dsub

dsub comes preinstalled in new Workbench cloud apps. The dsub command is already on the PATH, and you run it from a Python virtual environment (venv).

To activate the venv, run dsub_activate in the JupyterLab terminal. Once it is active, you can install additional packages into the virtual environment. To exit the venv, run deactivate.

You can view the path to the dsub venv by running echo ${DSUB_VENV_PATH}.
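
If you want a script to confirm that the dsub venv is the active one before it submits a job, a small check along these lines may help. This is a minimal sketch: dsub_venv_active is a hypothetical helper name, and it assumes dsub_activate behaves like a standard venv activate script and exports VIRTUAL_ENV, which this page does not confirm.

```shell
# Hypothetical helper (not provided by Workbench or dsub).
# Succeeds only when VIRTUAL_ENV is set and points at the dsub venv.
# Assumption: dsub_activate exports VIRTUAL_ENV like a standard venv.
dsub_venv_active() {
  [ -n "${VIRTUAL_ENV:-}" ] && [ "${VIRTUAL_ENV%/}" = "${DSUB_VENV_PATH:-}" ] \
    || { [ -n "${VIRTUAL_ENV:-}" ] && [ "${VIRTUAL_ENV%/}" = "${DSUB_VENV_PATH%/}" ]; }
}

# Example use in a script:
# dsub_venv_active || echo "Run dsub_activate first" >&2
```

The comparison ignores a trailing slash on either path, so /path/to/venv and /path/to/venv/ are treated as the same location.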

Example dsub workflow

The following command creates an index (BAI file) from a BAM file of DNA sequences from the 1000 Genomes Project. For the BUCKET_WORKING_DIR variable, replace <your-gcs-bucket> with the full path to a Google Cloud Storage (GCS) bucket you have write access to (e.g., gs://full_bucket_name).

BUCKET_WORKING_DIR=<your-gcs-bucket>/dsub_example
dsub \
    --provider google-batch \
    --project "${GOOGLE_CLOUD_PROJECT}" \
    --logging "${BUCKET_WORKING_DIR}/logs" \
    --service-account "${PET_SA_EMAIL}" \
    --network "projects/${GOOGLE_CLOUD_PROJECT}/global/networks/network" \
    --subnetwork "projects/${GOOGLE_CLOUD_PROJECT}/regions/us-central1/subnetworks/subnetwork" \
    --use-private-address \
    --input BAM=gs://genomics-public-data/1000-genomes/bam/HG00114.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam \
    --output BAI="${BUCKET_WORKING_DIR}/HG00114.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai" \
    --image quay.io/cancercollaboratory/dockstore-tool-samtools-index \
    --command 'samtools index ${BAM} ${BAI}' \
    --wait
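
Before submitting the command above, a quick local check can catch a placeholder that was left unreplaced or a path missing its gs:// prefix. The sketch below is one way to do that in plain shell. Both check_gcs_path and bai_uri_for are hypothetical helper names introduced for illustration; they are not part of dsub.

```shell
# Hypothetical helper (not part of dsub): loose sanity check on a path.
# Fails if a <placeholder> remains or the value does not start with gs://
# followed by at least one character.
check_gcs_path() {
  case "$1" in
    *"<"*|*">"*) echo "placeholder not replaced: $1" >&2; return 1 ;;
    gs://?*)     return 0 ;;
    *)           echo "expected a gs:// path, got: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: build the index path the same way the --output flag
# above does, by appending .bai to the BAM file name in the working directory.
#   $1 = BAM path, $2 = working directory
bai_uri_for() {
  printf '%s/%s.bai\n' "${2%/}" "${1##*/}"
}

# Example use:
# check_gcs_path "${BUCKET_WORKING_DIR}" || exit 1
```

This only inspects the string. It does not verify that the bucket exists or that you have write access to it.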

It can take a few minutes for the job to finish. Once it completes, you should see a SUCCESS message in the terminal. You can confirm the BAI file has been generated by running the following command:

gcloud storage ls "${BUCKET_WORKING_DIR}"

The output should include the following path, with ${BUCKET_WORKING_DIR} expanded to your bucket path:

${BUCKET_WORKING_DIR}/HG00114.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai
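
To check for the index file from a script rather than by eye, one option is to filter the listing for the expected path. This is a minimal sketch that assumes the listing prints one full object path per line; listing_has is a hypothetical helper name, not a gcloud or dsub feature.

```shell
# Hypothetical helper: reads a listing on stdin and succeeds if some line
# is exactly the expected object path (fixed-string, whole-line match).
listing_has() {
  grep -Fxq -- "$1"
}

# Example use (requires gcloud and the BUCKET_WORKING_DIR set earlier):
# gcloud storage ls "${BUCKET_WORKING_DIR}" \
#   | listing_has "${BUCKET_WORKING_DIR}/HG00114.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.bai" \
#   && echo "index found"
```

The whole-line match avoids false positives from other objects whose names merely begin with the same path.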

Further reading

Learn more about dsub in the dsub GitHub repository.

Last Modified: 16 June 2025