Workflows in Verily Workbench: Cromwell, dsub, and Nextflow
Prior reading: Workflows overview
Purpose: This document provides information on creating workflows in Verily Workbench using Cromwell, dsub, and Nextflow.
Introduction
Verily Workbench enables you to run workflows at scale using tools that are popular across the life sciences community. This document explains how to choose the right workflow engine for your work and where to find existing workflows to run. It also describes how Cromwell, dsub, and Nextflow execute within the Workbench environment, so that you can run workflows on your data at scale.
Choosing a workflow engine
Though Workbench provides first-class support for running WDL-based workflows on Cromwell, users are not locked into this technology. With multiple workflow engines available, how do you choose which one to use? Perhaps you want to make use of Verily Workbench's built-in support for Cromwell; in other cases the decision may be made for you by:
- Standardized engine(s) selected by your organization or team
- Desired tools with existing workflows written for a particular engine
There are a few important things to know about choosing an engine for your work.
Single-stage workflows (tasks)
- Are you only looking to run single-stage "tasks" at scale?
- Are you comfortable with running shell commands, but less comfortable learning new domain-specific languages, such as WDL or Nextflow scripting?
In either of the above cases, dsub may be the right choice for you. dsub is more of a task runner than a workflow engine. It doesn’t have built-in capabilities for sequencing multiple workflow steps; instead, dsub excels in its simplicity and ease of use.
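To give a sense of what a single-stage task looks like, here is a minimal sketch in Python that assembles a dsub command line without running it. The flags shown (`--provider`, `--project`, `--regions`, `--logging`, `--image`, `--output`, `--command`, `--wait`) are standard dsub options; the project ID, bucket paths, region, and image are hypothetical placeholders, and a Workbench workspace may require additional options that are not shown here.

```python
import shlex


def build_dsub_command(project, logging_path, output_path, command,
                       image="ubuntu:22.04", regions="us-central1",
                       provider="google-cls-v2"):
    """Return a dsub invocation as a list of arguments (nothing is executed)."""
    return [
        "dsub",
        "--provider", provider,            # backend that runs the task
        "--project", project,              # Google Cloud project ID
        "--regions", regions,              # where the task VM may run
        "--logging", logging_path,         # Cloud Storage location for task logs
        "--image", image,                  # Docker image containing the tools
        "--output", f"OUT={output_path}",  # copied to Cloud Storage when the task ends
        "--command", command,              # shell command run inside the container
        "--wait",                          # block until the task finishes
    ]


# All values below are hypothetical placeholders.
args = build_dsub_command(
    project="my-project",
    logging_path="gs://my-bucket/logs/",
    output_path="gs://my-bucket/out/hello.txt",
    command='echo "Hello from dsub" > "${OUT}"',
)

# Print a shell-safe version of the command for review or copy-paste.
print(shlex.join(args))
```

The task itself is an ordinary shell command; dsub exposes the output location to it through the `OUT` environment variable, so no workflow language is needed.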
Multi-stage or multi-environment workflows
If you’re looking to run multi-stage workflows, where each stage runs on a separate VM, and want to take advantage of more sophisticated built-in capabilities (such as automatic horizontal scaling or stage result caching), then Cromwell/WDL and Nextflow may be better choices.
If you are looking to write workflows that you can run in multiple environments, such as Workbench and your institutional HPC cluster, then Cromwell/WDL and Nextflow may again be better choices; both Cromwell and the Nextflow engine support many different backends and executors.
Note that Verily Workbench currently supports only Cromwell through the UI. However, all workflow engines (including Cromwell) can be executed on VMs within a workspace, as described below.
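To make the idea of multi-stage orchestration with stage result caching more concrete, here is a toy, in-process sketch in Python. It is not how Cromwell or Nextflow are implemented: real engines run each stage in a container on its own VM and decide cache hits using richer information. The stage names (`trim`, `count`) and the helper functions are hypothetical and exist only for illustration.

```python
import hashlib
import json


def cache_key(stage_name, inputs):
    """Derive a deterministic key from the stage name and its inputs."""
    payload = json.dumps({"stage": stage_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_pipeline(stages, initial, cache, log):
    """Run stages in order; each stage receives the previous stage's output."""
    value = initial
    for name, func in stages:
        key = cache_key(name, value)
        if key in cache:
            log.append(f"{name}: cache hit")   # reuse the stored result
        else:
            cache[key] = func(value)           # do the work and remember it
            log.append(f"{name}: executed")
        value = cache[key]
    return value


# Two hypothetical stages: clean up reads, then count them.
stages = [
    ("trim", lambda reads: [r.strip() for r in reads]),
    ("count", lambda reads: len(reads)),
]

cache, log = {}, []
first = run_pipeline(stages, [" ACGT ", "TTGA\n"], cache, log)
second = run_pipeline(stages, [" ACGT ", "TTGA\n"], cache, log)  # same inputs again

print(first, second)
print(log)
```

On the second run every stage is served from the cache, which is the behavior that makes re-running a long workflow after a late-stage failure much cheaper.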
You have flexibility
It’s worth noting that using any one of these workflow engines on Workbench isn’t a deep commitment. These frameworks use the Life Sciences API on Google Cloud to execute individual tasks, which means that each is built around:
- Packaged code in Docker images
- Input files localized from Google Cloud Storage
- Output files delocalized to Google Cloud Storage
As a result, moving a workflow (or a subset of workflow tasks) to a different workflow engine is mostly a matter of rewriting the orchestration, rather than rewriting each individual task.
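The shared shape of a task (a Docker image, inputs from Cloud Storage, outputs to Cloud Storage) can be sketched as a small engine-neutral record. The Python below is only an illustration under stated assumptions: `TaskSpec`, `to_dsub_args`, and `to_wdl_runtime` are hypothetical helpers, the bucket paths are placeholders, and the dsub argument list is partial (provider, project, and logging options are omitted).

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Engine-neutral description of one task (hypothetical helper type)."""
    name: str
    image: str                                    # Docker image with the packaged code
    command: str                                  # shell command run in the container
    inputs: dict = field(default_factory=dict)    # variable name -> gs:// path
    outputs: dict = field(default_factory=dict)   # variable name -> gs:// path


def to_dsub_args(task):
    """Render the task as a partial dsub argument list."""
    args = ["dsub", "--name", task.name, "--image", task.image]
    for var, uri in task.inputs.items():
        args += ["--input", f"{var}={uri}"]       # localized before the command runs
    for var, uri in task.outputs.items():
        args += ["--output", f"{var}={uri}"]      # delocalized after it finishes
    args += ["--command", task.command]
    return args


def to_wdl_runtime(task):
    """Render the same image as a WDL runtime section."""
    return f'runtime {{\n    docker: "{task.image}"\n}}'


# Hypothetical task: count the lines of a file stored in Cloud Storage.
task = TaskSpec(
    name="count-lines",
    image="ubuntu:22.04",
    command='wc -l < "${IN}" > "${OUT}"',
    inputs={"IN": "gs://my-bucket/in/reads.txt"},
    outputs={"OUT": "gs://my-bucket/out/count.txt"},
)

print(to_dsub_args(task))
print(to_wdl_runtime(task))
```

The image and command are identical in both renderings; only the wrapper around them changes, which is why switching engines tends to leave the individual tasks untouched.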
Where to find existing workflows
Cromwell supports the orchestration of complex workflows written in the Workflow Definition Language (WDL). WDL is used for workflows across the community and many existing WDL workflows can be found in Dockstore. To learn more about using Cromwell on Workbench, see Using the Cromwell engine to run WDL workflows on Workbench.
Nextflow supports the orchestration of complex workflows written in the Nextflow scripting language. Nextflow is used for workflows across the community and many existing Nextflow workflows can be found in Dockstore. For a Nextflow tutorial, see Getting started with Nextflow on Workbench.
Last Modified: 9 December 2024