Use the Cromwell engine to run WDL workflows
Prior reading: Workflows in Verily Workbench: Cromwell, dsub, and Nextflow
Purpose: This document provides detailed instructions for running and monitoring Cromwell workflows in Verily Workbench.
Running and monitoring Cromwell workflows
Verily Workbench provides built-in support for running and monitoring WDL-based workflows via the Cromwell workflow engine. Right within the UI, you can add workflows, run them with a set of inputs, and monitor their execution.
Adding workflows
Before running a workflow on Verily Workbench, it first needs to be added to the workspace. The system currently supports WDL files stored in Google Cloud Storage, and in the future will support GitHub as a storage type as well.
To add a workflow, you first need a Google Cloud Storage bucket to add the source WDL file to. Create a bucket if one does not already exist. To do so, navigate to the Resources tab in a workspace, click the + New resource button, and select New Cloud Storage bucket.
Next, upload the source WDL file to the bucket with gsutil:
gsutil cp path/to/workflow.wdl gs://some-bucket-name/
Alternative ways to upload a workflow include:
- Creating and opening a cloud notebook app, which has gsutil already installed.
- Uploading the WDL file via the Google Cloud Console.
- Using the Add File Via URL feature to specify a URL to the given workflow.
Verily Workbench supports the execution of workflows with sub-workflows, so if your workflow is composed of multiple WDLs, add them all to the bucket.
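When a workflow is made up of several WDLs, a single gsutil cp call can take all of them as sources. The following Python sketch only builds and prints such a command; it does not run it. The helper name build_gsutil_cp and the file names main.wdl and align.wdl are hypothetical, and the sketch assumes the sub-workflow is imported by its bare file name, so that copying every file to the bucket root keeps the imports resolvable.

```python
import shlex


def build_gsutil_cp(wdl_paths, bucket_name):
    """Return the argv list for one `gsutil cp` call that uploads every WDL."""
    if not wdl_paths:
        raise ValueError("at least one WDL path is required")
    # gsutil cp accepts several source files followed by one destination URL.
    return ["gsutil", "cp", *wdl_paths, f"gs://{bucket_name}/"]


# Hypothetical file names: a main workflow plus one sub-workflow.
command = build_gsutil_cp(["main.wdl", "align.wdl"], "some-bucket-name")
print(shlex.join(command))
```

The printed command can be pasted into a terminal or a cloud notebook app where gsutil is installed.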
The final step is to create the workflow. First navigate to the Workflows tab. If this is your first workflow, click the Add your first workflow link. Otherwise, click the Add workflow button.
On the first page of the dialog, select the bucket that contains your WDL(s), then select the main WDL file. Click Next.
Note
If the Workbench UI does not find your WDLs and gives you a "No .wdls in bucket" error, the bucket currently contains too many objects for the UI to handle. As a workaround, either rename your WDL directories so that they start with a number or an uppercase letter, which lists them first, or copy just your WDL directories to a separate dedicated bucket.
Add a display name for the workflow. Click Add to workspace.
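The renaming workaround in the note above works because Cloud Storage lists object names in lexicographic order, where digits and uppercase letters sort before lowercase letters. The Python sketch below illustrates that ordering with sorted(). The object names are made up, and it is an assumption that the UI reads the listing in this order.

```python
def listing_order(object_names):
    """Sort names the way a lexicographic bucket listing would order them."""
    return sorted(object_names)


# Hypothetical bucket contents before and after renaming the WDL directory.
before = ["results/run1.txt", "workflows/main.wdl", "data/sample.bam"]
with_digit = ["results/run1.txt", "0_workflows/main.wdl", "data/sample.bam"]
with_upper = ["results/run1.txt", "Workflows/main.wdl", "data/sample.bam"]

print(listing_order(before)[0])      # data/sample.bam
print(listing_order(with_digit)[0])  # 0_workflows/main.wdl
print(listing_order(with_upper)[0])  # Workflows/main.wdl
```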
You'll now see the workflow listed in the Setup sub-tab.
Running workflow jobs
To run a workflow via the Verily Workbench UI, first navigate to the Workflows tab, and select an existing workflow in the Setup sub-tab. If one does not exist, follow the instructions in the Adding workflows section above.
Click the New job button in the workflow details tab. A Creating new job dialog will open.
On the Enter job details step, the UI will provide a default display name for the job, which you may optionally adjust. Click Next.
On the Prepare inputs step, the UI will show two sections: Run options and Input form.
The Run options section allows the user to configure a few options specific to running workflows in Cromwell:
- Enable call caching - Allows the user to cache job results to speed up future executions of the same job run with the same inputs. On by default.
- Delete intermediate output files - Removes intermediate files created by Cromwell during workflow execution. On by default.
- Retry with more memory - If selected, allows the user to specify a factor by which the VM's memory is multiplied if the workflow fails and Cromwell retries execution. For example, if a factor of 1.2 is provided and the initial VM memory is 2 GB, the retried VM will have 2.4 GB of memory.
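The memory arithmetic behind the retry factor can be sketched in Python as follows. The function name is hypothetical, and compounding the factor across repeated retries is an assumption; the text above only describes a single retry. The dictionary uses Cromwell's documented workflow option names, but whether Verily Workbench passes the three UI settings to Cromwell under exactly these keys is also an assumption.

```python
import json


def memory_after_retries(initial_gb, factor, retries=1):
    """Memory in GB after `retries` retries, assuming the factor compounds."""
    if factor < 1:
        # A factor below 1 would shrink memory, so this sketch rejects it.
        raise ValueError("factor must be at least 1")
    return initial_gb * factor ** retries


print(round(memory_after_retries(2, 1.2), 2))             # 2.4
print(round(memory_after_retries(2, 1.2, retries=2), 2))  # 2.88

# Assumed mapping of the three Run options onto Cromwell workflow options.
run_options = {
    "read_from_cache": True,
    "write_to_cache": True,
    "delete_intermediate_output_files": True,
    "memory_retry_multiplier": 1.2,
}
print(json.dumps(run_options, indent=2))
```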
The Input form section is where a user enters inputs for the workflow. Whenever you open this dialog to run a job, Verily Workbench parses the WDL(s) to determine what inputs can be provided. It then dynamically displays these inputs as an input form, complete with validation specific to the WDL type of each field. Optional fields are hidden by default and can be shown by unchecking the "Show required inputs only" checkbox.
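To give a feel for type-specific validation, here is a minimal Python sketch that checks a form value against a declared WDL type. It is not how Verily Workbench implements its form; the function name is hypothetical, and only the primitive WDL types Int, Float, Boolean, String, and File are handled, with a trailing ? marking an optional input.

```python
def validate_input(wdl_type, raw_value):
    """Convert a form string to a Python value, or raise ValueError."""
    optional = wdl_type.endswith("?")
    base_type = wdl_type.rstrip("?")
    if raw_value == "":
        if optional:
            return None  # an optional input may be left blank
        raise ValueError(f"a value is required for type {wdl_type}")
    if base_type == "Int":
        return int(raw_value)    # raises ValueError for non-integers
    if base_type == "Float":
        return float(raw_value)  # raises ValueError for non-numbers
    if base_type == "Boolean":
        if raw_value in ("true", "false"):
            return raw_value == "true"
        raise ValueError("Boolean inputs must be 'true' or 'false'")
    if base_type in ("String", "File"):
        return raw_value
    raise ValueError(f"unsupported type in this sketch: {wdl_type}")


print(validate_input("Int", "42"))        # 42
print(validate_input("Boolean", "true"))  # True
print(validate_input("Float?", ""))       # None
```

Compound types such as Array[File] or Map[String, Int] would need recursive handling, which this sketch leaves out.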
After adjusting options and inputs, click the Next button to continue.
Finally, the Set up outputs step will allow the user to choose a destination bucket to write the workflow outputs and intermediate files to. This will default to the bucket the WDL is located in, but the user can choose any bucket in the workspace. The top-level folder name in the bucket will default to the display job name; this can also be adjusted. Click the Run button when ready and your workflow will begin execution.
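The defaulting rules for the output destination can be summarized in a short Python sketch. The function and parameter names are hypothetical, and the sketch assumes the job display name is used as the folder name unchanged; any sanitizing the UI may apply to the name is not covered.

```python
def output_destination(wdl_bucket, job_display_name, bucket=None, folder=None):
    """Return the gs:// prefix where outputs and intermediate files are written."""
    # Default to the bucket that holds the WDL unless another one is chosen.
    chosen_bucket = bucket or wdl_bucket
    # Default the top-level folder to the job's display name.
    chosen_folder = folder or job_display_name
    return f"gs://{chosen_bucket}/{chosen_folder}/"


print(output_destination("some-bucket-name", "my-job"))
# gs://some-bucket-name/my-job/
print(output_destination("some-bucket-name", "my-job", bucket="results-bucket"))
# gs://results-bucket/my-job/
```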
Monitoring workflow jobs
To monitor a workflow that's previously been executed within a workspace, click the Job status sub-tab under the Workflows tab.
The Jobs table shows the name, status, and submission date of every executed workflow job in the workspace. To see more details for a workflow, select its row.
The workflow details pane shows source information, as well as a table displaying the status of each individual task executed by the workflow. When a task is done, you can click View in GCP to get a direct link to the workflow's execution folder in Google Cloud Storage.
If a task or workflow fails, you can hover over the status to view the error text.
Last Modified: 21 November 2024