Using the Cromwell engine to run WDL workflows
Running and monitoring Cromwell workflows
Verily Workbench provides built-in support for running and monitoring WDL-based workflows with the Cromwell workflow engine. Directly from the UI, you can add workflows, run them with a set of inputs, and monitor their execution.
Adding workflows
Before you can run a workflow on Verily Workbench, you must add it to the workspace. Workbench currently supports WDL files stored in Google Cloud Storage, and support for GitHub as a storage type is planned.
To add a workflow, you first need a Google Cloud Storage bucket to hold the source WDL file. If the workspace does not already have one, create it: navigate to the Resources tab, click the + Cloud resource button, and select New Cloud storage bucket.
Next, upload the source WDL file to the bucket with gsutil:
```shell
gsutil cp path/to/workflow.wdl gs://some-bucket-name/
```
Verily Workbench supports workflows that call sub-workflows, so if your workflow is composed of multiple WDL files, upload all of them to the bucket.
Alternative ways to get the workflow into the bucket include:
- Running the gsutil command from a cloud notebook environment in the workspace, which has gsutil preinstalled.
- Uploading the WDL file through the Google Cloud Console.
- Using the Add File Via Url feature to specify a URL for the workflow.
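When a workflow imports sub-workflows, it is usually simplest to upload the whole directory so the files keep the same relative layout they have locally. The sketch below assumes that all WDL files live under one local directory (the name `./my-workflow` is hypothetical) and that imports use paths relative to the main WDL. Whether Workbench resolves imports exactly this way is an assumption. The `GSUTIL` variable exists only so you can preview the command with `GSUTIL=echo`.

```shell
#!/usr/bin/env bash
# Sketch: upload a main WDL together with the sub-workflow WDLs it imports.
# Assumptions: all WDLs live under one local directory (hypothetical
# ./my-workflow) and imports use paths relative to the main WDL.
set -euo pipefail

# Set GSUTIL=echo to preview the command instead of running it.
GSUTIL="${GSUTIL:-gsutil}"

upload_wdl_dir() {
  local src_dir="$1"   # local directory holding the WDL files
  local bucket="$2"    # bucket name without the gs:// prefix
  # -m runs the copy in parallel; -r copies the directory recursively,
  # so files land under gs://<bucket>/<directory name>/ with their
  # relative paths intact.
  "$GSUTIL" -m cp -r "$src_dir" "gs://${bucket}/"
}

# Example (hypothetical paths):
# upload_wdl_dir ./my-workflow some-bucket-name
```

After the upload, the main WDL (for example `my-workflow/main.wdl`, a hypothetical name) is the file to select when adding the workflow.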
The final step is to add the workflow to the workspace. First, navigate to the Workflows tab.
Click the “Add your first workflow” link if this is your first workflow, or the “Add workflow” button otherwise.
On the first page of the dialog, select the bucket that contains your WDL file(s), then select the main WDL file.
Enter a display name for the workflow, and click “Add to workspace”.
You’ll now see the workflow listed in the Set up workflows sub-tab.
Running workflow jobs
To run a workflow from the Verily Workbench UI, first navigate to the Workflows tab and select an existing workflow in the Set up workflows sub-tab. If none exists yet, follow the instructions in the previous section to add one.
Select the New job button in the workflow details tab.
On the Enter job details step, the UI suggests a default display name for the job, which you can change if you like.
On the Prepare inputs step, the UI will show two sections: Run options and Input form.
The Run options section allows the user to configure a few options specific to running workflows in Cromwell:
- Enable call caching - Caches the results of individual tasks so that a later job that runs the same task with the same inputs can reuse them instead of recomputing them. On by default.
- Delete intermediate output files - Removes the intermediate files that Cromwell creates during workflow execution, keeping the workflow’s final outputs. On by default.
- Retry with more memory - If selected, lets the user specify a factor by which a task VM’s memory is multiplied when the task fails and Cromwell retries it. For example, with a factor of 1.2 and an initial VM memory of 2 GB, the retried task runs on a VM with 2.4 GB of memory.
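Standalone Cromwell exposes comparable settings as workflow options named `read_from_cache`, `write_to_cache`, `delete_intermediate_output_files`, and `memory_retry_multiplier`. The sketch below assumes, without confirmation from Workbench, that the Run options correspond to these options. It prints an equivalent options JSON and reproduces the memory arithmetic from the Retry with more memory example. In standalone Cromwell, a memory retry also requires that the task be allowed to retry (its `maxRetries` runtime attribute). The helper names `cromwell_options` and `retry_memory_gb` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the Run options expressed as standard Cromwell workflow options.
# Assumption: Workbench's own mapping of these checkboxes may differ.
set -euo pipefail

# Usage: cromwell_options CALL_CACHING DELETE_INTERMEDIATE [MEMORY_FACTOR]
#   CALL_CACHING, DELETE_INTERMEDIATE: "true" or "false"
#   MEMORY_FACTOR: omit when "Retry with more memory" is not selected
cromwell_options() {
  local cache="$1" delete_intermediate="$2" factor="${3:-}"
  local memory_line=""
  if [ -n "$factor" ]; then
    memory_line=$(printf ',\n  "memory_retry_multiplier": %s' "$factor")
  fi
  # Call caching is modeled here as both reading from and writing to the cache.
  printf '{\n  "read_from_cache": %s,\n  "write_to_cache": %s,\n  "delete_intermediate_output_files": %s%s\n}\n' \
    "$cache" "$cache" "$delete_intermediate" "$memory_line"
}

# Memory (GB) a retried task would request: initial memory times the factor.
retry_memory_gb() {
  awk -v mem="$1" -v factor="$2" 'BEGIN { printf "%g\n", mem * factor }'
}

cromwell_options true true 1.2
retry_memory_gb 2 1.2   # prints 2.4
```

Both checkboxes default to on, so the first call mirrors the defaults with a memory factor of 1.2 added.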
The Input form section is where you enter the workflow’s inputs. Whenever you open this dialog to run a job, Verily Workbench parses the WDL file(s) to determine which inputs can be provided. It then displays these inputs as an input form, with validation specific to each field’s WDL type. Optional inputs are hidden by default. To show them, clear the “Show required inputs only” checkbox. After adjusting options and inputs, click the Next button to continue.
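Workbench’s actual validation logic is not described here. To illustrate what type-specific validation can mean, the hypothetical `validate_wdl_input` function below checks a value against a few primitive WDL types and allows an optional type such as `Int?` to be left empty. Treating `File` inputs as `gs://` paths and accepting only plain decimal notation for `Float` are simplifying assumptions of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of per-type input checks for a few primitive WDL types.
# This is NOT Workbench's validator; it only illustrates the idea.
set -euo pipefail

# Usage: validate_wdl_input TYPE VALUE   (exit status 0 if VALUE looks valid)
validate_wdl_input() {
  local type="$1" value="$2"
  local int_re='^-?[0-9]+$'
  local float_re='^-?[0-9]+([.][0-9]+)?$'   # plain decimal notation only
  local gcs_re='^gs://[^/]+/.+'             # assumption: File inputs are GCS paths

  # An optional type such as "Int?" may be left empty.
  if [[ "$type" == *'?' ]]; then
    if [ -z "$value" ]; then return 0; fi
    type="${type%'?'}"
  fi

  case "$type" in
    Int)     [[ "$value" =~ $int_re ]] ;;
    Float)   [[ "$value" =~ $float_re ]] ;;
    Boolean) [[ "$value" == "true" || "$value" == "false" ]] ;;
    String)  return 0 ;;   # any text is a valid String
    File)    [[ "$value" =~ $gcs_re ]] ;;
    *)       return 1 ;;   # Array, Map, Pair, structs: out of scope here
  esac
}
```

For example, `validate_wdl_input Int 42` succeeds. Under this sketch’s GCS-only assumption, `validate_wdl_input File /local/path.bam` fails.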
Finally, the Set up outputs step lets you choose a destination bucket for the workflow’s outputs and intermediate files. It defaults to the bucket that contains the WDL file, but you can choose any bucket in the workspace. The name of the top-level folder in the bucket defaults to the job’s display name, and you can change it as well. Click the Run button when ready, and your workflow will begin execution.
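After a job finishes, you can also inspect what it wrote from a terminal, for example in a cloud notebook environment, which has gsutil preinstalled. The layout below the job’s top-level folder is determined by Cromwell and is not specified here, so this sketch simply lists everything recursively. `some-bucket-name` and `my-job` are placeholders for the bucket and folder chosen in the Set up outputs step, and `list_job_outputs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list the files a job wrote to its output folder.
# Placeholders: use the bucket and folder chosen in the Set up outputs step.
set -euo pipefail

GSUTIL="${GSUTIL:-gsutil}"   # set GSUTIL=echo to preview the command

list_job_outputs() {
  local bucket="$1" job_folder="$2"
  # -r lists the folder recursively, including any subfolders Cromwell created.
  "$GSUTIL" ls -r "gs://${bucket}/${job_folder}/"
}

# Example (placeholder names):
# list_job_outputs some-bucket-name my-job
```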
Monitoring workflow jobs
To monitor a workflow that’s previously been executed within a workspace, click the Monitor jobs sub-tab under the Workflows tab.
The Jobs table shows the name, status, and submission date of every executed workflow job in the workspace. To see more details for a workflow, select its row.
The workflow details pane shows source information and a table with the status of each individual task executed by the workflow. When a task is done, you can click View in GCP to follow a direct link to the workflow’s execution folder in Google Cloud Storage.
If a task or workflow fails, you can hover over the status to get the error text.
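When the hover text is not enough, the task’s log files usually contain the full error. Cromwell’s Google Cloud backend typically writes `stdout` and `stderr` files into each task’s execution folder. That assumption is not confirmed for Workbench specifically. Because the exact path depends on the job, the sketch takes the folder path as an argument (for example, copied from the location opened by View in GCP) rather than constructing it. `show_task_stderr` and the path `call-align` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the tail of a failed task's stderr from its execution folder.
# Assumption: the folder (e.g. copied from View in GCP) contains a "stderr"
# file, as Cromwell's Google Cloud backend typically writes.
set -euo pipefail

GSUTIL="${GSUTIL:-gsutil}"   # set GSUTIL=echo to preview the command

show_task_stderr() {
  local exec_dir="${1%/}"   # strip a trailing slash if present
  local lines="${2:-50}"    # number of trailing lines to show
  "$GSUTIL" cat "${exec_dir}/stderr" | tail -n "$lines"
}

# Example (hypothetical path):
# show_task_stderr gs://some-bucket-name/my-job/call-align/ 100
```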
Last Modified: 9 February 2024