Run WDL workflows on AWS-backed workspaces

How to run WDL workflows in AWS-backed workspaces in Workbench

Prior reading: Workflows overview

Purpose: This document provides information about running WDL workflows in an AWS-backed workspace.



Introduction

Workbench users in AWS-backed workspaces can run WDL workflows with AWS HealthOmics. At this time, these workflows can be run only through the Workbench UI.

Technical requirements

WDL files must live in an S3 storage folder or an external S3 bucket attached to your workspace.

Due to a HealthOmics limitation, every container image used by a workflow must be referenced by an ECR (Elastic Container Registry) URI.
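A quick pre-flight check can catch image references that are not ECR URIs before a run is submitted. The following is a minimal sketch, assuming the common private ECR URI shape `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>` with an optional tag or digest. The name `is_ecr_uri` is hypothetical and is not part of Workbench or HealthOmics, and the pattern does not cover every AWS partition.

```python
import re

# Assumed shape of a private ECR image URI (not an official validator):
#   <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>[:tag][@sha256:digest]
ECR_URI_PATTERN = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[a-z0-9][a-z0-9._/-]*)"
    r"(?::(?P<tag>[A-Za-z0-9_][A-Za-z0-9._-]{0,127}))?"
    r"(?:@(?P<digest>sha256:[a-f0-9]{64}))?$"
)


def is_ecr_uri(image: str) -> bool:
    """Return True if `image` looks like a private ECR image URI."""
    return ECR_URI_PATTERN.match(image) is not None


if __name__ == "__main__":
    # Hypothetical image references, for illustration only.
    for image in (
        "999999999999.dkr.ecr.us-east-1.amazonaws.com/my-tools/bwa:0.7.17",
        "quay.io/biocontainers/samtools:1.19",
    ):
        print(image, "->", "ECR" if is_ecr_uri(image) else "not ECR")
```

Images hosted in other registries would need to be copied into an ECR repository and referenced by their ECR URI instead.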

To ensure Workbench has access to S3 and ECR to run workflows, you'll need to update the resource-based IAM policies for both.

S3

For each externally managed S3 bucket that Workbench should access, attach the policy statement generated by the Workbench UI when you create a new external S3 bucket resource.

Below is an example of a generated policy for the test-example bucket with access granted to the a111a1aa-1111-11a1-1111-111a1a1a111a workspace in account 999999999999:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::999999999999:root"
      },
      "Action": ["s3:ListBucket", "s3:GetObjectAttributes", "s3:GetObject"],
      "Resource": ["arn:aws:s3:::test-example/*", "arn:aws:s3:::test-example"],
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/vwb-a111a1aa-1111-11a1-1111-111a1a1a111a": ["reader", "writer"],
          "aws:PrincipalType": "AssumedRole"
        }
      }
    }
  ]
}
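To make clear which parts of the example vary per bucket and workspace, the sketch below rebuilds a policy of the same shape from three inputs. It is only an illustration: the policy generated by the Workbench UI remains the one to attach, and the function name `build_bucket_policy` and its parameters are hypothetical.

```python
import json


def build_bucket_policy(bucket: str, account_id: str, workspace_id: str) -> dict:
    """Build a bucket policy shaped like the example above.

    Illustrative only: the structure is copied from the example, and the
    Workbench-generated policy should be treated as authoritative.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The AWS account that hosts the workspace.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": ["s3:ListBucket", "s3:GetObjectAttributes", "s3:GetObject"],
                # Object-level and bucket-level ARNs are both listed.
                "Resource": [f"arn:aws:s3:::{bucket}/*", f"arn:aws:s3:::{bucket}"],
                "Condition": {
                    "StringEquals": {
                        # Tag key is derived from the workspace ID.
                        f"aws:PrincipalTag/vwb-{workspace_id}": ["reader", "writer"],
                        "aws:PrincipalType": "AssumedRole",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_bucket_policy(
        bucket="test-example",
        account_id="999999999999",
        workspace_id="a111a1aa-1111-11a1-1111-111a1a1a111a",
    )
    print(json.dumps(policy, indent=2))
```

Read this way, the policy grants read-only access (list and get actions) to assumed-role principals in the workspace account whose workspace tag has the value reader or writer.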

ECR

For each externally managed ECR repository that Workbench should access, attach the required IAM policy statement to the repository. See IAM policy configuration for the policy statement and more information.

Once the policy statement is applied, ECR access will be granted to all workspaces and data collections linked to that workspace.

Differences between HealthOmics and EC2/Batch

If you previously ran workflows with EC2 instances and AWS Batch, you'll notice some differences when using HealthOmics.

Purpose

AWS Batch is a general-purpose service for batch computing jobs across many industries, whereas HealthOmics is built specifically for bioinformatics.

Workflow engine management

With AWS Batch, users must fully manage the workflow engine on their instances, including installation, configuration, scaling, and maintenance. AWS HealthOmics manages the orchestration, job retries, and scaling of the workflow engine, removing significant operational and engineering overhead.

Compute instance selection

With EC2 and Batch, users can specify the exact EC2 instance family and size. With HealthOmics, users specify only the vCPU and memory requirements. Because HealthOmics is designed for bioinformatics, it selects appropriate instances for you.

Spot instance support

AWS Batch fully supports Spot Instance pricing. AWS HealthOmics jobs are billed at on-demand rates.

Regional resource restrictions

AWS Batch allows S3 buckets, ECR container repositories, and compute instances to exist in different AWS Regions. However, all HealthOmics resources must be in the same Region, so plan ahead for your data and compute needs.
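Because an ECR URI embeds its Region, one part of this constraint can be checked offline. The following is a minimal sketch, assuming image URIs of the form `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>`. The helper name `images_outside_region` is hypothetical. Bucket Regions are not encoded in S3 URIs and would need to be looked up separately, for example through the S3 GetBucketLocation API.

```python
import re
from typing import Iterable, List

# Captures the Region segment of an assumed private ECR URI shape.
_ECR_REGION = re.compile(r"^\d{12}\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com/")


def images_outside_region(images: Iterable[str], region: str) -> List[str]:
    """Return the images that are not ECR URIs in `region`.

    Images that do not match the assumed ECR URI shape are also returned,
    since their Region cannot be determined from the URI alone.
    """
    mismatched = []
    for image in images:
        match = _ECR_REGION.match(image)
        if match is None or match.group(1) != region:
            mismatched.append(image)
    return mismatched


if __name__ == "__main__":
    # Hypothetical image list, for illustration only.
    images = [
        "999999999999.dkr.ecr.us-east-1.amazonaws.com/bwa:0.7.17",
        "999999999999.dkr.ecr.us-west-2.amazonaws.com/gatk:4.5.0",
    ]
    for image in images_outside_region(images, "us-east-1"):
        print("Not in us-east-1:", image)
```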

Specialized data storage

AWS Batch users can store data in any general-purpose storage (e.g., S3 buckets or EBS volumes), and they're responsible for mounting file systems and managing data access. With HealthOmics, workflows are designed to read from and write to S3 buckets directly. There's also the separate "HealthOmics" storage, which is optional, decoupled, and not directly mounted to the workflow compute.

Last Modified: 12 December 2025