Compute resource reservations for cloud apps

Understand GPU reservations in Google Cloud Platform and how to use them with Verily Workbench cloud apps

Prior reading: Cloud apps overview, Compute profile configuration options

Purpose: This document explains GPU reservations in Google Cloud Platform (GCP) and provides guidance on using them effectively with Workbench cloud apps.



Introduction

Reservations in Google Cloud Platform allow you to secure compute capacity in advance, ensuring that the machine types and GPUs you need are available when you create or start your Workbench cloud apps. This is particularly important for high-performance workloads that require specific machine configurations or GPU types.

When working with Workbench cloud apps, understanding reservation types helps you:

  • Guarantee access to scarce compute resources (including GPUs and specific machine types)
  • Plan for future workloads
  • Manage costs effectively
  • Ensure availability for critical applications

For detailed information about Google Cloud reservations, refer to the official GCP documentation.

Types of reservations

Google Cloud Platform offers two main types of reservations for compute resources:

Future reservations

Future reservations allow you to reserve GPU capacity for a specific time period in the future.

Features:

  • Schedule ahead: Reserve capacity that becomes available at a specific date and time in the future.
  • Set duration: Define the start and end time of the reservation.
  • Potentially reduced costs: Often provides cost savings for predictable workloads.

Future reservations require advance planning and commitment to specific time windows. This reservation type is ideal for planned workloads where you know exactly when you'll need GPU resources.

Use cases for future reservations:

  • Scheduled training jobs that run at specific times
  • Batch processing workloads with known schedules
  • Research projects with defined timelines
  • Cost optimization for predictable GPU usage patterns

H200 and B200 GPU limitations

H200 and B200 GPUs can only be reserved through future reservations.

H200 (A3 Ultra machine type):

  • Features NVIDIA H200 GPUs optimized for AI/ML workloads
  • Designed for the most demanding training and inference tasks

B200 (A4 machine type):

  • Features NVIDIA B200 GPUs optimized for large-scale AI training and inference workloads
  • Suited to AI inference and other accelerator-intensive applications

On-demand reservations

On-demand reservations provide immediate access to reserved GPU capacity.

Features:

  • Immediate availability: Start immediately when created
  • Flexibility: Can be created and deleted as needed
  • Duration: Remain active until explicitly deleted
  • Instant access: Provide immediate guaranteed access to GPU resources

Use cases for on-demand reservations:

  • Interactive development and debugging sessions
  • Urgent model training or inference tasks
  • Variable workloads with unpredictable timing

Types of on-demand reservations

On-demand reservations can be configured in two ways:

Automatic reservations (default):

  • VMs matching the exact configuration of the reservation will automatically try to use the reservation.
  • When creating cloud apps, do not specify a reservation name. If there are multiple automatic reservations with the same configuration, GCP decides which reservation to consume.

Specific reservations:

  • When creating cloud apps, you must explicitly specify the reservation name.

Using GPU reservations with Workbench cloud apps

Creating reservations

GPU reservations are created and managed through the Google Cloud Platform console, not directly through Workbench. To create a reservation:

  1. Navigate to the Google Cloud console and select the project ID for your workspace. Alternatively, you can select the Google Project link on the workspace's Overview tab in the Workbench UI.

  2. Go to Compute Engine → Reservations.

  3. Select the On-demand reservations or Future reservations tab. (You must select Future reservations for H200 and B200 GPUs.)

  4. Select Create reservation and specify the following:

    • GPU type and quantity
    • Machine type (only if GPU is not selected)
    • Zone (must be a zone within your workspace's region)
    • Duration (for future reservations)
    • Set Local SSD count to 0 if it's pre-populated
    • Automatic or specific
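
For readers who prefer the command line, the console steps above can be approximated with the gcloud CLI. The following is a minimal sketch, not a Workbench-provided script: the project ID, zone, and reservation name are hypothetical placeholders, and it assumes an A2 machine type whose GPU is bundled with the machine type, so no separate accelerator flag is passed. It prints the command instead of running it so that you can review it first.

```shell
#!/bin/sh
# Hypothetical placeholders; replace with your workspace's project, zone, and a name of your choice.
PROJECT_ID="my-workspace-project"
ZONE="us-central1-a"
RESERVATION="my-a100-reservation"

# Print a gcloud command for an on-demand reservation.
# Pass "specific" to require apps to name the reservation; anything else yields an automatic one.
build_reservation_cmd() {
  mode="$1"
  cmd="gcloud compute reservations create $RESERVATION"
  cmd="$cmd --project=$PROJECT_ID --zone=$ZONE"
  # Local SSD and CPU platform flags are deliberately omitted,
  # mirroring the "Local SSD count 0" step above.
  cmd="$cmd --vm-count=1 --machine-type=a2-ultragpu-1g"
  if [ "$mode" = "specific" ]; then
    cmd="$cmd --require-specific-reservation"
  fi
  printf '%s\n' "$cmd"
}

build_reservation_cmd specific
```

Review the printed command, then paste it into a terminal where gcloud is authenticated against the workspace project.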

Use your reservation with Workbench cloud apps

Via the Workbench UI

You can select a reservation when selecting your compute options during app creation.

Tick the Use a specific reservation checkbox. A dropdown menu will appear listing all available future and on-demand specific reservations. Select a reservation and proceed with creating the app.

Figure: The Creating app dialog showing the Reservation dropdown with a list of available reservations. Select an available reservation when creating an app.

If you don't select a reservation from the dropdown but your app's configuration matches that of an automatic reservation, that automatic reservation is consumed. For example, if you configure your app with an n2-standard-2 machine type and you have an automatic reservation for n2-standard-2, that reservation is consumed by default if it's not already in use.

Via the Workbench CLI

To use a future or on-demand specific reservation with your cloud app, you must specify the reservation name (--reservation-uri) when creating the app:

wb app create gcp \
  --id=nemo-b200 \
  --app-config=nemo \
  --accelerator-type=nvidia-b200 \
  --accelerator-core-count=8 \
  --machine-type=a4-highgpu-8g \
  --zone=us-central1-b \
  --reservation-uri=my-b200-reservation

Replace my-b200-reservation with the actual name of your future or specific on-demand reservation, and set --zone to the zone in which the reservation was created.

For automatic reservations, simply create your app with the desired GPU configuration:

wb app create gcp \
  --id=nemo-a100 \
  --app-config=nemo \
  --accelerator-type=nvidia-a100-80gb \
  --accelerator-core-count=1 \
  --machine-type=a2-ultragpu-1g \
  --zone=us-central1-a

The app will automatically use matching reservation capacity if available.
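
The difference between the two commands above is only whether --reservation-uri is present. As an illustration, the hypothetical helper below (wb_app_cmd is not part of the Workbench CLI) assembles either form from the flags shown in the examples and prints the result instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print a `wb app create gcp` command using the flags from the examples above.
# usage: wb_app_cmd APP_ID ACCELERATOR_TYPE COUNT MACHINE_TYPE ZONE [RESERVATION_NAME]
wb_app_cmd() {
  cmd="wb app create gcp --id=$1 --app-config=nemo"
  cmd="$cmd --accelerator-type=$2 --accelerator-core-count=$3"
  cmd="$cmd --machine-type=$4 --zone=$5"
  # Future and specific on-demand reservations must be named;
  # automatic reservations are matched by configuration, so no flag is added.
  if [ -n "${6:-}" ]; then
    cmd="$cmd --reservation-uri=$6"
  fi
  printf '%s\n' "$cmd"
}

# Specific or future reservation: the name is passed through.
wb_app_cmd nemo-b200 nvidia-b200 8 a4-highgpu-8g us-central1-b my-b200-reservation
# Automatic reservation: no name is given.
wb_app_cmd nemo-a100 nvidia-a100-80gb 1 a2-ultragpu-1g us-central1-a
```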

Best practices

  1. Zone alignment: Always create reservations in one of the zones in the region where your workspace operates.
  2. Capacity planning: Reservations are shared within the workspace. Reserve multiple VM instances if you expect high demand.
  3. Machine type compatibility: Verify that your chosen machine type supports the reserved GPU type.
  4. Configuration simplicity: For on-demand reservations, avoid Local SSD and specific CPU platform selections.
  5. On-demand monitoring: Check the status of a reservation at any time on the Google Cloud console.
  6. Future reservation planning: Plan workloads to align with reservation windows and set reminders before start times.
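
The zone alignment practice can be checked before you create a reservation. This is a small sketch that assumes the usual GCP naming pattern in which a zone name is the region name followed by a hyphen and a letter (for example, us-central1-b in us-central1); zone_in_region is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if ZONE belongs to REGION.
# Assumes zone names follow the <region>-<letter> pattern, e.g. us-central1-b.
# usage: zone_in_region ZONE REGION
zone_in_region() {
  # Strip the trailing "-<letter>" from the zone and compare with the region.
  [ "${1%-*}" = "$2" ]
}

if zone_in_region us-central1-b us-central1; then
  echo "zone is in the workspace region"
else
  echo "zone is outside the workspace region"
fi
```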

Troubleshoot common issues

The reservation I created is not being used

If your cloud app isn't using reserved capacity, check the following in your Google Cloud console:

  1. Verify zone matching: Ensure your app and reservation are in the same zone.
  2. Check reservation configuration: In the Cloud console, confirm that the reservation has 0 Local SSDs attached and that the CPU platform is set to automatic.
  3. Check machine type compatibility: Confirm the machine type supports your reserved GPU. Check the details of your reservation for compute configuration.
  4. Review reservation status: Ensure the reservation is active and has available capacity.
  5. Validate GPU configuration: Verify GPU type and count match the reservation.
  6. Check reservation name: For specific on-demand reservations and future reservations, ensure the --reservation-uri parameter is correct.
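
Several of the checks above amount to comparing the app's settings with the reservation's. The sketch below expresses that comparison as a plain shell function; diagnose is a hypothetical helper, and the READY status value is an assumption about how Compute Engine reports a usable reservation. The values can be read from the reservation's details page in the Cloud console and passed in as strings.

```shell
#!/bin/sh
# Hypothetical helper: list reasons an app might not consume a reservation.
# usage: diagnose APP_ZONE APP_MACHINE RES_ZONE RES_MACHINE RES_STATUS RES_COUNT RES_IN_USE
diagnose() {
  problems=0
  if [ "$1" != "$3" ]; then
    echo "zone mismatch: app=$1 reservation=$3"
    problems=1
  fi
  if [ "$2" != "$4" ]; then
    echo "machine type mismatch: app=$2 reservation=$4"
    problems=1
  fi
  # Assumption: a usable reservation reports the status READY.
  if [ "$5" != "READY" ]; then
    echo "reservation status is $5, expected READY"
    problems=1
  fi
  # Capacity is free only while fewer VMs are in use than were reserved.
  if [ "$7" -ge "$6" ]; then
    echo "no free capacity: $7 of $6 in use"
    problems=1
  fi
  if [ "$problems" -eq 0 ]; then
    echo "no mismatch found"
  fi
  return "$problems"
}

diagnose us-central1-a a2-ultragpu-1g us-central1-a a2-ultragpu-1g READY 1 0
```

The function does not check Local SSD, CPU platform, or GPU count; those still need to be verified against the reservation details.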

A reservation I created is not available

You can monitor a reservation's in-use status on the Cloud console's Reservations page.

A reservation becomes available again after the cloud app consuming it is stopped or deleted, although it can take a few minutes for the capacity to be released.
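
If you would rather wait from a terminal than refresh the console, a polling loop along these lines may help. It is a sketch under stated assumptions: it requires an authenticated gcloud CLI, the reservation name and zone are placeholders you supply, wait_for_capacity is a hypothetical helper, and it reads the count and inUseCount fields that Compute Engine reports for a reservation.

```shell
#!/bin/sh
# Hypothetical helper: poll until the reservation has fewer VMs in use than reserved.
# usage: wait_for_capacity NAME ZONE [INTERVAL_SECONDS] [MAX_ATTEMPTS]
wait_for_capacity() {
  interval="${3:-30}"
  attempts="${4:-20}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    count=$(gcloud compute reservations describe "$1" --zone="$2" \
      --format="value(specificReservation.count)")
    in_use=$(gcloud compute reservations describe "$1" --zone="$2" \
      --format="value(specificReservation.inUseCount)")
    # Treat an empty field as 0 so the numeric comparison stays valid.
    if [ "${in_use:-0}" -lt "${count:-0}" ]; then
      echo "capacity available: ${in_use:-0} of ${count:-0} in use"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "still fully in use after $attempts checks"
  return 1
}
```

For example, wait_for_capacity my-a100-reservation us-central1-a 30 10 checks every 30 seconds and gives up after 10 attempts.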

Last Modified: 2 October 2025