Using Cloud Environments for Analysis

A cloud environment is a configurable pool of cloud computing resources. Each cloud environment consists of a virtual machine (VM) and a persistent disk, with commonly used libraries and tools preinstalled. Cloud environments are well suited to interactive analysis and data visualization, and you can tune their configuration to fit your analysis needs.

You are charged while a cloud environment is running, at a rate based on its configuration. You can stop the environment when it’s not in use to avoid compute charges, but you are still charged for its persistent disk.

You can create and manage multiple cloud environments per workspace. The environments can use different base images (e.g., one for TensorFlow experiments and another for working with R), and can differ in machine configuration and number of attached GPUs. For example, you might set up a many-core VM for prototyping on-node ML training or running a complex analysis, alongside a cheaper, lightweight environment for tasks such as setting up Dataproc clusters. In that case the heavy lifting happens on the Dataproc cluster, so the notebook that launches the cluster doesn’t need much compute.

Creating a cloud environment

To create a new cloud environment:

  1. Open a workspace in the Verily Workbench web UI

  2. Navigate to the “Environments” tab

  3. Click “New cloud environment” to open the “Creating cloud environment” dialog

On the “Enter details” panel:

  1. Enter a name and optional description

  2. Click the “Next” button

  3. Choose the type of Deep Learning VM image to use for your environment. In the image below, the “TensorFlow Enterprise” image is selected.

    (See below for more information on the “Custom container” option.)

  4. Select the number of cores you want, which in turn determines the memory available. Optionally, you can also attach GPUs to the VM; GPUs may not be available for all environment images.
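The exact core and memory options depend on what the dialog offers. As a rough sketch, assuming the environment uses Compute Engine N1 predefined machine types (an assumption; the dialog may offer other machine families), memory scales linearly with the number of vCPUs. The helper names below are hypothetical:

```python
# Assumption: N1 predefined machine families on Compute Engine,
# mapped to GB of memory per vCPU.
N1_GB_PER_VCPU = {"n1-standard": 3.75, "n1-highmem": 6.5, "n1-highcpu": 0.9}


def _check_family(family: str) -> None:
    """Reject machine families this sketch does not know about."""
    if family not in N1_GB_PER_VCPU:
        raise ValueError(f"unknown machine family: {family}")


def machine_type(family: str, vcpus: int) -> str:
    """Machine type name such as 'n1-standard-4' (vCPU count is not validated)."""
    _check_family(family)
    return f"{family}-{vcpus}"


def memory_gb(family: str, vcpus: int) -> float:
    """Memory for an N1 machine: vCPUs times the family's GB-per-vCPU ratio."""
    _check_family(family)
    return vcpus * N1_GB_PER_VCPU[family]


# Compare a few core counts in the standard family.
for cores in (2, 4, 8):
    print(machine_type("n1-standard", cores), memory_gb("n1-standard", cores), "GB")
```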

Once your cloud environment is created, you can stop and restart the environment, edit your configuration, delete your environment, and launch JupyterLab.

From the “Environments” tab for a workspace, you can see all of the cloud environments associated with that workspace, along with their status.

Workspace Git repositories

You can add references to Git repositories to your workspace. When you add a reference to a Git repository, it is automatically cloned to the file system of any cloud environment that you create.

With public repos, use the https:// syntax to add the repo URI. This will allow the repo to be cloned without having an SSH key set up.

With private repos, use the git@ (SSH) syntax to add the repo URI; Workbench requires this form to clone the repo successfully. For the clone to succeed, your Workbench-managed SSH key must also be set up as described in the next section.

If you visit the repo on GitHub, you can copy either of those URI formats as indicated here.
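If you have only the HTTPS form of a private repo’s URI, you can derive the git@ form yourself. The helper below is a minimal sketch that handles GitHub URIs only; `github_https_to_ssh` is a hypothetical name, and OWNER/REPO are placeholders:

```python
import re

# Matches GitHub HTTPS clone URIs such as https://github.com/OWNER/REPO.git
# (the ".git" suffix and a trailing slash are both optional).
_HTTPS_RE = re.compile(r"^https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$")


def github_https_to_ssh(uri: str) -> str:
    """Convert a GitHub HTTPS clone URI to the equivalent git@ (SSH) form."""
    match = _HTTPS_RE.match(uri)
    if match is None:
        raise ValueError(f"not a GitHub HTTPS URI: {uri}")
    owner, repo = match.groups()
    return f"git@github.com:{owner}/{repo}.git"


# OWNER/REPO are placeholders, not a real repository.
print(github_https_to_ssh("https://github.com/OWNER/REPO.git"))
```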

Creating an SSH key

For private repos, you need an SSH key to access the repo from your workspace. Workbench makes this straightforward by generating and managing an SSH key for you. Visit your Profile in the web UI (upper-right corner) to view your public SSH key; from there you can copy it or regenerate it.

You must associate the new SSH public key with your GitHub account using this process. See this page for more detail.
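Once the key is added to GitHub, you can check from a notebook cell that GitHub accepts it. This sketch runs `ssh -T git@github.com`, which GitHub answers with a greeting containing “successfully authenticated” when it recognizes the key. It assumes that the Workbench-managed key is the one the environment’s ssh client picks up, and that the installed OpenSSH supports `StrictHostKeyChecking=accept-new`; both function names are hypothetical:

```python
import subprocess


def github_ssh_ok(output: str) -> bool:
    """True if `ssh -T git@github.com` output reports a successful login."""
    return "successfully authenticated" in output


def check_github_ssh() -> bool:
    # GitHub closes the session after its greeting (no shell access), so ssh
    # exits non-zero even on success; inspect the message, not the exit code.
    # accept-new records GitHub's host key on first use instead of prompting.
    result = subprocess.run(
        ["ssh", "-T", "-o", "StrictHostKeyChecking=accept-new", "git@github.com"],
        capture_output=True,
        text=True,
        timeout=30,
    )
    # The greeting is normally written to stderr; check both streams.
    return github_ssh_ok(result.stdout + result.stderr)


# In a notebook cell: print(check_github_ssh())
```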

Configuring and using a cloud environment

After a cloud environment reaches the RUNNING state, click to bring up a JupyterLab notebook server in a new window. From this UI, you can create and run Jupyter notebooks, and use the Terminal to work from the command line.

Setting up cloud environment defaults

You will probably want to configure your cloud environment to tailor it to your particular analysis tasks. This notebook sets up some reasonable defaults for your workspace environment, and this one creates some resources that many Workbench tutorials expect to exist. Together, these notebooks perform some common and useful workspace setup tasks, including:

  • Configuring the user name and email address to use for your git commits.
  • Creating Cloud Storage bucket resources used in Workbench tutorials.
  • Creating a BigQuery dataset resource used in Workbench tutorials.
  • Creating a directory on this machine for Python virtual environments used in Workbench tutorials.
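The first task in the list above, setting your git identity, comes down to two `git config` commands. A minimal sketch of running them from a notebook cell; the name and email are placeholders, and `set_git_identity` is a hypothetical helper, not part of the setup notebooks:

```python
import subprocess


def git_identity_commands(name: str, email: str) -> list[list[str]]:
    """Build the git commands that set the global commit identity."""
    return [
        ["git", "config", "--global", "user.name", name],
        ["git", "config", "--global", "user.email", email],
    ]


def set_git_identity(name: str, email: str) -> None:
    """Run each command, raising CalledProcessError if git reports an error."""
    for cmd in git_identity_commands(name, email):
        subprocess.run(cmd, check=True)


# Replace the placeholder values with your own before running:
# set_git_identity("Ada Lovelace", "ada@example.com")
```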

(If you take a closer look, you’ll notice that some of the resources these notebooks set up are configured to automatically delete older content after a period of time. This saves you from having to remember to delete example and temporary data.)

You may want to modify these notebooks further for your own purposes.
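This page doesn’t say exactly how the setup notebooks implement autodeletion. One standard mechanism for a Cloud Storage bucket is a lifecycle rule with a `Delete` action and an `age` condition; the sketch below builds such a rule in the JSON shape used by `gsutil lifecycle set`. The 14-day value and the function name are illustrative assumptions:

```python
import json


def autodelete_lifecycle(age_days: int) -> dict:
    """Lifecycle config that deletes objects older than age_days days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return {
        "rule": [
            {"action": {"type": "Delete"}, "condition": {"age": age_days}}
        ]
    }


# 14 days is an arbitrary example value.
print(json.dumps(autodelete_lifecycle(14), indent=2))
```

You could save the printed JSON to a file and apply it to a bucket you own with `gsutil lifecycle set FILE gs://BUCKET_NAME`.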

Accessing the terra command-line tool from your cloud environment

The terra command-line utility is automatically installed and configured in your cloud environments. From the Terminal window, or from a notebook cell, you can use this utility to get information about your account, workspaces, and workspace resources. Below are a few examples.

$ terra auth status
User email:
Proxy group email:
Service account email for current workspace:
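Because terra prints plain text, you can capture its output in a notebook with `subprocess`. This sketch assumes the `Label: value` layout shown above, which may change between terra versions; `parse_key_values` and `terra_auth_status` are hypothetical helpers:

```python
import subprocess


def parse_key_values(text: str) -> dict[str, str]:
    """Parse 'Label: value' lines into a dict; blank values become ''."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def terra_auth_status() -> dict[str, str]:
    """Run `terra auth status` and return its fields (requires the terra CLI)."""
    out = subprocess.run(
        ["terra", "auth", "status"], capture_output=True, text=True, check=True
    ).stdout
    return parse_key_values(out)


# In a notebook cell: print(terra_auth_status().get("User email"))
```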

terra resource list lists all the resources defined for the current workspace:

$ terra resource list
NAME                            RESOURCE TYPE         STEWARDSHIP TYPE      DESCRIPTION
nb-repo                         GIT_REPO              REFERENCED            (unset)
nextflow_tests                  AI_NOTEBOOK           CONTROLLED            (unset)
nf-core-sample-data-repo        GIT_REPO              REFERENCED            (unset)
rnaseq-nf-repo                  GIT_REPO              REFERENCED            Respository containing a Nextflow RNA...
tabular_data_autodelete_aft...  BQ_DATASET            CONTROLLED            BigQuery dataset for temporary storag...
terra-axon-examples             GIT_REPO              REFERENCED            (unset)
ws_files                        GCS_BUCKET            CONTROLLED            Bucket for reports and provenance rec...
ws_files_autodelete_after_t...  GCS_BUCKET            CONTROLLED            Bucket for temporary storage of file ...
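To work with this listing programmatically, you can split the columns yourself. A minimal sketch, assuming columns are separated by at least two spaces as in the sample above. Long names and descriptions are truncated with `...` in this view, so use `terra resource describe` when you need full values. The helper names are hypothetical:

```python
import re


def parse_resource_table(text: str) -> list[dict[str, str]]:
    """Split `terra resource list` output into one dict per resource row.

    Assumes columns are separated by two or more spaces; single spaces
    inside a value (e.g. in DESCRIPTION) are preserved.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = re.split(r"\s{2,}", lines[0].strip())
    return [dict(zip(header, re.split(r"\s{2,}", line.strip()))) for line in lines[1:]]


def names_of_type(rows: list[dict[str, str]], resource_type: str) -> list[str]:
    """Names of resources whose RESOURCE TYPE matches, e.g. 'GIT_REPO'."""
    return [row["NAME"] for row in rows if row.get("RESOURCE TYPE") == resource_type]


# In a notebook cell (requires the terra CLI):
# import subprocess
# out = subprocess.run(["terra", "resource", "list"],
#                      capture_output=True, text=True, check=True).stdout
# print(names_of_type(parse_resource_table(out), "GCS_BUCKET"))
```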

You can see details of a resource given its name:

$ terra resource describe --name ws_files
Name:         ws_files
Description:  Bucket for reports and provenance records.
Type:         GCS_BUCKET
Stewardship:  CONTROLLED
Cloning:      COPY_NOTHING
Access scope: SHARED_ACCESS
Managed by:   USER
Properties:   class Properties {
GCS bucket name: terra-vpp-quick-rhubarb-111-ws-files
Location: US-CENTRAL1
# Objects: 0

You can use the terra resource resolve command to find the underlying resource that a name points to. You will often see this command used in example notebooks. It lets you work with easy-to-remember resource names while still accessing the underlying URI when needed.

$ terra resource resolve --name ws_files
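In a notebook, a common pattern is to resolve a resource name once and reuse the result. The wrapper below is a sketch (the `resolve` helper is hypothetical); it returns whatever the command prints, without assuming a particular format, since the output depends on the resource type. The `run` parameter exists only so the function can be exercised without the terra CLI:

```python
import subprocess


def resolve(name: str, run=subprocess.run) -> str:
    """Return the stripped output of `terra resource resolve --name NAME`."""
    result = run(
        ["terra", "resource", "resolve", "--name", name],
        capture_output=True,
        text=True,
        check=True,  # raise if terra reports an error, e.g. an unknown name
    )
    return result.stdout.strip()


# In a notebook cell: bucket = resolve("ws_files")
```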

Viewing and managing your cloud environments via the Cloud Console

In addition to viewing the status of your cloud environments in the Workbench web UI, you can also view them in the Google Cloud Console. This provides another interface for launching JupyterLab for a notebook environment, stopping/starting your environments, and making some configuration changes. (However, you must create and delete your environments via Workbench.)

You can follow the project link on a workspace’s description page to open the Cloud Console for the workspace project, and then go to Vertex AI » Workbench to see your cloud environments.

Modifying a cloud environment configuration

You can change the machine type and number of GPUs for a cloud environment after you have created it. To do this, the environment VM needs to be STOPPED first.

To update a configuration:

  1. From the right-hand side of the workspace’s “Overview” page, click the link for your workspace’s associated Google Cloud project. This takes you to the Google Cloud Console.

  2. From the “hamburger” menu in the upper left of the Console, navigate to the Vertex AI » Workbench page.

  3. You should see your cloud environment listed on this page. You can stop and start your environments from this UI as well. Make sure that the environment you want to reconfigure is stopped. Then click on the link for the environment to view its details, and click on the HARDWARE tab:

  4. Update the machine type and GPU settings to the new configuration, and click “SUBMIT”. This screencast walks through the process:

Get cost estimates for different environment VM configurations

As you can see in the screencast above, the cloud environment cost estimates change as you reconfigure the machine type and GPU settings. You can use this Cloud Console view of your cloud environment to see an estimate of how much your environment would cost you if you left it RUNNING for a month.

Note that the estimated charges are specifically for a running cloud environment; if you stop a cloud environment, you are still charged for your cloud environment’s disk, but you are not incurring compute costs. As discussed above, it’s therefore recommended to stop your cloud environment when it’s not in use.
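The same reasoning can be written as simple arithmetic: compute is billed only for the hours the environment runs, while the disk is billed for the whole month. The sketch below uses placeholder rates, not real Google Cloud prices; take actual figures from the Cloud Console estimate. `HOURS_PER_MONTH = 730` is the approximation Google Cloud commonly uses for monthly estimates (365 × 24 / 12), and `monthly_cost` is a hypothetical helper:

```python
HOURS_PER_MONTH = 730  # 365 days * 24 hours / 12 months


def monthly_cost(running_hours: float, compute_rate: float,
                 disk_gb: float, disk_rate: float) -> float:
    """Estimated monthly cost of one cloud environment.

    compute_rate: cost per running hour (VM plus any GPUs).
    disk_rate: cost per GB per month; the disk is billed even while stopped.
    """
    if not 0 <= running_hours <= HOURS_PER_MONTH:
        raise ValueError("running_hours must be between 0 and HOURS_PER_MONTH")
    return running_hours * compute_rate + disk_gb * disk_rate


# Placeholder rates only: 0.50 per running hour, 0.04 per GB-month, 100 GB disk.
always_on = monthly_cost(HOURS_PER_MONTH, 0.50, 100, 0.04)
working_hours_only = monthly_cost(8 * 22, 0.50, 100, 0.04)  # 8 h/day, 22 days
print(f"always on: {always_on:.2f}, stopped outside work hours: {working_hours_only:.2f}")
```

With these placeholder numbers, stopping the environment outside a 176-hour working month cuts the estimate from about 369 to about 92; only the disk portion remains when the environment is stopped all month.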

Specifying a container image as the basis for a notebook environment

The Workbench web UI also allows you to specify a container image as the basis for a cloud environment.

A number of prebuilt containers are listed here. If you wish to create a custom container, you should use one of these containers as your base image, as they include the necessary config for successfully launching a cloud environment.

The container images you build must be Docker container images. Private images may only come from the Google Cloud Artifact Registry or Container Registry. See this page for more detail on setting up an Artifact Registry and using Cloud Build to build and push your custom image to the registry.
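Before entering an image reference in the web UI, it can help to confirm that it points at one of the allowed Google registries. This sketch builds an Artifact Registry image URI in the `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG` form and checks the registry host of a reference; it does not verify that the image exists or that your environment can pull it. Project, repository, and image names are placeholders, and the helper names are hypothetical:

```python
import re

# Artifact Registry:  LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE[:TAG]
# Container Registry: [REGION.]gcr.io/PROJECT/IMAGE[:TAG]
_GOOGLE_REGISTRY_HOST = re.compile(r"^([a-z0-9-]+-docker\.pkg\.dev|([a-z]+\.)?gcr\.io)$")


def artifact_registry_image(location: str, project: str, repository: str,
                            image: str, tag: str = "latest") -> str:
    """Compose an Artifact Registry Docker image URI."""
    return f"{location}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


def is_google_registry(image_uri: str) -> bool:
    """True if the image reference's host is Artifact or Container Registry."""
    host = image_uri.split("/", 1)[0]
    return bool(_GOOGLE_REGISTRY_HOST.match(host))


uri = artifact_registry_image("us-central1", "my-project", "my-repo", "my-image", "v1")
print(uri, is_google_registry(uri))
```

You could then build and push to that URI with Cloud Build, for example `gcloud builds submit --tag IMAGE_URI`.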

Last Modified: 16 November 2023