Using Cloud Environments for Analysis
Introduction
A cloud environment is a configurable pool of cloud computing resources. Cloud environments consist of a virtual machine and a persistent disk, with some useful libraries and tools preinstalled. They’re ideal for interactive analysis and data visualization, and can be finely tuned to suit analysis needs.
Cost is incurred while the cloud environment is running, based on your configuration. You can pause the environment when it’s not in use, but there’s still a charge for maintaining your disk.
You can create and manage multiple cloud environments per workspace. The environments can have different base images (e.g. one for TensorFlow experiments, another for working with R), and can differ in machine configuration and number of attached GPUs. For example, you might set up a many-core VM for prototyping on-node ML training or running a complex analysis, and a cheaper, lightweight environment for launching Dataproc clusters. In the latter case the heavy lifting is done on the Dataproc cluster, so the notebook that launches the cluster doesn’t need to do much computation.
Creating a cloud environment
To create a new cloud environment:
- Open a workspace in the Verily Workbench web UI
- Navigate to the “Environments” tab
- Click “New cloud environment” to open the “Creating cloud environment” dialog
On the “Enter details” panel:
- Enter a name and optional description
- Click the “Next” button
- Then, choose the type of Deep Learning VM image that you want to use for your environment. In the image below, the “TensorFlow Enterprise” image is selected. (See below for more information on the “Custom container” option.)
- Then, select the number of cores you want, which in turn determines the memory available. Optionally, you can also configure your environment to attach GPUs to the VM. GPUs may not be available for all environment images.
Tip
GPUs increase the hourly running cost of a VM, which makes it particularly important to STOP a GPU-enabled environment while you’re not using it.
You can read about GPUs on Google Compute Engine in more depth here, and see more detail about GPU pricing here.
Once your cloud environment is created, you can stop and restart the environment, edit your configuration, delete your environment, and launch JupyterLab.

Tip
You can stop your environments when you’re not using them. While an environment is stopped, you’re still billed for its disk, but not for compute.
Tip
You can also create notebook environments from other Deep Learning VM images via the Workbench CLI. See this page for more information on specifying image versions. You can run a command like the following (substituting your notebook name, and specifying machine type, accelerators, etc. as desired):
terra resource create gcp-notebook \
--name <your-notebook-name> \
--machine-type=<MACHINE_TYPE> \
--location=us-central1-a \
--vm-image-family=<IMAGE_FAMILY> \
--vm-image-project=deeplearning-platform-release
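If you create environments from a notebook or a script, it can help to assemble this command programmatically. The following is a minimal Python sketch that uses only the flags shown in the command above. The function name `build_create_notebook_command` is hypothetical, and the example values (`my-notebook`, `n1-standard-4`, `tf-latest-cpu`) are placeholders; check the image documentation for the image families currently available.

```python
import shlex


def build_create_notebook_command(
    name,
    machine_type,
    image_family,
    location="us-central1-a",
    image_project="deeplearning-platform-release",
):
    """Return the argument list for `terra resource create gcp-notebook`.

    Only the flags that appear in the command above are used.
    """
    return [
        "terra", "resource", "create", "gcp-notebook",
        "--name", name,
        f"--machine-type={machine_type}",
        f"--location={location}",
        f"--vm-image-family={image_family}",
        f"--vm-image-project={image_project}",
    ]


# Placeholder values for illustration only.
cmd = build_create_notebook_command("my-notebook", "n1-standard-4", "tf-latest-cpu")
print(shlex.join(cmd))  # prints the command as a single shell-ready line
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.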
From the “Environments” tab for a workspace, you can see all of the cloud environments associated with that workspace, and their status.
Workspace Git repositories
You can add references to git repos from your workspace. When you add a reference to a git repository, it will be automatically cloned to the file system of any cloud environment that you create.

With public repos, use the https:// syntax to add the repo URI. This allows the repo to be cloned without having an SSH key set up.

With private repos, use the git@ (SSH) syntax to add the repo URI. This is required for Workbench to successfully clone the repo. For a clone of a private repo to succeed, the Workbench key must be set up as described in the next section.

If you visit the repo on GitHub, you can copy either of those URI formats as indicated here.
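To make the two URI formats concrete, here is a small Python sketch that converts between the https:// and git@ forms of the same repository. The function names and the `example-org/example-repo` repository are hypothetical, and the sketch assumes the common `https://<host>/<path>` and `git@<host>:<path>` layouts that GitHub uses.

```python
import re

_HTTPS_RE = re.compile(r"^https://(?P<host>[^/]+)/(?P<path>.+)$")
_SSH_RE = re.compile(r"^git@(?P<host>[^:]+):(?P<path>.+)$")


def to_ssh_uri(uri):
    """Convert an https:// repo URI to git@ (SSH) form; SSH URIs pass through."""
    if _SSH_RE.match(uri):
        return uri
    match = _HTTPS_RE.match(uri)
    if not match:
        raise ValueError(f"Unrecognized repo URI: {uri}")
    return f"git@{match['host']}:{match['path']}"


def to_https_uri(uri):
    """Convert a git@ (SSH) repo URI to https:// form; HTTPS URIs pass through."""
    if _HTTPS_RE.match(uri):
        return uri
    match = _SSH_RE.match(uri)
    if not match:
        raise ValueError(f"Unrecognized repo URI: {uri}")
    return f"https://{match['host']}/{match['path']}"
```

For a private repo you would add the `to_ssh_uri(...)` form as the workspace reference; for a public repo, the `to_https_uri(...)` form.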
Creating an SSH key
For private repos, you will need an SSH key to access the repo from your workspace. Workbench makes this straightforward by generating and managing an SSH key for you. Visit your Profile in the Web UI (upper-right corner) to view your public SSH key, which you can copy and regenerate.

You must associate the new SSH public key with your GitHub account using this process. See this page for more detail.
Configuring and using a cloud environment
After a cloud environment reaches the RUNNING state, click to bring up a JupyterLab notebook server in a new window.
From this UI, you can create and run Jupyter notebooks, and use the Terminal to work from the command line.
Setting up cloud environment defaults
You will probably want to configure your cloud environments to tailor them to your particular analysis tasks. This notebook sets up some reasonable defaults for your workspace environment, and this one creates some resources expected to exist for many Workbench tutorials. These notebooks perform some common and useful workspace setup tasks, including:
- Configuring the user name and email address to use for your git commits.
- Creating Cloud Storage bucket resources used in Workbench tutorials.
- Creating a BigQuery dataset resource used in Workbench tutorials.
- Creating a directory on this machine for Python virtual environments used in Workbench tutorials.
(If you take a closer look, you’ll notice that some of the resources set up by this notebook are configured to autodelete older content after a period of time. This alleviates the need for you to remember to delete example and temporary data).
You may want to modify this notebook further for your own purposes.
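As an illustration of the first task in the list above, the sketch below sets the git user name and email from Python using the standard `git config --global` commands. It is not taken from the setup notebook, which may do this differently; the function names and the example identity are hypothetical.

```python
import subprocess


def git_identity_commands(name, email):
    """Return the git commands that set the global commit identity."""
    return [
        ["git", "config", "--global", "user.name", name],
        ["git", "config", "--global", "user.email", email],
    ]


def configure_git_identity(name, email, run=subprocess.run):
    """Run the identity commands; `run` can be swapped out for testing."""
    for cmd in git_identity_commands(name, email):
        run(cmd, check=True)
```

In a notebook cell you might call `configure_git_identity("Your Name", "you@example.com")` once after the environment is created.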
Accessing the terra command-line tool from your cloud environment
The terra command-line utility is automatically installed and configured in your cloud environments. From the Terminal window, or from a notebook cell, you can use this utility to get information about your account, workspaces, and workspace resources. Below are a few examples.
$ terra auth status
User email: xxxx@google.com
Proxy group email: PROXY_xxxxxxxxxxxxxxxxxxxxx@verily-bvdp.com
Service account email for current workspace: pet-xxxxxxxxxxxxxxxxxxxxx@terra-vpp-quick-rhubarb-111.iam.gserviceaccount.com
LOGGED IN
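If you need one of these values in a notebook, for example the service account email, you can parse the text output. The sketch below assumes the output has the same shape as the sample above ("Key: value" lines followed by a status line); the function name `parse_auth_status` is hypothetical, and the real output format may vary between CLI versions.

```python
def parse_auth_status(output):
    """Parse `terra auth status` text into (fields, logged_in).

    Assumes "Key: value" lines plus a "LOGGED IN" status line,
    as in the sample output above.
    """
    fields = {}
    logged_in = False
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if ": " in line:
            key, value = line.split(": ", 1)
            fields[key] = value
        elif line == "LOGGED IN":
            logged_in = True
    return fields, logged_in
```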
terra resource list lists all the resources defined for the current workspace:
$ terra resource list
NAME                            RESOURCE TYPE  STEWARDSHIP TYPE  DESCRIPTION
nb-repo                         GIT_REPO       REFERENCED        (unset)
nextflow_tests                  AI_NOTEBOOK    CONTROLLED        (unset)
nf-core-sample-data-repo        GIT_REPO       REFERENCED        (unset)
rnaseq-nf-repo                  GIT_REPO       REFERENCED        Respository containing a Nextflow RNA...
tabular_data_autodelete_aft...  BQ_DATASET     CONTROLLED        BigQuery dataset for temporary storag...
terra-axon-examples             GIT_REPO       REFERENCED        (unset)
ws_files                        GCS_BUCKET     CONTROLLED        Bucket for reports and provenance rec...
ws_files_autodelete_after_t...  GCS_BUCKET     CONTROLLED        Bucket for temporary storage of file ...
You can see details of a resource given its name:
$ terra resource describe --name ws_files
Name: ws_files
Description: Bucket for reports and provenance records.
Type: GCS_BUCKET
Stewardship: CONTROLLED
Cloning: COPY_NOTHING
Access scope: SHARED_ACCESS
Managed by: USER
Properties: class Properties {
[]
}
GCS bucket name: terra-vpp-quick-rhubarb-111-ws-files
Location: US-CENTRAL1
# Objects: 0
You can use the terra resource resolve command to find the underlying resource that a name points to. You will often see this command used in example notebooks. It makes it straightforward to work with easily-remembered resource names and to access the underlying URI when needed.
$ terra resource resolve --name ws_files
gs://terra-vpp-quick-rhubarb-111-ws-files
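From a notebook cell, the same lookup can be wrapped in a small Python helper. This is a minimal sketch that shells out to the command shown above; the helper name `resolve_resource` and its `run` parameter (used to substitute a fake runner in tests) are hypothetical.

```python
import subprocess


def resolve_resource(name, run=subprocess.run):
    """Return the underlying URI that a workspace resource name points to."""
    result = run(
        ["terra", "resource", "resolve", "--name", name],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout.strip()
```

With the workspace shown above, `resolve_resource("ws_files")` would return the `gs://` bucket URI, which you can then pass to other tools that expect a Cloud Storage path.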
Viewing and managing your cloud environments via the Cloud Console
In addition to viewing the status of your cloud environments in the Workbench web UI, you can also view them in the Google Cloud Console. This provides another interface for launching JupyterLab for a notebook environment, stopping/starting your environments, and making some configuration changes. (However, you must create and delete your environments via Workbench.)
You can follow the project link in a workspace description page to visit the Cloud Console for the workspace project, then visit https://console.cloud.google.com/vertex-ai/workbench/user-managed to see your cloud environments. You can also navigate to Vertex AI » Workbench in the Cloud Console.

Modifying a cloud environment configuration
You can change the machine type and number of GPUs for a cloud environment after you have created it. To do this, the environment VM needs to be STOPPED first.
To update a configuration:
- From the right-hand panel of the workspace’s “Overview” panel, click on the link for your workspace’s associated Google Project. This will take you to the Google Cloud Console.
- From the “hamburger” menu in the upper left of the Console, navigate to the Vertex AI > Workbench panel in the Console.
- You should see your cloud environment listed on this page. You can stop and start your environments from this UI as well. Make sure that the environment you want to reconfigure is stopped. Then click on the link for the environment to view its details, and click on the HARDWARE tab.
- Then, update the “machine type” and GPU to the new config, and click “SUBMIT”. This screencast walks through the process:
Get cost estimates for different environment VM configurations
As you can see in the screencast above, the cloud environment cost estimates change as you reconfigure the machine type and GPU settings. You can use this Cloud Console view of your cloud environment to see an estimate of how much your environment would cost you if you left it RUNNING for a month.

Note that the estimated charges are specifically for a running cloud environment; if you stop a cloud environment, you are still charged for your cloud environment’s disk, but you are not incurring compute costs. As discussed above, it’s therefore recommended to stop your cloud environment when it’s not in use.
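The arithmetic behind this advice can be sketched in a few lines of Python. The rates below are hypothetical placeholders, not actual Google Cloud prices, and the model is deliberately simple: compute is billed only for the hours the environment is running, while the disk is billed for the whole month. The 730-hour month is a common approximation (8760 hours per year divided by 12).

```python
HOURS_PER_MONTH = 730  # approximate hours in a month (8760 / 12)


def estimate_monthly_cost(vm_hourly_rate, disk_monthly_rate, running_hours):
    """Estimate one month's cost for a cloud environment.

    Compute is charged only while the environment is running;
    the disk is charged whether it is running or stopped.
    """
    if not 0 <= running_hours <= HOURS_PER_MONTH:
        raise ValueError(f"running_hours must be between 0 and {HOURS_PER_MONTH}")
    compute_cost = vm_hourly_rate * running_hours
    return round(compute_cost + disk_monthly_rate, 2)


# Hypothetical rates: $0.20 per hour for the VM, $5.00 per month for the disk.
always_on = estimate_monthly_cost(0.20, 5.00, HOURS_PER_MONTH)
work_hours_only = estimate_monthly_cost(0.20, 5.00, 160)
stopped_all_month = estimate_monthly_cost(0.20, 5.00, 0)
```

Under these made-up rates, most of the monthly bill for an always-on environment is compute, which is why stopping an idle environment matters more as the machine type grows or GPUs are attached.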
Specifying a container image as the basis for a notebook environment
The Workbench web UI also allows you to specify a container image as the basis for a cloud environment.
A number of prebuilt containers are listed here. If you wish to create a custom container, you should use one of these containers as your base image, as they include the necessary config for successfully launching a cloud environment.

The container images you build must be Docker container images. Private images may only come from the Google Cloud Artifact Registry or Container Registry. See this page for more detail on setting up an Artifact Registry and using Cloud Build to build and push your custom image to the registry.
Last Modified: 16 November 2023