Create container images in a workspace

How to use Cloud Build or Docker to create container images from within a workspace

Introduction

This article describes how you can build and use docker container images from within a Verily Workbench workspace.

This capability can be useful in many scenarios. For example, you may want to define a workflow (e.g. using a framework like nextflow or dsub) that includes step(s) which require you to create custom containers. As another example, you may want to create a notebook instance that is based on your own custom container.

This article walks through how you can build container images from your workspace using either Google Cloud Build or the Docker command-line tool, and then push the container image to the Google Artifact Registry so that you can use it in your workspace.

Custom containers for Vertex AI Workbench notebook environments

You can create Workbench notebook environments by choosing from a set of pre-built Deep Learning VMs. If these don’t meet your needs, you can create a custom container image to use when creating a notebook environment, as described below.

Create an Artifact Registry repository

The Google Artifact Registry lets you store, manage, and secure your build artifacts. To get started, you’ll create an Artifact Registry repository in the Google Project tied to your workspace.

  1. First, create or set the workspace that you want to use. You can do this from the Workbench UI or from the command line as follows. (On your local machine, you will first need to install the Workbench CLI.)

    terra status
    terra auth login  # if need be
    terra workspace list
    

    To create a new workspace:

    terra workspace create --name=<workspace-name>
    

    To set the Workbench CLI to use an existing workspace:

    terra workspace set --id=<workspace-id>
    
  2. View any existing artifact repositories for your workspace via:

    terra gcloud artifacts repositories list
    

    You can run this command either from a local machine where you’ve installed the Workbench CLI, or from a notebook environment in your workspace.

    If you want to use an external repository, see Granting access to a private artifact registry repo in a separate Google project below.

  3. Create a new artifact repository from the command line like this, first replacing <your-repo-name> with the name of your new repo:

    terra gcloud artifacts repositories create <your-repo-name> --repository-format=docker \
    --location=us-central1
    

    As above, you can run this command either from a local machine where you’ve installed the Workbench CLI, or from a notebook environment in your workspace.
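
    To confirm that the repository was created, you can describe it (a quick check; adjust the repository name and location if you used different values above):

    terra gcloud artifacts repositories describe <your-repo-name> --location=us-central1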

Create a Dockerfile

Next, you’ll define (or obtain) a Dockerfile from which to build your container image.

Create a directory for your Dockerfile and any other build artifacts, then create the Dockerfile. As a simple example for testing, you can create a Dockerfile that includes only the following line. That is, it uses a TensorFlow Deep Learning container as its base, and doesn’t make any other modifications.

FROM gcr.io/deeplearning-platform-release/tf2-cpu.2-12.py310
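
If it helps, here is a minimal sketch of setting this up from a shell (the directory name my-container is just an example):

# create a working directory for the build
mkdir my-container && cd my-container

# write the one-line Dockerfile shown above
cat > Dockerfile <<'EOF'
FROM gcr.io/deeplearning-platform-release/tf2-cpu.2-12.py310
EOF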

After you’ve defined your Dockerfile (and any other build artifacts), you can build and push your container image via either Cloud Build or by running docker directly. Both are described below.

Use Cloud Build to build and push a Docker image to the Artifact Registry

Cloud Build can import source code from a variety of repositories or cloud storage spaces, execute a build to your specifications, and produce artifacts such as containers or code archives.

We can use Cloud Build to build a Docker image and push it to the workspace project’s Artifact Registry repository in one step, from the command-line or in a notebook environment. When you use Cloud Build, you’re doing the Docker build in the cloud, which can be particularly helpful when you have a compute-intensive build you want to launch from (say) an underpowered cloud environment.

  1. Ensure that you have created a Workbench GCS bucket resource to use. If you haven’t done so yet, you can do so via the Workbench UI or from the command line like this:

    terra resource create gcs-bucket --name=ws_files \
     --description="Bucket for reports and provenance records."
    

    This command creates a bucket with the resource name: ws_files. The following example assumes a bucket with this resource name. If you want to use a different bucket resource, you can see your existing bucket resource names via: terra resource list.

  2. Then, in the same directory as your Dockerfile, run Cloud Build to build the container and push it to the Artifact Registry, first replacing <your-repo-name> in the following command.

    Note that we’re running the command as terra gcloud .... This “pass-through” allows us to use environment variables that terra sets. The TERRA_ws_files environment variable resolves to the GCS bucket with resource name ws_files. If you are using a bucket with a different resource name, or a different region, change that info as well. $GOOGLE_CLOUD_PROJECT will be set to the project ID of your current Workbench workspace.
    (This example command sets the container image name to test1 and uses a date-based tag; you can edit those as well.)

    terra gcloud builds submit \
        --timeout 2h --region=us-central1 \
        --gcs-source-staging-dir \${TERRA_ws_files}/cloudbuild_source \
        --gcs-log-dir \${TERRA_ws_files}/cloudbuild_logs \
        --tag us-central1-docker.pkg.dev/\$GOOGLE_CLOUD_PROJECT/<your-repo-name>/test1:`date +'%Y%m%d'`
    

Once your container image has been built and pushed, you can see it listed in the Google Cloud Artifact Registry here: https://console.cloud.google.com/artifacts. Click into the repo, and then into your image name, to see its details.
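
You can also list the images in your repository from the command line, for example (again replacing <your-repo-name>; as above, the escaped \$GOOGLE_CLOUD_PROJECT is resolved by the terra pass-through):

terra gcloud artifacts docker images list \
    us-central1-docker.pkg.dev/\$GOOGLE_CLOUD_PROJECT/<your-repo-name>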

Use the docker tool to build and push an image to the Artifact Registry

As an alternative to using Cloud Build, you can run docker directly. You can use docker from your local machine if it is installed, or from a Workbench notebook environment, e.g. in a Terminal window.

For example, to build a container image, run the following command in the directory where your Dockerfile is, replacing <your-project-id>, <your-repo-name>, and <container_name>:<tag> with your details. You can find your workspace’s project ID from the overview page in the UI or via terra workspace describe, or from the GOOGLE_CLOUD_PROJECT environment variable in a workspace notebook environment.

docker build -t us-central1-docker.pkg.dev/<your-project-id>/<your-repo-name>/<container_name>:<tag> .

To push your image to your Artifact Registry repository, you may first need to run the following command (you only need to do this once for each registry region, in this case us-central1):

gcloud auth configure-docker us-central1-docker.pkg.dev

Then, push your container image to the registry (again, replacing the following with your details):

docker push us-central1-docker.pkg.dev/<your-project-id>/<your-repo-name>/<container_name>:<tag>
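
As a quick sanity check, you can also run the image locally. This is only a sketch: it assumes the TensorFlow example base image above (which includes Python and TensorFlow) and that the image’s entrypoint doesn’t override the command; adjust for your own image’s contents.

docker run --rm us-central1-docker.pkg.dev/<your-project-id>/<your-repo-name>/<container_name>:<tag> \
    python -c "import tensorflow as tf; print(tf.__version__)"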

Create a container-based cloud environment with your new container image

If your image is based on one of the Deep Learning container images (as in the example Dockerfile above), you can now create a Workbench cloud environment using that container image. You can do so via the Workbench UI.

Alternatively, you can create a custom-container-based environment via the Workbench CLI like this (replacing <your-repo-name>, <your-container-name>, and <container_tag> in the following command):

terra resource create gcp-notebook --name ctest --instance-id my-container-nb --machine-type=n1-standard-4 \
  --location=us-central1-c \
  --container-repository=us-central1-docker.pkg.dev/\$GOOGLE_CLOUD_PROJECT/<your-repo-name>/<your-container-name>:<container_tag>

Once the cloud environment comes up, you should be able to see it in the Workbench UI under the workspace “Environments”, and list it from the command line via terra resource list.

You can connect to the cloud environment via the link in the UI, or by running

terra resource describe --name <notebook_name>

and then visiting the listed proxy-url.

Granting access to a private artifact registry repo in a separate Google project

You may at times want to read from or push to an Artifact Registry in a Google project not associated with your workspaces.

In order to use a private Artifact Registry repo from a separate Google project in your Workbench workspaces, you need to grant read access to your proxy group, which contains the service account emails for all of your Workbench workspaces.

  1. Get the proxy group email for your workspace by running:

    terra auth status
    
  2. Switch to the GCP project containing the artifact registry and grant the reader role to the proxy group:

    gcloud config set project <PROJECT_ID>
    gcloud artifacts repositories add-iam-policy-binding <my-repo> \
    --location=<us-central1> --member=group:<proxy-group-email>@verily-bvdp.com --role=roles/artifactregistry.reader
    

    Replace <my-repo> with your Artifact Registry repository name, <us-central1> with the location of the repo, and <proxy-group-email> with the email from step 1.
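
    If you'd like to verify the binding, you can view the repository's IAM policy, for example:

    gcloud artifacts repositories get-iam-policy <my-repo> --location=<us-central1>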

Grant the artifactregistry.writer role to the Cloud Build service account

If you want to upload your images to an Artifact Registry repo outside of your workspace’s Google project, you need to grant the artifactregistry.writer role to the workspace’s Cloud Build service account. It’s in the format <project-number>@cloudbuild.gserviceaccount.com and can be found in the GCP Console under Cloud Build > Settings.

  1. Go to the GCP Console for the current workspace’s project. Using the Workbench CLI, you can get the link to your project in the Cloud Console by running:

    terra status
    

    From the UI, look for the “Google Project” link under “Workspace Details”.

  2. Click on that link to visit the Google Cloud Console for that project. In the Cloud Console, go to the Cloud Build settings.

  3. Get the Cloud Build service account, which should be in the form <project-number>@cloudbuild.gserviceaccount.com.

  4. Go to the Cloud Console for the project containing the Artifact Registry that you want to use, and navigate to the Artifact Registry panel.

  5. Select your artifact registry and select ADD PRINCIPAL.

  6. Add the Cloud Build service account from your Workbench workspace with the Artifact Registry Writer role.
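
Alternatively, you can grant the same role from the command line with gcloud. This is an equivalent sketch of the console steps above; substitute your repository name, location, and the Cloud Build service account email from step 3:

gcloud artifacts repositories add-iam-policy-binding <my-repo> \
    --location=<us-central1> \
    --member=serviceAccount:<project-number>@cloudbuild.gserviceaccount.com \
    --role=roles/artifactregistry.writer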

Additional Resources

GCP Cloud Build documentation: Quickstarts | Cloud Build Documentation

Why Artifact Registry? Understanding Artifact Registry vs. Container Registry | Google Cloud Blog

Last Modified: 16 November 2023