Basic usage

Examples of typical operations performed via the Verily Workbench CLI

Prior reading: Command-line interface overview

Purpose: This document provides summary examples of commands that may be used alone or in combination to perform typical operations using the Workbench CLI.



Prerequisites

These instructions assume that you have already installed the Workbench CLI or are working in a cloud environment where it has been installed.

Starting a new work session

This is an example of the typical flow of operations when starting a new work session with the Workbench CLI.

Log in

Use this command to authorize the Workbench CLI to access the relevant APIs and data with user credentials.

wb auth login

ℹ️ wb auth login

Optionally, check the authentication status to confirm the login was successful.

wb auth status
User email: xxxx@google.com
Proxy group email: PROXY_xxxxxxxxxxxxxxxxxxxxx@verily-bvdp.com
Service account email for current workspace: pet-xxxxxxxxxxxxxxxxxxxxx@terra-vpp-quick-rhubarb-111.iam.gserviceaccount.com
LOGGED IN

ℹ️ wb auth status
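If you script your session setup, the status check above can decide whether a login is needed. The following is a minimal sketch, not part of the Workbench CLI: the helper names `is_logged_in` and `ensure_logged_in` are hypothetical, and it assumes that a logged-in session prints a line reading exactly `LOGGED IN` (as in the sample output above) and that a logged-out session does not.

```shell
# Hypothetical helper: reads `wb auth status` output on stdin and succeeds
# only if some line is exactly "LOGGED IN" (assumption based on the sample output).
is_logged_in() {
  grep -qx 'LOGGED IN'
}

# Hypothetical helper: log in only when the status check does not report a login.
ensure_logged_in() {
  if wb auth status 2>/dev/null | is_logged_in; then
    echo "Already logged in."
  else
    wb auth login
  fi
}
```

After defining these functions in your shell, run `ensure_logged_in` at the start of a session.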

Check server status

Use this command to check the status of the server and details of the current context.

wb server status

ℹ️ wb server status

List accessible workspaces

This command lists all workspaces to which you (i.e., the logged-in user) have read or write access.

wb workspace list

ℹ️ wb workspace list

Create a new workspace

Use this command to create a workspace along with the Google Cloud project that backs it.

wb workspace create --id=<my-workspace-id> --name=<my-workspace-name>

ℹ️ wb workspace create

The --name=<my-workspace-name> argument is optional; if you do not include it, the system will assign a randomly generated unique identifier (UUID).

Optionally, you can use the wb status command to confirm that the workspace was created successfully.

wb status

ℹ️ wb status

Use an existing workspace

If you want to use an existing workspace, use the set command instead of create.

wb workspace set --id=<my-workspace-id>

ℹ️ wb workspace set

Note that in the Verily Workbench web UI, the overview page of a workspace includes a prefilled command that you can copy and paste to set that workspace in the CLI.

Screenshot of Workbench CLI panel showing terminal command to set a specified workspace on your local machine.
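The session-start commands in this section can also be chained so that a later step runs only if the earlier ones succeed. The sketch below uses only the commands shown above; the function name `start_session` is a hypothetical helper, and the flow assumes you are switching to an existing workspace rather than creating one.

```shell
# Hypothetical helper: run the session-start commands in order,
# stopping at the first one that fails.
start_session() {
  workspace_id="${1:-}"
  if [ -z "$workspace_id" ]; then
    echo "usage: start_session <my-workspace-id>" >&2
    return 2
  fi
  wb auth login &&
    wb server status &&
    wb workspace set --id="$workspace_id" &&
    wb status
}
```

For example, `start_session <my-workspace-id>` logs in, checks the server, sets the workspace, and prints the resulting status.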

Add a controlled resource

You can add a controlled BigQuery dataset, GCS bucket, GCP notebook, or GCP Dataproc cluster.

wb resource create

The following example creates a controlled GCS bucket:

wb resource create gcs-bucket --id=scratch-data --description="Scratch space for working data."
Successfully added controlled GCS bucket.
Name:         scratch-data
Description:  Scratch space for working data.
Type:         GCS_BUCKET
Stewardship:  CONTROLLED
Cloning:      COPY_RESOURCE
Access scope: SHARED_ACCESS
Managed by:   USER
GCS bucket name: scratch-data-terra-vdevel-clean-pear-3014
Location: US-CENTRAL1
# Objects: 0

ℹ️ wb resource create

Add a referenced resource

You can add a BigQuery dataset/table, GCS bucket/object, or a Git repo as a referenced resource.

wb resource add-ref

The following example adds a BigQuery table as a referenced resource:

wb resource add-ref bq-table --dataset-id=samples --project-id=bigquery-public-data --table-id=github_timeline --id=github_timeline
Successfully added referenced BigQuery data table.
Name:         github_timeline
Description:
Type:         BQ_TABLE
Stewardship:  REFERENCED
Cloning:      COPY_REFERENCE
GCP project id: bigquery-public-data
BigQuery dataset id: samples
BigQuery table id: github_timeline
# Rows: 6219749

ℹ️ wb resource add-ref

Locate a data resource

Use this command to list all resources in your workspace.

wb resource list

You’ll see a list of resources with their respective names, resource types, and stewardship types:

NAME                            RESOURCE TYPE         STEWARDSHIP TYPE      DESCRIPTION
nb-repo                         GIT_REPO              REFERENCED            (unset)
nextflow_tests                  AI_NOTEBOOK           CONTROLLED            (unset)
nf-core-sample-data-repo        GIT_REPO              REFERENCED            (unset)
rnaseq-nf-repo                  GIT_REPO              REFERENCED            Respository containing a Nextflow RNA...
tabular_data_autodelete_aft...  BQ_DATASET            CONTROLLED            BigQuery dataset for temporary storag...
workbench-examples              GIT_REPO              REFERENCED            (unset)
ws_files                        GCS_BUCKET            CONTROLLED            Bucket for reports and provenance rec...
ws_files_autodelete_after_t...  GCS_BUCKET            CONTROLLED            Bucket for temporary storage of file ...
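When a workspace holds many resources, it can help to filter this listing by type. The following sketch is not a Workbench CLI feature: `resources_of_type` is a hypothetical helper that assumes the whitespace-separated column layout shown above, with the resource type in the second column and no spaces in resource names. Names that the listing truncates with `...` will appear truncated here as well.

```shell
# Hypothetical helper: reads `wb resource list` output on stdin and prints
# the NAME column for rows whose RESOURCE TYPE equals the first argument.
resources_of_type() {
  # NR > 1 skips the header row; $2 is the resource type, $1 the name.
  awk -v want="$1" 'NR > 1 && $2 == want { print $1 }'
}
```

For example, `wb resource list | resources_of_type GCS_BUCKET` prints only the names of the GCS bucket resources.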

You can print details of a resource given its id (i.e., the name of the resource):

wb resource describe --id=ws_files
Name:         ws_files
Description:  Bucket for reports and provenance records.
Type:         GCS_BUCKET
Stewardship:  CONTROLLED
Cloning:      COPY_NOTHING
Access scope: SHARED_ACCESS
Managed by:   USER
Properties:   class Properties {
    []
}
GCS bucket name: terra-vpp-quick-rhubarb-111-ws-files
Location: US-CENTRAL1
# Objects: 0

You can then use this command to print the underlying cloud location of a resource.

wb resource resolve --id=<resource-name>

ℹ️ wb resource list

ℹ️ wb resource resolve
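The resolved location can be passed to other cloud tools. The sketch below assumes that `wb resource resolve` prints only the location (for a GCS bucket, a `gs://` URL) and that `gsutil` is installed; `list_bucket` is a hypothetical helper name, not a Workbench CLI command.

```shell
# Hypothetical helper: resolve a bucket resource by its id and list its contents.
list_bucket() {
  # Assumption: the command prints just the cloud location on stdout.
  location="$(wb resource resolve --id="$1")" || return 1
  case "$location" in
    gs://*) gsutil ls "$location" ;;
    *)
      echo "Not a GCS location: $location" >&2
      return 1
      ;;
  esac
}
```

For example, `list_bucket ws_files` would list the objects in the bucket behind the `ws_files` resource.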

Create a notebook environment

Use the wb resource create command to create a notebook environment. The example below shows an environment configuration with a specific machine type, VM image, and GPUs.

wb resource create gcp-notebook \
  --name <notebook_name> \
  --machine-type=n1-highmem-16 \
  --vm-image-family=tf-ent-latest-gpu \
  --vm-image-project=deeplearning-platform-release \
  --data-disk-size 800 \
  --accelerator-type NVIDIA_TESLA_V100 \
  --accelerator-core-count=8 \
  --install-gpu-driver=true

Configure autostop idle time

Use this command to update the autostop idle time (in seconds) for your cloud environment.

For Google Compute Engine:

wb resource update gce --id=<compute-engine-id> --new-metadata=idle-timeout-seconds=<autostop-time>

For AWS EC2:

wb resource update ec2 --id=<ec2-id> --new-metadata=idle-timeout-seconds=<autostop-time>

Note: Replace <compute-engine-id> or <ec2-id> with the ID of the compute resource you want to update. Replace <autostop-time> with the desired idle time in seconds, given as a whole number.
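Because the idle time is given in seconds, it can be convenient to compute it from minutes. The following is a small sketch; `minutes_to_seconds` is a hypothetical helper, not part of the Workbench CLI.

```shell
# Hypothetical helper: convert a whole number of minutes to seconds.
# Rejects empty input, non-digits, and leading zeros (which the shell
# would otherwise try to read as octal).
minutes_to_seconds() {
  case "${1:-}" in
    '' | *[!0-9]* | 0?*)
      echo "expected a whole number of minutes, got: '${1:-}'" >&2
      return 1
      ;;
  esac
  echo $(( $1 * 60 ))
}
```

For example, `minutes_to_seconds 30` prints `1800`, which can be supplied as the <autostop-time> value.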

Set gcloud credentials

Use these commands to set the user credentials and the application default credentials that the gcloud utilities use to access data.

gcloud auth login
gcloud auth application-default login

ℹ️ gcloud auth
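To avoid repeating the login prompts when credentials are already in place, a script can check first. This is a sketch under assumptions: `ensure_gcloud_credentials` is a hypothetical helper, an empty result from `gcloud auth list` filtered to active accounts is taken to mean no user login, and a failing `print-access-token` call is taken to mean application default credentials are missing.

```shell
# Hypothetical helper: run each gcloud login only when it appears to be needed.
ensure_gcloud_credentials() {
  # Print the active account, if any.
  active_account="$(gcloud auth list --filter=status:ACTIVE --format='value(account)')"
  if [ -z "$active_account" ]; then
    gcloud auth login
  fi
  # Application default credentials are checked by requesting a token.
  if ! gcloud auth application-default print-access-token >/dev/null 2>&1; then
    gcloud auth application-default login
  fi
}
```

Running `ensure_gcloud_credentials` prompts only for the credentials that are missing.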

Last Modified: 22 August 2024