Data Resource operations

Operations that can be performed on data resources through the Verily Workbench web UI.

Purpose: This document provides detailed instructions for performing operations on data resources through the Verily Workbench web UI.

Note: These instructions all assume that you have already opened a workspace in the Verily Workbench web UI and navigated to the Resources tab.

All of these operations can be performed via the Workbench CLI as well. See the CLI reference for details.



List your data resources

Your data resources are listed in the ‘Resources’ tab of the workspace. If your resources are organized in folders, the folders may be displayed as collapsed by default. Click on the triangle to the left of the folder name to expand or collapse the view.

For more information about folders, see Organize resources into folders below. For other operations such as previewing contents and editing resource details, see Manage your data resources.


Create a new controlled resource

You can create empty storage buckets and BigQuery datasets directly from the Verily Workbench web UI. Any resource created in this way will be treated as a controlled resource, meaning that access to the resource is controlled by Workbench. This is in contrast to a referenced resource (see below).

Note that controlled data resources are tightly associated with the workspace where they are created. They are automatically shared with any collaborators who have been granted access to the workspace, and their data lifecycle matches that of the workspace. If the workspace is deleted, its controlled data resources are also deleted. If the workspace is cloned, its controlled data resources are also cloned.

Create a storage bucket

To create a new storage bucket via the web UI, click on the ‘+ Cloud resource’ button in the top right of the Resources pane and select ‘New Cloud Storage bucket’. This will open a resource creation dialog; fill it out as detailed below.

  1. Enter a name for your new resource. This will be the name displayed when you list your resources in Workbench. The name must be unique within the workspace.

  2. Use the ‘Folder path’ dropdown menu to select a folder. You will be able to move the bucket to a different folder after creation if desired.

  3. Provide a brief description of the resource. This is optional but highly recommended.

  4. The system will suggest a bucket name, generated automatically based on the resource name and the Google Project associated with the workspace. The bucket name will be the name of the bucket as listed in Google Cloud (displayed in the Resource details in Workbench). You can modify or replace the suggested bucket name in the creation dialog, but note that the bucket name must be globally unique across all of Google Cloud. You will not be able to edit the bucket name once it has been created (though you may change the resource name if you like).

Creating a controlled storage bucket. Here, the resource will be added under the "experimental data" folder.
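
Before submitting the dialog, it can help to sanity-check a candidate bucket name locally. The sketch below is a minimal, simplified check based on the published Google Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, and underscores; starting and ending with a letter or digit; no "goog" prefix and no "google" substring). It deliberately ignores dotted bucket names, which have additional rules, and it cannot tell you whether a name is already taken, since global uniqueness is only checked by Google Cloud at creation time. The function name is hypothetical and is not part of Workbench.

```python
import re

# Simplified subset of the Cloud Storage bucket naming rules.
# Assumption: dotted names (which have extra rules) are rejected here.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes a basic local format check.

    This does NOT check global uniqueness; only Google Cloud can do that.
    """
    if _BUCKET_NAME_RE.fullmatch(name) is None:
        return False
    # Bucket names may not begin with "goog" or contain "google".
    if name.startswith("goog") or "google" in name:
        return False
    return True


if __name__ == "__main__":
    for candidate in ["experimental-data-my-project", "My-Bucket", "ab"]:
        print(candidate, looks_like_valid_bucket_name(candidate))
```

A name that passes this check can still be rejected in the creation dialog if another Google Cloud user already owns a bucket with that name.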

Create a BigQuery dataset

To create a new BigQuery dataset via the web UI, click on the ‘+ Cloud resource’ button in the top right of the Resources pane and select ‘New BigQuery dataset’. This will open a resource creation dialog; fill it out as detailed below.

  1. Enter a name for your new resource. This will be the name displayed when you list your resources in Workbench. The name must be unique within the workspace.

  2. Use the ‘Folder path’ dropdown menu to select a folder. You will be able to move the dataset to a different folder after creation if desired.

  3. Provide a brief description of the resource. This is optional but highly recommended.

  4. The system will suggest a dataset identifier (ID), generated automatically based on the resource name. The dataset ID will be the name of the dataset as listed in Google Cloud (displayed in the Resource details in Workbench). You can modify or replace the suggested dataset ID in the creation dialog, but note that the dataset ID must be unique within your Google Cloud project (but not across all of Google Cloud). You will not be able to edit the dataset ID once it has been created.

Creating a controlled BigQuery dataset. This resource is organized under the "test data" folder.
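
BigQuery dataset IDs are more restrictive than resource names: they may contain only letters, digits, and underscores, up to 1,024 characters. The sketch below checks that format and shows one possible way to derive an ID from a resource name. The derivation is an illustrative assumption only; it is not the algorithm Workbench uses to generate its suggestion, and both function names are hypothetical.

```python
import re

# BigQuery dataset IDs: letters, digits, and underscores, up to 1,024 characters.
_DATASET_ID_RE = re.compile(r"[A-Za-z0-9_]{1,1024}")


def looks_like_valid_dataset_id(dataset_id: str) -> bool:
    """Return True if `dataset_id` has a legal BigQuery dataset ID format."""
    return _DATASET_ID_RE.fullmatch(dataset_id) is not None


def suggest_dataset_id(resource_name: str) -> str:
    """Hypothetical helper: replace every disallowed character with '_'.

    Assumption: this is NOT how Workbench builds its suggestion; it only
    illustrates turning a free-form name into a legal dataset ID.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", resource_name)


if __name__ == "__main__":
    print(suggest_dataset_id("test-data"))
    print(looks_like_valid_dataset_id("test-data"))
```

As with buckets, a well-formed ID can still be rejected if a dataset with the same ID already exists in the Google Cloud project.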

Add a reference to an existing resource

You can reference existing storage buckets and files as well as existing BigQuery datasets and tables based on their Google Cloud identifiers. Any resource added in this way will be treated as a referenced resource, meaning that access to the resource is not controlled by Workbench. This is in contrast to a controlled resource (see above).

Note that as a result, access to referenced data resources is not automatically granted to collaborators who have been granted access to the workspace. For information about sharing a referenced data resource with collaborators, see TODO: Add link to doc about sharing & auth.

Reference a storage bucket

To reference an existing storage bucket via the web UI, click on the ‘+ Cloud resource’ button in the top right of the Resources pane and select ‘Reference Cloud Storage bucket’. This will open a resource addition dialog; fill it out as detailed below.

  1. Enter a name for your resource. This will be the name displayed when you list your resources in Workbench. The name must be unique within the workspace.

  2. Use the ‘Folder path’ dropdown menu to select a folder. You will be able to move the bucket to a different folder after creation if desired.

  3. Provide a brief description of the resource. This is optional but highly recommended.

  4. Enter the name of the bucket you want to reference. You can find this information in the Google Cloud console. Do not include the gs:// prefix.

Creating a GCS bucket reference.

Once created, the resource should look similar to the following:

A GCS bucket reference.
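
Bucket names are often copied from tools that display them as gs:// URIs, while the dialog expects the bare name. The following sketch strips the prefix and any trailing slash before you paste the value. The function name is hypothetical, and the check for embedded slashes is an assumption about what counts as a plain bucket name.

```python
def normalize_bucket_name(raw: str) -> str:
    """Turn 'gs://my-bucket/' or 'my-bucket' into the bare name 'my-bucket'."""
    name = raw.strip()
    prefix = "gs://"
    if name.startswith(prefix):
        name = name[len(prefix):]
    name = name.rstrip("/")
    # Anything still containing '/' points inside a bucket, not at the bucket itself.
    if not name or "/" in name:
        raise ValueError(f"not a plain bucket name: {raw!r}")
    return name


if __name__ == "__main__":
    print(normalize_bucket_name("gs://my-bucket/"))
```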

Reference a file or a folder in a bucket

To reference an existing file or folder in a storage bucket via the web UI, click on the ‘+ Cloud resource’ button in the top right of the Resources pane and select ‘Reference Cloud Storage object’. This will open a resource addition dialog; fill it out as detailed above under Reference a storage bucket for steps 1-3, then as detailed below for step 4.

  4. Enter the gs:// URI of the file or folder you want to reference. You can find this information in the Google Cloud console (or in the “details” panel for Workbench workspace resources).

Creating a GCS bucket folder reference.

Once created, the resource should look similar to the following:

A GCS bucket folder reference.
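
Unlike a bucket reference, an object reference takes the full gs:// URI. A small sketch of how such a URI breaks down into a bucket name and an object path is shown below. Cloud Storage has a flat namespace, so a "folder" is just a shared path prefix; treating a trailing slash as the folder marker is a convention assumed here, and the type and function names are hypothetical.

```python
from typing import NamedTuple


class GcsObjectRef(NamedTuple):
    bucket: str      # bucket name, without the gs:// prefix
    path: str        # object path inside the bucket
    is_folder: bool  # assumption: a trailing '/' marks a folder-style prefix


def parse_gcs_uri(uri: str) -> GcsObjectRef:
    """Split 'gs://<bucket>/<object path>' into its parts."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    bucket, _, path = uri[len(prefix):].partition("/")
    if not bucket or not path:
        raise ValueError(f"expected gs://<bucket>/<object path>, got {uri!r}")
    return GcsObjectRef(bucket=bucket, path=path, is_folder=path.endswith("/"))


if __name__ == "__main__":
    print(parse_gcs_uri("gs://my-bucket/data/sample.bam"))
    print(parse_gcs_uri("gs://my-bucket/data/"))
```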

Reference a BigQuery dataset

To reference an existing BigQuery dataset via the web UI, click on the ‘+ Cloud resource’ button in the top right of the Resources pane and select ‘Reference BigQuery dataset’. This will open a resource addition dialog; fill it out as detailed below.

  1. Enter a name for your resource. This will be the name displayed when you list your resources in Workbench. The name must be unique within the workspace.

  2. Use the ‘Folder path’ dropdown menu to select a folder. You will be able to move the dataset to a different folder after creation if desired.

  3. Provide a brief description of the resource. This is optional but highly recommended.

  4. Enter the identifier (ID) of the BigQuery dataset you want to reference. You can find this information in the Google Cloud console.

    You can find project, dataset, and table identifiers in the Google Cloud BigQuery console.

  5. Enter the identifier (ID) of the Google Cloud project that contains the BigQuery dataset you want to reference. You can find this information in the Google Cloud console.

Creating a BigQuery dataset reference.

Reference a BigQuery table

To reference an existing BigQuery table via the web UI, click on the ‘+ Cloud resource’ button in the top right of the Resources pane and select ‘Reference BigQuery table’. This will open a resource addition dialog; fill it out as detailed above under Reference a BigQuery dataset for steps 1-5, then as detailed below for step 6.

  6. Enter the identifier (ID) of the BigQuery table you want to reference. You can find this information in the Google Cloud BigQuery console.

Creating a BigQuery table reference.

Once created, the table reference details will look like the following:

A BigQuery table reference.
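
The BigQuery console usually shows a table as a single dotted path, project.dataset.table, whereas the dialog asks for the three identifiers separately. The sketch below splits such a path into the three fields and rebuilds the backtick-quoted form used in GoogleSQL queries. It assumes a plain project ID with no dots or colons (legacy domain-scoped project IDs would need extra handling), and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BigQueryTableRef:
    project_id: str
    dataset_id: str
    table_id: str

    @classmethod
    def from_path(cls, path: str) -> "BigQueryTableRef":
        """Parse 'project.dataset.table' into its three identifiers."""
        parts = path.strip("`").split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected project.dataset.table, got {path!r}")
        return cls(*parts)

    def sql_name(self) -> str:
        """Return the backtick-quoted name for use in a GoogleSQL query."""
        return f"`{self.project_id}.{self.dataset_id}.{self.table_id}`"


if __name__ == "__main__":
    ref = BigQueryTableRef.from_path("my-project.test_data.samples")
    print(ref.project_id, ref.dataset_id, ref.table_id)
    print(ref.sql_name())
```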

Add a data collection from the Data Catalog

Import references to resources from a data collection

To add a data collection to your workspace via the web UI, click on the ‘+ Data from catalog’ button in the top right of the Resources pane. This will open a resource addition dialog; use it as detailed below.

  1. Browse the data catalog and select a data collection of interest. You’ll be able to see information about the most recent version of the data collection and when it was published. Click ‘Next’.

    This will lead to a dialog showing the contents of the collection.

  2. After opening the data collection, select the version you’d like to import.

  3. Select which resources you would like to import from the data collection version. You can expand folders by clicking on the triangle to the left of the folder name. If you do not select all resources in a data collection, you will still have the option of adding them later. Once you have finalized your selection, click ‘Next’.

    This will lead to a dialog showing the data policies associated with the resources you have selected.

  4. Review the policy requirements. Click ‘Next’.

    This will display a list of selected resources and destination options.

  5. Review your selection and choose the workspace folder where you want to add them. You can select an existing folder from the dropdown menu or create a new folder. Click ‘Add to your workspace’.

The selected resources should now appear in your workspace resources view.

You can manage and access these resources as you would any other resource in your workspace, as described below in Manage your data resources.

View the lineage of resources imported from a data collection

You can view the data collection lineage of each resource. This displays provenance information, including a link to the collection of origin as well as the time or date when the resource was added to the workspace.

To view lineage information, click on the resource you want to inspect in the Resources list, then click on the ‘lineage’ tab in the information pane on the right.

Data lineage for a referenced resource.

Manage your data resources

Organize resources into folders

You can organize your data resources in hierarchical folders.

To create a new folder, click the ‘+ New folder’ button in the top right corner of the Resources tab. This will bring up a folder creation dialog.

The following screencast shows creation of a new folder, then creation of a controlled resource (a GCS bucket) within that folder.

To move a resource or folder to a different folder, select it and click on the ‘Move to’ button in the information pane on the right. This will bring up a folder organization dialog (which also allows you to create a new folder if needed).

The following screencast shows moving a resource (bucket1) to a new folder, created as part of the ‘Move’ dialog. When creating a new folder, you have the option of where to place it. In this case we didn’t locate the new folder under the current one, but created it at the top level.

View and edit resource details

You can edit the resource name and description of any of your resources at any time. To do so, select the resource and click on the ‘Edit details’ button in the information pane on the right. This will bring up the editing dialog.

Note that you cannot edit external identifiers such as the bucket path, project ID, dataset ID, or table ID after a resource has been created. If you realize you made a mistake in one of these identifiers when you created or added the resource, you will need to delete the erroneous entry and repeat the process of creating or adding that resource to your workspace as described above. For instructions on deleting a resource, see Delete a resource below.

Browse buckets and preview file contents

You can browse the contents of buckets and preview file contents for certain file types directly in Workbench.

To browse the contents of a bucket, select it in the list of resources and click ‘Browse’ in the information pane on the right. This will bring up a browser pane that you can use to explore the contents of the bucket.

Browsing a referenced GCS bucket.

Note that you can select an object within the bucket browser and add a direct reference to it in your resource list by clicking on ‘Add as reference’ in the information pane on the right.

File details while browsing a referenced GCS bucket.

To preview a file, select the file in the list of resources or in the bucket browser and click on the ‘Preview’ button in the information pane on the right. This will display a preview of the file contents.

Here’s an example of previewing a bam file:

Preview of a bam file.

Workbench supports the following file types for preview, as identified by their file extensions:

  • Images: jpeg, jpg, png, tiff, gif, bmp, svg
  • Renderables: md, pdf, html, ipynb, rmd
  • Tabular: csv, tsv
  • Text: txt, wdl, nf, sh, log, stdout, stderr, script, rc, json
  • Bioinformatics-related data formats: bam, bed, bedgraph, bb, bw, birdseye_canary_calls, broadpeak, seg, cbs, sam, vcf, linear, logistic, assoc, qassoc, gwas, gct, cram
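
To check ahead of time whether a file is likely to be previewable, the list above can be expressed as a lookup table. The sketch below uses an abbreviated copy of that list (a handful of extensions per category, not the full set) and matches on the final extension only, so a compressed file such as reads.vcf.gz is not recognized. Case-insensitive matching is an assumption; the source does not say how Workbench treats upper-case extensions.

```python
import os
from typing import Optional

# Abbreviated copy of the documented list; extend it as needed.
PREVIEW_CATEGORIES = {
    "Images": {"jpeg", "jpg", "png", "tiff", "gif", "bmp", "svg"},
    "Renderables": {"md", "pdf", "html", "ipynb"},
    "Tabular": {"csv", "tsv"},
    "Text": {"txt", "wdl", "nf", "sh", "log", "json"},
    "Bioinformatics": {"bam", "sam", "cram", "vcf", "bed"},
}


def preview_category(filename: str) -> Optional[str]:
    """Return the preview category for `filename`, or None if unsupported."""
    _, ext = os.path.splitext(filename)
    ext = ext.lstrip(".").lower()  # assumption: matching is case-insensitive
    for category, extensions in PREVIEW_CATEGORIES.items():
        if ext in extensions:
            return category
    return None


if __name__ == "__main__":
    for name in ["plots/fig1.png", "reads.bam", "archive.zip"]:
        print(name, preview_category(name))
```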

Note that you cannot upload files into your buckets through the Verily Workbench web UI. To upload files, use the Google Cloud console or the gsutil command-line utility.
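
For scripted uploads, one option is to call gsutil from Python. The sketch below only builds the `gsutil cp` command line; running it requires gsutil to be installed and authenticated, which is assumed rather than checked. The file, bucket, and function names are placeholders.

```python
import shlex
from typing import List


def build_upload_command(local_path: str, bucket_name: str, dest_path: str = "") -> List[str]:
    """Build a `gsutil cp` command that copies a local file into a bucket."""
    destination = f"gs://{bucket_name}/{dest_path.lstrip('/')}"
    return ["gsutil", "cp", local_path, destination]


if __name__ == "__main__":
    command = build_upload_command("results.csv", "my-bucket", "outputs/")
    # Print the command for review before running it.
    print(shlex.join(command))
    # To execute it (assumes gsutil is installed and authenticated):
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems with file names that contain spaces.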

Access a resource in the Google Cloud console

In some cases you may need to access a resource through the Google Cloud console. To do so, select the resource in the Resources list and click on the ‘Open in GCP’ button in the information pane on the right. This will open a new tab or window in your web browser.

Delete a resource

When you delete a controlled resource (one managed by your workspace), it is permanently deleted and cannot be recovered.

In contrast, when you delete a referenced resource, you’re removing only the reference. The resource to which the reference pointed is not affected.

To delete a resource, select it in the list of resources, click the icon showing three vertically stacked dots to display the menu of additional actions, and select ‘Delete’.

Deleting a controlled resource.

This will bring up a deletion dialog that summarizes what will happen if you confirm the deletion. To confirm that you want to delete the resource, click the confirmation checkbox and click ‘Delete resource’.

Deleting a controlled resource.
Deleting a referenced resource.

Note that deletion of referenced resources and controlled resources has different effects as described above; please make sure that you understand the difference before deleting any resources.
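
The difference between the two kinds of deletion can be illustrated with a toy model. This is a sketch of the semantics described above only; the classes and method names are hypothetical and do not correspond to any Workbench API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class Kind(Enum):
    CONTROLLED = "controlled"
    REFERENCED = "referenced"


@dataclass
class Resource:
    name: str      # resource name shown in the workspace
    kind: Kind
    cloud_id: str  # identifier of the underlying cloud object


@dataclass
class ToyWorkspace:
    resources: Dict[str, Resource] = field(default_factory=dict)
    # Stand-in for "objects that currently exist in Google Cloud".
    cloud_objects: Set[str] = field(default_factory=set)

    def add(self, resource: Resource) -> None:
        self.resources[resource.name] = resource
        self.cloud_objects.add(resource.cloud_id)

    def delete(self, name: str) -> None:
        resource = self.resources.pop(name)
        # Only controlled resources take the underlying data with them.
        if resource.kind is Kind.CONTROLLED:
            self.cloud_objects.discard(resource.cloud_id)


if __name__ == "__main__":
    ws = ToyWorkspace()
    ws.add(Resource("scratch", Kind.CONTROLLED, "gs://ws-scratch"))
    ws.add(Resource("public-ref", Kind.REFERENCED, "gs://shared-public-data"))
    ws.delete("scratch")
    ws.delete("public-ref")
    print(ws.cloud_objects)
```

After both deletions the workspace lists no resources, but only the referenced bucket still exists in the model's cloud.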


Note on button locations

The resource management operations described above are available through buttons or selector menus located in the information pane that is displayed on the right when a resource is selected.

"Move" and "Delete" in the information pane for a resource.

The exact layout and appearance of the information pane may vary with the type of resource selected. For example, the information pane displayed for a storage bucket will include a ‘Browse’ button, while the one displayed for a BigQuery dataset will not.

Last Modified: 16 November 2023