Data resource operations

Operations that can be performed on data resources through the Verily Workbench web UI

Purpose: This document provides detailed instructions for performing operations on data resources through the Verily Workbench web UI.

Note: These instructions all assume that you have already opened a workspace in the Verily Workbench web UI and navigated to the Resources tab.

All of these operations can be performed via the Workbench CLI as well. See the CLI reference for details.



List your data resources

Your data resources are listed in the ‘Resources’ tab of the workspace. If your resources are organized in folders, the folders may be displayed as collapsed by default. Click on the triangle to the left of the folder name to expand or collapse the view.

For more information about folders, see Organize resources into folders below. For other operations such as previewing contents and editing resource details, see Manage your data resources.


Create a new controlled resource

You can create empty storage buckets and BigQuery datasets directly from the Verily Workbench web UI. Any resource created in this way will be treated as a controlled resource, meaning that access to the resource is controlled by Workbench. This is in contrast to a referenced resource (see below).

Note that controlled data resources are tightly associated with the workspace where they are created. They are automatically shared with any collaborators who have been granted access to the workspace, and their data lifecycle matches that of the workspace. If the workspace is deleted, its controlled data resources are also deleted. If the workspace is cloned, its controlled data resources are also cloned.

Create a storage bucket

To create a new storage bucket via the web UI, click on the ‘+ Cloud resource’ button in the top right of the Resources pane and select ‘New Cloud Storage bucket.’ This will open a resource creation dialog; fill it out as detailed below.

Screenshot of a workspace's Resources page, with the New Cloud Storage bucket option highlighted.
Creating a controlled storage bucket.
  1. Enter an ID for your new resource. This will be the ID displayed when you list your resources in Workbench. The resource ID must be unique within the workspace.

  2. Use the ‘Folder path’ dropdown menu to select a folder. You’ll be able to move the bucket to a different folder after creation if desired.

  3. Provide a brief description of the resource. This is optional but highly recommended.

  4. The system will suggest a bucket name, generated automatically from the resource ID and the Google Cloud project associated with the workspace. The bucket name is the name of the bucket as listed in Google Cloud (shown in the resource details in Workbench). You can modify or replace the suggested bucket name in the creation dialog, but the bucket name must be globally unique across all of Google Cloud. You cannot edit the bucket name after the bucket has been created (though you can still change the resource ID).

Screenshot of the Creating Cloud Storage bucket dialog, showing the folder path for the newly created bucket.
Creating a controlled storage bucket. Here, the resource will be added under the "experimental data" folder.
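Because the bucket name must follow Google Cloud's naming rules in addition to being globally unique, it can help to sanity-check a candidate name before entering it in the dialog. The following Python sketch checks a subset of the published Cloud Storage bucket naming rules locally. The function name `bucket_name_problems` is hypothetical and is not part of Workbench or any Google library, and the sketch cannot check global uniqueness, which only Google Cloud can confirm when the bucket is created.

```python
import re
from typing import List

# Characters allowed in a Cloud Storage bucket name.
_ALLOWED_CHARS = re.compile(r"[a-z0-9._-]+")
# Dotted-decimal form such as 192.168.5.4, which bucket names must not resemble.
_IPV4_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def bucket_name_problems(name: str) -> List[str]:
    """Return naming-rule violations for `name`; an empty list means it passed these local checks.

    Covers only a subset of Cloud Storage's rules. Global uniqueness can only be
    confirmed by Google Cloud when the bucket is created.
    """
    problems = []
    if not _ALLOWED_CHARS.fullmatch(name):
        problems.append("use only lowercase letters, digits, '-', '_' and '.'")
    if not (name[:1].isalnum() and name[-1:].isalnum()):
        problems.append("start and end with a letter or digit")
    # Names containing dots may be up to 222 characters long; otherwise the limit is 63.
    max_len = 222 if "." in name else 63
    if not 3 <= len(name) <= max_len:
        problems.append(f"use between 3 and {max_len} characters")
    if any(len(part) > 63 for part in name.split(".")):
        problems.append("keep each dot-separated component to 63 characters or fewer")
    if _IPV4_LIKE.fullmatch(name):
        problems.append("do not use a name that looks like an IP address")
    if name.startswith("goog") or "google" in name:
        problems.append("do not start with 'goog' or contain 'google'")
    return problems
```

For example, `bucket_name_problems("my-project-bucket1")` returns an empty list, while `bucket_name_problems("My_Bucket")` returns a non-empty list because of the uppercase letters.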

Create a BigQuery dataset

To create a new BigQuery dataset via the web UI, click on the ‘+ Cloud resource’ button in the top right of the Resources pane and select ‘New BigQuery dataset.’ This will open a resource creation dialog; fill it out as detailed below.

Screenshot of a workspace's Resources page, with the New BigQuery dataset option highlighted.
Creating a controlled BigQuery dataset.
  1. Enter an ID for your new resource. This will be the ID displayed when you list your resources in Workbench. The resource ID must be unique within the workspace.

  2. Use the ‘Folder path’ dropdown menu to select a folder. You’ll be able to move the dataset to a different folder after creation if desired.

  3. Provide a brief description of the resource. This is optional but highly recommended.

  4. The system will suggest a dataset ID, generated automatically based on the resource ID. The dataset ID will be the name of the dataset as listed in Google Cloud (displayed in the Resource details in Workbench). You can modify or replace the suggested dataset ID in the creation dialog, but note that the dataset ID must be unique within your Google Cloud project (but not across all of Google Cloud). You will not be able to edit the dataset ID once it has been created.

Screenshot of the Creating BigQuery dataset dialog.
Creating a controlled BigQuery dataset. This resource will be stored in the "test data" folder.
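BigQuery dataset IDs follow different rules from bucket names: they may contain only letters, digits, and underscores (up to 1,024 characters), so a hyphen that is acceptable in a resource ID is not allowed in a dataset ID. The Python sketch below checks that rule and shows one hypothetical way to derive a dataset ID from a resource ID. The helper names are assumptions, and the derivation is an illustration, not the algorithm Workbench uses for its suggestion.

```python
import re

# BigQuery dataset IDs: letters (either case), digits and underscores, 1 to 1,024 characters.
_DATASET_ID = re.compile(r"[A-Za-z0-9_]{1,1024}")


def is_valid_dataset_id(dataset_id: str) -> bool:
    """Check a candidate dataset ID against BigQuery's character and length rules."""
    return _DATASET_ID.fullmatch(dataset_id) is not None


def suggest_dataset_id(resource_id: str) -> str:
    """Hypothetical: derive a dataset ID from a resource ID by replacing every
    disallowed character (for example '-') with '_' and truncating to 1,024 characters."""
    return re.sub(r"[^A-Za-z0-9_]", "_", resource_id)[:1024]
```

For example, `suggest_dataset_id("test-data")` returns `"test_data"`, which passes `is_valid_dataset_id`.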

Add a reference to an existing resource

You can reference existing storage buckets and files as well as existing BigQuery datasets and tables based on their Google Cloud identifiers. Any resource added in this way will be treated as a referenced resource, meaning that access to the resource is not controlled by Workbench. This is in contrast to a controlled resource (see above).

Note that as a result, access to referenced data resources is not automatically granted to collaborators who have been granted access to the workspace. For information about sharing a referenced data resource with collaborators, visit Access levels and privileges.

Access a resource in the Google Cloud console

If you need the Google Cloud name or ID of a referenced resource, you can look it up in the Google Cloud console. To do so, select the resource in the Resources list and click the ‘Open in GCP’ button in the information pane on the right. This opens the resource in the Google Cloud console in a new browser tab or window.

Reference a storage bucket

To reference an existing storage bucket via the web UI, click on the ‘+ Cloud resource’ button in the top right of the Resources pane and select ‘Reference Cloud Storage bucket.’ This will open a resource addition dialog; fill it out as detailed below.

  1. Enter an ID for your resource. This will be the ID displayed when you list your resources in Workbench. The resource ID must be unique within the workspace.

  2. Use the ‘Folder path’ dropdown menu to select a folder. You’ll be able to move the bucket to a different folder after creation if desired.

  3. Provide a brief description of the resource. This is optional but highly recommended.

  4. Enter the name of the bucket you want to reference. You can find this information in the Google Cloud console. Do not include the gs:// prefix.

Screenshot of the Adding Cloud Storage bucket dialog
Creating a GCS bucket reference.

Once created, the resource should look similar to the following:

Screenshot showing details of a Cloud Storage bucket referenced resource.
A GCS bucket reference.

Reference a file or a folder in a bucket

To reference an existing file or folder in a storage bucket via the web UI, click on the ‘+ Cloud resource’ button in the top right of the Resources pane and select ‘Reference Cloud Storage object.’ This will open a resource addition dialog; fill it out as detailed above under Reference a storage bucket for steps 1-3, then as detailed below for step 4.

  4. Enter the gs:// URI of the file or folder you want to reference. You can find this information in the Google Cloud console (or in the “details” panel for Workbench workspace resources).

Screenshot of the Adding Cloud Storage object dialog
Creating a GCS bucket folder reference.

Once created, the resource should look similar to the following:

Screenshot showing details of a Cloud Storage object referenced resource.
A GCS bucket folder reference.
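The two Cloud Storage reference dialogs expect different forms of the same information: the bucket dialog wants the bare bucket name without the gs:// prefix, and the object dialog wants the full gs:// URI. The following Python sketch, using a hypothetical helper named `parse_gs_uri`, splits a gs:// URI into its bucket name and object path. This can be handy when you have copied a URI from the Google Cloud console but need only the bucket name.

```python
from typing import Tuple

GS_PREFIX = "gs://"


def parse_gs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object').

    The object path is '' for a bare bucket URI; a trailing '/' (folder prefix) is kept.
    """
    if not uri.startswith(GS_PREFIX):
        raise ValueError(f"expected a URI starting with {GS_PREFIX!r}, got {uri!r}")
    bucket, _, object_path = uri[len(GS_PREFIX):].partition("/")
    if not bucket:
        raise ValueError(f"no bucket name in {uri!r}")
    return bucket, object_path
```

For example, `parse_gs_uri("gs://my-bucket/raw/")` returns `("my-bucket", "raw/")`; the first element is what the bucket dialog expects.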

Reference a BigQuery dataset

To reference an existing BigQuery dataset via the web UI, click on the ‘+ Cloud resource’ button in the top right of the Resources pane and select ‘Reference BigQuery dataset.’ This will open a resource addition dialog; fill it out as detailed below.

  1. Enter an ID for your resource. This will be the ID displayed when you list your resources in Workbench. The resource ID must be unique within the workspace.

  2. Use the ‘Folder path’ dropdown menu to select a folder. You’ll be able to move the dataset to a different folder after creation if desired.

  3. Provide a brief description of the resource. This is optional but highly recommended.

  4. Enter the ID of the BigQuery dataset you want to reference. You can find this information in the Google Cloud console.

Screenshot of a list of resources with Table ID information highlighted in Google Cloud console.
You can find project, dataset, and table identifiers in the Google Cloud BigQuery console.

  5. Enter the ID of the Google Cloud project associated with the BigQuery dataset you want to reference. You can find this information in the Google Cloud console.

Screenshot of the Adding a BigQuery dataset dialog.
Creating a BigQuery dataset reference.
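The BigQuery console typically shows a dataset's full ID as `project.dataset`, while the reference dialog asks for the project ID and the dataset ID in separate fields. The Python sketch below splits a copied full ID into those two values. The helper name is hypothetical; the sketch also accepts the older `project:dataset` form and assumes the project ID has no domain prefix (such as `example.com:my-project`).

```python
from typing import Tuple


def split_dataset_id(full_id: str) -> Tuple[str, str]:
    """Split 'project.dataset' (or legacy 'project:dataset') into (project_id, dataset_id).

    Assumes the project ID has no domain prefix such as 'example.com:'.
    """
    parts = full_id.strip().replace(":", ".", 1).split(".")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'project.dataset', got {full_id!r}")
    project_id, dataset_id = parts
    return project_id, dataset_id
```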

Reference a BigQuery table

To reference an existing BigQuery table via the web UI, click on the ‘+ Cloud resource’ button in the top right of the Resources pane and select ‘Reference BigQuery table.’ This will open a resource addition dialog; fill it out as detailed above under Reference a BigQuery dataset for steps 1-5, then as detailed below for step 6.

Screenshot of the Adding BigQuery table dialog.
Creating a BigQuery table reference.

  6. Enter the ID of the BigQuery table you want to reference. You can find this information in the Google Cloud BigQuery console.

Once created, the table reference details will look like the following:

Screenshot showing details of a BigQuery table referenced resource.
A BigQuery table reference.
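A table's full ID in the BigQuery console has three parts, `project.dataset.table`, which map onto the project, dataset, and table fields of the table reference dialog. This hypothetical Python helper splits such an ID. It tolerates surrounding backticks (as used when quoting table names in BigQuery standard SQL) and the older `project:dataset.table` form, and it assumes the project ID has no domain prefix.

```python
from typing import Tuple


def split_table_id(full_id: str) -> Tuple[str, str, str]:
    """Split 'project.dataset.table' into (project_id, dataset_id, table_id).

    Also accepts a backtick-quoted ID and the legacy 'project:dataset.table' form.
    Assumes the project ID has no domain prefix such as 'example.com:'.
    """
    parts = full_id.strip().strip("`").replace(":", ".", 1).split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'project.dataset.table', got {full_id!r}")
    project_id, dataset_id, table_id = parts
    return project_id, dataset_id, table_id
```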

Add a data collection from the data catalog

Import references to resources from a data collection

To add a data collection to your workspace via the web UI, click on the ‘+ Data from catalog’ button in the top right of the Resources pane. This will open a resource addition dialog; use it as detailed below.

  1. Browse the data catalog and select a data collection of interest. You’ll be able to see information about the most recent version of the data collection and when it was published. Click ‘Next.’

    Screenshot of the Select collection dialog, the first step when adding a data collection from the data catalog.

    This will lead to a dialog showing the contents of the collection.

  2. Select the version of the data collection that you’d like to import.

    Screenshot of the Select resource dialog, the second step when adding a data collection from the data catalog. It highlights selecting a specific version.
  3. Select which resources you would like to import from the data collection version. You can expand folders by clicking on the triangle to the left of the folder name. If you do not select all resources in a data collection, you’ll still have the option of adding them later. Once you have finalized your selection, click ‘Next.’

    Screenshot of a nested list of resources within a specific data collection version, with certain resources selected for importing.

    This will lead to a dialog showing the data policies associated with the resources you have selected.

  4. Review the policy requirements. Click ‘Next.’

    Screenshot of the Review policies dialog, the third step when adding a data collection from the data catalog.

    This will display a list of selected resources and destination options.

  5. Review your selection and choose the workspace folder where you want to add them. You can select an existing folder from the dropdown menu or create a new folder. Click ‘Add to your workspace.’

    Screenshot of the Review selection dialog, the last step when adding a data collection from the data catalog.

The selected resources should now appear in your workspace resources view.

Screenshot of nested workspace resources, which includes resources selected for importing from a data collection in the data catalog.

You can manage and access these resources as you would any other resource in your workspace, as described below in Manage your data resources.

View the lineage of resources imported from a data collection

You can view the data collection lineage of each imported resource. This displays provenance information, including a link to the collection of origin and when the resource was added to the workspace.

To view lineage information, click on the resource you want to inspect in the Resources list, then click on the ‘Lineage’ tab in the information pane on the right.

Screenshot of a list of resources, with the pedigree-table resource selected and showing its lineage information.
Data lineage for a referenced resource.

Manage your data resources

Organize resources into folders

You can organize your data resources in hierarchical folders.

To create a new folder, click the ‘+ New folder’ button in the top right corner of the Resources tab. This will bring up a folder creation dialog.

The following screencast shows creation of a new folder, then creation of a controlled Cloud Storage bucket resource within that folder.

To move a resource or folder to a different folder, select it and click on the ‘Move to’ button in the information pane on the right. This will bring up a folder organization dialog (which also allows you to create a new folder if needed).

The following screencast shows moving a resource (bucket1) to a new folder, created as part of the ‘Move’ dialog. When creating a new folder, you can choose where to place it. In this case, the new folder was created at the top level rather than under the current folder.
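To make the folder behavior concrete, the following Python sketch models resources organized in nested folders and a move operation that can create the destination folder on the fly, much as the ‘Move’ dialog does. It is a hypothetical data model for illustration only: the class and function names, the folder names, and the use of ‘/’-separated folder paths are assumptions, and it does not reflect how Workbench actually stores folders.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Folder:
    """Hypothetical model of a workspace folder holding resource IDs and subfolders."""
    name: str
    subfolders: Dict[str, "Folder"] = field(default_factory=dict)
    resources: Set[str] = field(default_factory=set)

    def folder_at(self, path: str, create: bool = False) -> "Folder":
        """Return the folder at a '/'-separated path ('' is this folder), creating missing folders if asked."""
        node = self
        for part in filter(None, path.split("/")):
            if part not in node.subfolders:
                if not create:
                    raise KeyError(f"no folder {path!r}")
                node.subfolders[part] = Folder(part)
            node = node.subfolders[part]
        return node


def move_resource(root: Folder, resource_id: str, src: str, dst: str, create: bool = False) -> None:
    """Move a resource between folders, optionally creating the destination first."""
    source = root.folder_at(src)
    if resource_id not in source.resources:
        raise KeyError(f"{resource_id!r} is not in folder {src!r}")
    target = root.folder_at(dst, create=create)  # resolve the target before changing anything
    source.resources.discard(resource_id)
    target.resources.add(resource_id)


if __name__ == "__main__":
    root = Folder("")
    root.folder_at("experimental data", create=True).resources.add("bucket1")
    # Move bucket1 into a new top-level folder, similar to the screencast.
    move_resource(root, "bucket1", "experimental data", "archive", create=True)
    print(sorted(root.subfolders))  # ['archive', 'experimental data']
```

Resolving the destination before removing the resource from its source means a failed lookup leaves the resource where it was.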

View and edit resource details

You can edit the resource name and description of any of your resources at any time. To do so, select the resource and click on the ‘Edit details’ button in the information pane on the right. This will bring up the editing dialog.

Note that you cannot edit external identifiers such as the bucket name or object path, project ID, dataset ID, or table ID after the resource has been created. If you realize you made a mistake in one of these identifiers when you created or added the resource, delete the erroneous entry and create or add the resource again as described above. For instructions on deleting a resource, see Delete a resource below.

Browse buckets and preview file contents

You can browse the contents of buckets and preview file contents for certain file types directly in Workbench.

To browse the contents of a bucket, select it in the list of resources and click ‘Browse’ in the information pane on the right. This will bring up a browser pane that you can use to explore the contents of the bucket.

Screenshot showing a list of folders belonging in a genomics-public-data Google Cloud Storage referenced bucket.
Browsing a referenced GCS bucket.

Note that you can select an object within the bucket browser and add a direct reference to it in your resource list by clicking on ‘Add as reference’ in the information pane on the right.

Screenshot showing details of a selected file, with the 'Preview' and 'Add as reference' buttons highlighted.
File details while browsing a referenced GCS bucket.

To preview a file, select the file in the list of resources or in the bucket browser and click on the ‘Preview’ button in the information pane on the right. This will display a preview of the file contents.

Here’s an example of previewing a BAM file:

Screenshot of a preview of a .bam file, with the 'Copy preview link' button highlighted.
Preview of a BAM file.

Workbench supports previewing the following file types, identified by file extension:

  • Images: jpeg, jpg, png, tiff, gif, bmp, svg
  • Renderables: md, pdf, html, ipynb, rmb
  • Tabular: csv, tsv
  • Text: txt, wdl, nf, sh, log, stdout, stderr, script, rc, json
  • Bioinformatics-related data formats: bam, bed, bedgraph, bb, bw, birdseye_canary_calls, broadpeak, seg, cbs, sam, vcf, linear, logistic, assoc, qassoc, gwas, gct, cram
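Because previewability is determined by file extension, you can check in advance whether a given object should be previewable. The Python sketch below encodes the list above. The names `PREVIEWABLE` and `preview_category` are hypothetical, and matching only the final extension, case-insensitively, is an assumption: for example, it treats a compressed `calls.vcf.gz` as not previewable, which may or may not match Workbench’s actual behavior.

```python
from pathlib import PurePosixPath
from typing import Optional

# Previewable extensions, grouped as in the list above.
PREVIEWABLE = {
    "Images": {"jpeg", "jpg", "png", "tiff", "gif", "bmp", "svg"},
    "Renderables": {"md", "pdf", "html", "ipynb", "rmb"},
    "Tabular": {"csv", "tsv"},
    "Text": {"txt", "wdl", "nf", "sh", "log", "stdout", "stderr", "script", "rc", "json"},
    "Bioinformatics": {
        "bam", "bed", "bedgraph", "bb", "bw", "birdseye_canary_calls", "broadpeak",
        "seg", "cbs", "sam", "vcf", "linear", "logistic", "assoc", "qassoc", "gwas",
        "gct", "cram",
    },
}


def preview_category(object_name: str) -> Optional[str]:
    """Return the preview category for an object name, or None if its extension is not listed.

    Assumption: only the final extension counts, compared case-insensitively.
    """
    extension = PurePosixPath(object_name).suffix.lower().lstrip(".")
    for category, extensions in PREVIEWABLE.items():
        if extension in extensions:
            return category
    return None
```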

Note that you cannot upload files into your buckets through the Workbench web UI. To do so, please use the Google Cloud console, or the gsutil command-line utility.
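For reference, a typical gsutil upload copies a local file, or a whole directory with `-r`, to a gs:// destination. The Python sketch below only builds and runs that command. It assumes gsutil is installed and authenticated as an account with write access to the bucket, and the function names are hypothetical. Use the bucket name shown in the resource details (the Google Cloud name), not the Workbench resource ID. The newer `gcloud storage cp` command accepts the same source and destination arguments.

```python
import shlex
import subprocess
from typing import List


def gsutil_upload_command(local_path: str, bucket: str, dest_prefix: str = "",
                          recursive: bool = False) -> List[str]:
    """Build a `gsutil cp` command copying a local file (or, with recursive=True,
    a directory) to gs://<bucket>/<dest_prefix>."""
    cmd = ["gsutil"]
    if recursive:
        # -m parallelizes the copy; -r copies the directory tree.
        cmd += ["-m", "cp", "-r"]
    else:
        cmd += ["cp"]
    return cmd + [local_path, f"gs://{bucket}/{dest_prefix}"]


def upload(local_path: str, bucket: str, dest_prefix: str = "", recursive: bool = False) -> None:
    """Run the command; raises CalledProcessError if gsutil exits with an error."""
    cmd = gsutil_upload_command(local_path, bucket, dest_prefix, recursive)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `gsutil_upload_command("sample1.bam", "my-bucket", "raw/")` returns `["gsutil", "cp", "sample1.bam", "gs://my-bucket/raw/"]`.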

Delete a resource

When you delete a controlled resource (one managed by your workspace), the resource and its data are permanently deleted and cannot be recovered.

In contrast, when you delete a referenced resource, you’re removing only the reference. The resource to which the reference pointed is not affected.

To delete a resource, select it in the list of resources, click the icon showing three vertically stacked dots to open the menu of additional actions, and select ‘Delete.’

Screenshot of a bucket's details, with the 'Delete' option highlighted.
Deleting a controlled resource.

This will bring up a dialog that summarizes what will happen upon deletion. To confirm that you want to delete the resource, click the confirmation checkbox and click ‘Delete resource.’

Screenshot of the dialog that appears when a user chooses to delete a controlled resource.
Deleting a controlled resource.
Screenshot of the dialog that appears when a user chooses to delete a referenced resource.
Deleting a referenced resource.

Note that deletion of referenced resources and controlled resources has different effects as described above; please make sure that you understand the difference before deleting any resources.


Note on button locations

The resource management operations described above are available through buttons or selector menus located in the information pane that is displayed on the right when a resource is selected.

Screenshot of a workspace's Resources tab, showing a list of resources and a BigQuery dataset's details, with the Move and Delete options highlighted.
"Move" and "Delete" in the information pane for a resource.

The exact layout and appearance of the information pane may vary with the type of resource selected. For example, the pane displayed for a storage bucket includes a ‘Browse’ button, while the one displayed for a BigQuery dataset does not.

Last Modified: 12 May 2024