Data collections overview
Purpose: This document explains what data collections are in Verily Workbench.
What is a data collection?
Data collections represent multimodal datasets that you can publish to Verily Workbench’s data catalog so that users can reference the data in their workspaces. Collections are curated by data owners, who ensure data quality, reproducibility, and associated lineage. Many collections have policies attached that determine how the data may be accessed and used. You can reference all or part of any collection you have access to in your Workbench workspace.
A data collection is a grouping of cloud-based resources related to a specific project, study, or purpose (e.g., public datasets like 1000 Genomes or TCGA, or data sources and studies internal to an organization). Researchers can browse and add data collections to their workspaces.
Data collections can:
- Be curated and set up by data stewards or researchers who are responsible for the governance and usage of the data
- Be referenced across multiple workspaces
- Consist of more than just “data”; they can contain notebook and text files, images, etc.
- Be associated with policies that govern the usage of data
Users can add resources from a data collection to their own workspaces through the UI, including from within a workspace. See this page for information on how to add an existing data collection to your workspace.
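As a rough mental model of the properties above, the following Python sketch treats a collection as a named group of resources with attached policies, and a workspace as something that holds references to those resources rather than copies. Every class, field, and value here (`Resource`, `DataCollection`, `Workspace`, `add_from`, the example names) is hypothetical and chosen for illustration; none of it is the Workbench API or schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """One cloud-based item in a collection (hypothetical model)."""
    name: str
    kind: str  # e.g. "table", "file", "notebook", "image"


@dataclass
class DataCollection:
    """Hypothetical stand-in for a data collection; not a Workbench type."""
    name: str
    resources: list[Resource] = field(default_factory=list)
    policies: set[str] = field(default_factory=set)


@dataclass
class Workspace:
    """Hypothetical workspace that holds references to resources, not copies."""
    name: str
    references: list[Resource] = field(default_factory=list)
    policies: set[str] = field(default_factory=set)

    def add_from(self, collection: DataCollection,
                 names: list[str] | None = None) -> None:
        """Reference every resource (names=None) or only the named subset."""
        chosen = [r for r in collection.resources
                  if names is None or r.name in names]
        self.references.extend(chosen)
        # Assumption: referencing any part of a collection brings its policies along.
        self.policies |= collection.policies


# Illustrative data only; these names do not refer to real collections.
collection = DataCollection(
    name="example-collection",
    resources=[Resource("samples", "table"), Resource("readme", "file")],
    policies={"example-usage-policy"},
)
ws_a = Workspace("analysis-a")
ws_b = Workspace("analysis-b")
ws_a.add_from(collection)               # reference the whole collection
ws_b.add_from(collection, ["samples"])  # reference only part of it
```

In this toy model the same `Resource` objects are shared by both workspaces, which mirrors the idea that one collection can be referenced across multiple workspaces while the data itself stays in one place.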
Why create a data collection?
If you have a collection of data (tables and files) that you would like to package and share across a large number of users, then creating a data collection could be the right way to address your use case.
Packaging your data as a data collection has a few benefits:
- Allow users to easily discover, browse, and import your data
- Allow users to work with multiple data collections in one workspace, in a policy-compliant way, facilitating cross-analysis
- Define policies to associate with your data, and be assured that workspaces and users who reference subsets of your data must comply
- Centrally manage how your reference data appears in Workbench, regardless of how many workspaces (and clones) reference it:
  - Define a new "version" of your data collection, and inform users in all workspaces that reference your data
  - Manage the "discovery" of your data collections, ranging from widely discoverable to highly private; allow users to get a summary of what a data collection offers, without granting full access permissions
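The central-management benefits above can be pictured with a small Python sketch. It is a toy model under stated assumptions, not how Workbench implements versioning or discovery: the `PublishedCollection` class, its fields, and the `publish_version` and `describe` methods are all hypothetical names introduced here for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PublishedCollection:
    """Hypothetical catalog entry managed centrally by the data owner."""
    name: str
    version: str
    summary: str
    discoverable: bool = True
    referencing_workspaces: set[str] = field(default_factory=set)
    readers: set[str] = field(default_factory=set)  # users with full access

    def publish_version(self, new_version: str) -> list[str]:
        """Set the new version once and build a notice for every referencing workspace."""
        self.version = new_version
        return [
            f"{ws}: {self.name} is now at version {new_version}"
            for ws in sorted(self.referencing_workspaces)
        ]

    def describe(self, user: str) -> str | None:
        """Return what a given user can see of this collection."""
        if user in self.readers:
            return f"{self.name} v{self.version} (full access): {self.summary}"
        if self.discoverable:
            # Discoverable collections expose a summary without granting access.
            return f"{self.name} (summary only): {self.summary}"
        return None  # highly private: not visible at all


# Illustrative data only.
entry = PublishedCollection(
    name="example-collection",
    version="1",
    summary="Example tables and files",
    referencing_workspaces={"analysis-b", "analysis-a"},
    readers={"alice"},
)
notices = entry.publish_version("2")
alice_view = entry.describe("alice")
bob_view = entry.describe("bob")
```

The point of the sketch is that the owner changes the version or discovery setting in one place, and every referencing workspace and prospective user sees the result, rather than each workspace maintaining its own copy.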
Next steps
Follow our quickstart guide to create a data collection via the Workbench UI or via the Workbench CLI.
Last Modified: 8 January 2026