User case study: Global Parkinson's Genetics Program (GP2)

Learn how GP2 used Workbench to control access to their data collections

Purpose: This case study demonstrates how access is controlled for GP2's data collections.



Introduction

The Michael J. Fox Foundation for Parkinson's Research (MJFF) and the Global Parkinson's Genetics Program (GP2) have partnered with Verily to host and secure their latest data release on Verily Workbench.

GP2 is a worldwide effort to redefine our understanding of the underlying biology of Parkinson’s disease in a global context. GP2 will genotype over 200,000 individuals and generate whole-genome sequencing (WGS) data for many of them, with a specific focus on underrepresented populations. GP2 is a resource program of Aligning Science Across Parkinson’s (ASAP), managed by the Coalition for Aligning Science and implemented by the MJFF.

This case study describes how MJFF and Verily maintain two tiers of data access via Workbench groups and policies. If you're interested in accessing the GP2 data yourself, please apply for access via the GP2 website.

GP2 data access constraints & concerns

As described on the GP2 website, there are two tiers of GP2 data access through Accelerating Medicines Partnership Parkinson’s Disease (AMP-PD) at the time of this writing:

  • Tier 1 grants access to limited individual-level clinical data, omics summary results, and/or related metadata.
  • Tier 2 grants access to the full GP2 dataset, with individual-level clinical and genetics data, as well as metadata.

Tier 2 includes data from European participants, so it must remain in the EU in order to comply with the GDPR.

GP2 is committed to preventing exfiltration of participant-level data in order to comply as fully as possible with all local laws, regulations, and policies.

Data governance in Workbench

Workbench provides several complementary approaches for data governance:

  • Group policies that limit workspace and data access to specific groups of users.
  • Region policies that limit which regions may be used to create cloud resources and apps.
  • Perimeter policies that limit data transfer across designated network boundaries.

Verily Solutions engineers used all three policy types to ensure that GP2 data is shared only with GP2's intended partners, is handled in accordance with GDPR requirements, and offers limited opportunity for data exfiltration.

GP2 data collections

GP2 data is stored in two Workbench data collections: one for the more sensitive, participant-level Tier 2 data, and one for Tier 1 data, which consists mostly of aggregated data and summary statistics.

Tier 2 data

The GP2 Tier 2 data collection contains several releases of GP2 data, versioned by release. Each version consists of a single controlled bucket.

Three policies have been applied to this data collection:

  • A group policy which restricts access to members of the GP2_Tier2 Workbench group. The GP2 access control team administers this group, and researchers are added to it after they are approved by the GP2 organization.
  • A region policy which ensures cloud resources and apps that access this data are created within the europe-west4 Google Cloud region.
  • A perimeter policy (called gp2-perimeter) which restricts copying of data out of the Tier 2 data collection's cloud project and limits access to Workbench workspaces created within gp2-perimeter.
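
The combined effect of the three Tier 2 policies can be sketched as a simple check. This is an illustration only: the `Workspace` class and the `tier2_violations` function are hypothetical names invented for this example and are not part of any Workbench API. Only the group name, region, and perimeter name come from the description above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Values taken from the policy descriptions above.
TIER2_GROUP = "GP2_Tier2"
TIER2_REGION = "europe-west4"
TIER2_PERIMETER = "gp2-perimeter"


@dataclass(frozen=True)
class Workspace:
    """Hypothetical stand-in for a workspace; not a Workbench type."""
    user_groups: FrozenSet[str]   # groups the requesting user belongs to
    region: str                   # region where cloud resources are created
    perimeter: Optional[str]      # perimeter the workspace was created in, if any


def tier2_violations(ws: Workspace) -> List[str]:
    """Return the names of the Tier 2 policies this workspace does not satisfy."""
    violations = []
    if TIER2_GROUP not in ws.user_groups:
        violations.append("group")
    if ws.region != TIER2_REGION:
        violations.append("region")
    if ws.perimeter != TIER2_PERIMETER:
        violations.append("perimeter")
    return violations


approved = Workspace(frozenset({"GP2_Tier2"}), "europe-west4", "gp2-perimeter")
outside = Workspace(frozenset({"GP2_Tier2"}), "europe-west4", None)
print(tier2_violations(approved))  # []
print(tier2_violations(outside))   # ['perimeter']
```

In this sketch, access requires all three conditions at once: an approved researcher working in the correct region still fails the check if the workspace was created outside gp2-perimeter.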

Tier 1 data

As with the Tier 2 data collection, the GP2 Tier 1 data collection contains several releases of GP2 data, versioned by release. Each version consists of a single controlled bucket.

Two policies have been applied to this data collection:

  • A group policy which restricts access to members of the GP2_Tier1 Workbench group. The GP2 access control team administers this group, and researchers are added to it after they are approved by the GP2 organization.
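  • A region policy which ensures cloud resources and apps that access this data are created within the europe-west4 Google Cloud region. Using the same region as the Tier 2 data collection allows researchers with Tier 2 access to work with Tier 1 data at the same time.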

No perimeter policy is applied to the Tier 1 data collection, as this collection contains only summary data and no participant-level data.
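
To see why a shared region matters, the sketch below assumes that a workspace using several data collections must satisfy every collection's region policy, so the permitted regions are the intersection of the individual policies. That combination rule is an assumption made for illustration, and `REGION_POLICIES` and `allowed_regions` are hypothetical names rather than Workbench APIs.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Region policies as described in this case study, keyed by data collection.
REGION_POLICIES: Dict[str, Optional[FrozenSet[str]]] = {
    "GP2 Tier 1": frozenset({"europe-west4"}),
    "GP2 Tier 2": frozenset({"europe-west4"}),
}


def allowed_regions(
    collections: Iterable[str],
    policies: Dict[str, Optional[FrozenSet[str]]] = REGION_POLICIES,
) -> Optional[FrozenSet[str]]:
    """Intersect the region policies of the given collections.

    Returns None when no collection constrains the region (assumed to mean
    "any region"), and an empty set when the policies cannot all be satisfied.
    """
    allowed: Optional[FrozenSet[str]] = None
    for name in collections:
        regions = policies[name]
        if regions is None:
            continue  # this collection places no constraint on region
        allowed = regions if allowed is None else allowed & regions
    return allowed


both = allowed_regions(["GP2 Tier 1", "GP2 Tier 2"])
print(sorted(both))  # ['europe-west4']

# Hypothetical counterexample: a Tier 1 policy pinned to a different region.
mismatched = dict(REGION_POLICIES, **{"GP2 Tier 1": frozenset({"us-central1"})})
print(allowed_regions(["GP2 Tier 1", "GP2 Tier 2"], mismatched))  # frozenset()
```

Under this assumption, a Tier 1 policy pinned to any other region would leave no region in which a workspace could use both collections.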

Last Modified: 25 April 2025