Data Sharing Tiers for Broad Sharing of Clinically Derived Data

February 2024

Broad sharing of data from biomedical research, including individual patient-level data, enables replication and validation of studies and is required in many settings. When privacy is respected, broad sharing of biomedical research data is a best practice. In 2013 the Office of Science and Technology Policy issued a memorandum (Increasing Access to the Results of Federally Funded Scientific Research) that directs federal funding agencies to promote the deposit of data in publicly accessible repositories where appropriate and available. Some funders, such as USAID, the Agency for Healthcare Research and Quality, and the National Institutes of Health (NIH), require a plan for sharing data at the end of a project, or a justification when no plan is provided. In addition to these funder mandates, many journals, such as Science and the PLOS journals, make data sharing a condition of publication. The NIH has also issued supplemental information on Protecting Privacy When Sharing Human Research Participant Data (NOT-OD-22-213).

Researchers also want to share clinical data to enable machine learning, other forms of artificial intelligence, and other data-intensive research. In these cases, there may be no opportunity to obtain explicit patient consent for sharing the data with interested researchers, because the research has not yet been conducted.

The Sharing Tiers described below are intended to guide Data Trust decisions regarding sharing of clinical data. 

1.  Public Sharing - Data is deposited in an open repository. Members of the public can access the data directly from the repository, for any purpose, without a Data Use Agreement.

  • Requires consent to deposit the data in a publicly accessible open data repository.
  • The level of identifiability of the data to be made available (e.g., de-identified data, identifiable data) must be stated explicitly in the consent document.
  • Requires Data Trust review, which will include an assessment of the risk of re-identifying individuals and the institution.

Examples: Johns Hopkins Research Data Repository, Virus Pathogen Database and Analysis Resource (ViPR)

2.  Mediated Sharing - Data is deposited in a mediated repository that requires a Data Use Agreement or certification, as applicable. Other biomedical researchers may request the data from the repository. Real-time human review of each data request is not required.

  • Consent that does not prohibit future research use of the data is preferred; consent that specifically mentions future research and deposit in a repository is strongly preferred.
  • Only fully de-identified data as defined by HIPAA may be shared.
  • Repository must be able to accommodate data use limitations required by the IRB.
  • Repository must either accommodate JHU review of data access requests or have a review process that JHU considers satisfactory.
  • JHU retains the right to remove its data from the repository if the repository is unable to meet its assurances.
  • The data must be determined not to be sensitive, as locally defined.
  • The number of records shared should be fewer than 10,000 per year unless the Data Trust approves a larger number.

Example: American Association for Cancer Research (AACR) Project GENIE

3.  Mediated Sharing with Human Review - Data is deposited in a mediated repository, from which other biomedical researchers may request the data. Access requires a Data Use Agreement or certification, as applicable, and human review of each request.

  • Consent was either not obtained or does not prohibit the sharing.
  • Only the following data may be shared:
    • data without direct identifiers that qualifies as a limited dataset (LDS), OR
    • fully de-identified data as defined under HIPAA
  • Data requesters must agree to limit their use of the data to not-for-profit biomedical research. Other uses may be considered with IRB and Data Trust review.
  • Repository must be able to accommodate data use limitations required by the IRB.
  • Repository must either accommodate JHU review of data access requests or have a review process that JHU considers satisfactory.
  • JHU retains the right to remove its data from the repository if the repository is unable to meet its assurances.

Examples: dbGaP, Inter-university Consortium for Political and Social Research (ICPSR)

4.  Enclave Data Sharing - Researchers must work with the data in place on a secure platform; downloading is prohibited. Access requires a Data Use Agreement or certification, as applicable, and human review of each request.

  • Consent is not required for use of the data.
  • Use of a limited dataset or fully de-identified data is permitted.
  • Data requesters must agree to limit their use of the data to biomedical research.
  • Repository must be able to accommodate data use limitations.
  • Repository must be able to accommodate JHU review of data access requests or have a satisfactory review process.

Example: National COVID Cohort Collaborative (N3C)
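As an illustration only, the dataset-level criteria of the four tiers can be summarized as a rule check. The sketch below is a hypothetical Python model, not an official JHU or Data Trust tool, and it does not replace Data Trust or IRB review. The names Consent, Identifiability, Dataset, and eligible_tiers are invented for this example. Several simplifying assumptions apply: the repository obligations (IRB use limitations, JHU review of access requests, the right to remove data) are collapsed into one flag; the Tier 2 consent "preference" is enforced only as "consent does not prohibit sharing"; and consent that explicitly prohibits sharing is assumed to rule out every tier.

```python
from dataclasses import dataclass
from enum import Enum


class Consent(Enum):
    # Hypothetical summary of what the consent document says (assumption).
    OPEN_DEPOSIT = "explicitly permits deposit in a public open repository"
    PERMITS_FUTURE_RESEARCH = "mentions future research / repository deposit"
    SILENT = "obtained, but does not address sharing"
    NOT_OBTAINED = "no consent obtained"
    PROHIBITS = "prohibits sharing"


class Identifiability(Enum):
    DEIDENTIFIED = "fully de-identified under HIPAA"
    LIMITED_DATASET = "limited dataset (LDS)"
    IDENTIFIABLE = "contains direct identifiers"


@dataclass
class Dataset:
    consent: Consent
    identifiability: Identifiability
    identifiability_stated_in_consent: bool
    locally_sensitive: bool
    records_per_year: int
    data_trust_approved: bool          # outcome of Data Trust review or exception
    repository_meets_assurances: bool  # IRB limits, JHU access review, removal right


def eligible_tiers(d: Dataset) -> list[int]:
    """Return the tiers (1-4) whose dataset-level criteria this model says are met."""
    tiers = []
    lds_or_deid = d.identifiability in (
        Identifiability.DEIDENTIFIED,
        Identifiability.LIMITED_DATASET,
    )
    not_prohibited = d.consent is not Consent.PROHIBITS  # assumption for Tiers 2-4

    # Tier 1: consent to open deposit, identifiability stated, Data Trust review.
    if (d.consent is Consent.OPEN_DEPOSIT
            and d.identifiability_stated_in_consent
            and d.data_trust_approved):
        tiers.append(1)

    # Tier 2: HIPAA de-identified only, not locally sensitive,
    # fewer than 10,000 records/year unless the Data Trust approves more.
    if (not_prohibited
            and d.identifiability is Identifiability.DEIDENTIFIED
            and not d.locally_sensitive
            and (d.records_per_year < 10_000 or d.data_trust_approved)
            and d.repository_meets_assurances):
        tiers.append(2)

    # Tier 3: consent absent or not prohibiting; LDS or de-identified data.
    if not_prohibited and lds_or_deid and d.repository_meets_assurances:
        tiers.append(3)

    # Tier 4: enclave; consent not required; LDS or de-identified data.
    if not_prohibited and lds_or_deid and d.repository_meets_assurances:
        tiers.append(4)

    return tiers


example = Dataset(
    consent=Consent.NOT_OBTAINED,
    identifiability=Identifiability.LIMITED_DATASET,
    identifiability_stated_in_consent=False,
    locally_sensitive=False,
    records_per_year=50_000,
    data_trust_approved=False,
    repository_meets_assurances=True,
)
print(eligible_tiers(example))  # [3, 4]
```

In this model Tiers 3 and 4 share the same dataset-level criteria; they differ in how requesters access the data (a not-for-profit use limit with download permitted, versus work inside a secure enclave with no download), which the sketch does not represent.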

Under the HIPAA Safe Harbor method, de-identified data must exclude 18 specified identifiers, and “The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual.” Because head scans with facial features intact and certain kinds of genomic data contain features that could enable re-identification, they are treated as a Limited Dataset (LDS). An LDS must exclude direct identifiers but may include information that carries a risk of re-identification. An LDS is still protected health information (PHI) and requires a Data Use Agreement to address these risks.

As a best practice, any data submitted to a data repository should be in a standard format suited to the data (e.g., DICOM for medical images, OMOP for clinical records).

The Data Trust may consider exceptions to the above guidance on a case-by-case basis.