Levels of Identification of Data
February 2025
Acronyms
LDS – Limited Dataset. See footnote 1 for definition.
PHI – Protected Health Information
PHI>LDS – Protected Health Information greater than a Limited Dataset
PII – Personally Identifiable Information. See policy.
Text PHI – Text Protected Health Information
Note: See Genomic Data Risk Tiers for information on the risk of re-identification for different types of genomic data, and whether they should be treated as PHI, LDS, or data with no PHI or PII.
Level | Description |
Text PHI | Data that includes the text of clinical notes or reports. Text PHI has not been processed to remove any identifiers. Exception: A named entity of four words or fewer, detected algorithmically through named entity recognition, is not Text PHI. Note: Abstraction of categorical variables does not constitute Text PHI. |
PHI>LDS | Unlike an LDS, the label PHI>LDS refers to PHI that may include direct (also called “facial”) identifiers. Note that this category includes retinal and iris scans and other biometric identifiers. |
Limited Dataset (LDS) | PHI from which direct identifiers (such as name and MRN) have been removed, but which may include dates, zip codes, and ages that are older than 89 or more granular than a year (see footnote 1). Note that this includes volumetric head scans with facial features intact, provided DICOM metadata has been reduced to a limited dataset. |
PII but no PHI | PII is identifiable data that contains no PHI. See policy. |
No PHI or PII | Person-level data that contains no PHI or PII, i.e., de-identified data (see footnote 2). A JHM-certified Honest Broker (CCDA, Core for Clinical Research Data Acquisition) must provide oversight when JHM data are to be considered de-identified. |
Aggregate | Counts and summary-level statistics (e.g., mean, median, standard deviation). The information reflects the dataset as a whole and is not associated with any single individual. |
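The levels in the table can be read as a decision sequence. The following is a minimal sketch of that reading, not an official classification tool: the `DatasetTraits` fields and the `identification_level` function are hypothetical names introduced for illustration, and the order of the checks is an assumption drawn from the row descriptions above. Genomic data and the "PII but no PHI" policy are out of scope here.

```python
from dataclasses import dataclass


@dataclass
class DatasetTraits:
    """Hypothetical yes/no facts about a dataset (illustration only)."""
    is_aggregate_only: bool = False              # counts and summary statistics only
    contains_phi: bool = False                   # any protected health information
    has_unprocessed_clinical_text: bool = False  # raw clinical notes or reports
    has_direct_identifiers: bool = False         # e.g. name, MRN, biometric identifiers
    contains_pii: bool = False                   # identifiable data that is not PHI


def identification_level(traits: DatasetTraits) -> str:
    """Return the table row that appears to match the given traits."""
    if traits.is_aggregate_only:
        return "Aggregate"
    if traits.contains_phi:
        if traits.has_unprocessed_clinical_text:
            return "Text PHI"
        if traits.has_direct_identifiers:
            return "PHI>LDS"
        # PHI without direct identifiers: dates, zip codes, and ages may remain.
        return "Limited Dataset (LDS)"
    if traits.contains_pii:
        return "PII but no PHI"
    return "No PHI or PII"
```

For example, a dataset flagged with `contains_phi=True` and no other trait would be labeled "Limited Dataset (LDS)" under these assumptions. A "No PHI or PII" result still requires Honest Broker oversight as described in the table.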
Footnotes:
1. Limited Dataset Definition (from IRB Definition of a Limited Dataset)
A “limited data set” is a limited set of identifiable patient information as defined in the Privacy Regulations issued under the Health Insurance Portability and Accountability Act, better known as “HIPAA”. A “limited data set” of information may be disclosed to an outside party without a patient’s authorization if certain conditions are met. First, the purpose of the disclosure may only be for research, public health or health care operations. Second, the person receiving the information must sign a data use agreement with Hopkins. This agreement has specific requirements which are discussed below.
A “limited data set” is information from which “facial” identifiers have been removed. Specifically, as it relates to the individual or his or her relatives, employers or household members, all the following identifiers must be removed in order for health information to be a “limited data set”:
- names;
- street addresses (other than town, city, state and zip code);
- telephone numbers;
- fax numbers;
- e-mail addresses;
- Social Security numbers;
- medical records numbers;
- health plan beneficiary numbers;
- account numbers;
- certificate license numbers;
- vehicle identifiers and serial numbers, including license plates;
- device identifiers and serial numbers;
- URLs;
- IP address numbers;
- biometric identifiers (including finger and voice prints); and
- full face photos (or comparable images).
The health information that may remain in the information disclosed includes:
- dates such as admission, discharge, service, DOB, DOD;
- city, state, five-digit or longer zip code; and
- ages in years, months, days, or hours.
It is important to note that this information is still protected health information or “PHI” under HIPAA. It is not de-identified information and is still subject to the requirements of the Privacy Regulations.
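As an illustration of how the limited data set lists above might be applied to a tabular extract, the sketch below flags column names that correspond to identifiers that must be removed. The column names, the `LDS_PROHIBITED` and `LDS_PERMITTED` sets, and the `lds_violations` function are all hypothetical. A real dataset would need its own mapping from columns to identifier types, and a name-based check cannot detect identifiers embedded in free text or images.

```python
# Hypothetical column names, one per identifier that must be removed
# for health information to qualify as a limited data set.
LDS_PROHIBITED = {
    "name", "street_address", "telephone", "fax", "email", "ssn", "mrn",
    "health_plan_beneficiary_number", "account_number",
    "certificate_license_number", "vehicle_identifier", "device_identifier",
    "url", "ip_address", "biometric_identifier", "full_face_photo",
}

# Hypothetical column names for information that may remain.
LDS_PERMITTED = {
    "admission_date", "discharge_date", "service_date",
    "date_of_birth", "date_of_death", "city", "state", "zip_code", "age",
}


def lds_violations(columns):
    """Return, sorted, the columns that would have to be removed."""
    return sorted(c for c in columns if c in LDS_PROHIBITED)


def unreviewed_columns(columns):
    """Return columns on neither list; these need manual review."""
    return sorted(c for c in columns
                  if c not in LDS_PROHIBITED and c not in LDS_PERMITTED)
```

Calling `lds_violations(["mrn", "zip_code", "age", "name"])` returns `["mrn", "name"]`. Columns returned by `unreviewed_columns` are neither cleared nor flagged by this sketch.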
2. De-identified Data Definition (from IRB Definition of De-identified data)
Identifiers That Must Be Removed to Make Health Information De-Identified
(i) The following identifiers of the individual or of relatives, employers or household members of the individual must be removed:
(A) Names;
(B) All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of the Census:
(1) The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and
(2) The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000.
(C) All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;
(D) Telephone numbers;
(E) Fax numbers;
(F) Electronic mail addresses;
(G) Social security numbers;
(H) Medical record numbers;
(I) Health plan beneficiary numbers;
(J) Account numbers;
(K) Certificate/license numbers;
(L) Vehicle identifiers and serial numbers, including license plate numbers;
(M) Device identifiers and serial numbers;
(N) Web Universal Resource Locators (URLs);
(O) Internet Protocol (IP) address numbers;
(P) Biometric identifiers, including finger and voice prints;
(Q) Full face photographic images and any comparable images; and
(R) Any other unique identifying number, characteristic, or code; and
(ii) The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.
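Items (B) and (C) above describe transformations rather than simple removals. The sketch below shows one possible way to express them. The function names and the `zip3_population` lookup (a mapping from three-digit zip prefix to population, which would have to be built from current Census data) are assumptions made for illustration. This is not a complete de-identification procedure.

```python
from datetime import date


def safe_harbor_zip(zip_code, zip3_population):
    """Keep the initial three digits only if that area has more than 20,000 people.

    zip3_population is a hypothetical dict such as {"212": 500000}.
    Unknown prefixes are treated as small areas and changed to "000".
    """
    prefix = zip_code[:3]
    if zip3_population.get(prefix, 0) > 20000:
        return prefix
    return "000"


def safe_harbor_age(age_years):
    """Aggregate all ages over 89 into a single '90+' category."""
    return "90+" if age_years > 89 else str(age_years)


def safe_harbor_birth_year(birth_date, as_of):
    """Reduce a birth date to its year, or None if it would indicate an age over 89."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    age = as_of.year - birth_date.year - (0 if had_birthday else 1)
    if age > 89:
        return None  # the year itself is indicative of an age over 89
    return birth_date.year
```

With a hypothetical lookup of `{"212": 500000, "036": 15000}`, the zip code `"21205"` becomes `"212"` and `"03601"` becomes `"000"`. Condition (ii) on actual knowledge is a human determination and is not captured by code.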