A, C, G and T: These four letters, strung together over and over in various combinations up to about 3 billion letters long, represent the genetic code that provides molecular instructions for each cell in the body. Essentially, the A-C-G-T code is a programming language, and tweaks and alterations in that code, some affecting only a single letter and others more extensive, make up the differences among individuals and between diseased and healthy states.
Variations in the human genetic code control not only our hair and eye color but just about every facet of biology, including how the immune system responds to infection.
To solve the mysteries of human biology and develop new treatments and diagnostic tools for disease, scientists rely on large databases that catalog the genomic code of many thousands of people who donate blood or saliva samples containing their individual, unique genetic material.
One such biobank, widely used by researchers, contains genetic material from 500,000 participants and is housed in the United Kingdom.
On the other side of the Atlantic Ocean, the All of Us research program, funded by the National Institutes of Health, is building a similar scientific resource. The program aims to enroll at least 1 million volunteers from diverse backgrounds in the United States who consent to share their health data and blood or saliva samples for genomic analysis.
“A goal of the program is to understand the vast genomic diversity among individuals in the U.S.,” says Kimberly Doheny, Ph.D., associate professor of genetic medicine at the Johns Hopkins University School of Medicine, and co-principal investigator for the Baylor-Hopkins Clinical Genome Center for the All of Us program.
An essential part of understanding the full scope of human genetic diversity, says Doheny, is ensuring that scientific research includes an ethnically and socially diverse population. Research has shown that the effectiveness of medicines and treatments varies among ethnicities, as well as from person to person.
The All of Us research program has released an initial, huge trove of genomics data to scientists.
To date, some 329,000 participants have provided blood or saliva samples, from which DNA is extracted. Half of the participants identified as racially or ethnically underrepresented in biomedical research. Participants also submitted health information through surveys and may have shared their electronic health records.
The extracted DNA from each person is processed so that every chemical “letter,” or base pair, is cataloged in what’s called a whole genome sequence — all 3 billion base pairs per person. Several research institutions — the Baylor-Hopkins Clinical Genome Center, the Northwest Genomics Center at the University of Washington, and the Broad Institute of the Massachusetts Institute of Technology and Harvard University — are working together to compile this whole genome sequencing data.
As part of the Baylor-Hopkins Clinical Genome Center, Doheny’s team at Johns Hopkins provided genotyping array data, which focuses on particular locations in the genome.
The array data, one-third of which was analyzed at Johns Hopkins, provides an important measure of quality control, ensuring that each whole genome sequence is correctly matched to the individual it came from. Array data is accessible to researchers alongside whole genome sequencing information.
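To illustrate the idea behind this kind of quality check, here is a minimal sketch in Python. It assumes that the same genome locations have been read twice for a sample, once by the array and once by sequencing, and it measures how often the two readings agree. The function names, the site labels and the 95% cutoff are all hypothetical and chosen for illustration; the article does not describe the program's actual procedure or thresholds.

```python
from typing import Dict, Optional

# Hypothetical encoding: site label -> unordered pair of letters, e.g. "AG".
Genotypes = Dict[str, str]


def concordance(array_calls: Genotypes, wgs_calls: Genotypes) -> Optional[float]:
    """Fraction of shared sites where array and sequencing genotypes agree."""
    shared = [site for site in array_calls if site in wgs_calls]
    if not shared:
        return None  # no overlapping sites, so nothing can be concluded
    # Sort the letters so that "AG" and "GA" count as the same genotype.
    matches = sum(
        sorted(array_calls[site]) == sorted(wgs_calls[site]) for site in shared
    )
    return matches / len(shared)


def same_individual(
    array_calls: Genotypes, wgs_calls: Genotypes, threshold: float = 0.95
) -> bool:
    """True when agreement meets the (illustrative, assumed) threshold."""
    rate = concordance(array_calls, wgs_calls)
    return rate is not None and rate >= threshold


# Toy data, invented for this example only.
array_sample = {"site1": "AG", "site2": "CC", "site3": "TT", "site4": "AG"}
wgs_same = {"site1": "GA", "site2": "CC", "site3": "TT", "site4": "AG"}
wgs_other = {"site1": "AA", "site2": "CT", "site3": "TT", "site4": "GG"}

print(concordance(array_sample, wgs_same))   # 1.0
print(concordance(array_sample, wgs_other))  # 0.25
```

In this toy case the first pair agrees at every site, while the second agrees at only one of four, which would suggest that the sequence and the array data came from different people.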
The array data has already been used to provide participants with the program’s first genetic results. Through a secure web portal, participants may choose to receive information on their genetic ancestry and non-health-related traits, such as a person’s likely preference for cilantro and whether they have sticky earwax.
In the future, the program will also offer participants health-related genetic results, including information about hereditary disease risk and how their genes may affect their response to certain medications.
The first set of genomic data from All of Us is now available to researchers, including those at Johns Hopkins, subject to many privacy and security provisions, such as a registration process for each scientist and the scientist’s organization. In addition, direct participant identifiers have been removed, and scientists may not copy data from the All of Us research portal or store it on their own devices.
“Broadly accessible datasets, such as All of Us’, are very important to the research community,” says Doheny. “It’s a rich resource for scientists not only for ongoing and new studies of health and disease, but for educating new generations of genetics researchers.”