Johns Hopkins Medicine will take part in a consortium of biomedical and behavioral research scientists across the U.S. to generate “artificial intelligence (AI)-ready” data sets that are ethically sourced. Pending the availability of funds each year, the consortium, Artificial Intelligence Ready and Equitable Atlas for Diabetes Insights (AI-READI), will be supported by approximately $30 million over the next four years from the National Institutes of Health (NIH) Bridge to Artificial Intelligence (Bridge2AI) Common Fund.
“We’re not aiming to solve just one question about human health,” says Tin Yan Alvin Liu, M.D., assistant professor of ophthalmology at the Johns Hopkins University School of Medicine and the Wilmer Eye Institute. “We’re designing a system to ethically collect and generate a data set with various types of information that will be useful for many generations of scientists who specialize in using machine learning to solve challenging issues in human health.”
The AI-READI consortium, led by the University of Washington, will collect health information from people with diabetes. Liu points to ophthalmology and radiology as among the first fields to integrate AI technologies into standard clinical practice. For example, Liu says, the first fully autonomous AI system approved by the Food and Drug Administration in any medical field is used to screen for diabetic retinopathy, a common condition in patients with diabetes that damages blood vessels in the retina.
“When we refer to precision medicine, we often think about advances in genetics research. However, innovations in AI tools that can predict health conditions and outcomes are equally important,” says Liu, the founding director of the Wilmer Precision Ophthalmology Center of Excellence, which is part of the Johns Hopkins inHealth precision medicine initiative.
The key to improving such tools, says Liu, is high-quality data. “The reliability of AI-based predictions depends on the quality of data used to train computer systems that analyze the data,” says Liu, who co-leads the AI-READI consortium’s ethics section with the Wilmer Eye Institute’s Megan Collins, M.D., M.P.H., and the Berman Institute of Bioethics’ Kadija Ferryman, Ph.D. “For example, ideally, such a data set will draw from subjects from diverse ethnic and socioeconomic backgrounds.”
To reach a wide array of people who can provide health information for the project, the research teams are building resources to engage with local communities, whose input can help shape project plans and the data collection process.
“Oftentimes, when work like this is undertaken, ethical issues are addressed at the back end, after the data have been collected and are being used in research and development,” says Debra Mathews, Ph.D., M.A., the Berman Institute of Bioethics’ assistant director for science programs. “That’s when, for example, it is discovered that the data set is not representative or the data are biased, such that not everyone will benefit equally from the research. Many ethical issues, including those raised by artificial intelligence enabled technologies, are much easier to address much earlier in the process.”
Mathews and Ruth Faden, Ph.D., M.P.H., the Philip Franklin Wagley Professor of Biomedical Ethics, are co-investigators in the AI-READI consortium.
The Johns Hopkins research team is leading needs assessments to evaluate what scientists know and don’t know about community perceptions of AI and data collection. The assessments will be followed by the development of ethics consultation and education for consortium collaborators in other groups.
“Incorporating ethics as a key part of developing this data set from the outset will help ensure that the downstream research based upon it has the best possible chance of producing benefits that mitigate rather than exacerbate health inequalities,” says Mathews.
The four-year project will likely generate huge amounts of data, and Christopher Chute, M.D., Dr.P.H., M.P.H., Bloomberg Distinguished Professor of Health Informatics at the Johns Hopkins University School of Medicine, is tackling standardization of the data.
Chute is co-leading a group in the AI-READI consortium that is focused on data standards. The group aims to create a set of standards and guidelines to help ensure that the data’s format is comparable and consistent.
“To build a robust repository for future research, scientists need to be able to understand and analyze the data at scale and integrate it with other data sets seamlessly,” says Chute.
It comes down to semantics and syntax, Chute adds, which in scientific terms is called concept representation. For immense amounts of data to be processed, the data must be easy for computers to evaluate, he says. For example, “kidney cancer” and “renal cancer” refer to the same condition, but a computer doesn’t initially know this.
While AI technology may be able to sort through some data inconsistencies, doing so introduces error, says Chute, who is leading another project that has collected some 18 billion rows of data in a national sampling of patients with COVID-19.
To standardize the data, Chute will work with researchers to determine the consortium’s guidelines on common data models that use international standards for clinical terms. For data that is already being collected, his group will help researchers transform it into comparable and consistent data sets.
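As a rough illustration of the concept-representation problem Chute describes, the minimal Python sketch below maps different wordings of the same condition to one shared identifier so that a computer can treat them as equivalent. It is not the consortium's method: the function names, the record layout and the concept ID "C-KIDNEY-CANCER" are hypothetical placeholders invented for this example, and a real project would draw its identifiers from an established clinical vocabulary.

```python
from typing import Dict, List, Optional

# Hypothetical synonym table. "C-KIDNEY-CANCER" is a made-up placeholder,
# not a code from any real clinical terminology.
SYNONYM_TO_CONCEPT: Dict[str, str] = {
    "kidney cancer": "C-KIDNEY-CANCER",
    "renal cancer": "C-KIDNEY-CANCER",
}


def normalize_term(term: str) -> Optional[str]:
    """Return the shared concept ID for a free-text term, or None if unmapped."""
    # Lowercase and collapse runs of whitespace so minor formatting
    # differences do not prevent a match.
    key = " ".join(term.lower().split())
    return SYNONYM_TO_CONCEPT.get(key)


def standardize_records(records: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Copy each record and add a 'concept_id' field derived from its 'diagnosis' text."""
    standardized = []
    for record in records:
        updated: Dict[str, Optional[str]] = dict(record)
        updated["concept_id"] = normalize_term(record.get("diagnosis", ""))
        standardized.append(updated)
    return standardized


if __name__ == "__main__":
    sample = [
        {"patient": "A", "diagnosis": "Kidney cancer"},
        {"patient": "B", "diagnosis": "renal  cancer"},
        {"patient": "C", "diagnosis": "unlisted condition"},
    ]
    for row in standardize_records(sample):
        print(row)
```

In this sketch, terms that are missing from the table are left with no concept ID instead of being guessed at, so unmapped entries stay visible for later review.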
The researchers say the project is a step toward standardizing how health information is collected, with appropriate attention to ethics and data quality.
“We hope this project will also broaden future opportunities for data collection and AI technology,” says Chute.