This year, one in every twenty Americans will walk into a medical clinic and receive the wrong diagnosis.
That’s more than 10 million people, and for half of them, the misdiagnosis could be harmful, a 2014 study in the British Medical Journal concluded. Doctors try to be systematic when identifying illness and disease, but they’re only human. Bias creeps in. Alternatives are overlooked.
In a paper published today in Nature Medicine, a group of researchers from universities in the U.S. and China reported testing a potential remedy: artificial intelligence.
The researchers built a system that automatically diagnoses common childhood conditions — from influenza to meningitis — after reading a patient’s symptoms, medical history, lab results and other clinical data. The system was highly accurate, and one day could assist doctors in diagnosing patients with complex or rare conditions, the researchers said.
“In some situations, physicians cannot consider all the possibilities,” said Kang Zhang, a professor of ophthalmology, genetics and nanoengineering at the University of California, San Diego, and one of the authors of the paper. “This system can spot check and make sure the physician didn’t miss anything.”
The system relies on a neural network, a breed of artificial intelligence that can learn tasks by analyzing vast amounts of data. In this case, it analyzed electronic health records from more than 1.3 million patient visits to a pediatric hospital in China, learning to associate common medical conditions with specific patient information gathered by doctors, nurses and other technicians.
The same “deep learning” technology underpins everything from Google Translate (a smartphone app that translates between languages) to Amazon Alexa (a coffee-table gadget that recognizes voice commands from across the living room). After reshaping internet services, consumer devices and driverless cars in the early part of the decade, the technique is now moving into health care.
Many organizations, including Google, are developing and testing systems that analyze electronic health records, in an effort to flag potential medical conditions such as osteoporosis, diabetes, hypertension and heart failure. Similar technologies are being built to automatically detect signs of illness and disease in X-rays, MRIs and eye scans.
Able to recognize patterns in data that humans could never identify on their own, neural networks can be enormously powerful in the right situation. But even experts have difficulty understanding why such networks make particular decisions. As a result, extensive testing is needed to reassure both doctors and patients that these systems are reliable.
“These A.I. tools have a real chance to make a difference,” said Eric Topol, a cardiologist and geneticist with Scripps Research in San Diego who was not involved in the research and is the author of a forthcoming book on the use of machine learning in health care. “But it is going to take a while.”
Using neural-network technology, Dr. Zhang has built systems that can analyze eye scans for hemorrhages, lesions and other signs of diabetic blindness. Ideally, such systems would serve as a first line of defense, screening patients and pinpointing those who need further attention.
Dr. Zhang and his colleagues have now created a system that can diagnose an even wider range of conditions by recognizing patterns in text, not just in medical images. The new system analyzed the electronic medical records of nearly 600,000 patients at the Guangzhou Women and Children’s Medical Center, a hospital in southern China.
First, a group of trained physicians annotated the Guangzhou records, adding labels that identified information related to certain medical conditions. The system then analyzed the labeled data. When that training was done and the system was presented with new data (a patient’s symptoms determined during a physical examination), it was able to make connections on its own.
When tested on unlabeled data, the system could rival the performance of experienced physicians. It was more than 90 percent accurate at diagnosing asthma; the accuracy of physicians in the study ranged from 80 to 94 percent. In diagnosing gastrointestinal disease, the system was 87 percent accurate, compared with the physicians’ accuracy of 82 to 90 percent.
Experts said extensive clinical trials are now needed, particularly given the difficulty of interpreting decisions made by neural networks.
“Medicine is a slow-moving field,” said Ben Shickel, a researcher at the University of Florida who specializes in the use of deep learning for health care. “No one is just going to deploy one of these techniques without rigorous testing that shows exactly what is going on.”
It could be years before deep-learning systems are deployed in emergency rooms and clinics. But some are closer to real-world use. Google is now running clinical trials of its eye-scan system at two hospitals in southern India.
Deep-learning diagnostic tools are more likely to flourish in countries outside the U.S., Dr. Zhang said. Automated screening systems may be particularly useful in places where doctors are scarce.
The system built by Dr. Zhang and his colleagues benefited from the enormous size of the data set gathered from the hospital in Guangzhou. Similar data sets from U.S. hospitals are typically smaller, both because the average American hospital is smaller and because regulations make it difficult to pool data from multiple facilities. Moreover, privacy norms are less stringent in China than in the U.S. and Europe.
“China does not have nearly the same obstacles,” Dr. Topol said.
Dr. Zhang said he and his colleagues were careful to protect patient privacy when gathering data for their new system. But he acknowledged that researchers in China may have an advantage when it comes to collecting and analyzing this kind of data. “The sheer size of the population — the sheer size of the data — is a big difference,” he said.