Medicine and Machines

As artificial intelligence applications broaden, experts advise physicians to approach them as tools – not threats.

Throughout the history of medicine, tools have been invented to aid physicians and improve the care of patients. They have ranged from simple items like disposable bandages and stethoscopes to more complex devices for medical imaging and defibrillation. Today, artificial intelligence (AI) is poised to become a similarly routine part of clinical care. Rather than worry about AI-based tools’ potential for replacing the human aspect of medicine, Aziz Nazha, MD, Director of Cleveland Clinic’s Center for Clinical Artificial Intelligence, encourages physicians to keep inventions of the past in mind as they consider the expansion of AI in medicine.

“There is a lot of hype surrounding AI these days,” said Dr. Nazha, who also is a hematologist and oncologist in the Department of Hematology and Medical Oncology at Cleveland Clinic’s Taussig Cancer Institute. “It is important to understand that AI is just a tool like any other.”

The era of “big data” in medicine – ushered in by the widespread adoption of electronic health records – has brought AI closer to the clinic, but as Dr. Nazha and other experts who spoke with ASH Clinical News noted, there is still much work to be done to define where AI can be most useful and how it can be best integrated into patient care. We asked data scientists and clinicians about the potential of AI to improve medical research, physician workflow, and even patient outcomes.

What Is AI?

Defining AI may be as complicated as developing it, according to Pamela S. Becker, MD, PhD, Professor in the Division of Hematology at the University of Washington School of Medicine and a member of the Clinical Research Division at Fred Hutchinson Cancer Research Center in Seattle.

Congress took a stab at defining AI in 2017, with the FUTURE of AI Act, by which legislators sought to address how AI’s continuing development would affect U.S. investment, global competitiveness, workforces, and privacy.1 The legislation included several parameters for defining the concept of AI. Among them, AI includes “any artificial system that performs tasks under varying and unpredictable circumstances, without significant human oversight, or that can learn from experience and improve performance.” These systems could include cognitive architectures and neural networks. (For definitions of these and other AI-related terms, see the SIDEBAR).

“Basically, the concept of AI is broad and can apply to any circumstances where we are using computers to make deductions and draw conclusions that formerly would have been performed by humans,” Dr. Becker explained. “There are many ways in which AI can enhance our abilities, from basic research methods of data collection and predicting drug binding sites to translational research to predict toxicity or define genetic variance that could cause predisposition to illness, and all the way to clinical practice where it could aid in clinical diagnoses.”

That wide scope is why some organizations now define the term AI as augmented intelligence, a combination of human and machine intelligence.

One of the most common forms of AI being used in medicine today is machine learning.

“Machine learning works by giving a machine a lot of data and the right algorithms, so that it will be able to learn from the data without being specifically told what to do,” said Kun-Hsing Yu, MD, PhD, Assistant Professor of Biomedical Informatics at Harvard Medical School.

In supervised machine learning, labeled data form a training set from which the machine develops a model. The model is then fed new data that it has not seen before to evaluate its performance.
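The train-then-evaluate cycle can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the labels "benign" and "malignant" are placeholders, and the nearest-average classifier is far simpler than the models discussed in this article.

```python
import random

def train_centroids(examples):
    """Learn one average value per label from (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in examples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, value):
    """Assign the label whose learned average is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Hypothetical, synthetic data: one made-up measurement per sample.
rng = random.Random(0)
data = [(rng.gauss(1.0, 0.5), "benign") for _ in range(100)]
data += [(rng.gauss(3.0, 0.5), "malignant") for _ in range(100)]
rng.shuffle(data)

# Hold out 20% of the labeled data; the model never sees it during training.
split = int(0.8 * len(data))
training_set, test_set = data[:split], data[split:]

model = train_centroids(training_set)
correct = sum(predict(model, value) == label for value, label in test_set)
test_accuracy = correct / len(test_set)
print(f"Accuracy on held-out data: {test_accuracy:.2f}")
```

The key design point is the held-out test set: performance is measured only on examples the model did not learn from.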

Clinical Concepts

“There are several approaches to exploring machine-learning algorithms to see if they can be used for real-world health-care issues,” Dr. Yu said.

Researchers are actively exploring applications for machine learning in pathology and diagnostics. “There are many different types of rare lymphomas, and often centers need to send samples for review by experts at a central laboratory across the country,” Dr. Becker offered as an example. “Many algorithms have been developed to train computers to recognize tumor histology.”

In a recent study from researchers at the University of Texas Health Science Center, “deep learning” with a convolutional neural network algorithm was used to build a diagnostic model that would help classify lymphoma samples into four diagnostic categories: benign lymph node, diffuse large B-cell lymphoma, Burkitt lymphoma, and small lymphocytic lymphoma.2 Researchers trained the model using 1,856 slide images from 32 cases representing each diagnostic category; four sets of five representative images were taken for each case. Another 464 images were used for validation and 240 for testing. The test results had a diagnostic accuracy of 95% for image-by-image prediction and 100% for set-by-set prediction. Although this was a preliminary proof-of-concept trial, the researchers wrote that the model could be incorporated into “future pathology workflow to augment the pathologists’ productivity.”

Dr. Yu and his colleagues also have developed a fully automated system for histopathologic assessment in the area of lung cancer.3 Because human evaluation of pathology slides cannot accurately predict patients’ prognoses, the researchers sought to “teach” their machine learning models to select the features most likely to distinguish shorter-term survivors from longer-term survivors, feeding the model thousands of histopathology whole-slide images of lung adenocarcinoma and squamous cell carcinoma. The machine learning models were able to predict the prognosis of individuals with lung cancer, and Dr. Yu and his team are now looking to apply this technology to other areas, including hematologic cancers, he said.

Researchers also have made efforts to apply machine learning principles to flow cytometry, Dr. Becker said. In 2018, researchers in Taiwan published results of a study testing two AI techniques to develop an algorithm for analyzing multicolor flow cytometry results in the detection of measurable residual disease in patients with acute myeloid leukemia (AML) and myelodysplastic syndromes.4 The algorithms were able to classify samples within about 7 seconds, with an accuracy of about 90% in predicting outcome in the postinduction setting. Again, the researchers stated that the algorithm could be “clinically useful in supporting physicians to conduct multicolor flow cytometry interpretation with high efficiency and fidelity.”

AI is being tested in precision medicine approaches, as well. According to Dr. Becker, the most convincing example of machine learning that she has seen is a study in which researchers attempted to use the tool to predict complete remission from gene expression profiles in pediatric patients with AML.5 The algorithms were designed to analyze gene expression patterns obtained through RNA sequencing. The researchers determined which genes were differentially expressed in samples from responders and nonresponders, then tuned the algorithms to select features that would yield the best area under the curve (AUC) score, a measure of a model’s performance. In this study, the higher the AUC, the better the model is at distinguishing between patients with response and no response. The algorithm revealed a “significant underlying genetic difference between patients with contrasting outcomes following treatment … [highlighting] specific biological features that carry prognostic value for further exploration,” the authors wrote.
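For readers unfamiliar with AUC, the sketch below computes it from scratch under one common interpretation: the probability that a randomly chosen responder receives a higher model score than a randomly chosen nonresponder. The scores and labels are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """Fraction of (responder, nonresponder) pairs in which the responder
    has the higher score. Tied scores count as half a win."""
    responders = [s for s, y in zip(scores, labels) if y == 1]
    nonresponders = [s for s, y in zip(scores, labels) if y == 0]
    if not responders or not nonresponders:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for r in responders:
        for n in nonresponders:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(responders) * len(nonresponders))

# Hypothetical model scores; 1 = response, 0 = no response.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(f"AUC = {auc(scores, labels):.3f}")
```

An AUC of 1.0 means the model ranks every responder above every nonresponder, while 0.5 is what random guessing would produce.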

E-MERGE-ing Uses of AI

Dr. Becker also has been working to apply AI to personalized medicine with her MERGE algorithm, which aims to better match patients to drugs.6 Together with her colleagues at University of Washington’s Paul G. Allen School of Computer Science & Engineering, Su-In Lee, PhD, and Safiye Celik, PhD, Dr. Becker developed MERGE to predict how patients’ tumor cells would respond to 160 chemotherapeutic drugs and determine which genes seemed to confer drug sensitivity or resistance.

The MERGE algorithm weighs five features that indicate a gene’s potential to drive cancer progression: expression hubs, mutations, copy number variation, methylation, and known regulators. It then ranks these features and assigns a score, called a MERGE score, to indicate the likelihood that a gene will be positively or negatively associated with drug response. Using publicly available data from patients with AML, researchers used MERGE to successfully identify reliable biomarkers for drug response as well as new gene-drug associations – outperforming several state-of-the-art approaches.

Specifically, their work found that the expression level of the SMARCA4 gene is a marker of sensitivity to the topoisomerase II inhibitors mitoxantrone and etoposide in AML.

AI is also being incorporated into the field of imaging and radiology, Dr. Nazha noted.

“Scientists are developing aids for radiologists to enhance the accuracy of diagnoses,” he said. “For example, the AI might extract features from radiology reports to predict responses to chemotherapy.”

In a 2018 study, researchers reported results from a project using machine learning to develop an algorithm that would evaluate radiomic features extracted from non-contrast-enhanced CT images of tumors in patients with non-small cell lung cancer.7 The algorithm successfully predicted response to chemotherapy and time to disease progression. The researchers wrote that the signature “had a [higher] overall net benefit in prediction of high-risk patients to receive treatment than the clinicopathologic measurements.”

A more visible example of AI and machine learning is a system developed by Google’s DeepMind and Google Health AI to help clinicians with early detection of breast cancer. Results published in Nature earlier this year showed that the system surpassed the capabilities of human experts in breast cancer prediction.8 The system was trained using data from the U.K. and the U.S., and reduced the incidence of false positives by 1.2% and 5.7% in the two populations, respectively. In a test against six radiologists, the system outperformed the human readers with a better AUC. However, the researchers acknowledged that “a higher benchmark for performance could have been obtained with [human] readers who were more specialized.”

Challenges of AI

Applications of AI in health care hold promise, but living up to that potential will mean overcoming several obstacles.

“The devil is in the details,” Dr. Nazha said. “How do you evaluate a model in general? When do you implement it? How do you work with it? There are many different challenges.”

First, accuracy is not always a good metric for evaluating a machine learning model, he said. Consider a model designed to predict which patients will come back to the hospital within 30 days of discharge: If the rate of readmission is 20%, then 80% of patients are not being readmitted.

“If my model is saying ‘no, no, no’ all the time, I am 80% accurate, but the model is clinically useless because it does not really help tell me which patient is at high risk of coming back to the hospital,” Dr. Nazha explained.
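Dr. Nazha's example can be checked with a few lines of arithmetic. The sketch below assumes a hypothetical cohort of 100 discharged patients with the 20% readmission rate described above, and a model that always predicts "no readmission."

```python
# Hypothetical cohort: 1 = readmitted within 30 days, 0 = not readmitted.
actual = [1] * 20 + [0] * 80

# A model that says "no" for every patient.
always_no = [0] * len(actual)

def accuracy(predicted, truth):
    """Share of all patients for whom the prediction matches the outcome."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

def recall(predicted, truth):
    """Share of truly readmitted patients that the model flagged."""
    flagged = sum(p == 1 and t == 1 for p, t in zip(predicted, truth))
    return flagged / sum(truth)

print(f"Accuracy: {accuracy(always_no, actual):.0%}")  # 80%
print(f"Recall:   {recall(always_no, actual):.0%}")    # 0%
```

The model scores 80% on accuracy yet identifies none of the patients who actually return, which is why accuracy alone can mislead when outcomes are imbalanced.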

Often, the right way to evaluate a model is by its AUC, Dr. Nazha said. There are other important metrics for models as well, which sometimes get missed even in publications, making it difficult to ever implement the model clinically.

For example, consider a sepsis model that is designed to alert hospital physicians when a patient is at high risk of developing the blood infection. The model’s precision is 30%, which means that, out of 10 patients that the model is saying are at high risk for sepsis, 3 of those predictions would be correct and 7 would be wrong.

“That is clinically problematic because if I am trying to intervene based on the model’s results, I am harming 7 patients to try to save 3 patients,” Dr. Nazha said.
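Precision is the share of a model's alerts that turn out to be correct. The sketch below reproduces the arithmetic of the sepsis example; the function name and the counts of 3 and 7 are taken from the example above for illustration, not from any real model.

```python
def precision(true_alerts, false_alerts):
    """Share of all alerts that were correct."""
    return true_alerts / (true_alerts + false_alerts)

# The example above: 10 patients flagged as high risk for sepsis,
# of whom 3 are truly at risk and 7 are not.
true_alerts, false_alerts = 3, 7

print(f"Precision: {precision(true_alerts, false_alerts):.0%}")  # 30%
print(f"False alarms per correct alert: {false_alerts / true_alerts:.1f}")
```

At this precision, a clinician acting on every alert would intervene unnecessarily more than twice for each patient who truly needs it.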

A deeper understanding of how to evaluate these models is necessary, he said, as is a general agreement on the level of performance physicians are willing to work with on each metric.

Another challenge to implementation of AI is whether the systems provide actionable information. “There are many models out there that look good, but implementing them in clinical practice becomes difficult if the physician isn’t provided with recommendations based on what the model says,” Dr. Nazha noted. “Doctors also need to know that if they follow those recommended actions they are actually changing patient outcomes.”

Another topic often left out of discussions surrounding AI in health care is the concept of bias.

“Machine learning and AI is driven by data, and any biases [regarding] who has access to care, how they are treated, and their representation in our datasets will be reflected in the performance of machine learning and AI algorithms,” said Lee A.D. Cooper, PhD, Director of the Institute for Augmented Intelligence in Medicine at Northwestern University Feinberg School of Medicine. “The risk is that biased algorithms will become entrenched and increase health-care disparities.”

One example of this bias is that minority populations are typically underrepresented in the data used to develop and validate machine learning and AI systems.

“We know from natural history that minority patients may have different risks for developing diseases or that their diseases may manifest differently,” Dr. Cooper said. “If these populations are not well represented in our data, then the algorithms cannot learn how to handle these cases correctly. That leads to a differential performance issue. This will go undetected unless someone makes the effort to evaluate and compare the performance of these algorithms in different populations.”
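The kind of comparison Dr. Cooper describes can be sketched as follows. The group names, counts, and predictions are entirely hypothetical; the point is only that a single overall number can hide a gap between populations.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute accuracy separately for each group.
    Each record is a (group, true_label, predicted_label) tuple."""
    totals, correct = defaultdict(int), defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / totals[group] for group in totals}

# Hypothetical results: group_b is underrepresented (10 of 110 patients).
records = (
    [("group_a", 1, 1)] * 45 + [("group_a", 0, 0)] * 45 + [("group_a", 1, 0)] * 10
    + [("group_b", 1, 1)] * 3 + [("group_b", 0, 0)] * 3 + [("group_b", 1, 0)] * 4
)

overall = sum(truth == predicted for _, truth, predicted in records) / len(records)
per_group = accuracy_by_group(records)

print(f"Overall accuracy: {overall:.0%}")
for group, value in per_group.items():
    print(f"{group}: {value:.0%}")
```

In this made-up case the overall figure looks respectable, while the smaller group fares far worse, a difference that appears only when performance is reported by population.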

These issues need to remain in the forefront when AI systems are designed and evaluated, he said.

Redefining Regulation 

Most machine learning models remain in the research space and have not come under the eye of regulators…yet.

There are essentially two approaches to deploying AI-based systems in health-care settings, Dr. Yu explained. One method is to provide decision support that would still require the physician to “be in the loop.” In these cases, an algorithm might make a diagnosis, but a human is still responsible for finalizing the report and for making any clinical decisions.

The other approach is the use of models that, once they reach a certain level of accuracy and have no known biases, act as a self-contained system without human input. In these cases, AI would essentially be treated as a form of medical device, Dr. Yu said.

Regulatory agencies are trying to keep up with the developments in this technology. In April 2019, the FDA published a discussion paper tackling the issue of treating AI- or machine learning–based Software as a Medical Device, or SaMD, with a request for public feedback.9 The paper outlines the agency’s proposed approach for premarket review for AI and machine learning-driven software modifications.

According to the proposed framework, the FDA would use a “predetermined change control plan” in premarket submissions, which would include the types of modifications anticipated with a technology that continuously learns. Rather than requiring new review submissions with each algorithm modification, the proposal advocated for a “total product life cycle” approach.

Dr. Yu said this approach would help address an interesting aspect of AI: the fact that machine learning models evolve over time.

“Scientists may develop a first model with 99.9% accuracy, but a year later, they collect more data and make a revised model,” Dr. Yu said. “The retrained model may have an accuracy of 99.99% but the first model is the one approved by regulatory committees. The challenge now becomes whether the second model with higher accuracy should be automatically approved or have to go through the same process to make sure nothing was wrong with the training process.”

According to Dr. Yu, the FDA is working with a few industry pioneers to see how this approach would work in the real world.

The Human Factor

Pioneers and experts in the field of AI and machine learning are instrumental in the development and application of these AI methods.

“This type of work typically involves an engineering expert working in collaboration with a domain expert, such as a hematologist, to make sure we are attempting to solve the right type of problem,” Dr. Cooper explained. “Hematologists play an important role not only in defining the problem but in designing the experiment. What type of patients should be included? How do you address issues of patients receiving different types of treatment?”

Hematologists and pathologists also have the responsibility of making sure that engineers are using high-quality data and high-quality annotations of clinical images.

“After we build a model and generate results, we work with these experts to see if there are any weaknesses in our model and to figure out additional ways to make our model more robust,” Dr. Yu explained. “They are heavily involved.”

All this is to say that AI experts are not developing these systems with the intention of replacing clinicians.

“There are so many issues that may arise in real-world settings and it is difficult to even know what they will be,” Dr. Yu said. “The goal is not to replace medical doctors but to enhance their current practice of medicine by using computers in some of the ways that computers work best – freeing up clinicians’ time for the more complicated tasks.”

The growth of AI does not mean physicians will be replaced by machines and algorithms, Dr. Yu said. However, MDs who use AI may soon be more highly valued than those who do not.  —By Leah Lawrence

What the Tech?

Cognitive architecture: Loosely defined as the theory of the human mind and its structure, including learning, performance, reasoning, and problem-solving skills, and how they work together to form human intelligence.

Machine learning: A method of data analysis in which computers are trained to recognize certain features that are used to develop algorithms that are then used to find patterns in large amounts of data.

Neural networks, or deep learning: A type of machine learning that consists of a large number of simple processing nodes that are densely interconnected and trained to learn specific tasks by considering previously labeled examples.


  1. “S.2217 – FUTURE of Artificial Intelligence Act of 2017. 115th Congress (2017-2018).” Accessed April 12, 2020, from
  2. Achi HE, Belousova T, Chen L, et al. Automated diagnosis of lymphoma with digital pathology images using deep learning. Ann Clin Lab Sci. 2019;49:153-160.
  3. Yu KH, Zhang C, Berry GJ, et al. Predicting non-small cell lung cancers prognosis by fully automated microscopic pathology image features. Nat Commun. 2016;7:12474.
  4. Ko BS, Wang YF, Li JL, et al. Clinically validated machine learning algorithm for detecting residual diseases with multicolor flow cytometry analysis in acute myeloid leukemia and myelodysplastic syndrome. EBioMedicine. 2018;37:91-100.
  5. Gal O, Auslander N, Fan Y, Meerzaman D. Predicting complete remission of acute myeloid leukemia: machine learning applied to gene expression. Cancer Inform. 2019;18:1176935119835544.
  6. Lee SI, Celik S, Logsdon BA, et al. A machine learning approach to integrate big data for precision medicine in acute myeloid leukemia. Nat Commun. 2018;9:42.
  7. Khorrami M, Khunger M, Zagouras A, et al. Combination of peri- and intratumoral radiomic features on baseline CT scans predicts response to chemotherapy in lung adenocarcinoma. Radiol Artif Intell. 2019;1:e180012.
  8. McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature. 2020;577:89-94.
  9. U.S. Food and Drug Administration. Artificial Intelligence and Machine Learning in Software as a Medical Device. Accessed March 11, 2020, from