Algorithm Built on Big Data Improves Prognostic Accuracy Over Existing MDS Models

Using machine learning, researchers developed a prognostic model that predicted survival among patients with myelodysplastic syndromes (MDS) more accurately than existing “gold-standard” prognostic tools, according to a study presented at the 2018 ASH Annual Meeting.

“All treatment guidelines are driven by risk, which means that if we get the risk wrong, we get the treatment wrong,” said lead study author Aziz Nazha, MD, of the Cleveland Clinic. “Improving and personalizing our prognostic models is extremely important and can help patients understand their disease and what they can expect during their journey.”

The survival estimates provided by the International Prognostic Scoring System (IPSS) and IPSS-Revised (IPSS-R) systems “are all over the place … and underestimate the heterogeneity of MDS,” Dr. Nazha said. “So, for us, the question became, ‘Can we build a model that can provide a personalized prediction specific for a given patient?’”

To build this tool, a collaborative team from Cleveland Clinic and Munich Leukemia Laboratory developed a machine-learning algorithm that used genomic and clinical data to estimate a patient’s prognosis. The system was “trained” in a cohort of 1,471 patients, then validated in a separate data set of 831 patients from the Moffitt Cancer Center. The clinical characteristics of the training and validation cohorts were similar and were representative of a typical MDS cohort, the authors noted.

After running the demographic, clinical, and genomic data of the training cohort through the algorithm, the researchers identified the clinical variables with the greatest prognostic importance, including (from most to least important):

  • cytogenetic risk categories by IPSS-R
  • platelets
  • mutation number
  • hemoglobin
  • bone marrow blasts percentage
  • 2008 World Health Organization diagnosis
  • white blood cell count
  • age
  • absolute neutrophil count
  • presence of one of 11 mutations (e.g., TP53, RUNX1, STAG2, ASXL1)

They then built a physician-friendly and patient-friendly web application that allows the user to generate survival probabilities.

In head-to-head comparisons between the new model and the IPSS and IPSS-R models, the researchers found that the new model correctly predicted a patient’s likelihood of overall survival 74 percent of the time, while the IPSS and IPSS-R models correctly predicted survival 66 percent and 67 percent of the time, respectively. It also outperformed IPSS and IPSS-R in predicting transformation to acute myeloid leukemia (AML; 81 percent vs. 73 percent for each).
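
The article does not name the metric behind these percentages. One common way to score how often a survival model ranks patients correctly is a concordance index, so the sketch below assumes that kind of measure purely for illustration. The function name, the toy follow-up data, and the two sets of risk scores are all hypothetical and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style concordance for right-censored survival data.

    times       -- follow-up time for each patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means a worse predicted prognosis

    Simplification (assumption): pairs with identical follow-up times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient i has the shorter follow-up time.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is comparable only if the shorter follow-up ended in an event.
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0   # the model ranked the pair correctly
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: five patients, two of them censored.
times = [5, 10, 12, 20, 30]
events = [1, 1, 0, 1, 0]
model_a = [0.9, 0.7, 0.4, 0.5, 0.1]  # ranks every comparable pair correctly
model_b = [0.9, 0.2, 0.4, 0.5, 0.1]  # underestimates risk for the second patient

print(concordance_index(times, events, model_a))  # 1.0
print(concordance_index(times, events, model_b))  # 0.75
```

Under this reading, a score of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a figure such as 74 percent would mean the model ordered roughly three of every four comparable patient pairs correctly.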

Given the prognostic significance of the cytogenetic information, Dr. Nazha said, “We also questioned whether we needed the clinical information for prognosis.” After comparing the predictive accuracy of the geno-clinical model with that of the mutations-only and mutations-plus-cytogenetics models, “we found that the more clinical characteristics we added to cytogenetic information, the better the accuracy.”

Ultimately, Dr. Nazha concluded, “[Our experience shows that] machine learning and artificial intelligence can open opportunities for us to translate the genomic data into useful clinical tools.”

When asked about the exclusion of comorbidities and other patient factors from this model, Dr. Nazha noted that certain data were not available. He added that the investigators are looking to include other clinical variables in future models, which could further improve prognostic accuracy.

The authors report no relevant financial relationships.


Nazha A, Komrokji RS, Meggendorfer M, et al. A personalized prediction model to risk stratify patients with myelodysplastic syndromes. Abstract #793. Presented at the 2018 ASH Annual Meeting, December 3, 2018; San Diego, CA.