Machine Learning Detects Prognostic Associations Between Morphologic and Genetic Changes in MDS

A new artificial intelligence–based technique was able to establish correlations between genetic and morphologic biomarkers, revealing diagnostically and prognostically important relationships, according to results published in Blood.

“The tremendous complexity of morphologic and genetic changes imposes challenges to studies endeavoring to establish correlations among them,” the authors, led by Yasunobu Nagata, MD, PhD, from Cleveland Clinic’s Taussig Cancer Institute in Ohio, wrote. “Indeed, the extent to which diverse genetic and epigenetic alterations share phenotypes is unresolved; their successful integration may offer a new avenue to improve diagnosis and prognosis of myelodysplastic syndromes (MDS).”

To overcome the limitations of available diagnostic strategies – such as user subjectivity and labor intensity – Dr. Nagata and investigators devised a machine learning technique to identify patterns of co-occurrence between morphologic features and genomic events.

Researchers sequenced samples from 1,079 patients with MDS or MDS/myeloproliferative neoplasm overlap, including low- and high-risk subtypes. An independent pathologist who was blinded to mutational status then evaluated bone marrow morphologic features. All cases were separated into two risk groups according to Revised International Prognostic Scoring System (IPSS-R) score (low-risk ≤3.5 and high-risk >3.5), and each risk group was randomly divided into discovery and validation groups at a 3:1 ratio.
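The cohort split described above can be sketched in a few lines of Python. This is only an illustration of a risk-stratified 3:1 random split, not the authors' code: the function names, the fixed seed, and the synthetic patient scores are all assumptions made for the example. Only the 3.5 cutoff, the 3:1 ratio, and the cohort size come from the article.

```python
import random

IPSS_R_CUTOFF = 3.5  # from the article: low-risk <= 3.5, high-risk > 3.5


def risk_group(ipss_r_score):
    """Assign a patient to the low- or high-risk group by IPSS-R score."""
    return "low" if ipss_r_score <= IPSS_R_CUTOFF else "high"


def split_discovery_validation(patients, seed=0):
    """Randomly split each risk group 3:1 into discovery and validation sets.

    `patients` is a list of (patient_id, ipss_r_score) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    groups = {"low": [], "high": []}
    for patient_id, score in patients:
        groups[risk_group(score)].append(patient_id)

    result = {}
    for name, ids in groups.items():
        shuffled = ids[:]
        rng.shuffle(shuffled)
        n_discovery = round(len(shuffled) * 3 / 4)  # three of every four patients
        result[name] = {
            "discovery": shuffled[:n_discovery],
            "validation": shuffled[n_discovery:],
        }
    return result


# Synthetic cohort: 1,079 made-up patients with scores on a 0-10 scale in 0.5 steps.
demo_rng = random.Random(42)
cohort = [(f"P{i:04d}", round(demo_rng.uniform(0, 10) * 2) / 2) for i in range(1, 1080)]
split = split_discovery_validation(cohort)

for name, parts in split.items():
    print(name, len(parts["discovery"]), len(parts["validation"]))
```

Splitting within each risk group, rather than across the whole cohort, keeps the proportion of low- and high-risk patients the same in the discovery and validation sets.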

Thirty-three genes were examined by next-generation sequencing, focusing on mutations that were present in >10% of cells. In total, 1,929 somatic mutations were identified. The most frequently mutated genes were TET2 (20%), ASXL1 (17%), SF3B1 (13%), SRSF2 (11%), DNMT3A (11%), and RUNX1 (10%).

The investigators found correlations between 11 morphologic and clinical features and the 33 mutated genes. To reduce the number of possible associations, they next devised strategies that sequentially examined recurrent genotype-phenotype relationships.

Analyses yielded 52 morphology/genotype associations. For example:

  • myeloid dysplasia was associated with STAG2, NRAS, SRSF2, TP53, and TET2 mutations, but was less common with SF3B1 mutations
  • neutropenia was more frequent in patients with IDH1 mutations
  • thrombocytopenia was associated with TP53 mutations, but was less common with JAK2, SF3B1, and BCORL1 mutations

Further univariate hypothesis testing identified significant pairwise associations among several morphologic and mutation features, warranting further interrogation of integrative subtypes, the authors reported.
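To make the idea of a pairwise association test concrete, the sketch below runs a two-sided Fisher's exact test on a 2×2 table of mutation status against a morphologic feature. The article does not say which univariate test the authors used, so Fisher's exact test is an assumption here, and the counts in the example are invented for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for a 2x2 table.

                  feature present   feature absent
    mutated              a                 b
    wild-type            c                 d
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated patients having the feature,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table at least as unlikely as the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Invented counts: 15 mutated patients (12 with the feature), 65 wild-type (20 with it).
p_value = fisher_exact_two_sided(12, 3, 20, 45)
print(f"p = {p_value:.4g}")
```

With 11 features and 33 genes there are several hundred possible pairs, so a real analysis would also need some correction for multiple testing; the article does not specify which was applied.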

The machine learning–based technique then revealed that these features describe only five distinct morphologic profiles. Profile 1, which accounted for about one-third of patients (n=283; 34%), comprised those with high-risk subtypes. The remaining profiles comprised patients with lower-risk subtypes:

  • profile 2 was characterized by trilineage dysplasia and pancytopenia (n=138; 17%)
  • profile 3 was characterized by trilineage dysplasia, bilineage cytopenia, and monocytosis (n=218; 26%)
  • profile 4 had bilineage dysplasia, unilineage cytopenia (anemia), and elevated megakaryocytes (n=130; 16%)
  • profile 5 had erythroid dysplasia occasionally arising with anemia (n=66; 8%)

The authors also found that these profiles distinguished patients with different prognoses. Patients with profile 5 had better overall survival than those with profiles 2, 3, and 4.

Patients with lower-risk MDS were further classified into eight genetic signatures (e.g., signature A had TET2 mutations, signature B had both TET2 and SRSF2 mutations, and signature G had SF3B1 mutations), each of which was associated with specific morphologic profiles. The investigators then confirmed six of the morphologic profile/genetic signature associations in the validation group.

“Our study demonstrates that despite the tremendous morphologic diversity of MDS, nonrandom or even pathognomonic relationships between the MDS phenotype and genotype can be identified,” the researchers concluded. Ultimately, they suggested, such machine learning techniques could supplant molecular testing and lead to approaches “that will produce classifications that better reflect underlying true biological subgroupings of these MDS disease entities.”

The authors report no relevant conflicts of interest.


Nagata Y, Zhao R, Awada H, et al. Machine learning demonstrates that somatic mutations imprint invariant morphologic features in myelodysplastic syndromes. Blood. 2020;136(20):2249-2262.