When a patient is diagnosed with cancer, one of the most important steps is for pathologists to examine the tumor under a microscope to determine the cancer stage and characterize the tumor. This information is central to understanding the clinical prognosis (i.e., likely patient outcomes) and to determining the most appropriate treatment, such as surgery alone versus surgery plus chemotherapy. In pathology, the development of machine learning (ML) tools to aid microscopic review is a compelling research area with many potential applications.
Previous studies have shown that ML can accurately identify and classify tumors in pathology images and can even predict patient prognosis using known pathology features, such as the degree to which gland appearance deviates from normal. While these efforts focus on using ML to detect or quantify known features, alternative approaches offer the potential to identify novel features. Discovering new features could in turn further improve cancer prognostication and treatment decisions by extracting information that is not yet considered in current workflows.
Today, we would like to share the progress we have made over the past few years toward identifying novel features of colorectal cancer, in collaboration with teams at the Medical University of Graz in Austria and the University of Milano-Bicocca (UNIMIB) in Italy. Below, we cover several stages of the work: (1) training a model to predict prognosis from pathology images without specifying which features to use, so that it can learn which features are important; (2) probing that prognostic model using explainability techniques; and (3) identifying a novel feature and validating its association with patient prognosis. We describe this feature and evaluate its use by pathologists in our recently published paper, “Pathologist validation of a machine-learned feature for colon cancer risk stratification”. To our knowledge, this is the first demonstration that medical experts can learn new prognostic features from machine learning, a promising start for the future of this “learning from deep learning” paradigm.
Training a predictive model to learn what features are important
One potential approach to discovering new features is to train ML models to directly predict patient outcomes using only the images and the paired outcome data. This contrasts with training models to predict “intermediate” human-annotated labels for known pathologic features and then using those features to predict outcomes.
Initial work by our team demonstrated the feasibility of training models to directly predict prognosis for a variety of cancer types using the publicly available TCGA dataset. It was particularly exciting to see that, for some cancer types, the model’s predictions remained prognostic after controlling for available pathologic and clinical features. Together with colleagues from the Medical University of Graz and the Graz Biobank, we then extended this work using a large de-identified colorectal cancer cohort. Interpreting the predictions of this model became an intriguing next step, but common interpretability techniques were difficult to apply in this context and did not provide clear insights.
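To make the idea of predicting outcomes directly from outcome data more concrete, here is a minimal sketch of one commonly used survival training objective, the Cox negative partial log-likelihood. This is an illustration under assumptions: the text above does not state which loss the prognostic model used, and the function name and toy inputs are hypothetical.

```python
import math


def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Average Cox negative partial log-likelihood (illustrative only).

    risk_scores: model outputs, higher means higher predicted risk.
    times: follow-up time per patient.
    events: 1 if the event was observed, 0 if the patient was censored.
    Ties in time are handled naively (Breslow-style risk sets).
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_set = [risk_scores[j] for j in range(len(times)) if times[j] >= t_i]
        # Numerically stable log-sum-exp over the risk set.
        m = max(risk_set)
        log_sum = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += risk_scores[i] - log_sum
        n_events += 1
    return -total / n_events if n_events else 0.0


# Toy data: two observed events followed by one censored patient.
times = [1, 2, 3]
events = [1, 1, 0]
well_ordered = cox_neg_partial_log_likelihood([2.0, 1.0, 0.0], times, events)
mis_ordered = cox_neg_partial_log_likelihood([0.0, 1.0, 2.0], times, events)
```

The loss is lower when patients with earlier events receive higher risk scores, which is the signal a model can learn from images paired with outcomes, without any intermediate feature labels.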
Interpreting features learned by the model
To examine the features used by the prognostic model, we used a second model (trained to identify image similarity) to cluster cropped patches of the large pathology images. We then used the prognostic model to compute the mean ML-predicted risk score for each cluster.
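The per-cluster averaging step can be sketched as follows. This is a minimal illustration that assumes each patch already has a cluster assignment (from clustering the similarity model's embeddings) and a risk score from the prognostic model; the function names and toy values are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def mean_risk_by_cluster(cluster_ids, risk_scores):
    """Return {cluster_id: mean predicted risk of the patches in that cluster}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cluster, risk in zip(cluster_ids, risk_scores):
        sums[cluster] += risk
        counts[cluster] += 1
    return {cluster: sums[cluster] / counts[cluster] for cluster in sums}


def rank_clusters_by_risk(cluster_means):
    """Cluster ids ordered from highest to lowest mean risk."""
    return sorted(cluster_means, key=cluster_means.get, reverse=True)


# Toy example: five patches assigned to three clusters.
cluster_ids = [0, 0, 1, 1, 2]
risk_scores = [0.25, 0.75, 1.0, 3.0, 0.5]
cluster_means = mean_risk_by_cluster(cluster_ids, risk_scores)
ranking = rank_clusters_by_risk(cluster_means)
```

Clusters at the top of such a ranking are natural candidates for visual review by pathologists.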
One cluster stood out for its high mean risk score (associated with poor prognosis) and its distinct visual appearance. Pathologists described the images as showing high-grade tumor (i.e., least resembling normal tissue) in close proximity to adipose (fat) tissue, leading us to name this cluster the “tumor adipose feature” (TAF); see the next figure for detailed examples of this feature. Further analysis showed that the relative quantity of TAF was itself highly and independently prognostic.
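One simple way to express a "relative amount" of a cluster per case is the fraction of that case's patches assigned to the cluster. The sketch below assumes that definition purely for illustration; the text does not specify how the quantity was actually computed, and the names and data are hypothetical.

```python
from collections import Counter


def cluster_fraction_per_case(case_ids, cluster_ids, target_cluster):
    """Return {case_id: fraction of that case's patches in target_cluster}."""
    totals = Counter(case_ids)
    hits = Counter(
        case for case, cluster in zip(case_ids, cluster_ids)
        if cluster == target_cluster
    )
    return {case: hits[case] / totals[case] for case in totals}


# Toy example: case "a" has four patches, case "b" has two.
case_ids = ["a", "a", "a", "a", "b", "b"]
cluster_ids = [1, 0, 1, 2, 0, 0]
fractions = cluster_fraction_per_case(case_ids, cluster_ids, target_cluster=1)
```

A per-case quantity like this can then be tested for association with outcomes alongside existing clinical variables.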
|A prognostic ML model was developed to predict patient survival directly from unannotated giga-pixel pathology images. A second image similarity model was used to cluster cropped patches of the pathology images. The prognostic model was used to compute the mean model-predicted risk score for each cluster. One cluster, termed “tumor adipose feature” (TAF), stood out for its high mean risk score (associated with poor survival) and distinct visual appearance. Pathologists learned to identify TAF, and pathologists’ scoring of TAF was shown to be prognostic.|
|Left: H&E pathology slide overlaid with a heat map showing the locations of the tumor adipose feature (TAF). Regions highlighted in red/orange are considered by the image similarity model to be more likely TAF than regions highlighted in green/blue or not highlighted at all. Right: A representative collection of TAF patches across multiple cases.|
Validating that the feature learned by the model can be used by pathologists
These studies provided a compelling example of the potential of ML models to predict patient outcomes and a methodological approach to gain insights into model predictions. However, intriguing questions remained as to whether pathologists could learn and evaluate a feature identified by the model while retaining demonstrable predictive value.
In our latest paper, we collaborated with UNIMIB pathologists to explore these questions. Using example TAF images from the previous publication to learn and understand this feature of interest, UNIMIB pathologists developed scoring guidelines for TAF. If TAF was not seen, the case was scored as “absent”, and if TAF was seen, the categories “unifocal”, “multifocal”, and “widespread” were used to indicate its relative quantity. Our study demonstrated that pathologists could reproducibly identify the ML-derived TAF and that their TAF scores provided statistically significant prognostic value on an independent retrospective dataset. To our knowledge, this is the first demonstration of pathologists learning to identify and score a specific pathology feature originally identified by an ML-based approach.
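Reproducibility between two raters on a categorical score is often summarized with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is illustrative only: the text does not say which agreement statistic the study reported, and the integer coding (0 for absent, 1 to 3 for increasing relative quantity) and the toy scores are assumptions.

```python
def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal score frequencies.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every case
    return (observed - expected) / (1.0 - expected)


# Toy scores for six cases (0 = absent, 1-3 = increasing relative quantity).
pathologist_1 = [0, 1, 2, 3, 0, 1]
pathologist_2 = [0, 1, 2, 3, 1, 0]
kappa = cohens_kappa(pathologist_1, pathologist_2)
```

Because the score here is ordinal, a weighted kappa that penalizes near-misses less than distant disagreements may be a better fit in practice; the unweighted form is shown for brevity.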
Putting things in context: learning from deep learning as a paradigm
Our work is an example of people “learning from deep learning”. In traditional ML, models learn from hand-engineered features informed by existing domain knowledge. More recently, in the deep learning era, the combination of large-scale model architectures, compute, and datasets has made it possible to learn directly from raw data, but often at the expense of human interpretability. Our work couples the use of deep learning to predict patient outcomes with interpretability methods to extract new knowledge that pathologists can apply. We see this process as a natural next step in the evolution of applying ML to problems in medicine and science, moving from using ML to distill existing human knowledge to people using ML as a tool for knowledge discovery.
|Traditional ML focuses on engineering features from raw data using existing human knowledge. Deep learning enables models to learn features directly from raw data, at the expense of human interpretability. Combining deep learning with interpretability methods makes it possible to expand the frontiers of scientific knowledge by learning from deep learning.|
This work would not have been possible without the efforts of co-authors Vincenzo L’Imperio, Markus Plass, Haimo Müller, Niccolò Tamini, Luca Gianotti, Nicola Zucchini, Robert Reichs, Greg S. Corrado, Dale R. Webster, Lily H. Peng, Po-Hsuan Cameron Chen, Marialuiza Lavitrano, David F. Steiner, Kurt Zatlukal, and Fabio Pagni. We also appreciate the support of the Verily Life Sciences and Google Health Pathology teams, particularly Timo Kohlberger, Yunnan Cai, Hongwu Wang, Kunal Nagpal, Craig Mermel, Tricia Brown, Isabelle Flament-Ovin, and Angela Lin. We also appreciate manuscript feedback from Akinori Mitani, Rory Sayres, and Michael Howell, as well as illustration assistance from Abi Jones. This work would also not have been possible without the support of Christian Guell, Andreas Holzinger, Robert Reichs, Farah Nader, the Graz Biobank, the efforts of the slide digitization team at the Medical University of Graz, the pathologists who reviewed and annotated cases during model development, and the technicians of the UNIMIB team.