Colorectal cancer is a leading cause of cancer deaths, yet it is highly preventable through fast, efficient diagnosis and the removal of precancerous lesions. Bottlenecks in patient screening schedules, however, limit access to rapid diagnosis and underscore the urgent need for efficient methods, such as Deep Learning (DL) tools, to support pathologists. DL models face significant challenges in computational pathology because of the gigapixel size of whole-slide images and the scarcity of datasets with detailed annotations. Self-supervised learning (SSL) methods are therefore crucial for alleviating the burden and cost of data annotation, but current research lacks methods for applying SSL frameworks effectively to pathology data.
We introduce a novel Barlow Twins framework, enhanced with an augmentation strategy optimized for pathology data. We leverage the KGH dataset, a private repository of colorectal polyps, and train a Swin Transformer whose hierarchical structure captures the multi-scale nature of pathology images. These innovations improved accuracy and area under the curve (AUC) on both the KGH dataset and the PCam dataset, a well-known and challenging benchmark for classifying metastatic cancer in the lymph nodes of breast cancer patients.
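
As a minimal sketch of the Barlow Twins objective referred to above, the following NumPy function computes the loss from the embeddings of two augmented views of the same batch. It follows the standard formulation (standardize each feature over the batch, form the cross-correlation matrix, and push it toward the identity). The function name, the array shapes, and the off-diagonal weight of 5e-3 are assumptions made for illustration, not the configuration used in this thesis. In practice the embeddings would come from the backbone and its projection head rather than from plain arrays.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-12):
    """Barlow Twins loss for two (batch, dim) embedding arrays.

    z_a and z_b hold the embeddings of two augmented views of the same
    images. lambda_offdiag is an assumed default for the redundancy
    reduction weight, not a value reported in this thesis.
    """
    z_a = np.asarray(z_a, dtype=np.float64)
    z_b = np.asarray(z_b, dtype=np.float64)
    if z_a.shape != z_b.shape or z_a.ndim != 2:
        raise ValueError("z_a and z_b must both have shape (batch, dim)")
    n = z_a.shape[0]

    # Standardize every feature dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)

    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = z_a.T @ z_b / n

    # Invariance term: diagonal entries should be close to 1.
    diag = np.diag(c)
    on_diag = np.sum((diag - 1.0) ** 2)

    # Redundancy reduction term: off-diagonal entries should be close to 0.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)

    return float(on_diag + lambda_offdiag * off_diag)
```

The loss is small when the two views yield matching embeddings with decorrelated features, and grows when the views disagree. This is the property that lets the model learn from unlabeled pathology tiles.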
Furthermore, we provide meaningful explainability insights into the performance of the different techniques, and we propose a practical and impactful approach for integrating deep learning tools into pathologists' clinical workflows.
In this thesis, we demonstrate that the proposed model, which is relatively new to computational pathology, achieves strong and explainable results across multiple cancer types when adapted to the specific pathology task.