Abstract
Automatic heart sound classification is a longstanding and challenging problem in biomedical engineering because heart sounds are complex and highly nonstationary. Most machine learning-based heart sound classification methods achieve limited accuracy and are primarily developed using a segmentation-based approach: they depend on single-domain feature information and attend equally to every part of the signal rather than employing a selective attention mechanism. Additionally, existing studies rely on single-stream architectures, overlooking the advantages of multi-resolution features. Moreover, these methods rely solely on early, late, or intermediate fusion, which often fails to fully integrate the diverse, multiscale features required for robust classification. The proposed framework addresses these problems through three core components of deep learning architecture design: the selection of diverse feature representations combined through an effective feature fusion strategy, a learning mechanism that leverages multi-resolution feature fusion, and the utilization of multi-scale representations. In this thesis, three novel deep learning architectures and methods that do not require segmentation are developed to address these limitations in heart sound classification.
First, we propose a novel multimodal attention convolutional neural network (MACNN) that incorporates a feature-level fusion strategy. The architecture consists of three parallel branches, each employing low-complexity convolutional neural networks combined with attention mechanisms to process different feature domains. This design enhances feature diversity and enables more effective multi-feature fusion.
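The parallel-branch, feature-level fusion idea can be illustrated with a minimal sketch. This is not the MACNN implementation: the branch sizes, the linear stand-in for each low-complexity CNN, and the toy softmax attention are all illustrative assumptions; only the overall pattern (three domain-specific branches, per-branch attention, concatenation before the classifier) follows the description above.

```python
import numpy as np

def branch(x, w):
    """One simplified branch: a linear projection plus ReLU, standing in
    for a low-complexity CNN over one feature domain."""
    return np.maximum(0, x @ w)

def attention(feats):
    """Toy attention: re-weight a branch's features by a softmax over
    its own activations (a stand-in for the real attention mechanism)."""
    e = np.exp(feats - feats.max())
    return feats * (e / e.sum())

rng = np.random.default_rng(0)
# Hypothetical inputs from three feature domains (e.g. time, frequency,
# time-frequency); names and dimensions are illustrative, not the thesis's.
domains = [rng.standard_normal(32) for _ in range(3)]
weights = [rng.standard_normal((32, 16)) for _ in range(3)]

# Each parallel branch processes its own domain, then attention re-weights it.
branch_feats = [attention(branch(x, w)) for x, w in zip(domains, weights)]

# Feature-level fusion: concatenate the attended branch outputs
# before the classifier head.
fused = np.concatenate(branch_feats)
print(fused.shape)  # (48,)
```

Concatenation (rather than summation) preserves each domain's contribution, letting the downstream classifier weight the domains itself.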
Second, we design a novel attention fusion-based two-stream vision transformer (AFTViT) architecture that captures long-range dependencies and diverse contextual information at multiple scales. A novel attention block then integrates cross-context features at the feature level, enhancing the overall feature representation.
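One way to picture attention-based fusion of two streams is single-head cross-attention, sketched below. The two token sets, their sizes, and the pool-and-concatenate step are assumptions for illustration; the actual AFTViT fusion block may differ.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens):
    """Single-head cross-attention: tokens from one stream query the other
    stream's tokens, mixing the two contexts at the feature level."""
    d = q_tokens.shape[-1]
    scores = q_tokens @ kv_tokens.T / np.sqrt(d)
    return softmax(scores) @ kv_tokens

rng = np.random.default_rng(1)
# Two hypothetical streams over the same input: small patches (fine
# context, more tokens) and large patches (coarse context, fewer tokens).
fine = rng.standard_normal((16, 64))    # 16 tokens, embedding dim 64
coarse = rng.standard_normal((4, 64))   # 4 tokens, embedding dim 64

# Illustrative fusion: fine tokens attend to the coarse context, then both
# streams are mean-pooled and concatenated for the classifier.
mixed = cross_attention(fine, coarse)
fused = np.concatenate([mixed.mean(axis=0), coarse.mean(axis=0)])
print(fused.shape)  # (128,)
```

Because the queries come from one stream and the keys/values from the other, each fine-scale token's output is a context-dependent mixture of coarse-scale features, which is what lets the fusion capture cross-context information rather than treating the streams independently.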
Third, a novel hierarchical multiscale Swin (HM-Swin) transformer network with attention fusion is developed. The proposed model extracts hierarchical multiscale features from a Swin Transformer backbone and fuses them using a dynamic fusion block, ensuring that interdependencies among scales are captured effectively.
Comprehensive experiments are conducted on publicly available datasets to demonstrate the effectiveness of the MACNN, AFTViT, and HM-Swin architectures. The results show that these methods consistently outperform existing state-of-the-art approaches in classification accuracy while requiring a minimal number of training parameters.