Thesis defences

PhD Oral Exam - Mohamed Elsayed Awad Mohamed, Electrical and Computer Engineering

Enhanced Video Tracking Based on Fusion of Visible and Infrared Images


Date & time: Monday, September 8, 2025, 1 p.m. – 4 p.m.

Cost: This event is free

Organization: School of Graduate Studies

Contact: Dolly Grewal

Accessible location: Yes

When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.

Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.

Abstract

Video tracking is the process of automatically identifying, localizing, and continuously monitoring objects of interest across consecutive video frames. It lies at the core of many cutting-edge vision applications, such as surveillance systems, autonomous vehicles, augmented reality, robotics, and human-computer interaction. However, relying solely on visible (RGB) imagery introduces significant challenges, including poor visibility, low illumination, occlusion, and appearance variations. To overcome these challenges, the fusion of RGB with thermal infrared (TIR) data has been explored to leverage complementary information from each modality and improve tracking performance under challenging conditions.

Many existing RGB-Thermal (RGB-T) trackers use deep learning (DL) methods to obtain strong object feature representations. Despite their superior performance, however, these tracking methods mostly rely on dual-branch architectures, complex fusion modules, or external teacher-student frameworks, leading to increased model size and significant training overhead. To address this problem, this thesis proposes unified RGB-T tracking schemes that enhance conventional RGB trackers without altering their network architectures or significantly increasing their computational complexity.

In the first part of the thesis, a novel pixel-level fusion network, symmetric bidirectional dynamic fusion (SBiDF), is introduced. SBiDF enhances RGB inputs by dynamically integrating TIR data at the pixel level prior to tracking, utilizing modality-specific autoencoders, dynamic convolutional filtering (DCF) blocks, and an output fusion module. The DCF blocks perform adaptive, bidirectional, content-aware enhancement, enabling balanced cross-modal refinement. Importantly, SBiDF generalizes effectively beyond TIR to additional modalities such as depth and event data, providing superior tracking accuracy and broad applicability without modifying the tracker architecture.
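
To illustrate the pixel-level fusion idea, the following PyTorch sketch shows one way a guide modality can predict per-sample depthwise convolution kernels that refine the other modality, applied symmetrically in both directions before a small fusion head emits an enhanced RGB-like image for an unmodified tracker. All module names, layer sizes, and the kernel-prediction scheme here are illustrative assumptions, not the thesis implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConvFilter(nn.Module):
    """Predicts a per-sample depthwise kernel from a guide modality and
    applies it to the target modality (content-aware refinement)."""
    def __init__(self, channels, k=3):
        super().__init__()
        self.k = k
        # One k x k kernel per channel, predicted from the guide features.
        self.kernel_net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels * k * k, kernel_size=1),
        )

    def forward(self, target, guide):
        b, c, h, w = target.shape
        kernels = self.kernel_net(guide).view(b * c, 1, self.k, self.k)
        # Grouped conv applies each sample's predicted kernel depthwise.
        out = F.conv2d(target.reshape(1, b * c, h, w), kernels,
                       padding=self.k // 2, groups=b * c)
        return out.view(b, c, h, w) + target  # residual refinement

class BidirectionalFusion(nn.Module):
    """Symmetric cross-modal refinement: TIR guides RGB and RGB guides TIR,
    then a small head fuses both streams into an enhanced RGB-like image."""
    def __init__(self, channels=16):
        super().__init__()
        self.enc_rgb = nn.Conv2d(3, channels, 3, padding=1)  # stand-in encoders
        self.enc_tir = nn.Conv2d(1, channels, 3, padding=1)
        self.rgb_from_tir = DynamicConvFilter(channels)
        self.tir_from_rgb = DynamicConvFilter(channels)
        self.fuse = nn.Conv2d(2 * channels, 3, 3, padding=1)

    def forward(self, rgb, tir):
        f_rgb, f_tir = self.enc_rgb(rgb), self.enc_tir(tir)
        f_rgb = self.rgb_from_tir(f_rgb, f_tir)  # TIR-guided RGB refinement
        f_tir = self.tir_from_rgb(f_tir, f_rgb)  # RGB-guided TIR refinement
        return self.fuse(torch.cat([f_rgb, f_tir], dim=1))

rgb = torch.rand(2, 3, 64, 64)  # batch of RGB frames
tir = torch.rand(2, 1, 64, 64)  # spatially aligned thermal frames
fused = BidirectionalFusion()(rgb, tir)
print(fused.shape)  # torch.Size([2, 3, 64, 64]): input to an unmodified RGB tracker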

The second part of the thesis presents a novel learning-based framework, multi-level self-distillation (MSD), which adapts a single-stream RGB tracker to the RGB-T setting through advanced training strategies rather than architectural changes. MSD integrates RGB and TIR data via a shared backbone guided by self-supervised contrastive and modality-gap alignment losses, alongside supervised focal and modality-specific losses.
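
To make this kind of multi-loss training concrete, the sketch below combines a supervised focal loss with a contrastive (InfoNCE-style) alignment loss and a simple modality-gap penalty over features from a shared backbone. The loss weights, the InfoNCE temperature, the toy backbone, and the replication of the single thermal channel to three channels are assumptions for illustration only, not the framework described in the thesis.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Sigmoid focal loss: down-weights easy examples relative to hard ones."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    pt = p * targets + (1 - p) * (1 - targets)  # probability of the true class
    return ((1 - pt) ** gamma * ce).mean()

def info_nce(feat_a, feat_b, temperature=0.07):
    """Contrastive alignment: matching RGB/TIR pairs are positives,
    all other pairs in the batch serve as negatives."""
    a = F.normalize(feat_a, dim=1)
    b = F.normalize(feat_b, dim=1)
    logits = a @ b.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(a.size(0))      # diagonal entries are positives
    return F.cross_entropy(logits, targets)

def modality_gap_loss(feat_rgb, feat_tir):
    """Penalizes the distance between the two modality centroids."""
    return (feat_rgb.mean(dim=0) - feat_tir.mean(dim=0)).pow(2).sum()

def training_step(backbone, head, rgb, tir, labels, w=(1.0, 0.5, 0.1)):
    f_rgb = backbone(rgb)   # the same shared weights process both modalities
    f_tir = backbone(tir)
    supervised = focal_loss(head(f_rgb), labels)
    align = info_nce(f_rgb, f_tir)
    gap = modality_gap_loss(f_rgb, f_tir)
    return w[0] * supervised + w[1] * align + w[2] * gap

# Toy usage with a stand-in backbone; thermal is replicated to 3 channels.
backbone = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
head = torch.nn.Linear(128, 1)
rgb = torch.rand(8, 3, 32, 32)
tir = torch.rand(8, 1, 32, 32).repeat(1, 3, 1, 1)
labels = torch.randint(0, 2, (8, 1)).float()
training_step(backbone, head, rgb, tir, labels).backward()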

Extensive evaluations demonstrate that SBiDF and MSD outperform state-of-the-art tracking methods, offering robust accuracy, simpler implementation, and greater computational efficiency, which makes them highly practical for real-world applications.

