When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the candidate’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once the thesis is accepted, the candidate presents it orally. This oral exam is open to the public.
Abstract
Visual processing is a core aspect of modern computer vision, supporting applications such as object recognition, tracking, and 3D reconstruction. However, challenges like occlusions, appearance variations, and inconsistencies between RGB and depth data continue to hinder reliable scene interpretation. Traditional feature descriptors often lack adaptability in complex environments, and depth maps acquired from low-cost sensors typically exhibit low resolution, noise, and missing values. These limitations obstruct the accurate extraction of spatial and contextual information required for robust scene analysis. To address these issues, the overall objective of this thesis is to advance visual processing through adaptive, feature-driven techniques that leverage the spatial and contextual richness of color and depth data, thereby contributing to more reliable interpretation of complex visual scenes. Aligned with this objective, the thesis is structured around two main parts aimed at achieving more accurate and context-aware scene understanding.
The first part of the thesis focuses on the design of two advanced feature descriptors and an object tracking scheme. The first descriptor, r-spatiogram, captures spatial, color, and texture information within image regions to provide a detail-preserving, context-aware representation. The second descriptor, adaptive multi-scale (AMS), improves adaptability by employing strategies suited to diverse visual environments. This part also introduces a novel object tracking framework that uses color and depth information to address common challenges such as occlusions and target-background similarity. The framework is first evaluated on its own and is then extended with each proposed descriptor in turn, to assess the descriptor's impact on tracking accuracy and robustness in dynamic environments.
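As background for the first descriptor, the sketch below computes a conventional second-order spatiogram: an intensity histogram in which each bin also stores the mean and covariance of the pixel positions that fall into it. This is not the r-spatiogram proposed in the thesis, whose details are not given in the abstract. The function name `spatiogram`, the grayscale input in [0, 1], and the bin count are assumptions made for illustration only.

```python
import numpy as np

def spatiogram(gray, bins=8):
    """Conventional second-order spatiogram of a 2D array with values in [0, 1].

    Returns (hist, means, covs): normalized bin counts, per-bin mean (row, col)
    position, and per-bin 2x2 position covariance. Illustrative only; this is
    not the thesis's r-spatiogram.
    """
    h, w = gray.shape
    # Quantize intensities into bin indices; a value of exactly 1.0 goes to the last bin.
    idx = np.minimum((gray * bins).astype(int), bins - 1).ravel()
    rows, cols = np.mgrid[0:h, 0:w]
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)

    counts = np.bincount(idx, minlength=bins).astype(float)
    means = np.zeros((bins, 2))
    covs = np.zeros((bins, 2, 2))
    for b in range(bins):
        pts = coords[idx == b]
        if len(pts) == 0:
            continue  # empty bin: leave mean and covariance at zero
        means[b] = pts.mean(axis=0)
        centered = pts - means[b]
        covs[b] = centered.T @ centered / len(pts)
    return counts / counts.sum(), means, covs

# Usage: a 4x4 image whose left half is dark and right half is bright.
img = np.zeros((4, 4))
img[:, 2:] = 0.9
hist, means, covs = spatiogram(img, bins=2)
```

Unlike a plain histogram, the per-bin means and covariances distinguish two regions that share the same intensity distribution but arrange it differently in space.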
The second part of the thesis presents a novel depth upsampling scheme that improves the quality of low-resolution depth maps. It employs a joint local–nonlocal framework guided by an adaptive bandwidth mechanism that dynamically adjusts the influence of neighboring pixels. A distance-based patch similarity map is introduced to support this adaptation. Two similarity strategies are explored: one uses a standard metric, and the other incorporates the AMS descriptor to capture more complex structural relationships.
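To illustrate the general idea of letting a bandwidth control how much each neighbor contributes, the sketch below implements a joint-bilateral-style depth upsampler in which the range bandwidth shrinks where the guide image varies strongly. It covers only a local filter and is not the thesis's joint local–nonlocal scheme or its patch similarity map. The function name `upsample_depth`, the parameters, and the bandwidth heuristic are all assumptions chosen for illustration.

```python
import numpy as np

def upsample_depth(depth_lr, guide, scale, radius=2, sigma_s=1.0, base_sigma_r=0.1):
    """Joint-bilateral-style upsampling with a heuristic per-pixel range bandwidth.

    depth_lr: (h, w) low-resolution depth map.
    guide:    (h*scale, w*scale) high-resolution grayscale guide image.
    Illustrative sketch only; the bandwidth rule below is an assumption.
    """
    H, W = guide.shape
    h, w = depth_lr.shape
    guide_lr = guide[::scale, ::scale][:h, :w]  # guide sampled on the low-res grid
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            cy, cx = y / scale, x / scale       # position in low-res coordinates
            y0, x0 = int(round(cy)), int(round(cx))
            ys = np.arange(max(y0 - radius, 0), min(y0 + radius + 1, h))
            xs = np.arange(max(x0 - radius, 0), min(x0 + radius + 1, w))
            yy, xx = np.meshgrid(ys, xs, indexing="ij")
            diff = guide_lr[yy, xx] - guide[y, x]
            # Assumed heuristic: narrow the bandwidth where the guide neighborhood
            # differs from the center (likely an edge), keep it wide in flat areas.
            sigma_r = base_sigma_r / (1.0 + np.mean(np.abs(diff)) / base_sigma_r)
            w_s = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma_s ** 2))
            w_r = np.exp(-(diff ** 2) / (2 * sigma_r ** 2))
            wgt = w_s * w_r + 1e-12             # small floor avoids division by zero
            out[y, x] = np.sum(wgt * depth_lr[yy, xx]) / np.sum(wgt)
    return out

# Usage: a depth step edge that coincides with an intensity edge in the guide.
depth_lr = np.ones((4, 4))
depth_lr[:, 2:] = 2.0
guide = np.kron(depth_lr, np.ones((2, 2)))
depth_hr = upsample_depth(depth_lr, guide, scale=2)
```

Because every output value is a weighted average of nearby low-resolution samples, the result stays within the input depth range, and the narrowed bandwidth near the guide edge keeps the two depth levels from blending.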
Extensive experiments are conducted on multiple benchmark datasets to validate the effectiveness of the proposed schemes.