Thesis defences

PhD Oral Exam - Zerui Wang, Electrical and Computer Engineering

Explainable AI Process, Algorithms and Service Architecture for Cloud and Open-Source AI Vision Models


Date & time
Monday, October 6, 2025
1 p.m. – 4 p.m.
Cost

This event is free

Organization

School of Graduate Studies

Contact

Dolly Grewal

Where

Engineering, Computer Science and Visual Arts Integrated Complex
1515 Ste-Catherine St. W.
Room 05.251

Accessible location

Yes

When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.

Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.

Abstract

This thesis develops and validates a framework for Explainable Artificial Intelligence (XAI) to produce interpretable, robust, and reproducible AI services. The research addresses key limitations in the field, including the architectural fragmentation of XAI tools, the opacity of proprietary cloud AI services, the vulnerability of explanations to adversarial attacks, and the specific challenges of spatiotemporal data analysis.

The contributions of this research are as follows: First, a methodological foundation is established through an analysis of the XAI landscape, resulting in a taxonomy of explanation techniques and a formalized XAI process. This work introduces quantitative metrics to evaluate the consistency of explanations across different methods and the stability of explanations from a single method, providing a data-driven assessment protocol.
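The consistency and stability metrics described above could take many concrete forms; the abstract does not specify them, so the following is a minimal sketch under one plausible assumption: consistency as cosine similarity between attribution maps from two methods, and stability as mean pairwise consistency across repeated runs of one method. The function names are illustrative, not the thesis's own.

```python
import numpy as np

def consistency(attr_a, attr_b):
    """Cosine similarity between two attribution maps for the same input
    (1.0 means the two methods agree exactly on feature importance)."""
    a, b = attr_a.ravel(), attr_b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def stability(attrs):
    """Mean pairwise consistency across repeated runs of a single method
    (relevant for sampling-based explainers, whose outputs vary run to run)."""
    scores = [consistency(attrs[i], attrs[j])
              for i in range(len(attrs)) for j in range(i + 1, len(attrs))]
    return float(np.mean(scores))
```

Identical maps score 1.0 on both metrics; divergent maps score lower, giving a simple data-driven assessment protocol.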

Second, a modular, OpenAPI-compliant microservice architecture is designed to implement these principles. This architecture includes: (1) XAIport, a service for integrating XAI into the Machine Learning Operations (MLOps) lifecycle to support early and reproducible explainability; and (2) XAIpipeline, a service that automates and orchestrates multi-step XAI workflows for both cloud and open-source models.

Third, to explain proprietary closed-box models, a cloud-agnostic service is developed. It interacts with commercial cloud AI services via a unified API to deliver model-agnostic feature explanations. This enables systematic auditing by evaluating attributes such as explanation consistency, attribution provenance, and service quality.
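The thesis's unified API is not shown here, but the model-agnostic principle behind it can be illustrated with a standard occlusion-style attribution, which needs only the ability to query a closed-box model for scores. The helper name and signature below are assumptions for illustration.

```python
import numpy as np

def occlusion_attribution(predict, x, baseline=0.0):
    """Model-agnostic attribution for a closed-box classifier: the drop in
    the predicted class score when each feature is replaced by a baseline.
    `predict` maps a 1-D feature vector to a scalar score and can be any
    opaque service endpoint, so no access to model internals is required."""
    base_score = predict(x)
    attributions = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        x_occ = x.copy()
        x_occ[i] = baseline          # remove one feature's information
        attributions[i] = base_score - predict(x_occ)
    return attributions
```

Because the method treats `predict` as a black box, the same auditing loop can be pointed at different commercial cloud AI services and the resulting attributions compared for consistency.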

For open-source models, an assessment framework is created to systematically evaluate five quality attributes: computational cost, predictive performance, adversarial robustness, explanation deviation, and explanation resilience. This analysis demonstrates the vulnerability of XAI explanations themselves to adversarial perturbations when applied to a range of computer vision and tabular models.
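Of the five attributes above, explanation deviation under adversarial perturbation is the most easily sketched. The metric below is one plausible formulation (normalized L2 distance between clean and perturbed explanations), assumed for illustration rather than taken from the thesis.

```python
import numpy as np

def explanation_deviation(explain, x, x_adv):
    """Normalized L2 distance between the explanation of a clean input and
    that of its adversarially perturbed counterpart. 0 means the explanation
    is unchanged; large values indicate a fragile explanation."""
    e, e_adv = explain(x).ravel(), explain(x_adv).ravel()
    return float(np.linalg.norm(e - e_adv) / (np.linalg.norm(e) + 1e-12))
```

Evaluating this alongside the prediction's own robustness separates two failure modes: a model whose predictions survive an attack may still produce explanations that do not.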

For video analysis, this thesis introduces STAA (Spatio-Temporal Attention Attribution), an XAI method for video transformer models. It directly extracts and aggregates attention weights from internal model layers in a single forward pass to generate joint spatio-temporal explanations, and is demonstrated to outperform existing methods in both faithfulness and monotonicity while reducing explanation latency.

Additionally, this work develops a novel adversarial attack framework that explicitly targets the spatio-temporal attention mechanisms unique to video transformers. Unlike traditional frame-wise perturbations that treat videos as independent image sequences, the method jointly perturbs spatial and temporal features by exploiting self-attention vulnerabilities in transformer architectures. The attack achieves a high Attack Success Rate (ASR) on the Kinetics-400 dataset, significantly outperforming existing approaches, including frame-wise attacks, sparse attacks, and the V-BAD method. The attack's impact is quantified using spatial similarity metrics and temporal coherence degradation measures, revealing systematic disruption of attention patterns across both dimensions.
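The attention-aggregation step at the heart of STAA can be sketched in a toy form. The exact aggregation used in the thesis is not specified here; the version below assumes a common convention (average class-token attention over heads and layers) and illustrates only why a single forward pass suffices: the attention weights are already computed during inference.

```python
import numpy as np

def aggregate_attention(attn_layers):
    """Toy attention-based spatio-temporal attribution. Each element of
    `attn_layers` is an attention array of shape (heads, tokens, tokens),
    where token 0 is the class token and tokens 1.. are space-time patches.
    Returns one saliency score per patch, normalized to sum to 1."""
    # Average over heads, then keep the class token's attention to patches.
    cls_maps = [layer.mean(axis=0)[0, 1:] for layer in attn_layers]
    saliency = np.mean(cls_maps, axis=0)   # average across layers
    return saliency / saliency.sum()
```

Because the weights are harvested from a pass the model performs anyway, no repeated perturbation or gradient backpropagation is needed, which is consistent with the reduced explanation latency reported above.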

In summary, the contributions form an integrated framework of methodologies and deployable tools for building, explaining, and validating AI systems, with particular emphasis on understanding and securing video transformer models against adversarial threats.


© Concordia University