Thesis defences

PhD Oral Exam - Mossad Helali, Computer Science

Automation of Data Science Workflows via Knowledge Graphs and Table Representation Learning

Date & time

Thursday, November 27, 2025
10 a.m. – 1 p.m.

Cost

This event is free

Organization

School of Graduate Studies

Contact

Dolly Grewal

Where

ER Building
2155 Guy St.
Room 1072

Accessible location

Yes

When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge on the thesis subject, together with the candidate’s own contributions to it. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.

Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.

Abstract

The rapid growth of open and collaborative data science platforms has led to large, disconnected collections of artifacts, namely datasets and code pipelines. This separation makes it difficult to reuse knowledge and automate complex data science workflows. This thesis shows that combining the semantic representations of Knowledge Graphs (KGs) and Table Representation Learning (TRL) into a unified foundational layer overcomes these limitations.

The presented research demonstrates the versatility of this semantic layer through three distinct paradigms, each addressing a critical stage of the data science lifecycle. First, data discovery is formulated as a scalable, direct query process against the KG, enabling expressive, semantic searches. Second, Automated Machine Learning (AutoML) is approached as a meta-learning task in which Graph Neural Networks (GNNs) learn from the collective experience encoded in the graph's structure to recommend optimal pipelines. Third, Automated Exploratory Data Analysis (AutoEDA) is realized via a Retrieval-Augmented Generation (RAG) framework that grounds Large Language Models (LLMs) in factual, verifiable context from the KG.

Through extensive evaluations on standard benchmarks, the systems developed in this research show significant improvements in accuracy, scalability, and cost-effectiveness over state-of-the-art methods. Taken together, this work presents a new architectural pattern for data science automation and shows that a unified semantic representation is a practical foundation for building the next generation of intelligent and collaborative data science tools.