Thesis defences

PhD Oral Exam - Hussein Abdallah, Computer Science

AI-Enabled KG Platform: Incorporating LLMs and GNNs into KG Engines


Date & time
Tuesday, November 25, 2025
9 a.m. – 12 p.m.
Cost

This event is free

Organization

School of Graduate Studies

Contact

Dolly Grewal

Where

ER Building
2155 Guy St.
Room 1072

Accessible location

Yes

When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.

Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.

Abstract

Heterogeneous graphs (HGs) are complex networks with diverse node and edge types, capturing intricate interconnections. Knowledge graphs (KGs) are a specialized type of HG that stores real-world entities and their relationships as triples. Major companies, such as Meta, Google, and LinkedIn, use KGs for multi-domain knowledge management and information retrieval. Heterogeneous graph neural networks (HGNNs) have emerged as an effective method for learning the structural semantics within graphs and generating graph-level or entity-level embeddings as features for downstream ML tasks such as node classification and link prediction. GNN embeddings can also support knowledge graph question answering (KGQA) by integrating structural embeddings with the reasoning capabilities of large language models (LLMs) through a Retrieval-Augmented Generation (RAG) pipeline. In this setting, the LLM is augmented with relevant knowledge graph information to address unseen questions, thereby enhancing reasoning, reducing hallucinations, and eliminating the need for costly domain-specific fine-tuning.
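The graph-RAG idea described above can be caricatured in a few lines: retrieve the triples surrounding the question's entities and prepend them as facts to the LLM prompt. The sketch below is a simplified illustration under its own assumptions (toy KG, naive entity-match retriever); the names are hypothetical and do not reflect KGNET's actual interface.

```python
# Minimal graph-RAG sketch: a KG as (subject, predicate, object) triples,
# a naive retriever that keeps triples mentioning the question's entities,
# and a prompt builder that grounds the LLM in the retrieved facts.
# All names here are illustrative, not KGNET's actual API.

KG = [
    ("Concordia_University", "locatedIn", "Montreal"),
    ("Montreal", "locatedIn", "Canada"),
    ("Concordia_University", "founded", "1974"),
]

def retrieve(question_entities, kg):
    """Keep triples whose subject or object matches a question entity
    (a crude one-hop neighborhood retrieval)."""
    entities = set(question_entities)
    return [(s, p, o) for (s, p, o) in kg
            if s in entities or o in entities]

def build_prompt(question, triples):
    """Serialize retrieved triples as facts ahead of the question."""
    facts = "\n".join(f"{s} {p} {o}." for s, p, o in triples)
    return f"Facts:\n{facts}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt("Where is Concordia University located?",
                      retrieve({"Concordia_University"}, KG))
```

In a real pipeline the retriever would be driven by GNN embeddings rather than string matching, and the prompt would be sent to an LLM; the point here is only the shape of the retrieve-then-prompt loop.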

Training GNN models on top of existing KG engines for large-scale KGs remains a challenging task. Firstly, KG data migration is inefficient. Existing graph ML libraries, such as DGL and PyG, do not support training HGNN models directly on top of existing KG engines. They require exporting the entire KG, which is impractical for large-scale KGs and fails to capture frequent KG updates. They also require deep graph ML expertise to write complex data transformation, model training, and model inference pipelines. Secondly, existing HGNN training methods are task-agnostic, as they are trained on the entire KG, including task-irrelevant data. This adds noise, increases training time and memory usage, and results in complex multi-layered GNN models with costly inference times. Moreover, task-agnostic training does not guarantee optimal performance for specific tasks. Thirdly, existing KGQA methods are not designed for question answering over incomplete KGs, which is the common scenario in production. Existing KGQA methods combined with graph-RAG pipelines struggle when the answer to a question is missing from the KG, and they incur high retrieval and prompting costs on large-scale KGs.

My thesis addresses these challenges through KGNET, an AI-enabled KG platform designed to overcome the scalability limitations of HGNN models on large-scale KGs. KGNET tackles three key challenges: (1) the seamless integration of HGNNs, LLMs, and KG engines, through a novel system design that extends the SPARQL language to support graph ML tasks and thus automates the graph ML training and inference pipelines directly on KG engines, requiring only lightweight task-relevant data extraction and basic graph ML knowledge; (2) KGTOSA, a task-oriented sampling method that enables scalable and efficient HGNN training on large-scale knowledge graphs, achieving high performance in node classification and link prediction tasks without compromising accuracy; and (3) a graph-RAG pipeline incorporating HGNNs that enables effective KGQA over incomplete KGs.
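The task-oriented sampling idea behind point (2) can be sketched as follows: rather than training on the whole KG, extract only the k-hop neighborhood of the task's target nodes and train on that subgraph. This is a simplified pure-Python illustration under my own assumptions, not the KGTOSA algorithm itself.

```python
from collections import defaultdict

def khop_task_subgraph(triples, target_nodes, k=2):
    """Sketch of task-oriented sampling: keep only triples reachable
    within k hops of the task's target nodes, discarding the rest of
    the KG. Illustrative only, not the thesis's actual method."""
    adj = defaultdict(list)            # node -> incident triples
    for s, p, o in triples:
        adj[s].append((s, p, o))
        adj[o].append((s, p, o))
    frontier, seen, kept = set(target_nodes), set(target_nodes), set()
    for _ in range(k):
        next_frontier = set()
        for node in frontier:
            for (s, p, o) in adj[node]:
                kept.add((s, p, o))
                for n in (s, o):
                    if n not in seen:
                        seen.add(n)
                        next_frontier.add(n)
        frontier = next_frontier
    return kept

# Toy KG: only the neighborhood of the target node survives sampling.
KG = [
    ("paper1", "cites", "paper2"),
    ("paper2", "writtenBy", "author1"),
    ("venue1", "hosts", "paper3"),     # task-irrelevant triple
]
sub = khop_task_subgraph(KG, {"paper1"}, k=2)
```

The payoff is that the HGNN sees a much smaller, task-relevant subgraph, which is the source of the training-time and memory savings the abstract describes.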


© Concordia University