When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.
Abstract
Continual Learning (CL) aims to enable models to learn from a sequence of tasks without forgetting previously acquired knowledge, an ability that is essential in real-world scenarios where data and system requirements evolve over time. Traditional machine learning models are typically trained once on a fixed dataset; CL offers a more efficient paradigm in which models are updated incrementally as new tasks arrive, avoiding the need to retrain from scratch. However, CL faces significant challenges, most notably catastrophic forgetting, where a model loses performance on earlier tasks as it adapts to new ones. This thesis proposes to advance CL by addressing these core challenges and by developing methods for highly constrained scenarios where access to past data is limited or unavailable.
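As a toy illustration of the forgetting problem (not a method from the thesis), the sketch below trains a single linear classifier on two synthetic tasks in sequence with no continual-learning mechanism; accuracy on the first task drops sharply once the second is learned. All data and names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(angle):
    # Linearly separable binary task; the boundary normal is rotated by `angle`.
    w_true = np.array([np.cos(angle), np.sin(angle)])
    X = rng.normal(size=(500, 2))
    y = (X @ w_true > 0).astype(float)
    return X, y

def train(w, X, y, lr=0.5, steps=200):
    # Plain logistic-regression gradient descent; no CL mechanism at all.
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return float((((X @ w) > 0).astype(float) == y).mean())

task_a, task_b = make_task(0.0), make_task(1.2)  # two dissimilar tasks
w = train(np.zeros(2), *task_a)
print("task A, after training on A:", accuracy(w, *task_a))  # near 1.0
w = train(w, *task_b)  # naive sequential update on task B
print("task B, after training on B:", accuracy(w, *task_b))  # near 1.0
print("task A, after training on B:", accuracy(w, *task_a))  # drops: forgetting
```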
The first research focus is on understanding catastrophic forgetting, with a particular emphasis on how neural representations evolve as new tasks are introduced. This work investigates the reliability of current metrics, such as Centered Kernel Alignment (CKA), in tracking representation changes and proposes novel methods for measuring forgetting more accurately.
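For reference, linear CKA compares two matrices of activations produced by the same inputs, for example a layer's representations before and after learning a new task. A minimal numpy sketch of the standard linear-CKA formula follows; the variable names and synthetic data are illustrative, not the thesis's experimental setup.

```python
import numpy as np

def linear_cka(X, Y):
    # Linear Centered Kernel Alignment between two representation matrices.
    # X, Y: (n_examples, n_features) activations of the same inputs at two
    # points in training; columns are mean-centered before comparison.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

# Example: measure representation drift on synthetic activations.
rng = np.random.default_rng(0)
acts_before = rng.normal(size=(1000, 64))
acts_after = acts_before + 0.5 * rng.normal(size=(1000, 64))  # drifted copy
print(linear_cka(acts_before, acts_before))  # 1.0: identical representations
print(linear_cka(acts_before, acts_after))   # < 1.0: representation drift
```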
The second area of study involves developing CL methods for restricted scenarios, particularly in cases where replaying past data is not feasible due to privacy concerns or proprietary restrictions. This research introduces Model Breadcrumbs, a method that merges pre-existing fine-tuned models into a multi-task model without requiring access to their original training data.
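The general flavour of such data-free merging can be sketched as follows: each fine-tuned model contributes a task direction (its weights minus the shared pre-trained weights), extreme and near-zero entries of that direction are masked out, and the masked directions are summed back onto the pre-trained weights. The function names, thresholds, and scaling factor below are illustrative assumptions, not the thesis's exact specification.

```python
import numpy as np

def mask_direction(delta, top_frac=0.01, bottom_frac=0.85):
    # Keep mid-magnitude entries of a task direction: drop the largest
    # outliers (top_frac) and the near-zero bulk (bottom_frac).
    mag = np.abs(delta)
    lo = np.quantile(mag, bottom_frac)
    hi = np.quantile(mag, 1.0 - top_frac)
    return delta * ((mag >= lo) & (mag <= hi))

def merge(pretrained, finetuned_models, alpha=0.3):
    # Merge fine-tuned models into one multi-task model, layer by layer,
    # without touching any of their training data.
    merged = {}
    for name, base in pretrained.items():
        directions = [mask_direction(ft[name] - base) for ft in finetuned_models]
        merged[name] = base + alpha * np.sum(directions, axis=0)
    return merged

# Illustrative usage with toy "state dicts" (dicts of weight arrays).
rng = np.random.default_rng(0)
base = {"layer": rng.normal(size=(4, 4))}
ft_a = {"layer": base["layer"] + 0.1 * rng.normal(size=(4, 4))}
ft_b = {"layer": base["layer"] + 0.1 * rng.normal(size=(4, 4))}
multi_task = merge(base, [ft_a, ft_b])
```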
Lastly, this thesis introduces prompt migration as a new challenge in products built on large language models (LLMs). Prompt migration focuses on adapting prompts that work well for one LLM to a different LLM, without re-optimization or access to internal model parameters. Drawing parallels with CL, prompt migration is crucial for maintaining performance as businesses increasingly switch between providers, LLM versions, and architectures. The research explores how CL principles, such as incremental adaptation without retraining, can be applied to the problem of efficient prompt migration.
By addressing these interconnected challenges, this thesis aims to contribute novel methodologies that extend the applicability of CL to real-world constrained scenarios, improving both computational efficiency and adaptability in dynamic environments.