Abstract
Understanding discourse is essential for advancing computational models from surface-level text processing to deeper language reasoning, as it captures the logical flow of ideas that shapes meaning into coherent text. However, progress in computational discourse analysis is hindered by divergent theoretical frameworks, ambiguity in implicit discourse relations, and a myopic focus on the English language.
This thesis addresses these challenges through three research objectives. First, it proposes an empirical mapping between the two most widely used discourse frameworks, Rhetorical Structure Theory (RST) and the Penn Discourse Treebank (PDTB), covering both explicit and implicit discourse relations. The proposed mapping successfully maps 80.0% of the overlapping annotations between the most prominent corpora following each framework, laying the groundwork for cross-framework interoperability. Second, the thesis introduces a novel multi-task classification model, MTask, for Implicit Discourse Relation Recognition (IDRR). The model captures the ambiguity of implicit relations by jointly learning multi-label representations of their senses. It establishes the first benchmark on multi-label IDRR and is also evaluated on the traditional single-label IDRR task. Third, the thesis extends the multi-label approach beyond English and presents a hierarchical classification model. This model outperforms MTask on English and establishes the first benchmark on multilingual, multi-label IDRR. The thesis further explores prompting strategies with recent large language models and shows that fine-tuning still performs better on this task.
Together, these contributions advance the goal of harmonizing divergence in computational discourse analysis, offering more generalizable and inclusive methods for discourse modeling across frameworks, ambiguity, and languages.