When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge in the thesis subject, together with the student's own contributions to that subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.
Abstract
The use of automatic speech recognition (ASR) to score pronunciation placement tests offers language institutions an efficient alternative to human raters, addressing common challenges such as testing reliability and high labour costs. However, the customizable ASR systems used to create scoring models are expensive, putting them out of reach for most institutions. As an alternative, this dissertation explored the feasibility of using transcripts from Google Voice Typing (GVT), a free and readily available dictation tool, to provide automated scores for pronunciation assessments. Via two empirical studies, it addressed the following overarching research question: "What are the affordances offered by dictation ASR in an L2 pronunciation assessment context?"
In the first study (Manuscript A), human-rated and GVT-rated scores of 56 pronunciation placement tests were compared and found to be strongly correlated. However, when the sample was divided by proficiency level, correlations for high-proficiency test takers were weak, raising concerns about the reliability of the test scores. To explain this finding, it was hypothesized that some high-proficiency test takers received low GVT scores because of problematic linguistic elements in some test items (e.g., highly infrequent words, unusual collocations). Overall, this study showed that scoring pronunciation tests with GVT is feasible, with the caveat that such reliability issues must be investigated to ensure the validity and reliability of the resulting scores.
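As a rough illustration of this kind of score comparison (not the dissertation's actual analysis), the Python sketch below computes Pearson correlations between hypothetical human-rated and GVT-rated scores, both overall and within a high-proficiency band. The score values and the 4.0 cut-off are invented for the example; scipy is assumed to be available. Note that restricting a subgroup to a narrow score range reduces variance, which can deflate r even when the two sets of scores broadly agree.

```python
# Illustrative only: hypothetical score data, not the dissertation's dataset.
from scipy.stats import pearsonr

human_scores = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 2.5, 4.8]
gvt_scores   = [2.8, 4.4, 2.2, 4.9, 3.6, 3.7, 2.4, 4.6]

# Overall agreement between human and GVT ratings.
r, p = pearsonr(human_scores, gvt_scores)
print(f"Overall: r = {r:.2f}, p = {p:.3f}")

# Same analysis restricted to a hypothetical high-proficiency band (>= 4.0).
# Range restriction shrinks variance and can weaken r in the subgroup.
high = [(h, g) for h, g in zip(human_scores, gvt_scores) if h >= 4.0]
r_high, p_high = pearsonr([h for h, _ in high], [g for _, g in high])
print(f"High-proficiency subgroup: r = {r_high:.2f}, p = {p_high:.3f}")
```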
As a follow-up, the second study (Manuscript B) examined the effect of word frequency, unusual collocations, and phonologically ambiguous items on GVT transcription accuracy, with the aim of supporting the design of valid and reliable pronunciation tests. Four highly intelligible English speakers recorded 60 sentences targeting these three features. The recordings were transcribed by GVT and scored for accuracy, and eight human raters transcribed the same recordings as a measure of intelligibility. For GVT, lower-frequency vocabulary and phonologically ambiguous phrases proved particularly challenging, whereas sentences containing proper nouns (e.g., names) or unusual collocations were almost always transcribed accurately. In contrast, the human raters' transcriptions varied widely and tended to be less accurate than those generated by GVT. These results indicate that certain linguistic features are difficult for both human raters and GVT to transcribe, even when produced by highly intelligible speakers, underscoring the importance of careful task design to avoid features that may compromise transcription accuracy. To ensure valid, reliable scores and fair decision-making, transcription accuracy must be verified through systematic test piloting.
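One plausible way to operationalize "scored for accuracy" is word error rate (WER) between the target sentence and the ASR transcript. The sketch below is a minimal, self-contained illustration of that idea; the example sentences are invented and the dissertation's actual scoring rubric may differ.

```python
# Minimal sketch of word-level transcription accuracy via WER.
# The target and transcript below are hypothetical examples.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over words, normalized by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / max(len(ref), 1)

target = "The weary travellers sought shelter from the storm"
transcript = "The weary travelers thought shelter from the storm"
accuracy = 1.0 - word_error_rate(target, transcript)
print(f"Transcription accuracy: {accuracy:.2%}")  # 75.00% (2 of 8 words off)
```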
The findings of this dissertation show that GVT can be used to develop cost-effective scoring models for pronunciation placement tests, while also highlighting the need for careful task design to avoid language that may compromise transcription accuracy and unfairly penalize test takers. The results also carry practical implications for classroom-based pronunciation assessment.