Date & time
10 a.m. – 1 p.m.
This event is free
School of Graduate Studies
Faubourg Ste-Catherine Building
1610 Ste-Catherine St. W.
Room 5-345
When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.
Automatic Speech Recognition (ASR) refers to the process by which digital devices convert spoken utterances into text. Once limited to dictation software or call centers, ASR is now integrated into everyday tools such as smartphones, laptops, and smart speakers. For second language learners, this technology can provide immediate transcriptions that permit real-time monitoring, repetition, and correction, offering an interactive approach to pronunciation practice. In language education, ASR has been embraced for its potential to support learning and assessment, particularly of pronunciation, where intelligibility is closely linked to communicative competence and professional opportunity. Its availability on widely used platforms such as Google, Microsoft, and Apple makes it a promising resource that aligns with broader educational shifts toward autonomy, technology integration, and access to feedback beyond the classroom. However, concerns remain regarding its accuracy, validity, and potential biases. Considering these issues, this dissertation investigated the potential of dictation-based ASR for valid pronunciation assessment through two empirical studies.
In Manuscript A, the study examined Apple Siri as a potential tool for pronunciation assessment, extending prior work on Google Voice Typing and Microsoft Transcribe. Fifty-six adult English learners at a Canadian university completed a five-sentence read-aloud oral test designed to target increasing pronunciation difficulty. Recordings were scored both by experienced human raters and by Siri, using a rubric covering comprehensibility, segmental accuracy, connected speech, stress, and rhythm. Siri's output was analyzed for transcription accuracy and compared with human ratings. Results demonstrated strong correlations between Siri and human raters on measures of intelligibility. These findings suggest that Siri, like other dictation-based ASR tools, can produce valid and cost-effective results in formative assessment contexts. The study provides evidence that off-the-shelf ASR systems can help reduce rater bias, lower costs, and expand access to pronunciation feedback.
In Manuscript B, the focus shifted to validity by investigating potential age-related bias in dictation-based ASR. A corpus of test responses from 1,000 university learners was analyzed, spanning five first-language backgrounds and three age groups. Each recording was processed through Google Voice Typing, Microsoft Transcribe, and Siri to generate word accuracy scores, which were then compared across age groups using regression analyses. Results indicated that all three ASR systems systematically favored younger test takers, a bias that reached statistical significance. These findings highlight an underexplored limitation of dictation-based ASR in assessment: while the technology can make large-scale testing efficient, it may also introduce validity concerns that disproportionately affect older learners.
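The word accuracy measure described above can be illustrated as a token-level alignment between a reference sentence and an ASR transcription. The sketch below is only a minimal illustration of this kind of metric; the dissertation does not specify its exact scoring procedure, and the function name and alignment method here are assumptions for demonstration.

```python
from difflib import SequenceMatcher

def word_accuracy(reference: str, hypothesis: str) -> float:
    """Illustrative word-accuracy score: the proportion of reference
    words that align with words in the ASR transcription.
    NOTE: a hypothetical helper, not the dissertation's actual method."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0
    # Align the two word sequences and count matched words.
    matcher = SequenceMatcher(None, ref, hyp)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)

# Example: one reference sentence vs. an imperfect ASR transcription.
score = word_accuracy("the cat sat on the mat", "the cat sat on a mat")
print(round(score, 2))  # 5 of 6 reference words matched
```

Scores like this, computed per recording, could then serve as the dependent variable in a regression with age group and first-language background as predictors.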
Together, these studies highlight both the promise and the limitations of dictation-based ASR in L2 pronunciation assessment. On the one hand, tools such as Siri demonstrate potential for generating valid, accessible feedback that aligns with human ratings, making them useful in educational contexts. On the other hand, evidence of age-related bias across systems raises questions about validity in high-stakes contexts. The findings contribute to ongoing debates about the reliability and validity of automated assessment tools, emphasizing the need for critical validation before their large-scale adoption. This dissertation ultimately suggests that while dictation-based ASR can support broader access to pronunciation feedback, responsible integration requires systematic evaluation of accuracy, validity, and bias.
© Concordia University