Date & time
10 a.m. – 1 p.m.
This event is free
School of Graduate Studies
Engineering, Computer Science and Visual Arts Integrated Complex
1515 Ste-Catherine St. W.
Room 2.184
When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the candidate's own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once the thesis is accepted, the candidate presents it orally. This oral exam is open to the public.
Cloud computing enables on-demand access to a shared pool of configurable computing resources, including Graphics Processing Units (GPUs), which are essential for accelerating compute-intensive workloads such as artificial intelligence (AI), machine learning (ML), and microservice-based applications. As GPU adoption grows in modern cloud environments, the diversity of workloads, heterogeneous resource requirements, and strict isolation demands make efficient GPU resource allocation a critical challenge. Inefficient scheduling and static allocation policies often result in GPU underutilization, performance interference, prolonged task completion times, and degraded Quality of Service (QoS).
Modern cloud workloads require dynamic and fine-grained GPU resource management to satisfy fairness, performance isolation, and latency constraints. In a multi-tenant cloud environment, GPU allocation mechanisms must enforce fairness and strong isolation to prevent interference across workloads while maintaining high utilization. Moreover, cloud-native applications, such as microservices, consist of loosely coupled, interdependent components that exhibit diverse GPU demands and dynamic execution behaviors. Inefficient resource sharing in such applications can lead to performance bottlenecks and increased communication and data-transfer overhead. Furthermore, real-time cloud services, particularly latency-sensitive AI inference workloads, require priority-based GPU allocation and scheduling to meet deadline requirements. Together, these requirements frame GPU resource allocation as a multi-dimensional challenge: at the cluster level, the problem concerns ensuring fairness and isolation among tenants; at the application level, the focus is on maximizing efficiency for cloud-native applications; and at the runtime level, the challenge lies in executing tasks with priority awareness under real-time latency constraints.
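To make the fairness requirement concrete, the sketch below shows one standard formulation, max-min fairness, applied to dividing a fixed pool of GPU capacity among tenants. The tenant names and demand figures are hypothetical, and the sketch illustrates the general concept only; it is not the allocation mechanism proposed in the thesis.

```python
# Illustrative max-min fair division of GPU capacity among tenants.
# Tenant names and demand values are hypothetical; this is not the
# allocation mechanism proposed in the thesis.

def max_min_fair_allocation(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Split `capacity` GPU units across tenants so that no tenant with
    unmet demand receives less than any other (max-min fairness)."""
    allocation = {tenant: 0.0 for tenant in demands}
    unsatisfied = dict(demands)
    remaining = capacity
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)              # equal split of what is left
        for tenant, demand in list(unsatisfied.items()):
            grant = min(share, demand - allocation[tenant])
            allocation[tenant] += grant
            remaining -= grant
            if allocation[tenant] >= demand - 1e-9:       # demand fully met
                del unsatisfied[tenant]
    return allocation

if __name__ == "__main__":
    # e.g. 8 GPUs shared by three tenants with uneven demands
    print(max_min_fair_allocation(8.0, {"tenant-a": 2.0, "tenant-b": 5.0, "tenant-c": 4.0}))
```

In this toy run, tenant-a's small demand is fully satisfied and the surplus is split evenly between the two larger tenants, which is the isolation property the abstract alludes to: one tenant's demand cannot starve the others.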
This thesis addresses these challenges with three key contributions for multi-tenant cloud environments, where efficient, fair, and latency-aware GPU allocation is critical. First, we propose a fairness-driven GPU allocation mechanism that enforces strong isolation among tenants while maximizing GPU utilization in shared cloud infrastructures. Second, we introduce a dynamic GPU resource allocation framework designed for microservice-based applications. This framework adapts to workload variations and inter-component dependencies to improve throughput and reduce end-to-end latency. Third, we present a priority-based GPU scheduling strategy that supports task preemption and resumption, enabling the timely execution of real-time workloads while preserving fairness.
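As a rough illustration of the priority-based scheduling with preemption and resumption mentioned in the third contribution, the Python sketch below runs tasks on a single GPU one time slice at a time and always resumes the highest-priority unfinished task, so an urgent arrival displaces a running batch job, which later continues from where it stopped. The task names, priorities, and one-slice granularity are assumptions for illustration, not the scheduler described in the thesis.

```python
import heapq
from dataclasses import dataclass, field

# Minimal sketch of priority-based GPU scheduling with preemption and
# resumption on a single device. Names, priorities, and the one-slice
# preemption rule are illustrative assumptions.

@dataclass(order=True)
class Task:
    priority: int                           # lower value = more urgent
    name: str = field(compare=False)
    remaining: int = field(compare=False)   # remaining work in time slices

def run(arrivals: dict[int, list[Task]]) -> None:
    ready: list[Task] = []
    clock = 0
    horizon = max(arrivals) + sum(t.remaining for ts in arrivals.values() for t in ts)
    while clock <= horizon:
        for task in arrivals.get(clock, []):        # newly arrived tasks join the ready queue
            heapq.heappush(ready, task)
        if ready:
            task = heapq.heappop(ready)             # always run the highest-priority task
            task.remaining -= 1                     # execute one time slice
            if task.remaining > 0:
                heapq.heappush(ready, task)         # preempted or time-sliced: progress kept for resumption
            else:
                print(f"t={clock + 1}: {task.name} finished")
        clock += 1

if __name__ == "__main__":
    run({0: [Task(2, "batch-training", 4)],
         1: [Task(0, "realtime-inference", 2)]})    # urgent job arrives at t=1 and preempts training
```

Under this toy policy the real-time task finishes first despite arriving later, while the batch job resumes afterwards with its earlier progress intact, which is the behavior the third contribution targets at cluster scale.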