Date & time
1 p.m. – 4 p.m.
This event is free
Engineering, Computer Science and Visual Arts Integrated Complex
1515 St. Catherine W.
Room 003.309
When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.
Data centers (DCs) have become indispensable to modern digital infrastructure, powering critical online services and supporting the exponential growth of data-driven operations worldwide. However, the complexity of DC systems, coupled with their high energy demands and the need for continuous uptime, poses significant challenges for maintenance management. This dissertation presents a comprehensive review of existing Operations and Maintenance (O&M) practices in DCs, alongside the development and implementation of three novel maintenance optimization models tailored for DC environments.
As a first step, the review highlights key research gaps in DC maintenance, chiefly a lack of models specific to DC infrastructure. Despite numerous studies on general O&M management in industrial systems, limited research addresses the unique challenges faced by DCs, such as optimizing energy efficiency and ensuring high availability. The review also emphasizes the importance of integrating advanced reliability and availability analyses into DC maintenance strategies, and it offers future research directions in predictive maintenance, energy management, and system optimization.
Building on the findings of the literature review, this thesis introduces three novel dynamic models designed to optimize maintenance in DCs. The first model proposes a Dynamic Programming (DP) approach for prioritizing maintenance actions based on DC availability requirements and budget constraints. This model formulates a 0-1 knapsack problem to optimize the allocation of maintenance resources, ensuring that the most critical components receive timely maintenance while adhering to strict availability standards. By incorporating Reliability Block Diagrams (RBDs) and Failure Modes, Effects, and Criticality Analysis (FMECA), the model offers a robust solution for maintenance scheduling in hyperscale and cloud-based DCs.
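For readers unfamiliar with the 0-1 knapsack formulation mentioned above, the following is a minimal Python sketch of the general technique, not the thesis's actual model. The function name `select_maintenance`, the integer costs, and the single "benefit" score per action (standing in for a criticality ranking such as one derived from FMECA) are all assumptions made for illustration.

```python
def select_maintenance(actions, budget):
    """Pick maintenance actions maximizing total benefit within a budget.

    actions: list of (name, cost, benefit) tuples; costs are non-negative
             integers in the same unit as `budget` (hypothetical data).
    Returns (best_total_benefit, list_of_chosen_names).
    """
    n = len(actions)
    # best[i][b] = best benefit using the first i actions with budget b
    best = [[0] * (budget + 1) for _ in range(n + 1)]
    for i, (_, cost, benefit) in enumerate(actions, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip action i
            if cost <= b:                # option 2: perform action i
                candidate = best[i - 1][b - cost] + benefit
                if candidate > best[i][b]:
                    best[i][b] = candidate

    # Walk back through the table to recover which actions were chosen.
    chosen = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            name, cost, _ = actions[i - 1]
            chosen.append(name)
            b -= cost
    chosen.reverse()
    return best[n][budget], chosen
```

Each action is either performed or skipped (the "0-1" choice), and the table is filled one action at a time, which is what makes the approach a dynamic program. A real model would also need to encode the availability requirements the abstract refers to; this sketch only handles the budget constraint.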
The second model presents a dynamic, availability-based maintenance cost optimization framework for k-out-of-n systems, focusing on Uninterruptible Power Supply (UPS) components. Using a combination of system reliability analysis, availability metrics, and DP, the model determines the optimal number of operational components (k) while minimizing maintenance costs and ensuring compliance with required availability thresholds. Through simulations and a case study, the model demonstrates significant reductions in both maintenance expenses and system downtime.
The third model introduces an Availability-Constrained Maintenance Cost Optimization Model for Series-Parallel Systems, tailored specifically for complex DC configurations. Formulated as a Mixed-Integer Linear Programming (MILP) problem, this model minimizes total maintenance costs while meeting or exceeding system availability requirements based on Uptime Institute Tier standards. It accounts for fixed monthly budget constraints and mandates the inclusion of at least one component from each condition state for maintenance in every cycle. By modeling DC architecture as a hybrid series-parallel system, the model realistically captures both redundancy and critical dependencies. Key constraints incorporate availability formulas for series and parallel subsystems, condition-state-based component selection, and budget limitations. The model effectively guides decision-makers in selecting cost-efficient maintenance strategies without compromising service-level expectations. Results from simulations demonstrate its robustness and applicability to real-world DC environments.
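The standard availability formulas for series and parallel subsystems can be sketched in Python as follows. This is only an illustration under stated assumptions: components fail independently, each parallel group needs just one working component, and the search over maintenance plans is a brute-force enumeration rather than the MILP formulation described above. The names `system_availability` and `cheapest_plan`, the data layout, and the omission of the "one component per condition state" rule are all simplifications introduced here.

```python
from itertools import combinations
from math import prod


def parallel(avails):
    """Redundant group: down only if every component is down."""
    return 1.0 - prod(1.0 - a for a in avails)


def series(avails):
    """Series chain: up only if every element is up."""
    return prod(avails)


def system_availability(groups):
    """groups: list of parallel groups (lists of availabilities) in series."""
    return series(parallel(g) for g in groups)


def cheapest_plan(components, layout, budget, target):
    """Brute-force the cheapest set of components to maintain.

    components: name -> (availability_now, availability_if_maintained, cost)
    layout:     list of parallel groups, each a list of component names
    Returns (cost, maintained_names, availability), or None if no plan
    within the budget reaches the target availability.
    """
    names = list(components)
    best = None
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            cost = sum(components[n][2] for n in subset)
            if cost > budget:
                continue
            groups = [
                [components[n][1] if n in subset else components[n][0] for n in g]
                for g in layout
            ]
            avail = system_availability(groups)
            if avail >= target and (best is None or cost < best[0]):
                best = (cost, subset, avail)
    return best
```

Enumeration grows exponentially with the number of components, which is one reason a solver-based formulation such as MILP is attractive for realistic DC configurations.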
This research contributes to a deeper understanding of DC maintenance management, providing practical and scalable frameworks for optimizing costs, enhancing system reliability, and supporting the evolving needs of DC operators.
© Concordia University