
PhD Oral Exam - Jinli Yao, Information and Systems Engineering

Towards Better Clustering: From Quality Criteria to Advanced Hierarchical Algorithms


Date & time
Monday, August 25, 2025
11 a.m. – 2 p.m.
Cost

This event is free

Organization

School of Graduate Studies

Contact

Dolly Grewal

Accessible location

Yes

When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.

Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.

Abstract

Clustering is a cornerstone of unsupervised learning, offering powerful tools for uncovering patterns and natural groupings in unlabeled data. Despite its extensive applications across diverse fields, clustering research faces persistent challenges, including inconsistent definitions, varied evaluation criteria, and difficulties in handling complex data characteristics. This thesis addresses these challenges by integrating theoretical insights with algorithmic innovations to enhance clustering methodologies and their applicability.

The first part of this work explores the fundamental question, “What defines a good cluster?” Through a systematic review of clustering criteria, principles, and evaluation metrics, it highlights the diversity of clustering algorithms and the challenges posed by high-dimensional, overlapping, and varied-density data. This foundational analysis establishes a structured understanding of clustering quality and its implications for algorithm design.

Building on these principles, the thesis introduces Gauging-δ, a nonparametric hierarchical clustering algorithm capable of handling diverse cluster shapes. Employing an adaptive mergeability function, the algorithm iteratively merges clusters based on local data statistics and environmental factors. Rigorous experiments on synthetic and real-world datasets demonstrate its robustness in identifying well-separated clusters and its sensitivity to feature and distance metric selection.

The thesis further presents Gauging-β, a density-aware hierarchical clustering algorithm addressing challenges in data separation. The proposed algorithm leverages density-based methods to identify and remove border points, effectively separating data sets. Gauging-δ is then applied to the remaining points to generate the main clusters. Finally, the border points are reintegrated into the formed clusters. Experimental results demonstrate that the algorithm is capable of handling both convex and non-convex, as well as well-separated and poorly-separated, data sets. The impact of parameter settings on clustering outcomes is thoroughly investigated. Further experiments on real-world data sets reveal that the consistency of clustering results with classification labels strongly depends on an appropriate measure of sample similarity.
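
The three stages described for Gauging-β (remove border points, cluster the remaining points, reintegrate the border points) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not specify the border criterion or Gauging-δ itself, so the sketch assumes a simple neighbour-count density test and uses single-linkage grouping as a stand-in for Gauging-δ. The names `three_stage_clustering`, `eps`, and `min_pts` are hypothetical.

```python
import math

def neighbors_within(points, i, eps):
    """Indices of points (other than i) within distance eps of points[i]."""
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) <= eps]

def three_stage_clustering(points, eps, min_pts):
    """Border removal -> clustering of the rest -> border reintegration."""
    n = len(points)
    # Stage 1 (assumed criterion): a point with fewer than min_pts
    # neighbours within eps is treated as a low-density border point.
    nbrs = [neighbors_within(points, i, eps) for i in range(n)]
    is_border = [len(nbrs[i]) < min_pts for i in range(n)]

    # Stage 2: cluster the remaining points. Stand-in for Gauging-δ:
    # connected components of the eps-neighbourhood graph (single linkage).
    labels = [None] * n
    next_label = 0
    for start in range(n):
        if is_border[start] or labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in nbrs[i]:
                if not is_border[j] and labels[j] is None:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1

    # Stage 3: reattach each border point to the cluster of its
    # nearest non-border point.
    core = [i for i in range(n) if not is_border[i]]
    for i in range(n):
        if is_border[i] and core:
            nearest = min(core, key=lambda j: math.dist(points[i], points[j]))
            labels[i] = labels[nearest]
    return labels, is_border

# Toy data: two 3x3 grids joined by a sparse two-point bridge.
blob_a = [(x, y) for x in (0, 1, 2) for y in (0, 1, 2)]
blob_b = [(x, y) for x in (6, 7, 8) for y in (0, 1, 2)]
bridge = [(3.4, 1), (4.7, 1)]
points = blob_a + blob_b + bridge

labels, is_border = three_stage_clustering(points, eps=1.5, min_pts=3)
```

In this toy data the two bridge points would chain both grids into one group under plain single linkage at the same `eps`. Setting them aside first keeps the grids apart, and each bridge point then joins the grid nearest to it.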

Together, these three components offer a coherent approach to clustering, from clarifying theoretical concepts of cluster quality to developing algorithms capable of identifying meaningful clusters in various synthetic and complex real-world datasets.


© Concordia University