Master Thesis Defense - May 27, 2019: Variational Approaches for Learning Finite Scaled Dirichlet Mixture Models
Dinh Hieu Nguyen
Monday, May 27, 2019 at 9:00 a.m.
You are invited to attend the following M.A.Sc. (Information Systems Security) thesis examination.
Dr. J. Yan, Chair
Dr. N. Bouguila, Supervisor
Dr. W. Lucia, CIISE Examiner
Dr. B. Lee, External Examiner (BCEE)
With massive amounts of data created on a daily basis, the demand for data analysis is undisputed. Recent developments in technology have made machine learning techniques applicable to a wide range of problems. In particular, we focus on cluster analysis, an important aspect of data analysis. Recent works reporting excellent results on this task using finite mixture models have motivated us to further explore their potential in different applications. The main idea of a mixture model is that the observations are generated from a mixture of components, each with a probability distribution that should be flexible enough to fit many types of data. Indeed, the Dirichlet family of distributions is known to achieve better clustering performance than the Gaussian when the data are clearly non-Gaussian, especially for proportional data.
Thus, we introduce several variational approaches for finite Scaled Dirichlet mixture models. The proposed algorithms are guaranteed to converge while avoiding the computational complexity of conventional Bayesian inference. In summary, our contributions are threefold. First, we propose a variational Bayesian learning framework for finite Scaled Dirichlet mixture models, in which the parameters and complexity of the models are naturally estimated by minimizing the Kullback-Leibler (KL) divergence between the approximate posterior distribution and the true one. Second, we integrate component splitting, a local model selection scheme, into the first model; it gradually splits the components based on their mixing weights to obtain the optimal number of components. Finally, we develop an online variational inference framework for finite Scaled Dirichlet mixture models by employing a stochastic approximation method, improving the scalability of finite mixture models for handling large-scale data in real time. The effectiveness of our models is validated on challenging real-life problems, including object, texture, and scene categorization, as well as text-based and image-based spam email detection.
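For readers unfamiliar with the setting, the two ideas the abstract relies on can be sketched in standard textbook form. The notation is assumed here for illustration and is not taken from the thesis: M is the number of components, \pi_j are the mixing weights, \theta_j are the parameters of component j, \Omega collects all latent variables and parameters, and Q is the approximate posterior.

```latex
% Finite mixture density with M components (generic notation, assumed for illustration)
p(\mathbf{x} \mid \boldsymbol{\pi}, \Theta) = \sum_{j=1}^{M} \pi_j \, p(\mathbf{x} \mid \theta_j),
\qquad \pi_j \ge 0, \quad \sum_{j=1}^{M} \pi_j = 1

% Standard variational decomposition of the log evidence
\ln p(\mathcal{X}) = \mathcal{L}(Q) + \mathrm{KL}\big(Q \,\|\, P\big)

% Lower bound and KL divergence between the approximate and true posteriors
\mathcal{L}(Q) = \int Q(\Omega) \ln \frac{p(\mathcal{X}, \Omega)}{Q(\Omega)} \, d\Omega,
\qquad
\mathrm{KL}\big(Q \,\|\, P\big) = -\int Q(\Omega) \ln \frac{p(\Omega \mid \mathcal{X})}{Q(\Omega)} \, d\Omega
```

Because the KL term is non-negative and \ln p(\mathcal{X}) does not depend on Q, minimizing the KL divergence is equivalent to maximizing the lower bound \mathcal{L}(Q), which is the quantity variational methods typically optimize in practice.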