PhD Oral Exam - Rim Nasfi, Information and Systems Engineering
Modeling Semi-Bounded Support Data using Non-Gaussian Hidden Markov Models with Applications
This event is free
School of Graduate Studies
When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.
With data growing exponentially in every format, and data categorization rapidly becoming one of the most essential components of data analysis, it is crucial to identify hidden patterns in order to extract information that supports accurate and sound decision making. Because data modeling is the first stage in any of these tasks, its accuracy and consistency are critical to the later development of a complete data processing framework. Selecting a distribution that matches the nature of the data is therefore an important research question in its own right. Hidden Markov Models (HMMs) are among the most powerful probabilistic models; although they have been known for decades, they have recently seen a resurgence in machine learning. They are increasingly applied in critical practical settings to model varied and heterogeneous data (image, video, audio, time series, etc.), and they have been the subject of numerous extensions. Equally prevalent, finite mixture models are a powerful tool for modeling heterogeneous data of various kinds.
The overuse of Gaussian mixture models for data modeling in the literature is one of the main motivations for this thesis. This work focuses on modeling positive vectors, which occur naturally in a variety of real-life applications, by proposing novel HMM extensions that use Inverted Dirichlet, Generalized Inverted Dirichlet and Beta-Liouville mixture models as emission probabilities. These extensions are motivated by the proven capacity of these mixtures to handle positive vectors, and they overcome the inability of mixture models alone to account for ordering or temporal constraints in the data. We use these distributions to derive several theoretical approaches for learning and deploying Hidden Markov Models in real-world settings. Further, we study online learning of the parameters and explore the integration of a feature selection methodology. Extensive experiments on highly challenging applications, including image categorization, video categorization, indoor occupancy estimation and natural language processing, reveal scenarios in which such models are appropriate and demonstrate their effectiveness compared to the widely used Gaussian-based models.
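To illustrate what using a non-Gaussian distribution as an HMM emission probability can look like, here is a minimal sketch. It is not the thesis's implementation. It assumes a single Inverted Dirichlet density per hidden state rather than a mixture, and it only evaluates the sequence log-likelihood with the standard forward algorithm; learning, online updates and feature selection are not shown. The function and parameter names (`inverted_dirichlet_logpdf`, `hmm_loglik`, `log_pi`, `log_A`, `alphas`) are hypothetical.

```python
import math


def inverted_dirichlet_logpdf(x, alpha):
    """Log-density of the Inverted Dirichlet at a strictly positive vector x.

    x has D entries and alpha has D + 1 positive entries:
    p(x) = Gamma(sum(alpha)) / prod(Gamma(alpha_d))
           * prod_{d<=D} x_d^(alpha_d - 1) * (1 + sum(x))^(-sum(alpha))
    """
    if len(alpha) != len(x) + 1:
        raise ValueError("alpha must have len(x) + 1 entries")
    if min(x) <= 0 or min(alpha) <= 0:
        raise ValueError("x and alpha must be strictly positive")
    total = sum(alpha)
    log_norm = math.lgamma(total) - sum(math.lgamma(a) for a in alpha)
    # zip stops after the first D entries of alpha, which pair with x.
    log_kernel = sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))
    return log_norm + log_kernel - total * math.log1p(sum(x))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for finite values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_loglik(obs, log_pi, log_A, alphas):
    """Forward algorithm in log space.

    obs    : list of positive vectors (one per time step)
    log_pi : log initial state probabilities, length K
    log_A  : K x K log transition matrix, log_A[j][k] = log P(k | j)
    alphas : one Inverted Dirichlet parameter vector per state
             (assumption: one density per state, not a mixture)
    """
    K = len(log_pi)
    log_f = [log_pi[k] + inverted_dirichlet_logpdf(obs[0], alphas[k])
             for k in range(K)]
    for x in obs[1:]:
        log_f = [
            logsumexp([log_f[j] + log_A[j][k] for j in range(K)])
            + inverted_dirichlet_logpdf(x, alphas[k])
            for k in range(K)
        ]
    return logsumexp(log_f)
```

Under these assumptions, replacing a Gaussian emission with one defined only on positive vectors changes nothing in the forward recursion itself; only the per-state density differs. Extending the sketch to mixture emissions would mean replacing the single log-density per state with a log-sum over weighted components.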