Thesis defences

PhD Oral Exam - Fatma Najar, Information and Systems Engineering

Smoothed Probabilistic-based Algorithms for Sparse Data with applications to Emotion Recognition and Sentiment Analysis

Date & time
Monday, October 3, 2022 (all day)

This event is free


School of Graduate Studies


Daniela Ferrer



When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.

Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.


Humans can produce more than 10,000 expressions with 43 facial muscles, which makes reading faces a significant human skill and a challenging task for Artificial Intelligence (AI) algorithms. Even though much research has been proposed in the fields of sentiment analysis and emotion recognition, these fields continue to present considerable challenges. In our research, we focus on providing novel emotion recognition and sentiment analysis solutions that address data challenges occurring in different modalities: texts, images, and videos. Across these multimedia contents, the analysis considers the co-occurrence of words in a collection of documents, and visual words or proportional feature vectors for images and videos. This type of data involves several challenges, including sparseness, burstiness, correlated features, and high dimensionality.

In this dissertation, we propose smoothed probabilistic-based approaches to deal with the aforementioned data challenges. First, we introduce the calculation of the exact Fisher information matrix of the generalized Dirichlet multinomial. Our proposed approach has been adopted for detecting depression in tweets, dialogue-based emotion recognition, and image-based sentiment analysis. Second, we develop different smoothed solutions for handling sparsity, high dimensionality, and burstiness issues, such as smoothed Dirichlet multinomial, smoothed Generalized Dirichlet, smoothed Generalized Dirichlet multinomial (SGDM), Taylor approximation to the SGDM, Latent-based smoothed Beta-Liouville, Smoothed Beta-Liouville Emotion Term model, and Smoothed Scaled Dirichlet Relevance Model. These models are based on smoothing count vectors in a smoothed subset of the whole simplex to deal with the problem of sparseness. Moreover, we incorporate a hierarchical generalized Dirichlet prior for sparse multinomial distributions and a Beta-Liouville Naive Bayes with vocabulary knowledge. These two techniques build on Bayesian vocabulary knowledge over large discrete domains represented by subsets of feasible outcomes: observed and unobserved words. In another research work, we consider a sparse topic model for non-exchangeable correlated data over time and present a new interactive distance-dependent IBP compound Dirichlet process. We derive a Markov Chain Monte Carlo sampler combined with a Metropolis-Hastings algorithm and study its performance on sentiment analysis data.
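For readers unfamiliar with the baseline these models extend, the following is a minimal sketch of the standard (unsmoothed) Dirichlet multinomial log-probability for a count vector. It is not the smoothed variant developed in the thesis, and the function name `dm_log_pmf`, the example counts, and the parameter values are illustrative assumptions only.

```python
from math import lgamma, exp

def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a plain Dirichlet multinomial.

    counts: non-negative integer counts (e.g. word counts in one document)
    alpha:  positive Dirichlet parameters, one per category
    """
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: log n! - sum_k log x_k!
    log_coef = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Terms left after integrating out the multinomial parameters
    log_ratio = lgamma(a) - lgamma(n + a)
    log_terms = sum(lgamma(x + ak) - lgamma(ak) for x, ak in zip(counts, alpha))
    return log_coef + log_ratio + log_terms

# Hypothetical sparse, bursty count vector: most categories are zero.
sparse_doc = [5, 0, 0, 1, 0, 0]
alpha = [0.1] * len(sparse_doc)  # assumed values, chosen for illustration
print(exp(dm_log_pmf(sparse_doc, alpha)))
```

With small parameters, this distribution assigns more probability to bursty vectors such as `[4, 0]` than to evenly spread ones such as `[2, 2]`, which is one reason Dirichlet-compound models are a common starting point for sparse count data.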


© Concordia University