When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.
Data is ever increasing in both quantity and dimensionality with today's many technological advances. This growth has posed various challenges to statistical and data analysis methods, and hence requires the development of powerful new models for transforming data into useful information. It was therefore necessary to explore and develop new ideas and techniques to keep pace with challenging learning applications in data analysis, modeling and pattern recognition. Finite mixture models have received considerable attention due to their ability to model high-dimensional data effectively and efficiently. In mixtures, the choice of distribution is a critical issue: in many real-life applications the data lie in a bounded support region, whereas the distributions adopted to model them have unbounded support. It was therefore proposed to define bounded support distributions in mixtures and to introduce a modified parameter estimation procedure that accounts for the bounded support of the underlying distributions. The main goal of this thesis is to introduce bounded support mixtures, their parameter estimation, the automatic determination of the number of mixture components, and the application of mixtures in feature extraction techniques, improving the overall learning pipeline. Five unbounded support distributions are selected for applying the idea of bounded support mixtures, with modified parameter estimation by maximum likelihood via Expectation-Maximization (EM): the Gaussian, Laplace, generalized Gaussian, asymmetric Gaussian and asymmetric generalized Gaussian distributions, chosen for their flexibility and broad applications in speech and image processing. The proposed bounded support mixtures are applied to various speech and image datasets to build learning applications that demonstrate the effectiveness of the proposed approach.
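The bounded support construction can be illustrated for the Gaussian case: the density is restricted to the support region and renormalised so that it still integrates to one there. The sketch below is a minimal illustration under assumed names and parameters, not the thesis's implementation; the interval [a, b] and the component settings are invented for the example.

```python
import math

def normal_cdf(x):
    # Standard normal CDF, expressed through the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bounded_gaussian_pdf(x, mu, sigma, a, b):
    """Gaussian density truncated to [a, b] and renormalised to integrate to 1."""
    if not (a <= x <= b):
        return 0.0  # zero density outside the bounded support
    # Probability mass the unbounded Gaussian places inside [a, b].
    mass = normal_cdf((b - mu) / sigma) - normal_cdf((a - mu) / sigma)
    phi = math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
    return phi / mass

def bounded_mixture_pdf(x, weights, components, a, b):
    """Mixture of bounded Gaussians: weighted sum of truncated component densities."""
    return sum(w * bounded_gaussian_pdf(x, mu, sigma, a, b)
               for w, (mu, sigma) in zip(weights, components))
```

Because each component is renormalised over [a, b], the mixture itself integrates to one over the support, which is what allows the EM updates to be adjusted consistently for the bounded case.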
Mixtures of bounded Gaussian and bounded Laplace distributions are also applied in feature extraction and data representation techniques, which further improves the learning and modeling capability of the underlying models. The proposed feature representation via bounded support mixtures is applied to both speech and image datasets to examine its performance. Automatic selection of the number of mixture components is very important in clustering, and parameter learning is highly dependent on model selection; a minimum message length criterion is therefore proposed for the mixtures of bounded Gaussian and bounded asymmetric generalized Gaussian distributions. The proposed model selection criterion and parameter learning are applied simultaneously to speech and image datasets for both models to examine the model selection performance in clustering.
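Model selection by minimum message length can be sketched as fitting the mixture for a range of component counts and keeping the count that minimises the message length: the negative log-likelihood plus a penalty for encoding the parameters. The sketch below uses ordinary (unbounded) one-dimensional Gaussians and a simplified MML-style penalty in the spirit of Figueiredo and Jain; the thesis derives MML specifically for the bounded mixtures, so the function names, the penalty form and the synthetic data here are all illustrative assumptions.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(xs, K, iters=100):
    """Plain EM for a K-component 1-D Gaussian mixture (not the thesis's bounded EM)."""
    n = len(xs)
    xs_sorted = sorted(xs)
    # Deterministic, spread-out initialisation of the means from quantiles.
    mus = [xs_sorted[int((k + 0.5) * n / K)] for k in range(K)]
    mean = sum(xs) / n
    var0 = sum((x - mean) ** 2 for x in xs) / n
    variances = [var0] * K
    weights = [1.0 / K] * K
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            w = [weights[k] * normal_pdf(x, mus[k], variances[k]) for k in range(K)]
            s = sum(w) or 1e-300
            resp.append([wk / s for wk in w])
        # M-step: re-estimate weights, means and variances.
        for k in range(K):
            nk = max(sum(r[k] for r in resp), 1e-10)
            weights[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(sum(r[k] * (x - mus[k]) ** 2
                                   for r, x in zip(resp, xs)) / nk, 1e-6)
    loglik = sum(math.log(sum(weights[k] * normal_pdf(x, mus[k], variances[k])
                              for k in range(K))) for x in xs)
    return weights, loglik

def message_length(loglik, weights, n, params_per_comp=2):
    """Simplified MML-style score: -log-likelihood plus a parameter-coding penalty."""
    live = [w for w in weights if w > 1e-3]  # near-empty components count as annihilated
    K = len(live)
    Np = params_per_comp  # mean and variance per 1-D Gaussian component
    penalty = (Np / 2.0) * sum(math.log(n * w / 12.0) for w in live)
    penalty += (K / 2.0) * math.log(n / 12.0) + K * (Np + 1) / 2.0
    return -loglik + penalty

def select_components(xs, k_max=4):
    """Return the component count that minimises the message length."""
    scores = {}
    for K in range(1, k_max + 1):
        weights, loglik = fit_gmm_1d(xs, K)
        scores[K] = message_length(loglik, weights, len(xs))
    return min(scores, key=scores.get)
```

On data drawn from two well-separated clusters, `select_components` typically returns 2: the likelihood gained by a third component no longer outweighs the coding cost of its parameters.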