PhD Oral Exam - Aminata Kane, Computer Science
When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the candidate’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.
Innovation and advances in technology have led to the growth of time series data at a phenomenal rate in many applications. Query processing and the analysis of time series data have been studied, and numerous solutions have been proposed. In this research, we focus on multivariate time series (MTS) and devise techniques for high-dimensional and voluminous MTS data. The success of such techniques relies on effective dimensionality reduction in a preprocessing step.
Feature selection has often been used as a dimensionality reduction technique: it identifies a subset of features that captures most of the characteristics of the data. We propose a more effective feature subset selection technique, termed Weighted Scores (WS), based on statistics drawn from the Principal Component Analysis (PCA) of the input MTS data matrix. This technique reduces the dimensionality of the data while retaining and ranking its most influential features. We then consider feature grouping and develop a technique termed FRG (Feature Ranking and Grouping) to improve the effectiveness of this technique in sparse vector frameworks. We also develop a PCA-based MTS representation technique, M2U (Multivariate to Univariate transformation), which transforms an MTS with a large number of variables into a univariate signal prior to downstream pattern recognition tasks such as seeking correlations within the set.
In related research, we study the similarity search problem for MTS and develop a novel correlation-based method for standard MTS, ESTMSS (Efficient and Scalable Technique for MTS Similarity Search), which uses randomized dimensionality reduction and a threshold-based correlation computation. The results of our numerous experiments on real benchmark data indicate the effectiveness of our methods.
The technique improves computation time by at least an order of magnitude compared to other techniques and affords a large reduction in memory requirements, while providing comparable accuracy and precision in large-scale frameworks.
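The abstract does not give the formulas behind WS or M2U, so the following Python sketch only illustrates the general idea of using PCA statistics to rank the variables of an MTS and to collapse it into a single signal. It is not the thesis's method. It assumes an MTS stored as a NumPy array with one row per time point and one column per variable. It also assumes that a variable's score is the sum of its absolute loadings on the leading principal components, weighted by their explained-variance ratios. The function names `pca_stats`, `weighted_scores` and `to_univariate` and the `var_threshold` parameter are hypothetical.

```python
import numpy as np


def pca_stats(X):
    """PCA of an MTS data matrix X with shape (time points, variables).

    Returns the loadings (one column per principal component, strongest first)
    and the fraction of total variance explained by each component.
    """
    cov = np.cov(X, rowvar=False)                 # variables are columns; np.cov centers them
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # reorder by decreasing variance
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    loadings = eigvecs[:, order]
    ratios = eigvals / eigvals.sum()
    return loadings, ratios


def weighted_scores(X, var_threshold=0.9):
    """Hypothetical WS-style ranking (an assumption, not the thesis formula).

    Keeps the fewest leading components whose cumulative explained variance
    reaches var_threshold, and scores variable j as
    sum_i ratio_i * |loading[j, i]| over those components.
    """
    loadings, ratios = pca_stats(X)
    k = int(np.searchsorted(np.cumsum(ratios), var_threshold)) + 1
    k = min(k, len(ratios))                       # guard against round-off in the cumulative sum
    scores = np.abs(loadings[:, :k]) @ ratios[:k]
    ranking = np.argsort(scores)[::-1]            # most influential variable first
    return scores, ranking


def to_univariate(X):
    """Hypothetical M2U-style transform (an assumption): project the centered
    MTS onto its first principal component to obtain one signal.
    The sign of a principal component is arbitrary, so the output may be negated."""
    loadings, _ = pca_stats(X)
    return (X - X.mean(axis=0)) @ loadings[:, 0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic MTS: 200 time points, 4 variables with very different variances.
    X = rng.normal(size=(200, 4)) * np.array([5.0, 0.2, 1.0, 0.5])
    scores, ranking = weighted_scores(X)
    print("ranking:", ranking)
    print("univariate length:", to_univariate(X).shape[0])
```

For example, `X[:, ranking[:5]]` keeps the five top-ranked variables as a reduced MTS, and `to_univariate(X)` yields one signal per MTS item for downstream comparison. Projecting onto the first component is the simplest choice for this sketch. Because component signs are arbitrary, signals from different items may need a sign convention before they are compared.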