Drs. J. Bentahar, T. Glatard, J. Rilling
Title: A Three-layered Investigation of Theories and Models for Selection of Big Data Sources
Date: Thursday, April 9, 2020
Place: Online via Zoom teleconference (more details to be announced)
Selecting data sources that meet the analysis goals of Big Data research is an open and challenging research problem. A detailed investigation of current research in this field reveals that no rigorous approach exists to solve it. The methodology proposed in my research has three layers, each founded on a precise model. The Layer 3 (bottom layer) model is built on the Service-oriented Paradigm: each data source is conceptualized as a "thing" that provides several configured services in a specific domain, and several directories are modeled that encapsulate the functional, quality, and contextual aspects of these services. The Layer 1 (top layer) model is a collection of hierarchical models built on the GQM (Goal-Query-Metric) approach, which capture the precise analysis-oriented requirements. Layer 2 (middle layer) uses a semantic-based similarity function for "best-match" comparison of the Layer 1 model with the Layer 3 model, aggregating pairs of vectors extracted from them. Data sources are selected and ranked by rank scores calculated from the similarity function scores. In this approach, the advantages and disadvantages of the data sources relevant to the analysis goals are factored in during the "best-match" comparison in Layer 2. The modularity of this approach is significant: changes to the model in each layer are localized, and the procedures for selecting and ranking data sources simply rerun the existing algorithms on the updated models.
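
The Layer 2 comparison described above can be illustrated with a minimal sketch. It is not the method of the thesis: cosine similarity stands in for the semantic-based similarity function, and the aspect names, weights, vectors, and the functions `cosine` and `rank_sources` are all hypothetical. It only shows the general shape of comparing pairs of vectors from the requirement model (Layer 1) and the data source models (Layer 3), aggregating the pairwise scores, and ranking the sources.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sources(requirement, sources, weights):
    """Rank data sources by a weighted average of per-aspect similarities.

    requirement: dict mapping aspect name -> vector (assumed Layer 1 output)
    sources:     dict mapping source name -> {aspect name -> vector} (assumed Layer 3 output)
    weights:     dict mapping aspect name -> non-negative weight (assumption)
    """
    total_weight = sum(weights[aspect] for aspect in requirement)
    scores = {}
    for name, aspects in sources.items():
        # Aggregate the similarity of each (requirement, source) vector pair.
        weighted = sum(
            weights[aspect] * cosine(requirement[aspect], aspects[aspect])
            for aspect in requirement
        )
        scores[name] = weighted / total_weight
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical vectors for three aspects: functional, quality, contextual.
requirement = {"functional": [1, 0, 1], "quality": [1, 1], "context": [0, 1]}
sources = {
    "source_a": {"functional": [1, 0, 1], "quality": [1, 1], "context": [0, 1]},
    "source_b": {"functional": [0, 1, 0], "quality": [1, 0], "context": [1, 0]},
}
weights = {"functional": 1.0, "quality": 1.0, "context": 1.0}

ranked = rank_sources(requirement, sources, weights)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Because the ranking only reads the two models, updating either one and calling `rank_sources` again is enough to refresh the result, which mirrors the modularity argument in the abstract.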