Oral defences & examinations, Thesis defences

Master's Thesis Defense: Dave Bhardwaj


Date & time
Thursday, September 9, 2021
2 p.m. – 4 p.m.
Cost

This event is free

Where

Online

Candidate:

Dave Bhardwaj - # 40000679

   
             

Thesis Title:

Measurement Framework for Assessing Quality of Big Data (MEGA) in Big Data Pipelines

             

Date & Time:

September 9th, 2021 @ 2:00 PM

   
             

Location:

Zoom

   
             

Examining Committee:

         
             
 

Dr. Tristan Glatard (Chair)

Dr. Olga Ormandjieva (Supervisor)

Dr. Tse-Hsun (Peter) Chen (Examiner)

Dr. Tristan Glatard (Examiner)

 
             
             

 

 

 

Abstract:

           

Big Data is widely used in decision-making, and businesses have seen how powerful data can be, especially in areas such as advertising and marketing. As institutions come to rely on their Big Data systems to make more informed and strategic business decisions, the quality of the underlying data becomes critically important. Our research addresses this need by studying the quality characteristics of Big Data, specifically the V’s of Big Data, and automating their measurement.


In this thesis, our aim is not only to present researchers with useful Big Data quality measurements, but also to bridge the gap between theoretical measurement models of Big Data quality characteristics and the application of these measurements to real-world Big Data systems. To that end, the thesis proposes a framework (the MEGA framework) that can be applied to Big Data pipelines to facilitate the extraction and interpretation of measurement indicators for the Big Data V’s. The framework allows the V’s measurements to be applied at any phase of the pipeline architecture in order to flag quality anomalies in the underlying data before they can negatively impact the decision-making process. The theoretical quality measurement models for six of the Big Data V’s, namely Volume, Variety, Velocity, Veracity, Validity, and Vincularity, are currently automated.


The novelty of the MEGA approach lies in its ability to: i) process both structured and unstructured data, ii) track a variety of quality indicators defined for the V’s, iii) flag datasets that pass a given quality threshold, and iv) define a general infrastructure for collecting, analyzing, and reporting the V’s measurement indicators for trustworthy and meaningful decision-making.


© Concordia University