
Master Thesis Defense: Soudabeh Barghi

November 19, 2018


Speaker: Soudabeh Barghi

Supervisor: Dr. T. Glatard

Examining Committee: Drs. G. Butler, A. Krzyzak, J. Yang (Chair)

Title: Predicting Computational Reproducibility of Data Analysis Pipelines in Large Population Studies Using Collaborative Filtering

Date: Monday, November 19, 2018

Time: 12:30 PM

Place: EV 11.119

ABSTRACT

Evaluating the computational reproducibility of data analysis pipelines has become a critical issue. It is, however, a cumbersome process for analyses that involve data from large populations of subjects, due to their computational and storage requirements. We present a method to predict the computational reproducibility of data analysis pipelines in large population studies. We formulate the problem as a collaborative filtering process, with constraints on the construction of the training set. We propose six different strategies to build the training set, which we evaluate on two datasets: a synthetic one modeling a population with a growing number of subject types, and a real one obtained with neuroinformatics pipelines. Results show that one sampling method, “Random File Numbers (Uniform)”, is able to predict computational reproducibility with good accuracy. We also analyse the relevance of including file and subject biases in the collaborative filtering model. We conclude that the proposed method is able to speed up reproducibility evaluations substantially, with only a limited loss of accuracy.
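
For readers unfamiliar with collaborative filtering with biases, the idea can be illustrated with a minimal sketch. This is not the implementation from the thesis. It assumes a files × subjects matrix whose entries are 1.0 when a file is reproduced identically and 0.0 otherwise, and it uses a generic biased matrix factorization trained by stochastic gradient descent. The function names, hyperparameters, and toy data are all hypothetical, and the training-set sampling strategies mentioned in the abstract are not modeled.

```python
import random


def train_biased_mf(observed, n_files, n_subjects, n_factors=2,
                    lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit r[f][s] ~ mu + b_file[f] + b_subj[s] + p[f] . q[s] by SGD.

    observed maps (file, subject) -> 1.0 (reproducible) or 0.0 (not).
    This layout is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    mu = sum(observed.values()) / len(observed)  # global mean
    b_file = [0.0] * n_files
    b_subj = [0.0] * n_subjects
    p = [[rng.gauss(0, 0.01) for _ in range(n_factors)] for _ in range(n_files)]
    q = [[rng.gauss(0, 0.01) for _ in range(n_factors)] for _ in range(n_subjects)]
    samples = sorted(observed.items())
    for _ in range(epochs):
        rng.shuffle(samples)
        for (f, s), r in samples:
            dot = sum(a * b for a, b in zip(p[f], q[s]))
            err = r - (mu + b_file[f] + b_subj[s] + dot)
            # Regularized gradient steps on biases and latent factors.
            b_file[f] += lr * (err - reg * b_file[f])
            b_subj[s] += lr * (err - reg * b_subj[s])
            for k in range(n_factors):
                pf, qs = p[f][k], q[s][k]
                p[f][k] += lr * (err * qs - reg * pf)
                q[s][k] += lr * (err * pf - reg * qs)
    return mu, b_file, b_subj, p, q


def predict(model, f, s):
    """Predicted reproducibility score for file f of subject s."""
    mu, b_file, b_subj, p, q = model
    return mu + b_file[f] + b_subj[s] + sum(a * b for a, b in zip(p[f], q[s]))


# Toy data: files 0 and 1 are always reproducible, files 2 and 3 never are.
n_files, n_subjects = 4, 6
full = {(f, s): 1.0 if f < 2 else 0.0
        for f in range(n_files) for s in range(n_subjects)}
held_out = [(0, 5), (3, 5)]  # entries hidden from training
observed = {k: v for k, v in full.items() if k not in held_out}

model = train_biased_mf(observed, n_files, n_subjects)
pred_reproducible = predict(model, 0, 5)
pred_not_reproducible = predict(model, 3, 5)
```

In this toy setting the file bias carries most of the signal, so thresholding the two held-out scores at 0.5 recovers the hidden entries without processing them.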





© Concordia University