Mathematics & statistics research seminar
Pizza and soft drinks will be served after the talk.
SPEAKER: Dr. Farhad Shokoohi
ABSTRACT: The advent of modern technology has led to a surge of high-dimensional data in biology and the health sciences, in areas such as genomics, epigenomics and medicine. The high-grade serous ovarian cancer (HGS-OvCa) data reported by The Cancer Genome Atlas (TCGA) Research Network is one example, with information on over 9,000 genes. Our study focuses on the relationship between disease-free time (DFT) after surgery among ovarian cancer patients and the DNA methylation profiles of their genomic features. Such studies pose challenges beyond the typical big data problem because of intangible population substructure and censoring. Although several methods exist for analyzing time-to-event data with a large number of covariates but a small sample size, no method available to date also accommodates heterogeneity. In this talk, we propose a regularized framework based on a finite mixture of accelerated failure time (AFT) models that captures intangible heterogeneity due to population substructure and accounts for censoring simultaneously. Our data analysis indicates the existence of heterogeneity in the HGS-OvCa data, with one component of the mixture capturing a more aggressive form of the disease and the second component capturing a less aggressive form. In particular, the second component shows a significant positive relationship between methylation and DFT for BRCA1. Further analysis uncovers a negative relationship between gene expression and methylation for this gene, and a biologically reasonable explanation emerges: hyper-methylation of BRCA1 downregulates the gene, and the resulting lower expression leads to longer DFT, especially if the gene is mutated.
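For readers unfamiliar with the model class, a generic penalized finite mixture of accelerated failure time (AFT) models can be sketched as follows. This is a textbook-style illustration, not the speaker's exact formulation: the notation, the error distribution (with density f_ε and survival function S_ε) and the lasso-type penalty are all assumptions made here for concreteness.

```latex
% Hypothetical notation, not taken from the talk:
%   T_i event time, C_i censoring time, x_i covariate vector (e.g. methylation levels),
%   Z_i latent component label, K components with weights \pi_k,
%   regression coefficients \beta_k and scale parameters \sigma_k.
\begin{align*}
  y_i &= \min(\log T_i, \log C_i), \qquad \delta_i = \mathbf{1}\{T_i \le C_i\}, \\
  \log T_i \mid (Z_i = k) &= x_i^\top \beta_k + \sigma_k \varepsilon_i,
      \qquad P(Z_i = k) = \pi_k, \quad \sum_{k=1}^{K} \pi_k = 1, \\
  \ell_n(\theta) &= \sum_{i=1}^{n} \Big[
      \delta_i \log \sum_{k=1}^{K} \frac{\pi_k}{\sigma_k}
        f_\varepsilon\!\Big(\frac{y_i - x_i^\top \beta_k}{\sigma_k}\Big)
      + (1 - \delta_i) \log \sum_{k=1}^{K} \pi_k
        S_\varepsilon\!\Big(\frac{y_i - x_i^\top \beta_k}{\sigma_k}\Big) \Big], \\
  \hat\theta &= \arg\max_\theta \Big\{ \ell_n(\theta)
      - \lambda \sum_{k=1}^{K} \sum_{j=1}^{p} |\beta_{kj}| \Big\}.
      % lasso penalty assumed for illustration only
\end{align*}
```

In this sketch, uncensored observations contribute the mixture density and censored ones the mixture survival function, while the penalty shrinks component-specific coefficients toward zero. The two disease forms described in the abstract would correspond to K = 2.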