Master Thesis Defense: Varun Gupta
Speaker: Varun Gupta
Supervisor: Dr. A. Krzyzak
Examining Committee: Drs. T. Glatard, C. Y. Suen, A. Hanna (Chair)
Title: An Empirical Evaluation of Attention and Pointer Networks for Paraphrase Generation
Date: Thursday, June 27, 2019
Place: EV 3.309
In computer vision, a common practice for augmenting an image dataset is to create new images with geometric transformations that preserve the label. Such data augmentation was one of the key factors behind the deep neural networks that won the ImageNet competition in 2012. Speech recognition has seen similar gains from augmenting signals with noise, slowing or accelerating them, and modifying spectrograms.
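The label-preserving transformations mentioned above can be sketched in a few lines; this is an illustrative example with NumPy, not code from the thesis:

```python
import numpy as np

def augment(image):
    """Return simple label-preserving geometric variants of an image array."""
    return {
        "hflip": image[:, ::-1],   # horizontal flip (mirror left-right)
        "vflip": image[::-1, :],   # vertical flip (mirror top-bottom)
        "rot90": np.rot90(image),  # 90-degree rotation
    }

img = np.arange(9).reshape(3, 3)
variants = augment(img)
# Each variant has the same shape (and, conceptually, the same label) as the original.
```

Each transformed copy can be added to the training set under the original image's label, multiplying the effective dataset size.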
Unlike in computer vision and speech, few augmentation techniques have been explored in natural language processing (NLP). The main technique explored for text data is lexical substitution, which replaces words with their synonyms.
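As a minimal sketch of lexical substitution, consider the toy function below; the hand-made synonym table is an assumption for illustration (a real system would draw substitutes from a resource such as WordNet or PPDB):

```python
# Hypothetical synonym table; real systems use a lexical resource instead.
SYNONYMS = {"good": "great", "movie": "film"}

def lexical_substitute(sentence):
    """Replace each word that has a known synonym, leaving others unchanged."""
    return " ".join(SYNONYMS.get(word, word) for word in sentence.split())

lexical_substitute("a good movie")  # -> "a great film"
```

The limitation the thesis points to is visible here: only single-word swaps are possible, so sentence structure never changes.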
In this thesis, we investigate the use of different pointer networks with sequence-to-sequence models, which have shown excellent results in neural machine translation (NMT) and text simplification, to generate similar sentences from the Paraphrase Database (PPDB). We evaluate these paraphrases by augmenting the training set of the IMDb movie review dataset and comparing performance against a baseline model, showing how the paraphrases affect a downstream task. Furthermore, we train different classifiers to establish a stable baseline for evaluation on the IMDb dataset. To the best of our knowledge, this is the first study to generate paraphrases using these models with the help of the PPDB dataset and to evaluate the paraphrases on a downstream task.
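The core idea of a pointer network combined with a sequence-to-sequence decoder can be illustrated numerically; the distributions and the gate value below are made-up numbers, not results from the thesis:

```python
import numpy as np

# Decoder's softmax over the output vocabulary (4 token ids, assumed values).
vocab_dist = np.array([0.7, 0.2, 0.1, 0.0])
# Attention weights over source tokens, mapped onto the same vocabulary ids.
copy_dist = np.array([0.0, 0.0, 0.5, 0.5])
# Soft gate: probability of generating from the vocabulary vs copying a source word.
p_gen = 0.6

# Final output distribution mixes generation and copying.
final_dist = p_gen * vocab_dist + (1 - p_gen) * copy_dist
# final_dist = [0.42, 0.12, 0.26, 0.20]; it still sums to 1.
```

The gate lets the model fall back to copying rare or out-of-vocabulary source words, which is useful when paraphrasing sentences that contain names or uncommon terms.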