Thesis defences

MCS Thesis Examination: Amirmohammad Sarfi

On Using Simulated Annealing in Training Deep Neural Networks


Date & time
Monday, April 17, 2023
2 p.m. – 4 p.m.
Cost

This event is free

Organization

Department of Computer Science and Software Engineering

Contact

Leila Kosseim

Where

Online

Abstract

    In deep learning, overfitting is a major problem that makes it difficult for a neural network to perform well on new data. This issue is especially prevalent in low-data regimes or when training for too many epochs. Iterative learning methods have been devised to improve the generalization performance of neural networks trained for a prolonged duration. These techniques periodically reduce the training accuracy of the network, a process called forgetting. The primary objective of the forgetting stage is to allow the network to learn more from the same data and surpass its previous performance over the long run.

    In this thesis, we propose a new forgetting technique motivated by simulated annealing. Although simulated annealing is a powerful tool in optimization, its application in deep learning has been overlooked. In our study, we highlight the potential of this method in deep learning and illustrate its usefulness through experiments. Essentially, we select a subset of layers to undergo brief periods of gradient ascent, followed by gradient descent. In the first scenario, we apply Simulated Annealing in Early Layers (SEAL) during the training process. Through extensive experiments on the Tiny-ImageNet dataset, we demonstrate that our method achieves much better prediction depth, in-distribution performance, and transfer learning performance compared to state-of-the-art methods in iterative training. In the second scenario, we extend the application of simulated annealing beyond classification and computer vision by employing it in text-to-3D generative methods. Here, we apply simulated annealing to the entire network and illustrate its effectiveness compared to normal training. These two scenarios collectively demonstrate the potential of simulated annealing as a valuable tool for optimizing deep neural networks and emphasize the need for further exploration of this technique in the literature.
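    The core training loop described above can be sketched in a few lines. The following is a minimal toy illustration, not the thesis implementation: the loss, hyperparameter names, and schedule are all assumptions, and a toy quadratic loss on a parameter vector stands in for a real network, with the first coordinates playing the role of "early layers". Periodically, a few gradient-ascent steps are taken on that subset (the forgetting phase) before gradient descent resumes on all parameters.

    ```python
    def loss(w):
        # Toy quadratic loss L(w) = sum(w_i^2); stands in for the training loss.
        return sum(wi * wi for wi in w)

    def grad(w):
        # Gradient of the toy loss; stands in for backpropagation.
        return [2 * wi for wi in w]

    def train(w, epochs=30, ascent_every=10, ascent_steps=3, lr=0.05, n_early=1):
        """Gradient descent on all parameters, but every `ascent_every` epochs
        first take `ascent_steps` gradient-ASCENT steps on the first `n_early`
        parameters (the 'early layers') -- the forgetting phase."""
        for epoch in range(epochs):
            if epoch > 0 and epoch % ascent_every == 0:
                for _ in range(ascent_steps):
                    g = grad(w)
                    for i in range(n_early):
                        w[i] += lr * g[i]       # ascent: deliberately raise the loss
            g = grad(w)
            w = [wi - lr * gi for wi, gi in zip(w, g)]  # normal descent step
        return w

    w0 = [1.0, -1.0]
    w = train(list(w0))
    print(loss(w0), "->", loss(w))  # loss still decreases overall despite forgetting
    ```

    Even with the periodic ascent phases temporarily increasing the loss, the final loss is far below the initial one, which is the intuition behind forgetting: the temporary degradation does not prevent, and in the thesis is shown to help, long-run convergence.
    
    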

Examining Committee

  • Dr. Adam Krzyzak (Chair) 
  • Dr. Sudhir Mudur & Dr. Eugene Belilovsky (Supervisors)
  • Dr. Tiberiu Popa (Examiner)
  • Dr. Adam Krzyzak (Examiner)

© Concordia University