To fully realize the benefits of deep learning, we need to design highly scalable, robust, and privacy-preserving learning algorithms and to understand the fundamental limits of the underlying architecture, e.g., the neural network to which the learning algorithm is applied. The key algorithm underlying the deep learning revolution is stochastic gradient descent (SGD), which must be distributed to handle enormous and possibly sensitive data spread across multiple owners, such as hospitals and cellphones, without sharing local data. When SGD is implemented on large-scale distributed systems, the communication time required to share stochastic gradients is the main performance bottleneck. In addition to communication efficiency, robustness is highly desirable in real-world settings. We present efficient gradient compression and robust aggregation schemes that reduce communication costs and enhance security while preserving privacy. Our algorithms currently offer the highest communication compression while still converging under regular (uncompressed) hyperparameter values. Turning to the underlying architecture, one fundamental question is "How much should we overparameterize a neural network?" We present the current best scaling on the number of parameters for fully trained shallow neural networks under standard initialization schemes.
Bio
Ali Ramezani-Kebrya is a senior postdoctoral associate at EPFL. Before joining the Laboratory for Information and Inference Systems, he was a postdoctoral fellow at the Vector Institute. Ali received his Ph.D. from the University of Toronto. He works in machine learning and studies the communication, optimization, privacy/security, and generalization aspects of machine learning algorithms. He is a recipient of the Natural Sciences and Engineering Research Council of Canada Postdoctoral Fellowship.