Research Projects
On Simultaneous Optimization and Generalization in Minimax Learning Frameworks
Joint work with Asu Ozdaglar (Massachusetts Institute of Technology)

Abstract: The success of minimax learning frameworks has been empirically observed to depend on the optimization algorithm used for their training. In this work, we present theoretical and numerical evidence showing that the generalization properties of learned minimax models are influenced by the underlying optimization algorithm. To this end, we extend the classical algorithmic stability approach to gradientbased minimax optimization algorithms and analyze the generalization properties of standard gradient descent ascent and proximal point method algorithms. Our analysis suggests that in general nonconvexnonconcave settings the models learned by simultaneous update algorithms such as gradient descent ascent are expected to generalize better than the models learned by nonsimultaneous update algorithms such as gradient descent max which fully solves the maximization subproblem at every iteration. We support our theoretical results through several numerical experiments on the discussed minimax optimization algorithms.
Papers and Preprints
F. Farnia, A. Ozdaglar, “Train simultaneously, generalize better: Stability of gradientbased minimax learners”, ICML 2021

Generative Adversarial Training for Gaussian Mixture Models
Joint work with William Wang, Ali Jadbabaie (Massachusetts Institute of Technology), and Subhro Das (IBM Research)

Abstract: While GANs achieve stateoftheart empirical performance in learning the complex distributions of image, sound, and text data, they perform suboptimally in learning multimodal distribution benchmarks including Gaussian mixture models, where they are commonly outperformed by maximum likelihoodbased methods such as the expectationmaximization (EM) algorithm. A natural question is whether this performance gap fundamentally exists in learning mixture distributions or is due to the poor design of the GAN players and loss function. In this research work, we design a new GAN architecture called Generative Adversarial Training for Gaussian Mixture Models (GATGMM) for learning mixtures of Gaussians. We prove that GATGMM is theoretically guaranteed to learn a mixture of two Gaussians and provide numerical results indicating that GATGMM can perform as well as the EM algorithm in learning Gaussian mixture models. Our results indicate that with a proper design the GAN minimax framework can potentially lead to stateoftheart performance in learning mixture distributions.
Papers and Preprints
F. Farnia*, W. Wang*, S. Das, A. Jadbabaie, “GATGMM: Generative Adversarial Training for Gaussian Mixture Models”, arXiv:2006.10293

Robust Federated Learning under Affine Distribution Shifts
Joint work with Amirhossein Reisizadeh, Ramtin Pedarsani (UC Santa Barbara), and Ali Jadbabaie (Massachusetts Institute of Technology)

Abstract: Federated learning is a recent learning paradigm where a machine learning model is trained using the training data and computational power distributed across multiple local nodes connected to a universal server node. The heterogeneity of the underlying data distributions observed by different local nodes has been an important barrier toward training robust federated learning models. In this work, we propose a minimax learning framework called FedRobust as an efficient and robust federated learning scheme against affine distribution shifts manipulating the samples observed by every node with an affine transformation. We theoretically support FedRobust by proving generalization, convergence, and distributional robustness guarantees. We perform several empirical experiments to show the robustness gain offered by FedRobust against both affine distribution shifts and unstructured adversarial perturbations.
Papers and Preprints
A. Reisizadeh*, F. Farnia*, R. Pedarsani, A. Jadbabaie, ”Robust Federated Learning: The Case of Affine Distribution Shifts”, NeurIPS 2020

On the Existence of Nash Equilibria in GAN Minimax Games
Joint work with Asu Ozdaglar (Massachusetts Institute of Technology)

Abstract: Despite the successful applications of GANs to various distribution learning problems, GAN minimax optimization remains theoretically and empirically challenging. In this work, we study the equilibrium properties of GAN problems and prove that the classical Nash equilibrium may not exist in GAN minimax games. To understand equilibrium in GANs, we propose a new notion of equilibrium, called proximal equilibrium, by considering the Nash equilibria of a different zerosum game applying a proximal operator to the original GAN minimax objective. We prove that unlike Nash equilibria, proximal equilibria indeed exist in the minimax game of Wasserstein GANs. We then propose proximal training for GANs by optimizing the proximal objective and numerically demonstrate that proximal training can outperform standard GAN training methods.
Papers and Preprints
F. Farnia, A. Ozdaglar, “Do GANs always have Nash equilibria?”, ICML 2020

A Convex Duality Framework for GANs
Joint work with David Tse (Stanford University)

Abstract: GANs represent a minimax game between the generator and discriminator machines for learning the distribution of observed data. The GAN minimax problem with an unconstrained discriminator is known to reduce to the minimization of the JensenShannon divergence between the generator’s and data distributions. However, in practice the discriminator is constrained to smaller parametrized function sets such as convolutional neural networks. In this research work, we develop a general convex duality framework for GANs and demonstrate that for a discriminator constrained to a convex set of functions, the GAN minimax problem minimizes the JensenShannon divergence between two sets of probability distributions: generative models and discriminator moment matching models. We further use the proposed duality framework for interpreting and improving other minimax GAN formulations including Wasserstein GAN (WGAN) and spectrallynormalized GAN (SNGAN) problems.
Papers and Preprints
F. Farnia, D. Tse, “A convex duality framework for GANs”, NeurIPS 2018

Understanding GANs in the LQG Setting
Joint work with Soheil Feizi (University of Maryland), Tony Ginart, and David Tse (Stanford University)

Abstract: Generative Adversarial Networks (GANs) have become a popular method to learn a probability model from data. In this work, we provide an understanding of basic issues surrounding GANs including their formulation, generalization and stability on a simple LQG benchmark where the generator is Linear, the discriminator is Quadratic and the data has a highdimensional Gaussian distribution. Even in this simple benchmark, the GAN problem has not been wellunderstood as we observe that existing stateoftheart GAN architectures may fail to learn a proper generative distribution owing to (1) stability issues (i.e., convergence to bad local solutions or not converging at all), (2) approximation issues (i.e., having improper global GAN optimizers caused by inappropriate GAN's loss functions), and (3) generalizability issues (i.e., requiring large number of samples for training). In this setup, we propose a GAN architecture which recovers the maximumlikelihood solution and demonstrates fast generalization. Moreover, we analyze global stability of different computational approaches for the proposed GAN and highlight their pros and cons. Finally, through experiments on MNIST and CIFAR10 datasets, we outline extensions of our modelbased approach to design GANs in more complex setups than the considered Gaussian benchmark.
Papers and Preprints
S. Feizi, F. Farnia, T. Ginart, D. Tse, “Understanding GANs in the LQG Setting: Formulation, Generalization and Stability”, IEEE Journal on Selected Areas in Information Theory, 2020

Generalizable Adversarial Training via Spectral Normalization
Joint work with Jesse M. Zhang and David Tse (Stanford University)

Abstract: Deep neural networks (DNNs) have set benchmarks on a wide array of supervised learning tasks. Trained DNNs, however, often lack robustness to minor adversarial perturbations to the input, which undermines their true practicality. Recent works have increased the robustness of DNNs by fitting networks using adversariallyperturbed training samples, but the improved performance can still be far below the performance seen in nonadversarial settings. A significant portion of this gap can be attributed to the decrease in generalization performance due to adversarial training. In this work, we extend the notion of margin loss to adversarial settings and bound the generalization error for DNNs trained under several wellknown gradientbased attack schemes, motivating an effective regularization scheme based on spectral normalization of the DNN's weight matrices. We also provide a computationallyefficient method for normalizing the spectral norm of convolutional layers with arbitrary stride and padding schemes in deep convolutional networks. We evaluate the power of spectral normalization extensively on combinations of datasets, network architectures, and adversarial training schemes.
Papers and Preprints
F. Farnia*, J. Zhang*, D. Tse, “Generalizable Adversarial Training via Spectral Normalization”, ICLR 2019

A FourierBased Approach to Generalization and Optimization in Deep Learning
Joint work with Jesse M. Zhang and David Tse (Stanford University)

Abstract: The success of deep neural networks stems from their ability to generalize well on real data; however, neural networks are observed to easily overfit randomlygenerated labels. This observation highlights the following question: why do gradient methods succeed in finding generalizable solutions for neural networks while there exist solutions with poor generalization behavior? In this project, we use a Fourierbased approach to study the generalization properties of gradientbased methods over 2layer neural networks with bandlimited activation functions. Our results indicate that in such settings if the underlying distribution of data enjoys nice Fourier properties including bandlimitedness and bounded Fourier norm, then the gradient descent method can converge to local minima with nice generalization behavior. We also establish a Fourierbased generalization error bound for bandlimited function spaces, applicable to 2layer neural networks with general activation functions. This generalization bound motivates a grouped version of path norms for measuring the complexity of 2layer neural networks with ReLUtype activation functions. We empirically demonstrate that regularization of the group path norms results in neural network solutions that can fit true labels without losing test accuracy while not overfitting random labels.
Papers and Preprints
F. Farnia, J. Zhang, D. Tse, “A FourierBased Approach to Generalization and Optimization in Deep Learning”, IEEE Journal on Selected Areas in Information Theory, 2020

A Minimax Approach to Supervised Learning
Joint work David Tse (Stanford University)

Abstract: Given a task of predicting Y from X, a loss function L, and a set of probability distributions Gamma on (X,Y), what is the optimal decision rule minimizing the worstcase expected loss over Gamma? In this project, we address this question by introducing a generalization of the principle of maximum entropy. Applying this principle to sets of distributions with marginal on X constrained to be the empirical marginal from the data, we develop a general minimax approach for supervised learning problems. While for some loss functions such as squarederror and log loss, the minimax approach rederives wellknown regression models, for the 01 loss it results in a new linear classifier which we call the maximum entropy machine. The maximum entropy machine minimizes the worstcase 01 loss over the structured set of distribution, and by our numerical experiments can outperform other wellknown linear classifiers such as SVM. We also prove a bound on the generalization worstcase error in the minimax approach.
Papers and Preprints
F. Farnia, D. Tse, “A Minimax Approach to Supervised Learning”, NeurIPS 2016

Inference and Feature Selection Based on Low Order Marginals
Joint work Meisam Razaviyayn (University of Southern California), Sreeram Kannan (University of Washington), and David Tse (Stanford University)

Abstract: We consider the binary classification problem of predicting a target variable Y from a discrete feature vector X. When the joint probability distribution is known, the optimal classifier, leading to the minimum misclassification rate, is given by the Maximum Aposteriori Probability decision rule. However, estimating the complete joint distribution is computationally and statistically impossible for large values of d. An alternative approach is to first estimate some low order marginals and then design the classifier based on the estimated low order marginals. This approach is also helpful when the complete training data instances are not available due to privacy concerns. In this work, we consider the problem of finding the optimum classifier based on some estimated low order marginals of (X,Y). We prove that for a given set of marginals, the minimum HirschfeldGebeleinRenyi (HGR) correlation principle introduced leads to a randomized classification rule which is shown to have a misclassification rate no larger than twice the misclassification rate of the optimal classifier. Then, under a separability condition, we show that the proposed algorithm is equivalent to a randomized linear regression approach. In addition, this method naturally results in a robust feature selection method selecting a subset of features having the maximum worst case HGR correlation with the target variable. We numerically compare our proposed algorithm with the DCC classifier and show that the proposed algorithm results in better misclassification rate over various datasets.
Papers and Preprints
M. Razaviyayn, F. Farnia, D. Tse, “Discrete Rényi Classifiers”, NeurIPS 2015
F. Farnia, M. Razaviyayn, S. Kannan, D. Tse, “Minimum HGR correlation principle: From marginals to joint distribution”, ISIT 2015

