Probabilistic modeling is a cornerstone of modern data analysis, uncertainty quantification, and decision making. A key challenge in probabilistic inference is computing the target distribution of interest, which is often intractable. Black-box variational inference (BBVI) algorithms have gained popularity because they are easy to implement with automatic differentiation. Traditionally, BBVI methods have been effective when applied to factorized variational families. However, many complex problems require richer variational families to more accurately quantify uncertainty. While methods such as automatic differentiation variational inference (ADVI) have been applied to these more expressive families, their stochastic gradient-based optimization often suffers from high-variance gradient estimates and sensitivity to the hyperparameters of the learning algorithm.
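To make the contrast concrete, here is a minimal sketch, with hypothetical names and a user-supplied log density, of the standard BBVI recipe that methods like ADVI automate: draw reparameterized samples from a factorized Gaussian, form a noisy Monte Carlo estimate of the ELBO, and hand that estimate to automatic differentiation and stochastic gradient descent. This is an illustrative sketch, not any particular implementation.

```python
# Minimal sketch (hypothetical names, not a specific ADVI implementation):
# standard BBVI draws reparameterized samples from a factorized Gaussian q,
# forms a noisy Monte Carlo estimate of the ELBO, and relies on automatic
# differentiation plus SGD to optimize it.
import numpy as np

def elbo_estimate(m, log_s, log_p, n_samples=16, rng=np.random.default_rng(0)):
    """Monte Carlo ELBO for a mean-field Gaussian q(z) = N(m, diag(exp(2*log_s)))."""
    eps = rng.normal(size=(n_samples, m.size))   # noise for the reparameterization trick
    z = m + np.exp(log_s) * eps                  # z ~ q(z), differentiable in (m, log_s)
    entropy = np.sum(log_s) + 0.5 * m.size * np.log(2 * np.pi * np.e)
    return np.mean([log_p(zi) for zi in z]) + entropy

# An autodiff framework would differentiate this noisy estimate with respect
# to (m, log_s); the variance of that stochastic gradient, and the step-size
# sensitivity of SGD, are the difficulties revisited below.
```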
In this talk, we introduce “batch and match” (BaM) algorithms, which match the scores of the variational and target distributions over a batch of samples, providing an alternative to classical BBVI methods based on KL divergence minimization. Notably, we demonstrate that score-based divergences enable efficient optimization for several expressive variational families without relying on stochastic gradient descent.

We begin with the family of full-covariance Gaussians, presenting a score-based divergence that admits a closed-form proximal update. We analyze BaM’s convergence in the infinite-batch regime and evaluate its performance on Gaussian and non-Gaussian targets arising in posterior inference, observing that it typically converges in significantly fewer gradient evaluations than leading implementations of BBVI.

We then introduce EigenVI, a method that fits variational approximations based on orthogonal function expansions. For distributions over R^D, the lowest-order term recovers a Gaussian approximation, while higher-order terms capture non-Gaussian structure. Optimizing a Fisher divergence in this setting reduces to a minimum eigenvalue problem, sidestepping the iterative gradient-based updates required by standard BBVI.

Finally, we highlight recent developments that use a product-of-experts variational family, which enables more expressive approximations that can capture multimodality and skew while retaining a tractable normalizing constant and an efficient sampling procedure. In this setting, we show that the BaM objective becomes convex, and we analyze its theoretical and empirical convergence properties.
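As a concrete, heavily simplified illustration of the score-matching idea in the full-covariance Gaussian setting described above, the sketch below is not the BaM proximal update or the EigenVI eigenvalue solve; it only shows, with assumed names and a synthetic Gaussian target, how matching an affine model of the target scores over a batch reduces to plain linear algebra rather than stochastic gradient descent.

```python
# Schematic illustration (not the exact BaM/EigenVI updates): fitting a
# full-covariance Gaussian q(x) = N(m, Sigma) by matching scores over a batch.
# For a Gaussian, grad_x log q(x) = -Sigma^{-1} (x - m), so a least-squares
# score-matching objective over a batch is solved by a regression, with no
# stochastic gradient descent.
import numpy as np

def fit_gaussian_by_score_matching(xs, target_score):
    """xs: (B, D) batch of samples; target_score(x): grad_x log p(x), shape (D,)."""
    B, D = xs.shape
    G = np.stack([target_score(x) for x in xs])   # (B, D) target scores
    X = np.hstack([xs, np.ones((B, 1))])          # design matrix [x, 1]
    # Least-squares fit G ~ xs @ A.T + b, an affine model of the scores.
    W, *_ = np.linalg.lstsq(X, G, rcond=None)     # (D+1, D)
    A, b = W[:D].T, W[D]
    A = 0.5 * (A + A.T)                           # symmetrize
    Sigma = np.linalg.inv(-A)                     # q score = -Sigma^{-1}(x - m)
    m = Sigma @ b
    return m, Sigma

# Usage on a synthetic Gaussian target, whose scores are available in closed form.
rng = np.random.default_rng(0)
m_true, L = np.array([1.0, -2.0]), np.array([[1.0, 0.0], [0.5, 0.8]])
Sigma_true = L @ L.T
score = lambda x: -np.linalg.solve(Sigma_true, x - m_true)
xs = rng.normal(size=(256, 2)) @ L.T + m_true
m_hat, Sigma_hat = fit_gaussian_by_score_matching(xs, score)
```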