Showing 1–50 of 136 results for author: Bach, F

Searching in archive stat.
  1. arXiv:2408.16543

    stat.ML cs.LG math.FA math.OC

    Statistical and Geometrical properties of regularized Kernel Kullback-Leibler divergence

    Authors: Clémentine Chazal, Anna Korba, Francis Bach

    Abstract: In this paper, we study the statistical and geometrical properties of the Kullback-Leibler divergence with kernel covariance operators (KKL) introduced by Bach [2022]. Unlike the classical Kullback-Leibler (KL) divergence that involves density ratios, the KKL compares probability distributions through covariance operators (embeddings) in a reproducing kernel Hilbert space (RKHS), and computes the…

    Submitted 29 August, 2024; originally announced August 2024.

  2. arXiv:2407.17280

    stat.ML cs.LG

    Enhanced Feature Learning via Regularisation: Integrating Neural Networks and Kernel Methods

    Authors: Bertille Follain, Francis Bach

    Abstract: We propose a new method for feature learning and function estimation in supervised learning via regularised empirical risk minimisation. Our approach considers functions as expectations of Sobolev functions over all possible one-dimensional projections of the data. This framework is similar to kernel ridge regression, where the kernel is $\mathbb{E}_w ( k^{(B)}(w^\top x,w^\top x^\prime))$, with…

    Submitted 24 July, 2024; originally announced July 2024.

  3. arXiv:2311.03794

    math.OC cond-mat.dis-nn stat.ML

    On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions

    Authors: Simon Martin, Francis Bach, Giulio Biroli

    Abstract: We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost in a teacher-student setup. In line with previous works on the same neural architecture, the optimization is performed following the gradient flow on the population risk, where the average over data points is replaced by the expectation over their distribution, assumed to be Gaussian. W…

    Submitted 7 November, 2023; originally announced November 2023.

  4. arXiv:2310.10807

    stat.ML cs.CR cs.LG math.OC

    Regularization properties of adversarially-trained linear regression

    Authors: Antônio H. Ribeiro, Dave Zachariah, Francis Bach, Thomas B. Schön

    Abstract: State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial training is an effective approach to defend against it. Formulated as a min-max problem, it searches for the best solution when the training data were corrupted by the worst-case attacks. Linear models are among the simple models where vulnerabilities can be…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted (spotlight) NeurIPS 2023; A preliminary version of this work titled: "Surprises in adversarially-trained linear regression" was made available under a different identifier: arXiv:2205.12695

  5. Variational Gaussian approximation of the Kushner optimal filter

    Authors: Marc Lambert, Silvère Bonnabel, Francis Bach

    Abstract: In estimation theory, the Kushner equation provides the evolution of the probability density of the state of a dynamical system given continuous-time observations. Building upon our recent work, we propose a new way to approximate the solution of the Kushner equation through tractable variational Gaussian approximations of two proximal losses associated with the propagation and Bayesian update of…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Lecture Notes in Computer Science, 2023

  6. arXiv:2307.12754

    stat.ME cs.AI cs.LG math.ST

    Nonparametric Linear Feature Learning in Regression Through Regularisation

    Authors: Bertille Follain, Francis Bach

    Abstract: Representation learning plays a crucial role in automated feature selection, particularly in the context of high-dimensional data, where non-parametric methods often struggle. In this study, we focus on supervised learning scenarios where the pertinent information resides within a lower-dimensional linear subspace of the data, namely the multi-index model. If this subspace were known, it would gre…

    Submitted 7 August, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 45 pages, 5 figures

    MSC Class: 62G08; 62F10 (Primary); 65K10 (Secondary) ACM Class: I.2.6

  7. arXiv:2306.00742

    cs.LG cs.AI stat.ML

    The Galerkin method beats Graph-Based Approaches for Spectral Algorithms

    Authors: Vivien Cabannes, Francis Bach

    Abstract: Historically, the machine learning community has derived spectral decompositions from graph-based approaches. We break with this approach and prove the statistical and computational superiority of the Galerkin method, which consists in restricting the study to a small set of test functions. In particular, we introduce implementation tricks to deal with differential operators in large dimensions wi…

    Submitted 26 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Journal ref: AISTATS 2024

  8. arXiv:2305.19473

    stat.ML cs.LG stat.CO

    Chain of Log-Concave Markov Chains

    Authors: Saeed Saremi, Ji Won Park, Francis Bach

    Abstract: We introduce a theoretical framework for sampling from unnormalized densities based on a smoothing scheme that uses an isotropic Gaussian kernel with a single fixed noise scale. We prove one can decompose sampling from a density (minimal assumptions made on the density) into a sequence of sampling from log-concave conditional densities via accumulation of noisy measurements with equal noise levels…

    Submitted 28 September, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

  9. arXiv:2305.18399

    cs.LG cs.AI stat.ML

    On the impact of activation and normalization in obtaining isometric embeddings at initialization

    Authors: Amir Joudaki, Hadi Daneshmand, Francis Bach

    Abstract: In this paper, we explore the structure of the penultimate Gram matrix in deep neural networks, which contains the pairwise inner products of outputs corresponding to a batch of inputs. In several architectures it has been observed that this Gram matrix becomes degenerate with depth at initialization, which dramatically slows training. Normalization layers, such as batch or layer normalization, pl…

    Submitted 17 November, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

  10. arXiv:2305.16358

    cs.LG cs.AI stat.ML

    Differentiable Clustering with Perturbed Spanning Forests

    Authors: Lawrence Stewart, Francis S Bach, Felipe Llinares López, Quentin Berthet

    Abstract: We introduce a differentiable clustering method based on stochastic perturbations of minimum-weight spanning forests. This allows us to include clustering in end-to-end trainable pipelines, with efficient gradients. We show that our method performs well even in difficult settings, such as data sets with high noise and challenging geometries. We also formulate an ad hoc loss to efficiently learn fr…

    Submitted 6 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Journal ref: 37th Conference on Neural Information Processing Systems, Dec 2023, New Orleans, United States

  11. arXiv:2303.11669

    stat.ML cs.LG

    Universal Smoothed Score Functions for Generative Modeling

    Authors: Saeed Saremi, Rupesh Kumar Srivastava, Francis Bach

    Abstract: We consider the problem of generative modeling based on smoothing an unknown density of interest in $\mathbb{R}^d$ using factorial kernels with $M$ independent Gaussian channels with equal noise levels introduced by Saremi and Srivastava (2022). First, we fully characterize the time complexity of learning the resulting smoothed density in $\mathbb{R}^{Md}$, called M-density, by deriving a universa…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Technical Report

  12. arXiv:2303.03237

    stat.ML cs.LG math.ST stat.CO

    Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation

    Authors: David Holzmüller, Francis Bach

    Abstract: Sampling from Gibbs distributions $p(x) \propto \exp(-V(x)/\varepsilon)$ and computing their log-partition function are fundamental tasks in statistics, machine learning, and statistical physics. However, while efficient algorithms are known for convex potentials $V$, the situation is much more difficult in the non-convex case, where algorithms necessarily suffer from the curse of dimensionality i…

    Submitted 1 August, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Changes in v3: Minor corrections and improvements. Plots can be reproduced using the code at https://github.com/dholzmueller/sampling_experiments

  13. arXiv:2303.01372

    cs.LG stat.ML

    High-dimensional analysis of double descent for linear regression with random projections

    Authors: Francis Bach

    Abstract: We consider linear regression problems with a varying number of random projections, where we provably exhibit a double descent curve for a fixed prediction problem, with a high-dimensional analysis based on random matrix theory. We first consider the ridge regression estimator and review earlier results using classical notions from non-parametric statistics, namely degrees of freedom, also known a…

    Submitted 14 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

  14. arXiv:2302.06757

    stat.ML cs.LG math.NA math.ST

    Kernelized Diffusion maps

    Authors: Loucas Pillaud-Vivien, Francis Bach

    Abstract: Spectral clustering and diffusion maps are celebrated dimensionality reduction algorithms built on eigen-elements related to the diffusive structure of the data. The core of these procedures is the approximation of a Laplacian through a graph kernel approach; however, this local average construction is known to be cursed by the high dimension d. In this article, we build a different estimator of th…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: 19 pages, 1 Figure

  15. arXiv:2302.03459

    cs.LG math.ST stat.ML

    On the relationship between multivariate splines and infinitely-wide neural networks

    Authors: Francis Bach

    Abstract: We consider multivariate splines and show that they have a random feature expansion as infinitely wide neural networks with one-hidden layer and a homogeneous activation function which is the power of the rectified linear unit. We show that the associated function space is a Sobolev space on a Euclidean ball, with an explicit bound on the norms of derivatives. This link provides a new random featu…

    Submitted 1 March, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

  16. arXiv:2211.05641

    cs.LG cs.AI stat.ML

    Regression as Classification: Influence of Task Formulation on Neural Network Features

    Authors: Lawrence Stewart, Francis Bach, Quentin Berthet, Jean-Philippe Vert

    Abstract: Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss. However, practitioners often prefer to reformulate regression as a classification problem, observing that training on the cross entropy loss results in better performance. By focusing on two-layer ReLU networks, which can be fully characterized by measures over their feature spa…

    Submitted 1 March, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

  17. arXiv:2206.13285

    cs.IT cs.LG math.OC stat.ML

    Sum-of-Squares Relaxations for Information Theory and Variational Inference

    Authors: Francis Bach

    Abstract: We consider extensions of the Shannon relative entropy, referred to as $f$-divergences. Three classical related computational problems are typically associated with these divergences: (a) estimation from moments, (b) computing normalizing integrals, and (c) variational inference in probabilistic models. These problems are related to one another through convex duality, and for all of them, there are…

    Submitted 18 September, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

  18. arXiv:2206.04613

    cs.LG stat.ML

    Explicit Regularization in Overparametrized Models via Noise Injection

    Authors: Antonio Orvieto, Anant Raj, Hans Kersting, Francis Bach

    Abstract: Injecting noise within gradient descent has several desirable features, such as smoothing and regularizing properties. In this paper, we investigate the effects of injecting noise before computing a gradient step. We demonstrate that small perturbations can induce explicit regularization for simple models based on the L1-norm, group L1-norms, or nuclear norms. However, when applied to overparametr…

    Submitted 22 January, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: Accepted at AISTATS 2023; 23 pages

  19. arXiv:2205.15902

    stat.ML cs.LG math.ST

    Variational inference via Wasserstein gradient flows

    Authors: Marc Lambert, Sinho Chewi, Francis Bach, Silvère Bonnabel, Philippe Rigollet

    Abstract: Along with Markov chain Monte Carlo (MCMC) methods, variational inference (VI) has emerged as a central computational approach to large-scale Bayesian inference. Rather than sampling from the true posterior $π$, VI aims at producing a simple but effective approximation $\hat π$ to $π$ for which summary statistics are easy to compute. However, unlike the well-studied MCMC methodology, algorithmic g…

    Submitted 21 April, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: 52 pages, 15 figures

  20. arXiv:2205.13255

    cs.LG cs.AI cs.IR stat.ML

    Active Labeling: Streaming Stochastic Gradients

    Authors: Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi

    Abstract: The workhorse of machine learning is stochastic gradient descent. To access stochastic gradients, it is common to consider iteratively input/output pairs of a training dataset. Interestingly, it appears that one does not need full supervision to access stochastic gradients, which is the main motivation of this paper. After formalizing the "active labeling" problem, which focuses on active learning…

    Submitted 7 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: 38 pages (9 main pages), 9 figures

    MSC Class: 68T37 ACM Class: G.3

  21. arXiv:2204.07879

    cs.LG stat.ML

    Polynomial-time Sparse Measure Recovery: From Mean Field Theory to Algorithm Design

    Authors: Hadi Daneshmand, Francis Bach

    Abstract: Mean field theory has provided theoretical insights into various algorithms by letting the problem size tend to infinity. We argue that the applications of mean-field theory go beyond theoretical insights as it can inspire the design of practical algorithms. Leveraging mean-field analyses in physics, we propose a novel algorithm for sparse measure recovery. For sparse measures over $\mathbb{R}$, w…

    Submitted 12 February, 2023; v1 submitted 16 April, 2022; originally announced April 2022.

  22. arXiv:2202.08545

    cs.IT cs.LG math.OC stat.ML

    Information Theory with Kernel Methods

    Authors: Francis Bach

    Abstract: We consider the analysis of probability distributions through their associated covariance operators from reproducing kernel Hilbert spaces. We show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy, and share many of their properties. They come together with efficient estimation algorithms from v…

    Submitted 26 August, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

  23. arXiv:2202.02831

    stat.ML cs.LG math.OC

    Anticorrelated Noise Injection for Improved Generalization

    Authors: Antonio Orvieto, Hans Kersting, Frank Proske, Francis Bach, Aurelien Lucchi

    Abstract: Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models. Usually, uncorrelated noise is used in such perturbed gradient descent (PGD) methods. It is, however, not known if this is optimal or whether other types of noise could provide better generalization performance. In this paper, we zoom in on the problem of correlating th…

    Submitted 19 May, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

    Comments: 22 pages, 16 figures

  24. arXiv:2201.11980

    stat.ML cs.LG

    Differential Privacy Guarantees for Stochastic Gradient Langevin Dynamics

    Authors: Théo Ryffel, Francis Bach, David Pointcheval

    Abstract: We analyse the privacy leakage of noisy stochastic gradient descent by modeling Rényi divergence dynamics with Langevin diffusions. Inspired by recent work on non-stochastic algorithms, we derive similar desirable properties in the stochastic setting. In particular, we prove that the privacy loss converges exponentially fast for smooth and strongly convex objectives under constant step size, which…

    Submitted 5 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

  25. arXiv:2112.01907

    stat.ML cs.LG math.ST

    Near-optimal estimation of smooth transport maps with kernel sums-of-squares

    Authors: Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, Alessandro Rudi

    Abstract: It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds. However, rather than the distance itself, the object of interest for applications such as generative modeling is the underlying optimal transport map. Hence, computational and statistical guarantees need to b…

    Submitted 29 December, 2021; v1 submitted 3 December, 2021; originally announced December 2021.

  26. arXiv:2111.11306

    stat.ML cs.LG

    Learning PSD-valued functions using kernel sums-of-squares

    Authors: Boris Muzellec, Francis Bach, Alessandro Rudi

    Abstract: Shape constraints such as positive semi-definiteness (PSD) for matrices or convexity for functions play a central role in many applications in machine learning and sciences, including metric learning, optimal transport, and economics. Yet, very few function models exist that enforce PSD-ness or convexity with good empirical performance and theoretical guarantees. In this paper, we introduce a kern…

    Submitted 24 January, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

  27. arXiv:2106.09994

    cs.LG stat.ML

    A Note on Optimizing Distributions using Kernel Mean Embeddings

    Authors: Boris Muzellec, Francis Bach, Alessandro Rudi

    Abstract: Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. When the kernel is characteristic, mean embeddings can be used to define a distance between probability measures, known as the maximum mean discrepancy (MMD). A well-known advantage of mean embeddings and MMD is their low…

    Submitted 27 June, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

  28. arXiv:2106.07644

    math.OC cs.LG cs.MA math.PR stat.ML

    A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip

    Authors: Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor

    Abstract: We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, o…

    Submitted 27 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2102.06035

  29. arXiv:2106.03970

    stat.ML cs.AI cs.LG

    Batch Normalization Orthogonalizes Representations in Deep Random Networks

    Authors: Hadi Daneshmand, Amir Joudaki, Francis Bach

    Abstract: This paper underlines a subtle property of batch-normalization (BN): Successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network. We establish a non-asymptotic characterization of the interplay between depth, width, and the orthogonality of deep representations. More precisely, under a mild assumption…

    Submitted 7 June, 2021; originally announced June 2021.

  30. arXiv:2105.15069

    cs.LG stat.ML

    On the Consistency of Max-Margin Losses

    Authors: Alex Nowak-Vila, Alessandro Rudi, Francis Bach

    Abstract: The foundational concept of Max-Margin in machine learning is ill-posed for output spaces with more than two labels such as in structured prediction. In this paper, we show that the Max-Margin loss can only be consistent to the classification task under highly restrictive assumptions on the discrete loss measuring the error between outputs. These conditions are satisfied by distances defined in tr…

    Submitted 21 March, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

  31. arXiv:2102.02789

    cs.LG cs.AI stat.ML

    Disambiguation of weak supervision with exponential convergence rates

    Authors: Vivien Cabannes, Francis Bach, Alessandro Rudi

    Abstract: Machine learning approached through supervised learning requires expensive annotation of data. This motivates weakly supervised learning, where data are annotated with incomplete yet discriminative information. In this paper, we focus on partial labelling, an instance of weak supervision where, from a given input, we are given a set of potential targets. We review a disambiguation principle to rec…

    Submitted 15 July, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: 22 pages; 6 figures

    MSC Class: 68Q32 ACM Class: I.2.6; G.3; F.2.2

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  32. arXiv:2102.00760

    stat.ML cs.AI cs.LG math.ST

    Fast rates in structured prediction

    Authors: Vivien Cabannes, Alessandro Rudi, Francis Bach

    Abstract: Discrete supervised learning problems such as classification are often tackled by introducing a continuous surrogate problem akin to regression. Bounding the original error, between estimate and solution, by the surrogate error endows discrete problems with convergence rates already shown for continuous instances. Yet, current approaches do not leverage the fact that discrete problems are essentia…

    Submitted 15 July, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: 14 main pages, 3 main figures, 43 pages, 4 figures (with appendix)

    MSC Class: 68T05 ACM Class: I.2.6; F.2.2; G.3

    Journal ref: Conference on Learning Theory, PMLR 134, 2021

  33. arXiv:2012.11978

    math.OC cs.LG stat.ML

    Finding Global Minima via Kernel Approximations

    Authors: Alessandro Rudi, Ulysse Marteau-Ferey, Francis Bach

    Abstract: We consider the global minimization of smooth functions based solely on function evaluations. Algorithms that achieve the optimal number of function evaluations for a given precision level typically rely on explicitly constructing an approximation of the function which is then minimized with algorithms that have exponential running-time complexity. In this paper, we consider an approach that joint…

    Submitted 22 December, 2020; originally announced December 2020.

  34. arXiv:2012.09775

    stat.ME cs.CR

    Differential privacy and noisy confidentiality concepts for European population statistics

    Authors: Fabian Bach

    Abstract: The paper aims to give an overview of various approaches to statistical disclosure control based on random noise that are currently being discussed for official population statistics and censuses. A particular focus is on a stringent delineation between different concepts influencing the discussion: we separate clearly between risk measures, noise distributions and output mechanisms - putting thes…

    Submitted 17 December, 2020; originally announced December 2020.

    Comments: 37 pages, 7 figures, extended abstract accepted for NTTS 2021

  35. arXiv:2010.00892

    cs.LG math.OC stat.ML

    Variance-Reduced Methods for Machine Learning

    Authors: Robert M. Gower, Mark Schmidt, Francis Bach, Peter Richtarik

    Abstract: Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last 8 years have seen an exciting new development: variance reduction (VR) for stochastic optimization methods. These VR methods excel in settings where more than one pass through the training data is allowed, achieving a faster conver…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: 16 pages, 7 figures, 1 table

    MSC Class: 65K05; 68T99 ACM Class: G.1.6

  36. arXiv:2009.14397

    stat.ML cs.LG

    Deep Equals Shallow for ReLU Networks in Kernel Regimes

    Authors: Alberto Bietti, Francis Bach

    Abstract: Deep networks are often considered to be more expressive than shallow ones in terms of approximation. Indeed, certain functions can be approximated by deep networks provably more efficiently than by shallow ones; however, no tractable algorithms are known for learning such deep models. Separately, a recent line of work has shown that deep networks trained with gradient descent may behave like (tra…

    Submitted 26 August, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

  37. arXiv:2009.04324

    stat.ML cs.LG

    Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning

    Authors: Vivien Cabannes, Loucas Pillaud-Vivien, Francis Bach, Alessandro Rudi

    Abstract: As annotations of data can be scarce in large-scale practical problems, leveraging unlabelled examples is one of the most important aspects of machine learning. This is the aim of semi-supervised learning. To benefit from the access to unlabelled data, it is natural to diffuse smoothly knowledge of labelled data to unlabelled ones. This leads to the use of Laplacian regularization. Yet, current i…

    Submitted 29 November, 2021; v1 submitted 9 September, 2020; originally announced September 2020.

    Comments: 38 pages, 6 figures

    Journal ref: NeurIPS 2021

  38. arXiv:2007.01012

    cs.LG stat.ML

    Consistent Structured Prediction with Max-Min Margin Markov Networks

    Authors: Alex Nowak-Vila, Francis Bach, Alessandro Rudi

    Abstract: Max-margin methods for binary classification such as the support vector machine (SVM) have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$), or more generally structural SVMs. Unfortunately, these methods are statistically inconsistent when the relationship between inputs and labels is far from deterministic. We overcome such limitations by d…

    Submitted 27 July, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

  39. arXiv:2006.09261

    cs.LG cs.CV stat.ML

    Structured and Localized Image Restoration

    Authors: Thomas Eboli, Alex Nowak-Vila, Jian Sun, Francis Bach, Jean Ponce, Alessandro Rudi

    Abstract: We present a novel approach to image restoration that leverages ideas from localized structured prediction and non-linear multi-task learning. We optimize a penalized energy function regularized by a sum of terms measuring the distance between patches to be restored and clean patches from an external database gathered beforehand. The resulting estimator comes with strong statistical guarantees lev…

    Submitted 16 June, 2020; originally announced June 2020.

  40. arXiv:2006.08212

    cs.LG cs.MA math.OC stat.ML

    Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

    Authors: Raphaël Berthier, Francis Bach, Pierre Gaillard

    Abstract: In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle θ_*, X \rangle$ between the random output $Y$ and the random feature vector $Φ(U)$, a potentially non-linear transformation of the inputs $U$. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-square r…

    Submitted 27 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

  41. arXiv:2006.04593

    cs.LG cs.CR stat.ML

    ARIANN: Low-Interaction Privacy-Preserving Deep Learning via Function Secret Sharing

    Authors: Théo Ryffel, Pierre Tholoniat, David Pointcheval, Francis Bach

    Abstract: We propose AriaNN, a low-interaction privacy-preserving framework for private neural network training and inference on sensitive data. Our semi-honest 2-party computation protocol (with a trusted dealer) leverages function secret sharing, a recent lightweight cryptographic protocol that allows us to achieve an efficient online phase. We design optimized primitives for the building blocks of neural…

    Submitted 28 October, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 26 pages

  42. arXiv:2003.02395

    stat.ML cs.LG

    A Simple Convergence Proof of Adam and Adagrad

    Authors: Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier

    Abstract: We provide a simple proof of convergence covering both the Adam and Adagrad adaptive optimization algorithms when applied to smooth (possibly non-convex) objective functions with bounded gradients. We show that in expectation, the squared norm of the objective gradient averaged over the trajectory has an upper-bound which is explicit in the constants of the problem, parameters of the optimizer, th…

    Submitted 17 October, 2022; v1 submitted 4 March, 2020; originally announced March 2020.

    Comments: final TMLR version

  43. arXiv:2003.01652

    stat.ML cs.LG

    Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks

    Authors: Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

    Abstract: Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used. We here investigate this phenomenon by revisiting the connection between random initialization in deep networks and spectral instabilities in products of random matrices. Given the rich literature on random mat…

    Submitted 11 June, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

  44. arXiv:2003.00920  [pdf, other]

    cs.LG cs.AI stat.ML

    Structured Prediction with Partial Labelling through the Infimum Loss

    Authors: Vivien Cabannes, Alessandro Rudi, Francis Bach

    Abstract: Annotating datasets is one of the main costs in supervised learning today. The goal of weak supervision is to enable models to learn using only forms of labelling that are cheaper to collect, such as partial labelling. This is a type of incomplete annotation where, for each datapoint, supervision is cast as a set of labels containing the real one. The problem of supervised learning with partial lab…

    Submitted 9 September, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: 8 pages for main paper, 27 with main paper, 13 figures, 3 tables

    MSC Class: 68Q32 ACM Class: I.2.6; G.3

    Journal ref: Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1230-1239, 2020

  45. arXiv:2002.09718  [pdf, other]

    cs.LG math.OC stat.ML

    Safe Screening for the Generalized Conditional Gradient Method

    Authors: Yifan Sun, Francis Bach

    Abstract: The conditional gradient method (CGM) has been widely used for fast sparse approximation, having a low per-iteration computational cost for structured sparse regularizers. We explore the sparsity-acquiring properties of a generalized CGM (gCGM), where the constraint is replaced by a penalty function based on a gauge penalty; this can be done without significantly increasing the per-iteration compu…

    Submitted 22 February, 2020; originally announced February 2020.

  46. arXiv:2002.08695  [pdf, other]

    cs.LG math.OC stat.ML

    Stochastic Optimization for Regularized Wasserstein Estimators

    Authors: Marin Ballu, Quentin Berthet, Francis Bach

    Abstract: Optimal transport is a foundational problem in optimization that allows one to compare probability distributions while taking into account geometric aspects. Its optimal objective value, the Wasserstein distance, provides an important loss between distributions that has been used in many applications throughout machine learning and statistics. Recent algorithmic progress on this problem and its regul…

    Submitted 20 February, 2020; originally announced February 2020.

  47. arXiv:2002.08676  [pdf, other]

    cs.LG math.OC stat.ML

    Learning with Differentiable Perturbed Optimizers

    Authors: Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

    Abstract: Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths). Although these discrete decisions are easily computed, they break the back-propagation of computational graphs. In order to expand the scope of learning problems that can be solved in an end-to-end fashion, we propose a systematic method to tran…

    Submitted 9 June, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

  48. arXiv:2002.04486  [pdf, other]

    math.OC cs.LG stat.ML

    Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss

    Authors: Lenaic Chizat, Francis Bach

    Abstract: Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. Towards understanding this phenomenon, we analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations. We show that the limits of the gradient flow on exponential…

    Submitted 22 June, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Journal ref: Conference on Learning Theory, Jul 2020, Graz, Austria

  49. arXiv:1911.13254  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Music Source Separation in the Waveform Domain

    Authors: Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

    Abstract: Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song. Such components include voice, bass, drums and any other accompaniments. Contrary to many audio synthesis tasks where the best performances are achieved by models that directly generate the waveform, the state-of-the-art in source…

    Submitted 28 April, 2021; v1 submitted 27 November, 2019; originally announced November 2019.

  50. arXiv:1909.01174  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed

    Authors: Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

    Abstract: We study the problem of source separation for music using deep learning with four known sources: drums, bass, vocals and other accompaniments. State-of-the-art approaches predict soft masks over mixture spectrograms, while methods working on the waveform are lagging behind as measured on the standard MusDB benchmark. Our contribution is twofold. (i) We introduce a simple convolutional and recurren…

    Submitted 3 September, 2019; originally announced September 2019.