DOI: 10.5555/3618408
ICML'23: Proceedings of the 40th International Conference on Machine Learning
2023 Proceeding
Publisher:
  • JMLR.org
Conference:
ICML'23: International Conference on Machine Learning, Honolulu, Hawaii, USA, July 23–29, 2023
Published:
23 July 2023

research-article
Data structures for density estimation
Article No.: 1, Pages 1–18

We study statistical/computational tradeoffs for the following density estimation problem: given k distributions v_1, ..., v_k over a discrete domain of size n, and sampling access to a distribution p, identify v_i that is "close" to p. Our main result is ...

research-article
ClusterFuG: clustering fully connected graphs by multicut
Article No.: 2, Pages 19–30

We propose a graph clustering formulation based on multicut (a.k.a. weighted correlation clustering) on the complete graph. Our formulation does not need specification of the graph topology as in the original sparse formulation of multicut, making our ...

research-article
Generalization on the unseen, logic reasoning and degree curriculum
Article No.: 3, Pages 31–60

This paper considers the learning of logical (Boolean) functions with focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data ...

research-article
Toward large kernel models
Article No.: 4, Pages 61–78

Recent studies indicate that kernel machines can often perform similarly to or better than deep neural networks (DNNs) on small datasets. The interest in kernel machines has been additionally bolstered by the discovery of their equivalence to wide neural ...

research-article
Expertise trees resolve knowledge limitations in collective decision-making
Article No.: 5, Pages 79–90

Experts advising decision-makers are likely to display expertise which varies as a function of the problem instance. In practice, this may lead to sub-optimal or discriminatory decisions against minority cases. In this work, we model such changes in depth ...

research-article
Comparison of meta-learners for estimating multi-valued treatment heterogeneous effects
Article No.: 6, Pages 91–132

Conditional Average Treatment Effects (CATE) estimation is one of the main challenges in causal inference with observational data. In addition to machine learning-based models, nonparametric estimators called meta-learners have been developed to estimate ...

research-article
BNN-DP: robustness certification of Bayesian neural networks via dynamic programming
Article No.: 7, Pages 133–151

In this paper, we introduce BNN-DP, an efficient algorithmic framework for analysis of adversarial robustness of Bayesian Neural Networks (BNNs). Given a compact set of input points T ⊂ ℝ^n, BNN-DP computes lower and upper bounds on the BNN's predictions ...

research-article
SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
Article No.: 8, Pages 152–168

The Sharpness Aware Minimization (SAM) optimization algorithm has been shown to control large eigenvalues of the loss Hessian and provide generalization benefits in a variety of settings. The original motivation for SAM was a modified loss function which ...

research-article
Second-order regression models exhibit progressive sharpening to the edge of stability
Article No.: 9, Pages 169–195

Recent studies of gradient descent with large step sizes have shown that there is often a regime with an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by a stabilization of the eigenvalue near the ...

research-article
Global optimality of Elman-type RNNs in the mean-field regime
Article No.: 10, Pages 196–227

We analyze Elman-type Recurrent Neural Networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We ...

research-article
SemSup-XC: semantic supervision for zero and few-shot extreme classification
Article No.: 11, Pages 228–247

Extreme classification (XC) involves predicting over large numbers of classes (thousands to millions), with real-world applications like news article classification and e-commerce product tagging. The zero-shot version of this task requires generalization ...

research-article
Adaptive IMLE for few-shot pretraining-free generative modelling
Article No.: 12, Pages 248–264

Despite their success on large datasets, GANs have been difficult to apply in the few-shot setting, where only a limited number of training examples are provided. Due to mode collapse, GANs tend to ignore some training examples, causing overfitting to a ...

research-article
Scaling laws for generative mixed-modal language models
Article No.: 13, Pages 265–279

Generative language models define distributions over sequences of tokens that can represent essentially any combination of data modalities (e.g., any permutation of image tokens from VQ-VAEs, speech tokens from HuBERT, BPE tokens for language or code, and ...

research-article
Hypothesis transfer learning with surrogate classification losses: generalization bounds through algorithmic stability
Article No.: 14, Pages 280–303

Hypothesis transfer learning (HTL) contrasts with domain adaptation by allowing knowledge from a previous task, named the source, to be leveraged in a new one, the target, without requiring access to the source data. Indeed, HTL relies only on a hypothesis learnt from such ...

research-article
Constrained causal Bayesian optimization
Article No.: 15, Pages 304–321

We propose constrained causal Bayesian optimization (cCBO), an approach for finding interventions in a known causal graph that optimize a target variable under some constraints. cCBO first reduces the search space by exploiting the graph structure and, if ...

research-article
Explaining the effects of non-convergent MCMC in the training of energy-based models
Article No.: 16, Pages 322–336

In this paper, we quantify the impact of using non-convergent Markov chains to train energy-based models (EBMs). In particular, we show analytically that EBMs trained with non-persistent short runs to estimate the gradient can perfectly reproduce a set of ...

research-article
Using large language models to simulate multiple humans and replicate human subject studies
Article No.: 17, Pages 337–371

We introduce a new type of test, called a Turing Experiment (TE), for evaluating to what extent a given language model, such as GPT models, can simulate different aspects of human behavior. A TE can also reveal consistent distortions in a language model's ...

research-article
Interventional causal representation learning
Article No.: 18, Pages 372–407

Causal representation learning seeks to extract high-level latent factors from low-level sensory data. Most existing methods rely on observational data and structural assumptions (e.g., conditional independence) to identify the latent factors. However, ...

research-article
Sequential underspecified instrument selection for cause-effect estimation
Article No.: 19, Pages 408–420

Instrumental variable (IV) methods are used to estimate causal effects in settings with unobserved confounding, where we cannot directly experiment on the treatment variable. Instruments are variables which only affect the outcome indirectly via the ...

research-article
Atari-5: distilling the arcade learning environment down to five games
Article No.: 20, Pages 421–438

The Arcade Learning Environment (ALE) has become an essential benchmark for assessing the performance of reinforcement learning algorithms. However, the computational cost of generating results on the entire 57-game dataset limits ALE's use and makes the ...

research-article
Towards credible visual model interpretation with path attribution
Article No.: 21, Pages 439–457

With its inspirational roots in game theory, the path attribution framework stands out among post-hoc model interpretation techniques due to its axiomatic nature. However, recent developments show that despite being axiomatic, path attribution methods can ...

research-article
Convergence of first-order methods for constrained nonconvex optimization with dependent data
Article No.: 22, Pages 458–489

We focus on analyzing the classical stochastic projected gradient methods under a general dependent data sampling scheme for constrained smooth nonconvex optimization. We show the worst-case rate of convergence Õ(t^{-1/4}) and complexity Õ(ε^{-4}) for achieving ...

research-article
Recasting self-attention with holographic reduced representations
Article No.: 23, Pages 490–507

In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the O(T^2) memory and O(T^2 H) compute costs can make using transformers infeasible. Motivated ...

research-article
The saddle-point method in differential privacy
Article No.: 24, Pages 508–528

We characterize the differential privacy guarantees of privacy mechanisms in the large-composition regime, i.e., when a privacy mechanism is sequentially applied a large number of times to sensitive data. Via exponentially tilting the privacy loss random ...

research-article
Nonlinear advantage: trained networks might not be as complex as you think
Article No.: 25, Pages 529–546

We perform an empirical study of the behaviour of deep networks when fully linearizing some of their feature channels through a sparsity prior on the overall number of nonlinear units in the network. In experiments on image classification and machine ...

research-article
A simple zero-shot prompt weighting technique to improve prompt ensembling in text-image models
Article No.: 26, Pages 547–568

Contrastively trained text-image models have the remarkable ability to perform zero-shot classification, that is, classifying previously unseen images into categories that the model has never been explicitly trained to identify. However, these zero-shot ...

research-article
On the privacy-robustness-utility trilemma in distributed learning
Article No.: 27, Pages 569–626

The ubiquity of distributed machine learning (ML) in sensitive public domain applications calls for algorithms that protect data privacy, while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied ...

research-article
Differentially private distributed Bayesian linear regression with MCMC
Article No.: 28, Pages 627–641

We propose a novel Bayesian inference framework for distributed differentially private linear regression. We consider a distributed setting where multiple parties hold parts of the data and share certain summary statistics of their portions in privacy-...

research-article
Robust and scalable Bayesian online changepoint detection
Article No.: 29, Pages 642–663

This paper proposes an online, provably robust, and scalable Bayesian approach for change-point detection. The resulting algorithm has key advantages over previous work: it provides provable robustness by leveraging the generalised Bayesian perspective, ...

research-article
Neural Wasserstein gradient flows for discrepancies with Riesz kernels
Article No.: 30, Pages 664–690

Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals with nonsmooth Riesz kernels show a rich structure as singular measures can become absolutely continuous ones and conversely. In this paper we contribute to the understanding of such ...
