Showing 1–2 of 2 results for author: He, X O

Searching in archive cs.
  1. arXiv:2407.04153  [pdf, other]

    cs.LG cs.AI

    Mixture of A Million Experts

    Authors: Xu Owen He

    Abstract: The feedforward (FFW) layers in standard transformer architectures incur a linear increase in computational costs and activation memory as the hidden layer width grows. Sparse mixture-of-experts (MoE) architectures have emerged as a viable approach to address this issue by decoupling model size from computational cost. The recent discovery of the fine-grained MoE scaling law shows that higher gran…

    Submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2211.11747  [pdf, other]

    cs.LG cs.CV

    NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research

    Authors: Jorg Bornschein, Alexandre Galashov, Ross Hemsley, Amal Rannen-Triki, Yutian Chen, Arslan Chaudhry, Xu Owen He, Arthur Douillard, Massimo Caccia, Qixuang Feng, Jiajun Shen, Sylvestre-Alvise Rebuffi, Kitty Stacpoole, Diego de las Casas, Will Hawkins, Angeliki Lazaridou, Yee Whye Teh, Andrei A. Rusu, Razvan Pascanu, Marc'Aurelio Ranzato

    Abstract: A shared goal of several machine learning communities like continual learning, meta-learning and transfer learning, is to design algorithms and models that efficiently and robustly adapt to unseen tasks. An even more ambitious goal is to build models that never stop adapting, and that become increasingly more efficient through time by suitably transferring the accrued knowledge. Beyond the study o…

    Submitted 16 May, 2023; v1 submitted 15 November, 2022; originally announced November 2022.