[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Choshen, Leshem; Cotterell, Ryan; Hu, Michael Y.; Linzen, Tal; Mueller, Aaron; Ross, Candace; Warstadt, Alex; Wilcox, Ethan; Williams, Adina; Zhuang, Chengxu

Computer Science > Computation and Language

arXiv:2404.06214 (cs)

[Submitted on 9 Apr 2024 (v1), last revised 27 Jul 2024 (this version, v2)]

Title:[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Authors:Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang

View PDF HTML (experimental)

Abstract:After last year's successful BabyLM Challenge, the competition will be hosted again in 2024/2025. The overarching goals of the challenge remain the same; however, some of the competition rules will be different. The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-inspired benchmarks, or analysis techniques. Second, we are relaxing the rules around pretraining data, and will now allow participants to construct their own datasets provided they stay within the 100M-word or 10M-word budget. Third, we introduce a multimodal vision-and-language track, and will release a corpus of 50% text-only and 50% image-text multimodal data as a starting point for LM model training. The purpose of this CfP is to provide rules for this year's challenge, explain these rule changes and their rationale in greater detail, give a timeline of this year's competition, and provide answers to frequently asked questions from last year's challenge.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2404.06214 [cs.CL]
	(or arXiv:2404.06214v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.06214

Submission history

From: Alex Warstadt [view email]
[v1] Tue, 9 Apr 2024 11:04:50 UTC (157 KB)
[v2] Sat, 27 Jul 2024 18:50:28 UTC (157 KB)

Computer Science > Computation and Language

Title:[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators