Transformers v4.45 was just released, and it introduces a change I would not have expected: modularity in modeling files.

Transformers has always been strict about its single-file policy: a model must be defined in a single file rather than through layers of abstraction. So, what changed, and why are we seemingly moving away from the concept that made transformers what it is today, with 250+ model architectures across many modalities?

We are responding to an issue that affects both contributors and maintainers: contributing a model to transformers is long and tedious. It often results in PRs spanning 20+ files, with thousands of lines of code. We wanted a solution that removes that constraint from contributors, significantly easing model additions from model authors and community members.

Still, the single-file policy is at the core of Transformers. Controversial to some because of the constraints it imposes, we know for a fact that it has enabled:
- Researchers to experiment with and tweak the modeling files,
- Students to read through the code without jumping from abstraction to abstraction,
- Community members to contribute models without first needing to understand the rest of the overwhelmingly large package.

Therefore, we've worked on "Modular Transformers," an approach to designing modeling files in a modular way while maintaining the single-file policy. Contributing a model to Transformers can now be done by subclassing other models, inheriting all their attributes, methods, and forward definitions. The tool we contribute unravels that inheritance into a single file.

The RoBERTa "Modular" modeling file above defines the base and masked LM models. This is then unraveled into a 1,700+ line single-file model definition, which can be inspected, debugged, tweaked, and adapted. The modular definition itself spans ~30 lines of code: only the differences are now explicit.
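To build intuition for the subclass-then-unravel idea, here is a minimal plain-Python sketch. The class names and the padding-offset detail are illustrative only, not the real transformers API; they merely mimic the kind of small difference (RoBERTa shifting position ids past the padding index relative to BERT) that a modular file would express as an override:

```python
# Hypothetical stand-in for an existing base architecture (e.g. BERT).
class BaseEmbeddings:
    def __init__(self, pad_token_id=0):
        self.padding_idx = pad_token_id

    def position_ids(self, length):
        # Base model: positions simply start at 0.
        return list(range(length))


# A "modular"-style definition: subclass and spell out ONLY the difference.
class RobertaLikeEmbeddings(BaseEmbeddings):
    def position_ids(self, length):
        # RoBERTa-style twist: positions start after the padding index.
        return [i + self.padding_idx + 1 for i in range(length)]


emb = RobertaLikeEmbeddings(pad_token_id=1)
print(emb.position_ids(3))  # [2, 3, 4]
```

The modular file in transformers works at this level of terseness; the conversion tool then expands the inheritance so the shipped modeling file contains the full, standalone definition.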
This is particularly important in the wake of LLMs, with each released model being only slightly different in architecture; most of the difference lies in the data used for the pretrained checkpoints.

While the "Modular" and "Single-file" model definitions serve different purposes, they should both result in the exact same code execution. We aim for no magic and no hidden behavior: define a code path, a property, or a method in the modular file, and you'll see it reflected in the single file.

With this now merged, we are already seeing model contributions come in at 215 LoC for the modular file, unraveled into several files, with the single-file definition standing at 1,300+ LoC.

Now, please come and help us break it! It's experimental and brittle, but it should drastically lower the barrier to entry for model contribution. Come and contribute your model to make it accessible to the community at large.
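As a closing illustration of the "exact same code execution" guarantee mentioned above: expanding the inheritance by hand must yield identical behavior. This toy example (plain Python, hypothetical class names, not the actual conversion tool) shows a modular-style definition next to what its unraveled, standalone form would look like:

```python
# "Modular" style: inherit everything, override only the one difference.
class Base:
    def scale(self):
        return 1.0

    def forward(self, x):
        return x * self.scale()


class ModularChild(Base):
    def scale(self):
        return 2.0


# "Unraveled" style: the same definition written out as one standalone
# class, the way the conversion tool would emit it into a single file.
class UnraveledChild:
    def scale(self):
        return 2.0

    def forward(self, x):
        return x * self.scale()


# Both paths must execute identically.
assert ModularChild().forward(3) == UnraveledChild().forward(3) == 6.0
```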