PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency

Abe, Kenshin; Chubachi, Kaizaburo; Fujita, Yasuhiro; Hirokawa, Yuta; Imajo, Kentaro; Kataoka, Toshiki; Komatsu, Hiroyoshi; Mikami, Hiroaki; Mogami, Tsuguo; Murai, Shogo; Nakago, Kosuke; Nishino, Daisuke; Ogawa, Toru; Okanohara, Daisuke; Ozaki, Yoshihiko; Sano, Shotaro; Suzuki, Shuji; Xu, Tianqi; Yanase, Toshihiko

arXiv:2410.07563 (cs)

[Submitted on 10 Oct 2024]

Authors: Kenshin Abe, Kaizaburo Chubachi, Yasuhiro Fujita, Yuta Hirokawa, Kentaro Imajo, Toshiki Kataoka, Hiroyoshi Komatsu, Hiroaki Mikami, Tsuguo Mogami, Shogo Murai, Kosuke Nakago, Daisuke Nishino, Toru Ogawa, Daisuke Okanohara, Yoshihiko Ozaki, Shotaro Sano, Shuji Suzuki, Tianqi Xu, Toshihiko Yanase (Preferred Elements, Inc.)

Abstract: We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency. The model was trained from scratch on 2 trillion tokens, using techniques such as QK Normalization and Z-Loss to keep training stable. Post-training techniques, including Supervised Fine-Tuning and Direct Preference Optimization, were then applied to refine the model's performance. Benchmark evaluations suggest that PLaMo-100B performs well, particularly on Japanese-specific tasks, achieving results that are competitive with frontier models such as GPT-4.
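
The abstract names Z-Loss without defining it. As a rough illustration only, the sketch below follows one commonly used formulation, in which a penalty proportional to the squared log of the softmax normalizer is added to the cross-entropy loss so that the logits do not drift to large magnitudes. The function names and the default coefficient `z_coeff=1e-4` are assumptions made for this example; nothing here is taken from the PLaMo-100B training code.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def cross_entropy_with_z_loss(logits, target, z_coeff=1e-4):
    """Return (total, cross_entropy, z_loss) for a single token.

    z_coeff is a hypothetical default chosen for illustration.
    """
    log_z = log_sum_exp(logits)       # log of the softmax normalizer Z
    ce = log_z - logits[target]       # standard cross-entropy for one token
    z_loss = z_coeff * log_z ** 2     # penalty that pulls log Z toward 0
    return ce + z_loss, ce, z_loss


if __name__ == "__main__":
    total, ce, z = cross_entropy_with_z_loss([2.0, 0.5, -1.0], target=0)
    print(f"cross-entropy={ce:.4f}  z-loss={z:.6f}  total={total:.4f}")
```

Because the penalty depends only on log Z, it leaves the predicted probabilities unchanged for a given set of logits; it only discourages the normalizer from growing or shrinking without bound.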

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2410.07563 [cs.CL]
	(or arXiv:2410.07563v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2410.07563

Submission history

From: Yasuhiro Fujita
[v1] Thu, 10 Oct 2024 02:59:36 UTC (90 KB)
