Jamba (language model)

From Wikipedia, the free encyclopedia

Jamba
Developer(s): AI21 Labs
Initial release: 28 March 2024
Type: Large language model
License: Apache 2.0 License

Jamba is an open-weights large language model (LLM) developed by AI21 Labs.[1][2] It is built on a novel hybrid architecture that combines the Mamba state space model (SSM) with transformer layers.[3][1][4] The model has 52 billion parameters and is trained using a mixture-of-experts (MoE) technique, with about 12 billion parameters active for any given token.[2][1] Jamba is the largest Mamba-variant LLM created at the time of its release; it offers a context window of up to 256K tokens and can fit up to 140K tokens of context on a single 80 GB GPU.[2][3]
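
The sparse mixture-of-experts layers are what let the total parameter count (52 billion) be much larger than the number of parameters used for any one token (about 12 billion): a learned router picks only a few experts per token and skips the rest. The Python sketch below illustrates the general top-k expert-routing idea only; the class name, expert count, hidden sizes, and top-2 routing are illustrative assumptions, not AI21's published Jamba configuration.

    # Toy sparse mixture-of-experts (MoE) feed-forward layer: only k experts
    # run per token. Sizes and expert count are illustrative assumptions.
    import torch
    import torch.nn as nn

    class TopKMoE(nn.Module):
        def __init__(self, d_model, n_experts=16, k=2):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)   # scores each expert per token
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts)
            )
            self.k = k

        def forward(self, x):                  # x: (tokens, d_model)
            scores = self.router(x)            # (tokens, n_experts)
            weights, idx = scores.topk(self.k, dim=-1)
            weights = weights.softmax(dim=-1)  # normalize over the chosen experts
            out = torch.zeros_like(x)
            for slot in range(self.k):         # each token's k chosen experts
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e   # tokens routed to expert e in this slot
                    if mask.any():
                        out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
            return out

    layer = TopKMoE(d_model=512)
    tokens = torch.randn(10, 512)              # 10 token embeddings
    print(layer(tokens).shape)                 # torch.Size([10, 512])

In this toy layer only the router and two of the sixteen experts run for each token, so most of the layer's weights stay idle for any single token; Jamba applies the same principle at a much larger scale.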

Jamba performs well on key measures such as throughput and efficiency, matching or outperforming other state-of-the-art models in its class on a wide range of benchmarks, while offering a significantly larger context limit that enables use cases requiring long context.[1][2] The model is released with open weights under an Apache 2.0 license.[5][4]

The company plans to release an instruct-tuned version of the model in beta on the AI21 Platform in the near future.[6]

Characteristics

  • Context window size: 256K tokens[6]
  • Parameters: 52 billion[6]
  • Architecture: Hybrid Mamba (SSM) Transformer using Mixture of Experts (MoE)[6] (a minimal loading sketch follows this list)
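
Because the weights are openly released under the Apache 2.0 license, the model can be run with standard open-source tooling. The snippet below is a minimal sketch of loading and prompting such a checkpoint with the Hugging Face transformers library; the repository name ai21labs/Jamba-v0.1, the precision, and the device settings are assumptions for illustration, and the model card should be consulted for the actual requirements.

    # Minimal sketch: loading an open-weights Jamba checkpoint with Hugging Face
    # transformers. The model id and settings are assumptions, not official docs;
    # a model of this size also needs substantial GPU memory.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ai21labs/Jamba-v0.1"           # assumed repository name

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,            # half precision to reduce memory use
        device_map="auto",                     # spread layers across available GPUs
    )

    prompt = "In a few sentences, explain what a state space model is."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))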

See also

  • Mamba – deep learning architecture
  • Mixture of experts – deep learning technique
  • AI21 Labs – Israeli AI company based in Tel Aviv

References

  1. "Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model". www.ai21.com. Retrieved 2024-03-29. https://www.ai21.com/blog/announcing-jamba
  2. Kerner, Sean Michael (2024-03-28). "AI21 Labs juices up gen AI transformers with Jamba". VentureBeat. Retrieved 2024-03-29. https://venturebeat.com/ai/ai21-labs-juices-up-gen-ai-transformers-with-jamba/
  3. "AI21 Labs' Jamba infuses Mamba to bring more context to transformer-based LLMs". SiliconANGLE. 2024-03-28. Retrieved 2024-03-29. https://siliconangle.com/2024/03/28/ai21-labs-jamba-infuses-mamba-bring-context-transformer-based-llms/
  4. "MLTimes - Time To Learn AI". mltimes.se. Retrieved 2024-03-29. https://mltimes.se/news/ai21-jamba-hybrid-llm/
  5. AI21. "Unveiling Jamba: AI21's Groundbreaking Hybrid SSM-Transformer Open-Source Model". www.prnewswire.com. Retrieved 2024-03-29. https://www.prnewswire.com/news-releases/unveiling-jamba-ai21s-groundbreaking-hybrid-ssm-transformer-open-source-model-302102779.html
  6. "AI21 Labs enhances the capabilities of gen AI transformers through Jamba integration". Global Village Space | Technology. 2024-03-28. Retrieved 2024-03-29. https://www.globalvillagespace.com/tech/ai21-labs-enhances-the-capabilities-of-gen-ai-transformers-through-jamba-integration/