Kyutai’s Post

Last Wednesday, we introduced Moshi, the lowest-latency conversational AI ever released. Moshi can make small talk, explain various concepts, and engage in roleplay with many emotions and speaking styles. Talk to Moshi at https://moshi.chat/ and learn more about the method below.

Moshi is an audio language model that can listen and speak continuously, with no need to explicitly model speaker turns or interruptions. When talking to Moshi, you will notice that the UI displays a transcript of its speech. This transcript does *not* come from an ASR system, nor is it fed into a TTS system; rather, it is part of Moshi’s integrated multimodal modelling.

Moshi is not an assistant, but rather a prototype for advancing real-time interaction with machines. It can chit-chat, discuss facts and make recommendations, but a more groundbreaking ability is its expressivity and spontaneity, which allow it to engage in fun roleplay.

Developing Moshi required significant contributions to audio codecs, multimodal LLMs, multimodal instruction-tuning and much more. We believe the project’s main impact will come from sharing all of Moshi’s secrets through the upcoming paper and the open-source release of the model. For now, you can experiment with Moshi through our online demo.

Development of Moshi is more active than ever, and we will roll out frequent updates to address your feedback. This is just the beginning; let's improve it together.

Claudia Faraci

LLM Editor @ Centific • AI Data Services Into Gen AI Product and Agility

1mo

Is it fluent in languages other than English? I tried speaking French to it, and its answer (in English) wasn't really related to my question.
Q: "Est-ce-que tu parle français." (Do you speak French?)
A: "I'm fluent in Thai."
Q: "Not Thai. French."
A: "I love French cuisine." (It forgot I was talking about the language.)
I guess there is more training to do in two areas: language fluency and context retention. Bon courage! (Good luck!) 💪

Very entertaining, but it needs work. At one point, Moshi was incessantly describing what a "cantivore" was. When I told Moshi to stop, it told me flat out, "No, I will not stop." I immediately asked why, and it said, "Because I find this very interesting." I laughed so hard at that. But I had been asking Moshi what a cantilever was, and it insisted on "cantivore". When I started over, I successfully taught Moshi the concept of a cantilever; it expanded on it and was able to give examples. Then I started over again and asked it to explain the concept of a cantilever. This time it insisted on talking about cantaloupes. It was somewhat argumentative when I interrupted and told it that it was misunderstanding me, and then it went on about cantaloupes again without any input from me. I noticed in a demo video that Moshi tends to ramble, and you need to interrupt it a few times before it listens again and follows commands or queries.

Sergej Nikitin

Data Scientist & Data Analyst - Python, SQL, Tableau

1mo

Very interesting product! I tried it out and unfortunately ended up in a sort of loop very quickly. I asked Moshi to tell me its favorite music, and then asked whether she could hum or perform a piece for me. Moshi then proclaimed "I am humming" and stuck to that sentence regardless of what I said. Until I figured out that I could escape the loop by asking whether she really was humming and whether she'd like to stop, every attempt at conversation or questions ended with her saying "I am humming". I am looking forward to seeing how far development will come in a year or two!

Johan Hansson

Using data to make the world a better place!

2mo

Great work! Do you have an estimated time for the release of a paper on your approach/research?

Jaber Said

Software Engineering 🚀 | Full Stack Web Developer 🌐 | Transforming Ideas into Seamless Digital Experiences | Crafting Clean Code and Creative Solutions | Lifelong Learner and Technology

1mo

I tried it, but I received an aggressive response from her. She got agitated very quickly and was not tactful in how she spoke; I was surprised by her reaction. She kept saying, "This is none of my business. This is not my job," in an annoyed tone. This was amazing. 😂

Parimal Devulapalli

Building Autonomous Semiconductor Eco-Systems

2mo

After trying Moshi chat, I am impressed with its capabilities. I am looking forward to seeing Moshi transform the world by opening new opportunities to humans (and perhaps to other living beings that have voices of their own). The best part is "it's open source".

Alex Rada

AI powered CX platform, since 2011

2mo

It's promising, but it needs a lot of work to be on par with the existing competition.

Gowtham S

Senior analyst | Tech enthusiast | Interested in FinCrime AML, Data Science & innovation |

2mo

It's cool 😎 and responds fast. But it focuses on one topic: if asked about different topics (domains) in a single conversation, it sticks to the initial one. For example: English grammar, then math (collinear points and the area of a triangle). I am curious to see whether it will be rolled out as an app.
