Look how THICC the competition is! Meanwhile, Chonkie be looking slim and trim 💪
Ever wondered how much CHONKier other text splitting libraries are? Well, wonder no more! We've put Chonkie up against some of the most popular RAG libraries out there, and the results are... well, let's just say Moto Moto might need to revise his famous quote!
| Library | Size (default install) | Chonk Factor |
|---|---|---|
| 🦛 Chonkie | 9.7 MiB | 1x (base CHONK) |
| 🦜 LangChain | 80 MiB | ~8.3x CHONKier |
| 🦙 LlamaIndex | 171 MiB | ~17.6x CHONKier |
| Library | Size (with semantic support) | Chonk Factor |
|---|---|---|
| 🦛 Chonkie | 585 MiB | 1x (semantic CHONK) |
| 🦜 LangChain | 625 MiB | ~1.07x CHONKier |
| 🦙 LlamaIndex | 678 MiB | ~1.16x CHONKier |
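Want to check the chonk factor yourself? Below is a minimal sketch (not the script behind these tables; every name in it is illustrative) that sums the on-disk size of one installed distribution using Python's standard importlib.metadata. It measures only the named package, while the tables count each library's full dependency tree in a clean virtual environment, so this will undercount.

```python
# Hedged sketch: on-disk size of a single installed distribution.
# NOTE: this counts only the named package, NOT its dependencies;
# the tables above reflect full dependency trees in a fresh venv.
import importlib.metadata as md
import os


def installed_size_mib(dist_name: str) -> float:
    """Sum the sizes of all files recorded for an installed distribution."""
    dist = md.distribution(dist_name)
    total = 0
    for record in dist.files or []:
        path = record.locate()  # resolve the RECORD entry to an absolute path
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total / (1024 * 1024)


for name in ("chonkie", "langchain", "llama-index"):
    try:
        print(f"{name}: {installed_size_mib(name):.1f} MiB")
    except md.PackageNotFoundError:
        print(f"{name}: not installed")
```

For tree-wide numbers like the ones above, install each library into its own fresh virtual environment and measure the whole site-packages directory instead.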
ZOOOOOM! Watch Chonkie run! 🏃‍♂️💨
All benchmarks were run on the Paul Graham Essays Dataset using the GPT-2 tokenizer. Because Chonkie believes in transparency, we note that timings marked with ** were taken after a warm-up phase.
| Library | Token Chunking Time (ms) | Speed Factor |
|---|---|---|
| 🦛 Chonkie | 8.18** | 1x (fastest CHONK) |
| 🦜 LangChain | 8.68 | 1.06x slower |
| 🦙 LlamaIndex | 272 | 33.25x slower |
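Want to race Chonkie yourself? The token-chunking row can be approximated with the sketch below. The TokenChunker usage follows Chonkie's README; the corpus path and the explicit warm-up pass (the ** above) are our own stand-ins, so absolute numbers will vary by machine.

```python
# Hedged sketch of a token-chunking timing, assuming the TokenChunker
# API from Chonkie's README. "essays.txt" is a placeholder for the
# Paul Graham Essays corpus mentioned above.
import time

from chonkie import TokenChunker
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("gpt2")  # same tokenizer as the benchmarks
chunker = TokenChunker(tokenizer)

with open("essays.txt", encoding="utf-8") as f:
    text = f.read()

chunker.chunk(text)  # warm-up pass, per the ** note above

start = time.perf_counter()
chunks = chunker.chunk(text)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{len(chunks)} chunks in {elapsed_ms:.2f} ms")
```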
| Library | Sentence Chunking Time (ms) | Speed Factor |
|---|---|---|
| 🦛 Chonkie | 52.6 | 1x (solo CHONK) |
| 🦙 LlamaIndex | 91.2 | 1.73x slower |
| 🦜 LangChain | N/A | Doesn't exist |
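Since LangChain sits this race out, here's roughly what the solo CHONK looks like; a minimal sketch assuming, per Chonkie's docs, that SentenceChunker accepts a tokenizer name string and that chunks expose text and token_count.

```python
# Hedged sketch: sentence chunking with Chonkie's SentenceChunker.
# Assumptions (from Chonkie's docs): the chunker accepts a tokenizer
# name string, and each chunk exposes .text and .token_count.
from chonkie import SentenceChunker

chunker = SentenceChunker("gpt2")
sample = "Chonkie is a chunking library. It is small. It is fast!"
for chunk in chunker.chunk(sample):
    print(f"{chunk.token_count:>4} tokens | {chunk.text!r}")
```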
| Library | Semantic Chunking Time | Speed Factor |
|---|---|---|
| 🦛 Chonkie | 482 ms | 1x (smart CHONK) |
| 🦜 LangChain | 899 ms | 1.86x slower |
| 🦙 LlamaIndex | 1.2 s | 2.49x slower |
Why does the size matter?
- Faster Installation: Less to download = faster to get started
- Lower Memory Footprint: Lighter package = less RAM usage
- Cleaner Dependencies: Only install what you actually need
- CI/CD Friendly: Faster builds and deployments
And why does the speed matter?
- Faster Processing: Chonkie leads in all chunking methods!
- Production Ready: Optimized for real-world usage
- Consistent Performance: Fast across all chunking types
- Scale Friendly: Process more text in less time
Remember what Chonkie always says:
"I may be a hippo, but I don't have to be heavy... and I can still run fast!" π¦β¨
Note: All measurements were taken using Python 3.8+ in a clean virtual environment. Your actual mileage may vary slightly depending on your specific setup and dependencies.