Hi @handoku, this is a known issue with DeepSeek-V3 int4/int8 quantization. Since DeepSeek hasn't published int4/int8 accuracy metrics for the model yet, we don't recommend quantizing DeepSeek-V3 with a non-FP8 recipe at this time.
Describe the bug
Deploying a 4-bit weight-only DeepSeek-V3 with the TensorRT-LLM deepseek branch produces garbled output.
To Reproduce
Following the guide in the DeepSeek-V3 README linked below, I built a 4-bit weight-only engine (first converting the checkpoint to bf16, then quantizing):
https://github.com/NVIDIA/TensorRT-LLM/tree/deepseek/examples/deepseek_v3
However, the converted model produces garbled output. I have seen similar reports in the TensorRT-LLM issues. Has anyone tried deploying via this route? (An approximate sketch of the commands involved follows below.)
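For reference, here is a minimal sketch of the bf16-then-quantize path described above. It assumes the deepseek branch follows the standard TensorRT-LLM example conventions (`convert_checkpoint.py` plus `trtllm-build`); the exact flags in the branch README may differ, and all paths, model names, and the TP size are placeholders:

```bash
# 0. Cast the released FP8 HF weights to BF16 using the script from the
#    DeepSeek-V3 repo (inference/fp8_cast_bf16.py).
python fp8_cast_bf16.py \
    --input-fp8-hf-path ./DeepSeek-V3 \
    --output-bf16-hf-path ./DeepSeek-V3-bf16

# 1. Convert to a TensorRT-LLM checkpoint with int4 weight-only
#    quantization applied during conversion (flags follow the common
#    TensorRT-LLM example convention and may differ in this branch).
python convert_checkpoint.py \
    --model_dir ./DeepSeek-V3-bf16 \
    --output_dir ./trtllm_ckpt_bf16_wq4 \
    --dtype bfloat16 \
    --use_weight_only \
    --weight_only_precision int4 \
    --tp_size 8

# 2. Build the engine from the converted checkpoint.
trtllm-build \
    --checkpoint_dir ./trtllm_ckpt_bf16_wq4 \
    --output_dir ./engine_bf16_wq4 \
    --gemm_plugin bfloat16
```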
Expected behavior
Normal, coherent output, with only a small accuracy drop from quantization.