
[BUG] Garbled output when deploying 4-bit weight-only DeepSeek-V3 with the TensorRT-LLM deepseek branch #272

Open
Songyanfei opened this issue Jan 14, 2025 · 3 comments

Comments

@Songyanfei

Describe the bug
Deploying a 4-bit weight-only DeepSeek-V3 with the TensorRT-LLM deepseek branch produces garbled answers.

To Reproduce
Following the DeepSeek-V3 README, I used the steps described below to produce a 4-bit weight-only engine (convert to bf16 first, then quantize):
https://github.com/NVIDIA/TensorRT-LLM/tree/deepseek/examples/deepseek_v3
However, the converted model produces garbled output. I have seen a similar report in the TensorRT-LLM issues. Has anyone tried this route for deployment? (A rough sketch of the two-step flow is included after the screenshot below.)
(screenshot: garbled model output)
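For reference, here is a minimal sketch of the two-step flow, assuming the deepseek branch follows the same convert_checkpoint.py / trtllm-build pattern as the other TensorRT-LLM examples; the input/output paths are placeholders and the exact script names and flags in the deepseek_v3 example may differ:

# 1) Convert the bf16 HF checkpoint to a TensorRT-LLM checkpoint with
#    INT4 weight-only quantization, sharded for 8-way tensor parallelism
python3 convert_checkpoint.py \
    --model_dir /path/to/DeepSeek-V3-bf16 \
    --output_dir /path/to/trtllm_ckpt_w4a16 \
    --dtype bfloat16 \
    --use_weight_only \
    --weight_only_precision int4 \
    --tp_size 8

# 2) Build the engines from the quantized checkpoint
trtllm-build \
    --checkpoint_dir /path/to/trtllm_ckpt_w4a16 \
    --output_dir /data-123/syf/DeepSeekV3-trtllm_engine_8gpu_W4A16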

Expected behavior
Normal output, with only a small drop in accuracy.

Screenshots
(screenshots: garbled output)

@Songyanfei
Author

Adding the result of running the test script:
mpirun --allow-run-as-root -np 8 python3 ../run.py \
    --input_text "Today is a nice day." \
    --max_output_len 30 \
    --tokenizer_dir /data-123/syf/DeepSeekV3-trtllm_engine_8gpu_W4A16 \
    --engine_dir /data-123/syf/DeepSeekV3-trtllm_engine_8gpu_W4A16 \
    --top_p 0.95 \
    --temperature 0.3
(screenshot: garbled output from run.py)

@mowentian
Contributor

Thanks, but I'm afraid this issue will have to be fixed on the trtllm side.

@Songyanfei
Author

@mowentian I saw NVIDIA's reply yesterday: this is a known issue with INT4/INT8 quantization of DeepSeek-V3. The trtllm example route may not be a good fit for now and could waste a lot of time.

Hi @handoku it's a known issue for deepseek-v3 int4/int8 quantization. Since the Deepseek-v3 didn't publish the int4/int8 metrics yet, we don't recommend quantize the deepseek-v3 with non-fp8 recipe at this moment.

Originally posted by @nv-guomingz in #2683
