8-9 t/s does not fully exploit the 8060S's potential. llama-server in llama.cpp has a small issue that causes poor performance when the server is configured with speculative decoding (it is not specific to any particular hardware): https://github.com/ggml-org/llama.cpp/issues/12968
After a quick manual fix for this issue, Qwen 2.5 72B iq4_xs with a 1.5B draft model can reach roughly 10-12 t/s when the acceptance rate is good.
https://github.com/hjc4869/llama.cpp/commit/0b32f64ffbe973e99e0dc7097be31d4d966d476e
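For reference, a minimal sketch of how a llama-server speculative-decoding setup like this might be launched. The flag names come from recent llama.cpp builds and may differ in older versions; the model filenames, layer counts, and draft limits are placeholders, not the exact configuration used here.

```python
# Sketch: launch llama-server with a large main model plus a small draft model
# for speculative decoding. All paths and numeric values are placeholders.
import subprocess

cmd = [
    "./llama-server",
    "--model", "Qwen2.5-72B-Instruct-IQ4_XS.gguf",       # main model (placeholder path)
    "--model-draft", "Qwen2.5-1.5B-Instruct-Q8_0.gguf",   # draft model (placeholder path)
    "--gpu-layers", "99",         # offload all main-model layers to the GPU
    "--gpu-layers-draft", "99",   # offload all draft-model layers as well
    "--draft-max", "16",          # maximum tokens drafted per step (tune for acceptance rate)
    "--draft-min", "1",           # minimum tokens to draft before verification
    "--port", "8080",
]
subprocess.run(cmd, check=True)
```

The speedup depends heavily on the acceptance rate: the more of the draft model's tokens the 72B model accepts per verification step, the fewer full forward passes it has to run per generated token.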
BY David's random thoughts

