Avieshek@lemmy.world to Technology@lemmy.world · English · 7 hours ago
Edward Snowden slams Nvidia's RTX 50-series 'F-tier value,' whistleblows on lackluster VRAM capacity (www.tomshardware.com)
The Hobbyist · 4 hours ago
You can. I'm running a 14B DeepSeek model on mine. It achieves 28 t/s.
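A figure like that is easy to sanity-check against a local Ollama server, since the final JSON of an /api/generate call reports eval_count (tokens generated) and eval_duration (in nanoseconds). A minimal sketch, assuming a stock Ollama install on its default port and the 14B tag mentioned later in the thread:

```python
# Rough tokens-per-second check against a local Ollama server.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b",  # the 14B distill discussed in this thread
        "prompt": "Explain VRAM in one paragraph.",
        "stream": False,  # return a single JSON object including timing stats
    },
)
resp.raise_for_status()
data = resp.json()

# eval_count = tokens generated, eval_duration = generation time in ns.
tps = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"{tps:.1f} t/s")
```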
levzzz@lemmy.world · 28 minutes ago
You need a pretty large context window to fit all the reasoning; Ollama forces 2048 by default, and a larger window uses more memory.
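The memory point matters for reasoning models because the chain-of-thought has to fit inside the context window. A sketch of raising it per-request through Ollama's num_ctx option (8192 here is an arbitrary example value; a bigger window grows the KV cache and therefore VRAM use):

```python
# Override Ollama's 2048-token default context window for one request.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b",
        "prompt": "Think step by step: what is 17 * 23?",
        "options": {"num_ctx": 8192},  # up from the 2048 default
        "stream": False,
    },
)
resp.raise_for_status()
print(resp.json()["response"])
```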
Jeena@piefed.jeena.net · 4 hours ago
Oh nice, that's faster than I imagined.
Viri4thus@feddit.org · 2 hours ago
I also have a 3060. Can you detail which framework (SGLang, Ollama, etc.) you are using and how you got that speed? I'm having trouble reaching that level of performance. Thanks!
The Hobbyist · 1 hour ago
Ollama, latest version. I have it set up with Open-WebUI (though that shouldn't matter). The 14B model is around 9 GB, which easily fits in the 12 GB of VRAM.
I'm repeating the 28 t/s from memory, but even if I'm wrong it's easily above 20.
Specifically, I'm running this model: https://ollama.com/library/deepseek-r1:14b-qwen-distill-q4_K_M
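To confirm the model really is resident in VRAM rather than spilling into system RAM, Ollama exposes a "ps"-style endpoint listing loaded models. A sketch, again assuming the default port; size_vram is reported in bytes:

```python
# List models currently loaded by a local Ollama server and their VRAM use.
import requests

resp = requests.get("http://localhost:11434/api/ps")
resp.raise_for_status()
for m in resp.json().get("models", []):
    # For the q4_K_M 14B distill this should read roughly 9-10 GB,
    # comfortably inside a 3060's 12 GB.
    print(m["name"], f'{m.get("size_vram", 0) / 1e9:.1f} GB in VRAM')
```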