Should be good at conversation and creative writing; it’ll be for worldbuilding
Best if uncensored, as I’d rather that than refusals kicking in when I least want them
I’m fine with those roleplaying models as long as they can actually give me ideas and talk to me logically
Can’t you just increase context length at the cost of paging and slowdown?
At some point you’ll run out of VRAM on the GPU. To make room for more context you offload some of the model’s layers to system RAM, which is what slows it down.
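Roughly what that trade-off looks like in llama-cpp-python, if that’s your backend (the model path and layer count below are just placeholders, tune them to your card):

```python
# Minimal sketch with llama-cpp-python; model_path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/your-model.Q4_K_M.gguf",  # placeholder GGUF path
    n_gpu_layers=28,  # layers kept on the GPU; lower this to free VRAM for context
    n_ctx=8192,       # context window; raising it eats the VRAM you just freed
)

out = llm("Describe the capital city of my fantasy world.", max_tokens=256)
print(out["choices"][0]["text"])
```

Every layer you drop from `n_gpu_layers` runs on the CPU instead, so the extra context you buy this way costs you tokens per second.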
Yes, but if he’s world building, a larger, slower model might just be an acceptable compromise.
I was getting OOM errors doing speech-to-text on my 4070 Ti. I know (now) that I should have gone for the 3090 Ti. Such is life.
At a certain point, layers will be pushed to RAM, leading to incredibly slow inference. You don’t want to wait hours for the model to generate a single response.