Do I need industry-grade GPUs, or can I scrape by with decent tokens per second on a consumer-level GPU?

  • hendrik@palaver.p3x.de
    6 points · edited 1 day ago

    I’d say you’re looking for something like an 80 GB VRAM GPU. That’d be industry grade (an Nvidia A100, for example).

    And to squeeze the model into 80 GB, it would need to be quantized to 4 or 5 bits. There are LLM VRAM calculators available where you can plug in your numbers, like this one.
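    The back-of-the-envelope math those calculators do can be sketched roughly like this (a minimal illustration, assuming weights dominate memory and using a made-up 20% overhead fraction for KV cache and activations; real tools account for context length and architecture):

    ```python
    def estimate_vram_gb(params_billion: float, bits: int, overhead_frac: float = 0.2) -> float:
        """Rough VRAM estimate in GB: weight bytes (params * bits / 8)
        plus a flat fraction for KV cache and activation overhead."""
        weight_gb = params_billion * 1e9 * bits / 8 / 1e9
        return weight_gb * (1 + overhead_frac)

    # A 70B-parameter model quantized to 4 bits:
    # 70e9 * 4 / 8 = 35 GB of weights, ~42 GB with overhead,
    # which is why it fits on one 80 GB card with room to spare,
    # while 8-bit or fp16 would not.
    print(round(estimate_vram_gb(70, 4), 1))
    ```

    It’s only a ballpark; long contexts can blow the overhead well past 20%.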

    Another option would be to rent these things by the hour in some datacenter (at about $2 to $3 per hour). Or do inference on a CPU with a wide memory interface, like an Apple M3 or an AMD Epyc. But those are pricey too, and you’d need to buy them alongside an equal amount of (fast) RAM.