Do I need industry-grade GPUs, or can I scrape by getting decent tps with a consumer-level GPU?

  • GenderNeutralBro@lemmy.sdf.org
    18 hours ago

    If you’re running a consumer-level GPU, you’ll be operating with 24GB of VRAM max (RTX 4090, RTX 3090, or Radeon 7900 XTX).

    90b model = 90GB at 8-bit quantization (plus some extra based on your context size and general overhead, but as a ballpark estimate, just going by the model size is good enough). You would need to drop down to 2-bit quantization to have any hope of fitting it on a single consumer GPU. At that point you’d probably be better off using a smaller model with less aggressive quantization, like a 32b model at 4-bit quantization.
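    The ballpark math above is just parameter count times bits per weight; a minimal sketch (the function name and the fixed overhead ignored here are my own simplifications, not a standard formula):

    ```python
    def vram_estimate_gb(params_billions: float, quant_bits: int) -> float:
        """Rough VRAM needed for model weights alone.

        Ignores context (KV cache) and runtime overhead, which add more
        on top -- this is only the ballpark rule of thumb from above.
        """
        # bytes per parameter = quant_bits / 8; billions of params ~ GB
        return params_billions * quant_bits / 8

    # 90b at 8-bit: ~90 GB, far beyond any single consumer card
    print(vram_estimate_gb(90, 8))   # 90.0
    # 90b at 2-bit: ~22.5 GB, technically under 24 GB but no headroom
    print(vram_estimate_gb(90, 2))   # 22.5
    # 32b at 4-bit: ~16 GB, fits comfortably on a 24 GB card
    print(vram_estimate_gb(32, 4))   # 16.0
    ```

    This is why the 2-bit option is a trap: even though the weights squeak under 24GB, the KV cache and overhead push you over, and quality at 2-bit degrades badly.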

    So forget about consumer GPUs for that size of model. Instead, you can look at systems with integrated memory, like a Mac with 96-128GB of memory, or something similar. HP has announced a mini PC that might be good, and Nvidia has announced a dedicated AI box as well. Neither of those are available for purchase yet, though.

    You could also consider using multiple consumer GPUs. You might be able to get several RTX 3090s for less than a Mac with the same total amount of memory. But then you’ll be drawing several times more power to run them, so keep that in mind.