Abstract

We present 1.58-bit FLUX, the first successful approach to quantizing the state-of-the-art text-to-image generation model, FLUX.1-dev, using 1.58-bit weights (i.e., values in {-1, 0, +1}) while maintaining comparable performance for generating 1024 x 1024 images. Notably, our quantization method operates without access to image data, relying solely on self-supervision from the FLUX.1-dev model. Additionally, we develop a custom kernel optimized for 1.58-bit operations, achieving a 7.7x reduction in model storage, a 5.1x reduction in inference memory, and improved inference latency. Extensive evaluations on the GenEval and T2I Compbench benchmarks demonstrate the effectiveness of 1.58-bit FLUX in maintaining generation quality while significantly enhancing computational efficiency.

Paper: https://arxiv.org/abs/2412.18653

Code: https://github.com/Chenglin-Yang/1.58bit.flux (coming soon)
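
For anyone wondering where "1.58-bit" comes from: a weight that can only be -1, 0 or +1 carries log2(3) ≈ 1.585 bits of information. Since the official code isn't out yet, here is a rough sketch of the general idea only, not the paper's method. The absmean scaling is borrowed from BitNet b1.58-style quantization as an assumption, and the function names and the 2-bits-per-weight packing layout are hypothetical, picked just for illustration.

```python
import math

# Information content of one ternary weight: log2(3) ≈ 1.585 bits.
BITS_PER_TERNARY = math.log2(3)


def ternarize(weights):
    """Map floats to codes in {-1, 0, +1} plus one scale factor.

    Assumption: absmean scaling (divide by the mean absolute value,
    round, clip). The paper's actual scheme may differ.
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def pack(codes):
    """Pack four ternary codes into each byte, 2 bits per code.

    Hypothetical layout: -1, 0, +1 are stored as 0, 1, 2.
    """
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, c in enumerate(codes[i:i + 4]):
            byte |= (c + 1) << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data, n):
    """Inverse of pack(): recover the first n ternary codes."""
    codes = []
    for byte in data:
        for j in range(4):
            codes.append(((byte >> (2 * j)) & 0b11) - 1)
    return codes[:n]
```

Calling `ternarize([0.9, -0.05, -1.2, 0.4, 0.0])` and then `pack()` on the codes stores five weights in two bytes. With this simple packing each weight takes 2 bits, so against 16-bit originals the quantized layers shrink by at most 8x, which is in the same range as the 7.7x storage reduction the abstract reports.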

  • kwilson@lemmy.world
    20 days ago

    yeah I get that, I’m just surprised that both times the image is so similar. both times the dragon looks right, both times the sky looks mostly the same, stuff that isn’t part of the prompt, you know.

    The same prompt can sometimes give you completely different pictures on the same model, and they all still comply with the prompt.