Abstract
We present 1.58-bit FLUX, the first successful approach to quantizing the state-of-the-art text-to-image generation model, FLUX.1-dev, using 1.58-bit weights (i.e., values in {-1, 0, +1}) while maintaining comparable performance for generating 1024 x 1024 images. Notably, our quantization method operates without access to image data, relying solely on self-supervision from the FLUX.1-dev model. Additionally, we develop a custom kernel optimized for 1.58-bit operations, achieving a 7.7x reduction in model storage, a 5.1x reduction in inference memory, and improved inference latency. Extensive evaluations on the GenEval and T2I Compbench benchmarks demonstrate the effectiveness of 1.58-bit FLUX in maintaining generation quality while significantly enhancing computational efficiency.
Paper: https://arxiv.org/abs/2412.18653
Code: https://github.com/Chenglin-Yang/1.58bit.flux (coming soon)
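For anyone wondering where "1.58-bit" comes from: a weight that can only be -1, 0, or +1 carries log2(3) ≈ 1.585 bits of information. The sketch below shows that arithmetic plus one simple way to round floating-point weights to ternary values. The absmean scaling rule and the `ternary_quantize` helper are assumptions for illustration only; the abstract does not describe the paper's actual quantization procedure.

```python
import math

# Three possible values per weight -> log2(3) bits of information each.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585


def ternary_quantize(weights):
    """Round weights to {-1, 0, +1} using absmean scaling.

    Hypothetical helper: this is one common ternary scheme, assumed here
    for illustration, not the method confirmed by the paper.
    """
    # Scale by the mean absolute value so typical weights land near +/-1.
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clamp into the ternary range.
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


if __name__ == "__main__":
    example_weights = [0.9, -0.05, -1.2, 0.4]  # made-up values
    q, s = ternary_quantize(example_weights)
    print(f"bits per weight: {BITS_PER_TERNARY_WEIGHT:.3f}")
    print("ternary weights:", q, "scale:", s)
```

Multiplying each ternary value by `scale` gives a rough reconstruction of the original weights, which is why a single floating-point scale is usually kept alongside the ternary tensor in schemes like this.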
The text is only part of the input the model uses to create the image. The settings and the random number seed matter too, and here they’d be the same for both images. The seed in particular is why these images look so similar: normally when you give the same prompt twice, the seed is randomized, so the model starts from two different points. Here, each model starts from the same point and works the same way, just with weights stored at different precision, so a lot of the details are shared.
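A minimal sketch of the seed effect, using only Python's standard library. The `initial_noise` function is a hypothetical stand-in for the random starting latent a diffusion sampler draws; real pipelines use a tensor library's seeded generator, but the principle is the same.

```python
import random


def initial_noise(seed, size=8):
    """Hypothetical stand-in for a sampler's starting noise.

    A seeded generator always produces the same sequence, so the same
    seed gives the same starting point.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


if __name__ == "__main__":
    fixed_a = initial_noise(seed=42)
    fixed_b = initial_noise(seed=42)
    different = initial_noise(seed=43)

    # Same seed: identical starting noise, so two models begin at the same point.
    print("same seed matches:", fixed_a == fixed_b)
    # Different seed: a different starting point, hence a different image.
    print("different seed matches:", fixed_a == different)
```

With the seed fixed, the only thing that differs between the two images is the model itself, which is what makes a side-by-side comparison meaningful.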