Flux-Kontext
We used Pruna Pro and combined caching, factorization, quantization, and compilation algorithms to compress the model. More specifically, we used these compression algorithms: Auto Caching, QKV factorization, FP8/INT8 Quantization, and Torch Compilation.

With these compression configurations, we reach up to 5x speedups at 30 inference steps! In more detail, for each image, we reach:
Base: ~14.4s for a 1-megapixel image.
Lightly-Juiced: 4.5s for a 1-megapixel image. A 3.2x speedup!
Juiced: 3.7s for a 1-megapixel image. A 3.9x speedup!!
Ultra-Juiced: 2.9s for a 1-megapixel image. A 4.9x speedup!!!
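The quoted speedups follow directly from the per-image latencies above; a quick sanity check:

```python
# Per-image latencies in seconds, as reported above for a 1-megapixel image.
base_latency = 14.4
juiced_latencies = {"Lightly-Juiced": 4.5, "Juiced": 3.7, "Ultra-Juiced": 2.9}

for name, latency in juiced_latencies.items():
    speedup = base_latency / latency
    print(f"{name}: {speedup:.2f}x faster")
# Lightly-Juiced: 3.20x, Juiced: 3.89x, Ultra-Juiced: 4.97x
```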
This translates to:
Better user experience by reducing generation waiting time,
Money savings when scaling deployment to many users,
Less energy consumed by reducing GPU utilization time.
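To make the savings concrete, here is a back-of-the-envelope calculation using the latencies above. The GPU hourly rate is a hypothetical placeholder for illustration, not a figure from the benchmark:

```python
# Hypothetical on-demand GPU rate in USD/hour -- adjust for your provider.
GPU_COST_PER_HOUR = 2.50

def cost_per_1000_images(seconds_per_image: float) -> float:
    """GPU cost of generating 1000 images at a given per-image latency."""
    hours = seconds_per_image * 1000 / 3600
    return hours * GPU_COST_PER_HOUR

base = cost_per_1000_images(14.4)   # base model: ~$10.00 per 1k images
ultra = cost_per_1000_images(2.9)   # Ultra-Juiced: ~$2.01 per 1k images
print(f"base: ${base:.2f}, ultra-juiced: ${ultra:.2f} per 1000 images")
```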
Try it in your own setup:
from pruna import SmashConfig, smash

# Configure the compression stack: FP8 quantization, torch.compile, and auto caching.
smash_config = SmashConfig()
smash_config["quantizer"] = "fp8"
smash_config["compiler"] = "torch_compile"
smash_config["torch_compile_target"] = "module_list"
smash_config["cacher"] = "auto"
smash_config["objective"] = "quality"
smash_config["auto_cache_mode"] = "bdf"
smash_config["auto_speed_factor"] = 0.4

# Apply the configuration to your loaded pipeline
# (`pipe` here stands for your Flux-Kontext pipeline object).
smashed_pipe = smash(model=pipe, smash_config=smash_config)
Read the full benchmark: https://www.pruna.ai/blog/flux-kontext-juiced-state-of-the-art-image-editing-x5-faster