Pruna AI Customer Support Portal home

Flux-Kontext

We used Pruna Pro to compress the model by combining caching, factorization, quantization, and compilation. Specifically, we used these compression algorithms: Auto Caching, QKV factorization, FP8/INT8 quantization, and Torch compilation.
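To build intuition for the caching idea, here is a toy sketch in plain Python. It is a simplified illustration of step caching in general, not Pruna's Auto Caching algorithm: when a block's input barely changes from one step to the next, the cached output is reused instead of recomputed.

```python
# Toy sketch of step caching in a denoising loop -- NOT Pruna's actual
# Auto Caching algorithm, just the underlying idea: if a block's input
# barely changed since the last computed step, reuse the cached output.

def run_steps(inputs, block, threshold=0.05):
    cached_in = cached_out = None
    outputs, computed = [], 0
    for x in inputs:
        # Recompute only when the input drifted past the threshold.
        if cached_in is None or abs(x - cached_in) >= threshold:
            cached_in, cached_out = x, block(x)
            computed += 1
        outputs.append(cached_out)
    return outputs, computed

# Inputs drift slowly, so most steps hit the cache:
outs, computed = run_steps([1.0, 1.01, 1.02, 2.0], lambda x: 2 * x)
```

Here only 2 of the 4 steps actually run the block; the other 2 reuse cached results, which is where the latency savings come from.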

With these compression configurations, we reach up to 5x speedups over 30 inference steps! In more detail, for each image we reach:

  • Base: ~14.4s for a 1-megapixel image.

  • Lightly-Juiced: 4.5s for a 1-megapixel image, a 3.2x speedup!

  • Juiced: 3.7s for a 1-megapixel image, a 3.9x speedup!

  • Ultra-Juiced: 2.9s for a 1-megapixel image, a 4.9x speedup!
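The speedup figures above are simply the base latency divided by each configuration's latency, as this short snippet checks:

```python
# Speedup = base latency / optimized latency (seconds per 1-megapixel image).
base = 14.4
timings = {"Lightly-Juiced": 4.5, "Juiced": 3.7, "Ultra-Juiced": 2.9}
speedups = {name: base / secs for name, secs in timings.items()}
```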

This translates to:

  • Better user experience by reducing generation waiting time,

  • Money savings when scaling deployment to many users,

  • Lower energy consumption by reducing GPU utilization time.

Try it in your setup:

from diffusers import FluxKontextPipeline
from pruna import SmashConfig, smash

# Configure the compression stack: auto caching, FP8 quantization,
# and torch.compile.
smash_config = SmashConfig()
smash_config["cacher"] = "auto"
smash_config["auto_cache_mode"] = "bdf"
smash_config["auto_speed_factor"] = 0.4
smash_config["quantizer"] = "fp8"
smash_config["compiler"] = "torch_compile"
smash_config["torch_compile_target"] = "module_list"
smash_config["objective"] = "quality"

# Load the Flux-Kontext pipeline and compress it.
pipe = FluxKontextPipeline.from_pretrained("black-forest-labs/FLUX.1-Kontext-dev")
smashed_pipe = smash(model=pipe, smash_config=smash_config)

Read the full benchmark: https://www.pruna.ai/blog/flux-kontext-juiced-state-of-the-art-image-editing-x5-faster