Summarized Benchmark Results
Disclaimer: The benchmark results below reflect performance at the time of testing and may not represent our current capabilities. For the most accurate, up-to-date inference speeds, check the public endpoints listed or contact us for a dedicated benchmark. If a public model appears to come from its original provider, that is expected: Pruna powers the optimization behind the scenes as a white-label solution for inference platforms.
BRIA3.2: Pruna made BRIA3.2 run up to 3.6× faster than the base model on an L40S GPU.
Source: Official Benchmark
Last Updated: June 2025
Endpoint: https://replicate.com/bria/image-3.2
Flux.1-Dev: Pruna made Flux-Dev run up to 2.8× faster than Together AI, Fireworks AI, and fal's APIs on H100 GPUs.
Source: Official Benchmark
Last Updated: April 2025
Endpoint: https://www.pruna.ai/blog/flux-juiced-the-fastest-image-generation-endpoint
Flux-Fill: Pruna made Flux Fill run up to 1.57× faster than the base model pre-optimized with Nunchaku on A100 (SXM) GPUs.
Source: internal – official benchmark in progress
Last Updated: July 2025
Endpoint: not available
Flux-Kontext: Pruna made Flux-Kontext run up to 4.9× faster than the base model on an H100 GPU.
Source: Official Benchmark
Last Updated: June 2025
Flux-Schnell: Pruna made Flux-Schnell run up to 3× faster than the compiled base model on GPU.
Source: internal – official benchmark in progress
Last Updated: July 2025
Llama 3.1-8B-Instruct: Pruna made Llama 3.1-8B-Instruct run up to 1.9× faster than vLLM alone on an L40S GPU.
Source: Official Benchmark
Last Updated: June 2025
Endpoint: not available
SmolLM2-135M-Instruct: Pruna made SmolLM2-135M-Instruct run up to 2× faster and 7× smaller than the base model on CPU.
Source: Official Benchmark
Last Updated: January 2025
Endpoint: not available
Wan 2.1: Pruna made Wan 2.1 run up to 1.66× faster than WaveSpeed on a single H100 GPU.
Source: internal – official benchmark in progress
Last Updated: July 2025
Wan 2.2: Pruna made Wan 2.2 run up to 10× faster than the base model on a single H100 GPU.
Source: https://www.pruna.ai/blog/wan-2-2-video-juiced-x10-faster-video-generation
Last Updated: July 2025
i2v endpoint: https://replicate.com/wan-video/wan-2.2-i2v-480p-fast
t2v endpoint: https://replicate.com/wan-video/wan-2.2-t2v-480p-fast
Wan Image: Pruna made Wan Image up to 3.6× faster than Seedream and 1.41× faster than Flux-1.1 Pro on a single H100 GPU.
Source: https://www.pruna.ai/blog/wan-image-juiced-image-generation
Last Updated: July 2025
Endpoint: https://replicate.com/prunaai/wan-image
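The "N× faster" figures above are latency ratios between the reference setup and the Pruna-optimized model. As a minimal sketch with hypothetical timings (the numbers below are illustrative only, not from any benchmark):

```python
# Hypothetical per-image latencies, in seconds.
baseline_latency = 9.0    # reference model (e.g. unoptimized base model)
optimized_latency = 2.5   # Pruna-optimized model

# Speedup factor = baseline time / optimized time.
speedup = baseline_latency / optimized_latency
print(f"{speedup:.1f}x faster")  # -> 3.6x faster
```

In real measurements, both latencies would be averaged over many runs on identical hardware and inputs, which is why each entry above names the GPU used.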