Latest Release Note: v0.2.6

This release brings performance boosts, smoother workflows, and the groundwork for exciting upcoming features.

Pruna OSS Highlights

Accelerate is a library that enables large models to run across multi-device setups (GPU/CPU).

Pruna is now built using uv for faster installs and cleaner builds.

A total of 35 pull requests were merged in OSS, with contributions from many in the Pruna community, including five from external contributors.

Pruna Pro Upgrades

Enables inference-time optimization by distributing work across multiple devices. Requires an add-on (+$0.20/h).

Speed and quality improvements to our most popular caching systems, including “Prompt Padding Pruning” and “Midpoint Caching”.

We’re getting closer to distilling our own text-to-image (TTI) models, starting with a new Distillation Dataset.

Dozens of quality and stability improvements.