Last Release Note: v0.2.6
The juiciest bits 🚀
This release brings performance boosts, smoother workflows, and the groundwork for exciting upcoming features.
Pruna OSS Highlights
Support for accelerate
Accelerate is a library that enables large models to run across multi-device setups (GPU/CPU).
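To give a rough feel for what running a model "across multiple devices" involves, here is a toy sketch that places layers on devices in order and spills over to the next device when one fills up. It is an illustration only: `assign_layers`, the layer names, and the memory budgets are hypothetical, and this is not the Accelerate or Pruna API.

```python
# Toy illustration only: not Accelerate's or Pruna's actual API.
from typing import Dict, List, Tuple


def assign_layers(
    layer_sizes: List[Tuple[str, int]],
    device_budgets: List[Tuple[str, int]],
) -> Dict[str, str]:
    """Place layers on devices in order, moving to the next device once one is full."""
    placement: Dict[str, str] = {}
    device_index = 0
    remaining = device_budgets[0][1]
    for name, size in layer_sizes:
        # Advance to the next device until the current layer fits.
        while size > remaining:
            device_index += 1
            if device_index >= len(device_budgets):
                raise MemoryError(f"layer {name!r} does not fit on any remaining device")
            remaining = device_budgets[device_index][1]
        placement[name] = device_budgets[device_index][0]
        remaining -= size
    return placement


# Hypothetical layer sizes and device budgets, in arbitrary memory units.
layers = [("embed", 2), ("block_0", 4), ("block_1", 4), ("head", 2)]
budgets = [("cuda:0", 6), ("cuda:1", 5), ("cpu", 16)]
device_map = assign_layers(layers, budgets)
```

In this sketch the first GPU takes the first two layers, the second GPU takes the next one, and the last layer falls back to the CPU.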
Built with uv
Pruna is now built using uv for faster installs and cleaner builds.
Community Power
A total of 35 pull requests were merged in OSS, with contributions from many in the Pruna community, including five from external contributors.
Pruna Pro Upgrades
First distributed inference module: ring_attn
Enables inference-time optimization by distributing work across multiple devices. Requires an add-on (+$0.20/h).
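These notes do not describe how ring_attn works internally, so the following is only a sketch of the general ring attention idea, under a few assumptions: each simulated "device" keeps its own block of queries, key/value blocks rotate around the ring, and partial results are merged with an online softmax. All function names are hypothetical, scores are unscaled dot products for brevity, and everything runs in a single process in pure Python.

```python
# Conceptual sketch of ring attention; not Pruna's ring_attn implementation.
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def full_attention(q: List[Vector], k: List[Vector], v: List[Vector]) -> List[Vector]:
    """Reference: ordinary (non-causal) softmax attention computed in one place."""
    dim = len(v[0])
    out = []
    for qi in q:
        scores = [dot(qi, kj) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) / total for d in range(dim)])
    return out


def ring_attention(q_blocks, k_blocks, v_blocks):
    """Each simulated device keeps its query block; key/value blocks rotate around the ring."""
    n = len(q_blocks)
    dim = len(v_blocks[0][0])
    # Running online-softmax state per query: (max score, normaliser, weighted sum).
    state = [[(-math.inf, 0.0, [0.0] * dim) for _ in qb] for qb in q_blocks]
    for step in range(n):
        for dev in range(n):
            src = (dev + step) % n  # key/value block visiting device `dev` at this step
            for i, qi in enumerate(q_blocks[dev]):
                m, l, acc = state[dev][i]
                for kj, vj in zip(k_blocks[src], v_blocks[src]):
                    s = dot(qi, kj)
                    new_m = max(m, s)
                    scale = math.exp(m - new_m)  # rescale earlier partial results
                    w = math.exp(s - new_m)
                    acc = [a * scale + w * x for a, x in zip(acc, vj)]
                    l = l * scale + w
                    m = new_m
                state[dev][i] = (m, l, acc)
    return [[[a / l for a in acc] for (_, l, acc) in dev_state] for dev_state in state]


def split(rows: List[Vector], n: int) -> List[List[Vector]]:
    size = len(rows) // n
    return [rows[i * size:(i + 1) * size] for i in range(n)]


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

ring_out = [row for block in ring_attention(split(q, 2), split(k, 2), split(v, 2)) for row in block]
reference = full_attention(q, k, v)
```

The point of the sketch is that no device ever needs the full set of keys and values at once, yet the merged result should match ordinary attention up to floating-point error.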
New Caching Modes
Speed and quality improvements to our most popular caching systems, including “Prompt Padding Pruning” and “Midpoint Caching”.
Distillation Tools
We’re getting closer to distilling our own text-to-image (TTI) models, starting with a new Distillation Dataset.
Fixes
Dozens of quality and stability improvements.