Pruna AI Customer Support Portal home

Last Release Note: v0.2.6

The juiciest bits 🚀

This release brings performance boosts, smoother workflows, and the groundwork for exciting upcoming features.

Pruna OSS Highlights

  • Support for accelerate

Accelerate is a library that enables large models to run across multi-device setups (GPU/CPU).
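The core idea behind accelerate's multi-device support is a device map: each module of the model is assigned to a device whose memory can hold it, and inputs are moved accordingly at run time. The sketch below is a plain-Python toy of that assignment step only (hypothetical layer names and a greedy budget heuristic of our own; the real library API is `accelerate` with `device_map="auto"`):

```python
# Toy illustration of the device-map idea behind accelerate:
# assign each layer of a large model to a device, spilling to the
# next device once the current one's memory budget is exhausted.
# Layer names, sizes, and the greedy heuristic are illustrative only.

def build_device_map(layers, device_budgets):
    """Greedily assign (name, size) layers to devices until each budget fills."""
    device_map = {}
    devices = list(device_budgets)
    idx = 0
    remaining = device_budgets[devices[idx]]
    for name, size in layers:
        # Spill to the next device when this layer no longer fits.
        if size > remaining and idx + 1 < len(devices):
            idx += 1
            remaining = device_budgets[devices[idx]]
        device_map[name] = devices[idx]
        remaining -= size
    return device_map

layers = [("embed", 2), ("block.0", 4), ("block.1", 4), ("head", 2)]
budgets = {"cuda:0": 6, "cuda:1": 4, "cpu": 8}
print(build_device_map(layers, budgets))
# → {'embed': 'cuda:0', 'block.0': 'cuda:0', 'block.1': 'cuda:1', 'head': 'cpu'}
```

In accelerate itself the same outcome is reached automatically from the model's module sizes and the available device memory, with CPU (and disk) acting as the overflow tiers.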

  • Built with uv

Pruna is now built using uv for faster installs and cleaner builds.

  • Community Power

A total of 35 pull requests were merged in OSS, with contributions from many in the Pruna community, including five from external contributors.

Pruna Pro Upgrades

  • First distributed inference module: ring_attn

Enables inference-time optimization by distributing computation across multiple devices. Requires an add-on (+$0.20/h).
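Ring attention distributes attention across devices by having each device hold one key/value block and pass its block around a ring, so every device eventually attends to the full sequence without any device holding all of it. The toy sketch below simulates only that communication schedule (a plain-Python illustration under our own assumptions, not the `ring_attn` module itself):

```python
# Toy simulation of the ring communication pattern behind ring attention:
# device d starts with KV block d; over N-1 steps each device passes its
# current block to its neighbor, so every device sees all N blocks.

def ring_schedule(num_devices):
    """Return, per device, the order in which it sees the KV blocks."""
    seen = {d: [d] for d in range(num_devices)}   # each device starts with its own block
    held = list(range(num_devices))               # block currently held by device d
    for _ in range(num_devices - 1):
        # Rotate blocks one position around the ring.
        held = [held[(d + 1) % num_devices] for d in range(num_devices)]
        for d in range(num_devices):
            seen[d].append(held[d])
    return seen

print(ring_schedule(4))
# → {0: [0, 1, 2, 3], 1: [1, 2, 3, 0], 2: [2, 3, 0, 1], 3: [3, 0, 1, 2]}
```

The attention outputs for the blocks are combined incrementally (an online softmax), which is what lets the partial results be merged without ever materializing the full attention matrix on one device.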

  • New Caching Modes

Speed and quality improvements to our most popular caching systems, including “Prompt Padding Pruning” and “Midpoint Caching”.

  • Distillation Tools

We’re getting closer to distilling our own TTI models, starting with a new Distillation Dataset.

  • Fixes

Dozens of quality and stability improvements.

Read the full release note on GitHub