
50 optimization algorithms, 10 methods

Pruna’s core feature is its library of 50+ optimization algorithms, organized into 10 methods (such as quantization and pruning) that can be combined to maximize efficiency gains.
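As a quick illustration of combining methods, here is a minimal sketch using the open-source `pruna` package's `smash`/`SmashConfig` interface. The specific configuration keys and algorithm names below ("cacher", "deepcache", "compiler", "stable_fast") are assumptions for illustration and may differ between releases:

```python
# Minimal sketch: combining caching and compilation on a diffusion pipeline.
# Configuration keys and algorithm names are assumptions; check the current
# pruna documentation for the exact options available in your version.
from diffusers import StableDiffusionPipeline
from pruna import SmashConfig, smash

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

smash_config = SmashConfig()
smash_config["cacher"] = "deepcache"      # caching method
smash_config["compiler"] = "stable_fast"  # compilation method

smashed_pipe = smash(model=pipe, smash_config=smash_config)  # optimized pipeline
```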


Here is a general description of each method:

  • Pruning: removes less important or redundant connections and neurons from a model, resulting in a sparser, more efficient network (a generic sketch follows this list).

  • Quantization: reduces the precision of the model’s weights and activations, substantially reducing the memory they require (see the sketch after this list).

  • Batching: groups multiple inputs to be processed simultaneously, improving computational efficiency and reducing overall processing time.

  • Enhancing: improves the quality of the model’s output; its uses range from post-processing to test-time compute algorithms.

  • Caching: stores intermediate computation results to speed up subsequent operations; it is particularly useful for reducing inference time in machine learning models.

  • Recovery: restores the performance of a model after compression.

  • Factorization: batches several small matrix multiplications into one large fused operation, which, while neutral on memory and raw latency, unlocks notable speed-ups when used alongside quantization.

  • Distillation: trains a smaller, simpler model to mimic a larger, more complex model (a generic sketch follows this list).

  • Compilation: optimizes the model for specific hardware.
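
To make the pruning method concrete, here is a generic PyTorch sketch of unstructured magnitude pruning. It illustrates the general idea only, not Pruna’s own pruning algorithms:

```python
# Generic illustration of magnitude pruning in plain PyTorch;
# Pruna's pruning algorithms are more sophisticated than this sketch.
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(256, 256)

# Zero out the 30% of weights with the smallest L1 magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.3)
prune.remove(layer, "weight")  # bake the sparsity into the weight tensor

sparsity = (layer.weight == 0).float().mean().item()
print(f"Weight sparsity: {sparsity:.0%}")  # ~30%
```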
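Similarly, quantization can be illustrated with PyTorch’s post-training dynamic quantization, which stores linear-layer weights in int8 rather than float32, roughly a 4x memory reduction for those weights. Again, this is a generic sketch, not Pruna’s implementation:

```python
# Generic illustration of post-training dynamic quantization in PyTorch.
import torch

model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU())

# Convert Linear weights to int8; activations are quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(quantized)
```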
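Finally, a generic sketch of distillation, in which a student model learns to match a teacher’s softened output distribution. The model sizes and temperature here are illustrative, not Pruna-specific:

```python
# Generic knowledge-distillation step: the student mimics the teacher's
# softened outputs via a KL-divergence loss (Hinton et al., 2015).
import torch
import torch.nn.functional as F

teacher = torch.nn.Sequential(  # stand-in for a larger, frozen model
    torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
)
student = torch.nn.Linear(32, 10)  # smaller model being trained
optimizer = torch.optim.SGD(student.parameters(), lr=1e-2)

x = torch.randn(8, 32)  # a batch of inputs
T = 2.0                 # temperature: softens both distributions

with torch.no_grad():
    teacher_probs = F.softmax(teacher(x) / T, dim=-1)
student_log_probs = F.log_softmax(student(x) / T, dim=-1)

# KL divergence between softened distributions, scaled by T^2.
loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * T**2
loss.backward()
optimizer.step()
```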

For a deeper understanding of each method, you can read the blog post "An Introduction to AI Model Optimization Techniques".