
Evaluation Orchestration

Evaluation Orchestration offers configurable modes and workflows for evaluating model performance after optimization.

Evaluation Modes

  • Single-Model mode: Evaluates one model at a time. Each run produces standalone scores (e.g., accuracy, latency, memory usage).

  • Pairwise mode: Compares an optimized model against a baseline. The agent uses the first model it evaluates as the reference and outputs relative comparison scores (see the sketch below this list).
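
For orientation, here is a minimal sketch of both modes using the open-source pruna package's Task and EvaluationAgent. The metric names, dataset string, and model paths are illustrative assumptions; the exact interface may differ in your version.

    # Minimal sketch of both evaluation modes; names below are illustrative.
    from pruna import PrunaModel
    from pruna.data.pruna_datamodule import PrunaDataModule
    from pruna.evaluation.evaluation_agent import EvaluationAgent
    from pruna.evaluation.task import Task

    base_model = PrunaModel.from_pretrained("path/to/baseline")        # hypothetical paths
    optimized_model = PrunaModel.from_pretrained("path/to/optimized")
    datamodule = PrunaDataModule.from_string("LAION256")               # illustrative dataset

    # Single-Model mode: each evaluate() call yields standalone scores.
    single_agent = EvaluationAgent(Task(["clip_score", "latency"], datamodule=datamodule))
    baseline_scores = single_agent.evaluate(base_model)

    # Pairwise mode: the first model the agent evaluates becomes the
    # reference, so the second call returns scores relative to it.
    pairwise_agent = EvaluationAgent(Task(["psnr"], datamodule=datamodule))  # pairwise-capable metric (assumption)
    pairwise_agent.evaluate(base_model)                                # cached as the reference
    relative_scores = pairwise_agent.evaluate(optimized_model)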

Evaluation Workflows

You can trigger an evaluation using one of two methods:

  • Direct Parameters Workflow
    Pass the model path, dataset path, metric, and task directly to the Evaluation Agent. This is the fastest way to get started (see the first sketch after this list).

    [Screenshot: Direct Parameters Workflow example]

  • Task-Based Workflow
    Define a named evaluation task with a consistent setup. This is ideal for reusability, team collaboration, or recurring benchmarks (see the second sketch after this list).

    [Screenshot: Task-Based Workflow example]
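
Continuing the sketch from the modes section, a direct-parameters run passes everything inline at a single call site. Paths, metric names, and the dataset string remain illustrative assumptions, not the exact interface.

    # Direct Parameters Workflow (sketch): everything is passed inline.
    from pruna import PrunaModel
    from pruna.data.pruna_datamodule import PrunaDataModule
    from pruna.evaluation.evaluation_agent import EvaluationAgent
    from pruna.evaluation.task import Task

    model = PrunaModel.from_pretrained("path/to/optimized-model")      # hypothetical path
    task = Task(["clip_score", "latency"],
                datamodule=PrunaDataModule.from_string("LAION256"))    # illustrative dataset
    scores = EvaluationAgent(task).evaluate(model)
    print(scores)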
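
The task-based style factors that same setup into a named, shared definition so recurring benchmarks stay consistent across runs and teammates. The registry and helper function below are hypothetical conventions added for illustration, not part of the pruna API.

    # Task-Based Workflow (sketch): define a named task once, reuse it everywhere.
    from pruna.data.pruna_datamodule import PrunaDataModule
    from pruna.evaluation.evaluation_agent import EvaluationAgent
    from pruna.evaluation.task import Task

    def make_weekly_image_benchmark() -> Task:
        """One consistent setup for a recurring team benchmark."""
        datamodule = PrunaDataModule.from_string("LAION256")           # illustrative dataset
        return Task(["clip_score", "psnr"], datamodule=datamodule)

    # Hypothetical registry so teammates can trigger the same benchmark by name.
    EVAL_TASKS = {"weekly-image-benchmark": make_weekly_image_benchmark}

    agent = EvaluationAgent(EVAL_TASKS["weekly-image-benchmark"]())
    # scores = agent.evaluate(model)  # model loaded as in the previous sketch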

For more information on Evaluation Orchestration, please refer to the documentation.