Pruna and ComfyUI
Getting Started
Setting up Pruna within ComfyUI is straightforward. With just a few steps, you'll be ready to optimize your Stable Diffusion or Flux models for faster inference right inside the ComfyUI interface. Here's a quick guide to get started.

Step 1 - Prerequisites
You will need a Linux system with a GPU to run our nodes. First, set up a conda environment, then install both ComfyUI and Pruna:
Create a conda environment, e.g., with
conda create -n comfyui python=3.11 && conda activate comfyui
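Then install ComfyUI and Pruna in this environment. The exact steps may vary with your setup; as a rough sketch, assuming ComfyUI's official repository and the pruna package on PyPI (pinned to the same version used below):
git clone https://github.com/comfyanonymous/ComfyUI.git && cd ComfyUI && pip install -r requirements.txt
pip install pruna==0.2.3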
To use Pruna Pro, you also need to export your Pruna token as an environment variable:
export PRUNA_TOKEN=<your_token_here>
[Optional] If you want to use the x-fast or stable-fast compiler, you need to install additional dependencies:
pip install pruna[stable-fast]==0.2.3
Note: To use our caching nodes or the x-fast compiler, you need access to Pruna Pro.
Step 2 - Pruna node integration
With your environment prepared, you're ready to integrate Pruna nodes into your ComfyUI setup. Follow these steps to clone the repository and launch ComfyUI:
Navigate to your ComfyUI installation’s custom_nodes folder:
cd <path_to_comfyui>/custom_nodes
Clone the ComfyUI_pruna repository:
git clone https://github.com/PrunaAI/ComfyUI_pruna.git
Launch ComfyUI:
cd <path_to_comfyui> && python main.py --disable-cuda-malloc --gpu-only
After completing these steps, you should see all the Pruna nodes in the nodes menu, under the Pruna category.
Pruna nodes - A short explanation
Pruna adds four powerful nodes to ComfyUI:
A compilation node that optimizes inference speed through model compilation. While this technique preserves output quality, performance gains can vary depending on the model.
Three distinct caching nodes, each implementing a unique strategy to accelerate inference by reusing intermediate computations:
Adaptive Caching: Dynamically adjusts caching for each prompt by identifying the optimal inference steps to reuse cached outputs.
Periodic Caching: Caches model outputs at fixed intervals, reusing them in subsequent steps to reduce computation.
Auto Caching: Automatically determines the optimal caching schedule to achieve a target latency reduction with minimal quality trade-off.
By tuning the hyperparameters of each node, you can achieve the best trade-off between speed and output quality for your specific use case. Please check out the detailed guide in our repo or the documentation for more details.
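Under the hood, these nodes wrap the pruna Python library, so the same optimizations can also be applied outside ComfyUI. The following is a minimal illustrative sketch, not the nodes' actual implementation; the config keys ("compiler", "cacher"), their values, and the example model ID are assumptions based on pruna's documented SmashConfig/smash API and may differ in your installed version:
# Minimal sketch: optimizing a diffusers pipeline with pruna outside ComfyUI.
# Assumption: the "compiler"/"cacher" config keys and the values below follow
# pruna's documented API; verify against your installed release.
from diffusers import StableDiffusionPipeline
from pruna import SmashConfig, smash

# Example model ID; substitute the Stable Diffusion or Flux checkpoint you use.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1"
).to("cuda")

smash_config = SmashConfig()
smash_config["compiler"] = "torch_compile"   # quality-preserving compilation
# smash_config["cacher"] = "adaptive"        # Pruna Pro caching (hypothetical value)

# smash() returns an optimized wrapper that is called like the original pipeline.
smashed_pipe = smash(model=pipe, smash_config=smash_config)
image = smashed_pipe("an astronaut riding a horse").images[0]
The ComfyUI nodes expose these same choices as node parameters, so you can experiment with compiler and caching settings directly in the graph instead of writing code.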
Read the full article: https://www.pruna.ai/blog/faster-comfyui-nodes