Qwen-Image
This guide walks you through deploying the Pruna-optimized Qwen Image model.
What are the prerequisites?
To run the model, you’ll need:
HuggingFace token (HF_TOKEN): Enables you to download the optimized model.
Pruna token (PRUNA_TOKEN): Enables you to load and run the model.
An environment with pruna_pro installed:
pip install pruna_pro==0.2.9
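If you keep both tokens in environment variables, a small check like the sketch below can catch missing credentials before the model download starts. The helper name and error message are illustrative only and not part of pruna_pro.

import os

def load_tokens():
    # Hypothetical helper: read both tokens from the environment and fail early
    # instead of letting from_pretrained fail partway through a download.
    hf_token = os.environ.get("HF_TOKEN")
    pruna_token = os.environ.get("PRUNA_TOKEN")
    missing = [name for name, value in [("HF_TOKEN", hf_token), ("PRUNA_TOKEN", pruna_token)] if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return hf_token, pruna_token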
What inputs does the model support?
prompt: Input text to generate an image from.
aspect_ratio: Output aspect ratio. Supported values are 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, and 2:3; each maps to a fixed pixel resolution.
seed: Random seed. Leave blank for a random result.
go_fast: Chooses between a very fast mode and a more conservative mode that uses more inference steps (see the sketch below).
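The sketch below shows how these inputs typically translate into pipeline arguments. The aspect-ratio dimensions and step counts are taken from the full example later in this guide and should be read as assumptions rather than a fixed API contract; the helper itself is illustrative and not part of pruna_pro.

import torch

def build_generation_kwargs(aspect_ratio="1:1", go_fast=True, seed=None):
    # Illustrative helper: map the documented inputs onto pipeline keyword arguments.
    aspect_ratios = {"1:1": (1328, 1328), "16:9": (1664, 928), "9:16": (928, 1664)}
    width, height = aspect_ratios[aspect_ratio]
    return {
        "width": width,
        "height": height,
        # go_fast trades inference steps for latency (8 vs. 16 steps in the full example).
        "num_inference_steps": 8 if go_fast else 16,
        # A fixed seed makes outputs reproducible; None keeps them random.
        "generator": torch.Generator("cuda").manual_seed(seed) if seed is not None else None,
    }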
How do I load the model?
You can initialize the Pruna-optimized Qwen Image model directly with PrunaProModel.from_pretrained:

from pruna_pro import PrunaProModel

pipe = PrunaProModel.from_pretrained(
    "PrunaAI/Qwen-Image-v2",
    token="PRUNA_TOKEN",
    hf_token="HF_TOKEN",
)
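Once loaded, the pipeline can be called like a standard text-to-image pipeline. The quick check below mirrors the full example in the next section (argument names and values are taken from there), so treat it as a sketch rather than the definitive call signature.

import torch

# Quick sanity check: generate one seeded image with the pipeline loaded above.
image = pipe(
    prompt="A cat jumping in the air to catch a bird",
    width=1664,
    height=928,
    true_cfg_scale=1.0,
    num_inference_steps=8,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("sanity_check.png")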
What does a minimal working example look like?
Below is a complete script that sets up the pipeline and generates an image from a text prompt. This is the fastest way to test the model end-to-end.
import torch

from pruna_pro import PrunaProModel

ASPECT_RATIOS = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1140),
    "3:4": (1140, 1472),
    "3:2": (1536, 1024),
    "2:3": (1024, 1536),
}


class Predictor:
    def setup(self):
        import logging

        logging.basicConfig(level=logging.INFO)
        self.pipe = PrunaProModel.from_pretrained(
            "PrunaAI/Qwen-Image-v2",
            token="PRUNA_TOKEN",
            hf_token="HF_TOKEN",
            verbose=True,  # If supported by PrunaProModel
            log_level="info",  # If supported
        )

    def predict(
        self,
        prompt,
        aspect_ratio="1:1",
        go_fast=True,
        seed=None,
    ):
        generator = (
            torch.Generator("cuda").manual_seed(seed) if seed is not None else None
        )
        width, height = ASPECT_RATIOS[aspect_ratio]

        if go_fast:
            num_inference_steps = 8
        else:
            num_inference_steps = 16

        with torch.inference_mode(), torch.no_grad():
            image = self.pipe(
                prompt=prompt,
                height=height,
                width=width,
                true_cfg_scale=1.0,
                num_inference_steps=num_inference_steps,
                generator=generator,
            ).images[0]

        image.save("output.png")
        return "output.png"


if __name__ == "__main__":
    predictor = Predictor()
    predictor.setup()
    output = predictor.predict(
        prompt="A cat jumping in the air to catch a bird",
        aspect_ratio="16:9",
    )
    print(output)