Wan 2.2 T2V Fast (14B)
This guide walks you through deploying the Pruna-optimized Wan 2.2 Text-to-Video (T2V) model.
What are the prerequisites?
To run the model, you’ll need:
HuggingFace token (HF_TOKEN): Enables you to download the optimized model.
Pruna token (PRUNA_TOKEN): Enables you to load and run the model.
An environment with pruna_pro installed:
pip install pruna_pro==0.2.9
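A convenient (optional) way to manage the two tokens is to export them as environment variables and read them in Python before loading the model. The sketch below is just one possible setup; the variable names mirror the placeholders used throughout this guide.
import os

# Read the two tokens from environment variables (names mirror the placeholders above).
hf_token = os.environ["HF_TOKEN"]        # HuggingFace token: downloads the optimized model
pruna_token = os.environ["PRUNA_TOKEN"]  # Pruna token: loads and runs the model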
What inputs does the model support?
prompt: Input text to generate the video from.
num_frames: Number of video frames. 81 frames give the best results.
resolution: 480p (runs on a single H100) or 720p (runs on a single H200).
frames_per_second: Frames per second of the output video; the example script below uses 16.
seed: Random seed. Leave blank for a random seed.
go_fast: Switches between a very fast preset and a more conservative one (a full set of example inputs follows this list).
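For example, a typical request mirrors the defaults used in the full script later in this guide:
# Example inputs, mirroring the defaults in the full script below.
inputs = {
    "prompt": "a cat walking on grass",
    "num_frames": 81,             # 81 frames give the best results
    "resolution": "480p",         # "480p" (single H100) or "720p" (single H200)
    "frames_per_second": 16,
    "seed": None,                 # None -> random seed
    "go_fast": True,              # True -> very fast preset, False -> conservative preset
}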
Does the model support LoRAs?
Yes. LoRAs can be loaded into the pipeline. Here’s a minimal example:
self.pipe.load_lora_weights(
    lora_path_transformer,
    adapter_name="custom_lora",
)

# Load a second LoRA into the pipeline's second transformer.
kwargs_lora = {"load_into_transformer_2": True}
self.pipe.load_lora_weights(
    lora_path_transformer_2,
    adapter_name="custom_lora_2",
    **kwargs_lora,
)
Here, lora_path_transformer and lora_path_transformer_2 point to the local .safetensors files.
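The sketch below assumes the optimized pipeline exposes the standard Diffusers set_adapters API; if it does, you can activate both adapters and adjust their strengths after loading.
# Activate both adapters and set their strengths (assumes the standard
# Diffusers set_adapters API is available on the optimized pipeline).
self.pipe.set_adapters(
    ["custom_lora", "custom_lora_2"],
    adapter_weights=[1.0, 1.0],
)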
How do I load the model?
You can initialize the Pruna-optimized Wan 2.2 T2V model directly with PrunaProModel.from_pretrained:
from pruna_pro import PrunaProModel

self.pipe = PrunaProModel.from_pretrained(
    "PrunaAI/Wan2.2-T2V-A14B-Diffusers-v3",
    token="HF_TOKEN",        # replace the placeholders with your actual token values
    hf_token="PRUNA_TOKEN",  # (see the prerequisites above)
)
What does a minimal working example look like?
Below is a complete script that sets up the pipeline and generates a short video from a text prompt. This is the fastest way to test the model end-to-end.
import logging
import tempfile

import torch
from diffusers.utils import export_to_video
from pruna_pro import PrunaProModel


class Predictor:
    def setup(self):
        logging.basicConfig(level=logging.INFO)
        # Load the Pruna-optimized Wan 2.2 T2V pipeline (replace the token
        # placeholders with your actual values; see the prerequisites above).
        self.pipe = PrunaProModel.from_pretrained(
            "PrunaAI/Wan2.2-T2V-A14B-Diffusers-v3",
            token="HF_TOKEN",
            hf_token="PRUNA_TOKEN",
        )

    def predict(
        self,
        prompt,
        num_frames=81,
        resolution="480p",
        aspect_ratio="16:9",  # "16:9" (landscape) or "9:16" (portrait)
        frames_per_second=16,
        go_fast=True,
        seed=None,
    ):
        # Use a fixed seed when one is given, otherwise let the pipeline pick one.
        generator = (
            torch.Generator("cuda").manual_seed(seed) if seed is not None else None
        )

        # Pick the scheduler shift and output size for the requested resolution.
        if resolution == "480p":
            self.pipe.scheduler.set_shift(4.0)
            if aspect_ratio == "16:9":
                width, height = 832, 480
            else:
                width, height = 480, 832
        else:
            self.pipe.scheduler.set_shift(7.0)
            if aspect_ratio == "16:9":
                width, height = 1280, 720
            else:
                width, height = 720, 1280

        # Negative prompt (in Chinese), listing artifacts to avoid: garish colors,
        # overexposure, static/blurry frames, subtitles, JPEG artifacts, deformed
        # limbs, extra fingers, cluttered backgrounds, and similar failure modes.
        negative_prompt = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"

        # Fewer inference steps for the fast preset, more for the conservative one.
        num_inference_steps = 6 if go_fast else 10

        with torch.inference_mode():
            output_video = self.pipe(
                prompt=prompt,
                negative_prompt=negative_prompt,
                height=height,
                width=width,
                num_frames=num_frames,
                guidance_scale=1.0,
                guidance_scale_2=1.0,
                num_inference_steps=num_inference_steps,
                generator=generator,
            ).frames[0]

        # Write the frames to an MP4 file in a temporary directory.
        output_dir = tempfile.mkdtemp()
        output_path = output_dir + "/output.mp4"
        export_to_video(output_video, output_path, fps=frames_per_second)
        return output_path


if __name__ == "__main__":
    predictor = Predictor()
    predictor.setup()
    output = predictor.predict(
        prompt="a cat walking on grass",
    )
    print(output)
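Once that script runs end-to-end, you can vary the inputs. For instance, a reproducible 720p run (which, per the input list above, expects a single H200) could look like the following sketch:
# Reproducible 720p run; per the input list above, 720p expects a single H200.
output = predictor.predict(
    prompt="a cat walking on grass",
    num_frames=81,
    resolution="720p",
    frames_per_second=16,
    go_fast=False,  # conservative preset (10 inference steps)
    seed=42,        # fixed seed for reproducible results
)
print(output)  # path to the generated .mp4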