Wan 2.2 T2V Fast (14B)
This guide walks you through deploying the Pruna-optimized Wan 2.2 Text-to-Video (T2V) model.
What are the prerequisites?
To run the model, you’ll need:
HuggingFace token (HF_TOKEN): Enables you to download the optimized model.
Pruna token (PRUNA_TOKEN): Enables you to load and run the model.
An environment with pruna_pro installed:
pip install pruna_pro==0.2.9
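A convenient (optional) way to manage the two tokens is to export them as environment variables and read them in Python before loading the model. The sketch below is just one possible setup; the variable names mirror the placeholders used throughout this guide.
import os

# Read the two tokens from environment variables (names mirror the placeholders above).
hf_token = os.environ["HF_TOKEN"]        # HuggingFace token: downloads the optimized model
pruna_token = os.environ["PRUNA_TOKEN"]  # Pruna token: loads and runs the model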
What inputs does the model support?
prompt: Input text to generate the video from.
num_frames: Number of video frames. 81 frames give the best results.
resolution: 480p (runs on a single H100) or 720p (runs on a single H200).
frames_per_second: Frames per second of the output video; the example script below uses 16.
seed: Random seed. Leave blank for a random seed.
go_fast: Switches between a very fast preset and a more conservative one (a full set of example inputs follows this list).
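For example, a typical request mirrors the defaults used in the full script later in this guide:
# Example inputs, mirroring the defaults in the full script below.
inputs = {
    "prompt": "a cat walking on grass",
    "num_frames": 81,             # 81 frames give the best results
    "resolution": "480p",         # "480p" (single H100) or "720p" (single H200)
    "frames_per_second": 16,
    "seed": None,                 # None -> random seed
    "go_fast": True,              # True -> very fast preset, False -> conservative preset
}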
Does the model support LoRAs?
Yes. LoRAs can be loaded into the pipeline. Here’s a minimal example:
self.pipe.load_lora_weights(
    lora_path_transformer,
    adapter_name="custom_lora",
)

# Load a second LoRA into the pipeline's second transformer.
kwargs_lora = {"load_into_transformer_2": True}
self.pipe.load_lora_weights(
    lora_path_transformer_2,
    adapter_name="custom_lora_2",
    **kwargs_lora,
)
Here, lora_path_transformer and lora_path_transformer_2 point to the local .safetensors files.
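The sketch below assumes the optimized pipeline exposes the standard Diffusers set_adapters API; if it does, you can activate both adapters and adjust their strengths after loading.
# Activate both adapters and set their strengths (assumes the standard
# Diffusers set_adapters API is available on the optimized pipeline).
self.pipe.set_adapters(
    ["custom_lora", "custom_lora_2"],
    adapter_weights=[1.0, 1.0],
)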
How do I load the model?
You can initialize the Pruna-optimized Wan 2.2 T2V model directly with PrunaProModel.from_pretrained:
from pruna_pro import PrunaProModel

self.pipe = PrunaProModel.from_pretrained(
    "PrunaAI/Wan2.2-T2V-A14B-Diffusers-v3",
    token="HF_TOKEN",        # replace the placeholders with your actual token values
    hf_token="PRUNA_TOKEN",  # (see the prerequisites above)
)
What does a minimal working example look like?
Below is a complete script that sets up the pipeline and generates a short video from a text prompt. This is the fastest way to test the model end-to-end.
import logging
import tempfile

import torch
from diffusers.utils import export_to_video
from pruna_pro import PrunaProModel


class Predictor:
    def setup(self):
        logging.basicConfig(level=logging.INFO)
        # Load the Pruna-optimized Wan 2.2 T2V pipeline (replace the token
        # placeholders with your actual values; see the prerequisites above).
        self.pipe = PrunaProModel.from_pretrained(
            "PrunaAI/Wan2.2-T2V-A14B-Diffusers-v3",
            token="HF_TOKEN",
            hf_token="PRUNA_TOKEN",
        )

    def predict(
        self,
        prompt,
        num_frames=81,
        resolution="480p",
        aspect_ratio="16:9",  # "16:9" (landscape) or "9:16" (portrait)
        frames_per_second=16,
        go_fast=True,
        seed=None,
    ):
        # Use a fixed seed when one is given, otherwise let the pipeline pick one.
        generator = (
            torch.Generator("cuda").manual_seed(seed) if seed is not None else None
        )

        # Pick the scheduler shift and output size for the requested resolution.
        if resolution == "480p":
            self.pipe.scheduler.set_shift(4.0)
            if aspect_ratio == "16:9":
                width, height = 832, 480
            else:
                width, height = 480, 832
        else:
            self.pipe.scheduler.set_shift(7.0)
            if aspect_ratio == "16:9":
                width, height = 1280, 720
            else:
                width, height = 720, 1280

        # Negative prompt (in Chinese), listing artifacts to avoid: garish colors,
        # overexposure, static/blurry frames, subtitles, JPEG artifacts, deformed
        # limbs, extra fingers, cluttered backgrounds, and similar failure modes.
        negative_prompt = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"

        # Fewer inference steps for the fast preset, more for the conservative one.
        num_inference_steps = 6 if go_fast else 10

        with torch.inference_mode():
            output_video = self.pipe(
                prompt=prompt,
                negative_prompt=negative_prompt,
                height=height,
                width=width,
                num_frames=num_frames,
                guidance_scale=1.0,
                guidance_scale_2=1.0,
                num_inference_steps=num_inference_steps,
                generator=generator,
            ).frames[0]

        # Write the frames to an MP4 file in a temporary directory.
        output_dir = tempfile.mkdtemp()
        output_path = output_dir + "/output.mp4"
        export_to_video(output_video, output_path, fps=frames_per_second)
        return output_path


if __name__ == "__main__":
    predictor = Predictor()
    predictor.setup()
    output = predictor.predict(
        prompt="a cat walking on grass",
    )
    print(output)
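Once that script runs end-to-end, you can vary the inputs. For instance, a reproducible 720p run (which, per the input list above, expects a single H200) could look like the following sketch:
# Reproducible 720p run; per the input list above, 720p expects a single H200.
output = predictor.predict(
    prompt="a cat walking on grass",
    num_frames=81,
    resolution="720p",
    frames_per_second=16,
    go_fast=False,  # conservative preset (10 inference steps)
    seed=42,        # fixed seed for reproducible results
)
print(output)  # path to the generated .mp4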