laiwuchiyuan a day ago

Alibaba’s Tongyi Wanxiang team just dropped Wan2.2: three new video-generation checkpoints under Apache-2.0.

• Wan2.2-T2V-A14B – 27B total / 14B active parameters, the first open-source text-to-video model built on a mixture-of-experts (MoE) diffusion backbone.
• Wan2.2-I2V-A14B – the same MoE architecture, for image-to-video.
• Wan2.2-TI2V-5B – a unified 5B-parameter model that does both T2V and I2V on a single RTX 4090 (22–24 GB VRAM), generating a 5 s, 720p, 24 fps clip in ~9 min thanks to a 3-D VAE with 4×16×16 compression (64× information compression overall).

Key points for HN:

• MoE is a first among open video models. Two experts (high-noise layout, low-noise refinement), only one active per sampling step, so per-step compute is roughly half that of an equally sized dense model. A toy sketch of the routing idea follows this list.
• Cinema-grade controls baked in. Prompt with lighting, color palette, lens angle, micro-expressions, etc. to get Hollywood-style shots without post-work.
• Weights + diffusers + ComfyUI ready. Clone, pip install, and run; no API lock-in. A hedged diffusers loading example is further down.
• Benchmarks. The internal Wan-Bench 2.0 claims SOTA over closed models (Sora et al.) on motion fidelity and aesthetic quality.
• License. Apache-2.0, commercial-friendly, already mirrored on Hugging Face and ModelScope.
• Hardware. The A14B variants need ~80 GB VRAM (A100/H100) at full precision; TI2V-5B is the “consumer” option, where 22 GB is enough for 720p generation.
• Training data. +65% more images and +83% more videos vs. Wan2.1, all re-labeled for lighting, composition, and color.

If you’ve been waiting for an open, reproducible alternative to Runway Gen-3 or Sora, this looks like the real deal. Discussion points: MoE routing overhead, VRAM tricks, and how far we can push 5B on desktop GPUs.
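To make the two-expert split concrete, here is an illustrative-only sketch of timestep-based routing between a high-noise and a low-noise denoiser. This is not Wan2.2’s actual implementation; the class name, switch threshold, and expert interfaces are all hypothetical.

    # Toy sketch of the two-expert routing idea (NOT Wan2.2's real code): one
    # expert denoises the noisy early steps (global layout), the other the late
    # steps (detail refinement). Only one expert runs per step, so the active
    # parameter count stays at roughly half the total.
    import torch

    class TwoExpertDenoiser(torch.nn.Module):
        def __init__(self, high_noise_expert, low_noise_expert, switch_timestep=500):
            super().__init__()
            self.high_noise_expert = high_noise_expert  # handles t >= switch_timestep
            self.low_noise_expert = low_noise_expert    # handles t <  switch_timestep
            self.switch_timestep = switch_timestep      # hypothetical boundary value

        def forward(self, latents, timestep, **conditioning):
            # Route the whole denoising step to a single expert based on noise
            # level; per-step FLOPs match a dense model of one expert's size.
            expert = (self.high_noise_expert if timestep >= self.switch_timestep
                      else self.low_noise_expert)
            return expert(latents, timestep, **conditioning)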

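And a minimal sketch of running the 5B checkpoint through diffusers, assuming Wan2.2 is exposed via the same WanPipeline / AutoencoderKLWan classes diffusers already ships for Wan2.1. The repo ID, resolution, and frame count below are guesses based on the announcement (5 s at 24 fps, 720p) and may need adjusting.

    # Hedged sketch: repo ID and generation settings are assumptions, not verified.
    import torch
    from diffusers import AutoencoderKLWan, WanPipeline
    from diffusers.utils import export_to_video

    model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"  # assumed naming pattern
    vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae",
                                           torch_dtype=torch.float32)
    pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()  # helps fit a 24 GB RTX 4090

    frames = pipe(
        prompt="golden-hour close-up, shallow depth of field, slow dolly-in",
        height=704, width=1280,  # ~720p; exact supported sizes may differ
        num_frames=121,          # ~5 s at 24 fps; Wan typically wants 4k+1 frames
        guidance_scale=5.0,
    ).frames[0]
    export_to_video(frames, "wan22_ti2v_5b.mp4", fps=24)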
Try out the latest model, Wan2.2: https://www.wan-ai.co/