![]() |
![]() |
![]() |
![]() |
![]() |
A person riding a horse | A big beautiful mountain with a waterfall, a long view | A hot air balloon takes to the sky | A boy playing guitar | A minion waved his hand and the UFO flew over |
![]() |
![]() |
![]() |
![]() |
![]() |
In the universe, the earth revolves around the sun | Red car running, a close-up video | A sailboat sailing on the sea | PeppaPig, Cartoon style, two pig running on the grassland | The plane went through the white clouds |
Denoising-based diffusion models have achieved impressive image synthesis; however, applying them to video can incur unaffordable computational costs because denoising is performed per frame. In pursuit of efficient video generation, we present a Diffusion Reuse MOtion (Dr. Mo) network to accelerate the video-based denoising process. Our crucial observation is that the latent representations of adjacent video frames in early denoising steps exhibit high consistency with motion clues. Inspired by this discovery, we propose to accelerate the video denoising process by incorporating lightweight, learnable motion features. Specifically, Dr. Mo computes all denoising steps only for base frames. For a non-base frame, Dr. Mo propagates the pre-computed base latents of a particular step with inter-frame motions to obtain a fast estimate of its coarse-grained latent representation, from which denoising continues to obtain more sensitive and fine-grained representations. On top of this, Dr. Mo employs a meta-network named Denoising Step Selector (DSS) to dynamically determine the step at which to perform motion-based propagation for each frame, which can be viewed as a tradeoff between quality and efficiency. Extensive evaluations on video generation and editing tasks indicate that Dr. Mo delivers widely applicable acceleration for diffusion-based video generation while effectively retaining visual quality and style.
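The control flow described in the abstract can be illustrated with a toy sketch. This is not the Dr. Mo implementation: `toy_denoise`, `toy_propagate`, `generate`, the fixed `base_interval`, and the `select_step` callback (a stand-in for the DSS) are all hypothetical names and simplifications introduced here, and latents are plain lists of floats. The sketch only shows where the savings come from: base frames run every denoising step and cache the intermediate latents, while non-base frames start from a propagated base latent and run only the remaining steps.

```python
from typing import Callable, List, Tuple

Latent = List[float]


def toy_denoise(latent: Latent, step: int) -> Latent:
    """Hypothetical stand-in for one denoising step of a diffusion model."""
    return [0.9 * x for x in latent]


def toy_propagate(latent: Latent, motion: float) -> Latent:
    """Hypothetical stand-in for motion-based propagation (here, a plain shift)."""
    return [x + motion for x in latent]


def generate(
    noise: List[Latent],
    motions: List[float],
    total_steps: int,
    base_interval: int,
    select_step: Callable[[int], int],
) -> Tuple[List[Latent], int]:
    """Return the final latents and the number of denoiser calls made.

    Assumption: every `base_interval`-th frame is a base frame, and
    motions[i] describes the motion from the latest base frame to frame i.
    """
    calls = 0
    finals: List[Latent] = []
    cache: List[Latent] = []  # cache[s] = base latent after s denoising steps
    for i, z in enumerate(noise):
        if i % base_interval == 0:
            # Base frame: run all steps and cache every intermediate latent.
            cache = [z]
            for t in range(total_steps):
                z = toy_denoise(z, t)
                calls += 1
                cache.append(z)
        else:
            # Non-base frame: reuse the base latent at the selected step,
            # propagate it with the motion, then finish the remaining steps.
            s = select_step(i)  # stand-in for the Denoising Step Selector
            z = toy_propagate(cache[s], motions[i])
            for t in range(s, total_steps):
                z = toy_denoise(z, t)
                calls += 1
        finals.append(z)
    return finals, calls


noise = [[1.0, 2.0] for _ in range(4)]
motions = [0.0, 0.1, 0.2, 0.3]
_, reused = generate(noise, motions, total_steps=10, base_interval=4,
                     select_step=lambda i: 6)
_, full = generate(noise, motions, total_steps=10, base_interval=4,
                   select_step=lambda i: 0)
print(reused, full)  # 22 40
```

In this toy setting, reusing the base latent at step 6 of 10 cuts the denoiser calls for four frames from 40 to 22. A later reuse step saves more computation but leaves fewer steps to refine the frame, which is the quality and efficiency tradeoff the DSS is described as managing.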
With Dr. Mo | ![]() |
![]() |
![]() |
Without Dr. Mo | ![]() |
![]() |
![]() |
A white swan swimming in the water. | Dramatic ocean sunset. | An apple is falling from a tree. |
With Dr. Mo | ![]() |
![]() |
![]() |
Without Dr. Mo | ![]() |
![]() |
![]() |
A white swan swimming in the water. | Dramatic ocean sunset. | An apple is falling from a tree. |
CogVideo[1] | ![]() |
![]() |
![]() |
![]() |
Latent-Shift[2] | ![]() |
![]() |
![]() |
![]() |
Dr. Mo (Ours) | ![]() |
![]() |
![]() |
![]() |
A person playing piano | A person doing handstand pushups | A person performing a bench press | A person knitting |
VDM[3] | ![]() |
![]() |
![]() |
![]() |
SimDA[4] | ![]() |
![]() |
![]() |
![]() |
Dr. Mo (Ours) | ![]() |
![]() |
![]() |
![]() |
Mountain river | Path in a tropical forest | Forest in Autumn | Dramatic ocean sunset |