High-quality portrait image editing has been made easier by recent advances in GANs (e.g., StyleGAN) and GAN inversion methods that project images onto a pre-trained GAN's latent space. However, it is hard to extend existing image editing methods to videos while keeping the results temporally coherent and natural-looking. We identify two challenges: reproducing diverse video frames and preserving the natural motion after editing.
In this work, we propose solutions to these challenges. First, we propose a video adaptation method that enables the generator to reconstruct the original identity, unusual poses, and expressions in the input video. Second, we propose an expression dynamics optimization that tweaks the latent codes to preserve the meaningful motion of the original video. Building on these methods, we present a StyleGAN-based high-quality portrait video editing system that edits in-the-wild videos in a temporally coherent way at up to 4K resolution.
Our video adaptation method inverts the video by encoding each frame with a pre-trained image encoder, e4e, and then fine-tuning the pre-trained StyleGAN generator. We then apply latent-based editing techniques to edit the video.
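The two-stage inversion can be illustrated with a toy linear "generator": a fixed encoder first predicts an approximate latent code, and the generator's weights are then fine-tuned so that this code reconstructs the frame more faithfully. The linear generator, the synthetic frame, and all dimensions below are illustrative stand-ins, not the paper's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, frame_dim = 8, 32

# Toy "generator": a frame is produced as G @ w (a stand-in for StyleGAN).
G_true = rng.normal(size=(frame_dim, latent_dim))
w_true = rng.normal(size=latent_dim)
frame = G_true @ w_true                      # the video frame to invert

# Stage 1: a fixed "encoder" yields only an approximate latent (like e4e).
w = w_true + 0.3 * rng.normal(size=latent_dim)

# Stage 2: video adaptation -- keep w fixed and fine-tune the generator
# weights by gradient descent on the reconstruction error ||G w - x||^2.
G = G_true.copy()
lr = 0.02
for _ in range(200):
    residual = G @ w - frame                 # current reconstruction error
    grad = 2.0 * np.outer(residual, w)       # d/dG ||G w - x||^2
    G -= lr * grad

print(np.linalg.norm(G_true @ w - frame))    # error of the un-adapted generator
print(np.linalg.norm(G @ w - frame))         # error after adaptation
```

After fine-tuning, the imperfect latent code reconstructs the frame far more accurately than with the frozen generator, which is the property the video adaptation step relies on before any editing is applied.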
We additionally optimize the StyleGAN latent codes to suppress unwanted changes that may occur during expression editing in the video.
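As a rough sketch of the idea (not the paper's actual objective), the expression optimization can be viewed as adjusting the per-frame latent codes so that the frame-to-frame latent motion matches the original video while the codes stay close to the edited ones. The quadratic energy, its hand-derived gradient, and all constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T, D = 30, 8                                  # number of frames, latent dimension

# Original per-frame latents carry the video's motion; the "edit" adds a
# constant semantic offset plus per-frame jitter (the unwanted changes).
w_orig = np.cumsum(0.1 * rng.normal(size=(T, D)), axis=0)
edit_direction = rng.normal(size=D)
w_edit = w_orig + edit_direction + 0.2 * rng.normal(size=(T, D))

def motion(w):
    return np.diff(w, axis=0)                 # frame-to-frame latent deltas

# Energy: sum_t ||w_t - w_edit_t||^2 + lam * sum_t ||motion mismatch||^2,
# minimized by plain gradient descent.
lam = 10.0
lr = 0.01
w = w_edit.copy()
for _ in range(500):
    grad = 2.0 * (w - w_edit)                 # fidelity term gradient
    d = motion(w) - motion(w_orig)            # motion mismatch per transition
    grad[:-1] -= 2.0 * lam * d                # d/dw_t of ||(w_{t+1}-w_t) - m_t||^2
    grad[1:] += 2.0 * lam * d                 # d/dw_{t+1} of the same term
    w -= lr * grad

print(np.linalg.norm(motion(w_edit) - motion(w_orig)))  # motion error before
print(np.linalg.norm(motion(w) - motion(w_orig)))       # motion error after
```

The constant edit offset is untouched by the motion term (it cancels in the deltas), so the optimization removes the per-frame jitter while keeping the edit, which mirrors the intent of suppressing unwanted changes without destroying the original dynamics.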
@article{seo2022spv,
author = {Seo, Kwanggyoon and Oh, Seoung Wug and Lu, Jingwan and Lee, Joon-Young and Kim, Seonghyeon and Noh, Junyong},
title = {StylePortraitVideo: Editing Portrait Videos with Expression Optimization},
journal = {Computer Graphics Forum},
volume = {41},
number = {7},
year = {2022}
}