StylePortraitVideo: Editing Portrait Videos with Expression Optimization

Supplementary Material

paper1038

This webpage includes video results for the figures in the main paper.

Go to: the project page, Figure 1 (part 1), Figure 1 (part 2), or Figures 4 to 12 (part 3).

Figure 4. Reconstruction of input videos and their edited results.
StyleGAN is a StyleGAN model pre-trained on FFHQ, and StyleGAN_va is a StyleGAN adapted to the given input video. pSp and e4e are image encoders.

Input Video

pSp + StyleGAN

e4e + StyleGAN

e4e + StyleGAN_va
(ours)

Edited
e4e + StyleGAN_va
(- Beard)

Input Video

pSp + StyleGAN

e4e + StyleGAN

e4e + StyleGAN_va
(ours)

Edited
e4e + StyleGAN_va
(+ Makeup)

Input Video

pSp + StyleGAN

e4e + StyleGAN

e4e + StyleGAN_va
(ours)

Edited
e4e + StyleGAN_va
(- Age)

Input Video

pSp + StyleGAN

e4e + StyleGAN

e4e + StyleGAN_va
(ours)

Edited
e4e + StyleGAN_va
(+ Age)

Input Video

pSp + StyleGAN

e4e + StyleGAN

e4e + StyleGAN_va
(ours)

Edited
e4e + StyleGAN_va
(+ Yaw)

Input Video

pSp + StyleGAN

e4e + StyleGAN

e4e + StyleGAN_va
(ours)

Edited
e4e + StyleGAN_va
(+ Chubby)

Input Video

pSp + StyleGAN

e4e + StyleGAN

e4e + StyleGAN_va
(ours)

Edited
e4e + StyleGAN_va
(- Nose Size)

Input Video

pSp + StyleGAN

e4e + StyleGAN

e4e + StyleGAN_va
(ours)

Edited
e4e + StyleGAN_va
(+ Chubby)

Figure 5. Motion-blurred videos.

Input Frame

e4e + StyleGAN

e4e + StyleGAN_va
(ours)

Edited
e4e + StyleGAN_va
(+ Age)

Input Frame

e4e + StyleGAN

e4e + StyleGAN_va
(ours)

Edited
e4e + StyleGAN_va
(+ Makeup)

Input Frame

e4e + StyleGAN

e4e + StyleGAN_va
(ours)

Edited
e4e + StyleGAN_va
(+ Beard)

Figure 6. Comparison to optimization variations.
Opt. runs an independent optimization for each frame to find its latent code. R + Opt. works recurrently: it initializes each frame's optimization with the previous frame's latent code and then optimizes to find the latent code for the current frame.

Input Video

Opt. + StyleGAN

R + Opt. + StyleGAN

Opt. + StyleGAN_va

R + Opt. + StyleGAN_va

e4e + StyleGAN_va
(ours)

Input Video

Opt. + StyleGAN

R + Opt. + StyleGAN

Opt. + StyleGAN_va

R + Opt. + StyleGAN_va

e4e + StyleGAN_va
(ours)
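The two optimization baselines in Figure 6 differ only in how each frame's optimization is initialized. The sketch below illustrates that difference under loudly stated assumptions: a hypothetical toy linear "generator" G(w) = A w stands in for StyleGAN, each frame is a 2-D vector rather than an image, and plain gradient descent with a fixed iteration budget replaces whatever optimizer and losses the actual baselines use. The names `generate`, `invert`, `per_frame_opt`, and `recurrent_opt` are illustrative only.

```python
from typing import List, Sequence

Latent = List[float]
Image = List[float]

# Hypothetical toy "generator": a fixed linear map G(w) = A w standing in for StyleGAN.
A: List[List[float]] = [[2.0, 1.0], [0.0, 1.0]]


def generate(w: Sequence[float]) -> Image:
    """Render a toy 'image' (a 2-D vector) from latent code w."""
    return [sum(a * x for a, x in zip(row, w)) for row in A]


def recon_error(w: Sequence[float], target: Sequence[float]) -> float:
    """Euclidean distance between G(w) and the target frame."""
    return sum((g - t) ** 2 for g, t in zip(generate(w), target)) ** 0.5


def loss_grad(w: Sequence[float], target: Sequence[float]) -> Latent:
    """Gradient of ||A w - target||^2 with respect to w, i.e. 2 A^T (A w - target)."""
    r = [g - t for g, t in zip(generate(w), target)]
    return [2.0 * sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(w))]


def invert(target: Sequence[float], init: Sequence[float], steps: int, lr: float) -> Latent:
    """Fixed-budget gradient descent from `init` toward a latent that reproduces `target`."""
    w = list(init)
    for _ in range(steps):
        g = loss_grad(w, target)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def per_frame_opt(frames: Sequence[Sequence[float]], w_init: Sequence[float],
                  steps: int = 5, lr: float = 0.05) -> List[Latent]:
    """'Opt.': every frame starts from the same initial latent code."""
    return [invert(f, w_init, steps, lr) for f in frames]


def recurrent_opt(frames: Sequence[Sequence[float]], w_init: Sequence[float],
                  steps: int = 5, lr: float = 0.05) -> List[Latent]:
    """'R + Opt.': each frame starts from the previous frame's optimized latent code."""
    latents: List[Latent] = []
    prev = list(w_init)
    for f in frames:
        prev = invert(f, prev, steps, lr)
        latents.append(prev)
    return latents
```

For frames rendered from a slowly drifting latent, e.g. `frames = [generate([1.0 + 0.01 * t, 2.0 - 0.01 * t]) for t in range(10)]`, the recurrent variant carries progress from one frame to the next and, under the same 5-step budget, ends with a lower total reconstruction error than restarting from `[0.0, 0.0]` on every frame. On this convex toy problem both would reach the same codes given unlimited iterations, so the sketch only illustrates the initialization difference, not the temporal behavior shown in the videos.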

Figure 8. Qualitative results on expression editing.
Videos with Exp. Dyn. Optim. follow the lip contact of the original input video more closely than those without it, while the eyes follow the intended edit. Note that all expression directions open the mouth; our method aims to preserve the original expression dynamics, such as lip contact, so that the mouth follows the original motion while the edited expression is maintained.

Input Video

w/o Exp. Dyn. Optim.

w/ Exp. Dyn. Optim. (+ Happiness)

Input Video

w/o Exp. Dyn. Optim.

w/ Exp. Dyn. Optim. (+ Happiness)

Input Video

w/o Exp. Dyn. Optim.

w/ Exp. Dyn. Optim. (+ Happiness)

Input Video

w/o Exp. Dyn. Optim.

w/ Exp. Dyn. Optim. (+ Anger)

Input Video

w/o Exp. Dyn. Optim.

w/ Exp. Dyn. Optim. (+ Anger)

Input Video

w/o Exp. Dyn. Optim.

w/ Exp. Dyn. Optim. (+ Anger)

Input Video

w/o Exp. Dyn. Optim.

w/ Exp. Dyn. Optim. (+ Anger)

Input Video

w/o Exp. Dyn. Optim.

w/ Exp. Dyn. Optim. (+ Surprise)

Input Video

w/o Exp. Dyn. Optim.

w/ Exp. Dyn. Optim. (+ Surprise)

Figure 9. Sequential editing results.
Expression dynamics optimization can be performed after sequential editing of both eye and eyebrow shapes.

Input Video

w/o Exp. Dyn. Optim.
(+ Anger)

w/o Exp. Dyn. Optim.
(+ Anger
- Arched eyebrow)

w/ Exp. Dyn. Optim.
(+ Anger)

w/ Exp. Dyn. Optim.
(+ Anger
- Arched eyebrow)

Input Video

w/o Exp. Dyn. Optim.
(+ Happiness)

w/o Exp. Dyn. Optim.
(+ Happiness
- Eye openness)

w/ Exp. Dyn. Optim.
(+ Happiness)

w/ Exp. Dyn. Optim.
(+ Happiness
- Eye openness)
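The "+"/"-" labels in Figure 9 denote edits applied one after another. A minimal sketch of such sequential latent editing follows, assuming (as in common StyleGAN editing practice, not confirmed here as this paper's procedure) that each label moves a frame's latent code along a fixed direction, w' = w + strength * d, with a negative strength for a "-" edit. The 3-D latent space, the direction vectors, the dictionary `DIRECTIONS`, and the function names are hypothetical placeholders; the expression dynamics optimization that follows the edits is not modeled.

```python
from typing import Dict, List, Sequence, Tuple

Latent = List[float]

# Hypothetical edit directions in a 3-D toy latent space.
# Real StyleGAN W codes are 512-D; these values are placeholders only.
DIRECTIONS: Dict[str, Latent] = {
    "anger": [1.0, 0.0, 0.0],
    "arched_eyebrow": [0.0, 1.0, 0.0],
    "happiness": [0.0, 0.0, 1.0],
    "eye_openness": [0.6, 0.0, 0.8],
}


def apply_edit(w: Sequence[float], name: str, strength: float) -> Latent:
    """Move w along one direction: w + strength * d (negative strength = '-' edit)."""
    d = DIRECTIONS[name]
    return [wi + strength * di for wi, di in zip(w, d)]


def sequential_edit(w: Sequence[float], edits: Sequence[Tuple[str, float]]) -> Latent:
    """Apply edits one after another, e.g. '+ Anger' then '- Arched eyebrow'."""
    out = list(w)
    for name, strength in edits:
        out = apply_edit(out, name, strength)
    return out


def edit_video(latents: Sequence[Sequence[float]],
               edits: Sequence[Tuple[str, float]]) -> List[Latent]:
    """Apply the same edit sequence to every frame's latent code."""
    return [sequential_edit(w, edits) for w in latents]
```

For example, `edit_video(frames, [("anger", 2.0), ("arched_eyebrow", -1.0)])` mirrors the "+ Anger, - Arched eyebrow" panel. Under this purely linear assumption the edits commute, so their order does not change the result; any non-linear step applied afterwards, such as an optimization, is where ordering and frame-to-frame dynamics would start to matter.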

Figure 11. Frame interpolation.

Input Video

Reconstructed Video

Frame Interpolated Video

Input Video

Reconstructed Video

Frame Interpolated Video

Figure 12 (a). Limitations and capacity of StyleGAN3.
Texture sticking in the skin and eyebrows is effectively resolved by using StyleGAN3.

Input Video

e4e+StyleGAN2_va

e4e+StyleGAN3_va

Input Video

e4e+StyleGAN2_va

e4e+StyleGAN3_va

Input Video

e4e+StyleGAN2_va

e4e+StyleGAN3_va

Figure 12 (b). Limitations and capacity of StyleGAN3.
Misalignment of the eyes is effectively resolved by using StyleGAN3.

Input Video

e4e+StyleGAN2_va

e4e+StyleGAN3_va