Casual GAN Papers: CIPS-3D

65: CIPS-3D

CIPS-3D: A 3D-Aware Generator of GANs Based on Conditionally-Independent Pixel Synthesis by Peng Zhou et al. explained in 5 minutes

โญ๏ธPaper difficulty: ๐ŸŒ•๐ŸŒ•๐ŸŒ•๐ŸŒ‘๐ŸŒ‘

CIPS-3D teaser

🎯 At a glance:

If you have been following generative ML for a while, you might have noticed more and more GAN papers focusing on the underlying 3D representation of the generated images. CIPS-3D is a 3D-aware GAN proposed by Peng Zhou and the team at Shanghai Jiao Tong University & Huawei. It combines a low-res NeRF (surprise) with a CIPS generator (genuine surprise) to achieve high-quality 256x256 3D-aware image synthesis, as well as transfer learning and 3D-aware face stylization.

⌛️ Prerequisites:

(Highly recommended reading to understand the core contributions of this paper):
1) CIPS
2) NeRF

🚀 Motivation:

Existing methods use either a full NeRF, which is limited to low resolution due to memory constraints, or a small NeRF combined with a 2D CNN upsampler, which has an unfortunate aliasing side effect caused by a non-ideal filter. CIPS, in contrast, synthesizes each pixel independently without any upsampling, bypassing this issue altogether. Furthermore, many 3D-aware GANs suffer from a mirror symmetry problem, which CIPS-3D solves with an auxiliary discriminator that regularizes the NeRF output. CIPS training is typically very time-intensive, so CIPS-3D also provides a more efficient modulated fully connected layer to speed it up.

🔍 Main Ideas:

1) Modified NeRF:
A shallow NeRF with 3 SIREN blocks is used as the first few layers of the generator network. Each block consists of a fully connected layer followed by FiLM conditioning (a fancy way to say AdaIN, where instead of mean and std you have frequency and phase) and a Sine activation function. The NeRF module does not use viewing direction and predicts a generalized feature vector instead of a color. The output of this shallow NeRF is computed via classical volume rendering techniques.
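
To make the block structure above concrete, here is a minimal pure-Python sketch of one FiLM-conditioned sine layer and of compositing feature vectors along a single ray. This is an illustration under assumptions, not the authors' code: the function names, the toy weights, and passing the frequency and phase in directly (rather than predicting them from a latent code) are all hypothetical choices made for the example.

```python
import math

def film_siren_layer(x, weight, bias, freq, phase):
    """One FiLM-conditioned sine layer: sin(freq * (W x + b) + phase)."""
    out = []
    for row, b, f, p in zip(weight, bias, freq, phase):
        pre = sum(w * xi for w, xi in zip(row, x)) + b  # fully connected layer
        out.append(math.sin(f * pre + p))               # FiLM + sine activation
    return out

def volume_render(features, densities, deltas):
    """Composite per-sample feature vectors along one ray.

    alpha_i = 1 - exp(-sigma_i * delta_i), weight_i = T_i * alpha_i,
    where T_i is the transmittance accumulated before sample i.
    """
    dim = len(features[0])
    rendered = [0.0] * dim
    transmittance = 1.0
    for feat, sigma, delta in zip(features, densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        w = transmittance * alpha
        for k in range(dim):
            rendered[k] += w * feat[k]
        transmittance *= 1.0 - alpha
    return rendered

# Toy usage: two samples on one ray, each mapped to a 2-dim feature.
# All numbers below are made up for illustration.
W = [[0.5, 0.0, -0.5], [1.0, 1.0, 1.0]]
samples = [[0.1, -0.2, 0.3], [0.1, -0.2, 0.6]]
feats = [film_siren_layer(p, W, bias=[0.0, 0.1], freq=[30.0, 30.0],
                          phase=[0.0, 0.5]) for p in samples]
pixel_feature = volume_render(feats, densities=[1.0, 2.0], deltas=[0.1, 0.1])
```

Note that the rendered quantity is a feature vector, not an RGB color, which matches the description above: the color only appears later, in the pixel-wise generator.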

2) INR Network for Appearance:
The shallow NeRF is followed by a pixel-wise CIPS generator that produces the final RGB image by summing up the outputs of the intermediate toRGB layers after each block. Partial gradient propagation is used to train on high-res images. This technique leverages the pixel-wise nature of the generator and backpropagates the gradient through the NeRF module for a random subset of all sampled pixels to fit the model into memory. Additionally, the modulated fully connected layer is reimplemented in a more efficient way using batch matrix multiplication.
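
For intuition on the modulated fully connected layer, here is a rough pure-Python sketch assuming StyleGAN2-style weight modulation and demodulation: each sample gets its own weight matrix, which is then applied to all of that sample's pixels. The function names and shapes are hypothetical, and the exact formulation in the paper may differ. In a tensor framework the loop over the batch would collapse into a single batched matrix multiply (for example `torch.bmm` in PyTorch), which is where the speed-up would come from.

```python
import math

def modulate_weights(weight, style, eps=1e-8):
    """Scale each input column of `weight` by `style`, then normalise each output row."""
    modulated = []
    for row in weight:
        scaled = [w * s for w, s in zip(row, style)]          # modulation
        norm = math.sqrt(sum(v * v for v in scaled) + eps)
        modulated.append([v / norm for v in scaled])          # demodulation
    return modulated

def modulated_fc(pixels, weight, styles):
    """Apply a per-sample modulated weight matrix to every pixel of that sample.

    pixels: [batch][num_pixels][in], styles: [batch][in], weight: [out][in].
    Returns [batch][num_pixels][out].
    """
    outputs = []
    for sample_pixels, style in zip(pixels, styles):
        w_b = modulate_weights(weight, style)  # one weight matrix per sample
        outputs.append([[sum(w * x for w, x in zip(row, px)) for row in w_b]
                        for px in sample_pixels])
    return outputs

# Toy usage: batch of 1, two pixels, 2 input channels, 1 output channel.
out = modulated_fc([[[1.0, 0.0], [0.0, 1.0]]], [[3.0, 4.0]], [[1.0, 1.0]])
```

Because every pixel of a sample shares the same modulated weights, the per-pixel work is a plain matrix product, which is what makes the batched formulation a natural fit for a pixel-wise generator.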

3) Overcoming Mirror Symmetry:
Apparently, 3D-aware models suffer from a mirror symmetry problem, where synthesized images suddenly flip horizontally when the camera passes the center line of symmetry. CIPS-3D solves this problem using learnable positional encoding along with an auxiliary discriminator that looks directly at the shallow NeRF output.

📈 Experiment insights / Key takeaways:
  • Datasets: FFHQ
  • Baselines: CIPS-3D outperforms GIRAFFE, pi-GAN, and StyleNeRF in terms of FID and KID and almost matches StyleGAN2 on 2D images
  • Speed: 27.78 batches per second with a batch size of 4096 on 8 V100 GPUs
  • Ablations: adding viewing direction causes identity inconsistencies, learned positional encoding without the auxiliary discriminator hurts FID, and 96x96 pixels are enough for the partial backpropagation to work almost as well as using all pixels
  • By freezing the NeRF layer and fine-tuning the INR, it is possible to do transfer learning to a different domain
  • Swapping or interpolating INR layers between different models has the same effect as cartoonization in StyleGAN2
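
To put the partial backpropagation ablation in perspective, here is a small sketch of the pixel split, assuming "96x96" means 96² randomly chosen pixel locations out of the 256x256 output. The helper name `split_pixels` is made up for this example. Only the chosen pixels would carry gradients back through the NeRF module, while the rest would be computed without gradient tracking.

```python
import random

def split_pixels(resolution, grad_side, seed=None):
    """Randomly choose grad_side**2 pixel indices to keep gradients for."""
    rng = random.Random(seed)
    total = resolution * resolution
    chosen = set(rng.sample(range(total), grad_side * grad_side))
    with_grad = sorted(chosen)
    without_grad = [i for i in range(total) if i not in chosen]
    return with_grad, without_grad

with_grad, without_grad = split_pixels(256, 96, seed=0)
fraction = len(with_grad) / (256 * 256)  # 9216 / 65536 = 0.140625
```

Under this reading, roughly 14% of the pixels contribute gradients to the NeRF part, which is why the trick saves so much memory.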

🖼️ Paper Poster:

CIPS-3D poster

🛠 Possible Improvements:
  • Current stylization only works for generated images, since there isn’t an encoder for real images

✍️My Notes:
  • (2/5) CIPS-3D is a so-so name - neither very clever nor funny
  • Real happy to see a CIPS follow-up!
  • Partial gradients are a great idea. I had a lot of headaches fitting things into memory when inverting CIPS
  • Unfortunately, there is still the heavy texture sticking that plagued CIPS, and the results are limited to 256x256
  • I wonder if the authors tried to train progressively to upscale CIPS-3D to 1024x1024
  • The transfer learning is awesome, just one question - why no experiments with CLIP?!
  • Share your thoughts on CIPS-3D in the comments!

CIPS-3D arxiv / CIPS-3D github

👋 Thanks for reading!

By: @casual_gan

P.S. Send me paper suggestions for future posts @KirillDemochkin!