This is a collection of notes on the papers I’ve read.


Progressive Growing of GANs for Improved Quality, Stability, and Variation

Authors: Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen, 2018

  • Github, Youtube, Trained network
  • Image generation on CelebA with a GAN (Generative Adversarial Network), growing the generator and discriminator progressively.
  • Achieved an inception score of 8.80 on unsupervised CIFAR10.
  • GANs are differentiable, which lets gradients guide the generator and the discriminator in the right direction.
  • High-resolution image generation is difficult because the discriminator can more easily distinguish fakes from real images, which amplifies the gradient problem.
  • The paper starts from low-resolution training images and adds new layers that introduce higher-resolution details as training progresses.
  • They used minibatch discrimination to compute feature statistics across the minibatch, added as a layer towards the end of the discriminator.
  • Initialized weights from \(\mathcal{N}(0,1)\) and then scaled them at runtime (He et al., 2015).
  • To prevent the escalation of signal magnitudes, they used a variant of “local response normalization” (Krizhevsky et al., 2012), \(b_{x,y} = a_{x,y} \Big/ \sqrt{\frac{1}{N} \sum_{j=0}^{N-1} (a_{x,y}^j)^2 + \epsilon }\) where
    • \(\epsilon = 10^{-8}\)
    • \(N\): the number of feature maps
    • \(a_{x,y}, b_{x,y}\): the original and normalized feature vectors at pixel \((x, y)\)
  • Used sliced Wasserstein distance (SWD) and multi-scale structural similarity (MS-SSIM) (Odena et al., 2017) to evaluate the importance of their individual contributions, and also to perceptually validate the metrics themselves.
  • The progressive variant offers two main benefits: it converges to a considerably better optimum and reduces total training time by about a factor of two.
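The pixelwise normalization formula above can be sketched in NumPy. This is my sketch, not the authors’ code; the array shapes are illustrative assumptions:

```python
import numpy as np

def pixelwise_norm(a, eps=1e-8):
    """Normalize each pixel's feature vector across the feature-map axis:
    b_{x,y} = a_{x,y} / sqrt(mean_j (a^j_{x,y})^2 + eps).

    a: activations of shape (N feature maps, H, W).
    """
    return a / np.sqrt(np.mean(a ** 2, axis=0, keepdims=True) + eps)

# After normalization, each pixel's feature vector has L2 norm ~ sqrt(N).
a = np.random.randn(16, 4, 4)          # N = 16 feature maps, 4x4 spatial grid
b = pixelwise_norm(a)
norms = np.sqrt((b ** 2).sum(axis=0))  # per-pixel L2 norm
print(np.allclose(norms, np.sqrt(16), atol=1e-3))  # prints True
```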
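The minibatch feature statistic added near the end of the discriminator can also be sketched in NumPy. This follows the simplified variant as I understand it, where a single averaged standard deviation over the batch is appended as one extra feature map; the function name and shapes are my assumptions:

```python
import numpy as np

def minibatch_stddev(x):
    """Append a feature map holding the average over-the-batch std.

    x: activations of shape (batch, channels, H, W). The extra channel
    lets the discriminator see minibatch-wide variation.
    """
    std = x.std(axis=0)                  # per-feature, per-location std over the batch
    stat = std.mean()                    # average into a single scalar
    n, _, h, w = x.shape
    extra = np.full((n, 1, h, w), stat)  # replicate as one extra channel
    return np.concatenate([x, extra], axis=1)

x = np.random.randn(8, 16, 4, 4)
y = minibatch_stddev(x)
print(y.shape)  # (8, 17, 4, 4)
```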


Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Authors: Alec Radford, Luke Metz, Soumith Chintala, 2016

Generating natural images

  • parametric
    • samples often suffer from being blurry
    • iterative forward diffusion process (Sohl-Dickstein et al., 2015)
    • GANs (Goodfellow et al., 2014) suffer from samples being noisy and incomprehensible.
      • A laplacian pyramid extension to this approach (Denton et al., 2015) showed higher quality images, but they still suffered from the objects looking wobbly because of noise introduced in chaining multiple models.
      • A recurrent network approach (Gregor et al., 2015) and a deconvolution network approach (Dosovitskiy et al., 2014) have also recently had some success with generating natural images. However, they have not leveraged the generators for supervised tasks.
  • non-parametric
    • do matching from a database of existing images

Approach & Architecture

Until LAPGAN (Denton et al., 2015) appeared, GANs using CNNs to model images did not scale well. LAPGAN takes an alternative approach: it iteratively upscales low-resolution generated images, which can be modeled more reliably.

  • Used the all-convolutional net (Springenberg et al., 2014), which replaces deterministic spatial pooling functions (such as max-pooling) with strided convolutions, allowing the network to learn its own spatial downsampling/upsampling. Used in both generators and discriminators.
  • Eliminated fully connected layers on top of convolutional features. (Mordvintsev et al.) used this approach with global average pooling in state-of-the-art image classifiers.
  • Batch Normalization (Ioffe & Szegedy, 2015) stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps deal with training problems that arise due to poor initialization and helps gradient flow in deeper models. Applying batchnorm to all layers, however, resulted in sample oscillation and model instability. This was avoided by not applying batchnorm to the generator output layer and the discriminator input layer.
  • Architecture guidelines for stable Deep Convolutional GANs
    • Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
    • Use batchnorm in both the generator and the discriminator.
    • Remove fully connected hidden layers for deeper architectures.
    • Use ReLU activation in generator for all layers except for the output, which uses Tanh.
    • Use LeakyReLU activation in the discriminator for all layers.
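The batch normalization used throughout the guidelines can be sketched in NumPy. This is a sketch only: the running statistics used at inference time and the learning of \(\gamma, \beta\) are omitted:

```python
import numpy as np

def batchnorm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each channel to zero mean / unit variance over the batch
    and spatial axes, then apply a scale (gamma) and shift (beta).

    x: activations of shape (batch, channels, H, W).
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# An input far from zero mean / unit variance comes out normalized.
x = 3.0 + 2.0 * np.random.randn(8, 4, 16, 16)
y = batchnorm(x)
print(np.allclose(y.mean(axis=(0, 2, 3)), 0.0, atol=1e-6))  # prints True
```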
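The generator’s fractional-strided (transposed) convolution can be realized by inserting zeros between input pixels and then applying an ordinary convolution. A NumPy sketch, where the kernel size, stride, and padding choices are illustrative assumptions:

```python
import numpy as np

def conv2d(x, k):
    """Plain 'valid' 2-D convolution (cross-correlation) on a single channel."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

def fractional_strided_conv(x, k, stride=2):
    """Upsample by inserting (stride - 1) zeros between input pixels,
    then convolve -- one common way to realize a fractionally-strided
    convolution. Output size is (H - 1) * stride + kernel_size."""
    h, w = x.shape
    up = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1))
    up[::stride, ::stride] = x
    pad = k.shape[0] - 1
    up = np.pad(up, pad)          # 'full' padding so every input pixel is covered
    return conv2d(up, k)

y = fractional_strided_conv(np.ones((4, 4)), np.ones((3, 3)), stride=2)
print(y.shape)  # (9, 9): spatial size grows from 4 to (4 - 1) * 2 + 3
```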

Visualizing and Understanding Convolutional Networks


< Top: A deconvnet layer (left) attached to a convnet layer (right). The deconvnet will reconstruct an approximate version of the convnet features from the layer beneath. Bottom: An illustration of the unpooling operation in the deconvnet, using switches which record the location of the local max in each pooling region (colored zones) during pooling in the convnet. >
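The unpooling with switches described in the caption can be sketched in NumPy. A toy 2x2 example of my own, not the paper’s code:

```python
import numpy as np

def maxpool_with_switches(x, size=2):
    """Max-pooling that also records 'switches': the flat index of the
    max inside each pooling region, as in the deconvnet paper."""
    h, w = x.shape
    pooled = np.zeros((h // size, w // size))
    switches = np.zeros((h // size, w // size), dtype=int)
    for i in range(0, h, size):
        for j in range(0, w, size):
            region = x[i:i + size, j:j + size]
            pooled[i // size, j // size] = region.max()
            switches[i // size, j // size] = region.argmax()
    return pooled, switches

def unpool(pooled, switches, size=2):
    """Place each pooled value back at its recorded max location, zeros
    elsewhere: the approximate inverse of max-pooling."""
    h, w = pooled.shape
    out = np.zeros((h * size, w * size))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(switches[i, j], size)
            out[i * size + di, j * size + dj] = pooled[i, j]
    return out

x = np.array([[1., 5., 2., 0.],
              [3., 2., 8., 1.],
              [0., 4., 6., 7.],
              [9., 2., 3., 5.]])
p, s = maxpool_with_switches(x)
# Each regional max is restored to its original position; all else is zero.
print(unpool(p, s))
```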

Why do they run the rectifier in the deconvnet??? -> The authors say it’s because the rectifier is used in the forward pass, so it should be used in the backward pass as well. I don’t think they have a really good reason. Read this Quora answer.

In computer vision it is important to find the image patterns that cause high activations in a given filter. The purpose of the deconvnet is to visualize those patterns.


< Visualization of features in a fully trained model. For layers 2-5 we show the top 9 activations in a random subset of feature maps across the validation data, projected down to pixel space using our deconvolutional network approach. Our reconstructions are not samples from the model: they are reconstructed patterns from the validation set that cause high activations in a given feature map. For each feature map we also show the corresponding image patches. Note: (i) the strong grouping within each feature map, (ii) greater invariance at higher layers and (iii) exaggeration of discriminative parts of the image, e.g. eyes and noses of dogs (layer 4, row 1, col 1). >