What I learned about GANs after trying to build the world's first Rectangular image GAN

JJ Deng
8 min read · Jan 29, 2019


Images of Beaches generated by RectGAN

When I first learned about GANs (Generative Adversarial Networks), I was intrigued by how they can be used to generate new images from nothing. But once I started playing with the code associated with the many GAN implementations out there, a fundamental flaw was exposed: these GANs can only be trained on and can only generate square images. And the vast majority of pictures I had access to were rectangular. This surely made it impractical to use GANs to create my own art.

Disclaimer: I’m no expert in GANs as of this writing, and if you’re even somewhat well-versed in this area, it’s very likely you’ll be screaming at me for my naive implementation of a rectangular GAN. I only set out to find a way to hack an easy and intuitive GAN implementation to start generating rectangular images, even if this implementation is less than optimal. Thus, RectGAN was born.

What was the original GAN implementation for RectGAN?

I used this implementation of an Anime face GAN with the neural networks implemented in Keras. Like 99.9999% (if not 100%) of all GAN image generators written as of now, it only worked with square images, and in this case, 64x64 images of faces. To adjust the Discriminator and Generator to start recognizing and generating rectangular images, I had to alter its architecture. I started looking at the Generator, original code here:

def build_gen( shape ) :
    def deconv2d( x, filters, shape=(4, 4) ) :
        '''
        Conv2DTranspose gives me checkerboard artifact...
        Select one of the 3.
        '''
        # Simple Conv2DTranspose
        # Not good, compared to upsample + conv2d below.
        x = Conv2DTranspose( filters, shape, padding='same',
            strides=(2, 2), kernel_initializer=Args.kernel_initializer )(x)

        # simple and works
        #x = UpSampling2D( (2, 2) )( x )
        #x = Conv2D( filters, shape, padding='same' )( x )

        # Bilinear2x... Not sure if it is without bug, not tested yet.
        # Tend to make output blurry though
        #x = bilinear2x( x, filters )
        #x = Conv2D( filters, shape, padding='same' )( x )

        x = BatchNormalization(momentum=Args.bn_momentum)( x )
        x = LeakyReLU(alpha=Args.alpha_G)( x )
        return x

    # https://github.com/tdrussell/IllustrationGAN z predictor...?
    # might help. Not sure.
    noise = Input( shape=Args.noise_shape )
    x = noise
    # 1x1x256
    # noise is not useful for generating images.
    x = Conv2DTranspose( 512, (4, 4),
        kernel_initializer=Args.kernel_initializer )(x)
    x = BatchNormalization(momentum=Args.bn_momentum)( x )
    x = LeakyReLU(alpha=Args.alpha_G)( x )
    # 4x4
    x = deconv2d( x, 256 )
    # 8x8
    x = deconv2d( x, 128 )
    # 16x16
    x = deconv2d( x, 64 )
    # 32x32

    # Extra layer
    x = Conv2D( 64, (3, 3), padding='same',
        kernel_initializer=Args.kernel_initializer )( x )
    x = BatchNormalization(momentum=Args.bn_momentum)( x )
    x = LeakyReLU(alpha=Args.alpha_G)( x )
    # 32x32
    x = Conv2DTranspose( 3, (4, 4), padding='same', activation='tanh',
        strides=(2, 2), kernel_initializer=Args.kernel_initializer )(x)
    # 64x64
    return models.Model( inputs=noise, outputs=x )

I saw six convolutional layers here and made the connection: the upsampling factors multiply out to 64. The first Conv2DTranspose takes the 1x1 noise to 4x4, the three deconv2d calls and the final Conv2DTranspose each use a stride of (2, 2) by default, and the extra layer leaves the size alone, giving 4 * 2 * 2 * 2 * 2 = 64. Long story short, I tried changing the strides of these layers to accommodate 256x192 images:

Old Code:

    x = deconv2d( x, 256 )
    # 8x8
    x = deconv2d( x, 128 )
    # 16x16
    x = deconv2d( x, 64 )

New Code:

    x = deconv2d( x, 256, mystrides=(4, 4) )
    # 16x16
    x = deconv2d( x, 128, mystrides=(4, 3) )
    # 64x48
    x = deconv2d( x, 64, mystrides=(2, 2) )

Note that I’ve also redefined the function deconv2d() to take the parameter “mystrides” to specify a custom stride.
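
Here is a minimal sketch of what that modified helper might look like; it mirrors the original deconv2d but threads mystrides through to Conv2DTranspose (the Args values are the same globals the original repository already uses):

from keras.layers import Conv2DTranspose, BatchNormalization, LeakyReLU

def deconv2d( x, filters, shape=(4, 4), mystrides=(2, 2) ) :
    # Same as the original helper, but the stride is now a parameter
    # instead of being hard-coded to (2, 2). With padding='same',
    # Conv2DTranspose multiplies each spatial dimension by its stride.
    x = Conv2DTranspose( filters, shape, padding='same',
        strides=mystrides, kernel_initializer=Args.kernel_initializer )(x)
    x = BatchNormalization(momentum=Args.bn_momentum)( x )
    x = LeakyReLU(alpha=Args.alpha_G)( x )
    return x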

Check: 4 * 4 * 4 * 2 * 2 = 256 (the product of every upsampling factor along the X dimension: the initial 1x1-to-4x4 Conv2DTranspose, the three deconv2d strides, and the final stride of 2)
Check: 4 * 4 * 3 * 2 * 2 = 192 (the same product along the Y dimension)

Additional Tweaks I’ve made

Aspect Ratios:

Of course, all of the other modules in the repository need to know that we’re dealing with 256x192 images instead of 64x64, so I updated various lines to reflect that. Eventually, I added support for various other aspect ratios and their respective resolutions:

  • 1:4 – 64x256
  • 1:2 – 64x128, 128x256
  • 9:16 – 144x256
  • 3:4 – 192x256, 96x128
  • 2:3 – 128x192, 64x96
  • 1:1 – 64x64, 128x128, 256x256
  • 3:2 – 192x128, 96x64
  • 4:3 – 256x192, 128x96
  • 16:9 – 256x144
  • 2:1 – 128x64, 256x128
  • 4:1 – 256x64

I’ve added a routine to detect a supported resolution and automatically construct the discriminator and generator for that particular resolution.
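
The post doesn’t show that routine, but the idea is a lookup from each supported resolution to the strides of the three middle deconv layers. Here is a rough sketch under my own naming (STRIDE_SCHEDULES and strides_for are illustrative, not the repository’s actual code):

# Maps a supported 'WxH' resolution to the strides of the three middle
# deconv2d layers, written (x, y) following the same X/Y labeling as the
# stride check earlier. The initial Conv2DTranspose takes the 1x1 noise
# to 4x4 and the final layer upsamples by 2, so these three strides must
# multiply to W/8 and H/8.
STRIDE_SCHEDULES = {
    '64x64'   : [(2, 2), (2, 2), (2, 2)],   # 1:1
    '256x192' : [(4, 4), (4, 3), (2, 2)],   # 4:3
    '192x256' : [(4, 4), (3, 4), (2, 2)],   # 3:4
    '256x144' : [(4, 3), (4, 3), (2, 2)],   # 16:9
}

def strides_for( resolution ) :
    # e.g. strides_for('256x192') -> [(4, 4), (4, 3), (2, 2)]
    try :
        return STRIDE_SCHEDULES[resolution]
    except KeyError :
        raise ValueError( 'Unsupported resolution: ' + resolution )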

Batch Size and Initial Random Vector Dimensions

I experimented with various batch sizes that my measly 4 GB GPU could handle and found that smaller batch sizes trained faster without much loss in quality. The lowest batch size that seemed to be accepted was 16, so I stuck with that.

Similarly, it seems nearly everyone uses an initial random vector of dimensions (1,1,100), but I got away with dimensions as low as (1,1,8) with no noticeable difference in quality and a substantial increase in training speed. I don’t know why this works, but I figure that even if these parameters prove to be poor choices, I’ll at least be able to iterate over and tweak the other parameters in rapid-fire succession before revisiting the ones that significantly slow down training.
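
For reference, both of these are just attributes on the Args object the generator code above references, so the change amounts to something like this (noise_shape is the attribute the generator actually reads; batch_size as an attribute name is my guess):

class Args :
    # ... other hyperparameters unchanged ...
    noise_shape = (1, 1, 8)   # was (1, 1, 100); a smaller latent vector trained faster for me
    batch_size = 16           # smallest size that was accepted on my 4 GB GPU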

Additional Layers:

My GAN’s performance was starting to deteriorate after training for about 10,000 to 20,000 epochs, and the discriminator loss was starting to increase and stay high, which probably indicated that the discriminator was reaching its limits. So I increased the complexity of the network by adding 2 layers to the discriminator, plus 2 layers to the generator for symmetry; these additional layers use a stride of (1,1) so they don’t change the output dimensions. After making this change, even after about 1 million epochs, the discriminator and generator were still going strong and the image quality was still improving, albeit more slowly than in the beginning.
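
The post doesn’t show the exact layers, but the idea is simply to stack extra convolution blocks whose stride is (1, 1) with 'same' padding so the feature-map size is untouched. A rough sketch of one such block (the helper name and the LeakyReLU slope are illustrative):

from keras.layers import Conv2D, BatchNormalization, LeakyReLU

def extra_block( x, filters ) :
    # Adds capacity without changing spatial dimensions: a 3x3 convolution
    # with stride (1, 1) and 'same' padding keeps the feature map the same size.
    x = Conv2D( filters, (3, 3), strides=(1, 1), padding='same',
        kernel_initializer=Args.kernel_initializer )( x )
    x = BatchNormalization(momentum=Args.bn_momentum)( x )
    x = LeakyReLU(alpha=0.2)( x )   # slope chosen for illustration
    return x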

Discriminator and Generator Learning Rates

The original Anime GAN used a discriminator learning rate of 0.002 and a generator learning rate half of that at 0.001. I found that these settings were fine for certain data sets but others started showing signs of instability between the generator and discriminator after training for a long time, usually with the discriminator not keeping up with the generator. In those cases, I set the discriminator learning rate at 4x that of the generator’s instead of 2x which seemed to have cured the instability.
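
In Keras terms this is just two Adam optimizers with different learning rates when the two models are compiled. A sketch with hypothetical model names (discriminator and combined), keeping the generator at 0.0005 and the discriminator at 0.002 to get the 4x ratio (which of the two original values gets moved is my assumption; only the ratio matters):

from keras.optimizers import Adam

# Discriminator learns 4x faster than the generator; beta_1=0.5 as in the
# Adam Beta section below.
d_opt = Adam( lr=0.002,  beta_1=0.5 )
g_opt = Adam( lr=0.0005, beta_1=0.5 )

discriminator.compile( optimizer=d_opt, loss='binary_crossentropy' )
combined.compile( optimizer=g_opt, loss='binary_crossentropy' )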

Tweaks that didn’t seem to make a difference

Although I didn’t have a chance to extensively tweak all of the parameters, I’ve looked at more than what was mentioned in the previous section. Some of those that I ended up leaving at the default settings after experimenting include:

Dropout

I first tried adding dropout to every layer except the final one in both the generator and discriminator, and the resulting GAN simply failed to generate any meaningful images. So I tried putting dropout only after the final hidden layer, and my GAN did generate new images, albeit slowly, as long as I set the dropout relatively low (roughly 0 to 0.3). It didn’t seem to effectively prevent overfitting or mode collapse, and it slowed down progress, which completely defeats the purpose of dropout. So I left dropout at 0, although I tweaked the code to expose it as an additional feature and parameter.
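
Exposed as a parameter, it ends up looking something like this (Args.dropout as an attribute name is my addition, not the original repository’s):

from keras.layers import Dropout

# Optional dropout after the last hidden layer of the discriminator;
# a rate of 0 disables it, which is where I ended up leaving it.
if Args.dropout > 0 :
    x = Dropout( Args.dropout )( x )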

Adam Beta

The author of the Anime GAN says he used 0.5 since everyone seems to use that, and he tried many other settings including 0.1, 0.9, etc. with no luck. Although I did manage to get my GAN to generate reasonable and meaningful images with extreme values like 0.1, some settings resulted in instability while others resulted in slow training. I ended up sticking with 0.5 as a compromise between speed and stability.

Batch Normalization Momentum

Like the Adam beta, the default setting here also ended up being a sweet spot. I couldn’t get any meaningful images when I used a setting of 0.1 or 0.5, and I didn’t bother tweaking this slightly away from its default of 0.3. (I didn’t want to overfit my hyperparameters in case 0.25 or 0.35 worked slightly better.)

How to Get Images

Here are some tools that I’ve found to be useful in creating your own dataset for training your GANs, as well as your other image classification models.

Pixabay Scraper

Here’s a simple Python interface to the Pixabay API. Although you can use any images you’ve found to train your own models, if you want to publish your code along with the actual training images for public consumption, it’s best to use royalty-free images like those found on Pixabay. One big drawback is that you can only download up to 500 images per query, which may make it difficult to grab large training sets. If you go this route, I highly suggest you limit your search to only vertical or horizontal images so that you won’t need to eliminate as many images before training your GAN. You may still want to eliminate images whose dimensions are “outliers” compared to the average dimensions of your sample.
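
If you’d rather hit the API directly than use the wrapper, a minimal sketch with requests looks roughly like this (the parameter names below are my reading of Pixabay’s API docs; check them for current limits):

import os
import requests

API_KEY = os.environ['PIXABAY_API_KEY']   # free key from Pixabay's API docs page

def fetch_image_urls( query, orientation='horizontal', pages=3 ) :
    # Pixabay caps how many results a single query returns, so grab a
    # few pages of large-image URLs and download them separately.
    urls = []
    for page in range( 1, pages + 1 ) :
        resp = requests.get( 'https://pixabay.com/api/', params={
            'key': API_KEY,
            'q': query,
            'image_type': 'photo',
            'orientation': orientation,   # 'horizontal' or 'vertical'
            'per_page': 200,
            'page': page,
        } )
        resp.raise_for_status()
        urls += [ hit['largeImageURL'] for hit in resp.json().get('hits', []) ]
    return urls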

Scrape Google Images

Here’s a notebook from the Fast.ai machine learning course designed to help you scrape images from a Google Images search result. You’re limited to up to 700 search results, and unlike Pixabay, some of them may be copyrighted, so you probably don’t want to make your image dataset publicly available.

Scrape Pinterest

I built a scraper to automatically download all images from one or more Pinterest boards. It uses the Python bindings for Selenium, which play nicely with the JavaScript on Pinterest’s site, but it can be a bit buggy on boards with lots of images. And like Google Images, many of these images may be copyrighted; unlike Google Images and Pixabay, though, you can potentially scrape a much larger number of images at once.

Other Datasets

I’ll briefly mention that other sources of images exist, although I haven’t had a chance to explore these in detail:

Future Directions and Concluding Remarks

To be honest, the version of RectGAN out as of this writing is an MVP. I’ve thought of multiple directions I could take this project, but it would be nice to garner some feedback before deciding. Here are some areas I’ve thought about improving:

  • Standalone program with a better UI: Instead of using a text file to change the settings, how about a Tkinter UI? One of the reasons deepfakes became so popular was that the program was available as a standalone download and boasted a basic UI. You didn’t need to be a Machine Learning Engineer or even a Python programmer to use it.
  • Pytorch or Fast.ai backend: Although Keras and Tensorflow are still king in 2019 when it comes to Deep Learning, Pytorch is rapidly catching on due to its greater flexibility. And just like how Keras simplified many of the most common best practices for Tensorflow, Fast.ai is also doing the same for Pytorch. Since I like to do my Deep Learning within the Fast.ai ecosystem, it’s best to write my GANs to better integrate and play nicely with that ecosystem.
  • Image Size: This is a very handy module for getting the dimensions of an image without fully loading it with, say, PIL or OpenCV. It makes it possible to filter images by size very quickly so you can discard those whose dimensions are way off (see the sketch after this list).
  • Different Network Architectures: I’ve only touched the tip of the iceberg when it comes to implementing a GAN that generates Rectangular images by forking it off of a relatively successful DCGAN implementation. There are countless other implementations that I still haven’t explored as a basis for generating non-square images, especially by taking advantage of Pytorch’s dynamic graph capabilities.
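
For the image-size idea above, here’s the kind of quick filter I mean, assuming the imagesize package (which only reads the file header; whether that is the exact module linked above isn’t the point, only that it skips a full decode):

import os
import imagesize   # pip install imagesize; reads dimensions from the header only

def filter_by_aspect( folder, target_ratio=4/3, tolerance=0.15 ) :
    # Keep only images whose width/height ratio is close to the target,
    # so dimension outliers never make it into the training set.
    keep = []
    for name in os.listdir( folder ) :
        path = os.path.join( folder, name )
        width, height = imagesize.get( path )
        if width <= 0 or height <= 0 :
            continue   # not an image the header parser understands
        if abs( width / height - target_ratio ) <= tolerance :
            keep.append( path )
    return keep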

Have you also tried creating Rectangular images using GANs? Is there anything that I’m doing here that’s a blatant violation of best practices for GANs? If so, your feedback will be greatly appreciated!
