Aditi Baheti

From Day to Night: Building a CycleGAN for Image Translation

Introduction

Welcome to the exciting world of image translation! Have you ever wondered what a scene would look like at night when all you have is its daytime image? Using CycleGANs, we can translate images from one domain to another, such as day to night and back, without the need for paired examples. Let's dive into this fascinating journey and see how we can achieve this using CycleGANs.

Background

Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) are a type of neural network where two models compete against each other. The Generator creates images, trying to fool the Discriminator, which attempts to distinguish between real and fake images. This adversarial process helps the Generator produce highly realistic images.

CycleGANs

CycleGANs take GANs a step further by introducing cycle consistency. Instead of just one generator-discriminator pair, CycleGANs have two pairs, each learning to translate images from one domain to another. The cycle consistency ensures that if you translate an image from domain A to domain B and back to domain A, you should end up with the original image. This makes CycleGANs powerful for unpaired image-to-image translation tasks.

[Figure: CycleGAN setup with two generator-discriminator pairs and cycle consistency]

Dataset Preparation

Our dataset consists of day and night images. We split these images into training and testing sets to evaluate our model's performance. Specifically, we used 80% of the images for training and 20% for testing. The dataset comprises:

  • Training Day Images: 417
  • Testing Day Images: 105
  • Training Night Images: 181
  • Testing Night Images: 46

Splitting the dataset lets us check how well our model generalizes to new, unseen data. Careful dataset preparation is crucial, as it directly impacts the model's performance.

Hyperparameters

Setting the right hyperparameters is key to training a successful model. For our CycleGAN, we carefully chose parameters such as the number of epochs, learning rate, batch size, and image size. These parameters control the training process and significantly influence the model's performance. Here are the essential hyperparameters we used (a configuration sketch follows the list):

  • Epoch: Starting epoch for training.
  • n_epochs: Total number of epochs for training, set to 200.
  • Batch Size: Number of images fed into the model at once, set to 4.
  • Learning Rate: Set to 0.0002, controls how much to change the model in response to the estimated error each time the model weights are updated.
  • Decay Start Epoch: Epoch at which learning rate decay starts, set to 100.
  • Image Size: Dimensions to which images are resized before feeding into the model, set to 128x128 pixels.
  • Channels: Number of color channels in the images, set to 3 (RGB).
  • Lambda_cyc: Weight for the cycle consistency loss, set to 10.0.
  • Lambda_id: Weight for the identity loss, set to 5.0.
  • Beta1 and Beta2: Coefficients used for computing running averages of gradient and its square, set to 0.5 and 0.999, respectively.
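
For reference, here is a minimal sketch of how these values could be collected in one configuration object. The field names mirror the list above, but the dataclass itself is illustrative rather than the exact code from this project.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    epoch: int = 0            # starting epoch
    n_epochs: int = 200       # total number of training epochs
    batch_size: int = 4
    lr: float = 2e-4          # learning rate for Adam
    decay_epoch: int = 100    # epoch at which linear LR decay begins
    img_size: int = 128       # images are resized to img_size x img_size
    channels: int = 3         # RGB
    lambda_cyc: float = 10.0  # weight of the cycle-consistency loss
    lambda_id: float = 5.0    # weight of the identity loss
    b1: float = 0.5           # Adam beta1
    b2: float = 0.999         # Adam beta2

cfg = TrainConfig()
```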

Data Augmentation

To prepare the inputs and make our model robust, we applied transformations such as resizing and normalization; random flipping is a common additional augmentation, noted below. These transformations help the model generalize by presenting consistently sized and scaled inputs. In our implementation, we used the following (a transform-pipeline sketch follows the list):

  • Resizing: Images are resized to 128x128 pixels using bicubic interpolation.
  • Normalization: Pixel values are normalized to the range [-1, 1].
  • Random Flipping: Although our current implementation does not include random flipping, this is a common technique used in data augmentation to make the model more robust.
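
A minimal sketch of the corresponding torchvision transform pipeline, assuming a 128x128 target size and the usual per-channel normalization to [-1, 1]; the commented-out flip line shows where random flipping would slot in if enabled.

```python
import torchvision.transforms as transforms

transform = transforms.Compose([
    # Resize to 128x128 with bicubic interpolation
    transforms.Resize((128, 128), interpolation=transforms.InterpolationMode.BICUBIC),
    # transforms.RandomHorizontalFlip(p=0.5),  # optional augmentation, not used here
    transforms.ToTensor(),                     # [0, 255] -> [0, 1]
    # Normalize each RGB channel to [-1, 1] to match the Generator's Tanh output
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```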

Custom Dataset Class

We created a custom dataset class to handle loading and transforming images from the day and night domains. This class reads images, applies the necessary transformations, and prepares the data for the model. It also supports unaligned image pairs, making it versatile for different datasets.

Unaligned Image Pairs

In traditional supervised learning tasks, datasets consist of paired examples, where each input corresponds to a specific output. However, in many real-world scenarios, such paired datasets are not available. This is where unaligned image pairs come into play. Our dataset class supports unaligned image pairs, which means it can handle cases where the day and night images are not perfectly matched pairs. This flexibility is crucial for training on unpaired datasets, as it allows the model to learn from a broader range of examples, making it more generalizable.
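
Here is a sketch of what such a dataset class can look like. The directory layout and class name are assumptions; the key point is that in unaligned mode the night image is drawn at a random index rather than at the same index as the day image.

```python
import glob
import os
import random
from PIL import Image
from torch.utils.data import Dataset

class DayNightDataset(Dataset):
    def __init__(self, root, transform=None, unaligned=True, mode="train"):
        self.transform = transform
        self.unaligned = unaligned
        # Assumed layout: root/train/day, root/train/night (and the same for test)
        self.files_day = sorted(glob.glob(os.path.join(root, mode, "day", "*")))
        self.files_night = sorted(glob.glob(os.path.join(root, mode, "night", "*")))

    def __getitem__(self, index):
        img_day = Image.open(self.files_day[index % len(self.files_day)]).convert("RGB")
        if self.unaligned:
            # Pick a random night image so the pair is not assumed to match
            night_path = self.files_night[random.randint(0, len(self.files_night) - 1)]
        else:
            night_path = self.files_night[index % len(self.files_night)]
        img_night = Image.open(night_path).convert("RGB")
        if self.transform:
            img_day, img_night = self.transform(img_day), self.transform(img_night)
        return {"A": img_day, "B": img_night}

    def __len__(self):
        return max(len(self.files_day), len(self.files_night))
```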

Replay Buffer

A Replay Buffer is used to store previously generated images, which are then reused during training. This technique helps stabilize the training process by providing the Discriminator with a mix of recent and older generated images, preventing it from overfitting to the most recent ones. Our buffer stores up to 50 previously generated images.

Importance and Advantages of Replay Buffer

  • Stabilizes Training: By providing a mix of recent and older generated images, it prevents the Discriminator from becoming too adapted to the most recent outputs of the Generator.
  • Improves Generalization: By reusing images, it helps the Generator learn to produce more varied and realistic images over time.
  • Efficient Use of Data: Ensures that generated images are not wasted and are used effectively to improve the model.

Implementation

In our implementation, the Replay Buffer stores up to 50 previously generated images. When new images are generated, there is a 50% chance that an image from the buffer will be used instead. This randomness helps in keeping the training process dynamic and effective.
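
A sketch of a replay buffer along these lines; the 50-image capacity and the 50% reuse probability match the description above, while the exact class in the original code may differ in detail.

```python
import random
import torch

class ReplayBuffer:
    def __init__(self, max_size=50):
        self.max_size = max_size
        self.data = []

    def push_and_pop(self, batch):
        """Store newly generated images; with 50% probability return an older one instead."""
        out = []
        for element in batch.detach():
            element = element.unsqueeze(0)
            if len(self.data) < self.max_size:
                # Buffer not yet full: store the new image and return it as-is
                self.data.append(element)
                out.append(element)
            elif random.random() > 0.5:
                # Swap: return a stored image and replace it with the new one
                i = random.randint(0, self.max_size - 1)
                out.append(self.data[i].clone())
                self.data[i] = element
            else:
                out.append(element)
        return torch.cat(out)
```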

LambdaLR

LambdaLR is a learning rate scheduler that helps in decaying the learning rate after a certain number of epochs. This is crucial for ensuring that the model converges smoothly without abrupt changes in learning rates, leading to better and more stable training. The scheduler adjusts the learning rate linearly starting from the decay start epoch.
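
In PyTorch this is typically wired up with torch.optim.lr_scheduler.LambdaLR and a small rule that returns 1.0 until the decay epoch and then falls linearly to 0 at the final epoch. The sketch below assumes n_epochs = 200 and decay_epoch = 100 from the hyperparameter list and wraps a toy optimizer just to show the mechanics.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

def linear_decay(n_epochs, start_epoch, decay_epoch):
    """Multiplier is 1.0 until decay_epoch, then falls linearly to 0 at n_epochs."""
    def rule(epoch):
        return 1.0 - max(0, epoch + start_epoch - decay_epoch) / (n_epochs - decay_epoch)
    return rule

# Toy optimizer just to show the wiring; in practice this wraps optimizer_G / optimizer_D
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=2e-4, betas=(0.5, 0.999))
scheduler = LambdaLR(optimizer, lr_lambda=linear_decay(n_epochs=200, start_epoch=0, decay_epoch=100))

# Calling scheduler.step() once per epoch keeps the learning rate at 2e-4 for the
# first 100 epochs and then decays it linearly towards 0 by epoch 200.
```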

Initialization of Convolutional Weights

Initializing the weights of the convolutional layers correctly is vital for stable training. We used normal initialization, setting the mean to 0 and the standard deviation to 0.02, which is a standard practice for GANs. This helps in speeding up the convergence and achieving better results.
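
A sketch of the usual normal-initialization helper, applied to each network with model.apply(...). The mean and standard deviation match the values quoted above; the BatchNorm branch is the conventional extra case, even though our networks mostly use instance normalization.

```python
import torch.nn as nn

def weights_init_normal(m):
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        # Convolutional weights drawn from N(0, 0.02)
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find("BatchNorm2d") != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0.0)

# Usage: G_AB.apply(weights_init_normal)  # and likewise for the other networks
```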

Model Architecture

Generator

Our Generator uses a ResNet architecture consisting of several convolutional layers, normalization layers, and residual blocks. Residual blocks are essential as they help retain image features across layers, which is crucial for generating high-quality images. Here's a detailed breakdown, followed by a code sketch:

  • Initial Convolution Block: Pads and convolves the input image to start the feature extraction.
  • Downsampling Layers: Reduce the spatial dimensions, increasing the feature depth.
  • Residual Blocks: We used 19 residual blocks that maintain the image's features while allowing deeper layers to learn more abstract representations.
  • Upsampling Layers: Increase the spatial dimensions back to the original size.
  • Output Layer: Produces the final translated image using a Tanh activation function.
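
The sketch below shows a ResNet-style generator consistent with this breakdown. The base width of 64 filters, reflection padding, and instance normalization follow the standard CycleGAN design and are assumptions rather than a line-for-line copy of our code; the number of residual blocks is a parameter, set here to the 19 we used.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, features):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(features, features, 3),
            nn.InstanceNorm2d(features),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(features, features, 3),
            nn.InstanceNorm2d(features),
        )

    def forward(self, x):
        # Skip connection keeps low-level image features intact
        return x + self.block(x)

class GeneratorResNet(nn.Module):
    def __init__(self, channels=3, n_residual_blocks=19):
        super().__init__()
        features = 64
        # Initial convolution block
        layers = [
            nn.ReflectionPad2d(3),
            nn.Conv2d(channels, features, 7),
            nn.InstanceNorm2d(features),
            nn.ReLU(inplace=True),
        ]
        # Two downsampling steps: halve the spatial size, double the feature depth
        for _ in range(2):
            layers += [
                nn.Conv2d(features, features * 2, 3, stride=2, padding=1),
                nn.InstanceNorm2d(features * 2),
                nn.ReLU(inplace=True),
            ]
            features *= 2
        # Residual blocks keep the spatial size while learning deeper representations
        layers += [ResidualBlock(features) for _ in range(n_residual_blocks)]
        # Two upsampling steps back to the original resolution
        for _ in range(2):
            layers += [
                nn.Upsample(scale_factor=2),
                nn.Conv2d(features, features // 2, 3, stride=1, padding=1),
                nn.InstanceNorm2d(features // 2),
                nn.ReLU(inplace=True),
            ]
            features //= 2
        # Output layer: back to 3 channels, Tanh keeps pixel values in [-1, 1]
        layers += [nn.ReflectionPad2d(3), nn.Conv2d(features, channels, 7), nn.Tanh()]
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)
```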

[Figure: ResNet-based Generator architecture]

PatchGAN Discriminator

Our Discriminator uses a PatchGAN architecture, focusing on classifying patches of the image as real or fake. This approach allows the model to capture fine details, making the generated images more realistic.

What is PatchGAN?

PatchGAN is a type of GAN architecture that classifies each patch of the image as real or fake, rather than the entire image. This technique helps in capturing high-frequency details and textures, leading to more realistic outputs.

Advantages of PatchGAN

  • Detail Preservation: By focusing on small patches, it helps in preserving fine details and textures.
  • Computational Efficiency: It is more computationally efficient than processing the entire image, making it faster and less resource-intensive.
  • Improved Realism: Helps in generating images that are more visually appealing and realistic by focusing on local features.

Discriminator Architecture

  • Convolutional Blocks: Layers with convolution, normalization, and activation functions to extract features.
  • PatchGAN Output: Outputs a grid of scores, one per image patch, indicating how real each patch appears (a code sketch follows the list).
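
The sketch below shows a PatchGAN discriminator matching this description. The filter widths (64 up to 512) follow the standard design and are assumptions; for 128x128 inputs the output is an 8x8 grid of patch scores.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, channels=3):
        super().__init__()

        def block(in_f, out_f, normalize=True):
            # Convolution + (optional) normalization + LeakyReLU
            layers = [nn.Conv2d(in_f, out_f, 4, stride=2, padding=1)]
            if normalize:
                layers.append(nn.InstanceNorm2d(out_f))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(channels, 64, normalize=False),
            *block(64, 128),
            *block(128, 256),
            *block(256, 512),
            nn.ZeroPad2d((1, 0, 1, 0)),
            # One score per patch: for 128x128 inputs this yields an 8x8 map
            nn.Conv2d(512, 1, 4, padding=1),
        )

    def forward(self, img):
        return self.model(img)
```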

Loss Functions

We employed three types of loss functions to train our CycleGAN (a sketch of how they combine follows the list):

  1. Adversarial Loss: Ensures that the generated images look realistic by fooling the Discriminator, implemented using Mean Squared Error (MSE) loss.
  2. Cycle Consistency Loss: Ensures that translating an image to the other domain and back results in the original image, implemented using L1 loss.
  3. Identity Loss: Ensures that images already in the target domain are preserved during translation, also implemented using L1 loss.
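
A sketch of how these losses can be defined and combined into the generator objective, using the lambda weights from the hyperparameter list. The helper function and its signature are illustrative, and `valid` is assumed to be a tensor of ones shaped like the discriminator output.

```python
import torch.nn as nn

criterion_GAN = nn.MSELoss()       # adversarial loss (least-squares GAN)
criterion_cycle = nn.L1Loss()      # cycle-consistency loss
criterion_identity = nn.L1Loss()   # identity loss

lambda_cyc, lambda_id = 10.0, 5.0

def generator_loss(G_AB, G_BA, D_A, D_B, real_A, real_B, valid):
    """Combined generator objective: adversarial + cycle + identity terms."""
    # Identity: a generator fed an image already in its target domain should leave it unchanged
    loss_id = (criterion_identity(G_BA(real_A), real_A) +
               criterion_identity(G_AB(real_B), real_B)) / 2
    # Adversarial: generated images should be scored as "real" by the discriminators
    fake_B, fake_A = G_AB(real_A), G_BA(real_B)
    loss_GAN = (criterion_GAN(D_B(fake_B), valid) +
                criterion_GAN(D_A(fake_A), valid)) / 2
    # Cycle: A -> B -> A (and B -> A -> B) should reconstruct the original image
    loss_cycle = (criterion_cycle(G_BA(fake_B), real_A) +
                  criterion_cycle(G_AB(fake_A), real_B)) / 2
    return loss_GAN + lambda_cyc * loss_cycle + lambda_id * loss_id, fake_A, fake_B
```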

Optimizers and Gradient Clipping

We used Adam optimizers to update the weights of our models, with separate optimizers for the Generators and Discriminators. Gradient clipping was applied to prevent the gradients from exploding, which helps in stabilizing the training process.
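
A sketch of the optimizer setup and a gradient-clipping call, reusing the network classes sketched above. The learning rate and betas come from the hyperparameter list, while the max_norm value is an assumption, since the post does not state the exact clipping threshold.

```python
import itertools
import torch

# Networks from the sketches above (one generator and one discriminator per direction)
G_AB, G_BA = GeneratorResNet(), GeneratorResNet()
D_A, D_B = Discriminator(), Discriminator()

# One optimizer for both generators, and one per discriminator
optimizer_G = torch.optim.Adam(
    itertools.chain(G_AB.parameters(), G_BA.parameters()), lr=2e-4, betas=(0.5, 0.999)
)
optimizer_D_A = torch.optim.Adam(D_A.parameters(), lr=2e-4, betas=(0.5, 0.999))
optimizer_D_B = torch.optim.Adam(D_B.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Between loss.backward() and optimizer.step(), clip the generator gradients;
# max_norm=1.0 is an assumed value, not one stated in the post.
torch.nn.utils.clip_grad_norm_(
    itertools.chain(G_AB.parameters(), G_BA.parameters()), max_norm=1.0
)
```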

Training Procedure

The training process involves the following steps:

  1. Forward Pass: Generate fake images using the Generators.
  2. Compute Losses: Calculate adversarial, cycle consistency, and identity losses.
  3. Backward Pass: Compute gradients and update model weights using the optimizers.
  4. Gradient Clipping: Clip gradients to a maximum value to prevent exploding gradients.
  5. Learning Rate Scheduling: Adjust the learning rate during training to ensure smooth convergence.

We also used a Replay Buffer to store previously generated images and a LambdaLR scheduler to adjust the learning rate during training.
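
Putting the pieces together, here is a condensed sketch of one pass through the training loop, reusing the configuration, transform, dataset, buffer, network, optimizer, and loss sketches from earlier sections. Device placement, logging, and the symmetric Discriminator B update are omitted for brevity, and the dataset path is a placeholder.

```python
import itertools
import torch
from torch.utils.data import DataLoader

dataloader = DataLoader(DayNightDataset("data/day2night", transform=transform),
                        batch_size=cfg.batch_size, shuffle=True)
fake_A_buffer, fake_B_buffer = ReplayBuffer(), ReplayBuffer()

for epoch in range(cfg.epoch, cfg.n_epochs):
    for batch in dataloader:
        real_A, real_B = batch["A"], batch["B"]
        # Label tensors shaped like the PatchGAN output (8x8 for 128x128 inputs)
        valid = torch.ones((real_A.size(0), 1, 8, 8))
        fake = torch.zeros((real_A.size(0), 1, 8, 8))

        # Steps 1-3: forward pass, losses, and backward pass for the Generators
        optimizer_G.zero_grad()
        loss_G, fake_A, fake_B = generator_loss(G_AB, G_BA, D_A, D_B,
                                                real_A, real_B, valid)
        loss_G.backward()
        # Step 4: gradient clipping (max_norm is an assumed value)
        torch.nn.utils.clip_grad_norm_(
            itertools.chain(G_AB.parameters(), G_BA.parameters()), max_norm=1.0)
        optimizer_G.step()

        # Discriminator A: real images vs. buffered fake images
        optimizer_D_A.zero_grad()
        fake_A_ = fake_A_buffer.push_and_pop(fake_A)
        loss_D_A = (criterion_GAN(D_A(real_A), valid) +
                    criterion_GAN(D_A(fake_A_), fake)) / 2
        loss_D_A.backward()
        optimizer_D_A.step()
        # Discriminator B is updated the same way using fake_B_buffer

    # Step 5: step the LambdaLR schedulers (one per optimizer) once per epoch,
    # e.g. lr_scheduler_G.step(); lr_scheduler_D_A.step(); lr_scheduler_D_B.step()
```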

Evaluation

During evaluation, we generated images from the validation set and compared them with the real images. This helps us understand how well the model has learned the mappings between the domains. We saved model checkpoints periodically to monitor progress.
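
A sketch of how periodic sampling and checkpointing could look, using torchvision's make_grid and save_image; the directory names and file naming scheme are placeholders.

```python
import torch
from torchvision.utils import make_grid, save_image

def sample_images(epoch, val_batch):
    """Save a grid of real validation images next to their translations."""
    G_AB.eval(); G_BA.eval()
    with torch.no_grad():
        real_A, real_B = val_batch["A"], val_batch["B"]
        fake_B, fake_A = G_AB(real_A), G_BA(real_B)
    G_AB.train(); G_BA.train()
    # Rows: real day, generated night, real night, generated day
    rows = [make_grid(x, nrow=4, normalize=True) for x in (real_A, fake_B, real_B, fake_A)]
    save_image(torch.cat(rows, dim=1), f"images/epoch_{epoch}.png")

# Periodic checkpoints (paths are illustrative):
# torch.save(G_AB.state_dict(), f"saved_models/G_AB_{epoch}.pth")
```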

Visualization and Results

After training our CycleGAN model, it is crucial to visualize the results to assess the quality of the image translations. Below are the visualizations of the real and generated images along with the training loss curves.

Image Translations

The first image grid showcases the real images from the day and night domains and their corresponding generated counterparts. Each row contains:

  • First Column: Real day images.
  • Second Column: Generated night images from the corresponding real day images.
  • Third Column: Real night images.
  • Fourth Column: Generated day images from the corresponding real night images.

[Image grid: real day and night images alongside their generated counterparts]

Analysis of Image Translations

  • Visual Quality: The generated night images capture the dark tones and lighting typical of nighttime scenes. Similarly, the generated day images retain the brightness and color characteristic of daytime.
  • Detail Preservation: The model manages to preserve significant details from the original images, such as buildings, streets, and landscapes, while translating the overall ambiance from day to night and vice versa.
  • Consistency: There is a consistent style in the generated images, indicating that the model has learned the translation mapping effectively.

Training Loss Curves

The second figure illustrates the training loss curves for both the Generator (G) and the Discriminator (D) over the training epochs.

[Figure: training loss curves for the Generator (G) and Discriminator (D)]

Analysis of Training Loss Curves

  • Generator Loss (G): The generator loss shows a decreasing trend, which suggests that the Generator is improving its ability to produce realistic images that can fool the Discriminator over time. There are fluctuations, which are typical in GAN training due to the adversarial nature.
  • Discriminator Loss (D): The discriminator loss remains relatively low and stable throughout the training process, indicating that the Discriminator effectively distinguishes between real and fake images. The stability of the discriminator loss is a good sign, suggesting that the training process is balanced.

Key Observations

  • Training Stability: The loss curves indicate that the training process was stable, with the Generator and Discriminator learning effectively from each other.
  • Improvement Over Time: The gradual decrease in the Generator loss highlights that the model becomes better at generating realistic images as training progresses.
  • Balanced Adversarial Training: The consistent discriminator loss shows that the Discriminator is performing its role effectively without overwhelming the Generator, ensuring a balanced adversarial process.

These visualizations and analysis of the training loss curves demonstrate the effectiveness of our CycleGAN model in translating day images to night images and vice versa. The results indicate that the model has successfully learned the mappings between the two domains, producing realistic and visually appealing image translations.

Conclusion

CycleGANs are a powerful tool for image translation tasks without requiring paired datasets. By using adversarial, cycle consistency, and identity losses, CycleGANs can generate realistic translations between two domains. This implementation demonstrates the potential of CycleGANs in tasks such as day-to-night image translation, offering valuable insights into their workings and applications.

Generalization

The model we built for day-to-night image translation is generalizable to other cyclic GAN datasets as well. For instance, it can be used for tasks like translating horses to zebras, summer to winter landscapes, or even artistic style transfer. The same principles and architectures apply, making CycleGANs a versatile solution for many image-to-image translation problems.

Detailed Steps

  1. Dataset Preparation: Collected images from day and night domains, split them into training and testing sets, and applied data augmentations.
  2. Hyperparameters: Defined key parameters such as learning rate, batch size, and the number of epochs.
  3. Custom Dataset Class: Created a class to load and transform images, handling both aligned and unaligned image pairs.
  4. Replay Buffer: Implemented a buffer to store and reuse previously generated images to stabilize training.
  5. LambdaLR: Used a learning rate scheduler to adjust the learning rate during training.
  6. Initialization of Convolutional Weights: Applied normal initialization to the convolutional layers for stable training.
  7. Model Architecture: Implemented Generators and Discriminators using ResNet and PatchGAN architectures, respectively.
  8. Loss Functions: Used adversarial, cycle consistency, and identity losses to train the models.
  9. Optimizers and Gradient Clipping: Used Adam optimizers and applied gradient clipping to prevent exploding gradients.
  10. Training Loop: Performed forward and backward passes, computed losses, updated model weights, and applied gradient clipping.
  11. Evaluation: Generated images from the validation set and saved model checkpoints periodically.
  12. Visualization: Displayed real and generated images side by side, labeled for clarity.

By following these detailed steps, we implemented a CycleGAN model capable of translating images between day and night domains, demonstrating the versatility and power of GAN-based image translation.

Feel free to reach out if you have any questions or need further clarification on any part of the implementation. Happy coding!
