Project 5A: The Power of Diffusion Models

Kishan Jani

Part 5A 0: Setup

We will use the DeepFloyd IF diffusion model, with a random seed of 10 here and throughout the project. In this part, we simply test out the model, generating images (shown with captions) for 3 text prompts across different values of num_inference_steps.
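
As a rough sketch of this setup (assuming the Hugging Face diffusers pipeline for DeepFloyd IF; the model id, prompt, and step values below are illustrative, not necessarily the ones used here):

    import torch
    from diffusers import DiffusionPipeline

    # Load stage 1 of DeepFloyd IF (model id assumed; requires accepting its license).
    stage_1 = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
    ).to("cuda")

    generator = torch.Generator("cuda").manual_seed(10)  # seed = 10, as above
    for steps in [5, 20, 100]:  # example num_inference_steps values
        image = stage_1(
            "an oil painting of a snowy mountain village",  # illustrative prompt
            num_inference_steps=steps,
            generator=generator,
        ).images[0]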


To see the images generated, click on the following link:
5A Part 0 Results

Part 5A 1.1: Implementing the Forward Process

Overview

The forward process is defined by \[ x_t = \sqrt{\bar{\alpha}_t}x_0 + \sqrt{1 - \bar{\alpha}_t}\varepsilon \] where \( \varepsilon \sim N(0, I) \). Here \(x_0\) is the clean image and \( x_t \) the noisy image generated, so we are really sampling the noisy image from a Gaussian with mean \( \sqrt{\bar{\alpha}_t} x_0\) and variance \( (1- \bar{\alpha}_t) I \). We perform the process on a test image of the Campanile, shown below, resized to \( 64 \times 64 \).
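
A minimal sketch of this forward step (assuming alphas_cumprod is the scheduler's cumulative-product tensor of \( \bar{\alpha}_t \) values, e.g. stage_1.scheduler.alphas_cumprod):

    import torch

    def forward(x0: torch.Tensor, t: int, alphas_cumprod: torch.Tensor) -> torch.Tensor:
        """Noise a clean image x0 to timestep t via the forward process."""
        abar_t = alphas_cumprod[t]
        eps = torch.randn_like(x0)  # ε ~ N(0, I)
        return abar_t.sqrt() * x0 + (1 - abar_t).sqrt() * eps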

Results

To see results for \( t\in \{250,500,750\} \), click the link below:

5A Part 1.1 Results

Part 5A 1.2: Gaussian Blur Denoising

Overview

We naively "denoise" images by blurring them with a fixed Gaussian, using kernel_size = 5 and sigma = 1.5. We do this for each of the noise levels seen above.
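
A sketch using torchvision's built-in Gaussian blur (the function and variable names here are illustrative):

    import torch
    import torchvision.transforms.functional as TF

    def blur_denoise(noisy_im: torch.Tensor) -> torch.Tensor:
        # Naive "denoising": blur with a fixed Gaussian (kernel_size=5, sigma=1.5).
        return TF.gaussian_blur(noisy_im, kernel_size=5, sigma=1.5)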

Results

To see results for \( t\in \{250,500,750\} \), click the link below:

5A Part 1.2 Results

Part 5A 1.3: One-Step Denoising

Overview

Now, we'll use a pretrained diffusion model to denoise. The actual denoiser can be found at stage_1.unet. This is a UNet that has already been trained on a very, very large dataset of image pairs \( (x_0, x_t) \). We can use it to estimate the Gaussian noise in a noisy image, then remove that noise to recover (something close to) the original image. To estimate \(x_0\) from \(x_t\), we use \[ \hat{x}_0 = \frac{1}{\sqrt{\bar{\alpha}_t}} [x_t - \sqrt{1 - \bar{\alpha}_t} \varepsilon_\theta(x_t,t)],\] where \(\varepsilon_\theta(\cdot,\cdot)\) is the noise predicted by our model for a given noisy image \(x_t\) and corresponding noise level \(t\).
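
A sketch of this one-step estimate (assuming the diffusers UNet interface; that DeepFloyd's UNet outputs 6 channels, of which the first 3 are the noise estimate, is an assumption about this particular model):

    import torch

    @torch.no_grad()
    def one_step_denoise(unet, x_t, t, prompt_embeds, alphas_cumprod):
        """Estimate x_0 from x_t by predicting and removing the noise."""
        # ε_θ(x_t, t): the UNet's predicted noise (first 3 output channels).
        model_out = unet(x_t, t, encoder_hidden_states=prompt_embeds).sample
        eps = model_out[:, :3]
        abar_t = alphas_cumprod[t]
        # Invert the forward process: x_0 = (x_t - sqrt(1 - abar_t) * eps) / sqrt(abar_t)
        return (x_t - (1 - abar_t).sqrt() * eps) / abar_t.sqrt()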

Results

To see results for \( t\in \{250,500,750\} \), click the link below:

5A Part 1.3 Results

Part 5A 1.4: Iterative Denoising

Overview

We can denoise an image and recover more of the original features by denoising in many small steps instead of in one step. Specifically, we iterate over strided timesteps running from \(T=1000\) down to 0 in steps of 30, and at each step compute \[ x_{t'} = \frac{\sqrt{\bar{\alpha}_{t'}} \beta_t} {1 - \bar{\alpha}_t} x_0 + \frac{\sqrt{\alpha_t}(1-\bar{\alpha}_{t'})}{1-\bar{\alpha}_t} x_t + v_\sigma,\] where \(x_0\) is our current clean estimate, obtained from \(x_t\) via the one-step formula above, \(v_\sigma\) is random noise (which DeepFloyd predicts), and the alphas and betas are known noise-schedule parameters, with \(\alpha_t = \bar{\alpha}_t / \bar{\alpha}_{t'}\) and \(\beta_t = 1 - \alpha_t\). Here \(x_t\) is the noisier image at some timestep \(t\), while \(x_{t'}\) is the less noisy image at the next timestep \(t'\) in our iteration sequence.
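
One update step might look like the following sketch (with x0_hat from the one-step estimate above and v_sigma the model-predicted noise term; both names are assumptions):

    import torch

    def denoise_step(x_t, x0_hat, t, t_prime, alphas_cumprod, v_sigma):
        """One iterative-denoising update from timestep t to the less noisy t'."""
        abar_t, abar_tp = alphas_cumprod[t], alphas_cumprod[t_prime]
        alpha_t = abar_t / abar_tp   # ratio of cumulative products over the stride
        beta_t = 1 - alpha_t
        x_tp = (abar_tp.sqrt() * beta_t / (1 - abar_t)) * x0_hat \
             + (alpha_t.sqrt() * (1 - abar_tp) / (1 - abar_t)) * x_t \
             + v_sigma
        return x_tp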

Results

To see results, click the link below:

5A Part 1.4 Results

Part 5A 1.5: Diffusion Model Sampling

Overview

We "generate" images using the Diffusion Model by feeding random noise into the iterative denoiser, using i_start = 0 . By starting with random noise as the noisy image, we are "solving" towards a random image via iterative denoising. We use the text prompt "a high quality photo" for denoising.

Results

To see results, click the link below:

5A Part 1.5 Results

Part 5A 1.6: Classifier-Free Guidance (CFG)

Overview

We can generate higher quality images with classifier-free guidance: we run conditional diffusion with the prompt "a high quality photo" alongside the unconditional prompt "". Specifically, we skew our noise estimate as \[\varepsilon = \varepsilon_{\text{uncond}} + \gamma(\varepsilon_{\text{cond}} - \varepsilon_{\text{uncond}} ).\] With \(\gamma=0\) we recover the standard unconditional noise estimate, and with \(\gamma=1\) the conditional one; it has been determined empirically that choosing \(\gamma > 1\) improves conditioning. We use scale \( \gamma = 7\), which pushes the model to diffuse the image toward the prompt used to generate the conditional noise estimate. This yields higher quality images, albeit with slightly less variety.
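
Inside the denoising loop, the CFG estimate can be formed as in this sketch (reusing the assumed 6-channel UNet output from Part 1.3; null_embeds are the embeddings of the empty prompt):

    # Two forward passes: conditional on the prompt, and unconditional ("").
    eps_cond = unet(x_t, t, encoder_hidden_states=prompt_embeds).sample[:, :3]
    eps_uncond = unet(x_t, t, encoder_hidden_states=null_embeds).sample[:, :3]

    gamma = 7.0  # CFG scale; gamma > 1 extrapolates past the conditional estimate
    eps = eps_uncond + gamma * (eps_cond - eps_uncond)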

Results

To see results, click the link below:

5A Part 1.6 Results

Part 5A 1.7: Image-to-Image Translation

Overview

We create various edited images with diffusion, following the SDEdit approach: we noise an existing image to an intermediate timestep and then run the iterative CFG denoiser, which projects the result back onto the natural image manifold. The more noise we add, the larger the edit.
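
A sketch of one such edit, assuming the helpers from earlier parts (forward, iterative_denoise) and a list strided_timesteps of the denoising schedule; the noise levels are illustrative:

    # Noise the original image to an intermediate timestep, then denoise.
    # A larger i_start means less added noise, so more of the original survives.
    for i_start in [1, 3, 5, 7, 10, 20]:  # illustrative noise levels
        x_t = forward(original_im, strided_timesteps[i_start], alphas_cumprod)
        edited = iterative_denoise(x_t, i_start=i_start, prompt="a high quality photo")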

Results

To see results, click the link below:

5A Part 1.7 Results

Part 5A 1.8: Visual Anagrams

Overview

Similar to inpainting, we can make other fancy edits to the noise to create interesting images. Here, we develop visual anagrams: oriented normally, the image looks like prompt1, but when flipped upside down, it looks like prompt2. Within our denoising loop, we compute the noise estimate for noisy image \( x_t\) as \[\varepsilon_1 \gets \textup{UNet}(x_t,t, \text{prompt}_1) \] \[\varepsilon_2 \gets \textup{flip}\big(\textup{UNet}(\textup{flip}(x_t),t, \text{prompt}_2)\big) \] \[\varepsilon \gets \frac{\varepsilon_1 + \varepsilon_2}{2},\] where \(\textup{flip}(\cdot)\) flips the image vertically and \( \varepsilon \) is the final noise estimate for iteration \(t \). Estimating \(\varepsilon_2\) on the flipped image (and flipping the estimate back) is what makes the upside-down orientation follow prompt2.
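
A sketch of the averaged estimate inside the loop (again assuming the 6-channel UNet output; emb1 and emb2 stand for the two prompts' embeddings):

    import torch

    # Noise estimate for the upright image under prompt 1.
    eps1 = unet(x_t, t, encoder_hidden_states=emb1).sample[:, :3]
    # Noise estimate for the flipped image under prompt 2, flipped back upright.
    x_flip = torch.flip(x_t, dims=[2])  # flip the height axis of the NCHW tensor
    eps2 = torch.flip(unet(x_flip, t, encoder_hidden_states=emb2).sample[:, :3], dims=[2])
    eps = (eps1 + eps2) / 2  # final noise estimate for this iteration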

Results

To see results, click the link below:

5A Part 1.8 Results

Part 5A 1.9: Hybrid Images

Overview

We create hybrid images: displayed small or viewed from afar, the image looks like prompt1, but displayed large or viewed up close, it looks like prompt2. We accomplish this with low-pass and high-pass filters, implemented via a Gaussian blur with kernel size 33 and sigma 2. At each iteration of the denoising loop, \[\varepsilon_1 \gets \textup{UNet}(x_t,t, \text{prompt}_1) \] \[\varepsilon_2 \gets \textup{UNet}(x_t,t, \text{prompt}_2) \] \[\varepsilon \gets f_{\text{low}}(\varepsilon_1) + f_{\text{high}}(\varepsilon_2),\] where \( f_{\text{low}} \) is the Gaussian blur, \( f_{\text{high}}(x) = x - f_{\text{low}}(x) \), and \( \varepsilon \) is the final noise estimate for iteration \(t \).
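
A sketch of the frequency-split combination (same assumed UNet interface and embedding names as above):

    import torchvision.transforms.functional as TF

    eps1 = unet(x_t, t, encoder_hidden_states=emb1).sample[:, :3]
    eps2 = unet(x_t, t, encoder_hidden_states=emb2).sample[:, :3]

    low = TF.gaussian_blur(eps1, kernel_size=33, sigma=2.0)          # f_low(eps1)
    high = eps2 - TF.gaussian_blur(eps2, kernel_size=33, sigma=2.0)  # f_high(eps2)
    eps = low + high  # final noise estimate for this iteration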

Results

To see results, click the link below:

5A Part 1.9 Results