Project 5: Diffusion Models

1.1 Implementing the Forward Process

In this section, we implement the forward process of diffusion models using the formula:

$$ x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon \quad \text{where}~ \epsilon \sim \mathcal{N}(0, \mathbf{I}) $$

Key variables:

  - \(x_0\): the clean image.
  - \(x_t\): the noisy image at timestep \(t\).
  - \(\bar\alpha_t\): the cumulative product of the noise schedule, close to 1 for small \(t\) and close to 0 for large \(t\).
  - \(\epsilon\): noise drawn from a standard Gaussian.

Steps:

  1. Add varying levels of noise to the Berkeley Campanile image.
  2. Generate noisy images for \(t=250, 500, 750\).
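The forward step above can be sketched in a few lines, assuming `alphas_cumprod` is a precomputed tensor of \(\bar\alpha_t\) values (the names here are illustrative):

```python
import torch

def forward(x0, t, alphas_cumprod):
    """Noise a clean image x0 to timestep t:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    abar = alphas_cumprod[t]
    eps = torch.randn_like(x0)  # eps ~ N(0, I)
    xt = abar.sqrt() * x0 + (1 - abar).sqrt() * eps
    return xt, eps
```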
Berkeley Campanile
Noisy Campanile at t=250
Noisy Campanile at t=500
Noisy Campanile at t=750

1.2 Classical Denoising

In this section, Gaussian blur is applied to the noisy images generated by the forward process to see how well classical filtering removes the noise. Steps:

  1. Generate noisy images at \(t=250, 500, 750\) using the forward process.
  2. Apply Gaussian blur to each noisy image.
  3. Compare each blurred result with its noisy input.

Noisy vs. Gaussian Blur Denoising Campanile at \(t=250\)
Noisy vs. Gaussian Blur Denoising Campanile at \(t=500\)
Noisy vs. Gaussian Blur Denoising Campanile at \(t=750\)
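The classical baseline can be sketched with a depthwise Gaussian convolution in plain PyTorch (kernel size and sigma are illustrative choices, not the project's exact settings):

```python
import torch
import torch.nn.functional as F

def gaussian_blur(img, kernel_size=5, sigma=1.5):
    """Depthwise 2D Gaussian blur; img has shape (N, C, H, W)."""
    half = kernel_size // 2
    x = torch.arange(kernel_size, dtype=torch.float32) - half
    k1d = torch.exp(-x ** 2 / (2 * sigma ** 2))
    k1d = k1d / k1d.sum()                     # normalize the 1-D kernel
    k2d = torch.outer(k1d, k1d)               # separable 2-D kernel
    c = img.shape[1]
    kernel = k2d.repeat(c, 1, 1, 1)           # one kernel per channel
    return F.conv2d(img, kernel, padding=half, groups=c)
```

Blurring averages neighboring pixels, so it suppresses the high-frequency noise but also destroys image detail, which is why it performs poorly at large \(t\).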

1.3 One-Step Denoising

In this section, the goal is to denoise an image in a single step: a pretrained UNet predicts the noise \(\epsilon\), and the original image is recovered by inverting the forward-process formula:

$$ x_t = \sqrt{\bar\alpha_t} x_0 + \sqrt{1 - \bar\alpha_t} \epsilon $$

Steps:

  1. Generate noisy images using the forward() function.
  2. Use a UNet model to predict the noise \( \epsilon \).
  3. Reconstruct the original image \( x_0 \) using the formula:

$$ x_0 = \frac{x_t - \sqrt{1 - \bar\alpha_t} \epsilon}{\sqrt{\bar\alpha_t}} $$
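The reconstruction formula translates directly to code; `eps_pred` stands in for the UNet's noise prediction:

```python
import torch

def one_step_denoise(xt, eps_pred, t, alphas_cumprod):
    """Estimate x0 by inverting the forward equation:
    x0 = (x_t - sqrt(1 - abar_t) * eps) / sqrt(abar_t)."""
    abar = alphas_cumprod[t]
    return (xt - (1 - abar).sqrt() * eps_pred) / abar.sqrt()
```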

Noisy Campanile vs. One-Step Denoised Campanile at t=250
Noisy Campanile vs. One-Step Denoised Campanile at t=500
Noisy Campanile vs. One-Step Denoised Campanile at t=750

1.4 Iterative Denoising

In this section, the iterative denoising process is performed by gradually refining the noisy image using the formula:

$$ x_{t'} = \frac{\sqrt{\bar\alpha_{t'}}\beta_t}{1 - \bar\alpha_t} x_0 + \frac{\sqrt{\alpha_t}(1 - \bar\alpha_{t'})}{1 - \bar\alpha_t} x_t + v_\sigma $$

where \(t' < t\) is the next (less noisy) timestep, \(\alpha_t = \bar\alpha_t / \bar\alpha_{t'}\), \(\beta_t = 1 - \alpha_t\), \(x_0\) is the current clean-image estimate, and \(v_\sigma\) is random noise added back at each step.

Steps:

  1. Create a sequence of timesteps from \(t=990\) to \(t=0\) with a step size of 30.
  2. Iteratively denoise the image using the formula above.
  3. Compare the results of iterative denoising, one-step denoising, and Gaussian blur denoising.
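A sketch of a single update from \(t\) to \(t'\), with the stochastic term \(v_\sigma\) omitted for clarity:

```python
import torch

def iterative_denoise_step(xt, x0, t, tp, abar):
    """One update from timestep t to t' < t (noise term v_sigma omitted).
    Uses alpha_t = abar_t / abar_t' and beta_t = 1 - alpha_t."""
    alpha = abar[t] / abar[tp]
    beta = 1 - alpha
    return (abar[tp].sqrt() * beta / (1 - abar[t]) * x0
            + alpha.sqrt() * (1 - abar[tp]) / (1 - abar[t]) * xt)
```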
Noisy Campanile at t=90

1.5 Diffusion Model Sampling

In this section, we generate images from random noise by applying the iterative denoising process guided by a text prompt.

Steps:

  1. Start from an image of pure random noise.
  2. Run the full iterative denoising loop from the first timestep, conditioned on the text prompt.

Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
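Putting the pieces together, the full sampling loop can be sketched as follows; `model(x, t)` is a stand-in for the text-conditioned UNet's noise prediction:

```python
import torch

def sample(model, abar, timesteps, shape):
    """Generate an image from pure noise with strided iterative denoising.
    `model(x, t)` -> predicted noise eps (illustrative signature)."""
    x = torch.randn(shape)
    for i in range(len(timesteps) - 1):
        t, tp = timesteps[i], timesteps[i + 1]
        eps = model(x, t)
        # one-step estimate of the clean image
        x0 = (x - (1 - abar[t]).sqrt() * eps) / abar[t].sqrt()
        # move from t to the less-noisy timestep t'
        alpha = abar[t] / abar[tp]
        beta = 1 - alpha
        x = (abar[tp].sqrt() * beta / (1 - abar[t]) * x0
             + alpha.sqrt() * (1 - abar[tp]) / (1 - abar[t]) * x)
    return x
```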

1.6 Classifier-Free Guidance (CFG)

Classifier-Free Guidance (CFG) improves image quality by combining conditional and unconditional noise estimates, extrapolating past the conditional estimate when the guidance scale \(\gamma > 1\):

$$ \epsilon = \epsilon_u + \gamma (\epsilon_c - \epsilon_u) $$

Steps:

  1. Run the UNet twice to obtain conditional and unconditional noise estimations.
  2. Combine the estimations using the CFG formula.
  3. Use the enhanced denoising process to generate higher-quality images.
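The combination step is a one-liner; \(\gamma = 7\) here is just an example of a typical guidance scale:

```python
import torch

def cfg(eps_uncond, eps_cond, gamma=7.0):
    """Classifier-free guidance: eps = eps_u + gamma * (eps_c - eps_u).
    gamma = 1 recovers the plain conditional estimate; gamma > 1 strengthens it."""
    return eps_uncond + gamma * (eps_cond - eps_uncond)
```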
CFG Sample 1
CFG Sample 2
CFG Sample 3
CFG Sample 4
CFG Sample 5

1.7 Image-to-Image Translation

1.7.1 Editing Hand-Drawn and Web Images

The same noise-then-denoise procedure is applied to hand-drawn sketches and images from the web. Steps:

  1. Add noise to the original image using the forward function.
  2. Denoise the image while preserving key features.
  3. Test different noise levels (\(i_{start}=1, 3, 5, 7, 10, 20\)).
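The starting point of each edit can be sketched as below: noise the input to the timestep indexed by \(i_{start}\) in the strided schedule, then hand it to the iterative denoiser (function names are illustrative):

```python
import torch

def sdedit_init(x0, i_start, timesteps, abar):
    """Noise the input image to timesteps[i_start]; denoising starts there.
    Larger i_start -> less noise added -> the edit stays closer to the input."""
    t = timesteps[i_start]
    eps = torch.randn_like(x0)
    xt = abar[t].sqrt() * x0 + (1 - abar[t]).sqrt() * eps
    return xt, t
```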
Each test image edited at \(i_{start} = 1, 3, 5, 7, 10, 20\), shown alongside the original image (three image sets).

1.7.2 Inpainting

Steps:

  1. Initialize the noisy image and apply a mask.
  2. Replace the masked region with noise and preserve the unmasked region.
  3. Iteratively refine the masked region to generate a complete image.
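The key constraint, applied after every denoising update, can be sketched as (argument names are illustrative):

```python
import torch

def inpaint_step(x_denoised, x_orig, t, mask, abar):
    """After each denoising update, force the region outside the mask
    (mask == 0) back to the appropriately noised original image."""
    noised = abar[t].sqrt() * x_orig + (1 - abar[t]).sqrt() * torch.randn_like(x_orig)
    return mask * x_denoised + (1 - mask) * noised
```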
Original Image
Mask
To Replace
Inpainting Result

Original Image
Mask
To Replace
Inpainting Result

1.7.3 Text-Conditional Image-to-Image Translation

In this section, specific text prompts are used to guide the image generation process. The noise level controls how much of the original image's features are retained. Steps:

  1. Add noise to the original image using the forward function.
  2. Denoise the image using a text prompt to guide the generation.
  3. Test the effect of different noise levels (\(1, 3, 5, 7, 10, 20\)).
Text-conditional edits of each test image at noise levels 1, 3, 5, 7, 10, 20, shown alongside the original image (three image sets, including the Campanile).

1.8 Visual Anagrams

In this section, we create visual anagrams by averaging noise estimations from two different prompts, one for the image and one for its flipped version.

Key formulas:

$$ \epsilon_1 = \text{UNet}(x_t, t, p_1), \quad \epsilon_2 = \text{flip}\big(\text{UNet}(\text{flip}(x_t), t, p_2)\big), \quad \epsilon = \frac{\epsilon_1 + \epsilon_2}{2} $$

  1. Start with a random noisy image.
  2. Apply prompts like "An Oil Painting of an Old Man" and "An Oil Painting of People around a Campfire."
  3. Combine noise estimations to iteratively refine the image.
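The combined noise estimate can be sketched as below; `unet(x, t, emb)` is an illustrative stand-in for the prompt-conditioned UNet:

```python
import torch

def anagram_eps(unet, xt, t, emb1, emb2):
    """Average the noise estimate for prompt 1 with the un-flipped noise
    estimate for prompt 2 computed on the vertically flipped image."""
    e1 = unet(xt, t, emb1)
    e2 = torch.flip(unet(torch.flip(xt, dims=[-2]), t, emb2), dims=[-2])
    return (e1 + e2) / 2
```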
Visual Anagram: "an oil painting of people around a campfire" / flipped: "an oil painting of an old man"
Visual Anagram: "a lithograph of waterfalls" / flipped: "a lithograph of a skull"
Visual Anagram: "a photo of a man" / flipped: "a photo of a dog"

1.9 Hybrid Images

In this section, we generate hybrid images by combining the low-frequency and high-frequency components of two images based on different prompts.

Key formulas:

$$ \epsilon = f_{\text{lowpass}}(\epsilon_1) + f_{\text{highpass}}(\epsilon_2), \quad \text{where}~ f_{\text{highpass}}(x) = x - f_{\text{lowpass}}(x) $$

Steps:

  1. Generate low-frequency components using Gaussian blur on \( \epsilon_1 \).
  2. Generate high-frequency components by subtracting Gaussian blur from \( \epsilon_2 \).
  3. Combine both components and use the diffusion model to update the image.
  4. Repeat multiple iterations to refine the hybrid image.
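The combination of the two noise estimates can be sketched as follows; a box filter stands in here for the Gaussian blur actually used as the low-pass filter:

```python
import torch
import torch.nn.functional as F

def lowpass(x, k=9):
    """Box-filter stand-in for the Gaussian low-pass filter; x is (N, C, H, W)."""
    c = x.shape[1]
    kernel = torch.full((c, 1, k, k), 1.0 / (k * k))
    return F.conv2d(x, kernel, padding=k // 2, groups=c)

def hybrid_eps(e1, e2, k=9):
    """eps = f_lowpass(eps1) + f_highpass(eps2)."""
    return lowpass(e1, k) + (e2 - lowpass(e2, k))
```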
Hybrid Image: a skull and a waterfall
Hybrid Image: the Amalfi coast and a hipster barista
Hybrid Image: a snowy mountain village and people around a campfire