Part 5A 0

Generating Images Based on Text Prompts

We observe that as the number of inference steps increases, the images become progressively more detailed. With very few inference steps (e.g. num_inference_steps=4), the images are still mostly noise; only the rocket ship is discernible. As num_inference_steps increases, the prompt becomes recognizable in the image, and the image also becomes sharper and more detailed. Hallucinations in the images (in the form of mistaken shades or colors) also disappear.
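One way to build intuition for this trend is to look at how coarsely the denoising schedule is sampled at each setting. The sketch below is illustrative only: it assumes a model trained with 1000 timesteps and an evenly strided selection of timesteps, which may differ from the scheduler actually used to produce these images. The helper name `strided_timesteps` is hypothetical.

```python
def strided_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Pick evenly strided timesteps, ordered from noisiest to cleanest.

    Assumption: 1000 training timesteps and plain integer striding;
    a real scheduler may space its timesteps differently.
    """
    if not 1 <= num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be in [1, num_train_timesteps]")
    stride = num_train_timesteps // num_inference_steps
    # Ascending multiples of the stride, reversed so denoising starts at high noise.
    return [i * stride for i in range(num_inference_steps)][::-1]


# The same settings compared in this section.
for steps in (4, 6, 10, 15, 20, 40, 100):
    ts = strided_timesteps(steps)
    stride = ts[0] - ts[1]
    print(f"num_inference_steps={steps:>3}: each step skips {stride} training timesteps")
```

Under these assumptions, 4 inference steps means each denoising step has to cover 250 training timesteps at once, while 100 steps covers only 10 per step. This is consistent with the noisy results at low step counts and the sharper results at high ones.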

num_inference_steps=4

an oil painting of a snowy mountain village

a man wearing a hat

a rocket ship

num_inference_steps=6

an oil painting of a snowy mountain village

a man wearing a hat

a rocket ship

num_inference_steps=10

an oil painting of a snowy mountain village

a man wearing a hat

a rocket ship

num_inference_steps=15

an oil painting of a snowy mountain village

a man wearing a hat

a rocket ship

num_inference_steps=20

an oil painting of a snowy mountain village

a man wearing a hat

a rocket ship

num_inference_steps=40

an oil painting of a snowy mountain village

a man wearing a hat

a rocket ship

num_inference_steps=100

an oil painting of a snowy mountain village

a man wearing a hat

a rocket ship