| Property | Linear Ho 2020 | Cosine Nichol 2021 | Sigmoid 2022 | EDM σ Karras 2022 | Laplace Hang 2024 |
|---|---|---|---|---|---|
| SDE family | VP | VP | VP | VE | VP |
| Param axis | β_t linear | ᾱ_t cosine² | ᾱ_t sigmoid | σ lognormal | log-SNR Laplace |
| Train distribution | Uniform over T=1000 steps | Uniform over T steps | Uniform over T steps | LogNormal (μ=−1.2, σ=1.2) | Laplace concentrated at log-SNR≈0 |
| Terminal SNR=0 | ✗ no | ✗ no | ✗ no | ≈ yes | ✗ no |
| ≤64px | too fast | good | fine | good | good |
| 256px | suboptimal | optimal | good | optimal | optimal |
| 1024px+ | under-noises | under-noises | stable | scale σ_max | robust |
| FID CIFAR-10 | ~3.2 (DDPM baseline)[1] | ~2.9 (Improved DDPM)[2] | — | 1.79 (35 NFE)[6] | see ImageNet-256 |
| FID ImageNet-256 | — | 10.85 (baseline)[7] | — | — | 7.96 (−26.6%)[7] |
| Adoption | legacy | widespread | niche | SOTA baseline | emerging |
| Verdict | avoid | use ≤256px | high-res only | new training | ★ best FID |
No standard schedule enforces SNR(T) = 0: at the final timestep, the noised sample still
carries residual signal. Inference, however, starts from pure Gaussian noise, a condition the model
never saw during training.[5][15]
SNR(T) = 0 exactly
log-SNR ≈ 0: concentrate training mass there.
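A quick numerical check makes the residual signal concrete. The sketch below assumes a linear schedule with the commonly quoted DDPM endpoints (β from 1e-4 to 0.02 over T = 1000 steps); those endpoint values and the helper names are assumptions for illustration, not taken from the text above.

```python
import math

def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; the endpoint values are assumed, not from the source."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def alpha_bars(betas):
    """Cumulative product of (1 - beta_s), i.e. the signal weight alpha_bar_t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

abar = alpha_bars(linear_betas())
abar_T = abar[-1]                    # signal weight left at the final step
snr_T = abar_T / (1.0 - abar_T)      # terminal SNR, strictly positive
print(f"alpha_bar_T={abar_T:.3e}  sqrt(alpha_bar_T)={math.sqrt(abar_T):.4f}  SNR(T)={snr_T:.3e}")
```

Under these assumptions ᾱ_T is small but not zero, so x_T still contains a faint scaled copy of x_0.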
q(x_t | x_0) = N(x_t ; √ᾱ_t · x_0, (1-ᾱ_t) I)
x_t = √ᾱ_t · x_0 + √(1-ᾱ_t) · ε
ε ~ N(0, I)
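The closed-form marginal means x_t can be drawn in one shot for any t, without simulating the intermediate steps. A minimal sketch using only the standard library follows; the function name `q_sample`, the flat-list data layout, and the value ᾱ_t = 0.25 are illustrative assumptions.

```python
import math
import random

def q_sample(x0, abar_t, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) for a flat list x0."""
    signal = math.sqrt(abar_t)
    noise = math.sqrt(1.0 - abar_t)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]            # eps ~ N(0, I)
    xt = [signal * x + noise * e for x, e in zip(x0, eps)]
    return xt, eps

rng = random.Random(0)
x0 = [1.0] * 20000                                     # toy "image" of constant value 1
xt, eps = q_sample(x0, abar_t=0.25, rng=rng)

mean = sum(xt) / len(xt)                               # should sit near sqrt(0.25) = 0.5
var = sum((v - mean) ** 2 for v in xt) / len(xt)       # should sit near 1 - 0.25 = 0.75
print(f"mean={mean:.3f}  var={var:.3f}")
```

The returned `eps` is the regression target in ε-prediction training.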
f(t) = cos²( π/2 · (t/T + s) / (1+s) )
ᾱ_t = f(t) / f(0)
β_t = 1 - ᾱ_t / ᾱ_{t-1}, clip ≤ 0.999
s = 0.008 ← keeps √β at the first step just below the pixel bin size 1/127.5
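The cosine formulas above translate almost line for line into code. The sketch below is a plain-Python rendering; the function names and T = 1000 are assumptions chosen for illustration.

```python
import math

def cosine_alpha_bar(T=1000, s=0.008):
    """alpha_bar_t = f(t) / f(0) with f(t) = cos^2(pi/2 * (t/T + s) / (1 + s)), t = 0..T."""
    def f(t):
        return math.cos(math.pi / 2 * (t / T + s) / (1 + s)) ** 2
    f0 = f(0)
    return [f(t) / f0 for t in range(T + 1)]

def betas_from_alpha_bar(abar, max_beta=0.999):
    """beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped to avoid a singular last step."""
    return [min(1.0 - abar[t] / abar[t - 1], max_beta) for t in range(1, len(abar))]

abar = cosine_alpha_bar()
betas = betas_from_alpha_bar(abar)

print(f"first sqrt(beta) = {math.sqrt(betas[0]):.5f}  (pixel bin 1/127.5 = {1 / 127.5:.5f})")
print(f"last beta        = {betas[-1]}")
```

Without the clip, the last β would be essentially 1 because ᾱ_T is numerically zero.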
p(λ) = exp(-|λ - μ| / b) / (2b)
λ = log-SNR
Cauchy variant:
p(λ) = γ / (π · ((λ-μ)² + γ²))
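Both densities have closed-form inverse CDFs, so training noise levels can be drawn directly in log-SNR space and mapped back through ᾱ = sigmoid(λ), which follows from λ = log(ᾱ / (1−ᾱ)). The sketch below illustrates this; the values μ = 0, b = 0.5, γ = 0.5 and all function names are placeholders, not settings from the cited work.

```python
import math
import random

def sample_laplace(mu, b, rng):
    """Inverse-CDF draw from p(lam) = exp(-|lam - mu| / b) / (2b)."""
    u = rng.random() - 0.5                      # uniform on [-0.5, 0.5)
    tail = max(1.0 - 2.0 * abs(u), 1e-300)      # guard against log(0)
    return mu - b * math.copysign(1.0, u) * math.log(tail)

def sample_cauchy(mu, gamma, rng):
    """Inverse-CDF draw from p(lam) = gamma / (pi * ((lam - mu)^2 + gamma^2))."""
    return mu + gamma * math.tan(math.pi * (rng.random() - 0.5))

def alpha_bar_from_log_snr(lam):
    """Numerically stable sigmoid: inverts lam = log(abar / (1 - abar))."""
    if lam >= 0:
        return 1.0 / (1.0 + math.exp(-lam))
    e = math.exp(lam)
    return e / (1.0 + e)

rng = random.Random(0)
mu, b, gamma = 0.0, 0.5, 0.5                    # placeholder hyperparameters
lap = [sample_laplace(mu, b, rng) for _ in range(20000)]
cau = sorted(sample_cauchy(mu, gamma, rng) for _ in range(20000))

mean_abs_dev = sum(abs(l - mu) for l in lap) / len(lap)   # approaches b for a Laplace
cauchy_median = cau[len(cau) // 2]                        # approaches mu for a Cauchy
abars = [alpha_bar_from_log_snr(l) for l in lap]
```

The Cauchy variant has much heavier tails, so it spends more samples at extreme noise levels than the Laplace.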
| Symbol | Definition | Role |
|---|---|---|
| β_t | noise variance at step t | Schedule parameter — the thing you actually design |
| ᾱ_t = ∏ α_s, with α_s = 1−β_s | cumulative signal weight | Governs q(x_t \| x_0); the forward marginal depends only on this[1] |
| SNR(t) | ᾱ_t / (1−ᾱ_t) | Natural timescale; decreases monotonically as t goes from 0 to T. Designing a schedule amounts to choosing the shape of this curve[4] |
| log-SNR(t) | log(ᾱ_t / (1−ᾱ_t)) | Training distribution axis; concentrate mass near 0 for best efficiency[7] |
| σ (EDM) | noise standard deviation | EDM works in σ-space directly; avoids ᾱ discretisation entirely[6] |
dx = -½β(t) x dt + √β(t) dw
The drift term keeps the variance bounded throughout the forward process, which is why this family is called variance-preserving (VP). All ᾱ_t-based schedules belong to it.[3]
Both the VP and VE families admit a reverse-time SDE that can be solved once the score function
∇_x log p(x_t) is known, which gives a single generative mechanism for both.[3]
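The bounded-variance property of the VP SDE above can be checked numerically. For dx = −½β(t) x dt + √β(t) dw, the variance of x_t given x_0 obeys dv/dt = β(t)(1 − v) with v(0) = 0, whose solution is 1 − exp(−∫β). The sketch below Euler-integrates that ODE; the linear β(t) running from 0.1 to 20 on t ∈ [0, 1] is an assumed illustrative choice.

```python
import math

BETA_MIN, BETA_MAX = 0.1, 20.0     # assumed continuous-time schedule endpoints

def beta(t):
    """Linear beta(t) on t in [0, 1]."""
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def conditional_variance(n_steps=100_000):
    """Euler-integrate dv/dt = beta(t) * (1 - v) from v(0) = 0 to t = 1."""
    dt = 1.0 / n_steps
    v = 0.0
    for i in range(n_steps):
        v += beta(i * dt) * (1.0 - v) * dt
    return v

v_num = conditional_variance()
integral_beta = BETA_MIN + 0.5 * (BETA_MAX - BETA_MIN)   # exact integral of beta over [0, 1]
v_exact = 1.0 - math.exp(-integral_beta)                 # closed form: 1 - alpha_bar(1)
print(f"numeric={v_num:.6f}  closed form={v_exact:.6f}")
```

The variance approaches 1 from below and never exceeds it, in contrast to a VE process where it grows with σ.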
Flow matching (SD3, FLUX) replaces an explicit noise schedule with a learned velocity field, but its
default timestep sampling still implies a particular log-SNR distribution, so it is equivalent to one specific schedule choice.[16]
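To see which schedule a flow-matching setup corresponds to, its timestep can be converted to log-SNR. The sketch below assumes the rectified-flow interpolation x_t = (1 − t)·x_0 + t·ε with t drawn uniformly from (0, 1); both the interpolation convention and the sampling rule are assumptions made for illustration, and real systems may sample t differently.

```python
import math
import random

def log_snr(t):
    """log-SNR of x_t = (1 - t) * x0 + t * eps: signal scale (1 - t), noise scale t."""
    return 2.0 * math.log((1.0 - t) / t)

rng = random.Random(0)
# Keep t strictly inside (0, 1) so the logarithm stays finite.
ts = [rng.uniform(1e-6, 1.0 - 1e-6) for _ in range(20000)]
lams = sorted(log_snr(t) for t in ts)

median = lams[len(lams) // 2]                              # centre of the implied distribution
frac_near_zero = sum(abs(l) < 2.0 for l in lams) / len(lams)
print(f"median log-SNR={median:.3f}  share with |log-SNR|<2: {frac_near_zero:.3f}")
```

Under these assumptions the implied log-SNR values are centred on 0, which is the same region the Laplace schedule targets explicitly.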
EDM defaults: σ_min = 0.002, σ_max = 80.0, ρ = 7, P_mean = −1.2, P_std = 1.2
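Assuming the five values above are EDM's usual defaults (σ_min = 0.002, σ_max = 80, ρ = 7, P_mean = −1.2, P_std = 1.2), they define two things: the ρ-spaced σ grid used at sampling time and the log-normal distribution of σ used at training time. The sketch below shows both; the function names and the choice of 18 steps are illustrative assumptions.

```python
import math
import random

SIGMA_MIN, SIGMA_MAX, RHO = 0.002, 80.0, 7.0   # assumed EDM sampling defaults
P_MEAN, P_STD = -1.2, 1.2                      # assumed EDM training defaults

def karras_sigmas(n):
    """Sampling grid: interpolate linearly in sigma^(1/rho), from sigma_max down to sigma_min."""
    lo, hi = SIGMA_MIN ** (1.0 / RHO), SIGMA_MAX ** (1.0 / RHO)
    return [(hi + i / (n - 1) * (lo - hi)) ** RHO for i in range(n)]

def sample_training_sigma(rng):
    """Training noise level: ln(sigma) ~ N(P_MEAN, P_STD^2)."""
    return math.exp(rng.gauss(P_MEAN, P_STD))

sigmas = karras_sigmas(18)                     # 18 steps is an illustrative choice

rng = random.Random(0)
train = sorted(sample_training_sigma(rng) for _ in range(20001))
median_sigma = train[len(train) // 2]          # should sit near exp(-1.2)
print(f"grid: {sigmas[0]:.3f} ... {sigmas[-1]:.4f}   median training sigma={median_sigma:.3f}")
```

A large ρ packs most grid points near σ_min, where fine detail is resolved, while the training distribution concentrates on intermediate noise levels.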