
Learning AI with Mu Li: DALL·E 2 + Diffusion Model

DALL·E 2

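DALL·E 2 is roughly the reverse of one half of CLIP.
CLIP maps text --> text feature
and image --> image feature,
then compares the two kinds of features for similarity. That is enough for classification: a given image is matched to the best-fitting text among the given candidates.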
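The matching step described above can be sketched with plain cosine similarity. This is a minimal illustration, not CLIP's actual code: the toy vectors stand in for the outputs of the text and image encoders, and the function names `l2_normalize` and `classify` are made up for this example.

```python
import numpy as np

def l2_normalize(v, axis=-1):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

def classify(image_feature, text_features):
    """Return the index of the best-matching text and all cosine similarities."""
    img = l2_normalize(image_feature)
    txt = l2_normalize(text_features)
    sims = txt @ img  # one similarity per candidate text
    return int(np.argmax(sims)), sims

# Toy features standing in for encoder outputs (assumed values).
text_features = np.array([
    [1.0, 0.0, 0.0],  # e.g. "a photo of a dog"
    [0.0, 1.0, 0.0],  # e.g. "a photo of a cat"
    [0.0, 0.0, 1.0],  # e.g. "a photo of a car"
])
image_feature = np.array([0.1, 0.9, 0.2])

best, sims = classify(image_feature, text_features)
print(best, sims)
```

The image is assigned to whichever candidate text has the highest similarity, which is how a contrastive model can be used as a classifier without a fixed label set.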

DALL·E 2 does
text --> text feature --> image feature --> (diffusion model) --> image
which makes image generation possible, going from text to image. The image feature is supervised by CLIP.


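The DALL·E 2 paper discusses five or six of its own limitations and possible future directions, which does not change the fact that it is very strong. Some interesting limitations:

![It cannot properly understand positional relations such as above, below, left, and right](https://img-blog.csdnimg.cn/direct/cbc0611955b44b6aaa92867f04409ea2.png)

It cannot understand logical relations. Possibly this is because CLIP only matches image-text pairs: it finds the image that contains the objects named in the text, but does not capture relations such as above, below, left, and right?

Text inside images is incoherent: the text rendered in generated images is garbled.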

Diffusion models explained

GAN:

noise z --> generator --> x'; (x', x) --> discriminator --> 0/1

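Training is not very stable.
Outputs are as realistic as possible, but diversity is low; it comes mainly from the input noise.
It is not a probabilistic model (?): generation is implicit, so the underlying data distribution is never known.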

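Why a GAN is not a probabilistic model:
No explicit probabilistic interpretation: the generator does not model the data distribution directly; it maps a random noise vector to data, and this process has no explicit density.
No exact inference: in a probabilistic model you can infer unknown quantities from observed data. A GAN has no explicit probabilistic model, so this kind of inference is not available.
Training is not likelihood-based: a GAN is trained by optimizing an "adversarial" loss between the generator and the discriminator, not by maximizing the likelihood of the data.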
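For reference, the adversarial loss mentioned above is usually written as the minimax objective from the original GAN formulation. The notes do not spell it out, so the symbols are assumptions here: G is the generator, D is the discriminator (its output is the 0/1 judgement, read as the probability that the input is real), x is a real sample and z is the noise vector.

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\text{data}}} \big[ \log D(x) \big]
+ \mathbb{E}_{z \sim p_z} \big[ \log \big( 1 - D(G(z)) \big) \big]
```

Nothing in this objective evaluates a density of the generated data, which is one way to read the "implicit" generation noted above.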

AE: Auto-Encoder

x --> encoder --> bottleneck --> decoder --> x'

DAE: Denoising Auto-encoder

x --> xc (corrupted x) --> encoder --> bottleneck --> decoder --> x'
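The denoising auto-encoder data flow can be sketched as a single forward pass. This is only an illustration under assumptions that are not in the notes: Gaussian corruption, a one-layer encoder and decoder with random untrained weights, and a mean-squared-error loss. The key point it shows is that the input is the corrupted `x_c` while the reconstruction target is the clean `x`.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, noise_std=0.3):
    """Produce x_c by adding Gaussian noise (one possible corruption)."""
    return x + noise_std * rng.standard_normal(x.shape)

def forward(x_in, W_enc, W_dec):
    """Encoder -> bottleneck -> decoder, each a single layer here."""
    h = np.tanh(x_in @ W_enc)  # bottleneck feature
    x_rec = h @ W_dec          # reconstruction x'
    return x_rec, h

x = rng.standard_normal((4, 8))            # batch of 4 clean inputs
W_enc = 0.1 * rng.standard_normal((8, 2))  # 8 -> 2 bottleneck
W_dec = 0.1 * rng.standard_normal((2, 8))  # 2 -> 8 reconstruction

x_c = corrupt(x)
x_rec, h = forward(x_c, W_enc, W_dec)

# The loss compares the reconstruction with the CLEAN input, not with x_c.
loss = np.mean((x_rec - x) ** 2)
print(h.shape, loss)
```

Dropping the `corrupt` call and feeding `x` directly gives the plain auto-encoder from the previous diagram.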

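MAE (masked auto-encoder) is similar.
The main goal is to learn the bottleneck feature, then use that feature map/vector for tasks such as detection and segmentation.
However, this feature is not random noise; it is a feature used for reconstruction, so it cannot be used for generation. Why?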

VAE: Variational Auto-encoder

x --> Encoder --> distribution (μ, σ) --> z = μ + ε * σ, with ε sampled from N(0, I) --> Decoder --> x'
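The sampling step in the diagram is the reparameterization trick: the encoder outputs a mean and a standard deviation, and z is drawn as mean plus noise times standard deviation, with the noise taken from a standard normal. A minimal sketch follows; the function names are made up for illustration, and the KL term is the standard closed form for a diagonal Gaussian against N(0, I), which the notes do not state.

```python
import numpy as np

def reparameterize(mu, sigma, rng):
    """Sample z = mu + eps * sigma with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + eps * sigma

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - 2.0 * np.log(sigma), axis=-1)

rng = np.random.default_rng(0)
mu = np.array([[0.5, -1.0]])    # stand-in for the encoder's mean output
sigma = np.array([[1.0, 0.5]])  # stand-in for the encoder's std output

z = reparameterize(mu, sigma, rng)
print(z, kl_to_standard_normal(mu, sigma))
```

Because the KL term pulls the encoder's distribution toward N(0, I), at generation time one can skip the encoder, draw z from N(0, I), and run only the decoder.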

To be studied more carefully: how does this turn the auto-encoder into a probabilistic model?

VQVAE: Vector Quantised Variational Auto-encoder

x --> e --> f --> z --> fa --> Decoder --> x'
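The "vector quantised" step in a VQ-VAE is a nearest-neighbour lookup: each encoder feature vector is replaced by the closest entry of a learned codebook before decoding. The sketch below shows only that lookup, with a tiny hand-written codebook; the names `quantize`, `codebook` and `features` are assumptions for illustration and are not mapped onto the node labels of the diagram.

```python
import numpy as np

def quantize(features, codebook):
    """Replace each feature vector by its nearest codebook entry.

    features: (N, D) encoder outputs
    codebook: (K, D) learned embedding vectors
    Returns the quantized features (N, D) and the chosen indices (N,).
    """
    # Squared Euclidean distance between every feature and every code.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
features = np.array([[0.9, 1.2], [-0.8, 0.7], [0.1, -0.2]])

quantized, idx = quantize(features, codebook)
print(idx)
print(quantized)
```

The decoder only ever sees codebook vectors, so an image is effectively described by a grid of discrete indices.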

Diffusion Model

x --> x1 --> ... --> xi --> ... --> x' (pure noise)

Recovering the image back from the noise is image generation.
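The forward (noising) direction of the chain can be sketched in closed form. The specifics below are assumptions rather than something stated in these notes: a linear noise schedule from 1e-4 to 0.02 over T = 1000 steps (a commonly used DDPM setting) and the helper name `q_sample`. The noised sample at step t is a weighted mix of the clean image and Gaussian noise, and by the last step almost nothing of the image is left.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)  # cumulative fraction of signal kept

def q_sample(x0, t, eps):
    """Noised sample at step t: sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # toy "image" scaled to [-1, 1]
eps = rng.standard_normal(x0.shape)

x_mid = q_sample(x0, 500, eps)    # partly noised
x_T = q_sample(x0, T - 1, eps)    # essentially pure noise
print(alpha_bar[0], alpha_bar[500], alpha_bar[-1])
```

Generation runs this chain in reverse: start from pure noise and remove a little noise at each step with a learned model.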

DDPM: the idea is similar to ResNet, in that the model predicts the noise rather than the image at each step.
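Written out, the noise-prediction idea corresponds to the simplified DDPM training objective. The symbols are assumptions, since these notes do not define them: x_0 is the clean image, x_t its noised version at step t, \bar\alpha_t the cumulative product of (1 - \beta_s) over the noise schedule, and \epsilon_\theta the network being trained.

```latex
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}
\left[ \left\lVert \epsilon - \epsilon_\theta(x_t, t) \right\rVert^2 \right]
```

The network receives the noised image and the step index, and is trained to output the noise that was added, much like a residual branch predicts a difference rather than the full signal.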

Scores used to evaluate diffusion models:
Inception score:
IS score:
FID score:
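As a rough illustration of what the Inception score measures, the sketch below computes it from a matrix of class probabilities. In practice those probabilities come from a pretrained Inception classifier applied to generated images; here they are hand-written toy values, and the function name `inception_score` is made up for this example.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """exp( mean over samples of KL( p(y|x) || p(y) ) ).

    probs: (N, K) array, each row a class distribution p(y|x) for one image.
    """
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Confident and diverse predictions give a high score (at most K).
confident_diverse = np.eye(4)
# Completely uncertain predictions give the minimum score of 1.
uncertain = np.full((4, 4), 0.25)

print(inception_score(confident_diverse))
print(inception_score(uncertain))
```

A higher score means each image is classified confidently while the set as a whole covers many classes. FID instead compares feature statistics of generated and real images, and lower is better.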

improved DDPM

Diffusion Model Beats GAN

DALL·E2

Updated: 2024-02-28