
Survey on Deep Generative Model

Hu Ming-Fei  Zuo Xin  Liu Jian-Wei

Citation: Hu Ming-Fei, Zuo Xin, Liu Jian-Wei. Survey on deep generative model. Acta Automatica Sinica, 2020, 41(x): 1−34 doi: 10.16383/j.aas.c190866


doi: 10.16383/j.aas.c190866
About the authors:

    Hu Ming-Fei: Ph.D. candidate in the Department of Automation, China University of Petroleum (Beijing). His main research interest is pattern recognition. E-mail: hmfzsy@gmail.com

    Zuo Xin: Professor in the Department of Automation, China University of Petroleum (Beijing). His main research area is intelligent control. E-mail: zuox@cup.edu.cn

    Liu Jian-Wei: Associate researcher in the Department of Automation, China University of Petroleum (Beijing). His main research interests are pattern recognition and intelligent systems, and advanced control. E-mail: liujw@cup.edu.cn

Funds: Supported by National Key Research and Development Program of China (2016YFC0303703)
  • Abstract: Generative models, which learn the probability density of observed data and then draw new samples from it, have attracted wide attention in recent years. Deep generative models, whose networks contain multiple hidden layers, have become a research hotspot thanks to their superior generative ability; they have been applied successfully in computer vision, density estimation, natural language and speech recognition, and semi-supervised learning, and they provide a good paradigm for unsupervised learning. This survey divides deep generative models into three classes according to how they handle the likelihood function. The first class approximates the likelihood: the restricted Boltzmann machine estimates it by sampling, and serves as the building block of the deep belief network, the deep Boltzmann machine, and the Helmholtz machine; a complementary approach directly optimizes a variational lower bound on the likelihood, namely the variational autoencoder and its important variants, including the importance weighted autoencoder and the auxiliary deep generative model for semi-supervised learning. The second class uses implicit methods that avoid maximum likelihood entirely; the representative model is the generative adversarial network, which sidesteps the likelihood by optimizing its parameters through an adversarial game between a generator and a discriminator, together with important variants such as WGAN, the deep convolutional GAN, and BigGAN, currently the top-performing deep generative model. The third class transforms the likelihood into a tractable form: flow models build the likelihood from invertible functions and optimize it directly, including normalizing flows based on NICE, variational flows, and the invertible residual network (i-ResNet); autoregressive models decompose the objective into a product of conditional probabilities, including the neural autoregressive density estimator (NADE), the pixel recurrent neural network (PixelRNN), the masked autoencoder for distribution estimation (MADE), and WaveNet. After describing the principles, structures, and variants of these models in detail, we review their research progress and applications, and close with a summary and outlook for deep generative models.
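For reference, the likelihood treatments named in the abstract correspond to the following standard objectives from the literature (generic notation, not tied to any one model in this survey):

```latex
% VAE family: maximize a variational lower bound on the log-likelihood
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big]
  \;-\; D_{\mathrm{KL}}\big(q_\phi(z|x)\,\|\,p(z)\big)

% GAN: an adversarial game that never evaluates the likelihood
\min_G \max_D \;\; \mathbb{E}_{x\sim p_{\mathrm{data}}}[\log D(x)]
  + \mathbb{E}_{z\sim p(z)}[\log(1 - D(G(z)))]

% Flow models: exact likelihood via change of variables for an invertible f
\log p_X(x) \;=\; \log p_Z\big(f(x)\big)
  + \log\Big|\det \tfrac{\partial f(x)}{\partial x}\Big|

% Autoregressive models: likelihood factored into conditionals
p(x) \;=\; \prod_{i=1}^{D} p\big(x_i \mid x_1, \ldots, x_{i-1}\big)
```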
  • Fig. 1  Deep generative models classification

    Fig. 2  Restricted Boltzmann machines

    Fig. 3  The structure of deep belief networks

    Fig. 4  Two kinds of greedy layer-wise pre-training

    Fig. 5  Helmholtz machine

    Fig. 6  Deep Boltzmann machines

    Fig. 7  The structure of VAE

    Fig. 8  The training process of VAE

    Fig. 9  Auxiliary deep generative models

    Fig. 10  Adversarial autoencoders

    Fig. 11  The structure of GANs

    Fig. 12  The structure of DCGANs

    Fig. 13  The structure of ResNet-GANs

    Fig. 14  The structure of CGANs and ACGANs

    Fig. 15  The structure of additive coupling

    Fig. 16  The structure of hybrid dimensions

    Fig. 17  The structure of the affine coupling layer

    Fig. 18  The structure of random mixing

    Fig. 19  Composition schemes for affine coupling layers

    Fig. 20  The structure of layers in GLOW

    Fig. 21  The structure of the first layer in IAF

    Fig. 22  The structure of other layers in IAF
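The additive and affine coupling layers sketched in Figs. 15 to 19 can be made concrete in a few lines. Below is a minimal pure-Python sketch of one affine coupling layer in the Real NVP style; the scale and shift functions `s` and `t` are toy stand-ins for the learned neural networks, chosen only to show the exact invertibility and the triangular-Jacobian log-determinant:

```python
import math

# Toy "scale" and "shift" networks acting on the untouched half x1.
# (Assumptions for illustration; the papers learn these as neural networks.)
def s(x):
    return [0.5 * v for v in x]

def t(x):
    return [v + 1.0 for v in x]

def forward(x1, x2):
    """Affine coupling: y1 = x1,  y2 = x2 * exp(s(x1)) + t(x1)."""
    sc, sh = s(x1), t(x1)
    y2 = [b * math.exp(a) + c for a, b, c in zip(sc, x2, sh)]
    # The Jacobian is triangular, so log|det J| is just the sum of the scales.
    return list(x1), y2, sum(sc)

def inverse(y1, y2):
    """Exact inverse: x2 = (y2 - t(y1)) * exp(-s(y1))."""
    sc, sh = s(y1), t(y1)
    x2 = [(b - c) * math.exp(-a) for a, b, c in zip(sc, y2, sh)]
    return list(y1), x2

x1, x2 = [0.2, -1.0], [1.5, 3.0]
y1, y2, logdet = forward(x1, x2)
rx1, rx2 = inverse(y1, y2)
```

Stacking such layers with the dimension-mixing schemes of Figs. 16 and 18 lets every coordinate eventually be transformed while the likelihood stays tractable.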

    Table 1  RBM-based models

    Method | Modification | Purpose | Core approach
    rtRBM | training algorithm | improve performance | improves the tempered RBM and adds a recurrent mechanism
    ReLU-RBM | activation function | improve training | introduces rectified linear units into the RBM
    3-Order RBM | model structure | improve performance | factors visible and hidden units into three-way interactions; hidden units control the covariance and thresholds of visible units
    PGBM | model structure | structural extension | adds gating units to the RBM for feature selection
    RBM-SVM | model structure | improve performance | an upper RBM extracts features and a lower SVM performs regression
    RNN-RBM | model structure | structural extension | combines the RBM with a recurrent network
    apRBM | model structure | structural extension | constructs deterministic functions between layer weights
    cRBM | model structure | enable supervised learning | applies an autoregressive structure and label information to the RBM
    Factored-cRBM | model structure | improve performance | applies three-way interactions to the conditional RBM
    Gaussian-Bernoulli RBM | data type | extend the RBM to real values | visible units follow a parameterized Gaussian distribution, hidden units a parameterized Bernoulli distribution
    mcRBM | model structure | capture within-layer dependencies | adds covariance units in the hidden layer to model the conditional covariance structure
    ssRBM | model structure | capture within-layer dependencies | encodes the conditional covariance with auxiliary real-valued variables
    mPoT | model structure | capture within-layer dependencies | adds latent variables with nonzero Gaussian means; the conditional distribution is a conditionally independent Gamma
    fBMMI-DBN | training algorithm | improve pre-training | trains the DBN on mel-frequency cepstral coefficients to predict posterior distributions over HMM states
    CDBN | model structure | structural extension | combines the DBN with a convolutional structure
    3-Order DBN | model structure | improve performance | applies three-way interactions to the DBN
    fsDBN | training algorithm | improve performance | optimizes weights, state-transition parameters, and language-model scores with a full-sequence discriminative training criterion
    DBN-HMM | model structure | improve performance | combines the DBN with a hidden Markov model
    CAST | training algorithm | improve training | combines adaptive algorithms with MCMC to train the DBN
    Trans-SAP | training algorithm | improve training | combines tempering with MCMC to train the DBN
    aiDBM | training algorithm | improve training | an approximate inference algorithm; a separate recognition model speeds up DBM training
    centered DBM | training algorithm | improve training | reparameterizes the model so that the Hessian of the cost function is better conditioned at the start of learning
    MP-DBM | training algorithm | improve training | admits back-propagation, avoiding the training problems of MCMC gradient estimates
    CDBM | model structure | structural extension | combines the DBM with a convolutional structure
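Every RBM variant in Table 1 starts from the same energy-based definition. For a binary RBM with visible units $v$, hidden units $h$, weights $W$, and biases $a$, $b$:

```latex
E(v,h) = -a^{\top}v - b^{\top}h - v^{\top}Wh, \qquad
p(v,h) = \frac{1}{Z}\,e^{-E(v,h)}

% Units within a layer are conditionally independent, which gives the
% Gibbs sampling steps used by CD-k and the other training algorithms above:
p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i W_{ij} v_i\Big), \qquad
p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big)
```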

    Table 2  Important VAE models

    Model | Main contribution | Core approach
    CVAE | enables supervised learning with the VAE | appends a one-hot label vector to the input data
    ADGM | improves the CVAE's handling of label information | introduces label information and an auxiliary variable into the VAE; five neural networks model the relations among the variables
    kg-CVAE | increases the diversity of generated samples | adds an extra bag-of-words loss on top of the ADGM so that the latent variable carries word-occurrence information
    hybrid-CVAE | robust structured prediction with the CVAE | adds input noise and stochastic feed-forward inference; builds a hybrid variational bound with a Gaussian stochastic network: $L(x) = \alpha {L_{{\rm{CVAE}}}} + (1 - \alpha ){L_{{\rm{GSNN}}}}$
    SSVAE | enables semi-supervised learning with the VAE | builds two models: M2 is the semi-supervised model and M1 is a VAE that boosts M2's ability
    IMVAE | improves the SSVAE's handling of mixed information | builds an infinite mixture with nonparametric Bayesian methods; mixture weights come from a Dirichlet process
    AAE | lets the model learn the posterior distribution | matches an aggregated pseudo-prior to the true distribution; an adversarial network attached to the latent variable learns the pseudo-prior
    ARAE | extends the AAE to discrete structures | uses a recurrent encoder and decoder; adds an extra regularizer to the variational bound
    IWAE | brings the assumed posterior closer to the true posterior | constructs a tighter variational bound than the VAE, improving variational inference by weakening the encoder's role in the bound
    DC-IGN | preserves local correlations in image samples | replaces the fully connected networks with convolution and pooling layers
    infoVAE | raises the mutual information between latent and observed variables, bringing the approximate posterior closer to the true one | adds a mutual information term to the bound: $\alpha {I_q}(x)$
    β-VAE | learns disentangled, interpretable latent representations from raw data | adds a regularization coefficient to the bound: $L(x) = {{\rm{E}}_{Q(z\left| x \right.)}}(\log P(x|z)) - \beta {D_{{\rm{KL}}}}(Q(z\left| x \right.)||P(z))$
    β-TCVAE | explains why β-VAE disentangles and improves its performance | adds mutual information and extra regularizers to the β-VAE bound: $ - {I_q}(z)$, $ - {D_{{\rm{KL}}}}(Q(x)||P(x))$
    HFVAE | disentangles discrete variables with the VAE | decomposes the bound shared by mainstream VAEs into four terms and interprets each: $\begin{aligned} L(x) =& { {\rm{E} }_{Q(z\left| x \right.)} }[\log { {(P(x|z)} / {P(x)} }) - \log { {(Q(z|x)} / {Q(z)} })] -\\& {D_{ {\rm{KL} } } }(Q(z)||P(z)) - {D_{ {\rm{KL} } } }(Q(x)||P(z)) \end{aligned}$
    DRAW | handles time-series samples | adds an attention mechanism and LSTM structure to the VAE framework
    MMD-VAE | replaces the KL divergence with maximum mean discrepancy | replaces the KL term in the bound with ${D_{{\rm{MMD}}}}(Q(x)||P(x))$
    HVI | replaces reparameterization with a more accurate sampler | samples the posterior directly with Hamiltonian Monte Carlo for a more accurate posterior approximation
    VFAE | keeps more information in the latent variable when learning sensitive or anomalous data | appends an MMD-based penalty to the bound: $\sqrt {2/D} \cos (\sqrt {2/r} xW + b)$
    LVAE | corrects the latent distribution layer by layer and recursively, tightening the bound | builds richer distributions from multiple latent layers; uses warm-up in the bound
    wd-VAE | language generation when input words are missing | converts the input text to UNK tokens and applies dropout so the decoder RNN relies more on the latent representation
    VLAE | learns a more accurate posterior with a flow model | replaces the Gaussian posterior with one learned by a flow; discards irrelevant information using the global representation learned by a recurrent network
    PixelVAE | captures relations between sample elements to generate sharper images | makes the latent variable convolutional; the decoder is a PixelCNN needing only a few layers, cutting computation
    DCVAE | adjusts kernel widths so the decoder better uses the encoder's information | dilated convolutions in the decoder enlarge the receptive field, trading context capacity against effective encoding
    MSVAE | two-level decoder for generating high-resolution images | the first decoder produces a coarse sample; the second, a super-resolution model with residuals and skip connections, takes it as input and produces the high-resolution sample
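All of the variants in Table 2 modify the same baseline. With the table's notation, the standard VAE maximizes, for each sample $x$,

```latex
L(x) = {\mathrm{E}}_{Q(z|x)}\big[\log P(x|z)\big]
     - D_{\mathrm{KL}}\big(Q(z|x)\,\|\,P(z)\big)

% trained with the reparameterization trick for a Gaussian posterior:
z = \mu(x) + \sigma(x) \odot \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I)
```

so, for example, β-VAE scales the KL term by β, and infoVAE adds a mutual-information term to this bound.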

    Table 3  Important GANs

    Model | Core approach | Datasets | Max resolution
    CGAN | feeds the label as side information into the generator and then, together with the generated sample, into the discriminator | MNIST | $28 \times 28$
    DCGAN | selects the best generator/discriminator pair from many architectures; both are deep convolutional networks | LSUN, FACES, ImageNet-1k | $32 \times 32$
    VAE-GAN | wraps a GAN around the VAE; the GAN discriminator learns the similarity between the VAE's two distributions | CelebA, LFW | $64 \times 64$
    BiGAN | the generator is an encoder and a decoder with unrelated input/output; the discriminator receives sample and latent variable jointly and judges whether they come from the encoder or the decoder | MNIST, ImageNet | $64 \times 64$
    CoGAN | for style-transfer learning, shares the last few layers of the two encoders so their outputs stay close | MNIST, CelebA | $64 \times 64$
    Info-GAN | splits the noise $z$ into sub-vectors $c$ and $z'$; $c$ controls conditional information such as class and shape; an extra discriminator predicts $c$ from generated samples | MNIST, SVHN | $64 \times 64$
    LSGAN | uses a least-squares loss, which pulls the image distribution as close as possible to the decision boundary | LSUN, HWDB | $64 \times 64$
    WGAN | analyzes theoretically why GAN training is unstable; improves stability with the Wasserstein distance and related techniques | LSUN | $64 \times 64$
    f-GAN | proves that any f-divergence fits the GAN framework | MNIST, LSUN | $96 \times 96$
    LAPGAN | raises resolution layer by layer with a Laplacian pyramid; each higher-resolution image is generated conditioned on the lower-resolution one | CIFAR10, LSUN, STL | $96 \times 96$
    WGAN-GP | adds the discriminator's gradient as a penalty term to the discriminator loss | ImageNet, CIFAR10, LSUN | $128 \times 128$
    SNGAN | replaces the gradient penalty with spectral normalization | CIFAR10, STL10, ImageNet | $128 \times 128$
    Improved-DCGAN | strengthens DCGAN's stability and sample quality with several techniques | MNIST, CIFAR10, SVHN, ImageNet | $128 \times 128$
    EBGAN | the discriminator instead scores how well the input image reconstructs (energy based), giving the generator strong drive early in training and a reasonably good generator quickly | MNIST, LSUN, CelebA, ImageNet | $128 \times 128$
    BEGAN | the discriminator is an autoencoder that estimates the distribution of errors between distributions; introduces a hyperparameter trading off sample diversity against quality | CelebA | $128 \times 128$
    ACGAN | every sample has a class label, fed to both the generator and the discriminator | ImageNet, CIFAR10 | $128 \times 128$
    SAGAN | replaces convolution layers with self-attention for feature extraction | ImageNet | $128 \times 128$
    SRGAN | the generator produces high-resolution images from low-resolution ones; the discriminator judges whether an image is generated or real | (not listed) | (not listed)
    StackGAN | stage one uses a CGAN to generate a $64 \times 64$ low-resolution image; stage two feeds that image and the text into another GAN to generate the high-resolution image | CUB, Oxford-102, COCO | $256 \times 256$
    StackGAN++ | builds on StackGAN with several generators producing images at different scales, each with its own discriminator; adds an unconditional loss and color regularization | CUB, Oxford-102, COCO | $256 \times 256$
    Cycle-GAN | a ring network of two symmetric GANs sharing two generators, each with its own discriminator | Cityscapes label | $256 \times 256$
    Star-GAN | introduces domain control information for multi-domain translation; the discriminator additionally predicts which domain a real sample comes from | CelebA, RaFD | $256 \times 256$
    BigGAN | increases batch size and channel count during training; makes weight matrices orthogonal to reduce interference among weights | ImageNet, JFT-300M | $512 \times 512$
    PGGAN | deepens the network progressively during training: a shallow network first learns low-resolution images, then layers are added to train at higher resolutions | CelebA, LSUN | $1024 \times 1024$
    Style-GAN | builds on PGGAN with a mapping network, style modules, and blocks for stochastic variation and style mixing; uses a new weight truncation trick | FFHQ | $1024 \times 1024$
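Several rows of Table 3 change only the loss. For example, WGAN replaces the original minimax objective with a Wasserstein critic loss, and WGAN-GP enforces the critic's Lipschitz constraint with the gradient penalty mentioned in its row ($\hat{x}$ is sampled on straight lines between real and generated points):

```latex
% WGAN: the critic f must be 1-Lipschitz
\min_G \max_{\|f\|_L \le 1} \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}[f(x)] - \mathbb{E}_{z \sim p(z)}\big[f(G(z))\big]

% WGAN-GP: the Lipschitz constraint enforced by a gradient penalty on the critic
L = \mathbb{E}_{\tilde{x} \sim p_g}[f(\tilde{x})]
  - \mathbb{E}_{x \sim p_{\mathrm{data}}}[f(x)]
  + \lambda\, \mathbb{E}_{\hat{x}}\Big[\big(\|\nabla_{\hat{x}} f(\hat{x})\|_2 - 1\big)^2\Big]
```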
Publication history
  • Received:  2019-12-19
  • Accepted:  2020-07-27
