DONG Xing-Lei, HU Ying, HUANG Hao, SILAMU Wushour

Citation: DONG Xing-Lei, HU Ying, HUANG Hao, SILAMU Wushour. Monaural Speech Separation by Means of Convolutive Nonnegative Matrix Partial Co-factorization in Low SNR Condition. ACTA AUTOMATICA SINICA, 2020, 46(6): 1200-1209. doi: 10.16383/j.aas.c180065


doi: 10.16383/j.aas.c180065

Monaural Speech Separation by Means of Convolutive Nonnegative Matrix Partial Co-factorization in Low SNR Condition

National Natural Science Foundation of China 61761041

National Natural Science Foundation of China 61663044

Youth Fund of the National Natural Science Foundation of China 61603323

Natural Science Grant of Xinjiang Uygur Autonomous Region 2016D01C061

Natural Science Grant of Xinjiang University BS160239

University Scientific Research Project of Xinjiang Uygur Autonomous Region XJ EDU2017T002

    DONG Xing-Lei  Master student in the Department of Information Science and Engineering, Xinjiang University. His research interests cover speech signal processing and speech separation. E-mail: 15739578112@163.com

    HUANG Hao  Professor in the Department of Information Science and Engineering, Xinjiang University. He received his Ph.D. degree from the Department of Electronic Engineering, Shanghai Jiao Tong University in 2008. His research interests cover speech recognition and multimedia human-machine interaction. E-mail: huanghao@xju.edu.cn

    SILAMU Wushour  Professor in the Department of Information Science and Engineering, Xinjiang University. His research interests cover speech recognition, speech synthesis, and multilingual information processing. E-mail: wushour@xju.edu.cn

    HU Ying  Associate professor in the Department of Information Science and Engineering, Xinjiang University. Her research interests cover audio information retrieval and speech processing. Corresponding author of this paper. E-mail: huying 75@sina.com
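  • Abstract: Nonnegative matrix partial co-factorization (NMPCF) lets the spectrum of a designated source take part, as side information, in the joint factorization of the mixture spectrum, which helps identify the basis vectors of that source and thereby improves separation performance. Convolutive nonnegative matrix factorization (CNMF) factorizes a matrix with convolutive bases and has achieved good results in monaural speech separation. To separate speech under strong noise, this paper combines the strengths of the two algorithms and proposes a monaural speech separation algorithm based on convolutive nonnegative matrix partial co-factorization (CNMPCF). The algorithm first obtains the speech onset in the mixture with a pitch detection algorithm, then uses it to locate the noise-only segment of the mixture, and finally applies CNMPCF to the mixture spectrum and the noise spectrum to obtain the speech basis matrix, from which the separated speech spectrum and time-domain signal are recovered. In the experiments, the signal-to-noise ratio (SNR) of the mixtures takes five values, from 0 dB down to -12 dB in steps of -3 dB. The results show that, across noise types and noise levels, the proposed CNMPCF method improves to varying degrees on both of the above methods.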
    Recommended by Associate Editor DANG Jian-Wu
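
The partial co-factorization idea in the abstract can be illustrated with a simplified sketch. It is not the paper's CNMPCF: it assumes a non-convolutive model, a squared Euclidean cost, and standard multiplicative updates, and every name in it (`nmpcf`, `n_speech`, `n_noise`, `lam`) is hypothetical. The mixture magnitude spectrogram `X` and a noise-only spectrogram `N` are factorized jointly with a shared noise basis `Wn`, so the remaining basis `Ws` is pushed toward the speech.

```python
import numpy as np

def nmpcf(X, N, n_speech=8, n_noise=4, lam=1.0, n_iter=100, eps=1e-9, seed=0):
    """Simplified NMPCF: X ~ Ws@Hs + Wn@Hn and N ~ Wn@Hnn, with Wn shared."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    Ws = rng.random((F, n_speech)) + eps           # speech basis
    Wn = rng.random((F, n_noise)) + eps            # noise basis (shared)
    Hs = rng.random((n_speech, T)) + eps           # speech activations
    Hn = rng.random((n_noise, T)) + eps            # noise activations in the mixture
    Hnn = rng.random((n_noise, N.shape[1])) + eps  # activations for the noise-only part

    def cost():
        return (np.sum((X - Ws @ Hs - Wn @ Hn) ** 2)
                + lam * np.sum((N - Wn @ Hnn) ** 2))

    losses = [cost()]
    for _ in range(n_iter):
        # Each multiplicative update keeps its factor nonnegative.
        Hs *= (Ws.T @ X) / (Ws.T @ (Ws @ Hs + Wn @ Hn) + eps)
        Hn *= (Wn.T @ X) / (Wn.T @ (Ws @ Hs + Wn @ Hn) + eps)
        Hnn *= (Wn.T @ N) / (Wn.T @ Wn @ Hnn + eps)
        Ws *= (X @ Hs.T) / ((Ws @ Hs + Wn @ Hn) @ Hs.T + eps)
        # The shared noise basis sees both the mixture and the noise-only data.
        Xhat = Ws @ Hs + Wn @ Hn
        Wn *= (X @ Hn.T + lam * N @ Hnn.T) / (
            Xhat @ Hn.T + lam * Wn @ Hnn @ Hnn.T + eps)
        losses.append(cost())
    return Ws, Hs, Wn, Hn, losses

# Synthetic nonnegative data standing in for magnitude spectrograms.
rng = np.random.default_rng(1)
X = rng.random((64, 50))   # mixture: 64 frequency bins, 50 frames
N = rng.random((64, 20))   # noise-only segment: 20 frames
Ws, Hs, Wn, Hn, losses = nmpcf(X, N)

# Soft mask built from the two reconstructions, applied to the mixture.
speech_est = X * (Ws @ Hs) / (Ws @ Hs + Wn @ Hn + 1e-9)
```

In a real pipeline `N` would be the spectrogram of the noise-only segment located before the detected speech onset, and the masked spectrogram would be inverted with the mixture phase. The convolutive variant replaces each basis matrix with a set of time-shifted bases, which this sketch does not attempt.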
  • Fig. 1  Basis vectors extracted from the clean speech spectrum by CNMF decomposition

    Fig. 2  Illustration of the magnitude spectrogram decomposition by CNMPCF

    Fig. 3  Illustration of speech start and end point (boundary) detection

    Fig. 4  Probability distribution of the detection deviation of the upper and lower speech bounds in -12 dB mixtures

    Fig. 5  Comparison of PESQ under different noises

    Fig. 6  Comparison of SDR under different noises

    Fig. 7  Comparison of $\Delta $SNR under different noises

    Table 1  Mean subjective listening scores of different methods at five input SNR levels
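
Mixtures at the five input SNR levels (0 dB down to -12 dB in steps of -3 dB) can be produced by scaling the noise before adding it to the speech. The sketch below is one common way to do this and is not taken from the paper; `mix_at_snr` is a hypothetical helper, and the sine wave and white noise are stand-ins for real recordings.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals `snr_db` in dB.

    Assumes `noise` is at least as long as `speech`; extra samples are dropped.
    Returns the mixture and the scaled noise.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve 10*log10(p_speech / (gain**2 * p_noise)) = snr_db for gain.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = gain * noise
    return speech + scaled, scaled

rng = np.random.default_rng(0)
fs = 16000
speech = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)  # stand-in for a speech signal
noise = rng.standard_normal(fs)                        # stand-in for a noise recording

snr_levels = [0, -3, -6, -9, -12]
achieved = []
for snr_db in snr_levels:
    mixture, scaled_noise = mix_at_snr(speech, noise, snr_db)
    achieved.append(10 * np.log10(np.mean(speech ** 2) / np.mean(scaled_noise ** 2)))
```

At -12 dB the scaled noise carries roughly 16 times the power of the speech, which is why the abstract describes this range as strong noise.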

