Learning Deep Representation of The Emotion Speech Signal

Junyi Duan; Zheng Song; Jianfeng Zhao

Proceedings of the 8th EAI International Conference on Green Energy and Networking, GreeNets 2021, June 6-7, 2021, Dalian, People’s Republic of China

Research Article



Cite (BibTeX):

@INPROCEEDINGS{10.4108/eai.6-6-2021.2307539,
    author={Junyi Duan and Zheng Song and Jianfeng Zhao},
    title={Learning Deep Representation of The Emotion Speech Signal},
    proceedings={Proceedings of the 8th EAI International Conference on Green Energy and Networking, GreeNets 2021, June 6-7, 2021, Dalian, People’s Republic of China},
    publisher={EAI},
    proceedings_a={GREENETS},
    year={2021},
    month={8},
    keywords={deep representation, deep learning, speech signal},
    doi={10.4108/eai.6-6-2021.2307539}
}


Junyi Duan¹, Zheng Song², Jianfeng Zhao³,*

1: Inner Mongolia Branch of China Tower Corporation Limited
2: School of Electronic Information Engineering, Inner Mongolia University
3: Hangzhou Innovation Institute, Beihang University

*Contact email: nmgzjf@outlook.com

Abstract

This paper aims to learn a deep representation of the emotional speech signal directly from raw audio clips using a 1D convolutional encoder, and to reconstruct the audio signal using a 1D deconvolutional decoder. The learned deep features, which contain the essential information of the signal, should be robust enough to reconstruct the speech signal. The location of the maximal value in each pooled receptive field of a max pooling layer is passed to the corresponding unpooling layer for reconstructing the audio clip. Residual learning is adopted to ease the training process. A dual training mechanism was developed to enable the decoder to reconstruct the speech signal from the deep representation more accurately: after the convolutional-deconvolutional encoder-decoder has been trained as a whole, the decoder is trained again with the transferred features. Experiments conducted on the Berlin EmoDB and SAVEE databases achieve excellent performance.
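The pooling-index mechanism mentioned in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names `max_pool_1d` and `max_unpool_1d`, the pool size of 2, the non-overlapping windows, and the sample values are all assumptions chosen for illustration. It shows only how the position of each maximum, recorded during pooling, lets an unpooling step place the value back at its original location.

```python
from typing import List, Tuple


def max_pool_1d(signal: List[float], size: int) -> Tuple[List[float], List[int]]:
    """Non-overlapping 1D max pooling that also records where each maximum was.

    Hypothetical helper for illustration; stride is assumed equal to `size`.
    """
    pooled: List[float] = []
    indices: List[int] = []
    for start in range(0, len(signal) - size + 1, size):
        window = signal[start:start + size]
        # Offset of the maximum inside the window (first one wins on ties).
        offset = max(range(size), key=lambda i: window[i])
        pooled.append(window[offset])
        indices.append(start + offset)
    return pooled, indices


def max_unpool_1d(pooled: List[float], indices: List[int], length: int) -> List[float]:
    """Place each pooled value back at its recorded position; fill the rest with zeros."""
    output = [0.0] * length
    for value, index in zip(pooled, indices):
        output[index] = value
    return output


# Assumed toy signal standing in for a short stretch of raw audio samples.
signal = [0.1, 0.9, -0.3, 0.2, 0.5, 0.4, -0.7, -0.2]
pooled, indices = max_pool_1d(signal, size=2)
restored = max_unpool_1d(pooled, indices, len(signal))
```

In this sketch the unpooled signal keeps every maximum at its original sample position and is zero elsewhere, so the decoder's later layers would still have to fill in the discarded samples.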

Keywords: deep representation, deep learning, speech signal

Published: 2021-08-30
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.6-6-2021.2307539