Proceedings of the 8th EAI International Conference on Green Energy and Networking, GreeNets 2021, June 6-7, 2021, Dalian, People’s Republic of China

Research Article

Learning Deep Representation of The Emotion Speech Signal

  • @INPROCEEDINGS{10.4108/eai.6-6-2021.2307539,
        author={Junyi Duan and Zheng Song and Jianfeng Zhao},
        title={Learning Deep Representation of The Emotion Speech Signal},
        proceedings={Proceedings of the 8th EAI International Conference on Green Energy and Networking, GreeNets 2021, June 6-7, 2021, Dalian, People’s Republic of China},
        publisher={EAI},
        proceedings_a={GREENETS},
        year={2021},
        month={8},
        keywords={deep representation, deep learning, speech signal},
        doi={10.4108/eai.6-6-2021.2307539}
    }
    
Junyi Duan1, Zheng Song2, Jianfeng Zhao3,*
  • 1: Inner Mongolia Branch of China Tower Corporation Limited
  • 2: School of Electronic Information Engineering, Inner Mongolia University
  • 3: Hangzhou Innovation Institute, Beihang University
*Contact email: nmgzjf@outlook.com

Abstract

This paper aims at learning a deep representation of the emotion speech signal directly from the raw audio clip using a 1D convolutional encoder, and at reconstructing the audio signal using a 1D deconvolutional decoder. The learned deep features, which contain the essential information of the signal, should be robust enough to reconstruct the speech signal. The location of the maximal value in the pooled receptive field of each max pooling layer is passed to the corresponding unpooling layer for reconstructing the audio clip. Residual learning is adopted to ease the training process. A dual training mechanism was developed to enable the decoder to reconstruct the speech signal from the deep representation more accurately: after the convolutional-deconvolutional encoder-decoder was first trained as a whole, the decoder was trained again on the transferred features. Experiments conducted on the Berlin EmoDB and the SAVEE database achieve excellent performance.
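
The following is a minimal PyTorch sketch of the architecture and training scheme outlined in the abstract. Layer widths, kernel sizes, the optimizer, and the loss are assumptions made for illustration; only the overall structure (1D convolutional encoder, max pooling whose indices are passed to the matching unpooling layer, 1D deconvolutional decoder with residual connections, and two-phase training) follows the description above.

import torch
import torch.nn as nn


class ConvEncoder(nn.Module):
    """1D convolutional encoder; pooling indices are returned for later unpooling."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv1d(1, channels, kernel_size=9, padding=4)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=9, padding=4)
        self.pool = nn.MaxPool1d(kernel_size=2, return_indices=True)
        self.act = nn.ReLU()

    def forward(self, x):
        h = self.act(self.conv1(x))
        h = h + self.act(self.conv2(h))        # residual connection to ease training
        z, idx = self.pool(h)                  # keep the locations of the maxima
        return z, idx


class DeconvDecoder(nn.Module):
    """1D deconvolutional decoder; reuses the max-pooling indices from the encoder."""
    def __init__(self, channels=64):
        super().__init__()
        self.unpool = nn.MaxUnpool1d(kernel_size=2)
        self.deconv1 = nn.ConvTranspose1d(channels, channels, kernel_size=9, padding=4)
        self.deconv2 = nn.ConvTranspose1d(channels, 1, kernel_size=9, padding=4)
        self.act = nn.ReLU()

    def forward(self, z, idx):
        h = self.unpool(z, idx)                # restore values at the recorded max locations
        h = h + self.act(self.deconv1(h))      # residual connection
        return self.deconv2(h)


def train(encoder, decoder, loader, epochs=10, lr=1e-3):
    """Dual training: joint reconstruction training, then the decoder alone is
    retrained against the frozen encoder's (transferred) features."""
    loss_fn = nn.MSELoss()

    # Phase 1: train encoder and decoder jointly on reconstruction.
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):
        for x in loader:                       # x: (batch, 1, samples) raw audio clips
            z, idx = encoder(x)
            loss = loss_fn(decoder(z, idx), x)
            opt.zero_grad(); loss.backward(); opt.step()

    # Phase 2: freeze the encoder and train the decoder again from its features.
    for p in encoder.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(decoder.parameters(), lr=lr)
    for _ in range(epochs):
        for x in loader:
            with torch.no_grad():
                z, idx = encoder(x)
            loss = loss_fn(decoder(z, idx), x)
            opt.zero_grad(); loss.backward(); opt.step()

In this sketch the learned representation z is the encoder's pooled feature map, and the second training phase mirrors the abstract's step of retraining the decoder on the features transferred from the already-trained encoder.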