Learning Deep Representation of The Emotion Speech Signal

Duan, Junyi and Song, Zheng and Zhao, Jianfeng (2021) Learning Deep Representation of The Emotion Speech Signal. In: GREENETS 2021, 6-7 June 2021, Dalian, People’s Republic of China.

Text (PDF)
eai.6-6-2021.2307539.pdf - Published Version



This paper aims to learn a deep representation of emotional speech signals directly from raw audio clips using a 1D convolutional encoder, and to reconstruct the audio signal using a 1D deconvolutional decoder. The learned deep features, which contain the essential information of the signal, should be robust enough to reconstruct the speech signal. The location of the maximal value in each pooled receptive field of a max pooling layer is passed to the corresponding unpooling layer for reconstructing the audio clip. Residual learning is adopted to ease the training process. A dual training mechanism was developed to enable the decoder to reconstruct the speech signal from the deep representation more accurately: after the convolutional-deconvolutional encoder-decoder has been trained as a whole, the decoder is trained again with the transferred features. Experiments conducted on the Berlin EmoDB and SAVEE databases achieve excellent performance.
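
The passing of max-pooling locations to the unpooling layer, as described in the abstract, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `max_pool_1d` and `max_unpool_1d`, the non-overlapping window of size 2, the toy signal, and the choice to fill non-maximal positions with zeros are all assumptions made for illustration, since the abstract does not state layer sizes or fill behavior.

```python
def max_pool_1d(signal, window):
    """Non-overlapping 1D max pooling that also records where each maximum was.

    Hypothetical helper: the window size and stride are assumptions.
    """
    pooled, indices = [], []
    for start in range(0, len(signal) - window + 1, window):
        segment = signal[start:start + window]
        # Offset of the maximal value inside this receptive field.
        offset = max(range(window), key=lambda i: segment[i])
        pooled.append(segment[offset])
        indices.append(start + offset)
    return pooled, indices


def max_unpool_1d(pooled, indices, length):
    """Put each pooled value back at its recorded location.

    Positions that were not a maximum are filled with zeros (an assumption).
    """
    restored = [0.0] * length
    for value, index in zip(pooled, indices):
        restored[index] = value
    return restored


# Toy "audio clip" of eight samples (made-up values).
signal = [0.1, 0.7, -0.3, 0.2, 0.9, 0.4, -0.5, -0.1]
pooled, indices = max_pool_1d(signal, window=2)
restored = max_unpool_1d(pooled, indices, len(signal))
```

In this sketch the decoder side recovers the position of every retained peak exactly, which is the information that plain upsampling would lose; the discarded samples still have to be recovered by the learned deconvolutional layers.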

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: deep representation, deep learning, speech signal
Subjects: T Technology > T Technology (General)
Depositing User: EAI Editor IV
Date Deposited: 10 Sep 2021 11:00
Last Modified: 10 Sep 2021 11:00
URI: https://eprints.eudl.eu/id/eprint/6785
