Proposing Multimodal Integration Model Using LSTM and Autoencoder

Noguchi, Wataru and Iizuka, Hiroyuki and Yamamoto, Masahito (2016) Proposing Multimodal Integration Model Using LSTM and Autoencoder. EAI Endorsed Transactions on Security and Safety, 3 (10). e1. ISSN 2032-9393

Available under License Creative Commons Attribution No Derivatives.



We propose a neural network architecture that can learn and integrate sequential multimodal information using Long Short-Term Memory (LSTM). Our model consists of encoder and decoder LSTMs and a multimodal autoencoder. To integrate sequential multimodal information, the encoder LSTM first encodes a sequential input into a fixed-length feature vector for each modality. The multimodal autoencoder then integrates the feature vectors from each modality and generates a fused feature vector that contains the sequential multimodal information in a mixed form. The original feature vectors of each modality are regenerated from the fused feature vector in the multimodal autoencoder, and the decoder LSTM decodes the sequential inputs from the regenerated feature vectors. Our model is trained on visual and motion sequences of humans and is tested with recall tasks. The experimental results show that our model can learn and remember the sequential multimodal inputs and, using the integrated multimodal information, reduce the ambiguity generated at the learning stage of the LSTMs. Our model can also recall the visual sequences from only the motion sequences and vice versa.
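The data flow described in the abstract (per-modality encoder LSTM → fused feature vector via a multimodal autoencoder → per-modality reconstruction for the decoder LSTMs) can be sketched as follows. This is a minimal, untrained NumPy illustration with toy dimensions and random weights, not the authors' implementation; all layer sizes, weight initialisations, and names (`LSTMCell`, `encode`, `W_enc`, `W_dec`) are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell with toy random weights (illustrative only)."""
    def __init__(self, in_dim, hid_dim):
        self.hid = hid_dim
        # one stacked weight matrix for the 4 gates: input, forget, cell, output
        self.W = rng.standard_normal((4 * hid_dim, in_dim + hid_dim)) * 0.1
        self.b = np.zeros(4 * hid_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)     # new cell state
        h = o * np.tanh(c)             # new hidden state
        return h, c

def encode(cell, seq):
    """Run the encoder LSTM over a sequence; the final hidden state
    serves as the fixed-length feature vector for that modality."""
    h, c = np.zeros(cell.hid), np.zeros(cell.hid)
    for x in seq:
        h, c = cell.step(x, h, c)
    return h

# toy dimensions: 5 time steps, vision frames of dim 8, motion frames of dim 4
T, vis_dim, mot_dim, hid = 5, 8, 4, 6
vision_seq = rng.standard_normal((T, vis_dim))
motion_seq = rng.standard_normal((T, mot_dim))

# 1) per-modality encoding to fixed-length feature vectors
z_vis = encode(LSTMCell(vis_dim, hid), vision_seq)
z_mot = encode(LSTMCell(mot_dim, hid), motion_seq)

# 2) multimodal autoencoder: fuse the concatenated features into one code,
#    then reconstruct both modality feature vectors from the fused code
concat = np.concatenate([z_vis, z_mot])          # shape (2*hid,)
W_enc = rng.standard_normal((hid, 2 * hid)) * 0.1
W_dec = rng.standard_normal((2 * hid, hid)) * 0.1
fused = np.tanh(W_enc @ concat)                  # fused multimodal feature
recon = W_dec @ fused                            # reconstructed concatenation
z_vis_rec, z_mot_rec = recon[:hid], recon[hid:]

# 3) decoder LSTMs would then unroll from z_vis_rec / z_mot_rec to regenerate
#    the original sequences; training minimises the reconstruction losses
print(fused.shape, z_vis_rec.shape, z_mot_rec.shape)
```

Because both modality features pass through the shared fused code, the reconstruction path also works when one modality's feature vector is zeroed out at recall time, which is the mechanism behind cross-modal recall (visual from motion, and vice versa) described in the abstract.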

Item Type: Article
Uncontrolled Keywords: multimodal integration, deep learning, autoencoder, long short term memory
Subjects: H Social Sciences > H Social Sciences (General)
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User: EAI Editor IV
Date Deposited: 26 Mar 2021 13:52
Last Modified: 26 Mar 2021 13:52
