Multi-Modal Network based on Spatio-temporal and Attention for Emotion Recognition | IEEE Conference Publication