
Emotional Attention Detection and Correlation Exploration for Image Emotion Distribution Learning

Published: 01 January 2023

Abstract

Current work on image emotion distribution learning typically extracts visual representations from the holistic image or explores emotion-related regions from a global perspective. However, different regions of an image contribute differently to the arousal of each emotion. Existing methods neither deeply explore the emotion-aware regions corresponding to each individual emotion nor fully capture the relationship between these regions and the emotion labels. In this article, we propose a novel attention-based emotion distribution learning method that explores the emotion-related regions of an image from the perspective of each emotion category and learns the relationships among those regions. Specifically, we introduce a semantic-guided attention detection network that generates a class-wise attention map for each emotion and a global attention map for the holistic image. Meanwhile, an emotional graph-based network is adopted to capture the correlation between each region and the emotion distribution. Experiments on several benchmark datasets demonstrate the superiority of the proposed method over related work.
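As a rough illustration of the two components described in the abstract, class-wise attention detection and graph-based correlation modelling, the following is a minimal PyTorch sketch. The specific choices here (a 1x1 convolution per emotion to produce the attention maps, a single graph-convolution step with a learnable adjacency over emotion nodes, and a KL-divergence loss against the annotated distribution) are assumptions made for illustration only, not the authors' implementation.

```python
# Hypothetical sketch of class-wise attention + graph-based correlation
# for emotion distribution learning. Layer sizes and structure are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ClassWiseAttention(nn.Module):
    """Produces one spatial attention map per emotion plus a global map."""

    def __init__(self, in_channels: int, num_emotions: int):
        super().__init__()
        # One 1x1 conv channel per emotion -> class-wise attention maps.
        self.class_conv = nn.Conv2d(in_channels, num_emotions, kernel_size=1)
        # A single-channel map for the holistic image.
        self.global_conv = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        # feats: (B, C, H, W) backbone features, e.g. from a ResNet.
        class_maps = torch.sigmoid(self.class_conv(feats))   # (B, E, H, W)
        global_map = torch.sigmoid(self.global_conv(feats))  # (B, 1, H, W)
        # Pool the attended features into one vector per emotion region.
        attended = feats.unsqueeze(1) * class_maps.unsqueeze(2)  # (B, E, C, H, W)
        region_feats = attended.mean(dim=(-2, -1))                # (B, E, C)
        return region_feats, class_maps, global_map


class EmotionGraph(nn.Module):
    """One graph-convolution step over emotion-region nodes (X' = A X W)."""

    def __init__(self, num_emotions: int, feat_dim: int):
        super().__init__()
        # Learnable adjacency modelling correlations between emotion regions.
        self.adj = nn.Parameter(torch.eye(num_emotions))
        self.weight = nn.Linear(feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, 1)

    def forward(self, region_feats: torch.Tensor):
        # region_feats: (B, E, C); propagate information across emotion nodes.
        adj = torch.softmax(self.adj, dim=-1)
        mixed = torch.relu(self.weight(adj @ region_feats))  # (B, E, C)
        logits = self.classifier(mixed).squeeze(-1)          # (B, E)
        return torch.log_softmax(logits, dim=-1)             # log-distribution


if __name__ == "__main__":
    # Toy usage: fit an 8-emotion distribution with a KL-divergence loss,
    # the usual objective in label distribution learning.
    feats = torch.randn(2, 512, 14, 14)           # toy backbone features
    target = torch.softmax(torch.randn(2, 8), 1)  # toy emotion distribution
    attn, graph = ClassWiseAttention(512, 8), EmotionGraph(8, 512)
    region_feats, _, _ = attn(feats)
    log_pred = graph(region_feats)
    loss = F.kl_div(log_pred, target, reduction="batchmean")
    print(loss.item())
```

The sketch only illustrates the data flow: per-emotion attended region features feed a small graph module whose learnable adjacency lets correlated emotions share evidence before the distribution is predicted.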



Published In

IEEE Transactions on Affective Computing, Volume 14, Issue 1 (Jan.-March 2023), 863 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States


Qualifiers

  • Research-article
