short-paper

Coherent Topic Modeling for Creative Multimodal Data on Social Media

Authors:

Usman Naseem

WWW '23: Proceedings of the ACM Web Conference 2023

Pages 3923 - 3927

https://doi.org/10.1145/3543507.3587433

Published: 30 April 2023

Abstract

The creative web combines different types of media to create unique and engaging online experiences. Multimodal data, such as text and images, is a key component of the creative web. Social media posts that pair text descriptions with images offer a wealth of information and context. The text in such a post typically relates to one topic, while the image often conveys information about multiple topics because of the richness of visual content. Many existing multimodal topic models do not take these characteristics into account and therefore generate poor-quality topics. We propose Coherent Topic Modeling for Multimodal Data (CTM-MM), which accounts for the fact that text in social media posts typically relates to one topic while images can contain information about multiple topics. Our experimental results show that CTM-MM outperforms traditional multimodal topic models in terms of classification and topic coherence.
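The modeling assumption stated in the abstract (one topic for a post's text, several topics for its image) can be illustrated with a toy generative sketch. This is not the CTM-MM model itself, whose details are not given here. The function name `sample_post`, the toy topic tables, the use of "visual words" for image content, and the Dirichlet prior with parameter `alpha` are all assumptions made only for illustration.

```python
import random

# Hypothetical toy topics: each topic is a distribution over text words
# and a separate distribution over "visual words" (assumed image tokens).
TEXT_TOPICS = [
    {"goal": 0.5, "match": 0.3, "team": 0.2},      # topic 0
    {"recipe": 0.6, "pasta": 0.3, "dinner": 0.1},  # topic 1
]
IMAGE_TOPICS = [
    {"grass": 0.5, "ball": 0.5},   # topic 0
    {"plate": 0.7, "table": 0.3},  # topic 1
]


def sample_post(text_topics, image_topics, n_words, n_visual_words,
                alpha=0.5, seed=0):
    """Sample one post: a single topic for the text, a topic mixture for the image."""
    rng = random.Random(seed)
    k = len(text_topics)

    # Text side: one topic is drawn for the whole text, and every word
    # of the post is generated from that same topic.
    z_text = rng.randrange(k)
    dist = text_topics[z_text]
    words = rng.choices(list(dist), weights=list(dist.values()), k=n_words)

    # Image side: draw a topic mixture from a symmetric Dirichlet
    # (implemented as normalized Gamma draws), then assign a topic to
    # each visual word separately, so one image can mix several topics.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(gammas)
    theta = [g / total for g in gammas]

    z_image, visual_words = [], []
    for _ in range(n_visual_words):
        z = rng.choices(range(k), weights=theta, k=1)[0]
        vdist = image_topics[z]
        z_image.append(z)
        visual_words.append(
            rng.choices(list(vdist), weights=list(vdist.values()), k=1)[0]
        )
    return z_text, words, theta, z_image, visual_words
```

Calling `sample_post(TEXT_TOPICS, IMAGE_TOPICS, n_words=5, n_visual_words=8)` returns the text topic, the sampled words, the image's topic mixture, and the per-visual-word topic assignments. The contrast to note is structural: the text carries a single topic indicator, while the image carries one indicator per visual word.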

Index Terms

Computing methodologies → Artificial intelligence → Natural language processing → Information extraction

Published In

WWW '23: Proceedings of the ACM Web Conference 2023

April 2023

4293 pages

ISBN: 9781450394161

DOI: 10.1145/3543507

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Funding Sources

National Research Foundation of Korea

Conference

WWW '23

Sponsor:

SIGWEB

WWW '23: The ACM Web Conference 2023

April 30 - May 4, 2023

Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
