DOI: 10.1145/3543507.3587433
short-paper

Coherent Topic Modeling for Creative Multimodal Data on Social Media

Published: 30 April 2023

Abstract

The creative web combines different types of media to create a unique and engaging online experience. Multimodal data, such as text and images, is a key component of the creative web. Social media posts that combine text descriptions with images offer a wealth of information and context. Text in a social media post typically relates to one topic, while images often convey information about multiple topics because of the richness of visual content. Many existing multimodal topic models do not take this difference into account and generate poor-quality topics as a result. We therefore propose Coherent Topic Modeling for Multimodal Data (CTM-MM), which takes into account that the text of a social media post typically relates to one topic, while its images can contain information about multiple topics. Our experimental results show that CTM-MM outperforms traditional multimodal topic models in terms of classification and topic coherence.
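The modeling assumption stated in the abstract can be illustrated with a small generative sketch. This is not the CTM-MM model itself, whose details the abstract does not give. It is a toy sampler, under the assumption that a post has a topic distribution `theta`, that the whole caption is drawn from one topic, and that each visual word of the image draws its own topic. The function name `sample_post`, the toy vocabularies, and the fixed `theta` are all hypothetical.

```python
import random


def sample_post(theta, text_topics, image_topics, n_words, n_visual, rng):
    """Sample one synthetic post (illustrative only, not the authors' model)."""
    topic_ids = list(range(len(theta)))

    # Text: draw ONE topic for the whole caption, then every word from it.
    z_text = rng.choices(topic_ids, weights=theta, k=1)[0]
    vocab, probs = zip(*text_topics[z_text].items())
    words = rng.choices(vocab, weights=probs, k=n_words)

    # Image: draw a fresh topic for EACH visual word, so one image
    # can mix several topics.
    z_image, visual_words = [], []
    for _ in range(n_visual):
        z = rng.choices(topic_ids, weights=theta, k=1)[0]
        v_vocab, v_probs = zip(*image_topics[z].items())
        z_image.append(z)
        visual_words.append(rng.choices(v_vocab, weights=v_probs, k=1)[0])

    return z_text, words, z_image, visual_words


# Hypothetical toy topics with disjoint vocabularies.
TEXT_TOPICS = [{"beach": 0.5, "sun": 0.5}, {"pizza": 0.5, "pasta": 0.5}]
IMAGE_TOPICS = [{"v_sand": 0.5, "v_sea": 0.5}, {"v_plate": 0.5, "v_cheese": 0.5}]
THETA = [0.5, 0.5]  # assumed fixed here; a full model would infer it per post

if __name__ == "__main__":
    rng = random.Random(0)
    z_text, words, z_image, visual = sample_post(
        THETA, TEXT_TOPICS, IMAGE_TOPICS, n_words=5, n_visual=8, rng=rng
    )
    print("text topic:", z_text, "words:", words)
    print("image topics:", z_image, "visual words:", visual)
```

In this sketch every caption word comes from a single topic's vocabulary, while the visual words of one image may span both topics. A model that instead assigns a topic to each caption word would allow a short caption to be split across several topics.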


    Published In

    WWW '23: Proceedings of the ACM Web Conference 2023
    April 2023
    4293 pages
    ISBN:9781450394161
    DOI:10.1145/3543507
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Coherence
    2. Creative Web
    3. Multimodal
    4. Topic Modeling

    Qualifiers

    • Short-paper
    • Research
    • Refereed limited

    Conference

    WWW '23
    WWW '23: The ACM Web Conference 2023
    April 30 - May 4, 2023
    Austin, TX, USA

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
