Text-Guided Image Editing Based on Post Score for Gaining Attention on Social Media
Abstract
1. Introduction
2. Related Work
2.1. Social Media Marketing
2.2. Image–Text Matching
2.3. Text-Guided Image Editing
3. Proposed Text-Guided Image Editing Based on Post Score for Gaining Attention on Social Media
3.1. Calculation of Post Score Representing Engagement Rate on Social Media
3.1.1. Calculation of Post Score Using Multiple Features
3.1.2. Loss Function
3.2. Editing of Post Image Based on Post Score
4. Preliminary Validation
4.1. Validation Settings
- CM1: the model that does not use the text feature.
- CM2: the model that does not use the aesthetic feature.
- CM3: the model that does not use the category feature.
- CM4: the model that uses only the image feature, i.e., none of the text, aesthetic, or category features (see the sketch after this list).
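As a hedged illustration of these ablation settings, the following sketch shows a four-class post-score classifier whose text, aesthetic, and category features can be switched off independently. The fusion architecture, feature dimensions, and encoder choices are assumptions for illustration only and may differ from the authors' implementation.

```python
# Minimal sketch of the 4-class post-score classifier and its CM1-CM4
# ablations. Feature dimensions and the MLP fusion head are assumptions;
# the paper's exact design may differ.
import torch
import torch.nn as nn

class PostScoreClassifier(nn.Module):
    def __init__(self, use_text=True, use_aesthetic=True, use_category=True,
                 img_dim=512, txt_dim=512, aes_dim=1, cat_dim=32, num_classes=4):
        super().__init__()
        self.use_text = use_text
        self.use_aesthetic = use_aesthetic
        self.use_category = use_category
        # Fused dimension grows with each enabled auxiliary feature.
        fused = (img_dim + use_text * txt_dim
                 + use_aesthetic * aes_dim + use_category * cat_dim)
        self.head = nn.Sequential(
            nn.Linear(fused, 256), nn.ReLU(), nn.Linear(256, num_classes))

    def forward(self, f_img, f_txt=None, f_aes=None, f_cat=None):
        feats = [f_img]
        if self.use_text:
            feats.append(f_txt)
        if self.use_aesthetic:
            feats.append(f_aes)
        if self.use_category:
            feats.append(f_cat)
        # Logits over the four engagement-rate classes.
        return self.head(torch.cat(feats, dim=-1))

# Ablation variants corresponding to the list above:
cm1 = PostScoreClassifier(use_text=False)       # CM1: no text feature
cm2 = PostScoreClassifier(use_aesthetic=False)  # CM2: no aesthetic feature
cm3 = PostScoreClassifier(use_category=False)   # CM3: no category feature
cm4 = PostScoreClassifier(use_text=False, use_aesthetic=False,
                          use_category=False)   # CM4: image feature only
```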
4.2. Accuracy of Proposed Model to Predict Post Scores
5. Experiments
5.1. Experimental Settings
- Editing: the extent to which the edited image accurately reflects the text prompt.
- Response: the extent to which you would want to "like" or comment on a post containing the edited image if you encountered it on social media.
- Aesthetics: the extent to which the edited image is aesthetically pleasing, regardless of editing accuracy (a sketch of aggregating such ratings follows this list).
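As a minimal sketch of how such subjective ratings could be summarized per method and criterion, the snippet below averages per-participant scores. The 5-point scale and the toy data are assumptions; this excerpt does not report the raw ratings or the scale actually used.

```python
# Average subjective ratings per method and criterion.
# Toy data on an assumed 1-5 scale; not the study's actual results.
import numpy as np

criteria = ["Editing", "Response", "Aesthetics"]
# ratings[method] -> array of shape (num_participants, num_criteria)
ratings = {
    "Method A": np.array([[5, 4, 4], [4, 4, 5], [5, 5, 4]]),
    "Method B": np.array([[4, 3, 4], [3, 3, 4], [4, 4, 3]]),
}
for method, r in ratings.items():
    means = r.mean(axis=0)  # mean score per criterion across participants
    for name, m in zip(criteria, means):
        print(f"{method:>10} | {name:>10}: {m:.2f}")
```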
5.2. Accuracy of Proposed Method Compared to State-of-the-Art Methods
5.2.1. Quantitative Results
5.2.2. Qualitative Results
5.3. Discussion
5.3.1. Analysis of Model to Predict Post Scores
5.3.2. Limitations and Future Works
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Statista. Number of Instagram Users Worldwide from 2020 to 2025. 2023. Available online: https://www.statista.com/statistics/183585/instagram-number-of-global-users/ (accessed on 20 November 2023).
- Statista. Number of Social Media Users Worldwide from 2017 to 2027. 2023. Available online: https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/ (accessed on 20 November 2023).
- Pew Research Center. Photos and Videos as Social Currency Online. 2012. Available online: https://www.pewresearch.org/internet/2012/09/13/photos-and-videos-as-social-currency-online/ (accessed on 20 November 2023).
- Wu, X.; Xu, K.; Hall, P. A survey of image synthesis and editing with generative adversarial networks. Tsinghua Sci. Technol. 2017, 22, 660–674. [Google Scholar] [CrossRef]
- Zhan, F.; Yu, Y.; Wu, R.; Zhang, J.; Lu, S.; Liu, L.; Kortylewski, A.; Theobalt, C.; Xing, E. Multimodal image synthesis and editing: The generative AI era. arXiv 2021, arXiv:2112.13592. [Google Scholar] [CrossRef]
- Liu, M.; Wei, Y.; Wu, X.; Zuo, W.; Zhang, L. Survey on leveraging pre-trained generative adversarial networks for image editing and restoration. Sci. China Inf. Sci. 2023, 66, 1–28. [Google Scholar] [CrossRef]
- Iizuka, S.; Simo-Serra, E.; Ishikawa, H. Let there be color! joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM Trans. Graph. 2016, 35, 1–11. [Google Scholar] [CrossRef]
- Bertalmio, M.; Sapiro, G.; Caselles, V.; Ballester, C. Image inpainting. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), New Orleans, LA, USA, 23–28 July 2000; pp. 417–424. [Google Scholar]
- Madaan, A.; Setlur, A.; Parekh, T.; Poczos, B.; Neubig, G.; Yang, Y.; Salakhutdinov, R.; Black, A.W.; Prabhumoye, S. Politeness transfer: A tag and generate approach. arXiv 2020, arXiv:2004.14257. [Google Scholar]
- Li, B.; Qi, X.; Lukasiewicz, T.; Torr, P.H. ManiGAN: Text-guided image manipulation. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), Virtual, 14–19 June 2020; pp. 7880–7889. [Google Scholar]
- Li, B.; Qi, X.; Torr, P.H.; Lukasiewicz, T. Lightweight generative adversarial networks for text-guided image manipulation. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–12 December 2020; pp. 22020–22031. [Google Scholar]
- Xia, W.; Yang, Y.; Xue, J.H.; Wu, B. TediGAN: Text-guided diverse face image generation and manipulation. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), Virtual, 19–25 June 2021; pp. 2256–2265. [Google Scholar]
- Patashnik, O.; Wu, Z.; Shechtman, E.; Cohen-Or, D.; Lischinski, D. StyleCLIP: Text-driven manipulation of StyleGAN imagery. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Virtual, 11–17 October 2021; pp. 2085–2094. [Google Scholar]
- Choi, J.; Choi, Y.; Kim, Y.; Kim, J.; Yoon, S. Custom-Edit: Text-guided image editing with customized diffusion models. arXiv 2023, arXiv:2305.15779. [Google Scholar]
- Brooks, T.; Holynski, A.; Efros, A.A. InstructPix2Pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 18392–18402. [Google Scholar]
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 4401–4410. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–12 December 2020; pp. 6840–6851. [Google Scholar]
- Kim, S.; Jiang, J.Y.; Nakada, M.; Han, J.; Wang, W. Multimodal post attentive profiling for influencer marketing. In Proceedings of the Web Conference (WWW), Virtual, 20–24 April 2020; pp. 2878–2884. [Google Scholar]
- Rameez, R.; Rahmani, H.A.; Yilmaz, E. ViralBERT: A user focused BERT-based approach to virality prediction. In Proceedings of the 30th ACM Conference on User Modeling, Adaptation and Personalization (UMAP), Barcelona, Spain, 4–7 July 2022; pp. 85–89. [Google Scholar]
- Ren, S.; Karimi, S.; Velázquez, A.B.; Cai, J. Endorsement effectiveness of different social media influencers: The moderating effect of brand competence and warmth. J. Bus. Res. 2023, 156, 113476. [Google Scholar] [CrossRef]
- Kim, S.; Jiang, J.Y.; Han, J.; Wang, W. InfluencerRank: Discovering effective influencers via graph convolutional attentive recurrent neural networks. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM), Limassol, Cyprus, 5–8 June 2023; Volume 17, pp. 482–493. [Google Scholar]
- Rahman, W.N.A.; Mutum, D.S.; Ghazali, E.M. Consumer engagement with visual content on Instagram: Impact of different features of posts by prominent brands. Int. J. E-Serv. Mob. Appl. 2022, 14, 1–21. [Google Scholar] [CrossRef]
- Thömmes, K. The Aesthetic Appeal of Photographs: Leveraging Instagram Data in Empirical Aesthetics. Ph.D. Thesis, Universitaet Konstanz, Konstanz, Germany, 2020. [Google Scholar]
- Felix, R.; Rauschnabel, P.A.; Hinsch, C. Elements of strategic social media marketing: A holistic framework. J. Bus. Res. 2017, 70, 118–126. [Google Scholar] [CrossRef]
- Liu, S.; Jiang, C.; Lin, Z.; Ding, Y.; Duan, R.; Xu, Z. Identifying effective influencers based on trust for electronic word-of-mouth marketing: A domain-aware approach. Inf. Sci. 2015, 306, 34–52. [Google Scholar] [CrossRef]
- Karnowski, V.; Kümpel, A.S.; Leonhard, L.; Leiner, D.J. From incidental news exposure to news engagement. How perceptions of the news post and news usage patterns influence engagement with news articles encountered on Facebook. Comput. Hum. Behav. 2017, 76, 42–50. [Google Scholar] [CrossRef]
- Borges-Tiago, M.T.; Tiago, F.; Cosme, C. Exploring users’ motivations to participate in viral communication on social media. J. Bus. Res. 2019, 101, 574–582. [Google Scholar] [CrossRef]
- Li, J.; Li, D.; Xiong, C.; Hoi, S. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the International Conference on Machine Learning (ICML), Baltimore, MD, USA, 17–23 July 2022; pp. 12888–12900. [Google Scholar]
- Saharia, C.; Chan, W.; Saxena, S.; Li, L.; Whang, J.; Denton, E.L.; Ghasemipour, K.; Gontijo Lopes, R.; Karagol Ayan, B.; Salimans, T.; et al. Photorealistic text-to-image diffusion models with deep language understanding. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022; pp. 36479–36494. [Google Scholar]
- Alikhani, M.; Han, F.; Ravi, H.; Kapadia, M.; Pavlovic, V.; Stone, M. Cross-modal coherence for text-to-image retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Virtual, 22 February–1 March 2022; pp. 10427–10435. [Google Scholar]
- Liu, Y.; Li, G.; Lin, L. Cross-modal causal relational reasoning for event-level visual question answering. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 11624–11641. [Google Scholar] [CrossRef] [PubMed]
- Liu, S.A.; Zhang, Y.; Qiu, Z.; Xie, H.; Zhang, Y.; Yao, T. CARIS: Context-aware referring image segmentation. In Proceedings of the ACM International Conference on Multimedia (ACM MM), Ottawa, ON, Canada, 29 October–3 November 2023; pp. 779–788. [Google Scholar]
- Kiros, R.; Salakhutdinov, R.; Zemel, R.S. Unifying visual-semantic embeddings with multimodal neural language models. arXiv 2014, arXiv:1411.2539. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning (ICML), Virtual, 18–24 July 2021; pp. 8748–8763. [Google Scholar]
- Shen, S.; Li, L.H.; Tan, H.; Bansal, M.; Rohrbach, A.; Chang, K.W.; Yao, Z.; Keutzer, K. How much can CLIP benefit vision-and-language tasks? arXiv 2021, arXiv:2107.06383. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
- Dong, H.; Yu, S.; Wu, C.; Guo, Y. Semantic image synthesis via adversarial learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5706–5714. [Google Scholar]
- Nam, S.; Kim, Y.; Kim, S.J. Text-adaptive generative adversarial networks: Manipulating images with natural language. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 42–51. [Google Scholar]
- Watanabe, Y.; Togo, R.; Maeda, K.; Ogawa, T.; Haseyama, M. Generative adversarial network including referring image segmentation for text-guided image manipulation. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Singapore, 22–27 May 2022; pp. 4818–4822. [Google Scholar]
- Watanabe, Y.; Togo, R.; Maeda, K.; Ogawa, T.; Haseyama, M. Text-guided image manipulation via generative adversarial network with referring image segmentation-based guidance. IEEE Access 2023, 11, 42534–42545. [Google Scholar] [CrossRef]
- Kocasari, U.; Dirik, A.; Tiftikci, M.; Yanardag, P. StyleMC: Multi-channel based fast text-guided image generation and manipulation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 4–8 January 2022; pp. 895–904. [Google Scholar]
- Shi, Y.; Yang, X.; Wan, Y.; Shen, X. SemanticStyleGAN: Learning compositional generative priors for controllable image synthesis and editing. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 11254–11264. [Google Scholar]
- Sun, J.; Deng, Q.; Li, Q.; Sun, M.; Ren, M.; Sun, Z. AnyFace: Free-style text-to-face synthesis and manipulation. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 18687–18696. [Google Scholar]
- Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; Aila, T. Training generative adversarial networks with limited data. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–12 December 2020; pp. 12104–12114. [Google Scholar]
- Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), Virtual, 14–19 June 2020; pp. 8110–8119. [Google Scholar]
- Nilsback, M.E.; Zisserman, A. Automated flower classification over a large number of classes. In Proceedings of the 6th Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Bhubaneswar, India, 16–19 December 2008; pp. 722–729. [Google Scholar]
- Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset; California Institute of Technology: Pasadena, CA, USA, 2011. [Google Scholar]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
- Richardson, E.; Alaluf, Y.; Patashnik, O.; Nitzan, Y.; Azar, Y.; Shapiro, S.; Cohen-Or, D. Encoding in style: A StyleGAN encoder for image-to-image translation. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), Virtual, 19–25 June 2021; pp. 2287–2296. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–12 December 2020; pp. 1877–1901. [Google Scholar]
- Talebi, H.; Milanfar, P. NIMA: Neural image assessment. IEEE Trans. Image Process. 2018, 27, 3998–4011. [Google Scholar] [CrossRef]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Murray, N.; Marchesotti, L.; Perronnin, F. AVA: A large-scale database for aesthetic visual analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 2408–2415. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Li, B.; Zhang, Y.; Chen, L.; Wang, J.; Pu, F.; Yang, J.; Li, C.; Liu, Z. MIMIC-IT: Multi-modal in-context instruction tuning. arXiv 2023, arXiv:2306.05425. [Google Scholar]
- Kwon, G.; Ye, J.C. CLIPstyler: Image style transfer with a single text condition. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 18062–18071. [Google Scholar]
- Couairon, G.; Verbeek, J.; Schwenk, H.; Cord, M. DiffEdit: Diffusion-based semantic image editing with mask guidance. arXiv 2022, arXiv:2210.11427. [Google Scholar]
- StudioBinder. What Is Magic Hour Photography & Cinematography Explained. 2021. Available online: https://www.studiobinder.com/blog/what-is-magic-hour-photography/ (accessed on 20 November 2023).
| | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|
| Number of data | 10,000 | 10,000 | 10,000 | 10,000 |
| Train:validation:test | 80%:10%:10% | 80%:10%:10% | 80%:10%:10% | 80%:10%:10% |
| Engagement rate | 0.000–0.025 | 0.025–0.050 | 0.050–0.075 | 0.075–0.100 |
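The class boundaries in the table suggest a simple binning of engagement rate into four equal-width ranges. Below is a minimal sketch of that mapping, assuming the upper bound of each range is exclusive; the paper excerpt does not state how boundary values are assigned.

```python
# Map an engagement rate to one of the four classes in the table above.
# Exclusive upper bounds are an assumption for illustration.
import numpy as np

def engagement_class(rate: float) -> int:
    """Map an engagement rate in [0.0, 0.1) to class 1-4 per the table."""
    bins = [0.025, 0.050, 0.075, 0.100]  # upper edges of the four ranges
    for cls, upper in enumerate(bins, start=1):
        if rate < upper:
            return cls
    raise ValueError(f"engagement rate {rate} is outside the studied range")

rates = np.array([0.012, 0.030, 0.074, 0.099])
print([engagement_class(r) for r in rates])  # -> [1, 2, 3, 4]
```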
| | Accuracy | F-Measure |
|---|---|---|
| CM1 | 0.380 | 0.381 |
| CM2 | 0.396 | 0.396 |
| CM3 | 0.393 | 0.393 |
| CM4 | 0.379 | 0.379 |
| Proposed model | 0.409 | 0.409 |
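A minimal sketch of how the Accuracy and F-Measure columns could be computed for the four-class task is shown below. The macro-averaged F-measure and the toy labels are assumptions, since the excerpt does not state the averaging mode.

```python
# Compute 4-class accuracy and F-measure with scikit-learn.
# Toy labels only; macro averaging is an assumed choice.
from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 2, 3, 4, 1, 2]  # ground-truth engagement classes (toy data)
y_pred = [1, 2, 3, 3, 1, 4]  # model predictions (toy data)

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="macro")
print(f"Accuracy: {acc:.3f}, F-Measure: {f1:.3f}")
```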
| | Editing | Response | Aesthetics |
|---|---|---|---|
| CLIPstyler | | | |
| DiffEdit | | | |
| InstructPix2Pix | | | |
| Proposed method | | | |