research-article
Open access

Offline Evaluation of Set-Based Text-to-Image Generation

Published: 08 December 2024

Abstract

Text-to-Image (TTI) systems often support people during ideation, the early stage of a creative process in which exposure to a broad set of relevant or partially relevant images can help explore the design space. Since ideation is an important subclass of TTI tasks, understanding how to quantitatively evaluate TTI systems according to how well they support ideation is crucial to promoting research and development for these users. However, existing evaluation metrics for TTI remain focused on distributional similarity metrics such as Fréchet Inception Distance (FID). We take an alternative approach and, based on established methods from ranking evaluation, develop TTI evaluation metrics with explicit models of how users browse and interact with sets of spatially arranged generated images. Our proposed offline evaluation metrics for TTI not only capture how relevant generated images are with respect to the user's ideation need but also take into consideration the diversity and arrangement of the set of generated images. We analyze our proposed family of TTI metrics using human studies on image grids generated by three different TTI systems, with prompts drawn from subsets of widely used benchmarks such as MS-COCO captions and Localized Narratives, as well as prompts used in naturalistic settings. Our results demonstrate that grounding metrics in how people use systems is an important and understudied area of benchmark design.
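
To make the idea of a browsing-model metric over an image grid more concrete, the following is a minimal illustrative sketch and not the metric family proposed in the paper. It assumes, purely for illustration, that a user scans the grid in row-major order, continues to the next cell with a fixed probability (`patience`, in the spirit of rank-biased precision), and gains less from an image whose visual cluster has already been seen (`novelty_decay`). The function name, parameters, and cluster labels are all hypothetical.

```python
from typing import Dict, Sequence


def grid_browsing_utility(
    relevance: Sequence[Sequence[float]],
    cluster: Sequence[Sequence[int]],
    patience: float = 0.8,
    novelty_decay: float = 0.5,
) -> float:
    """Toy expected utility of an image grid (hypothetical, not the paper's metric).

    relevance[r][c]: graded relevance in [0, 1] of the image at row r, column c.
    cluster[r][c]:   assumed visual-cluster label of that image, used for diversity.
    """
    exposure = 1.0            # probability that the user inspects the current cell
    seen: Dict[int, int] = {}  # how often each cluster has appeared so far
    total = 0.0
    norm = 0.0
    # Assumed browsing order: left to right, top to bottom.
    for row_rel, row_clu in zip(relevance, cluster):
        for rel, clu in zip(row_rel, row_clu):
            # Repeated clusters contribute geometrically less gain.
            gain = rel * (novelty_decay ** seen.get(clu, 0))
            total += exposure * gain
            norm += exposure
            seen[clu] = seen.get(clu, 0) + 1
            exposure *= patience  # user continues with probability `patience`
    return total / norm if norm else 0.0


# Two 2x2 grids of fully relevant images: one diverse, one redundant.
diverse = grid_browsing_utility([[1, 1], [1, 1]], [[0, 1], [2, 3]])
redundant = grid_browsing_utility([[1, 1], [1, 1]], [[0, 0], [0, 0]])
```

Under these assumptions the diverse grid scores higher than the redundant one, and a single relevant image scores higher in the top-left cell than in the bottom-right cell, so the sketch reflects relevance, diversity, and arrangement in one number.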

Published In

SIGIR-AP 2024: Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region
December 2024
328 pages
ISBN:9798400707247
DOI:10.1145/3673791
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. offline evaluation
  2. text to image generation
  3. user modelling

Qualifiers

  • Research-article

Conference

SIGIR-AP 2024