
AI content detection in the emerging information ecosystem: new obligations for media and tech companies

Published: 21 September 2024

Abstract

The world is about to be swamped by an unprecedented wave of AI-generated content. We need reliable ways of identifying such content, to supplement the many existing social institutions that enable trust between people and organisations and ensure social resilience. In this paper, we begin by highlighting an important new development: providers of AI content generators have new obligations to support the creation of reliable detectors for the content they generate. These new obligations arise mainly from the EU’s newly finalised AI Act, but they are reinforced by the US President’s recent Executive Order on AI, and by several considerations of self-interest. These new steps towards reliable detection mechanisms are by no means a panacea, but we argue they will usher in a new adversarial landscape, in which reliable methods for identifying AI-generated content are commonly available. In this landscape, many new questions arise for policymakers. Firstly, if reliable AI-content detection mechanisms are available, who should be required to use them, and how? We argue that new duties arise for media companies and for Web search companies in the deployment of AI-content detectors. Secondly, what broader regulation of the tech ecosystem will maximise the likelihood of reliable AI-content detectors? We argue for a range of new duties, relating to provenance-authentication protocols, open-source AI generators, and support for research and enforcement. Along the way, we consider how the production of AI-generated content relates to ‘free expression’, and discuss the important case of content that is generated jointly by humans and AIs.



Published In

Ethics and Information Technology, Volume 26, Issue 4, Dec 2024, 82 pages

Publisher

Kluwer Academic Publishers, United States

Publication History

Published: 21 September 2024
Accepted: 05 August 2024

Author Tags

1. Generative AI
2. AI-generated content
3. AI regulation

Qualifiers

• Research-article

Funding Sources

• Montreal International Center of Expertise in AI (CEIMIA)
