From prompt injections to model theft, OWASP has identified the most prevalent and impactful vulnerabilities found in AI applications based on large language models (LLMs).

The Open Worldwide Application Security Project (OWASP) lists the top 10 most critical vulnerabilities often seen in large language model (LLM) applications. Prompt injections, poisoned training data, data leaks, and overreliance on LLM-generated content are still on the list, while newly added threats include model denial of service, supply chain vulnerabilities, model theft, and excessive agency.

The list aims to educate developers, designers, architects, managers, and organizations about the potential security risks when deploying and managing LLMs, raising awareness of vulnerabilities, suggesting remediation strategies, and improving the security posture of LLM applications.

“Organizations considering deploying generative AI technologies need to consider the risks associated with it,” says Rob T. Lee, chief of research and head of faculty at SANS Institute. “The OWASP top 10 does a decent job at walking through the current possibilities where LLMs could be vulnerable or exploited.”

The top 10 list is a good place to start the conversation about LLM vulnerabilities and how to secure these AIs, he adds. “We are just beginning to examine the ways to set up proper controls, configurations, and deployment guidelines that should be followed to best protect data from a privacy and security mindset. The OWASP Top 10 is a great start, but this conversation is far from over.”

Here are the top 10 most critical vulnerabilities affecting LLM applications, according to OWASP.

1. Prompt injections

Prompt injection occurs when an attacker manipulates a large language model through crafted inputs, causing the LLM to unknowingly execute the attacker’s intentions. This can be done directly by “jailbreaking” the system prompt or indirectly through manipulated external inputs, potentially leading to data exfiltration, social engineering, and other issues. The results of a successful prompt injection attack can vary greatly, from solicitation of sensitive information to influencing critical decision-making processes under the guise of normal operation, OWASP says.

For example, a user can write a clever prompt that forces a company chatbot to reveal proprietary information the user doesn’t normally have access to, or upload a resume into an automated system with instructions buried inside the resume that tell the system to recommend the candidate.

Preventative measures for this vulnerability include:

- Enforce privilege control on LLM access to backend systems. Provide the LLM with its own API tokens for extensible functionality and follow the principle of least privilege by restricting the LLM to only the minimum level of access necessary for its intended operations.
- Add a human in the loop for the most sensitive operations, requiring an extra approval step to reduce the opportunity for unauthorized actions (see the sketch after this list).
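As an illustration of those last two measures, here is a minimal Python sketch of least-privilege tool dispatch with a human-in-the-loop gate, assuming the application receives structured action requests from the model. The tool names, their stub implementations, and the console-based approve() prompt are hypothetical placeholders rather than part of any particular LLM framework.

```python
# Minimal sketch: least-privilege tool dispatch plus a human approval gate.
# The tools below are hypothetical stubs; a real system would wire them to
# its own backends and give the LLM scoped API tokens for each one.

SENSITIVE = {"send_email"}  # actions that always require human sign-off

TOOLS = {  # hard allowlist: the only actions the LLM is permitted to request
    "read_calendar": lambda args: f"calendar entries for {args['date']}",
    "send_email": lambda args: f"email sent to {args['to']}",
}

def approve(action: str, args: dict) -> bool:
    """Ask a human reviewer to confirm a sensitive, LLM-initiated action."""
    answer = input(f"Approve {action} with {args}? [y/N] ")
    return answer.strip().lower() == "y"

def dispatch(action: str, args: dict) -> str:
    """Run an action requested by the model, enforcing least privilege."""
    if action not in TOOLS:
        raise PermissionError(f"LLM requested a disallowed action: {action}")
    if action in SENSITIVE and not approve(action, args):
        return "Action rejected by human reviewer."
    return TOOLS[action](args)

# Example: a structured action the model might emit after parsing a user request
print(dispatch("read_calendar", {"date": "2024-11-08"}))
```

The point of the design is that the model never holds broader credentials than the single, explicitly allowed action being executed, and a person stays in the path for anything sensitive.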
2. Insecure output handling

Insecure output handling refers specifically to insufficient validation, sanitization, and handling of the outputs generated by large language models before they are passed downstream to other components and systems. Since LLM-generated content can be controlled by prompt input, this behavior is similar to providing users indirect access to additional functionality.

For example, if the LLM’s output is sent directly into a system shell or similar function, it can result in remote code execution. And if the LLM generates JavaScript or markdown code and sends it to a user’s browser, the browser can run the code, resulting in a cross-site scripting attack.

Preventative measures for this vulnerability include:

- Treat the model like any other user, adopting a zero-trust approach, and apply proper input validation on responses coming from the model to backend functions.
- Follow the OWASP ASVS (Application Security Verification Standard) guidelines to ensure effective input validation and sanitization, and encode the output to mitigate undesired code execution.

3. Training data poisoning

Training data poisoning refers to the manipulation of pre-training data or of data involved in the fine-tuning or embedding processes to introduce vulnerabilities, backdoors, or biases that could compromise the model, OWASP says.

For example, a malicious attacker or insider who gains access to a training data set can change the data to make the model give incorrect instructions or recommendations to damage the company or benefit the attacker. Corrupted training data sets that come from external sources can also fall under supply chain vulnerabilities.

Preventative measures for this vulnerability include:

- Verify the supply chain of the training data, especially when it is sourced externally.
- Craft different models via separate training data or fine-tuning for different use cases to create more granular and accurate generative AI output.
- Ensure sufficient sandboxing to prevent the model from scraping unintended data sources.
- Use strict vetting or input filters for specific training data or categories of data sources to control the volume of falsified data.
- Detect signs of a poisoning attack by analyzing model behavior on specific test inputs, and monitor and alert when skewed responses exceed a threshold.
- Use a human in the loop to review responses and perform audits.

4. Model denial of service

In a model denial of service, an attacker interacts with an LLM in a way that consumes an exceptionally high amount of resources, which results in a decline in the quality of service for them and other users, as well as potentially incurring high resource costs. This issue is becoming more critical due to the increasing use of LLMs in various applications, their intensive resource utilization, the unpredictability of user input, and a general unawareness among developers regarding this vulnerability, according to OWASP.

For example, an attacker could use automation to flood a company’s chatbot with complicated queries, each of which takes time (and costs money) to answer.

Preventative measures for this vulnerability include:

- Implement input validation and sanitization to ensure user input adheres to defined limits and filters out any malicious content.
- Cap resource use per request or step so that requests involving complex parts execute more slowly, enforce API rate limits per individual user or IP address (see the sketch after this list), or limit the number of queued actions and the number of total actions in a system reacting to LLM responses.
- Continuously monitor the resource utilization of the LLM to identify abnormal spikes or patterns that may indicate a denial-of-service attack.
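To make the rate-limiting measure concrete, here is a minimal sketch, using only the Python standard library, of a per-user request budget and an input-size cap applied before a prompt ever reaches the model. The limits and the call_llm() stub are illustrative assumptions, not recommended values or a real client.

```python
# Minimal sketch: per-user rate limiting and input caps in front of an LLM call.
# MAX_PROMPT_CHARS, MAX_REQUESTS, WINDOW_SECONDS, and call_llm() are
# illustrative placeholders to be replaced with real limits and a real client.

import time
from collections import defaultdict, deque

MAX_PROMPT_CHARS = 4000   # reject oversized inputs before they reach the model
MAX_REQUESTS = 20         # allowed requests per user...
WINDOW_SECONDS = 60       # ...per rolling window

_history: dict[str, deque] = defaultdict(deque)  # user_id -> recent request times

def call_llm(prompt: str) -> str:
    return "model response"  # stand-in for the real model call

def allow_request(user_id: str, prompt: str) -> bool:
    """Return True only if the prompt fits the size and rate budgets."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return False
    now = time.monotonic()
    window = _history[user_id]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()  # drop timestamps that fell out of the rolling window
    if len(window) >= MAX_REQUESTS:
        return False      # user exceeded their per-window request budget
    window.append(now)
    return True

def handle_prompt(user_id: str, prompt: str) -> str:
    if not allow_request(user_id, prompt):
        return "Request rejected: rate or size limit exceeded."
    return call_llm(prompt)
```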
5. Supply chain vulnerabilities

LLM supply chains are vulnerable at many points, especially when companies use open-source, third-party components, poisoned or outdated pre-trained models, or corrupted training data sets. This vulnerability also covers cases where the creator of the original model did not properly vet the training data, leading to privacy or copyright violations. According to OWASP, this can lead to biased outcomes, security breaches, or even complete system failures.

Preventative measures for this vulnerability include:

- Carefully vet data sources and suppliers.
- Only use reputable plugins, ensure they have been tested for your application’s requirements, and use model and code signing when using external models and suppliers.
- Use vulnerability scanning, management, and patching to mitigate the risk of vulnerable or outdated components, and maintain an up-to-date inventory of these components to quickly identify new vulnerabilities.
- Scan environments for unauthorized plugins and out-of-date components, including the model and its artifacts, and have a patching policy to remediate issues.

6. Sensitive information disclosure

Large language models have the potential to reveal sensitive information, proprietary algorithms, or other confidential details through their output. This can result in unauthorized access to sensitive data, intellectual property, privacy violations, and other security breaches.

Sensitive data can get into an LLM during initial training, fine-tuning, or RAG embedding, or it can be cut and pasted by a user into a prompt. Once the model has access to this information, there is the potential for other, unauthorized users to see it. For example, customers might see private information belonging to other customers, or users might be able to extract proprietary corporate information.

Preventative measures for this vulnerability include:

- Use data sanitization and scrubbing to prevent the LLM from getting access to sensitive data either during training or during inference, when the model is used.
- Apply filters to user inputs to prevent sensitive data from being uploaded.
- When the LLM needs to access data sources during inference, use strict access controls and the principle of least privilege.

7. Insecure plugin design

LLM plugins are extensions that are called automatically by the model during user interactions. They are driven by the model, there is no application control over their execution, and there is often no validation or type checking on inputs. This allows a potential attacker to construct a malicious request to the plugin, which could result in a wide range of undesired behaviors, up to and including data exfiltration, remote code execution, and privilege escalation, OWASP warns. For plugins supplied by third parties, see No. 5, supply chain vulnerabilities.

Preventative measures for this vulnerability include:

- Strict input controls, including type and range checks (see the sketch after this list), and OWASP’s recommendations in the ASVS (Application Security Verification Standard) to ensure effective input validation and sanitization.
- Appropriate authentication mechanisms, such as OAuth2, and API keys that reflect the plugin route rather than the default user.
- Inspection and testing before deployment.
- Plugins should follow least-privilege access and expose as little functionality as possible while still doing what they’re supposed to.
- Require additional human authorization for sensitive actions.
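As a concrete example of strict input controls, the following Python sketch validates the arguments of a hypothetical weather-lookup plugin with type and range checks before anything executes. The plugin, its "city" and "days" parameters, and the allowlist pattern are assumptions for illustration only.

```python
# Minimal sketch: type and range checks on plugin arguments supplied by the LLM.
# The weather plugin and its "city"/"days" parameters are hypothetical; the
# point is that model-supplied input is validated, never trusted.

import re

CITY_PATTERN = re.compile(r"[A-Za-z .'-]{1,64}")  # narrow allowlist, not a blocklist

def validate_weather_args(args: dict) -> dict:
    """Reject anything that is not a short city name and a small day count."""
    city = args.get("city")
    days = args.get("days")
    if not isinstance(city, str) or not CITY_PATTERN.fullmatch(city):
        raise ValueError("city must be a short string of letters and basic punctuation")
    if not isinstance(days, int) or not 1 <= days <= 7:
        raise ValueError("days must be an integer between 1 and 7")
    return {"city": city, "days": days}  # only validated fields are passed through

def weather_plugin(raw_args: dict) -> str:
    args = validate_weather_args(raw_args)  # fail closed on malformed input
    return f"forecast for {args['city']} over {args['days']} day(s)"

# A well-formed request passes; a malicious or malformed one raises ValueError
print(weather_plugin({"city": "Lisbon", "days": 3}))
```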
8. Excessive agency

As LLMs get smarter, companies want to give them the power to do more, to access more systems, and to act autonomously. Excessive agency is when an LLM gets too much power to do things or is allowed to do the wrong things. Damaging actions could be performed when an LLM hallucinates, when it falls victim to a prompt injection or a malicious plugin, when prompts are poorly written, or just because it’s a badly performing model, OWASP says. Depending on just how much access and authority the LLM gets, this could cause a wide range of problems.

For example, if the LLM is given access to a plugin that allows it to read documents in a repository so that it can summarize them, but the plugin also allows it to modify or delete documents, a bad prompt could cause it to change or delete things unexpectedly. If a company creates an LLM personal assistant that summarizes emails for employees but also has the power to send emails, then the assistant could start sending spam, whether accidentally or maliciously.

Preventative measures for this vulnerability include:

- Limit the plugins and tools that the LLM is allowed to call, and the functions that are implemented in those plugins and tools, to the minimum necessary.
- Avoid open-ended functions such as running a shell command or fetching a URL, and use functions with more granular functionality instead.
- Limit the permissions that LLMs, plugins, and tools are granted to other systems to the minimum necessary.
- Track user authorization and security scope to ensure actions taken on behalf of a user are executed on downstream systems in the context of that specific user, and with the minimum privileges necessary.

9. Overreliance

Overreliance can occur when an LLM produces erroneous information and provides it in an authoritative manner. While LLMs can produce creative and informative content, they can also generate content that is factually incorrect, inappropriate, or unsafe. This is referred to as hallucination or confabulation. When people or systems trust this information without oversight or confirmation, it can result in a security breach, misinformation, miscommunication, legal issues, and reputational damage.

For example, if a company relies on an LLM to generate security reports and analysis, and the LLM generates a report containing incorrect data that the company uses to make critical security decisions, there could be significant repercussions due to the reliance on inaccurate LLM-generated content.

Rik Turner, a senior principal analyst for cybersecurity at Omdia, refers to this as LLM hallucinations. “If it comes back talking rubbish and the analyst can easily identify it as such, he or she can slap it down and help train the algorithm further. But what if the hallucination is highly plausible and looks like the real thing?”

Preventative measures for this vulnerability include:

- Regularly monitor and review the LLM’s outputs.
- Cross-check the LLM’s output with trusted external sources, or implement automatic validation mechanisms that can cross-verify the generated output against known facts or data (see the sketch after this list).
- Enhance the model with fine-tuning or embeddings to improve output quality.
- Communicate the risks and limitations associated with using LLMs, and build APIs and user interfaces that encourage responsible and safe use of LLMs.
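As one illustration of automatic cross-verification, the Python sketch below checks severity claims in an LLM-generated security report against a trusted internal record before the report is acted on. The CVE identifiers, severities, and extraction pattern are hypothetical stand-ins for a real fact source.

```python
# Minimal sketch: cross-check LLM-generated claims against a trusted source.
# TRUSTED_CVE_SEVERITY and the sample report are illustrative, not real data.

import re

TRUSTED_CVE_SEVERITY = {        # authoritative internal record (hypothetical)
    "CVE-2024-0001": "critical",
    "CVE-2024-0002": "low",
}

CLAIM_PATTERN = re.compile(r"(CVE-\d{4}-\d{4,7})\D+?(critical|high|medium|low)",
                           flags=re.IGNORECASE)

def verify_severity_claims(llm_report: str) -> list[str]:
    """Flag any severity the model asserts that contradicts the trusted record."""
    findings = []
    for cve, severity in CLAIM_PATTERN.findall(llm_report):
        expected = TRUSTED_CVE_SEVERITY.get(cve.upper())
        if expected and expected != severity.lower():
            findings.append(f"{cve}: report says {severity}, trusted source says {expected}")
    return findings

report = "CVE-2024-0002 is a critical flaw and should be patched immediately."
print(verify_severity_claims(report))
# -> ['CVE-2024-0002: report says critical, trusted source says low']
```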
10. Model theft

Model theft is when malicious actors access and exfiltrate entire LLM models or their weights and parameters so that they can create their own versions. This can result in economic or brand reputation loss, erosion of competitive advantage, unauthorized use of the model, or unauthorized access to sensitive information contained within the model.

For example, an attacker might get access to an LLM model repository via a misconfiguration in network or application security settings, or a disgruntled employee might leak a model. Attackers can also query the LLM to collect enough question-and-answer pairs to create their own shadow clone of the model, or use the responses to fine-tune their own model. According to OWASP, it’s not possible to replicate an LLM 100% through this type of model extraction, but attackers can get close.

Attackers can use this new model for its functionality, or they can use it as a testing ground for prompt injection techniques which they can then use to break into the original model. As large language models become more prevalent and more useful, LLM theft will become a significant security concern, OWASP says.

Preventative measures for this vulnerability include:

- Strong access controls, such as role-based access and the principle of least privilege, to limit access to model repositories and training environments, for example by having a centralized model registry.
- Regular monitoring and auditing of access logs and activities to detect any suspicious or unauthorized behavior promptly.
- Input filters and rate limiting of API calls to reduce the risk of model cloning.

Security leaders or teams and their organizations are responsible for ensuring the secure use of generative AI chat interfaces that use LLMs. AI-powered chatbots need regular updates to remain effective against threats, and human oversight is essential to ensure LLMs function correctly, Joshua Kaiser, CEO at Tovie AI, previously told CSO. “Additionally, LLMs need contextual understanding to provide accurate responses and catch any security issues and should be tested and evaluated regularly to identify potential weaknesses or vulnerabilities.”