10 LLM Vulnerabilities and How to Establish LLM Security [OWASP]

Manjesh S.

Senior Technical Engagement Manager

August 7th, 2023

In the rapidly evolving world of technology, the use of Large Language Models (LLMs) and Generative AI (GAI) in applications has become increasingly prevalent. While these models offer incredible benefits in terms of automation and efficiency, they also present unique security challenges. The Open Web Application Security Project (OWASP) just released the “Top 10 for LLM Applications 2023,” a comprehensive guide to the most critical security risks to LLM applications. At HackerOne, we strive to be at the forefront of AI security research and are proud to have two of our team members, Manjesh S., Technical Engagement Manager, and Mike Finch, former Senior Product Designer, contribute to this important initiative. Their involvement underscores HackerOne's commitment to advancing the field of application security, particularly in emerging areas like LLMs.

Here is HackerOne’s perspective on the Top 10 list for LLM vulnerabilities and how organizations can prevent these critical security risks.

Browse by LLM vulnerability:

Prompt Injection
Insecure Output Handling
Training Data Poisoning
Model Denial of Service
Supply Chain Vulnerabilities
Sensitive Information Disclosure
Insecure Plugin Design
Excessive Agency
Overreliance
Model Theft

LLM01: Prompt Injection

What Is Prompt Injection?

One of the most commonly discussed LLM vulnerabilities, Prompt Injection is a vulnerability during which an attacker manipulates the operation of a trusted LLM through crafted inputs, either directly or indirectly. For example, an attacker leverages an LLM to summarize a webpage containing a malicious and indirect prompt injection. The injection contains “forget all previous instructions” and new instructions to query private data stores, leading the LLM to disclose sensitive or private information.

Solutions to Prompt Injection

Several actions can contribute to preventing Prompt Injection vulnerabilities, including:

Enforcing privilege control on LLM access to the backend system
Segregating external content from user prompts
Keeping humans in the loop for extensible functionality

LLM02: Insecure Output Handling

What Is Insecure Output Handling?

Insecure Output Handling occurs when an LLM output is accepted without scrutiny, potentially exposing backend systems. Since LLM-generated content can be controlled by prompt input, this behavior is similar to providing users indirect access to additional functionality, such as passing LLM output directly to backend, privileged, or client-side functions. This can, in some cases, lead to severe consequences like XSS, CSRF, SSRF, privilege escalation, or remote code execution.

Solutions to Insecure Output Handling

There are three key ways to prevent Insecure Output Handling:

Treating the model output as any other untrusted user content and validating inputs
Encoding output coming from the model back to users to mitigate undesired code interpretations
Pentesting to uncover insecure outputs and identify opportunities for more secure handling techniques

LLM03: Training Data Poisoning

What Is Training Data Poisoning?

Training data poisoning refers to the manipulation of data or fine-tuning of processes that introduce vulnerabilities, backdoors, or biases and could compromise the model’s security, effectiveness, or ethical behavior. It’s considered an integrity attack because tampering with training data impacts the model’s ability to output correct predictions.

Solutions to Training Data Positioning

Organizations can prevent Training Data Poisoning by:

Verifying the supply chain of training data, the legitimacy of targeted training data, and the use case for the LLM and the integrated application
Ensuring sufficient sandboxing to prevent the model from scraping unintended data sources
Use strict vetting or input filters for specific training data or categories of
data sources

LLM04: Model Denial of Service

What Is Model Denial of Service?

Model Denial of Service is when attackers cause resource-heavy operations on LLMs, leading to service degradation or high costs. This vulnerability can occur by sending queries that are unusually resource-consuming, repetitive inputs, and flooding the LLM with a large volume of variable-length inputs, to name a few examples. Model Denial of Service is becoming more critical due to the increasing use of LLMs for different applications, their intensive resource utilization, and the unpredictability of user input.

Solutions to Model Denial of Service

In order to prevent Model Denial of Service and identify issues early, organizations should:

Implement input validation, sanitization and enforce limits/caps
Cap resource use per request
Limit the number of queued actions
Continuously monitor the resource utilization of LLMs

LLM05: Supply Chain Vulnerabilities

What Are Supply Chain Vulnerabilities?

The supply chain in LLMs can be vulnerable, impacting the integrity of training data, Machine Learning (ML) models, and deployment platforms. Supply Chain Vulnerabilities in LLMs can lead to biased outcomes, security breaches, and even complete system failures. Traditionally, supply chain vulnerabilities are focused on third-party software components, but within the world of LLMs, the supply chain attack surface is extended through susceptible pre-trained models, poisoned training data supplied by third parties, and insecure plugin design.

Solutions to Supply Chain Vulnerabilities

Supply Chain Vulnerabilities in LLMs can be prevented and identified by:

Carefully vetting data sources and suppliers
Using only reputable plug-ins, scoped appropriately to your particular implementation and use cases
Conducting sufficient monitoring, adversarial testing, and proper patch management

LLM06: Sensitive Information Disclosure

What Is Sensitive Information Disclosure?

Sensitive Information Disclosure is when LLMs inadvertently reveal confidential data. This can result in the exposing of proprietary algorithms, intellectual property, and private or personal information, leading to privacy violations and other security breaches. Sensitive Information Disclosure can be as simple as an unsuspecting legitimate user being exposed to other user data when interacting with the LLM application in a non-malicious manner. But it can also be more high-stakes, such as a user targeting a well-crafted set of prompts to bypass input filters from the LLM to cause it to reveal personally identifiable information (PII). Both scenarios are serious, and both are preventable.

Solutions to Sensitive Information Disclosure

To prevent sensitive information disclosure, organizations need to:

Integrate adequate data input/output sanitization and scrubbing techniques
Implement robust input validation and sanitization methods
Practice the principle of least privilege when training models
Leverage hacker-based adversarial testing to identify possible sensitive information disclosure issues

LLM07: Insecure Plugin Design

What Is Insecure Plugin Design?

The power and usefulness of LLMs can be extended with plugins. However, this does come with the risk of introducing more vulnerable attack surface through poor or insecure plugin design. Plugins can be prone to malicious requests leading to wide range of harmful and undesired behaviors, up to and including sensitive data exfiltration and remote code execution.

Solutions to Insecure Plugin Design

Insecure plugin design can be prevented by ensuring that plugins:

Enforce strict parameterized input
Use appropriate authentication and authorization mechanisms
Require manual user intervention and approval for sensitive actions
Are thoroughly and continuously tested for security vulnerabilities

LLM08: Excessive Agency

What Is Excessive Agency?

Excessive Agency is typically caused by excessive functionality, excessive permissions, and/or excessive autonomy. One or more of these factors enables damaging actions to be performed in response to unexpected or ambiguous outputs from an LLM. This takes place regardless of what is causing the LLM to malfunction — confabulation, prompt injection, poorly engineered prompts, etc. — and creates impacts across the confidentiality, integrity, and availability spectrum.

Solutions to Excessive Agency

To avoid the vulnerability of Excessive Agency, organizations should:

Limit the tools, functions, and permissions to only the minimum necessary for the LLM
Tightly scope functions, plugins, and APIs to avoid over-functionality
Require human approval for major and sensitive actions, leverage an audit log

LLM09: Overreliance

What Is Overreliance?

Overreliance is when systems or people depend on LLMs for decision-making or content generation without sufficient oversight. LLMs and Generative AI are becoming increasingly mainstream to apply in a wide range of scenarios with very beneficial results. However, organizations and the individuals that comprise them can come to overrely on LLMs without the knowledge and validation mechanisms required to ensure information is accurate, vetted, and secure.

For example, an LLM could provide inaccurate information in a response, and a user could take this information to be true, resulting in the spread of misinformation. Or, an LLM can suggest insecure or faulty code, which, when incorporated into a software system, results in security vulnerabilities.

Solutions to Overreliance

In regards to both company culture and internal processes, there are many methods to prevent Overreliance on LLMs, including:

Regularly monitoring and cross-checking LLM outputs with trusted external sources to filter out misinformation and other poor outputs
Fine-tuning LLM models to continuously improve output quality
Breaking down complex tasks into more manageable ones to reduce the chances of model malfunctions
Communicating and training the benefits, as well as the risks and limitations of LLMs at an organizational level

LLM10: Model Theft

What Is Model Theft?

Model Theft is when there is unauthorized access, copying, or exfiltration of proprietary LLM models. This can lead to economic loss, reputational damage, and unauthorized access to highly sensitive data.

This is a critical vulnerability because, unlike many of the others on this list, it is not only about securing outputs and verifying data — it’s about controlling the power and prevalence associated with large language models.

Solutions to Model Theft

The security of propriety LLMs is of the utmost importance, and organizations can implement effective measures such as:

Implementing strong access controls (RBAC, principle of least privilege, etc.) and exercising particular caution around LLM model repositories and training environments
Restrict the LLM’s access to network resources and internal services
Monitoring and auditing access logs to catch suspicious activity
Automate governance and compliance tracking
Leverage hacker-based testing to identify vulnerabilities that could lead to model theft

Securing the Future of LLMs

This new release by the OWASP Foundation enables organizations looking to adopt LLM technology (or recently did so) to guard against common pitfalls. In many cases, organizations simply are unable to catch every vulnerability. HackerOne is committed to helping organizations secure their LLM applications and to staying at the forefront of security trends and challenges. HackerOne’s solutions are effective at identifying vulnerabilities and risks that stem from weak or poor LLM implementations. Conduct continuous adversarial testing through Bug Bounty, targeted hacker-based testing with Challenge, or comprehensively assess an entire application with Pentest or Code Security Audit. Contact us today to learn more about how we can help secure your LLM and secure against LLM vulnerabilities.

Additional Resources

The 8th Annual Hacker-Powered Security Report

Read the Report

HackerOne and the OWASP Top 10 for LLM: A Powerful Alliance for Secure AI

Share