AI Red Teaming: HackerOne's Playbook

Thought Leadership,

Generative AI

Michiel Prins

Co-founder and Senior Director, Product Management

Alex Rice

CTO

Dane Sherrets

Senior Solutions Architect

April 1st, 2024

As AI is adopted by every industry and becomes an integral part of enterprise solutions, ensuring its safety and security is critical. In fact, the Biden Administration recently released an Executive Order (EO) that aims to shape the safe, secure, and trustworthy development of AI. This follows action taken in California and by the Leaders of the Group of Seven (“G7”) to address AI.

To ensure that AI is more secure and trustworthy, the EO calls on companies who develop AI and other companies in critical infrastructure that use AI to rely on “red-teaming”: testing to find flaws and vulnerabilities. The EO also requires broad disclosures of some of these red-team test results.

Testing AI systems isn’t necessarily new. Back in 2021, HackerOne organized a public algorithmic bias review with Twitter as part of the AI Village at DEF CON 29. The review encouraged members of the AI and security communities to identify bias in Twitter’s image-cropping algorithms. The results of the engagement brought to light various confirmed biases, informing improvements to make the algorithms more equitable.

In this blog post, we'll delve into the emerging playbook developed by HackerOne, focusing on the collaboration between ethical hackers and AI safety to fortify these systems. Bug bounty programs have proven effective at finding security vulnerabilities, but AI safety requires a new approach. According to recent findings published in the 7th Annual Hacker Powered Security Report, 55% of hackers say that GenAI tools themselves will become a major target for them in the coming years, and 61% said they plan to use and develop hacking tools using GenAI to find more vulnerabilities.

“Every properly designed AI application has a unique safety threat model and should implement some safety parameters or guard rails to protect against adverse outcomes. The protections you care most about are going to vary based on the use case for the application and the intended audience. But how easily are those guard rails bypassed? That is what you find out with AI red teaming.”
— Dane Sherrets, Senior Solutions Architect, HackerOne

HackerOne's Approach to AI Red Teaming

HackerOne partners with leading technology firms to evaluate their AI deployments for safety issues. The ethical hackers selected for our early AI Red Teaming exceeded all expectations. Drawing from these experiences, we're eager to share the insights gleaned, which have shaped our evolving playbook for AI safety red teaming.

Our approach builds upon the powerful bug bounty model, which HackerOne has successfully offered for over a decade, but with several modifications necessary for optimal AI Safety engagement.

Team Composition: A meticulously selected and, more importantly, diverse team is the backbone of an effective assessment. Emphasizing diversity in background, experience, and skill sets is pivotal for ensuring a safe AI. A blend of curiosity-driven thinkers, individuals with varied experiences, and those skilled in production LLM prompt behavior has yielded the best results.
Collaboration and Size: Collaboration among AI Red Teaming members holds unparalleled significance, often exceeding that of traditional security testing. A team size ranging from 15-25 testers has been found to strike the right balance for effective engagements, bringing in diverse and global perspectives.
Duration: Because AI technology is evolving so quickly, we’ve found that engagements between 15 and 60 days work best to assess specific aspects of AI Safety. However, in at least a handful of cases, a continuous engagement without a defined end date was adopted. This method of continuous AI red teaming pairs well with an existing bug bounty program.
Context and Scope: Unlike traditional security testing, AI Red Teamers cannot approach a model blindly. Establishing both broad context and specific scope in collaboration with customers is crucial to determining the AI's purpose, deployment environment, existing safety features, and limitations.
Private vs. Public: While most AI Red Teams operate in private due to the sensitivity of safety issues, there are instances where public engagement, such as Twitter's algorithmic bias bounty challenge, has yielded significant success.
Incentive Model: Tailoring the incentive model is a critical aspect of the AI safety playbook. A hybrid economic model that includes both fixed-fee participation rewards in conjunction with rewards for achieving specific safety outcomes (akin to bounties) has proven most effective.
Empathy and Consent: As many safety considerations may involve encountering harmful and offensive content, it is important to seek explicit participation consent from adults (18+ years of age), offer regular support for mental health, and encourage breaks between assessments.

“It’s important to underscore that different AI models or deployments will have drastically different threat models. An AI text-to-image generator deployed on a social media network will have a different threat model than an AI chabot in a medical context. Early on in these conversations we define what the threat model is based on the use case, the regulatory environment, architecture, and other factors.”
— Dane Sherrets, Senior Solutions Architect, HackerOne

In the HackerOne community, over 750 active hackers specialize in prompt hacking and other AI security and safety testing. To date, 90+ of those hackers have participated in HackerOne's AI Red Teaming engagements. In a single recent engagement, a team of 18 quickly identified 26 valid findings within the initial 24 hours and accumulated over 100 valid findings in the two-week engagement. In one notable example, one of the challenges put forth to the team was bypassing significant protections built to prevent the generation of images containing a Swastika. A particularly creative hacker on the AI Red Team was able to swiftly bypass these protections, and thanks to their findings, the model is now far more resilient against this type of abuse.

As AI continues to shape our future, the ethical hacker community, in collaboration with platforms like HackerOne, is committed to ensuring its safe integration. Our AI Red Teams stand ready to assist enterprises in navigating the complexities of deploying AI models responsibly, ensuring that their potential for positive impact is maximized while guarding against unintended consequences.

“In my opinion, the best way to secure AI is also through the use of crowdsourcing. By engaging hackers through AI red teaming engagements, I believe we can obtain a better understanding of the rapidly changing nature of AI security and AI Safety. This will result in reduced risk in implementing these exciting new technologies and allow us to capitalize on all of the benefits.”
— Josh Donlan, Senior Solutions Engineer, HackerOne

By using the expertise of ethical hackers and adapting the bug bounty model to address AI safety, HackerOne's playbook is a proactive approach to fortifying AI while mitigating potential risks. For technology and security leaders venturing into AI integration, we look forward to partnering with you to explore how HackerOne and ethical hackers can contribute to your AI safety journey. To learn more about how to implement AI Red Teaming for your organization, download the AI Red Teaming solution brief or contact our experts at HackerOne.

The 8th Annual Hacker-Powered Security Report

Read the Report

An Emerging Playbook for AI Red Teaming With HackerOne

Share

HackerOne's Approach to AI Red Teaming