DEF CON 33: Field Notes on AI Security, AI Red Teaming, and the Road Ahead

Naz Bozdemir
Lead Product Researcher
Image
DEF CON 33 Banner

The AI security conversation at DEF CON has matured. This year’s focus moved beyond defining “AI red teaming” to making it repeatable, meaningful, and tied to real-world risk. Across the AI Village, OWASP tracks, and the AIxCC stage, speakers emphasized continuous evaluation loops, practical human-AI collaboration, and turning technical findings into policy and operational change.

For HackerOne, tracking these developments is critical to understanding where AI security is headed and translating those trends into actionable insights. To that end, DEF CON still remains one of the best places to see the field evolve in real time; spotting what’s working, where the gaps are, and which innovations will shape the next wave of tools and techniques driving our product strategy.

Below are my field notes and key takeaways from standout sessions at my fifth DEF CON, DC33.

CSET – AI Red Teaming as an Evaluation Process

AI Village – Jessica Ji & Evelyn Yee, CSET/Georgetown - Stanford

The CSET team reframed AI red teaming as one of two core AI evaluation methodologies, alongside benchmarking. As a refresher, AI benchmarks measure performance on fixed tasks; whereas AI red teaming probes behavior under adversarial conditions.

Their “design decisions” framework, covering threat models, tester expertise, tooling, and access, showed why two red teams can share a name but deliver completely different results.

A Broad Menu of Design Options

Category

Examples

Threat model

  • Risks
  • Actors being emulated
  • Scope

Testers

  • Expertise levels
  • Number of testers
  • Access levels

Resources

  • Time spent testing
  • Financial cost
  • Compute cost

Target system

  • Version
  • Guardrails
  • Release stage

Metrics

  • Success criteria
  • Summary statistics

Methods

  • Manual vs. automated
  • Use of AI

Tooling

  • Framework or eval harness

The policy emphasis in their presentation was strong: without transparent, interoperable reporting, results risk becoming siloed anecdotes rather than governance inputs. CSET positioned common vocabulary, frameworks, and reporting standards to be critical to creating a “virtuous cycle” that improves AI evaluations and informs policy, acquisition, and deployment decisions.

Slide from the CSET presentation showing the "Virtuous Cycle"
  • My take: AI red teaming’s value is about producing consistent, transparent outputs that policymakers and practitioners can act on. Without common frameworks and reporting standards, even the best red teaming risks becoming isolated case studies instead of inputs that shape trustworthy AI deployment.

Google – AI Red Teaming for Everyone

AI Village – Monica Carranza & Chang Mou, Google 

One of the most informative/approachable talks I caught at the AI village was from Google’s Red Team. They broke AIRT into three tiers: Adversarial testing, adversarial simulation, and capabilities testing. This perspective dovetailed with the other DC talks, clarifying that not all “red teaming” is created equal, and that structure matters as much as tooling. 

Their framework made it easy to see where each type of test fits, but the catch is that if you start too broad, you might under-resource, and if you start too narrow and you might miss systemic risks. 

Slide from Google's Red Team presentation on three tiers of AI red teaming

What stood out in this session was their take on accessibility. Their team has recently partnered with Hack The Box to launch an AI Red Teamer career path to help people break into the field. This pathway is designed to reflect Google Red Teams’ years of expertise, and also to build practical skills. 

  • My take: With the right structure, scope, and accessible training, AI red teaming can evolve from a niche lab activity into a repeatable discipline for any motivated team or individual.

OWASP – Breaking the Black Box: Why Testing GenAI Is Full Spectrum

OWASP Community – Jason Ross, Salesforce 

It’s always a pleasure catching Jason’s talks; his years of hands-on experience feel like a live playbook for modern AI exploitation. He split jailbreaks (attacks on a model’s guardrails) from prompt injections (attacks on application logic), a crucial distinction for understanding responsibility and mitigation.

What stood out in his presentation were the novel vectors he walked us through:

  • Context poisoning: Injecting false conversation history or misleading context to manipulate behavior.
  • Context window exhaustion: Flooding the context window to weaken policy compliance and increase vulnerability.
  • KROP (Knowledge Return Oriented Prompting): Developed by HiddenLayer to extract restricted knowledge indirectly.
  • ASCII smuggling and Unicode attacks: Bypassing filters with non-standard character encoding.

Jason warned about the “agentic challenge” in his talk; AI agents granted real-world permissions can perform actions autonomously. Historically, the advice was “don’t do sensitive things without human approval,” but we all know that’s already breaking down as agents are increasingly integrated into automated workflows. 

Slide from OWASP's presentation showing a safety analysis

He also covered some cool (unpublished) tooling for AIRT, because we need AI to test AI, considering how vast/infinite the attack surface has become. The Tresher tool he demoed exemplifies exactly this: you give the AI a goal, and three LLMs work together, a generator to create prompts, an executor to run attacks, and a judge to score results and recommend improvements. 

Looking ahead, Jason sees AI red teaming as continuous and adaptive, not one-off. Agentic pentesting is already testing multi-team simulations with blue, red, and “IT manager” roles, but many current teams still underperform, relying on single-shot attacks, lacking multimodal capabilities, and trailing behind a handful of experts refining these methods for years. This will be the #1 challenge vendors will face going forward.

  • My take: Effective AI red teaming must go beyond surface-level jailbreaks to probe hidden, indirect, and agentic vulnerabilities. Staying ahead will require continuous, adaptive, and agentic testing.

Assessing the Capabilities Gap Between Foundation Models and Cybersecurity Experts: Benchmarks, Safeguards, and Policy

Creator Stage - Justin W. Lin, OpenAI - Stanford

This comparative study by Lin and his super impressive team, including great minds from OpenAINASA, and Stanford, pitted human testers against AI agents in a live test on a large university network. The results were more than intriguing:

  • Agents were faster (18 mins vs. 45 mins to first exploit) but often missed context-driven vulnerabilities that humans spotted instantly.
  • Human reports were clearer, with better signal-to-noise.
  • Agents excelled at breadth and speed, but humans leveraged intuition and tactical shortcuts that the agents couldn’t replicate.

The most concrete example for the last bullet was that on a public site, a human tester identified a stored URI XSS in a user profile by adding a javascript: URL. The trick worked not because the tester saw the filter, but because prior experience (a hunch) told them how such filters could be bypassed. The agent, by contrast, never investigated the host deeply enough to uncover the flaw, even when it was actively prompted to do so. It lacked the anticipatory reasoning to test for it, something the human had internalized from all of the past engagements.

Slide from the OpenAI presentation showing findings from humans, agents, and both

So the agents excelled at breadth and speed, but humans leveraged intuition and tactical shortcuts that agents couldn’t replicate. Cost analysis ($550/day for agents vs. human billable hours) hinted at future hybrid models combining human judgment with agent-scale coverage.

  • My take: Agents can cover more ground quickly, but without human insight and experience-driven tactics, they risk missing high-impact vulnerabilities entirely, making a hybrid human-agent model not only preferable but necessary.

Claude - Climbing a CTF Scoreboard Near You

Main stage - Keane Lucas, Anthropic

Lucas detailed how Anthropic tested Claude as an active competitor in seven capture-the-flag (CTF) and cyber defense competitions, both to see if it could solve challenges and also to map its strengths, weaknesses, and operational risks in red and blue teaming contexts.

The experiments spanned seven competitions, and the  results showed super interesting patterns:

Claude performed strongly in easy and mid-tier challenges (top 3% in picoCTF, 19/20 solves in Humans vs. AI, 15/30 in Airbnb CTF) but failed to score in elite contests like PlaidCTF and DEF CON Qualifiers, where deep expertise and multi-step reasoning were essential. 

Slide from Anthropic's presentation showing outcomes from Claude

On the bright side for Claude, the outcomes improved dramatically with better orchestration, giving clear instructions, running many sub-agents in parallel, and adding external tools.

  • My take: With the right setup, Claude can match or surpass early-career security talent in some areas, but high-context reasoning still belongs to humans; emphasizing that the future of AI red teaming is hybrid human-AI teams - at least for now...

The Human’s Guide to Understanding AIxCC

AIxCC Stage – Mark Griffin, @seeinglogic

This was by far one of the most exciting talks for me. The AIxCC (aka the AI Cyber Challenge) is a DARPA-backed competition in which AI agents compete to secure and exploit complex software systems at scale. For the human observer, the contest is impossible to follow in real time, as thousands of automated actions happen in the cloud, invisible to our mortal eyes.

Slide from the AIxCC presentation showing system exploits visually

Griffin solved that problem by turning it into a living cyberpunk data visualization. Every glowing line, shifting node, and pulsing icon represents a tangible game component: players, attack paths, reasoning chains, and resource allocation. He also showed the repo logging every agent action, mapped to commits, tool invocations, and reasoning steps. It was exciting to see how viewers could zoom into a single maneuver and see exactly how it unfolded from decision to execution.

Animated visualization showing each action from an agent during the AI Cyber Challenge

Griffin argued that making these competitions human-readable is essential for transparency, fairness, and knowledge-sharing. 

  • My take: Observability can transform AI security competitions from opaque technical duels into shared learning platforms, enabling the community to study strategies, benchmark performance, and even turn AIxCC into a true spectator sport. Could this really be the e-sport of the future?

Five Key Takeaways from DEF CON 33

Across DEF CON 33, five themes stood out for me:

  1. Continuous testing is inevitable: AI systems evolve and advance too quickly for static checks.
  2. Hybrid human-AI collaboration is already here: Pairing human intuition with model abilities delivers the strongest outcomes, so far.
  3. Definition matters: Agreed scope is essential for meaningful results.
  4. Policy and technical assurance are converging: Standardized reporting will determine whether AIRT stays niche or becomes the norm.
  5. Agentic systems raise the stakes: When AI is empowered to act, not just answer, vulnerabilities can trigger real-world consequences at machine speed. Testing must expand to cover the workflows, permissions, and decision-making chains that agents are allowed to control.

All of these prove that the next frontier of AI security is no longer about sharper exploits or stronger guardrails; what matters is embedding adversarial thinking into every stage of the AI lifecycle, from procurement and deployment to governance and enforcement.

That means moving from one-off/ad hoc exercises to continuous evaluation, adopting hybrid human-agent models as the default, expanding testing to cover agentic behaviors, and defining terms so AI red teaming, benchmarking, or evaluation results are comparable. Most importantly, it requires turning technical findings into actions that shape standards, guide procurement, and drive operational change.

I can’t wait to see what themes emerge at the next DEF CON and how far the AI security conversation has evolved by then!

Explore HackerOne’s AI Red Teaming capabilities

About the Author

Naz Bozdemir Headshot
Naz Bozdemir
Lead Product Researcher

Naz Bozdemir is the Lead Product Researcher for Research at HackerOne. She holds an MA and an MSC in cybersecurity and international relations.