Pentesting Must Evolve: Continuous Validation at AI Scale, Expert Verified

Naz Bozdemir
Lead Product Researcher
Dan Mateer
Director of Product, Code Security

For a long time, pentesting had a workable rhythm.

Most teams focused on the crown jewels, the highest-risk apps, and the most sensitive data. The smaller slice of the attack surface that mattered most. If tested well, risk felt manageable. Then the cloud emerged, data spread, and AI accelerated both development and attacks. What used to be “out of scope” may now be the easiest way in.

While security leaders keep testing the same 20% of the attack surface, their real exposure has easily doubled or tripled. As a result, security decisions rely on an incomplete picture of today’s exploitability.

That’s where traditional pentesting breaks down: Point-in-time tests can’t keep up with a fast-expanding attack surface, and human-led testing alone can’t scale without running into resource and budget limits. 

Recent survey data reinforces this: for most leaders (67%), pentesting is the norm, but nearly half of teams say resource and budget limits prevent them from keeping pace with change*.

Closing the between-test gap calls for a hybrid model: always-on automation to track change, paired with expert human judgment to deliver validated results that are accurate, prioritized, and actionable.

It’s the difference between a one-time look and knowing the results still match reality as things change.

Where Automation and Human Testing Hit Their Limits

Teams deploy software updates constantly across expanding application and API ecosystems, while attackers use automation and AI to shrink the window between discovery and exploitation.

In this environment, human-only pentesting struggles to keep up. Skilled testers are scarce and costly, limiting how often and how broadly tests can run. Even the most experienced pentesters spend significant time on setup, enumeration, and repeatable checks before they can focus on higher-order attack paths.

This isn’t a challenge that more automation alone can solve. The most recent Hacker-Powered Security Report’s findings are clear: leaders say that reliably triaging complex findings and accurately assessing severity, their top two critical security decisions, still require human judgment*.

Expertise is necessary, yet human-led testing remains difficult to scale, as security leaders point to resource shortages and budget constraints. 

When security leaders struggle to combine automation with expert validation at scale, several consequences follow:

  • Findings go stale faster than environments change
  • New endpoints and integrations ship without testing
  • Pentesters spend more time on setup than exploitation
  • Clean reports create confidence that doesn’t match reality

This creates a familiar pattern: teams repeat thorough tests on known paths while blind spots quietly expand elsewhere as the business moves continuously.
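
The blind-spot pattern described above can be made concrete with a small sketch. It assumes a team keeps two simple records, one of when each endpoint last changed and one of when it was last tested; the endpoint names, dates, and the `find_blind_spots` helper are all hypothetical and only illustrate the idea, not any particular product's logic.

```python
from datetime import date

# Hypothetical inventory: endpoint -> date it last changed (illustrative values only).
last_changed = {
    "/login": date(2025, 1, 10),
    "/api/v1/orders": date(2025, 3, 2),
    "/api/v2/export": date(2025, 4, 18),
}

# Hypothetical record of the most recent test that covered each endpoint.
last_tested = {
    "/login": date(2025, 2, 1),
    "/api/v1/orders": date(2025, 2, 1),
}


def find_blind_spots(last_changed, last_tested):
    """Return endpoints that were never tested, or that changed after their last test."""
    never_tested = sorted(ep for ep in last_changed if ep not in last_tested)
    stale = sorted(
        ep
        for ep, changed in last_changed.items()
        if ep in last_tested and changed > last_tested[ep]
    )
    return {"never_tested": never_tested, "stale": stale}


print(find_blind_spots(last_changed, last_tested))
```

In this toy data, one endpoint shipped without ever being tested and another changed after its last test, so a clean report from the earlier engagement would no longer describe either of them.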

Agents + Humans Make Continuous Validation Real

The way forward is a hybrid approach: agentic execution for scale, paired with expert validation for accountability.

That’s the power of HackerOne's Agentic Pentest as a Service (PTaaS). It uses a coordinated system of AI agents and human experts to scale reconnaissance, setup, exploitation, and validation across large, fast-changing attack surfaces, while keeping findings grounded in real-world exploitability.

HackerOne Agentic PTaaS is designed around a simple goal: continuous security validation that security teams can trust and act on at enterprise scale. In this model, agents take responsibility for coverage and consistency, while human experts retain responsibility for judgment, prioritization, and impact.

In practice, this offers:

  • Agentic scale with expert accountability: A coordinated system of AI agents and human experts that scales reconnaissance, setup, exploitation, and validation across large and changing attack surfaces while preserving judgment, accountability, and trust.
  • Real exploitability, not theoretical risk: Findings are verified so teams can focus on what actually matters. Validation occurs before results reach engineering teams, preserving credibility and reducing wasted remediation cycles.
  • Built on proven pentesting foundations: Agentic PTaaS builds on the foundation of HackerOne Pentest, extending it into a continuous model.
  • Exploit intelligence meets elite expertise: Agents are trained and refined using proprietary exploit intelligence informed by years of testing real enterprise systems, paired with a robust, verified community of elite pentesters.

Our tests show the impact: HackerOne’s PTaaS agent delivered 88% fix-verified accuracy on our benchmark suite, more than doubling model-only accuracy while keeping false positives low.

What Makes Agentic PTaaS Different in Real Environments

Bolting AI onto a pentest workflow is one approach. But the real impact comes from what that AI knows, how it’s guided, and who stands behind the results.

HackerOne’s approach is unique for a few practical reasons:

Agents are shaped by real enterprise exploit intelligence

The agents learn from years of real-world enterprise pentest experience, are continually tuned using proprietary exploit signals, and are built to recognize how vulnerabilities show up in the messy reality of modern environments.

They plan, execute, and validate across multiple steps instead of relying on static payload execution.

Elite human expertise is deployed at real scale

That intelligence is paired with a robust, verified community of elite pentesters for depth, coverage, and creativity at a scale teams can’t replicate easily.

Human expertise is applied where uncertainty exists, not where automation is already sufficient.
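
One way to picture applying human expertise only where uncertainty exists is a simple routing rule. The sketch below is an assumption-laden illustration, not HackerOne's actual logic: the `Finding` fields, the `REVIEW_THRESHOLD` value, and the queue names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    agent_confidence: float  # hypothetical score between 0.0 and 1.0
    reproduced: bool         # whether an automated step reproduced the exploit


# Assumed cut-off; a real program would tune this against its own data.
REVIEW_THRESHOLD = 0.9


def route(finding: Finding) -> str:
    """Send uncertain findings to an expert; let automation carry the clear ones."""
    if finding.reproduced and finding.agent_confidence >= REVIEW_THRESHOLD:
        return "automation_sufficient"
    return "expert_review"


findings = [
    Finding("Reflected XSS in search", 0.97, True),
    Finding("Possible IDOR in export API", 0.62, True),
    Finding("Suspected SSRF via webhook", 0.95, False),
]

for f in findings:
    print(f.title, "->", route(f))
```

Under these assumptions, a finding reaches an expert whenever the agent is unsure or could not reproduce the issue, which keeps scarce reviewer time on the cases where judgment matters most.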

Code-aware testing goes deeper, when you want it

Securely connect source code and Agentic PTaaS becomes code-aware: agents spot risky patterns, propose testable hypotheses, and the AI-accelerated, expert-validated system delivers high-confidence findings that match how your app is built.

Testing shifts from broad guessing to targeted validation.

Real-world benchmarks for validation

Agents are evaluated through public and proprietary benchmarks and tested in real-world enterprise production environments across industries, rather than relying on synthetic validation.

Performance is measured by the quality of outcomes, not by model claims.

Together, this is what helps Agentic PTaaS stand out from solutions that are either automation-heavy but unverified, or human-heavy but hard to scale. The goal is not to produce more findings, but to drive better decisions, backed by validated evidence, that lower organizational risk.

Additional Elements in HackerOne’s PTaaS Evolution

Agentic PTaaS is part of a bigger shift in how HackerOne is transforming pentesting by weaving AI deeper into scoping, execution, and validation while keeping experts accountable for the final call.

New additions support this vision:

  • Pentest Scoping Assistant helps teams define clearer objectives, provide better context, and map a more accurate attack surface before testing starts, improving scoping consistency across engagements.
  • Expanded LLM Application Testing supports assessments of AI-powered applications and large language models for newer risk categories like prompt injection, data leakage, and unsafe agent behavior.
  • Hai (HackerOne’s Agentic AI System) supports scoping, triage, reporting, and validation workflows, with humans staying firmly in the loop.

Agentic PTaaS and the capabilities listed above are delivered through the HackerOne Platform and play a central role in operationalizing continuous threat exposure management (CTEM), connecting validation, prioritization, and remediation into a repeatable loop to strengthen your overall security program.

The Future of Pentesting Is Agentic: Continuous and Verified

With AI reshaping engineering and security workflows, offensive security must evolve with it. Fully automated testing alone cannot deliver the trust enterprises need, and human-only services cannot scale fast enough on their own.

With years of experience operating in large, complex enterprise environments, HackerOne brings together automation and expert judgment to deliver security results that hold up at scale.

This is the model HackerOne Agentic PTaaS enables:

  • Proof for only what’s exploitable
  • With a faster time to actionable findings
  • Grounded in real attack steps
  • For governance-ready results you can defend

By combining agentic AI with elite human expertise, HackerOne Agentic PTaaS moves enterprises beyond inconsistent testing and toward continuous exposure reduction, helping teams find, validate, and fix the issues that actually matter.

Read how Agentic PTaaS delivers a modern approach for continuous validation.

Reach out to schedule a demo.


*Hacker-Powered Security Report 2025: The Rise of the Bionic Hacker

Survey methodology: HackerOne and UserEvidence surveyed 99 HackerOne customer representatives between June and August 2025. Respondents represented organizations across industries and maturity levels, including 6% from Fortune 500 companies, 43% from large enterprises, and 31% in executive or senior management roles. In parallel, HackerOne conducted a researcher survey of 1,825 active HackerOne researchers, fielded between July and August 2025. Findings were supplemented with HackerOne platform data from July 1, 2024 to June 30, 2025, covering all active customer programs. Payload analysis: HackerOne also analyzed over 45,000 payload signatures from 23,579 redacted vulnerability reports submitted during the same period.

About the Authors

Naz Bozdemir
Lead Product Researcher

Naz Bozdemir is the Lead Product Researcher for Research at HackerOne. She holds an MA and an MSc in cybersecurity and international relations.