Validating AI Security with Gartner TRiSM

Enterprises are under pressure to adopt AI, but few are equipped to test it under real conditions. Safety, security, and compliance failures are no longer hypothetical; misaligned agents, prompt injections, jailbreaks, and embedded plugin misuse are already being exploited. Traditional AppSec controls weren’t designed to address these threats. The risks are systemic, and the solutions must be too.

The Problem: Misuse Risks, Systemic Gaps, and Regulatory Exposure

Even a well-trained model can misbehave when exposed to real-world complexity. AI agents can make unauthorized decisions. Tools can be misused. Sensitive context can be leaked through chain-of-thought reasoning or tool delegation. As Anthropic CISO Jason Clinton noted in a recent panel discussion:

“You’re not just testing a model. You’re testing what it can access, what it can decide, and how those decisions impact your systems.”
—Jason Clinton, Anthropic CISO

In our view, Gartner TRiSM pillars define how to manage these risks:

Trust: Are outputs aligned with organizational values, ethical expectations, and business intent?
Risk: Are emergent threats like jailbreaks and multi-turn misuse continuously discovered and mitigated?
Security: Are adversarial paths blocked before they become breach vectors?

Today, most organizations lack the mechanisms to answer these questions with confidence.

Gartner Insight: Managing AI Risk at the System Level

We interpret the recent guidance from Gartner on AI Trust, Risk, and Security Management (TRiSM) that organizations must move beyond model-centric thinking and adopt layered, system-level controls to govern AI. Most enterprise AI incidents stem not from malicious attacks but from internal violations, oversharing, alignment failures, and unintended model behavior. In this context, TRiSM emerges as a strategic framework for reducing these risks by combining AI governance, runtime inspection, and traditional security.

As Gartner recommends: “Evaluate and implement layered AI TRiSM technology to continuously enforce policies across all AI use cases.”

The challenge, then, is operationalizing these principles. While TRiSM defines what’s needed, many organizations lack a way to test if those controls are functioning as intended.

Gartner AI TRiSM Technology Functions Graphic

Source: Tackling Trust, Risk and Security in AI Models, Gartner

Operationalizing TRiSM: Adversarial Testing Designed for AI

HackerOne’s AI Red Teaming (AIRT) delivers the validation layer TRiSM requires. Each engagement is scoped to simulate how your system could be exploited by a malicious actor, a misaligned tool, or an unsafe prompt chain.

AIRT surfaces hidden vulnerabilities across models, agents, plugins, and surrounding systems. Unlike checklists or static LLM evaluation, AIRT uses expert researchers to simulate real threats under real conditions.

Capabilities include:

Human-led threat modeling across AI deployments
Targeted testing via structured incentives based on refusal logic, misuse boundaries, and regulatory thresholds
Creative adversarial testing to reveal jailbreaks, output violations, and tool abuse
Reporting mapped to OWASP Top 10 for LLMs and TRiSM domains if needed

“Our [AI Red Teaming] challenge generated 300,000+ interactions and over 3,700 hours of red teaming. The result: zero universal jailbreaks. That told us a lot about the integrity of our system—and where we needed to refine classifier tuning and refusal thresholds.”
— Anthropic Safeguards Research Team, following a HackerOne-led AI red team on Claude 3.5

Why Human-Led Adversaries Still Matter

Automated scanners can’t predict intent. Prompt injection attacks evolve daily. Plugin misuse and logic drift require creativity to uncover. That’s why AIRT is powered by a vetted community of AI-native researchers who know how to think like attackers.

“Human ingenuity is crucial for understanding potential problems in novel areas.”
— Ilana Arbisser, Technical Lead, AI Safety, Snap Inc.

These are not hypothetical findings. AIRT engagements have surfaced:

Novel jailbreak techniques
Plugin misuse chains created through indirect delegation
Misalignment between system prompts and desired/secure/authorized behavior
Unintended model responses that violate internal policy or regulatory expectations

Extending AI TRiSM with Defense in Depth

While AIRT powers adversarial validation at runtime, HackerOne supports a broader portfolio to meet TRiSM needs across the stack.

Our defense in depth strategy ensures that testing spans the entire AI ecosystem, from build to runtime, and from models to integrations. This layered offensive coverage enables continuous testing across the full AI lifecycle, ensuring that every control in your TRiSM stack is both deployed and defensible.

Defense Layer	HackerOne Product	TRiSM Pillar Coverage
Pre-production Application Security Testing	Code, Pentest	Trust, Risk, Security
AI System Testing	AI Red Teaming	Trust, Risk, Security
Runtime Exposure	Bounty	Risk, Security
Real-World Feedback	Bounty, Response (VDP)	Trust, Security

If you are looking to move from “we think our AI deployments are safe” to “we know they are tested,” get in touch with our team to know where to start.

_{Gartner, Market Guide for Ai Trust, Risk, and Security Management, Avivah Litan, Max Goss, Sumit Agarwal, Jeremy D'Hoinne, Andrew Bales, Bart Willemsen, 18 February 2025}