Operationalizing Gartner TRiSM: Testing AI for Trust, Risk, and Security

Enterprises are under pressure to adopt AI, but few are equipped to test it under real conditions. Safety, security, and compliance failures are no longer hypothetical; misaligned agents, prompt injections, jailbreaks, and embedded plugin misuse are already being exploited. Traditional AppSec controls weren’t designed to address these threats. The risks are systemic, and the solutions must be too.
We believe the Gartner® Market Guide for AI Trust, Risk, and Security Management (TRiSM) offers a roadmap for governance, runtime inspection, and layered protection. However, TRiSM is only credible if its controls can withstand pressure.
The Problem: Misuse Risks, Systemic Gaps, and Regulatory Exposure
Even a well-trained model can misbehave when exposed to real-world complexity. AI agents can make unauthorized decisions. Tools can be misused. Sensitive context can be leaked through chain-of-thought reasoning or tool delegation. As Anthropic CISO Jason Clinton noted in a recent panel discussion:
“You’re not just testing a model. You’re testing what it can access, what it can decide, and how those decisions impact your systems.”
—Jason Clinton, Anthropic CISO
In our view, Gartner TRiSM pillars define how to manage these risks:
- Trust: Are outputs aligned with organizational values, ethical expectations, and business intent?
- Risk: Are emergent threats like jailbreaks and multi-turn misuse continuously discovered and mitigated?
- Security: Are adversarial paths blocked before they become breach vectors?
Today, most organizations lack the mechanisms to answer these questions with confidence.
Gartner Insight: Managing AI Risk at the System Level
We interpret the recent guidance from Gartner on AI Trust, Risk, and Security Management (TRiSM) that organizations must move beyond model-centric thinking and adopt layered, system-level controls to govern AI. Most enterprise AI incidents stem not from malicious attacks but from internal violations, oversharing, alignment failures, and unintended model behavior. In this context, TRiSM emerges as a strategic framework for reducing these risks by combining AI governance, runtime inspection, and traditional security.
As Gartner recommends: “Evaluate and implement layered AI TRiSM technology to continuously enforce policies across all AI use cases.”
The challenge, then, is operationalizing these principles. While TRiSM defines what’s needed, many organizations lack a way to test if those controls are functioning as intended.

Source: Tackling Trust, Risk and Security in AI Models, Gartner
Operationalizing TRiSM: Adversarial Testing Designed for AI
HackerOne’s AI Red Teaming (AIRT) delivers the validation layer TRiSM requires. Each engagement is scoped to simulate how your system could be exploited by a malicious actor, a misaligned tool, or an unsafe prompt chain.
AIRT surfaces hidden vulnerabilities across models, agents, plugins, and surrounding systems. Unlike checklists or static LLM evaluation, AIRT uses expert researchers to simulate real threats under real conditions.
Capabilities include:
- Human-led threat modeling across AI deployments
- Targeted testing via structured incentives based on refusal logic, misuse boundaries, and regulatory thresholds
- Creative adversarial testing to reveal jailbreaks, output violations, and tool abuse
- Reporting mapped to OWASP Top 10 for LLMs and TRiSM domains if needed
“Our [AI Red Teaming] challenge generated 300,000+ interactions and over 3,700 hours of red teaming. The result: zero universal jailbreaks. That told us a lot about the integrity of our system—and where we needed to refine classifier tuning and refusal thresholds.”
— Anthropic Safeguards Research Team, following a HackerOne-led AI red team on Claude 3.5
Why Human-Led Adversaries Still Matter
Automated scanners can’t predict intent. Prompt injection attacks evolve daily. Plugin misuse and logic drift require creativity to uncover. That’s why AIRT is powered by a vetted community of AI-native researchers who know how to think like attackers.
“Human ingenuity is crucial for understanding potential problems in novel areas.”
— Ilana Arbisser, Technical Lead, AI Safety, Snap Inc.
These are not hypothetical findings. AIRT engagements have surfaced:
- Novel jailbreak techniques
- Plugin misuse chains created through indirect delegation
- Misalignment between system prompts and desired/secure/authorized behavior
- Unintended model responses that violate internal policy or regulatory expectations
Extending AI TRiSM with Defense in Depth
While AIRT powers adversarial validation at runtime, HackerOne supports a broader portfolio to meet TRiSM needs across the stack.
Our defense in depth strategy ensures that testing spans the entire AI ecosystem, from build to runtime, and from models to integrations. This layered offensive coverage enables continuous testing across the full AI lifecycle, ensuring that every control in your TRiSM stack is both deployed and defensible.
Defense Layer | HackerOne Product | TRiSM Pillar Coverage |
---|---|---|
Pre-production Application Security Testing | Code, Pentest | Trust, Risk, Security |
AI System Testing | AI Red Teaming | Trust, Risk, Security |
Runtime Exposure | Bounty | Risk, Security |
Real-World Feedback | Bounty, Response (VDP) | Trust, Security |
If you are looking to move from “we think our AI deployments are safe” to “we know they are tested,” get in touch with our team to know where to start.
Gartner, Market Guide for Ai Trust, Risk, and Security Management, Avivah Litan, Max Goss, Sumit Agarwal, Jeremy D'Hoinne, Andrew Bales, Bart Willemsen, 18 February 2025
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved