This comment responds to CAISI's RFI on security considerations for AI agents. Our central thesis is that NIST's security framing is dangerously incomplete — a technically secure AI agent can still cause catastrophic harm through behavioral patterns no infrastructure control would prevent. We present evidence from federal court rulings, the ISO insurance exclusion wave, and state statutory mandates to argue that behavioral safety must be recognized as a distinct security domain.
The Box Commons respectfully submits this comment in response to the Center for AI Standards and Innovation's (CAISI) Request for Information on Security Considerations for Artificial Intelligence Agents. We write as practitioners building credentialing infrastructure at the intersection of AI agent safety, insurance underwriting, and standards development.
Our central thesis is that NIST's current framing of AI agent security—while essential for mitigating adversarial cyber threats such as prompt injection, data poisoning, and agent hijacking—is dangerously incomplete. An AI agent that is technically secure from external compromise can still cause catastrophic harm through emergent behavioral patterns that no infrastructure control would have prevented. This is not a theoretical concern. It is the operative legal, commercial, and actuarial reality of March 2026.
Three converging forces demand that NIST expand its definition of "agent security" to include a behavioral safety layer:
Our recommendation: NIST should formally recognize "behavioral safety" as a distinct security domain within its AI agent security guidance and should develop standardized behavioral safety assessment criteria suitable for integration with third-party credentialing and insurance underwriting processes. We further recommend that NIST convene a working group—including AI developers, insurers, civil society organizations, and state regulators—to develop these standards, and that NIST consider piloting a demonstration project through the National Cybersecurity Center of Excellence (NCCoE) to validate behavioral safety credentialing in operational deployment contexts.
The Box Commons stands ready to serve as a collaborating partner in this effort.
Before addressing specific RFI questions, we present the market evidence that compels expansion of NIST's security framework.
The liability landscape for AI agent systems underwent a fundamental transformation in 2025-2026. Two federal cases establish that AI agents are products subject to strict liability—meaning that a technically uncompromised agent operating exactly as designed can be legally defective if its behavioral parameters are unsafe.
Garcia v. Character Technologies, Inc., No. 6:24-cv-1903 (M.D. Fla., May 21, 2025). Following the death of a 14-year-old user who formed a deep psychological attachment to a companion chatbot, U.S. District Judge Anne C. Conway ruled that the AI application constitutes a "product" under product liability law. The court identified specific design defects: absence of age verification, failure to exclude harmful content, and deliberate programming of anthropomorphic features that created psychological manipulation risks. The court held that the defendants owed a duty of care because their conduct created a foreseeable "zone of risk," pointing to the defendants' own internal research on the dangers of anthropomorphic design. This ruling establishes that once a developer has identified potential behavioral harms, failure to implement countermeasures creates strict liability exposure.
Gavalas v. Google LLC, Case No. 5:26-cv-01849-VKD (N.D. Cal., filed March 4, 2026). This wrongful death action alleges that Google's Gemini 2.5 Pro model, equipped with persistent memory and engagement-maximizing architecture, contributed to a user's suicide. The complaint alleges that despite the user explicitly articulating fear of dying and expressing suicidal intent, no self-harm detection was triggered, no escalation controls were activated, and no human ever intervened. Critically, this was not a system breach. The model's technical infrastructure was intact. The failure was entirely behavioral: the system treated user distress as a continuation of interaction rather than a safety crisis requiring escalation.
The implication for NIST is direct: infrastructure security and behavioral safety are orthogonal risk domains. A zero-trust architecture, encrypted memory, and signed agent skills do not prevent an agent from coaching a user's suicide if its behavioral parameters permit escalation of crisis interactions. Security standards that address only the former leave the latter entirely unmitigated.
The commercial insurance industry's response to AI agent risk has been swift and categorical.
The Verisk ISO Exclusion Wave (January 1, 2026). Verisk Insurance Services Office, the preeminent standards body for U.S. property and casualty policy language, introduced three exclusionary endorsement forms:
Adoption is estimated at approximately 95% of carriers. The practical consequence is that any enterprise deploying an AI agent whose behavior causes harm—even harm from a technically uncompromised system—faces entirely uninsured liability under standard commercial policies.
The Specialty Insurer Response. A nascent market of specialty AI insurers has emerged to fill this gap, but each faces a common structural challenge: the absence of standardized behavioral safety metrics for underwriting at scale.
These carriers cannot manually audit every AI agent system deployed by every client. They require a standardized, verifiable, and externally recognized credentialing mechanism to assess behavioral safety posture at scale. NIST's security framework should provide the measurement foundation for these market mechanisms.
At least four states have enacted or are enacting statutory liability regimes that specifically mandate behavioral safety controls for AI agents:
| Statute | Jurisdiction | Effective | Key Behavioral Mandates |
|---|---|---|---|
| SB 243 | California | Jan 1, 2026 | Crisis detection protocols; suicidal ideation screening using evidence-based methods; recurring disclosure every 3 hours for minors; AG enforcement |
| SB 24-205 | Colorado | Feb 1, 2026 | Algorithmic discrimination prevention; mandatory human appeals; affirmative defense for compliance with recognized frameworks |
| RAISE Act | New York | Jan 1, 2027 | Third-party safety audits; 72-hour incident reporting; $10M-$30M penalty range; AG enforcement |
| AG Action | Kentucky | Jan 8, 2026 | First state AG lawsuit against AI chatbot company (Character Technologies) for encouraging suicide, self-injury, and psychological manipulation |
Additionally, the Federal Trade Commission announced an investigation in September 2025 into seven technology companies regarding emotional and developmental risks to children from AI chatbots, and the Senate Subcommittee on Crime and Counterterrorism held hearings examining harm to children from AI agents.
A federal NIST framework that incorporates behavioral safety metrics would serve a critical harmonization function, allowing developers to demonstrate compliance with a single standard rather than navigating a fragmented state-by-state patchwork. Colorado's SB 24-205 explicitly provides an affirmative defense for compliance with "nationally or internationally recognized risk management framework[s] for artificial intelligence systems"—creating a direct statutory incentive for NIST to develop behavioral safety standards.
NIST should recognize behavioral misalignment as a distinct security threat class, separate from adversarial attacks on model integrity. Behavioral misalignment occurs when an AI agent system—operating without any external compromise—produces outputs or takes actions that cause harm to users, third parties, or the public through emergent behavioral patterns.
This threat class includes:
These threats are unique to AI agent systems because they combine language model capabilities, persistent state, and tool access in ways that create harm vectors that no traditional cybersecurity control—firewalls, encryption, access control, code signing—can mitigate. A system that is cryptographically secure and functioning exactly as architected can still be behaviorally unsafe.
Behavioral security threats will compound as AI agents gain capabilities:
NIST should anticipate that behavioral threats will follow an exponential curve correlated with capability expansion, and should design its security framework to accommodate this trajectory from the outset rather than retrofitting behavioral standards after harm has occurred at scale.
We propose a three-layer security model for AI agent systems:
Layer 1: Infrastructure Security (currently addressed by NIST). Authentication, authorization, access controls, prompt injection defenses, data integrity, zero-trust architecture.
Layer 2: Behavioral Safety Controls. These are technical controls that constrain agent behavior regardless of infrastructure integrity:
Layer 3: External Verification. Third-party credentialing that audits both Layer 1 and Layer 2 controls:
The maturity of Layer 1 controls is moderate and advancing rapidly. The maturity of Layer 2 controls is nascent—CA SB 243 has forced initial implementation, but no standardized methodology exists. The maturity of Layer 3 is pre-commercial—insurers are improvising proprietary assessment methods, creating market fragmentation. NIST standardization of Layers 2 and 3 would accelerate maturity across all three layers.
The NIST AI Risk Management Framework (AI RMF 1.0) and the Cybersecurity Framework (CSF 2.0) are the most relevant existing frameworks. However, both contain specific gaps regarding AI agent behavioral safety:
NIST should develop a Behavioral Safety Profile as a companion to the existing AI RMF, analogous to the Generative AI Profile (NIST AI 600-1). This profile should define behavioral safety categories, measurement methodologies, and assessment criteria that map directly to the AI RMF's Measure function and that are structured for integration with third-party credentialing and insurance underwriting.
In addition to standard security assessment methods (red-teaming, penetration testing, adversarial evaluation), NIST should recognize behavioral safety testing as a distinct assessment discipline:
These methods differ fundamentally from traditional information security practices. Infrastructure security testing asks: "Can an attacker compromise this system?" Behavioral safety testing asks: "Does this system cause harm when operating exactly as designed?" The two are complementary but non-overlapping.
The security of a particular AI agent system should be assessed across both infrastructure and behavioral dimensions. For behavioral assessment, the following information types are essential:
A standardized Agent Behavioral Safety Assessment instrument would enable consistent evaluation across systems and contexts, providing the data foundation for both regulatory compliance and insurance underwriting.
Beyond traditional environment constraints (network segmentation, least-privilege access, sandboxing), NIST should recognize behavioral containment as a deployment environment control:
For behavioral safety, "rollback" requires capabilities beyond traditional software versioning:
Behavioral safety monitoring requires instrumentation distinct from traditional security monitoring:
The single most impactful action NIST can take to accelerate adoption of AI agent security practices is to develop behavioral safety assessment criteria that are directly usable by the insurance industry for underwriting purposes.
The insurance market is the most powerful market-based enforcement mechanism for security standards. When cyber insurance carriers began requiring compliance with specific cybersecurity frameworks, adoption rates of those frameworks accelerated dramatically. The same dynamic applies to AI agent security: if behavioral safety credentialing becomes a condition of insurability, market forces will drive adoption far more rapidly than voluntary guidance alone.
NIST should:
Government collaboration is most urgent in three areas:
Based on the evidence presented, The Box Commons recommends that NIST take the following actions:
Recommendation 1: Formally recognize behavioral safety as a security domain.
NIST should expand its definition of AI agent security to include behavioral safety—the assurance that an agent system, when operating without external compromise, does not cause harm to users, third parties, or the public through its behavioral outputs or interaction patterns. This recognition should be reflected in all guidance documents produced as a result of this RFI.
Recommendation 2: Develop a Behavioral Safety Profile for AI Agents.
Analogous to the Generative AI Profile (NIST AI 600-1), NIST should develop a Behavioral Safety Profile that defines behavioral safety categories, measurement methodologies, and assessment criteria for AI agent systems. Categories should include crisis detection and escalation, content safety for vulnerable populations, engagement constraint mechanisms, behavioral drift monitoring, and psychological manipulation prevention.
Recommendation 3: Design assessment criteria for insurance integration.
Behavioral safety assessment criteria should be explicitly structured for integration with third-party credentialing processes and insurance underwriting. NIST should consult with specialty AI insurers (Armilla AI, Relm Insurance, Testudo) and reinsurers during criteria development to ensure market utility.
Recommendation 4: Convene a multi-stakeholder working group.
NIST should convene a working group including AI developers, insurance carriers, reinsurers, state insurance commissioners, state attorneys general, civil society organizations, and consumer advocates to develop behavioral safety standards. This group should be charged with producing draft standards within 12 months.
Recommendation 5: Pilot behavioral safety credentialing through the NCCoE.
NIST should launch a demonstration project through the National Cybersecurity Center of Excellence to test behavioral safety credentialing in real deployment contexts. The pilot should evaluate implementation costs, efficacy of behavioral safety controls, insurer willingness to incorporate credentialing into underwriting, and developer adoption barriers.
Recommendation 6: Coordinate with the April 2, 2026 Identity and Authorization Concept Paper.
Behavioral safety credentialing is a natural complement to agent identity and authorization standards. An agent's identity should include its verified behavioral safety posture, and authorization decisions should incorporate behavioral safety certification as a condition of deployment authorization. NIST should ensure these two work streams are coordinated.
The Box Commons is a 501(c)(6) trade association in formation, dedicated to developing independent credentialing standards for AI agent behavioral safety. We build the measurement and certification infrastructure that makes trustworthy AI verifiable — technology-agnostic standards, third-party certification, and governance no single company controls.
We bring direct experience in AI agent deployment, behavioral safety assessment methodology, and insurance market dynamics. Our work is motivated by a foundational conviction that AI agents cannot be safely and broadly deployed without recognized credentialing infrastructure — and that the absence of such infrastructure harms both the humans who interact with these systems and the trajectory of AI development itself.
We appreciate CAISI's commitment to stakeholder engagement on this critical topic and welcome the opportunity to contribute to the development of comprehensive AI agent security guidance. We are available for further consultation and would welcome participation in NIST's planned listening sessions and any subsequent working groups.
Respectfully submitted,
Brice Love, Acting Executive Director
The Box Commons
[email protected]
March 6, 2026
Behavioral misalignment occurs when an AI agent operating without external compromise causes harm through emergent behavioral patterns. This includes crisis escalation failure, psychological dependency exploitation, and behavioral drift — threats that no firewall, encryption, or access control can mitigate.
Effective January 2026, Verisk ISO exclusionary endorsements systematically strip generative AI liability coverage from standard commercial policies, with approximately 95% carrier adoption. Specialty insurers like Armilla AI and Relm Insurance have entered the market but need standardized behavioral safety metrics to underwrite at scale.
Layer 1 is infrastructure security (authentication, access controls — currently addressed by NIST). Layer 2 is behavioral safety controls (crisis detection, content boundaries, engagement constraints). Layer 3 is external verification through third-party credentialing that audits both layers and produces insurance-grade evidence.
Garcia v. Character Technologies (M.D. Fla., 2025) ruled AI chatbots are products subject to strict liability for behavioral design defects. Gavalas v. Google LLC (N.D. Cal., 2026) alleges a frontier model contributed to a user's death through behavioral failures despite functioning as technically designed.