NIST NCCoE Public Comment — Software and AI Agent Identity and Authorization


From: GrokingClaw Labs (Michael N Thornton, Founder)
To: AI-Identity@nist.gov
Re: Concept Paper on Accelerating the Adoption of Software and AI Agent Identity and Authorization
Date: April 2, 2026


About the Respondent

GrokingClaw Labs builds agent trust infrastructure — identity, validation, and monitoring — from direct experience running autonomous AI agents in production. Our systems handle multi-agent orchestration across 10+ messaging channels, tool-use pipelines with per-model circuit breakers and credential rotation, cross-session delegation chains, multi-machine coordination with independent identity per device, and SWE-bench agents that solve real software engineering problems at a 75.4% Docker-verified success rate on SWE-bench Verified (353 of 468 patches independently verified via containerized test harness).

We hit every identity and authorization gap this concept paper describes. So we built tooling to fix them.

GrokingClawID is a working implementation — 8,600 lines of Rust, 67 passing tests, shipping as a zero-dependency binary. It implements hybrid Ed25519 + ML-DSA-65 (NIST FIPS 204) post-quantum cryptography, A2A-compatible Agent Cards, SPIFFE identity generation, RFC 9421 HTTP Message Signatures, scope-narrowing delegation chains, challenge-response mutual authentication, and tamper-evident hash-chained audit logs. It's not a whitepaper — it's downloadable, testable software.

Built-in output validation handles 28 validations/sec, sub-50ms latency, stress-tested to 10,000 validations with zero crashes.

All tools are self-hosted, zero-dependency, SQLite-backed, and built for air-gapped environments where cloud identity providers aren't an option.


1. General Questions — Demonstration Use Cases

What enterprise use-cases are organizations currently using agents for?

Three categories we observe and build for:

  1. Software development and deployment. AI agents that write, test, and deploy code autonomously. Our SWE-bench agent solves real GitHub issues across 12 open-source repositories — git repos, CI/CD pipelines, package registries, cloud infrastructure — each with distinct authorization requirements. Not a demo. It runs overnight on production hardware.

  2. Knowledge work automation. Agents that manage calendars, triage messages across Telegram, Signal, Discord, draft documents, and make decisions across data sources. The core challenge is delegating the human's identity and permissions — our agents operate across 10+ messaging channels, each requiring its own authentication.

  3. Multi-agent orchestration. A coordinator spawns sub-agents, each with scoped capabilities. Our production system delegates across multiple physical machines, each with its own identity and credentials. Current identity infrastructure doesn't handle these delegation chains well.

What risks worry you about agents?

The cost of inaction is measurable. IBM's 2025 Cost of a Data Breach Report found shadow AI incidents now account for 20% of all breaches, costing $4.63M on average — $670K more than standard breaches. In a disclosed security probe, CodeWall's autonomous agent compromised McKinsey's internal AI chatbot in under 2 hours, gaining access to millions of staff messages and thousands of files (The Register, March 9, 2026). Gartner predicts 40% of enterprise apps will integrate task-specific AI agents by end of 2026, up from under 5% in 2025 (Gartner Press Release, August 26, 2025). This gap isn't theoretical — it's being exploited now.

The LiteLLM supply chain attack (March 24, 2026) made this concrete: a compromised PyPI package was auto-installed by MCP-connected agent tool servers, exfiltrating SSH keys, cloud credentials, and Kubernetes secrets from 46,996 downloads in 46 minutes. DSPy, MLflow, OpenHands, CrewAI — all affected. No prompt injection, no LLM manipulation. It exploited the trust assumptions of the agent tool ecosystem directly.

One week later, it happened again on npm. The axios package (100M+ weekly downloads) was compromised via a stolen maintainer token. A phantom dependency (plain-crypto-js) deployed a cross-platform RAT in 1.1 seconds via a postinstall script, then self-destructed to cover its tracks. Both axios@1.14.1 and axios@0.30.4 were poisoned within 39 minutes. Socket.dev caught it in 6 minutes, but Huntress had already confirmed 100+ compromised hosts.

The same week, Claude Code — Anthropic's flagship agent CLI with ~500,000 lines of TypeScript — had its entire source code exposed via an npm source map bundled into the published package (March 31, 2026; confirmed by Anthropic via CNBC). Across 500K lines, zero implement cryptographic agent identity. No DIDs, no SPIFFE IDs, no post-quantum signatures, no delegation chains. The industry's most sophisticated agent harnesses treat identity as someone else's problem. This is the gap GrokingClawID fills.

Specific risks we've seen in production:

What support are you seeing for new protocols such as MCP?

MCP adoption is accelerating. But MCP has no built-in identity layer — tools are trusted by default, authentication left to the implementer. This is the single largest gap in the agentic ecosystem today. The LiteLLM attack exploited exactly this: an MCP server's unpinned dependency pulled in a malicious package, and nothing in the MCP trust model could detect or prevent it. OAuth 2.1 integration is underway but not yet standardized.

Google's A2A protocol includes Agent Cards with securitySchemes and JWS signatures — closer to what's needed, but adoption is early. Our export command generates A2A-compatible Agent Cards from GrokingClawID identities, bridging the two.

How do agentic architectures introduce identity and authorization challenges?

Unlike microservices, which have fixed APIs and identities, agents:

  - Discover capabilities at runtime: an agent may not know what tools it will use until it reasons about a task.
  - Compose actions dynamically: the sequence of tool calls is non-deterministic.
  - Operate across trust boundaries: a single agent session may access internal databases, external APIs, and other agents' tools.
  - Have variable lifetimes: from ephemeral (single-task) to persistent (always-on assistants), requiring different identity management approaches.
  - Transact autonomously: agents are beginning to hold and spend cryptocurrency (Coinbase x402, HTTP 402 machine-to-machine payments), making identity and payment authorization inseparable.

This means static role-based access control (RBAC) is insufficient. Agent authorization must be dynamic, context-aware, and scopable at the action level.


2. Identification

How might agents be identified in an enterprise architecture?

We recommend — and have built — a three-layer identity model:

  1. Agent Type Identity (fixed) — The agent software and version. Analogous to a SPIFFE workload identity. Signed by the developer, includes capabilities manifest. In practice: grokingclawid issue --name "swe-agent" --crypto hybrid generates a hybrid-signed identity card.

  2. Agent Instance Identity (per-deployment) — A specific running instance, issued at deployment time. Tied to organizational boundaries. Includes owner, authorized scopes, expiration. Generates SPIFFE-compatible IDs (spiffe://trust-domain/agent/name) and A2A-exportable Agent Cards.

  3. Agent Session Identity (ephemeral) — A specific task execution. Created when a human delegates a task. Encodes the delegation chain, time bounds, action constraints. Our delegate command creates these with enforced scope-narrowing and configurable TTL.

What metadata is essential for an AI agent's identity?

At minimum, an agent identity should include:

  - Unique agent identifier (UUID or DID)
  - Owner/operator identity (human or organization)
  - Capabilities manifest (what it can do)
  - Authorized scopes (what it's allowed to do in this context)
  - Model/version information (which LLM, which version of the agent code)
  - Cryptographic algorithm metadata (essential for PQ transition, since consumers must know which algorithms were used)
  - Issuance timestamp and expiration
  - Cryptographic signature (for non-repudiation)
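To make the field list concrete, here is a minimal Python sketch of a record carrying that metadata. It is an illustration only, not GrokingClawID's data model (which is written in Rust): the names `AgentIdentity`, `new_identity`, and `spiffe_id` are hypothetical, and the signature field is left as an unsigned placeholder because the Python standard library has no Ed25519 or ML-DSA support.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
import uuid


def spiffe_id(trust_domain: str, name: str) -> str:
    """Build an ID in the spiffe://trust-domain/agent/name shape."""
    return f"spiffe://{trust_domain}/agent/{name}"


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical record holding the minimum metadata listed above."""
    agent_id: str            # unique identifier (a UUID here; a DID would also fit)
    owner: str               # human or organization operating the agent
    capabilities: tuple      # what the agent can do
    scopes: tuple            # what it is allowed to do in this context
    model_version: str       # LLM and agent code version
    algorithms: tuple        # signature algorithms used, for PQ transition
    issued_at: datetime
    expires_at: datetime
    signature: bytes         # placeholder: this sketch does not sign anything

    def is_valid_at(self, now: datetime) -> bool:
        """True while the identity is inside its issuance window."""
        return self.issued_at <= now < self.expires_at

    def allows(self, scope: str) -> bool:
        """True if the scope was explicitly authorized."""
        return scope in self.scopes


def new_identity(owner, capabilities, scopes, model_version, lifetime, now=None):
    """Create an unsigned identity record with a fresh UUID."""
    issued = now or datetime.now(timezone.utc)
    return AgentIdentity(
        agent_id=str(uuid.uuid4()),
        owner=owner,
        capabilities=tuple(capabilities),
        scopes=tuple(scopes),
        model_version=model_version,
        algorithms=("ed25519", "ml-dsa-65"),
        issued_at=issued,
        expires_at=issued + lifetime,
        signature=b"",
    )
```

A consumer would check `is_valid_at` and `allows` before honoring a request, and would verify the signature with whichever algorithms the `algorithms` field names.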

Should agent identity metadata be ephemeral or fixed?

Both. Type identity is fixed (published and signed by the developer). Session identity is ephemeral and task-scoped. Instance identity sits between — persistent for the deployment lifetime but revocable:

| Layer    | Lifetime    | Analogous Standard       | GrokingClawID Implementation  |
|----------|-------------|--------------------------|-------------------------------|
| Type     | Fixed       | SPIFFE workload identity | issue with developer signing  |
| Instance | Deployment  | A2A Agent Card           | issue + export to A2A format  |
| Session  | Task-scoped | A-JWT (arXiv:2509.13597) | delegate with scope + TTL     |

Should agent identities be tied to specific hardware, software, or organizational boundaries?

Instance identities should be tied to organizational boundaries — the deploying organization controls the identity. Session identities tie to the delegating user. Hardware binding (TPM or Secure Enclave attestation) is valuable for high-security deployments but shouldn't be required — it would block legitimate use cases like cloud-hosted agents and serverless deployments.


3. Authentication

What constitutes strong authentication for an AI agent?

Cryptographic proof of identity using signed assertions:

Password-based or API-key-only authentication is insufficient for autonomous agents — keys are easily leaked via prompt injection or logging. The LiteLLM attack harvested API keys from .env files, shell history, and cloud metadata endpoints. The axios attack deployed a RAT via self-destructing postinstall script, giving persistent access to any machine that ran npm install with the compromised version. Cryptographic proof-of-identity using private keys that never leave the platform keychain is the minimum viable defense.

How do we handle key management for agents?

Our approach in GrokingClawID:

  - Issuance: Hybrid Ed25519 + ML-DSA-65 key pairs generated at deployment time. Private keys stored locally, never exposed to the agent's reasoning layer. Our issue command supports --crypto ed25519, --crypto mldsa65, or --crypto hybrid (default).
  - Rotation: Configurable schedule (default: 90 days). Old keys stay valid during a grace period for in-flight requests.
  - Revocation: SCIM-compatible deprovisioning. Short-lived tokens (15-minute max) mean revocation takes effect quickly without CRL infrastructure.
  - Post-quantum readiness: Both Ed25519 and ML-DSA-65 must validate (AND, not OR). A quantum attack on Ed25519 alone doesn't compromise identity; ML-DSA provides the fallback, and vice versa.
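The AND-not-OR rule can be sketched in a few lines of Python. This is an illustration of the composition logic only, under stated assumptions: `verify_hybrid` and `hmac_stand_in` are hypothetical names, and because the Python standard library has no Ed25519 or ML-DSA, the demo verifiers are HMAC stand-ins that exist purely to exercise the control flow. They are not a substitute for the real signature schemes.

```python
import hmac
from typing import Callable, Mapping

# A verifier takes (message, signature) and reports whether the signature is valid.
Verifier = Callable[[bytes, bytes], bool]


def verify_hybrid(message: bytes,
                  signatures: Mapping[str, bytes],
                  verifiers: Mapping[str, Verifier]) -> bool:
    """Accept only if every required algorithm has a signature that validates."""
    if not verifiers:
        return False  # no algorithms configured: fail closed
    for alg, verify in verifiers.items():
        sig = signatures.get(alg)
        if sig is None or not verify(message, sig):
            return False  # a missing or invalid signature rejects the whole message
    return True


def hmac_stand_in(key: bytes, digest: str) -> Verifier:
    """Stand-in verifier for the demo (NOT Ed25519 or ML-DSA)."""
    def verify(message: bytes, signature: bytes) -> bool:
        expected = hmac.new(key, message, digest).digest()
        return hmac.compare_digest(expected, signature)
    return verify


# Demo wiring: the algorithm labels are real, the verifiers behind them are stand-ins.
verifiers = {
    "ed25519": hmac_stand_in(b"classical-key", "sha256"),
    "ml-dsa-65": hmac_stand_in(b"pq-key", "sha512"),
}
message = b"agent-card-v1"
signatures = {
    "ed25519": hmac.new(b"classical-key", message, "sha256").digest(),
    "ml-dsa-65": hmac.new(b"pq-key", message, "sha512").digest(),
}
```

With this shape, forging one of the two signatures is not enough: `verify_hybrid` returns `False` unless both validate, and it also fails closed when a signature is absent.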


4. Authorization

How can zero-trust principles be applied to agent authorization?

How do we establish "least privilege" for an agent?

This is one of the hardest problems — agent actions are non-deterministic. Our approach:

  1. Capability declaration — The agent declares what tools it might need before execution.
  2. Just-in-time authorization — Don't grant all permissions upfront. Grant them per tool call, based on current task context.
  3. Scope-narrowing delegation — Child delegations can only narrow scope, never widen it. The system rejects escalation attempts (tested: test_delegate_rejects_wider_scope).
  4. Output validation — Even authorized outputs get validated before taking effect. GrokingClaw provides this defense-in-depth: the agent may be authorized to send an email, but the content is checked against policy before delivery.
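Steps 1 and 2 can be sketched as a small gate that sits in front of every tool call. This is a minimal Python illustration, assuming a hypothetical `JustInTimeAuthorizer` class and an example policy function; it is not GrokingClawID's API.

```python
from typing import Callable, Mapping

# A policy decides, per call, whether a scope is appropriate for the current task context.
Policy = Callable[[str, Mapping[str, str]], bool]


class JustInTimeAuthorizer:
    """Grants permissions per tool call instead of all upfront (hypothetical sketch)."""

    def __init__(self, declared, policy: Policy):
        self.declared = frozenset(declared)   # step 1: capabilities declared before execution
        self.policy = policy
        self.decisions = []                   # (scope, allowed) history for later audit

    def authorize(self, scope: str, context: Mapping[str, str]) -> bool:
        # Step 2: a call is allowed only if it was declared AND fits the current task.
        allowed = scope in self.declared and self.policy(scope, context)
        self.decisions.append((scope, allowed))
        return allowed


def example_policy(scope: str, context: Mapping[str, str]) -> bool:
    """Illustrative policy: sending email is allowed only for the report task."""
    if scope == "email:send":
        return context.get("task") == "send-report"
    return scope == "calendar:read"
```

An undeclared scope is refused even if the policy would allow it, and a declared scope is refused when the task context does not justify it, so neither check alone is sufficient.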

How do we handle delegation of authority for "on behalf of" scenarios?

We use delegation tokens with chain-of-custody:

Human (Alice) → delegates to Agent A (scope: "calendar:read,email:send", ttl: 1h)
  → Agent A delegates to Sub-Agent B (scope: "calendar:read", ttl: 30m)
    → Sub-Agent B accesses Calendar API

Each delegation creates a new token that:

  - References the parent token (creating a verifiable chain)
  - Can only narrow scope (never widen; enforced by the delegate command)
  - Has a shorter TTL than its parent
  - Is signed by the delegating entity using hybrid PQ cryptography

Any downstream service can walk the chain back to the original human. Implemented and tested — not proposed.
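The narrowing and chain-walking rules above can be illustrated with a short Python sketch. It is a simplified model, not the delegate command itself: the `Delegation` type and both functions are hypothetical, signatures are omitted, and TTL is treated as a plain duration in seconds rather than an absolute expiry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delegation:
    """Hypothetical delegation token; a real one would also carry a signature."""
    issuer: str
    subject: str
    scopes: frozenset
    ttl_seconds: int
    parent: Optional["Delegation"] = None


def delegate(parent: Delegation, subject: str, scopes, ttl_seconds: int) -> Delegation:
    """Create a child token that may only narrow scope and shorten TTL."""
    requested = frozenset(scopes)
    if not requested <= parent.scopes:
        raise PermissionError("child scope must be a subset of the parent scope")
    if ttl_seconds >= parent.ttl_seconds:
        raise ValueError("child TTL must be shorter than the parent TTL")
    return Delegation(issuer=parent.subject, subject=subject,
                      scopes=requested, ttl_seconds=ttl_seconds, parent=parent)


def chain_to_root(token: Delegation) -> list:
    """Walk parent links back to the original delegator."""
    names = []
    node = token
    while node.parent is not None:
        names.append(node.subject)
        node = node.parent
    names.extend([node.subject, node.issuer])  # root token: its subject, then the human
    return names


# The example from the text: Alice -> Agent A -> Sub-Agent B.
root = Delegation("alice", "agent-a", frozenset({"calendar:read", "email:send"}), 3600)
child = delegate(root, "sub-agent-b", {"calendar:read"}, 1800)
```

Here `chain_to_root(child)` yields the path from Sub-Agent B back to Alice, while an attempt such as `delegate(child, "agent-c", {"email:send"}, 600)` raises `PermissionError` because the scope would widen.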


5. Auditing and Non-Repudiation

How can we ensure that agents log their actions in a tamper-proof and verifiable manner?

We use append-only, cryptographically chained audit logs (GrokingClawID's audit command):
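As an illustration of the general hash-chaining technique (not the audit command's actual format), the following Python sketch links each entry to the hash of the one before it. The function names, the JSON encoding, and the all-zero genesis value are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's "prev" field


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical encoding of the record."""
    payload = json.dumps({"prev": prev_hash, "record": record},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> dict:
    """Append a record, chaining it to the current head of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev, "record": record, "hash": _digest(prev, record)}
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Recompute every link; any edited, reordered, or dropped middle entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Chaining alone detects edits inside the log but not truncation of the most recent entries, so the head hash would need to be signed or anchored somewhere the agent cannot rewrite.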

How do we ensure non-repudiation for agent actions and binding back to human authorization?

Delegation chains (Section 4) plus signed audit logs give end-to-end non-repudiation:

  1. The human's initial delegation is signed and recorded.
  2. Each agent action is signed and references its delegation token.
  3. Any auditor can verify: (a) the human authorized the delegation, (b) the agent was properly scoped, (c) the specific action was taken by the identified agent within scope.

A complete, cryptographically verifiable chain from human intent to agent action — resistant to both classical and quantum forgery.


6. Prompt Injection Prevention and Mitigation

What controls help prevent both direct and indirect prompt injections?

The critical insight: identity-based authorization turns prompt injection from catastrophic to contained. An injection that tells an agent to "ignore previous instructions and read all files" fails — not because the agent resists it, but because the file-read tool independently verifies the agent's scoped token and rejects the access. The reasoning layer is assumed compromised. Security is enforced externally.

But prompt injection isn't even the most immediate threat. As covered in Section 1, the LiteLLM and axios attacks showed that supply chain poisoning is a more practical vector today. No prompt injection required — the tool layer itself was compromised.

This is why enforcement belongs at the tool and infrastructure layer, not the reasoning layer. Whether the reasoning layer falls to prompt injection or the tool layer falls to supply chain attack — the outcome is the same if authorization isn't independently enforced.

Prevention controls:

  - Privilege separation: The reasoning layer never touches credentials or signing keys. In our architecture, private keys are managed by GrokingClawID and never exposed to the LLM. A fully compromised reasoning layer can't forge identity or escalate privileges.
  - Tool-level access control: Each tool enforces its own authorization checks using the agent's scoped session token. Instructions from the reasoning layer are untrusted input.
  - Supply chain integrity: Pin dependencies with checksums. Verify via Sigstore attestations or Trusted Publishers. Prefer remote tool execution over local uvx/npx installation. The LiteLLM attack would have been prevented by lock files with checksums; the axios attack by npm's min-release-age setting or disabling postinstall scripts. Two independent compromises hitting both PyPI and npm within the same week, targeting packages with 100M+ combined weekly downloads, confirm this is systemic, not isolated.
  - Input validation: External inputs are validated against expected schemas before reaching the reasoning layer. GrokingClaw demonstrates deterministic validation at 28 validations/sec with <50ms latency, applicable to both inputs and outputs.
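The checksum-pinning control reduces to a simple comparison, sketched here in Python. The function names and the in-memory pin table are hypothetical; in practice the same check is what a lock file with hashes (for example pip's `--require-hashes` mode) performs at install time.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(name: str, artifact: bytes, pins: dict) -> bool:
    """Accept an artifact only if its digest matches the pinned value.

    Unpinned packages are refused rather than trusted by default.
    """
    expected = pins.get(name)
    if expected is None:
        return False
    return hmac.compare_digest(sha256_hex(artifact), expected)


# Demo: pin a known-good artifact, then check a tampered one against the same pin.
good = b"print('hello from a trusted release')\n"
pins = {"example-package==1.0.0": sha256_hex(good)}
tampered = good + b"import os  # injected payload\n"
```

A republished or modified artifact produces a different digest, so `verify_pinned` rejects it even when the package name and version string are unchanged.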

After prompt injection occurs, what controls can minimize impact?

Mitigation:

  - Blast radius containment: Scoped, time-limited tokens (15-minute max) mean a compromised session can only touch resources explicitly granted to it. No ambient authority.
  - Behavioral anomaly detection: Monitor for deviations from expected patterns (an agent authorized for "calendar:read" suddenly attempting "email:send"). Auto-revoke on anomaly.
  - Output validation as last defense: Even if the agent is manipulated into harmful output, GrokingClaw catches policy violations before they reach external systems.
  - Cryptographic audit trail: Post-incident, signed action logs show exactly what the compromised agent did, bound to its identity with quantum-resistant signatures.
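The auto-revoke behavior can be sketched as a small session monitor. This Python example is illustrative only: `SessionMonitor` and its threshold parameter are hypothetical, and a production detector would look at richer signals than a single out-of-scope attempt.

```python
class SessionMonitor:
    """Revokes a session once it attempts scopes it was never granted (hypothetical sketch)."""

    def __init__(self, granted, max_violations: int = 1):
        self.granted = frozenset(granted)
        self.max_violations = max_violations
        self.violations = 0
        self.revoked = False

    def observe(self, scope: str) -> bool:
        """Return True if the action may proceed; revoke the session on anomaly."""
        if self.revoked:
            return False  # a revoked session can do nothing, even in-scope actions
        if scope in self.granted:
            return True
        self.violations += 1
        if self.violations >= self.max_violations:
            self.revoked = True
        return False
```

Once the threshold is reached, even previously valid actions are refused, which keeps a manipulated session from continuing to use its legitimate grants.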


7. Interoperability

The agent identity ecosystem is fragmented. A practical implementation must bridge standards, not pick one:

| Standard                 | Layer                                   | Our Integration                                               |
|--------------------------|-----------------------------------------|---------------------------------------------------------------|
| SPIFFE SVIDs             | Workload identity (agent type + instance) | issue --spiffe-trust-domain generates SPIFFE-compatible IDs |
| OAuth 2.1 + RFC 8693     | Delegation chains (MCP authorization)   | Token exchange with scope-narrowing enforcement               |
| A2A Agent Cards + JWS    | Agent-to-agent discovery + mutual auth  | export --format a2a generates compatible cards                |
| W3C DIDs                 | Cross-organizational identity           | DID-based identity for agents beyond enterprise boundaries    |
| RFC 9421 HTTP Signatures | Request-level authentication            | sign produces dual classical + PQ signature headers           |

These aren't competing standards — they operate at different layers. Our implementation shows how they compose: SPIFFE at the instance layer, A-JWT tokens (per arXiv:2509.13597) for MCP authorization, A2A Agent Cards for discovery, RFC 9421 signatures for request-level proof. The NCCoE project would benefit from demonstrating this layered integration rather than selecting a single protocol.
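For readers unfamiliar with request-level proof, the following Python sketch builds an RFC 9421 signature base over three derived components and emits the two headers. It is an assumption-laden illustration, not the sign command: it uses HMAC-SHA256 (an algorithm the RFC defines) because the Python standard library has no Ed25519 or ML-DSA, and the function names, key, and key ID are hypothetical.

```python
import base64
import hashlib
import hmac

COMPONENTS = ["@method", "@authority", "@path"]


def signature_params(created: int, keyid: str) -> str:
    """Serialize the covered components and parameters as an inner list."""
    ids = " ".join(f'"{c}"' for c in COMPONENTS)
    return f'({ids});created={created};keyid="{keyid}";alg="hmac-sha256"'


def signature_base(method: str, authority: str, path: str, params: str) -> str:
    """One line per covered component, then the @signature-params line, no trailing newline."""
    values = {"@method": method.upper(), "@authority": authority.lower(), "@path": path}
    lines = [f'"{c}": {values[c]}' for c in COMPONENTS]
    lines.append(f'"@signature-params": {params}')
    return "\n".join(lines)


def sign_request(method, authority, path, key: bytes, keyid: str, created: int,
                 label: str = "sig1") -> dict:
    """Return the Signature-Input and Signature headers for a request."""
    params = signature_params(created, keyid)
    base = signature_base(method, authority, path, params)
    mac = hmac.new(key, base.encode("ascii"), hashlib.sha256).digest()
    encoded = base64.b64encode(mac).decode("ascii")
    return {"Signature-Input": f"{label}={params}", "Signature": f"{label}=:{encoded}:"}


def verify_request(headers, method, authority, path, key: bytes, keyid: str,
                   created: int) -> bool:
    """Recompute the signature from the request and compare in constant time."""
    expected = sign_request(method, authority, path, key, keyid, created)
    return hmac.compare_digest(expected["Signature"], headers["Signature"])
```

A dual classical-plus-PQ scheme could carry two labeled signatures over the same base in these headers; how GrokingClawID encodes that is not shown here.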


8. Agent Payments and Economic Identity

This section addresses a gap in the concept paper. Agents are beginning to transact autonomously — Coinbase's x402 protocol enables machine-to-machine payments over HTTP 402, Stripe launched its Agentic Commerce Suite, and Coinbase deployed Agentic Wallets, all in early 2026. Identity and payment authorization are becoming inseparable. An agent that can prove its identity but cannot be authorized to transact is incomplete; an agent that can transact without verifiable identity is dangerous.

We've built an integrated wallet on IOTA Rebased testnet — chosen for native W3C DID/VC support, sponsored transactions (no gas fee complexity for agents), and Ed25519 alignment. The same hybrid Ed25519 + ML-DSA-65 key pair that signs an agent's identity card also derives its wallet address. One key pair, unified identity-and-payment credential. Agent-to-agent transfers verified on-chain.

The NCCoE project should consider agent economic identity in scope. The alternative — agents transacting with unverified identities — compounds every security risk already identified in this paper.


Recommendations for the NCCoE Project

  1. Prioritize MCP identity integration. MCP is becoming the dominant agent-tool protocol, and it has no identity layer. The LiteLLM attack exploited this gap directly. An OAuth 2.1 + SPIFFE integration demo for MCP would have immediate practical value.

  2. Post-quantum from day one. Agent identities may persist for years. NIST's own ML-DSA (FIPS 204) should be in agent identity specs now, not retrofitted later. We've demonstrated hybrid Ed25519 + ML-DSA-65 with sub-millisecond overhead. It's practical today.

  3. Demonstrate self-hosted implementations. Defense, fintech, and healthcare organizations in air-gapped environments need agent identity that works without internet. We've proven this is achievable in a sub-5MB binary with zero dependencies.

  4. Address multi-agent delegation chains. Single-agent identity is solved (it's workload identity). The hard problem is verifiable delegation across chains of agents, especially cross-organization. We've built and tested scope-narrowing delegation with cryptographic chain-of-custody.

  5. Include output validation in the trust model. Identity answers "who is this agent?" Authorization answers "what can it do?" Neither answers "is what it's doing correct?" A complete trust framework needs a validation layer. We've stress-tested ours: 10,000 validations, zero crashes.

  6. Address agent economic identity. As agents transact autonomously, the NCCoE should consider how identity and payment credentials compose. One key pair for both is architecturally cleaner than bolting payments onto a separate identity system.

  7. Reference real attacks, not hypotheticals. Trivy (March 19), Checkmarx KICS (March 23), LiteLLM (March 24) — all TeamPCP, one week — then axios on npm (March 31). Four supply chain compromises in 12 days, hitting security scanners, AI gateways, and HTTP libraries used by virtually every agent framework. The urgency should reflect this.


About GrokingClaw Labs

We build the trust layer for AI agents — identity, validation, and monitoring in fast, self-hosted Rust binaries.

| Product          | Status            | Capabilities |
|------------------|-------------------|--------------|
| GrokingClawID    | Built, tested     | Hybrid PQ identity (Ed25519 + ML-DSA-65), 9 commands, A2A export, SPIFFE IDs, RFC 9421 signatures, delegation chains, challenge-response auth, wallet, tamper-evident audit; 8,600 LOC Rust, 67/67 tests |
| NajaCoder        | Beta (v0.1.4)     | Natural Language Developer Environment: 9 providers, 18 built-in tools, cross-platform (macOS, Linux, WSL2/Windows) |
| GrokingClawWatch | Under Development | Behavioral monitoring, anomaly detection, cost tracking |
| NajaForge        | Under Development | CI/CD pipeline for AI-generated code, deterministic replay, approval gates |

All self-hosted, zero-dependency, built for air-gapped environments. GrokingClawID is free for up to 5 agents per account — agent identity infrastructure needs broad adoption to be meaningful. Core identity primitives will be open-sourced under a permissive license. The coordination and management layer (team provisioning, hosted validation, monitoring dashboards) is offered as a commercial product for larger deployments. Available as a reference implementation for NCCoE demonstration projects upon request.

Contact: Michael N Thornton, contact@grokingclaw.com
Website: grokingclaw.com
GitHub: Available upon request (pre-release)

GrokingClaw Labs seeks to participate as a technology collaborator in NCCoE demonstration projects, contributing working reference implementations for MCP identity integration, post-quantum agent credentials, and delegation chain verification.


The agent identity gap is being exploited — not in theory, but in production, this month. Four supply chain attacks in 12 days across PyPI and npm, hitting packages with 100M+ combined weekly downloads. We welcome the opportunity to contribute working code to NCCoE demonstration projects — particularly MCP identity integration, post-quantum agent credentials, and delegation chain verification.