On May 5, 2026, Anthropic released 10 agent templates for financial services, available as plugins for Claude Cowork and Claude Code, and as cookbooks in the Claude Managed Agents SDK. The same week, Anthropic announced that Claude Mythos Preview completed a 32-step end-to-end cyber attack scenario autonomously — a signal that agent autonomy is reaching a new tier of capability.
## What the Templates Actually Cover
Financial services work tends to be structured but judgment-heavy: reviewing earnings reports, checking documents against regulatory requirements, assessing portfolio risk. These tasks require reading, reasoning, and producing structured output — which is where an LLM with tools can add real value.
The 10 templates cover:
- Automated financial report summarization
- Regulatory compliance document review
- Portfolio risk assessment
- Transaction anomaly detection
- Investment scenario modeling for clients
Each template packages a complete agent configuration: system prompt, tool definitions, memory strategy, and sub-agent routing — not just a prompt.
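The post doesn't publish the SDK's template schema, but the pieces it lists can be sketched conceptually. Everything below is illustrative: the variable, field names, and values are hypothetical, not the Claude Managed Agents SDK's actual API.

```python
# Purely illustrative sketch of what one template bundles.
# All field names and values here are hypothetical, not the SDK's schema.
compliance_review_template = {
    # Domain-scoped system prompt
    "system_prompt": (
        "You are a compliance analyst. Review documents against the "
        "regulations returned by your tools and log every issue you find."
    ),
    # Tool definitions the agent may call
    "tools": ["lookup_regulation", "log_compliance_issue"],
    # Memory strategy: how long conversations get compacted
    "memory_strategy": "summarize_after_20_turns",
    # Sub-agent routing: when to hand off to a specialist
    "sub_agents": {
        "eu_specialist": {"trigger": "jurisdiction == 'EU'"},
    },
}
```

The point is that a template is a whole agent architecture in one artifact, which is why the four pieces above travel together rather than as a standalone prompt.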
## Architecture of a Financial Agent
Here's a simplified version of the compliance review pattern using the Claude Managed Agents SDK:
```python
import anthropic

client = anthropic.Anthropic()

def run_compliance_review(document: str) -> dict:
    tools = [
        {
            "name": "lookup_regulation",
            "description": "Look up applicable regulations for a specific topic in a given jurisdiction",
            "input_schema": {
                "type": "object",
                "properties": {
                    "topic": {"type": "string"},
                    "jurisdiction": {
                        "type": "string",
                        "enum": ["US", "EU", "JP"]
                    }
                },
                "required": ["topic", "jurisdiction"]
            }
        },
        {
            "name": "log_compliance_issue",
            "description": "Record a compliance issue identified in the document",
            "input_schema": {
                "type": "object",
                "properties": {
                    "section": {"type": "string"},
                    "description": {"type": "string"},
                    "severity": {"type": "string", "enum": ["low", "medium", "high"]}
                },
                "required": ["section", "description", "severity"]
            }
        }
    ]

    messages = [{"role": "user", "content": f"Review this document for compliance:\n\n{document}"}]

    # Agentic loop: keep calling the model until it stops requesting tools
    while True:
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        if response.stop_reason == "end_turn":
            # No more tool calls -- extract structured findings
            # (parse_findings is a helper, not shown here)
            return parse_findings(response.content)
        # Execute each requested tool and feed the results back to the model
        # (execute_tools is a helper, not shown here)
        tool_results = execute_tools(response.content)
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
```

Tool definition quality drives agent accuracy. Vague tool descriptions produce inconsistent behavior. Write tool descriptions that specify exactly what each tool does, what inputs it expects, and what a successful result looks like.
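To make that advice concrete, compare a vague description with a precise one for the same tool. Both dicts below are illustrative examples, not part of the templates:

```python
# A vague description gives the model almost nothing to reason with:
vague = {
    "name": "lookup_regulation",
    "description": "Looks up regulations",
}

# A precise description states purpose, expected inputs, and what a
# successful result looks like:
precise = {
    "name": "lookup_regulation",
    "description": (
        "Look up the regulations that apply to a given topic in one "
        "jurisdiction (US, EU, or JP). Input: a short topic string such as "
        "'customer onboarding' plus a jurisdiction code. Returns a list of "
        "regulation identifiers with one-sentence summaries, or an empty "
        "list if none apply."
    ),
}
```

The precise version tells the model when to call the tool, how to phrase its input, and how to interpret an empty result instead of retrying blindly.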
## Claude Mythos and the Autonomy Context
Mythos Preview's autonomous completion of a 32-step cyber attack scenario is notable context. It shows that multi-step, consequential agent actions are now within reach of frontier models. The financial agent templates are a more bounded version of this capability — constrained to a defined domain with explicit tools and clear task scope.
The implication: the technical capability for autonomous financial agents exists. The limiting factor is governance, not model capability.
## What to Build Before Going to Production

### Data governance first
Financial data contains PII and regulated information. Determine what you can legally send to the Claude API before building the pipeline. Apply masking or tokenization upstream. This is foundational, not optional.
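A minimal sketch of upstream masking, assuming a regex-based scrubber runs before any text leaves your infrastructure. Real deployments should use dedicated PII-detection tooling rather than ad-hoc patterns like these:

```python
import re

# Illustrative patterns only -- production systems need proper
# PII-detection tooling, not a handful of regexes.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ACCOUNT": re.compile(r"\b\d{10,16}\b"),
}

def mask_pii(text: str) -> str:
    """Replace detected PII with typed placeholders so the API sees
    structure ('an email was here') but never the raw values."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

masked = mask_pii("Contact jane@example.com about account 1234567890123.")
# masked -> "Contact [EMAIL] about account [ACCOUNT]."
```

Because the masking happens before the API call, the decision about what the model may see is enforced in your pipeline, not left to prompt instructions.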
### Tool failure handling
The templates assume external tools (market data APIs, compliance databases) work reliably. They don't. Design fallback behavior for every tool: what happens when a lookup fails, times out, or returns unexpected data.
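One way to implement this is a guard that wraps every tool, retries transient failures, and converts errors into a structured payload the model can reason about instead of crashing the agent loop. This is a sketch of the pattern, not code from the templates:

```python
import time

def call_with_fallback(tool_fn, args: dict, retries: int = 2, backoff: float = 1.0) -> dict:
    """Run a tool call defensively: retry timeouts, and return a
    structured error instead of raising, so the agent loop survives
    and the model can decide what to do next."""
    for attempt in range(retries + 1):
        try:
            result = tool_fn(**args)
            if result is None:
                # Unexpected empty data is a failure mode too
                return {"ok": False, "error": "empty_result"}
            return {"ok": True, "data": result}
        except TimeoutError:
            if attempt < retries:
                time.sleep(backoff * (attempt + 1))  # back off before retrying
        except Exception as exc:
            # Non-transient failure: report it rather than crash
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return {"ok": False, "error": "timeout_after_retries"}
```

Feeding the `{"ok": False, ...}` payload back as a tool result lets the model acknowledge the failure, try an alternative tool, or escalate, rather than hallucinating data it never received.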
### Human-in-the-loop gates
For consequential outputs — trade recommendations, compliance verdicts, risk ratings — build explicit human approval steps into the workflow. Use tool calls to pause the agent at these checkpoints and wait for confirmation before proceeding.
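A minimal sketch of such a gate: the dispatcher intercepts designated tool calls and blocks on an approval callback before executing. The tool names and callback interface are assumptions for illustration, not part of the templates:

```python
def gate_tool_call(name: str, args: dict, execute, approve) -> dict:
    """Dispatch a tool call, routing consequential tools through a human
    approval callback first. `approve` is any callable returning bool --
    in production, a review UI, ticket, or chat-ops hook that blocks
    until a reviewer responds."""
    CONSEQUENTIAL = {"submit_compliance_verdict", "recommend_trade"}  # hypothetical names
    if name in CONSEQUENTIAL and not approve(name, args):
        # Surface the rejection to the model so it can revise its plan,
        # rather than silently proceeding or dying
        return {"status": "rejected", "reason": "human reviewer declined"}
    return {"status": "ok", "result": execute(name, args)}
```

Routine lookups pass straight through; only the consequential set pays the latency cost of waiting for a human, which keeps the gate cheap enough to actually enforce.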
## Takeaway
The templates are worth studying even if you're not in financial services. The patterns — structured tool use, multi-step agentic loops, domain-scoped system prompts — apply broadly.
Treat them as a starting architecture, not a finished product. The customization you need (your data pipeline, your error handling, your approval flows) is where the actual engineering work happens.
Sources: Anthropic Blog, Air Street Press "State of AI: May 2026", blog.greeden.me