
Why the AI transformation in finance needs governance at the signing layer


About: How Turnkey helps teams govern AI agents that move money onchain with enclave-backed key management, policy-controlled signing, scoped wallet authority, and infrastructure-level controls that limit what an agent can do when it drifts.

Audience: Developers building AI agents, agentic finance products, trading systems, treasury automation, crypto payment flows, DeFi applications, autonomous wallet workflows, and infrastructure teams responsible for securing agent-controlled assets.

What you’ll learn:
  • Why more capable AI agents create longer autonomy windows and larger signing risks
  • How goal drift, prompt injection, poisoned data sources, and overbroad wallet permissions can lead to unauthorized asset movement
  • Why secure agentic finance needs more than prompts, key isolation, or model alignment alone
  • How Turnkey helps teams keep private keys out of the agent environment, enforce policies before signing, scope agent authority by wallet, chain, contract, function, recipient, and value, and make financial autonomy safer onchain

Reading time: ~9 minutes

When you hand an AI agent its own wallet and a job to do, how long can you trust it before it drifts off task?

To estimate an answer, we combined two lines of research: METR's time horizon research, which measures how long agents can work autonomously, and a peer-reviewed goal drift study co-authored at Apollo Research, which measures whether agents stay on task while they work.

What we found reframes what "governing" an AI agent with a wallet actually requires.

The two studies do not measure the same thing. One looks at how long agents can work. The other looks at whether they stay aligned with their instructions while working.

Taken together, they point to the same risk: as agents become capable of working for longer periods, teams will give them longer tasks. And the longer the task, the more opportunity there is for goal adherence to drift.

A concrete way to think about it is that the more autonomy an agent can handle, the more room it has to wander off its instructions partway through. For an agent that moves money, that can be extremely damaging.

The AI transformation in finance is an onchain transformation

When people talk about AI agents handling payments, treasury, or trading, they are describing software that holds funds, signs transactions, and acts without a human in the loop on each step.

This is why agentic finance and blockchain infrastructure are converging. An agent that can pay per request, settle instantly, and operate across chains needs to control a wallet directly, not file a request with a payment processor.

In these applications, autonomy is the point, and its value grows as more capable agents sustain it for longer and longer periods.

Left unchecked, though, that autonomy is risky.

In 2026, an attacker drained roughly $200,000 from a live agent-controlled wallet on Base using nothing more than a crafted message. The attacker didn’t steal a private key or exploit a smart contract. The agent was simply talked into signing. 

Context

What is the AI transformation in finance?

Banks have used AI for decades to analyze money: scoring credit, flagging fraud, forecasting risk. The transformation now underway is different in kind. AI agents are moving from recommending transactions to executing them, shifting payments from human-initiated instructions to agent-mediated decisions.

The IMF's note on agentic AI in payments calls out the core tension: probabilistic AI behavior meeting the deterministic requirements of payment infrastructure. Gartner projects that 40% of enterprise applications will be integrated with task-specific AI agents by the end of 2026, up from less than 5% today. The money is already in motion; the question is what governs it.

When an agent drifts, the wallet still needs to hold the line

Drift can come from accumulated patterns in the agent's context, conflicting tool outputs, or simply too many steps between the user's intent and the final transaction. By the time the agent asks for a signature, the request may still look legitimate. The agent is authenticated. The workflow is active. The wallet is available. But the action no longer matches the mandate.

This is the problem with giving an agent broad signing authority. If the only control is "the agent is allowed to use this wallet," then every drifted decision can become a financial action. The agent does not need to steal the key to misuse funds. It only needs to ask the wallet to sign something it should not sign.

That is why agentic finance needs governance at the signing layer. The wallet cannot rely only on the agent's current reasoning. It needs an independent policy boundary that checks every transaction against the user's intended limits before a signature is produced.

Those limits can be scoped by wallet, chain, contract, function, recipient, value, time window, approval rule, or any other policy the application requires. The agent can still operate autonomously, but only inside a defined financial perimeter.

In this model, drift does not automatically become loss. The agent may wander from its mandate, but the wallet refuses actions outside the mandate.
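A minimal sketch of what such a perimeter check could look like is shown below. The type names, field names, and the `evaluateRequest` function are hypothetical and chosen for illustration; this is not Turnkey's policy schema or API. The sketch is default-deny: a request is allowed only if every scoped dimension matches the mandate.

```typescript
// Hypothetical shapes for illustration only; not Turnkey's policy format.
interface AgentSigningRequest {
  walletId: string;
  chainId: number;
  contract: string;     // target contract address
  functionName: string; // decoded function the transaction calls
  recipient: string;
  value: number;        // simplified; a real system would use integer base units
}

interface Mandate {
  walletId: string;
  chainIds: number[];
  contracts: string[];
  functionNames: string[];
  recipients: string[];
  maxValuePerTx: number;
}

type PolicyDecision =
  | { allowed: true }
  | { allowed: false; reason: string };

// Addresses are compared case-insensitively in this sketch.
const sameAddress = (a: string, b: string): boolean =>
  a.toLowerCase() === b.toLowerCase();

// Default-deny: every scope must match before the request is allowed.
function evaluateRequest(req: AgentSigningRequest, mandate: Mandate): PolicyDecision {
  const deny = (reason: string): PolicyDecision => ({ allowed: false, reason });
  if (req.walletId !== mandate.walletId) return deny("wallet out of scope");
  if (!mandate.chainIds.some((id) => id === req.chainId)) return deny("chain not allowed");
  if (!mandate.contracts.some((c) => sameAddress(c, req.contract))) return deny("contract not allowed");
  if (!mandate.functionNames.some((f) => f === req.functionName)) return deny("function not allowed");
  if (!mandate.recipients.some((r) => sameAddress(r, req.recipient))) return deny("recipient not allowed");
  if (!(req.value >= 0 && req.value <= mandate.maxValuePerTx)) return deny("value outside per-transaction limit");
  return { allowed: true };
}

// Placeholder identifiers, not real addresses.
const mandate: Mandate = {
  walletId: "agent-wallet-1",
  chainIds: [8453], // e.g. Base
  contracts: ["0xTokenContract"],
  functionNames: ["transfer"],
  recipients: ["0xVendor"],
  maxValuePerTx: 100,
};

const onMandate = evaluateRequest(
  { walletId: "agent-wallet-1", chainId: 8453, contract: "0xTokenContract",
    functionName: "transfer", recipient: "0xVendor", value: 25 },
  mandate
);

// Same authenticated agent, same wallet, but a recipient outside the mandate.
const drifted = evaluateRequest(
  { walletId: "agent-wallet-1", chainId: 8453, contract: "0xTokenContract",
    functionName: "transfer", recipient: "0xAttacker", value: 25 },
  mandate
);
```

The second request looks legitimate at the authentication level, which is the situation described above; only the recipient check distinguishes it from the on-mandate request.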

The safe autonomy horizon

Here is where the two studies come in.

Research from METR on the length of tasks AI agents can complete found that the duration of work a frontier agent can handle at 50% reliability has been doubling roughly every seven months for six years, and more recent data suggests that pace has accelerated to around every four months. 

The direction of travel is clear: each cycle, agents can be trusted with longer, more autonomous tasks. That is what makes agentic finance possible: agents with enough sustained autonomy to transact freely in the economy.
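As a rough worked example of what those doubling times imply, assume clean exponential growth (a simplification of the METR trend, not their fitted model). The symbols are ours: $H_0$ is the task length handled at the start, $T$ is the doubling time, and $t$ is elapsed time in months.

```latex
H(t) = H_0 \cdot 2^{t/T}

% Six years (t = 72 months) at a seven-month doubling time:
\frac{H(72)}{H_0} = 2^{72/7} \approx 2^{10.3} \approx 1.2 \times 10^{3}

% Growth over a single year under each doubling time:
T = 7:\quad 2^{12/7} \approx 3.3
\qquad
T = 4:\quad 2^{12/4} = 8
```

Under these assumptions, the shift from a seven-month to a four-month doubling time moves yearly growth in task length from roughly 3x to 8x.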

But the second trend points the other way. A study on goal drift in language model agents, co-authored by Apollo Research director Marius Hobbhahn and published at the AAAI/ACM Conference on AI, Ethics, and Society, measured how well agents stick to an assigned objective as they operate.

The best agent in the study stayed almost perfectly on task across more than 100,000 tokens in the hardest setting. But every model drifted at least somewhat.

And drift increased as agents ran longer and faced more adversarial pressure.

In other words, goal adherence gets weaker over longer operations. That creates a problem for long-horizon agents. The more useful they become, the more time they spend in the zone where they are likely to drift from their mandate.

Capability is improving quickly, but there is less evidence that goal adherence is improving at the same pace.

That gap creates what we call the safe autonomy horizon: the point where an agent can keep working, but the risk of drift has grown high enough that continued autonomy becomes a liability.

Below that horizon, autonomy pays off. Beyond it, you are running an agent that can finish a task but probably will not finish the intended one. For an agentic wallet, "drifted partway through a long task" is just a polite way of saying funds were misused.

Framework

As agents run longer, goal adherence has more room to fall

[Figure: two curves plotted against duration of autonomous operation: task length handled (METR) and goal adherence (Arike et al.). The safe autonomy horizon separates the "autonomy pays off" region from the "drift-dominated" region.]

Illustrative. Axes not to scale; the two curves come from two separate studies.

How Turnkey's enclave-backed policy engine enforces governance at the signing layer

Key concept

Governance at the signing layer

Prompt layer

You tell the agent "thou shalt not" and trust it to remember. The rule lives in the prompt, and the prompt is exactly what the agent drifts away from.

Signing layer

You enforce the rule where the agent cannot reach it. Every signing request is checked inside the enclave before a signature exists, so the forbidden transaction is never possible, no matter what the agent decides. It is the difference between telling someone not to enter a room and never giving them the key.


Evals, monitoring, and human oversight all belong in the governance stack. For an agent that moves money, the signing layer is the floor: the one layer that holds when everything above it fails.

In agentic finance, the instinct is to use prompts to make the agent trustworthy enough to stay on task. The honest reading of the research is that prompts cannot fully achieve this, because drift is a property of how these models behave over long operations, and that is not something you can control from the outside.

So governance is not "make the agent never drift." Governance is "bound what a drifted agent is allowed to do." That is a different and far more achievable goal.

Policy engines secured within enclaves, like Turnkey's, are a hedge against this drift, with two distinct advantages:

  • First, Turnkey policies are executed inside secure enclaves, where they cannot be changed by outside bad actors or by the agents themselves. The same is true for the keys. No one, not even Turnkey, has access to them.
  • Second, policies create an environment where the agent is never given unrestricted signing authority. The agent can request an action, but the infrastructure decides whether that action is allowed. Instead of relying on a commandment in the prompt, the system enforces a boundary around what the wallet can actually do.  

A policy engine does not stop an agent from drifting. It limits what drift can do.

Without policy checks, every additional step gives the agent another chance to make an off-mandate decision. Over longer runs, that risk compounds.

With policy checks at the signing layer, each transaction is evaluated against the user's intent before a signature is produced. The agent can be wrong, but the wallet still cannot sign outside its allowed boundaries.

In effect, policy enforcement decouples financial blast radius from cognitive drift. It severs the link between "the agent made a bad decision" and "assets moved." That decoupling is what lets you operate an agent past the point where you would otherwise have to pull it back.
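To make the compounding in the no-policy case concrete, here is a small sketch. It assumes each action has the same independent probability `p` of being off-mandate, which is a simplification: real drift is correlated across steps, and the numbers used are illustrative rather than taken from either study. The function name is ours.

```typescript
// Probability that at least one of n actions is off-mandate, assuming each
// action independently goes off-mandate with probability p (illustrative model).
function probAnyOffMandate(p: number, n: number): number {
  if (!(p >= 0 && p <= 1) || !(n >= 0)) {
    throw new RangeError("p must be in [0, 1] and n must be non-negative");
  }
  return 1 - Math.pow(1 - p, n);
}

// Illustrative per-action probability; not a measured value.
const perActionRisk = 0.001;
const shortRun = probAnyOffMandate(perActionRisk, 10);
const longRun = probAnyOffMandate(perActionRisk, 1000);
console.log({ shortRun, longRun });
```

Without a policy check, that probability is also the probability that at least one off-mandate transaction is signed. With a check at the signing layer, the agent's probability of deciding badly is unchanged, but a bad decision only becomes a signed transaction if it also falls inside the policy's allowed boundaries.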

Governance

A policy engine caps what a drifted agent can do

[Figure: two curves plotted against operation time / number of actions, on a vertical scale from zero to near certain. "No policy": drift is inevitable. "With in-enclave policy": what a drifted agent can do stays capped.]

Illustrative. Policy does not stop drift; it caps what a drifted agent can do.

Turnkey: With agentic money movement, governance belongs in the infrastructure, not the prompt

If the goal is to bound what a drifted agent can do, then the binding has to happen somewhere the agent cannot reach. A rule written into the agent's own prompt is just more text the agent can be talked out of. The enforcement has to sit at the same trust boundary as the keys.

This is the argument for evaluating every signing request inside a secure enclave before any signature is produced, against policies scoped by recipient, contract, function, chain, and per-transaction value.

The agent authenticates and receives a signature; it never holds the key, and it cannot rewrite the rules that govern the key. Drift at the model layer becomes survivable because the infrastructure layer refuses anything outside the mandate.
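The trust boundary described here can be sketched as a gate that owns both the signing function and the policy, and hands the agent nothing but a request function. Everything in this sketch is hypothetical: `createSigningGate`, the request shape, and the injected `sign` and `isAllowed` callbacks stand in for an enclave's signer and policy engine and do not represent Turnkey's API. The `sign` callback in the usage below is a stub, not real cryptography.

```typescript
// Hypothetical request and result shapes for illustration.
type TransferRequest = { recipient: string; value: number };
type GateResult =
  | { ok: true; signature: string }
  | { ok: false; reason: string };

// The gate closes over the signer and the policy. The caller (the agent)
// receives only the returned function, so it can neither read the key
// material behind `sign` nor replace `isAllowed`.
function createSigningGate(
  sign: (req: TransferRequest) => string,
  isAllowed: (req: TransferRequest) => boolean
): (req: TransferRequest) => GateResult {
  return (req) => {
    // Policy is evaluated first; the signer is never invoked on a denial.
    if (!isAllowed(req)) return { ok: false, reason: "outside mandate" };
    return { ok: true, signature: sign(req) };
  };
}

// Usage with a stub signer that counts how often it is invoked.
let signerCalls = 0;
const requestSignature = createSigningGate(
  (req) => { signerCalls += 1; return "stub-signature-for:" + req.recipient; },
  (req) => req.recipient === "vendor" && req.value <= 100
);

const approved = requestSignature({ recipient: "vendor", value: 50 });
const refused = requestSignature({ recipient: "attacker", value: 50 });
```

The design choice worth noting is the ordering: because the policy check runs before the signer is called, a denied request never produces a signature at all, rather than producing one that is later discarded.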

For now, the takeaway is the reframing. A secure enclave protects the key. A policy engine protects the decision. An AI agent entrusted with money needs both, and as agents get more capable, it needs them more, not less.

Get started with Turnkey today.
