
When you hand an AI agent its own wallet and a job to do, how long can you trust it before it drifts off task?
To answer that, we combined two bodies of research: METR's time horizon work, which measures how long agents can work autonomously, and a peer-reviewed goal drift study co-authored at Apollo Research, which measures whether they stay on task while they do. Together, they let us draw some reasoned inferences.
What we found reframes what "governing" an AI agent with a wallet actually requires.
The two studies do not measure the same thing. One looks at how long agents can work. The other looks at whether they stay aligned with their instructions while working.
Taken together, they point to the same risk: as agents become capable of working for longer periods, teams will give them longer tasks. And the longer the task, the more opportunity there is for goal adherence to drift.
A concrete way to think about it is that the more autonomy an agent can handle, the more room it has to wander off its instructions partway through. For an agent that moves money, that can be extremely damaging.
The AI transformation in finance is an onchain transformation
When people talk about AI agents handling payments, treasury, or trading, they are describing software that holds funds, signs transactions, and acts without a human in the loop on each step.
This is why agentic finance and blockchain infrastructure are converging. An agent that can pay per request, settle instantly, and operate across chains needs to control a wallet directly, not file a request with a payment processor.
In these modern finance applications, agentic autonomy is the point, and more capable agents are expected to sustain that autonomy for longer and longer periods.
Left unchecked, though, autonomy is risky.
In 2026, an attacker drained roughly $200,000 from a live agent-controlled wallet on Base using nothing more than a crafted message. The attacker didn’t steal a private key or exploit a smart contract. The agent was simply talked into signing.
When an agent drifts, the wallet still needs to hold the line
Drift can come from accumulated patterns in the agent's context, conflicting tool outputs, or simply too many steps between the user's intent and the final transaction. By the time the agent asks for a signature, the request may still look legitimate. The agent is authenticated. The workflow is active. The wallet is available. But the action no longer matches the mandate.
This is the problem with giving an agent broad signing authority. If the only control is "the agent is allowed to use this wallet," then every drifted decision can become a financial action. The agent does not need to steal the key to misuse funds. It only needs to ask the wallet to sign something it should not sign.
That is why agentic finance needs governance at the signing layer. The wallet cannot rely only on the agent's current reasoning. It needs an independent policy boundary that checks every transaction against the user's intended limits before a signature is produced.
Those limits can be scoped by wallet, chain, contract, function, recipient, value, time window, approval rule, or any other policy the application requires. The agent can still operate autonomously, but only inside a defined financial perimeter.
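As a minimal sketch of what such a perimeter could look like, the Python below models a policy as plain data and checks a proposed transaction against it before any signature would be produced. Every name here (`Policy`, `TxRequest`, `evaluate`), every field, and the placeholder addresses are hypothetical illustrations, not Turnkey's policy language or API, and the checks cover only a subset of the scoping dimensions listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxRequest:
    """A transaction the agent asks the wallet to sign (hypothetical shape)."""
    chain: str
    contract: str
    function: str
    recipient: str
    value: int  # amount in the asset's smallest unit


@dataclass(frozen=True)
class Policy:
    """Hypothetical perimeter: anything not explicitly allowed is denied."""
    chains: frozenset
    contracts: frozenset
    functions: frozenset
    recipients: frozenset
    max_value: int


def evaluate(policy: Policy, tx: TxRequest) -> tuple:
    """Return (allowed, reason); intended to run before any signature exists."""
    checks = [
        (tx.chain in policy.chains, "chain not allowed"),
        (tx.contract in policy.contracts, "contract not allowed"),
        (tx.function in policy.functions, "function not allowed"),
        (tx.recipient in policy.recipients, "recipient not allowed"),
        (0 <= tx.value <= policy.max_value, "value outside limit"),
    ]
    for passed, reason in checks:
        if not passed:
            return False, reason
    return True, "ok"


# Example perimeter: one chain, one contract, one function, one payee, capped value.
# The addresses are placeholders, not real accounts.
policy = Policy(
    chains=frozenset({"base"}),
    contracts=frozenset({"0xTokenContract"}),
    functions=frozenset({"transfer"}),
    recipients=frozenset({"0xVendor"}),
    max_value=50_000_000,
)

on_mandate = TxRequest("base", "0xTokenContract", "transfer", "0xVendor", 10_000_000)
drifted = TxRequest("base", "0xTokenContract", "transfer", "0xStranger", 10_000_000)

print(evaluate(policy, on_mandate))  # (True, 'ok')
print(evaluate(policy, drifted))     # (False, 'recipient not allowed')
```

The design choice worth noting is the default: the check is an allowlist, so a request the policy author never anticipated is refused rather than signed.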
In this model, drift does not automatically become loss. The agent may wander from its mandate, but the wallet refuses actions outside the mandate.
The safe autonomy horizon
Here is where the two studies come in.
Research from METR on the length of tasks AI agents can complete found that the duration of work a frontier agent can handle at 50% reliability has been doubling roughly every seven months for six years, and more recent data suggests that pace has accelerated to around every four months.
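To make the doubling figures concrete, the short Python sketch below converts a doubling time into a growth multiple. It assumes a constant doubling time over the whole period, which is a simplification for illustration and not a forecast from METR; the function name `growth_factor` is ours.

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Multiple by which task length grows, assuming a fixed doubling time."""
    return 2.0 ** (elapsed_months / doubling_months)


# Six years (72 months) at a seven-month doubling time: roughly a 1,250x increase.
print(round(growth_factor(72, 7)))

# One year at each pace: about 3.3x at seven months, 8x at four months.
print(round(growth_factor(12, 7), 1))
print(growth_factor(12, 4))
```

Under that assumption, the difference between the two paces is large: a single year yields about 3.3 times longer tasks at the seven-month rate and 8 times longer at the four-month rate.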
The direction of travel is clear: each cycle, agents can be trusted with longer, more autonomous tasks. That is what makes agentic finance possible: agents with the long-term autonomy to interact freely within the economy.
But the second trend points the other way. A study on goal drift in language model agents, co-authored by Apollo Research director Marius Hobbhahn and published at the AAAI/ACM Conference on AI, Ethics, and Society, measured how well agents stick to an assigned objective as they operate.
The best agent in the study stayed almost perfectly on task across more than 100,000 tokens in the hardest setting. But every model drifted at least somewhat.
And drift increased as agents ran longer and faced more adversarial pressure.
In other words, goal adherence gets weaker over longer operations. That creates a problem for long-horizon agents. The more useful they become, the more time they spend in the zone where they are likely to drift from their mandate.
Capability is improving quickly, but there is less evidence that goal adherence is improving at the same pace.
That gap creates what we call the safe autonomy horizon: the point where an agent can keep working, but the risk of drift has grown high enough that continued autonomy becomes a liability.
Below that horizon, autonomy pays off. Beyond it, you are running an agent that can finish a task but probably will not finish the intended one. For an agentic wallet, "drifted partway through a long task" is just a polite way of saying funds were misused.
How Turnkey's enclave-backed policy engine enforces governance at the signing layer
In agentic finance, the instinct is to use prompts to make the agent trustworthy enough to stay on task. The honest reading of the research is that you cannot fully do that, because drift is a property of how these models behave over long operations, not something you control from the outside.
So governance is not "make the agent never drift." Governance is "bound what a drifted agent is allowed to do." That is a different and far more achievable goal.
Policy engines secured within enclaves, like Turnkey's, are a hedge against this drift, with two distinct advantages:
- First, Turnkey policies are executed inside secure enclaves, where they cannot be changed by outside bad actors or by the agents themselves. The same is true for the keys. No one, not even Turnkey, has access to them.
- Second, policies create an environment where the agent is never given unrestricted signing authority. The agent can request an action, but the infrastructure decides whether that action is allowed. Instead of relying on a commandment in the prompt, the system enforces a boundary around what the wallet can actually do.
A policy engine does not stop an agent from drifting. It limits what drift can do.
Without policy checks, every additional step gives the agent another chance to make an off-mandate decision. Over longer runs, that risk compounds.
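A rough way to see the compounding is to assume each step carries the same small, independent chance of an off-mandate decision. That assumption is ours, made purely for illustration; neither study reports a per-step probability, and real drift is unlikely to be independent across steps. Under it, the chance of at least one off-mandate decision over n steps is 1 - (1 - p)^n, sketched in Python below with the hypothetical function `p_any_off_mandate`.

```python
def p_any_off_mandate(p_step: float, steps: int) -> float:
    """Chance of at least one off-mandate decision across `steps` independent steps."""
    return 1.0 - (1.0 - p_step) ** steps


# Illustrative only: a 1% chance per step, evaluated at growing run lengths.
for steps in (10, 100, 1000):
    print(steps, round(p_any_off_mandate(0.01, steps), 4))
```

With a 1% chance per step, the cumulative figure is about 10% after 10 steps, about 63% after 100, and above 99.99% after 1,000, which is why run length matters as much as per-step reliability.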
With policy checks at the signing layer, each transaction is evaluated against the user's intent before a signature is produced. The agent can be wrong, but the wallet still cannot sign outside its allowed boundaries.
In effect, policy enforcement decouples financial blast radius from cognitive drift. It severs the link between "the agent made a bad decision" and "assets moved." That decoupling is what lets you operate an agent past the point where you would otherwise have to pull it back.
