I run my own AI the way you would run a small team - a handful of agents, an orchestration layer that hands work between them, a memory that holds what mattered yesterday. It is woven into how I work, every day.
I have never tried to measure what it gives back, because there is nothing standalone to measure. You cannot isolate its value any more than you can isolate the value of knowing how to read. It is simply how the work happens.
For a long time I assumed organisations would experience this the same way at scale. They do not. What one person experiences has to be turned, deliberately, into something the company can see, and the value stays invisible until someone does that.
The logic that works at home
When you use a tool personally, value is self-evident. You feel it, you come back to it, you tell someone else about it. The signal is direct and you are both the user and the judge. There is no measurement problem because there is no distance between the experience and the outcome.
Collective value works differently. The distance between a person saving an hour and an organisation seeing that hour on a P&L is enormous, and it does not close by itself. It closes through design choices: what the organisation actually wants the AI to change, who owns that, and whether the people making those calls are measured and rewarded for the outcome that follows.
In the programmes I see, the numbers that get tracked all measure activity: adoption, active users, hours saved. They are worth having, and they tell you the tool is being used. Whether it has actually produced anything is a harder question, and these numbers were never built to answer it. They can even point at the wrong people: the heaviest user on the dashboard might be saving time on routine work, while the real return sits with someone who reaches for it twice a week, on the decisions that carry weight. Six months in, the CFO asks what it has done for the business, and the honest answer is that nobody quite knows yet.
Where the measurement points next
A lending team I work with measures its AI time: hours by workflow type, reported on a cycle. That is the input, captured. It says nothing about the output, and before long that became the question on the table: what are those hours actually becoming?
Some of the freed capacity has gone into higher-value work. Some has been absorbed back into the day, the way freed time does when no one has decided where it should go. None of that is a failure. The measurement captures the input, exactly as built. The output is a separate decision, one level up.
That decision is the work in front of us, and it is where a lot of teams sit once the tracking is running: the hours are counted, the outcome is not yet chosen. The job now is to choose it, and build toward it.
Deciding the outcome up front
A financial review team I work with has been sampling their work for years. The volume made full review impossible: a human team can cover a fraction of the caseload deeply, or all of it shallowly. They made the rational choice, and built their workflows, their escalation process, and their sense of professional contribution around it.
The deployment we are building is designed around one target, set before any agent goes live: move the team from sampling to full coverage. The freed capacity is not meant to vanish into the day. It has a defined place to go. Every case that today falls into the unreviewed majority will get a judgement, and the team will work a queue of genuine exceptions where they have only ever seen a sample.
The outcome it is built around is one the organisation can put on a review and reward: coverage rate, exception quality, the cases where an issue is caught that a sample would have missed. The time saved will be real, and the bigger prize is a class of work that has never been possible becoming routine, legible to the people whose job it is to evaluate performance.
This team set its direction at the outset, with the detail to follow. The question of what better performance would look like under full coverage was asked, and answered in outline, up front. The measurement is being designed around that answer, built to capture the outcome they are aiming for.
What turns hours into value
These are two teams doing genuinely different work. What decides whether the hours turn into value is the same for both: a clear decision about what the outcome should be, and someone who owns it. It was never about the technology.
And this is where the value matures. Time saved is usually the first thing you see, and it is worth tracking. Left there, it stays an input. Followed through, those same hours turn into something the business can see and reward. The teams that pull real value let the measure travel with the work, from the hours returned to what those hours produced.
Here is how I see it working. Those hours come back in one of three shapes: lower cost, growth you could not staff before, or a risk taken off the table before it lands. Most workflows lean on one of them. Choose it on purpose, and build the work around that outcome.

The real design choice comes earlier than the tool itself. It is deciding what you are genuinely trying to get out of the work, and then measuring that in the terms the business already rewards.
The work that is human
The foundations that turn a deployment into an outcome the business can see are people foundations: org design and incentive work.
That means asking structural questions at the contract stage: what will the AI do, and what will the people do with what the AI returns? Who decides how the freed capacity is redeployed? Does the performance framework recognise the new output, or does it still measure the old activity? Is the team's sense of professional contribution tied to the volume they processed or the quality of the judgements they made?
These questions do not have easy answers, and most of the deployments I see do not ask them early enough. The ones that do are the ones where the value actually lands.
This is the work that makes AI pay. It is structural, and it is human. Without it, you have bought an expensive way to feel modern.
I write about AI adoption in commercial and corporate banking, and what actually happens when agents go into production. Covecta is the platform I help customers build these workflows on.




