February 7, 2026

What do recent reports actually show about LLM ROI?

A critical look at recent reporting on LLM ROI, adoption, and measurable impact.

Wooden walkway through a green forest.

Last week I gave a very basic description of how Large Language Models work, which forms the basis for assessing a small selection of articles today and in future posts.

But before moving on to the main topic for today, a brief follow-up on the Kanban program.

Packaging Kantan Kanban

I have finished minor refactoring and tests of the packaged version, and have uploaded the current beta version of Kantan Kanban to PyPI.

Anyone with Python installed who wants to give it a try can install it with: pip install kantan-kanban.

I’ll look into creating standalone executables in the future, but that is not a top priority at this time.

If you do try it out, please let me know about your experience! There are still a lot of features to add and improvements to make, but more feedback will help me prioritize what is most valuable.

Now back to the topic at hand.

A brief review:

  • LLMs are created by digesting vast amounts of textual data, resulting in complex numerical representations of words and their relationships
  • These relationships are refined through various processes, which prioritize whatever the organization creating the model desires
  • When a prompt or query is made, the model determines the statistically most likely response, taking into account those earlier refinements and whatever contextual information has been provided (a toy sketch of this next-word selection follows below)
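
To make that last bullet concrete, here’s a toy Python sketch of picking the statistically most likely next word given the context so far. It is purely illustrative - real models use neural networks over sub-word tokens rather than a lookup table of made-up probabilities - but the repeated "pick a likely continuation" loop is the core mechanic.

    # Toy illustration only: the table of "probabilities" below is invented.
    toy_model = {
        # context (last word) -> candidate next words with made-up probabilities
        "insurance": {"policy": 0.55, "broker": 0.30, "claim": 0.15},
        "policy": {"covers": 0.60, "number": 0.25, "lapsed": 0.15},
        "covers": {"flood": 0.40, "fire": 0.35, "theft": 0.25},
    }

    def generate(prompt_word, steps=3):
        words = [prompt_word]
        for _ in range(steps):
            candidates = toy_model.get(words[-1])
            if not candidates:
                break  # no known continuation for this context
            # pick the statistically most likely next word
            words.append(max(candidates, key=candidates.get))
        return " ".join(words)

    print(generate("insurance"))  # -> "insurance policy covers flood"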

The Impact of LLMs in Workplaces

I’d recommend reading the articles I’m going to discuss because I won’t be able to exhaustively cover everything they mention.

Experiment suggests AI chatbot would save insurance agents a whopping 3 minutes a day.

This article covers a small study performed by students at Dakota State University.

They customized Google’s Gemini to use RAG (Retrieval-Augmented Generation) - essentially an extension of the LLM’s normal word associations that heavily prioritizes text the user has provided (an insurance company’s documents, in this case). The goal was to have the LLM return search results for experienced independent brokers, and to compare the average search times against the existing software solution.
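
As a rough illustration of the general pattern (not the study’s actual setup - the document text, scoring, and prompt format below are all made up by me), a minimal retrieval-and-prompt step in Python might look like this:

    # Minimal RAG-style sketch: score the provided documents against the query
    # by simple word overlap, then prepend the best matches to the prompt so the
    # response is grounded in that text. Real systems typically use
    # embedding-based similarity search and an actual LLM API call instead.
    documents = [
        "Policy H-100 covers flood damage up to $50,000 with a $1,000 deductible.",
        "Policy A-200 covers auto collisions and includes roadside assistance.",
        "Claims must be filed within 30 days of the incident.",
    ]

    def retrieve(query, docs, k=2):
        q_words = set(query.lower().split())
        scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
        return scored[:k]

    def build_prompt(query):
        context = "\n".join(retrieve(query, documents))
        return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

    # The augmented prompt would then be sent to the LLM (Gemini, in the study).
    print(build_prompt("What does policy H-100 cover for flood damage?"))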

Of course there is the headline point: the LLM saved an average of ~3 seconds per query, which extrapolates to ~3 minutes saved a day. It’s important to mention here that the study assumes ~80 contacts per day where the LLM would be useful, while a third-party independent insurance broker estimated that 10-20 would be more likely. That is a huge discrepancy, and it drastically changes the ROI, as shown below; so does including the current subscription costs.

Both an author of the study and the Register author provide potential ROI calculations - check the link for the exact math.

The Register author calculates based on the lower estimate of 20 contacts a day and includes the yearly Gemini license, arriving at an annual loss of ~$131 (roughly a -52% ROI).

The study author, meanwhile, uses the estimate of 80 contacts a day and does not appear to include the yearly licensing cost, arriving at a 222% (daily) ROI.
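
To make the gap between those two calculations concrete, here’s a rough Python sketch of the arithmetic. Only the ~3 seconds saved per query and the 20 vs. 80 contacts-per-day figures come from the article; the hourly cost, license cost, and working days are placeholder assumptions of mine, so the outputs illustrate the sensitivity rather than reproduce either author’s exact numbers.

    # Rough annual ROI sketch. Only SECONDS_SAVED_PER_QUERY and the two
    # contacts-per-day scenarios come from the article; the rest are
    # placeholder assumptions for illustration.
    SECONDS_SAVED_PER_QUERY = 3      # ~3 seconds per query (from the article)
    HOURLY_COST = 30.0               # assumed fully loaded hourly cost, USD
    ANNUAL_LICENSE_COST = 300.0      # assumed per-seat yearly subscription, USD
    WORKDAYS_PER_YEAR = 250          # assumed

    def annual_roi(contacts_per_day):
        hours_saved = SECONDS_SAVED_PER_QUERY * contacts_per_day * WORKDAYS_PER_YEAR / 3600
        value_of_time = hours_saved * HOURLY_COST
        net = value_of_time - ANNUAL_LICENSE_COST
        return net / ANNUAL_LICENSE_COST * 100  # ROI as a percentage

    for contacts in (20, 80):
        print(f"{contacts} contacts/day -> annual ROI of roughly {annual_roi(contacts):.0f}%")

Even with generous placeholder values, whether the tool clears its subscription cost hinges almost entirely on how often it would actually be used - which is exactly where the study’s 80-contacts-a-day assumption and the independent broker’s 10-20 estimate diverge.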

There were two other data points I found interesting:
First: the error rates per query type. Only one of the four searches had an error rate that was nearly zero, two searches had error rates of ~5-10%, and the final one had ~16% of responses come back incorrect. This is noteworthy because it suggests that every LLM search would need to be validated externally, with a rough average of 8% of responses delivering incorrect information. Though for the types of searches and contacts they’re covering, perhaps the only downsides would be needing to redo a search, or somewhat lowered customer confidence.

In other contexts, where the risks could be higher, the possibility of incorrect information would need to be taken into account, along with whatever remediation steps might be needed.

Second, the actual time study is a little odd. It appears that they had two agents run predetermined searches using the LLM, rather than trying it with live customers. The issue is the sample size: 2 users running 4 searches for details on 8 policies, for a grand total of 64 searches. That seems like too small a study to draw any sort of firm conclusion.

Other interesting points: they only used experienced brokers, specifically so they could identify and combat “hallucinations”; and the study author states that there were no errors in the LLM responses, which I can only assume means they count an eventually correct response as correct even if the initial answer was incorrect.

If experience is needed to identify errors, and LLMs significantly alter the early work through which staff gain that experience, will future staff be equipped to identify and counter those errors? If not, what steps would be needed to reduce the risks down the road, and what other impacts would flow from this deskilling?

All in all, it’s a somewhat interesting study, but I don’t think it tells us much. The sample size is small, and the basis for some of their assumptions is questionable.

The study highlights the need to seriously consider how an LLM-based tool would slot into existing work and what would be needed for it to be worthwhile, while also suggesting that using an LLM as an alternative interface to existing tools and information may have a low return ceiling.

AI hasn’t delivered the profits it was hyped for, says Deloitte

The overall conclusions could be summed up as: most companies they spoke with want to grow revenue by using LLMs, but only 20% are doing so, and only 12% are both growing revenue and lowering costs.

But Deloitte doesn’t provide any information on the companies that say their revenues are up.

I would want to know whether there are even broad trends in terms of how it’s being implemented, which sectors these companies are in, what technologies are in place (RAG, agents, etc.), and what the scale of the revenue impact is (is it 1% or 50%?) - and those are just the questions off the top of my head.

The Register author raises a good point when they highlight that 66% of the respondents say it’s increasing efficiency and productivity, but only 20% are seeing revenue increases. So what are these companies actually experiencing? It’s impossible to say.

Similarly, I’ll highlight that 25% of respondents said LLMs are transforming their organizations. Given that, even in the absolute best-case scenario, at least 5% of respondents are claiming transformation while seeing no revenue uplift, what exactly are they referring to? I can’t imagine it’s improved staff cohesion, as LLMs would be taking the place of at least some staff interactions; what else could it be?

The last thing that really stood out to me: they said that non-technical staff had guarded but not negative views of LLM tools, but they didn’t report technical users’ opinions. I doubt the two groups are exactly in lockstep, and knowing what the IT department and similar teams think would be useful here.

Overall, I would advise skepticism about the Deloitte report. It raised more questions than it answered, and there was not enough detail to meaningfully evaluate its claims.

Microsoft spends billions on AI, converts just 3.3% of Copilot Chat users

This is another article that doesn’t demand a deep dive since it’s mostly a recap of a recent earnings call.

Even though Microsoft has been aggressively pushing LLM technology across its product stack, it has only managed to convert ~3.3% of Copilot Chat users into paying customers. That would seem to reinforce the takeaway from the time-study article, where the benefits were minuscule: the average user doesn’t see enough benefit to be willing to pay for more or better LLM service.

I also have to ask whether the stated number of Microsoft 365 Copilot seats reflects actual LLM users, considering the recent re-branding of Microsoft 365 as the Microsoft 365 Copilot App. There is enough ambiguity there that I would want to see actual numbers before commenting definitively.

Conclusion

Overall I think these three articles show two things.

  1. After 3 years of public usage of LLMs, there is still a lack of high-quality, transparent information on impacts within businesses. Some of that may stem from protecting proprietary information, but methodologically rigorous independent studies seem rare given the scale of discussion and adoption. As a result, adopters need to define their own metrics and measures of impact, rather than rely on broadly validated guidelines.
  2. There seems to be a disconnect between measurable outcomes and sentiment. In the Dakota State University study, the authors argue that the outcome is very positive, while the time savings are, in the best case, well under 5 minutes a day. Likewise, in the Deloitte survey many respondents reported increases in productivity and efficiency while seeing no revenue gains. Finally, the reported conversion rate for Copilot raises the question of whether this gap goes beyond perceptions of efficiency versus revenue. Together, these highlight the importance of actively checking for cognitive biases (e.g. optimism bias, conservatism bias).

I don’t think that there’s a single absolute conclusion we can draw from these three articles, but those points are worth considering.

When moving to a new technology or process, it’s important to look past isolated metrics and marketing to view the system and the full range of impacts a change would create.

Takeaway

If you’re somehow in the position of only now trying to add LLMs to your organization, it would be a good idea to press for detailed and concrete breakdowns of expected uses, ROI, plans for handling continued increases in subscription costs, and potential sentiment shifts among stakeholders. The less firm the answers to any of those concerns, the more cautious I would be in approving large commitments of money or staff time.

Whether for an organization or an individual: get clear on why you want to use LLMs, what real, concrete benefits you expect, and what you’re willing to spend (time and money), and identify the break-point that tells you whether it is worth continuing to use an LLM.

Next Time

I will continue this review by looking at some articles describing how LLMs are affecting and interacting with organizations and society at large.

Photo credit: Photo by Ali Kazal on Unsplash