The Honest State of Enterprise AI: Impressive Demos, Shaky Foundations

If you ask someone with “AI solution architect” on their resume to name five prompt engineering techniques—not explain them, just name them—you’re likely to be met with a long pause. Ask them what framework they use to evaluate which model fits a given use case, and the pause just gets longer.

Now, these aren’t trick questions. They’re the equivalent of asking a database engineer about indexing strategies, or a network engineer about routing protocols. But the equivalent baseline doesn’t exist yet in AI solution development. And that gap, more than anything else, is what’s slowly undermining enterprise AI efforts right now.

The Standardization Problem Nobody Wants to Name

When organizations are asked why their AI initiatives aren’t delivering the way they expected, they’ll default to familiar culprits: data quality, talent gaps, or change management. These are real problems, but they’re only symptoms. The root cause is something the industry is much less comfortable saying: there is no standardized architecture for implementing AI solutions end-to-end. Everyone is figuring it out as they go.

Think about how different that is from every other technology domain we’ve worked in. Take payments infrastructure. An engineer who’s worked at Visa can walk into a fintech startup and orient themselves quickly: the request comes in here, gets validated there, hits the queue, clears the ledger. The specific tools might differ, but the mental model is shared. Same with trading systems. Same with cloud architecture. When we moved workloads from data centers to the cloud, there were plenty of debates about approach, but not about the underlying model of how things fit together. A person moving between companies in any of those domains brings a durable mental model with them.

AI doesn’t have that yet. Walk into ten companies all running what they call AI agents, and you’ll find ten fundamentally different architectures. Most of them undocumented, many of them held together by one or two people who made the early decisions and haven’t written them down. The implementation patterns that make software reliable—how you evaluate models, how you handle failure cases, how you define handoffs between components—haven’t emerged in any consistent way. Every team is essentially starting from scratch, and they’re doing it while a new framework or platform drops every few weeks and tells them their current approach might already be obsolete.

Architectural instincts develop by building things, running them, watching where they break, and adjusting. If the tools keep changing before that cycle completes, the knowledge never accumulates.

Most Enterprise AI is Still in the Calculator Phase

Here’s a useful way to think about where we actually are. When calculators first came into workplaces, people used them but they still checked the math manually. They hadn’t yet decided to fully trust the output. The calculator was an assistant, not a replacement. Gradually, over years, that trust was earned, the checking stopped, and calculators became invisible infrastructure. Now we don’t think about them at all.

AI is in that early calculator period right now. Useful, genuinely impressive in places, but still supervised and double-checked rather than trusted to operate on its own. The real measure of AI having truly arrived in an enterprise context is when an agent completes a meaningful task without anyone having to prompt it, check behind it, or intervene. Things like travel booked automatically, a claim processed end-to-end, a shipment rerouted without a human in the loop. That’s not where most organizations are today, and there’s no shame in that. The problem arises when planning and investment decisions are made as though we’re further along than we are.

There’s also a related dynamic worth naming. A workforce analytics study by ActivTrak, across 160,000 employees in over 1,000 organizations, found that AI is actually increasing average workloads rather than reducing them: more messages, more coordination overhead, more validation work. That might seem counterintuitive, but it tracks. When a new tool arrives and trust in it isn’t fully established, people use it and then check the output anyway. You haven’t removed a step; you’ve added one. The efficiency gains come later, once the trust is built and the workflows are redesigned around it. Most organizations haven’t gotten there yet.

The Cost Surprise is History Repeating Itself

For anyone who was around for early cloud adoption, the AI cost conversation will feel familiar. The model looks affordable in a proof of concept. Then token usage compounds, concurrency limits hit, and the bill six months in looks nothing like the projection. There’s even a specific term for it: token runoff, usage spiraling past what was anticipated because nobody had a proper model for how consumption would scale.

Cloud taught us that “cheap at demo scale” and “cheap at enterprise scale” are two completely different statements. The early promise was that cloud would reduce costs dramatically. Then companies actually migrated, started receiving real bills, and discovered the nuance. AI is running through the same lesson, but at speed. Accessing an AI platform through an enterprise API is a fundamentally different proposition than using the consumer product. Different concurrency limits, different token quotas, different pricing behavior at scale. Without solid architectural planning upfront, there’s no principled way to model any of that before you’re already committed.
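To make the scaling dynamic concrete, here is a back-of-envelope token cost model. Every number in it is a hypothetical placeholder, not actual vendor pricing, but it shows why a demo-scale projection says almost nothing about an enterprise-scale bill:

```python
# Back-of-envelope token cost model. All prices and volumes below are
# hypothetical placeholders, not real vendor pricing.

def monthly_token_cost(requests_per_day, tokens_per_request, price_per_1k_tokens):
    """Project a monthly bill from daily request volume."""
    tokens_per_month = requests_per_day * tokens_per_request * 30
    return tokens_per_month / 1000 * price_per_1k_tokens

# A proof of concept: 200 requests/day, ~1,500 tokens each.
poc = monthly_token_cost(200, 1_500, price_per_1k_tokens=0.01)

# The same workflow at enterprise scale: 50,000 requests/day, with
# prompts that have grown to ~4,000 tokens as context gets added.
prod = monthly_token_cost(50_000, 4_000, price_per_1k_tokens=0.01)

print(f"PoC:  ${poc:,.2f}/month")    # $90.00/month
print(f"Prod: ${prod:,.2f}/month")   # $60,000.00/month
```

The gap isn’t linear in request count: prompt sizes grow at the same time volume does, which is the compounding that makes six-month bills diverge so sharply from the pilot projection.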

Your AI Output is a Probability, Not a Certainty

There’s a design assumption embedded in a lot of AI deployments that doesn’t get examined nearly enough: that the model’s output is an answer rather than an approximation.

A calculator is deterministic. Same inputs, same output, every single time. A language model, on the other hand, is probabilistic; it gives you its best approximation based on what it has been trained on, and that approximation can be excellent, or wrong, or confidently wrong in a way that’s difficult to detect on the surface. The difference matters enormously for how you design the systems around it.

So, how do you build systems that are honest about that uncertainty? What are the checkpoints? Where does a human need to be in the loop before an output triggers a consequential downstream action, and how is that checkpoint designed, not just assumed? These are the kinds of questions that mature engineering disciplines have documented answers to. In AI solution development, teams are largely solving them individually, which means the solutions don’t transfer, the lessons don’t accumulate, and the same design decisions get made, and sometimes unmade, from project to project.
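One common shape for such a checkpoint is a confidence gate in front of consequential actions: outputs above a threshold execute automatically, everything else is escalated to a person. The sketch below illustrates the pattern only; the threshold, action names, and routing are all hypothetical and would need to be designed per use case:

```python
# Sketch of a confidence-gated checkpoint: low-confidence model outputs
# are routed to a human reviewer instead of acting automatically.
# Threshold and action names are hypothetical illustrations.

from dataclasses import dataclass

@dataclass
class ModelOutput:
    action: str        # e.g. "approve_claim"
    confidence: float  # model-reported or externally estimated, 0..1

REVIEW_THRESHOLD = 0.90  # tuned per use case and risk tolerance

def dispatch(output: ModelOutput) -> str:
    """Decide whether an output executes automatically or is escalated."""
    if output.confidence >= REVIEW_THRESHOLD:
        return f"auto-executed: {output.action}"
    return f"queued for human review: {output.action}"

print(dispatch(ModelOutput("approve_claim", 0.97)))
print(dispatch(ModelOutput("approve_claim", 0.62)))
```

The design decision hiding in those few lines is exactly what goes undocumented today: who sets the threshold, what evidence justifies it, and what the review queue actually does with escalations.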

This is ultimately the same problem as the standardization gap described earlier, just surfacing in a specific and high-stakes place. The absence of shared architectural patterns means every team is discovering the hard way where probabilistic outputs create risk, rather than inheriting a framework that already knows.

This is Still Being Figured Out. That’s the Opportunity.

What separates the companies building real AI capability from those stuck in perpetual pilot mode isn’t the models they’re using or the budget they’ve allocated. It’s whether they’re treating implementation knowledge as something worth capturing.

That means building internal frameworks for evaluating models against specific use cases. Documenting why architectural decisions were made, not just what they were. Developing testing approaches that account for probabilistic outputs rather than assuming deterministic behavior. Treating prompt engineering and model evaluation as engineering disciplines with standards, not as informal skills individuals develop on their own.
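One way to make “testing probabilistic outputs” concrete is a pass-rate test: instead of asserting one exact answer, run the same input many times and require that a checker passes at or above a target rate. The sketch below uses a stubbed model call (hypothetical; in practice it would hit a real model API) and an arbitrary 80% threshold:

```python
# Pass-rate testing sketch for a probabilistic component. The model
# call is a hypothetical stub; the 0.8 threshold is illustrative.

import random

random.seed(0)  # fixed seed so the sketch is reproducible

def model_stub(prompt: str) -> str:
    # Stand-in for a model call that succeeds roughly 90% of the time.
    return "valid answer" if random.random() < 0.9 else "malformed"

def passes(output: str) -> bool:
    return output == "valid answer"

def pass_rate(prompt: str, n: int = 100) -> float:
    """Run the same prompt n times and return the fraction that pass."""
    return sum(passes(model_stub(prompt)) for _ in range(n)) / n

rate = pass_rate("summarize this claim", n=200)
assert rate >= 0.8, f"pass rate {rate:.0%} below threshold"
```

A deterministic assertion on a single output would be flaky by construction; asserting on the distribution is what makes the test meaningful for a probabilistic system.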

Every major platform shift goes through a period where everyone is building on uncertain ground. Cloud went through it. Mobile went through it. The ground eventually stabilizes, the patterns crystallize, and what once required heroic individual effort becomes something you can hire for and onboard into.

We’re still in the unstable period with AI, but not so early that there’s nothing to document or standardize. Someone will eventually write the playbook. It might as well be the people already in the middle of figuring it out.

Bhaskar Gandavabi

About Bhaskar Gandavabi

As Senior Vice President of Technology & Innovation at Fulcrum Digital, Bhaskar Gandavabi leads enterprise technology strategy, AI-led solutioning, and digital innovation across industries. He brings over 30 years of experience building scalable platforms, leading engineering teams, and delivering complex transformation programs.

Copyright © 2026 Featured. All rights reserved.