Over the last year, headlines around artificial intelligence have fixated on one thing: scale. Bigger models, bigger clusters, bigger training runs. But in the rush to measure progress by parameter counts and GPU hours, one dimension has remained critically underdiscussed: how and where will these models be deployed?
AI doesn’t generate economic value in a training cluster. It creates value when it runs: inference, not training. That means snap decisions in a self-driving car, speech interpreted locally on a wearable, or autonomous quality control on a factory floor. That shift is opening a new avenue for investors who understand that performance is increasingly a question of where and how AI runs, not only what it knows.
To understand what’s at stake, we spoke with Srinidhi Goud Myadaboyina, a senior machine learning engineer whose work spans Cruise, Amazon, and Cisco. He’s a published author in NTP, GSAR, and SARC, and a Globee Awards judge with deep expertise in deploying AI models in constrained, safety-critical environments. At Cruise, he’s led deployment for more than fifty models across LiDAR, radar, computer vision, and language-based systems—making him uniquely positioned to comment on why deployment, not training, is where AI becomes real.
Inference Over Training
“Everyone focuses on training, but in the AV world, you realize quickly that getting a model to run in the car, consistently and within timing constraints, is where the real wins happen,” Myadaboyina says.
Autonomous vehicles are perhaps the most demanding edge platforms. There’s no time to send queries to the cloud. Models must run locally, with strict timing guarantees. Even a modest delay in inference can collapse the vehicle’s decision-making window. Add power constraints, real-time sensor fusion, and fail-safe requirements, and you have an environment where many large models, even accurate ones, fail the deployment test.
In consumer electronics, logistics, robotics, and even cloud-native applications, inference is also where costs and reliability converge once systems reach production. Cloud providers increasingly report that inference makes up a growing share of AI compute costs, especially for applications serving massive query volumes. Investors who understand this are looking beyond the model zoo. The more telling questions aren’t about how big a model is, but how fast it runs, how reproducibly it behaves, and where it is deployed.
What Matters in Production AI
Myadaboyina’s work exemplifies how much leverage there is in treating deployment as a first-class problem. At Cruise, he’s implemented techniques like TensorRT acceleration, CUDA graphs, quantization, and speculative decoding, routinely achieving 10x–100x speedups with no drop in model quality.
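To make one of those techniques concrete, here is a minimal, hypothetical sketch of post-training dynamic quantization in PyTorch, converting a toy model’s weights to int8 for cheaper inference. The model and layer sizes are illustrative assumptions, not drawn from Cruise’s stack.

```python
import torch

# Minimal sketch of post-training dynamic quantization in PyTorch.
# The toy model below is hypothetical; real perception models are far larger.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

# Replace Linear layers with int8-quantized versions for lighter inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Run inference with the quantized model on a dummy batch.
with torch.no_grad():
    out = quantized(torch.randn(1, 512))
```

The same idea, applied with TensorRT engines and calibrated quantization on the vehicle’s accelerators rather than this toy CPU example, is where the large speedups come from.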
One of the trickier issues he’s addressed is precision divergence—the subtle but serious behavior changes that emerge when converting models from 32-bit to lower-bit formats for deployment. “You can get a model working in simulation, but once it’s on-device, you might see unpredictable behavior that’s hard to trace,” he explains. “Reproducibility issues can—and should—block a release.”
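To illustrate what such a release gate might look like, here is a small hypothetical sketch that runs a full-precision model and a reduced-precision copy on the same inputs and flags any divergence beyond a tolerance. The function name, tolerance, and bfloat16 stand-in are assumptions for illustration, not Cruise’s actual process.

```python
import copy
import torch

def divergence_check(reference_model, deployed_model, inputs, tol=1e-2):
    # Run both builds on identical inputs and measure worst-case output drift.
    # The tolerance is an illustrative placeholder, not a real release threshold.
    with torch.no_grad():
        reference = reference_model(inputs)
        deployed = deployed_model(inputs.bfloat16()).float()
    max_diff = (reference - deployed).abs().max().item()
    return max_diff, max_diff <= tol

# Toy example: a float32 model versus a bfloat16 copy standing in for the
# reduced-precision deployment build. A real gate would replay logged scenarios.
reference = torch.nn.Linear(512, 10).eval()
deployed = copy.deepcopy(reference).bfloat16()
diff, within_tol = divergence_check(reference, deployed, torch.randn(16, 512))
print(f"max divergence: {diff:.6f}, within tolerance: {within_tol}")
```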
These optimizations, which deliver better real-world performance with lower power draw and more predictable behavior, are becoming a key differentiator across the AI ecosystem.
Deployment Efficiency as a Market Signal
For investors evaluating AI companies, there’s now a second layer to technical diligence. It’s no longer enough to ask whether a company has trained a capable model. The follow-up question is becoming a key indicator of real-world traction: can the model be deployed in production, under latency constraints, across edge devices, with reproducible behavior?
According to Myadaboyina, one of the best leading indicators is whether a company has a mature model optimization and deployment pipeline. Look for cross-functional efforts between model design, systems engineering, and hardware teams. Strong partnerships with hardware vendors—especially those focused on edge accelerators—are also a positive signal.
He cites Cruise’s rollout of the FasterViT architecture as a case in point. “We saw a 15% improvement in object detection accuracy with no increase in latency,” he says. “It’s an end-to-end deployment win, made possible by close coordination between perception and infrastructure teams.”
This is the kind of result investors should be watching for: concrete gains in production performance that come from deployment engineering. Companies that prioritize it early tend to scale more reliably, with better margins and fewer surprises once systems are in the field.
Edge-Ready Means Market-Ready
For investors tracking AI, deployment is now the litmus test. It reveals which companies can operate in edge environments, scale without ballooning costs, and serve sectors where latency and power matter more than FLOPs.
That makes deployment not just an engineering concern, but a market moat. Companies that can optimize for edge contexts tend to scale more predictably, with better margins, fewer infrastructure surprises, and broader addressable markets.
For Myadaboyina, this is the next logical step in the maturation of AI. “We’ve come through the research phase, where it was about what’s possible,” he says. “Now we’re in the engineering phase, where the question is how to make it reliable and production-ready.”