
In the increasingly competitive AI chip market, there’s another startup in production that claims an advantage over Nvidia, the world’s most valuable company.
D-Matrix, located three miles away from Nvidia’s Silicon Valley headquarters, says its chips can run inference workloads 10 times faster, and with five times less energy, than a standalone graphics processing unit from the market leader, as long as the workloads are small.
The new inference chip, called Corsair, takes a novel approach to memory that’s similar to the designs used by Cerebras and Groq. With tech giants demanding all the computing resources they can get their hands on, it’s becoming clear that there’s substantial opportunity for smaller players to find their niche.
Cerebras, founded in 2015, held a blockbuster IPO last month, raising over $5.5 billion, and is now valued at over $50 billion. And Groq’s assets were bought by Nvidia for $20 billion in December, making it the AI giant’s largest purchase to date. Nvidia then released a new Groq chip at GTC in March, called a language processing unit.
“This is a $1 trillion market in the making,” D-Matrix co-founder and CEO Sid Sheth told CNBC in an interview, adding that he has no intention of selling the company. “Can the market support yet another public company? Absolutely.”
Founded in 2019, D-Matrix has raised around $500 million so far, putting it at around a $2 billion valuation. Microsoft was one of the investors, through its M12 venture arm. That’s notable because of Microsoft’s own chip ambitions, including its Maia 200 chip for AI inference, new PC processors built with Nvidia, and an in-house quantum computing chip announced last week.
Sheth won’t name Corsair customers yet, but said he has commitments from high-profile hyperscalers, neoclouds and frontier AI labs eager to get their hands on as much compute as possible. D-Matrix begins shipping to those customers this month. About 90% of them are in the U.S., while overseas customers are in the Middle East and Southeast Asia, Sheth said.
“Quite often they sell to customers to use this stuff in conjunction with Nvidia,” said semiconductor analyst Stacy Rasgon of Bernstein Research, adding that the different chips are better at different tasks. “Sounds like he’s got a fair number of actual, real customer engagements.”
D-Matrix’s Corsair chip achieves low-latency inference at low power by tightly integrating memory and compute on a single chip.
Like Groq and Cerebras, D-Matrix relies on SRAM, a type of memory that can be made at logic fabs like Taiwan Semiconductor Manufacturing Company and integrated on the same chip. GPUs rely on large amounts of another kind of memory called DRAM that’s packaged into stacks of high bandwidth memory added around the logic chip.
That DRAM is also what’s in short supply from Micron, Samsung and SK Hynix.
“We’re not running into a chokepoint around DRAM with our product because our product doesn’t really rely on DRAM to be successful,” Sheth said.
The big downside to D-Matrix’s approach is that SRAM can’t handle massive reasoning models, according to Rick Bahr, adjunct professor of electrical engineering at Stanford University.
While on-chip SRAM enables “remarkable inference speeds” because data has to travel such short distances, it can’t handle the trillions of parameters that now make up large models from leaders like OpenAI and Anthropic.
“That number of parameters just simply can’t be put onto an SRAM-based design,” Bahr said. “That’s the big challenge.”
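A rough back-of-the-envelope calculation illustrates the scale of the problem Bahr describes. The sketch below uses the 128-gigabyte-per-server SRAM figure that Sheth cites for Corsair. Everything else is an assumption made for illustration: one byte per parameter (8-bit weights), decimal gigabytes, hypothetical model sizes, and weights as the only thing stored. It is not D-Matrix's own sizing method.

```python
import math

# Per-server SRAM capacity cited by Sheth for Corsair, in decimal gigabytes.
SRAM_PER_SERVER_GB = 128


def servers_needed(num_params: int, bytes_per_param: float = 1.0) -> int:
    """Servers required to hold the model weights alone in SRAM.

    Assumption: 1 byte per parameter by default (8-bit weights).
    Ignores activations, caches and any replication, so real
    deployments would need more than this lower bound.
    """
    weight_bytes = num_params * bytes_per_param
    capacity_bytes = SRAM_PER_SERVER_GB * 10**9
    return math.ceil(weight_bytes / capacity_bytes)


# Hypothetical model sizes, chosen only to show the trend.
for label, params in [("8 billion", 8 * 10**9),
                      ("70 billion", 70 * 10**9),
                      ("1 trillion", 10**12)]:
    print(f"{label} parameters: {servers_needed(params)} server(s)")
```

Under these assumptions, a small model fits in a single server's SRAM, while a 1-trillion-parameter model would need at least eight servers for the weights alone, and twice that at two bytes per parameter.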
Sheth says Corsair is designed for AI inference, where “you’re optimizing for interactivity or speed” over model size. Think chatbots, voice agents and agentic tools like Claude Code and OpenClaw.
When paired with an Nvidia Blackwell GPU, D-Matrix says, citing research from Gimlet Labs, that Corsair can run inference 10 times faster, three times cheaper and up to five times more energy efficiently than a standalone GPU.
Nvidia CEO Jensen Huang said last week that his company remains the leader in low-cost inference with its leading Vera Rubin system because it’s not just about speed.
At Computex in Taiwan, Huang said “the reason for that is we integrate everything, we design everything from the ground up, we simulate the entire system and we use extreme co-design.”
D-Matrix sells four Corsair chips packaged together inside a card that slides into slots in a data center server rack and costs tens of thousands of dollars, Sheth said.
It’s a plug-and-play approach that differentiates D-Matrix from Cerebras and Groq, according to Sheth, who called Corsair the “densest SRAM solution in the market today,” with up to 128 gigabytes of SRAM in a single server.
D-Matrix also teamed up with Arista, Broadcom and Super Micro to build a full rack-scale system called SquadRack for deploying its chips in AI data centers.
The chip is made in Taiwan on TSMC’s 6-nanometer node. D-Matrix’s next chip, Raptor, is scheduled to launch next year on TSMC’s 4-nanometer node, which Sheth said could be produced at the Taiwanese company’s factory in Arizona.
“Building a computing solution for AI inference is going to be the grand prize,” Sheth said.

