Custom AI ASICs are growing faster than Nvidia GPUs for the first time, and the hyperscalers built the wedge themselves

A pair of analyst reports landing in the last week put a number on something the industry has been muttering about for two years. TrendForce now projects 44.6 percent year-over-year shipment growth for custom AI ASICs in 2026, against 16.1 percent for merchant GPUs. Counterpoint forecasts global AI server ASIC shipments tripling between 2024 and 2027. ASIC-based AI servers are expected to reach 27.8 percent of the total AI server market this year, the highest share since 2023. By 2028 ASIC shipments are projected to overtake GPU shipments outright.
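For a rough sense of scale, the tripling forecast can be turned into an annual rate and the two growth figures into a relative one. What follows is back-of-envelope arithmetic, not anything taken from either report: it assumes three annual compounding steps between 2024 and 2027, and it assumes the 44.6 and 16.1 percent figures describe comparable unit counts. The function name is made up for illustration.

```python
# Back-of-envelope arithmetic on the forecast figures quoted above.
# Assumption: "tripling between 2024 and 2027" means three annual steps.

def implied_cagr(multiple: float, years: int) -> float:
    """Constant annual growth rate that produces `multiple` after `years`."""
    return multiple ** (1.0 / years) - 1.0

# Annual rate implied by a 3x increase over three years.
tripling_cagr = implied_cagr(3.0, 3)

# How fast the ASIC-to-GPU shipment ratio grows in a year in which
# ASICs grow 44.6 percent and GPUs grow 16.1 percent.
# Assumption: both percentages are measured on comparable unit counts.
asic_growth = 0.446
gpu_growth = 0.161
ratio_growth = (1.0 + asic_growth) / (1.0 + gpu_growth)

print(f"Implied annual growth from tripling: {tripling_cagr:.1%}")
print(f"Annual growth in ASIC/GPU ratio:     {ratio_growth - 1.0:.1%}")
```

Tripling over three years works out to roughly 44 percent a year, in the same range as the TrendForce figure for 2026, though the two firms may not be counting the same thing. On the quoted 2026 rates, the ASIC-to-GPU shipment ratio would grow by roughly 25 percent in that year.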

The structural driver: Google’s TPU, AWS Trainium, Microsoft Maia, and Meta MTIA are all hitting full production at roughly the same moment. Those four companies are also Nvidia’s largest customers. Every dollar of internal silicon they ship displaces a dollar that previously went to Nvidia. Broadcom is the design and packaging partner on most of those programs, which is why Broadcom’s stock chart has, ahem, behaved the way it has. Marvell is the dark-horse second source, which is why Nvidia put $2 billion into Marvell earlier this year, a move that is either an ecosystem partnership or a sealed-envelope hedge depending on which analyst note you read.

Evercore’s framing is the cleanest. The buyer environment has flipped from a “max throughput and bandwidth” regime into an “inference-led regime” where the criteria are cost-per-token, power, cooling, utilization, and total cost of ownership. Purpose-built silicon wins on every one of those dimensions against general-purpose GPUs running the same workload. The general-purpose advantage was always about flexibility, and at hyperscale the workloads are now stable enough to design hardware against directly.
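To make those criteria concrete, here is a minimal sketch of how cost-per-token falls out of capital cost, power, cooling overhead, and utilization. It is an illustration under stated assumptions, not Evercore's model or any vendor's: every field name and every number is a hypothetical placeholder, and the formula ignores networking, staffing, software, and idle-power savings.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760

@dataclass
class Accelerator:
    # Every field is a hypothetical input, not a published vendor figure.
    capex_usd: float          # purchase price per device
    lifetime_years: float     # straight-line amortization period
    power_kw: float           # average electrical draw
    pue: float                # facility overhead multiplier (cooling, etc.)
    usd_per_kwh: float        # electricity price
    utilization: float        # fraction of hours doing useful work, 0..1
    tokens_per_second: float  # sustained throughput while busy

def cost_per_million_tokens(a: Accelerator) -> float:
    """Yearly hardware plus energy cost divided by yearly token output."""
    capex_per_year = a.capex_usd / a.lifetime_years
    # Simplification: the device is assumed to draw full power all year,
    # whether or not it is busy.
    energy_per_year = a.power_kw * a.pue * HOURS_PER_YEAR * a.usd_per_kwh
    tokens_per_year = (
        a.tokens_per_second * 3600 * HOURS_PER_YEAR * a.utilization
    )
    return (capex_per_year + energy_per_year) / tokens_per_year * 1e6

# Placeholder numbers chosen only to exercise the formula.
example = Accelerator(
    capex_usd=30_000, lifetime_years=4, power_kw=1.0, pue=1.3,
    usd_per_kwh=0.08, utilization=0.6, tokens_per_second=5_000,
)
example_cost = cost_per_million_tokens(example)
print(f"Hypothetical cost per million tokens: ${example_cost:.4f}")
```

Under this simplified formula, cost per token scales inversely with utilization and throughput, which is one way to see why stable, predictable workloads favor hardware designed around them.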

The Nvidia response is the obvious one: keep extending NVLink as a moat, lock in the next training generation while the customer ASICs are still inference-only, and use the AI factory narrative to sell rack-scale systems instead of cards. None of that is wrong. It is also a different business than the one that minted the trillion-dollar valuation. The first AI compute boom was a GPU shortage. The next one is a procurement diversification, and it is happening fastest in the customer accounts that drove the original boom. Strange way for a hardware cycle to peak, and a useful reminder that being the only viable supplier is a transient condition no matter how good your roadmap is.
