The narrative most American AI coverage settled on after the January 2025 DeepSeek release was that it was a one-time embarrassment for the US labs and that the export controls would eventually choke off Chinese capability. Fifteen months in, that narrative is not aging well. The Chinese open source ecosystem has shipped four major model families since DeepSeek R1 hit, and the gap on standard benchmarks has narrowed in nearly every category that matters for enterprise deployment.

The current state of play breaks down across roughly six labs. DeepSeek shipped V3 in December 2024 and R1 in January 2025, then released a refresh on the V3 base in late 2025 with better tool-use performance. Alibaba's Qwen team released Qwen3 in early 2025 and has been on a roughly quarterly cadence since, with Qwen3-Max reaching parity with mid-tier American closed models on coding tasks. Moonshot AI released Kimi K2 in mid-2025 and Kimi K3 this spring, focused on long-context performance up to two million tokens. Zhipu AI released GLM-4.5 in late 2025. ByteDance's Doubao team and Tencent's Hunyuan team have both released competitive models that get less English-language press because their primary deployment market is Chinese consumer apps.

The benchmarks that matter for enterprise customers have shifted in the past year. Pure language modeling benchmarks like MMLU have largely saturated. The conversations that drive purchasing decisions now center on coding benchmarks like SWE-Bench Verified, agentic benchmarks like TAU-bench, and long-context retrieval. On SWE-Bench Verified, the top open Chinese models are within 8 to 12 percentage points of Claude Sonnet and GPT-5, where they were 30 points behind in early 2025. On TAU-bench retail, the gap is roughly 6 percentage points. On long-context tasks above 500,000 tokens, the open Chinese models in some cases now lead.

The cost story is the part that has changed enterprise behavior. DeepSeek published API pricing of $0.14 per million input tokens and $0.28 per million output tokens in 2025, and the rest of the open-model field followed within a few cents per million tokens. Anthropic and OpenAI hold premium pricing on their flagship tiers, with Claude Opus and GPT-5 in the $15 per million input range. For workloads that do not require frontier reasoning, the math changed. A startup running a customer support chatbot with two billion tokens a month was looking at $30,000 monthly on Claude Sonnet and $560 on DeepSeek V3 for comparable performance on simple tasks.
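The arithmetic behind that comparison is simple enough to sanity-check. A minimal sketch, assuming a flat blended per-token rate for each provider (real bills split input and output tokens at different rates):

```python
# Back-of-envelope monthly bill check, assuming a single blended rate
# per provider. The rates below are the illustrative figures from the
# text, not live price quotes.

def monthly_cost(tokens: int, price_per_million: float) -> float:
    """Monthly spend in dollars for a token volume at a $/M-token rate."""
    return tokens / 1_000_000 * price_per_million

TOKENS_PER_MONTH = 2_000_000_000  # 2B tokens/month

print(monthly_cost(TOKENS_PER_MONTH, 15.0))   # flagship-tier rate, about $30,000
print(monthly_cost(TOKENS_PER_MONTH, 0.28))   # DeepSeek output rate, about $560
```

The roughly 50x spread is the whole argument: at these volumes, the price gap dominates any per-request quality difference on simple tasks.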

The deployment pattern that has emerged among American enterprises is the routing model. Companies do not simply switch from US to Chinese providers. They route different requests to different models based on complexity, sensitivity, and cost tolerance. The simplest 60 to 70% of requests go to a cheap open model, often DeepSeek or Qwen running on Together AI, Fireworks, or Groq. The middle 20 to 30% go to Claude Haiku or GPT-5 mini. The top 5 to 10% that actually require frontier reasoning go to Claude Opus or GPT-5. The combined effective cost per token is a fraction of running everything on the flagship tier.
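The routing pattern above can be sketched in a few lines. Everything in this sketch is illustrative: the complexity score, the thresholds, the tier names, and the per-million-token prices are assumptions chosen to match the rough traffic shares in the text, not any vendor's actual setup.

```python
# Sketch of a tiered model router. A real system would derive the
# complexity score from a trained classifier or request heuristics;
# here it is just a 0-1 input. Thresholds are tuned so roughly 65% of
# traffic hits the cheap open model, ~27% the mid tier, ~8% the flagship.

def route(complexity: float) -> str:
    """Pick a model tier from a 0-1 complexity score (illustrative cutoffs)."""
    if complexity < 0.65:
        return "open-model"   # e.g. DeepSeek/Qwen on US-hosted inference
    if complexity < 0.92:
        return "mid-tier"     # e.g. a small closed model
    return "flagship"         # frontier reasoning only

def blended_cost(mix: dict[str, float], price_per_m: dict[str, float]) -> float:
    """Effective $/M tokens for a traffic mix whose shares sum to 1."""
    return sum(share * price_per_m[tier] for tier, share in mix.items())

# Illustrative prices in $/M tokens, not quotes.
prices = {"open-model": 0.28, "mid-tier": 1.25, "flagship": 15.0}
mix = {"open-model": 0.65, "mid-tier": 0.27, "flagship": 0.08}

print(blended_cost(mix, prices))  # about $1.72/M, vs $15/M all-flagship
```

The design point is that the blended rate is dominated by where the cheap tier's threshold sits, which is why routing accuracy on the easy-versus-hard boundary matters more than which flagship sits at the top.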

The data residency question is the one that keeps Chinese-origin models out of regulated workloads. Banks, healthcare systems, and government contractors largely cannot send data to a Chinese-controlled inference endpoint. The workaround that has unlocked enterprise adoption is hosting the open weights on US infrastructure. Together AI, Fireworks, Groq, and the major hyperscalers all offer DeepSeek and Qwen inference on US hardware with US-only data flow. The contracts that get signed are with the US infrastructure provider, not the Chinese lab. The model weights themselves do not phone home.

The export control argument that the chip restrictions would prevent Chinese training has not played out the way the policy's architects intended. DeepSeek V3 was reportedly trained on a mix of H800 chips and a smaller cluster of older H100s acquired before the controls tightened. The disclosed training cost was $5.6 million, one to two orders of magnitude lower than typical American flagship training runs. Whether that number captures all the costs is debated, but the order of magnitude has been corroborated by other Chinese labs that have published similar figures.