The open vs. closed model debate is the wrong frame

The open vs. closed model debate is one of the most active in AI policy and product circles. It is also poorly framed. The argument is almost always about whether model weights are publicly available, and weight availability does not settle the questions people actually care about: safety, capability, concentration of power, and practical utility.

The thing that determines all of those questions is training data.

Why weights matter less than the debate assumes

Open weights — models where you can download the parameters and run them yourself — provide specific, real benefits. You can run inference without an API dependency. You can fine-tune for your domain without sending data to a third party. You can inspect the model architecture, understand what compute the inference requires, and optimize for your hardware. These are genuine advantages.

But open weights do not make a model “open” in the way that matters most for questions of safety, bias, capability, or alignment. The model’s behavior (what it knows, what it values, what it gets wrong, how it fails) is determined primarily by training. And the training data and process for Llama 3, Mistral, and most other “open” models are not public.

You can inspect the weights of Llama 3. You cannot inspect what went into training those weights: which documents shaped its world model, what was filtered out, or how post-training steps such as RLHF changed its outputs. The weights are open; the training is closed.

Who controls the training data

The training data question has two dimensions.

The first is capability: the model is only as good as what it was trained on. The massive crawl datasets (Common Crawl and derivatives) that form the backbone of most model pretraining are dominated by English, by certain time periods, by certain content types. The capability gaps that result are not random — they reflect what was in the training data.
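The skew described above is something you can only measure if you have the data, or at least its metadata. As a minimal sketch, assuming each document in a corpus carries a language tag, the share of each language can be counted directly. The function name `language_shares` and the ten-document sample are hypothetical and do not reflect real crawl statistics.

```python
from collections import Counter


def language_shares(doc_languages):
    """Return each language's fraction of the corpus, largest first."""
    counts = Counter(doc_languages)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.most_common()}


# Hypothetical language tags for a ten-document sample (made-up data).
sample = ["en"] * 7 + ["de", "ja", "sw"]
shares = language_shares(sample)

for lang, share in shares.items():
    print(f"{lang}: {share:.0%}")
```

With open weights alone, this kind of count is not possible: the distribution has to be inferred indirectly from how the model behaves.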

The second is alignment and risk: problematic behavior in deployed models almost always traces to training. A model that generates biased outputs was trained on biased data, or trained with a feedback process that encoded bias, or both. Open weights let you measure the output of that process. They don’t let you fix the inputs.

The concentration that actually matters

The concentration of power debate in AI usually focuses on whether a small number of companies control the frontier models. This is a real concern. But in practice, the concentration that matters more is control of the training infrastructure.

Training a frontier model requires billions of dollars of compute, access to a curated dataset at internet scale, and the engineering organization to coordinate both. That combination exists in only five or six entities globally: the major AI labs and the cloud providers backing them. Open weights do not change this concentration. They change who can use what those entities produce, not who controls what goes in.

Meta publishing Llama 3 weights is genuinely useful. It is also consistent with Meta maintaining significant control over what future Llama models look like, because Meta controls the training data collection, the training process, and the infrastructure. The weights are a gift; the training pipeline is not shared.

What the right frame would look like

A better frame for AI governance discussions focuses on training data accountability. What data was used? How was it collected? Did the people who created the content consent to that use, or were they compensated for it? And how does the training data distribution shape the model’s downstream behavior?

This frame is harder to reason about than open vs. closed, which may be why it doesn’t dominate the debate. But it is the frame that would produce better policy. Requiring training data disclosure and documentation is more tractable, and more consequential, than requiring weights disclosure.
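To make "disclosure and documentation" concrete, here is a minimal sketch of what a machine-readable disclosure record could look like, with one field per accountability question. The class `TrainingDataDisclosure`, its field names, and the example values are all hypothetical; this is an illustration, not an existing standard or any lab's actual format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TrainingDataDisclosure:
    """Hypothetical disclosure record; field names are illustrative only."""
    sources: Optional[str] = None                  # what data was used
    collection_method: Optional[str] = None        # how it was collected
    consent_or_compensation: Optional[str] = None  # creator consent or payment
    distribution_notes: Optional[str] = None       # languages, periods, content types


def unanswered(disclosure):
    """List the accountability questions a disclosure leaves blank."""
    return [f.name for f in fields(disclosure) if not getattr(disclosure, f.name)]


# Made-up record for an imaginary model.
record = TrainingDataDisclosure(
    sources="web crawl plus licensed books",
    collection_method="public crawl, deduplicated and quality-filtered",
)
missing = unanswered(record)
print(missing)
```

A record like this does not require publishing the data itself. It only makes the gaps visible, which is the point of the accountability frame.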

For enterprise buyers: the open weights vs. closed weights question matters for your deployment architecture (can you run it on your infrastructure, can you fine-tune it). The training data question matters for your risk assessment — it determines the failure modes you’ll encounter in production and the alignment properties of the model you’re deploying.

Both questions are worth asking. Only one of them is usually asked.
