Llama 4.5 lands, Meta keeps quietly funding the open-weight resistance

Meta released Llama 4.5 as open weights under the standard Llama community license, continuing one of the stranger long-term strategic bets in the industry: spend billions training frontier-adjacent models, then put them on Hugging Face. The architecture is a 600B-parameter sparse mixture-of-experts with roughly 17B parameters active per token. That active count, not the headline 600B, sets the per-token compute cost, while all 600B still have to sit in memory; together the two numbers decide what hardware can serve it. On a representative slice of reasoning and code evals, 4.5 lands within striking distance of the closed flagships, closer than any open-weight release to date.
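To see why the active count matters for compute, here is a rough sketch using the common rule of thumb of about 2 FLOPs per parameter touched per token in a forward pass. The constants come from the figures above; the rule of thumb ignores attention-over-context cost and router overhead, so treat the output as an order-of-magnitude estimate, not a measured number.

```python
# Back-of-envelope per-token forward compute: sparse MoE vs a dense model of the same total size.
# Assumption: ~2 FLOPs (one multiply, one add) per parameter used per token.
# Ignores attention-over-context cost and expert-router overhead.

TOTAL_PARAMS = 600e9   # all experts combined
ACTIVE_PARAMS = 17e9   # parameters actually used for a single token


def forward_flops_per_token(params_used: float) -> float:
    """Approximate forward-pass FLOPs for one token."""
    return 2 * params_used


def moe_savings(total: float, active: float) -> float:
    """How many times cheaper per token the sparse model is than a dense model of equal total size."""
    return forward_flops_per_token(total) / forward_flops_per_token(active)


if __name__ == "__main__":
    print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
    print(f"MoE forward: {forward_flops_per_token(ACTIVE_PARAMS) / 1e9:.0f} GFLOPs/token")
    print(f"dense 600B forward: {forward_flops_per_token(TOTAL_PARAMS) / 1e9:.0f} GFLOPs/token")
    print(f"ratio: {moe_savings(TOTAL_PARAMS, ACTIVE_PARAMS):.1f}x")
```

Under this approximation the sparse model does roughly 34 GFLOPs of forward work per token, about 35x less than a dense 600B model would, which is why the active count dominates latency and throughput even though it says nothing about memory.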

The deployment implication is where this stops being a benchmark conversation and starts being a budget conversation. With only 17B parameters active per token, per-token compute is modest, and at FP8 the full 600B of weights (about 600 GB) fits within a single 8x H200 node's 1,128 GB of HBM with room left for KV cache, so one node can serve the model at production latency without aggressive quantization. That is the exact inflection a lot of regulated buyers have been waiting on: open weights, frontier-adjacent quality, single-node inference, on-prem feasible. Expect the next 60 days to be heavy on private-cloud announcements from vendors who have spent two years getting blocked by their customers’ data-residency lawyers.
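A back-of-envelope check of the single-node claim follows. It assumes 141 GB of HBM per H200, decimal gigabytes, and that every expert must be resident in memory; it counts weights only, so KV cache and activations have to fit in whatever headroom is left. The precision table is illustrative, not a statement about which formats Meta ships.

```python
# Rough weight-memory check for serving a sparse MoE model on one 8-GPU node.
# Assumptions: 600e9 total params (all experts resident), 141 GB HBM per H200,
# decimal GB, weights only; KV cache and activations must fit in the headroom.

TOTAL_PARAMS = 600e9
HBM_PER_GPU_GB = 141
GPUS_PER_NODE = 8

# Illustrative bytes per parameter for common inference precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Weight footprint in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def fits_on_node(precision: str) -> tuple[bool, float]:
    """Return (fits, headroom_gb) for the weights alone at the given precision."""
    node_gb = HBM_PER_GPU_GB * GPUS_PER_NODE
    need = weight_gb(TOTAL_PARAMS, BYTES_PER_PARAM[precision])
    return need <= node_gb, node_gb - need


if __name__ == "__main__":
    for precision, bpp in BYTES_PER_PARAM.items():
        ok, headroom = fits_on_node(precision)
        print(f"{precision}: weights {weight_gb(TOTAL_PARAMS, bpp):.0f} GB, "
              f"fits={ok}, headroom {headroom:.0f} GB")
```

Under these assumptions, BF16 weights alone (1,200 GB) overshoot the node's 1,128 GB by about 72 GB, while FP8 weights (600 GB) leave roughly 528 GB for KV cache and activations. FP8 is generally treated as mild rather than aggressive quantization, so the single-node story holds at that precision.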

The open question is fine-tuning ergonomics. Fine-tuning sparse MoE models is still rougher than fine-tuning dense ones in the OSS tooling stack, and that gap will decide whether 4.5 displaces dense Llama 3.x in production workloads or just stays a frontier-curiosity download for people who like watching their GPU fans spin up.
