Meta released Llama 4.5 as open weights under the standard Llama community license. The architecture is a 600B-parameter sparse mixture-of-experts with roughly 17B active per token, which is the number that determines what hardware actually serves it. The headline: on a representative slice of reasoning and code evals, Llama 4.5 lands within striking distance of the closed flagships, closer than any open-weight release to date.
The deployment implication is the part operators should focus on. 17B active means a single 8x H200 node serves the model at production latency without aggressive quantization. That is the inflection a lot of regulated buyers have been waiting on: open weights, frontier-adjacent quality, single-node inference, on-prem feasible. Expect the next 60 days to be heavy on private-cloud announcements from vendors who have been blocked on data-residency arguments.
The open question is fine-tune ergonomics. Sparse MoE fine-tuning is still rougher than dense models in the OSS tooling stack, and that gap will decide whether 4.5 displaces dense Llama 3.x in real workloads or stays a frontier-curiosity download.