Mistral shipped a 128B open-weight model that opens its own pull requests, and the SWE-Bench number is two points off Claude

Mistral released Mistral Medium 3.5 this week, the company’s first flagship merged model: a single 128B dense set of weights that handles instruction-following, reasoning, multimodal input, and coding without separate specialist variants. License is the modified MIT, which lands the model squarely in the open-weights camp. Context window is 262,144 tokens. The custom vision encoder takes variable image sizes. Reasoning effort is configurable per request.

The headline number is the SWE-Bench Verified score: 77.6 percent. That is two points behind Claude Sonnet 4.6 at 79.6 percent, and it is the first time an open-weights coding model has gotten close enough to the proprietary frontier on this benchmark that an enterprise procurement deck can plausibly include it without footnotes. The remaining benchmarks (GPQA Diamond, MMLU-Pro, LiveCodeBench) have not been published by Mistral; expect the community numbers to land in the next two weeks.

The packaged-coding-agent piece is the part that should make Cursor and Windsurf nervous. Mistral is shipping the model alongside an agent that takes a task, plans the changes, edits the files, runs the tests, and opens a pull request against the target repo. Open weights, open license, self-hostable on four GPUs. The deployment story is “git clone, point at your codebase, get back a PR.” This is the part of the coding-agent value proposition that has been gated behind subscription paywalls all year, and it just became something you can run on a single Hopper box in your own VPC.

The European angle matters too. Mistral has been the political answer to “what if the open-weights ecosystem ends up being a Chinese-only thing,” and Medium 3.5 is the first model from the company that holds up against a top US closed model on a benchmark that enterprise buyers actually run. The EU AI Act’s foundation-model rules finalized two weeks ago made the regulatory environment around open weights inside Europe noticeably more workable than the US one. The Mistral pitch in late May is starting to look less like “scrappy French upstart” and more like “default infrastructure for EU government and large EU enterprise procurement,” which is a different and considerably more interesting business.

mistralopen-weightsswe-benchcoding-agentseu-aimistral-mediumhugging-face

Related briefs

Hugging Face's Inference API pricing changes and the open-source model hosting market

Llama 4.5 lands, Meta keeps quietly funding the open-weight resistance

OpenAI bills Microsoft and Stripe per merged PR, sees how that goes