Qwen3 was apparently posted early, then quickly pulled from HuggingFace and ModelScope. The large ones are MoEs, per screenshots from Reddit:
Including a 235B model with 22B active parameters and a 30B model with 3B active.
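For a sense of how sparse those would be, a quick back-of-the-envelope sketch (the parameter counts come straight from the screenshots, and the repo-style names are my guess at the leaked naming, so treat both as provisional):

```python
# Active-parameter fraction implied by the leaked MoE sizes.
# Numbers are from the Reddit screenshots; names are hypothetical.
models = {"Qwen3-235B-A22B": (235, 22), "Qwen3-30B-A3B": (30, 3)}

for name, (total_b, active_b) in models.items():
    print(f"{name}: {active_b / total_b:.1%} of weights active per token")
# -> roughly 9.4% and 10.0%: both quite sparse MoEs.
```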
Context appears to ‘only’ be 32K unfortunately: https://huggingface.co/qingy2024/Qwen3-0.6B/blob/main/config_4b.json
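If you'd rather sanity-check the config yourself than trust screenshots, here's a minimal sketch. It assumes the standard Hugging Face transformers config schema (`max_position_embeddings`, `rope_theta`, `rope_scaling`); the repo and filename come from the link above and may vanish or change with the official release:

```python
import json
from urllib.request import urlopen

# Fetch the leaked config and print the context-window fields.
# Note: /resolve/ instead of /blob/ to get raw JSON rather than the HTML page.
URL = "https://huggingface.co/qingy2024/Qwen3-0.6B/resolve/main/config_4b.json"

with urlopen(URL) as resp:
    cfg = json.load(resp)

print("max_position_embeddings:", cfg.get("max_position_embeddings"))  # trained context window
print("rope_theta:", cfg.get("rope_theta"))      # RoPE base frequency
print("rope_scaling:", cfg.get("rope_scaling"))  # non-null if long-context extension is applied
```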
But it's possible they're still training them out to 256K:
Take it all with a grain of salt; the configs could change with the official release, but it appears the launch is happening today.
Seems that there are both dense and sparse models in this launch, as with the 1.5 release. This “leak” (for instance) references what appears to be a real Qwen3 32B: