GPT Proto
Schuyler Stacy, 2026-06-24

What Is Wan 2.7? Guide to Alibaba's Thinking-Mode Model (2026)

Most guides say Wan 2.7 is open source. The first-party evidence says otherwise. What Alibaba's April 2026 model actually ships — and how to run it.

Search "wan 2.7" and you get two answers that can't both be true. One set of guides tells you to download the weights and run it on your own GPU. Another says you can only reach it through an API. I went looking for which one is right, because the answer decides whether you can self-host this model at all — and the short version is that most of the confident "it's open source" posts are repeating a habit, not a fact. Here's what Wan 2.7 actually is, what Alibaba has and hasn't shipped, and how to put a Wan model into production today.

What Wan 2.7 actually is

Wan 2.7 is not a single model. It's a release wave from Alibaba's Tongyi Lab that landed in early April 2026 and spans two media at once. There's an image side (Wan2.7-Image and a separate Wan2.7-Image Pro, which the launch materials date to around April 1) and a video side: a four-model suite covering text-to-video, image-to-video, reference-to-video, and instruction-based video editing, which rolled out over the following days. That dual scope is exactly why search results disagree about whether Wan 2.7 is an "image model" or a "video model." It's both, shipped together under one version number.

The name itself is worth pinning down, because it trips people up. "Wan" is the international shorthand for Tongyi Wanxiang (通义万相), Alibaba's creative-AI line. It sits next to the Qwen language models under the same corporate roof but is a different product family. So when a hosting platform files Wan under a qwen path, that's a cataloging choice, not a claim that Wan is a Qwen model.

One-line version: Wan 2.7 is Alibaba's April-2026 image-and-video generation wave, headlined by a feature it calls Thinking Mode.

How Thinking Mode works

Before the mechanism, the reason it exists. A standard video model maps your prompt straight to frames. There's no pause to reason about what you asked for — it reads the text, picks a path through its latent space, and decodes. That's fine for a single clean shot. It gets fragile the moment you ask for something with structure: a multi-shot sequence, a character who has to look the same in shot three as in shot one, a camera move that has to resolve on a specific beat. The model rushes the brief and the seams show.

Thinking Mode, as Alibaba describes it, inserts a planning stage ahead of generation. The model first builds a compositional plan — how to read the prompt's intent, how elements relate in space and time, what the shot logic should be — and only then generates. The pitch is fewer artifacts and better coherence across a sequence.
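To make the plan-then-generate idea concrete, here is a toy sketch of the two stages. It is not Alibaba's implementation, which isn't public: the Shot class, the plan and generate functions, and the beat regex are all hypothetical names I made up for illustration, and the "renderer" is a stub that only labels each shot.

```python
import re
from dataclasses import dataclass

# Matches timestamped beats such as "[0-3s] fast forward motion" in a prompt.
BEAT = re.compile(r"\[(\d+)-(\d+)s\]\s*([^\[]+)")


@dataclass
class Shot:
    start: int
    end: int
    action: str


def plan(prompt: str) -> list[Shot]:
    """Stage 1 (toy): turn the brief into an ordered shot list."""
    shots = [Shot(int(a), int(b), text.strip()) for a, b, text in BEAT.findall(prompt)]
    # A prompt without beats is treated as a single open-ended shot.
    return shots or [Shot(0, 0, prompt.strip())]


def generate(shots: list[Shot]) -> list[str]:
    """Stage 2 (stub): stands in for the renderer and only labels each shot."""
    return [f"shot {i} ({s.start}-{s.end}s): {s.action}" for i, s in enumerate(shots, 1)]


brief = "[0-3s] glide through a stone tunnel. [3-5s] emerge into a sunlit forest."
for line in generate(plan(brief)):
    print(line)
```

The point of the split is that stage 2 never sees the raw prompt, only the plan, so anything the planner resolves (shot order, timing) is fixed before a single frame would be produced.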

I'd treat the size of that gain as an open question rather than a settled number, and I'll come back to why in a moment. But the design idea is sound and it's the one genuinely new thing here. Takeaway: Thinking Mode trades a bit of speed for a planning pass, which matters most when your prompt has more than one moving part.

Key capabilities — and how much to trust each number

Here's where confidence layering matters, because the capability list for Wan 2.7 comes almost entirely from Alibaba's own launch notes, and vendor numbers deserve a label.

Alibaba reports the following: character consistency strong enough to hold a face across a clip (its fix for the "AI same-face" problem), precise color control that accepts HEX values and palettes, long-text rendering of 3,000+ tokens across 12 languages including tables and formulas drawn into frames, multi-shot narrative control, first-and-last-frame guidance, reference-to-video that accepts up to nine reference images, and native audio generated in the same pass. Those are the vendor's claims, presented as the vendor's claims. I have not been able to verify them against a neutral test.

On the hard specs, the picture is fuzzier than most posts admit. Output resolution is quoted as 720p or 1080p depending on where you run it, clip length lands somewhere in the 2-to-15-second range, and the image model is the part that carries the 4K figure. These vary by host, so treat any single number you see as "true on that platform" rather than "true of the model." Generation is also not instant — early hands-on reports describe it as noticeably slower than lighter video models, which is the quiet cost of the planning pass and worth budgeting for if you're rendering at volume.

And the honest gap: as of writing, I could not find Wan 2.7 on a neutral leaderboard — no Artificial Analysis Video Arena entry, no published VBench score for this version. That doesn't mean it's bad. It means the quality claims floating around are still vendor-reported, full stop. The closest verifiable anchor is the lineage: Wan 2.1 topped VBench at its February-2025 launch, which is a real result for an earlier model, not a stand-in for 2.7.

| Claim type | Examples | How much to trust it |
| --- | --- | --- |
| Verifiable now | 2.1/2.2 are open-weight; 2.1 topped VBench at launch | Confirmed on first-party repos and the benchmark |
| Vendor-reported | Thinking Mode gains, color/text/consistency specs | Alibaba's word; no neutral test yet |
| Varies by host | 720p/1080p, 2–15s, 4K (image) | Platform-dependent, not a fixed model spec |

Is Wan 2.7 open source?

Short answer first: not in the way the Wan series trained everyone to expect.

Here is the fact. Wan 2.1 (February 2025) and Wan 2.2 (July 2025) shipped as open weights under Apache 2.0 — downloadable, self-hostable, fine-tunable, with no commercial restriction. You can confirm that yourself on the Wan-AI organization on Hugging Face and the Wan-Video organization on GitHub. That open-weight track record is real, and it's the reason so many guides assume 2.7 is open too.

Here is the second fact. Those same official channels — the GitHub org, the Hugging Face org, Alibaba's own ModelScope — do not list Wan 2.7 weights. As of writing, the newest open-weight release on the first-party trail is still the Wan 2.2 family (the A14B text-to-video and image-to-video models, the 5B TI2V, plus the later S2V and Animate variants). The official news log on that repo stops well before any 2.7 entry.

So when a page tells you Wan 2.7 is "open-weight under Apache 2.0," check whether it links to an actual first-party weights repository. In every case I traced, it didn't — the claim rode on the series' reputation, and at least one widely-cited source contradicted itself across its own pages. My read: Wan 2.7 follows the API-only path Alibaba already took with Wan 2.5 and 2.6, not the open path of 2.1 and 2.2. I'd hold that as a judgment, not gospel — if Alibaba uploads the weights next week, this changes — but today the verifiable evidence points one way.
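If you'd rather check the first-party trail yourself than take my word for it, here is a minimal sketch. It assumes the third-party huggingface_hub package is installed and that a repo for a given version would carry "Wan<version>" in its name, the way the 2.1 and 2.2 repos do; repos_matching and check_org are hypothetical helper names, and a name match is only a hint, not proof that weights are actually in the repo.

```python
def repos_matching(repo_ids, version):
    """Return repo ids whose name mentions e.g. 'Wan2.7' (case-insensitive)."""
    needle = f"wan{version}".lower()
    # Dropping "-" and "_" lets spellings like "Wan-2.7" match as well.
    return [
        rid for rid in repo_ids
        if needle in rid.lower().replace("-", "").replace("_", "")
    ]


def check_org(version, org="Wan-AI"):
    """List an org's public model repos on the Hugging Face Hub (network call)."""
    from huggingface_hub import HfApi  # third-party: pip install huggingface_hub

    repo_ids = [m.id for m in HfApi().list_models(author=org)]
    return repos_matching(repo_ids, version)


# Offline demonstration with a few sample repo ids from the open Wan releases.
sample = ["Wan-AI/Wan2.1-T2V-14B", "Wan-AI/Wan2.2-T2V-A14B", "Wan-AI/Wan2.2-TI2V-5B"]
print(repos_matching(sample, "2.2"))
print(repos_matching(sample, "2.7"))
```

Calling check_org("2.7") runs the same filter against the live listing. An empty list means no repo under that org carries the version in its name at the time you run it.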

What this means in practice is a clean decision. If your project genuinely needs the weights — local inference, fine-tuning, data that can't leave your infrastructure — Wan 2.7 won't give you that today, and your real open option remains Wan 2.2. If you can work through an API, 2.5, 2.6, and 2.7 are all available that way. Don't pick 2.7 expecting a download that isn't there.

Which Wan should you actually use?

The family splits cleanly once you stop treating "newest" as "best for you."

| Version | Access | Resolution / length | Notable | Pick it when |
| --- | --- | --- | --- | --- |
| Wan 2.2 | Open weights (Apache 2.0) | 720p, ~5s | MoE, runs on a 4090 | You must self-host or fine-tune |
| Wan 2.5 | API only | 1080p, ~10s | Native audio | You want audio without self-hosting |
| Wan 2.6 | API only | 1080p, up to 15s | Multi-shot, character identity | You need longer, multi-shot clips |
| Wan 2.7 | API only | 720p/1080p, 2–15s | Thinking Mode, first/last frame, image+video | You want planning + reference control via API |
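The same decision can be written as a few lines of code. This is a sketch of the table's "pick it when" column, nothing more: pick_wan and its flag names are hypothetical, and the priority order (weights first, then planning, then clip length, then audio) is my own reading of which constraint is hardest to work around.

```python
def pick_wan(needs_weights=False, needs_planning=False,
             needs_long_multishot=False, needs_audio=False):
    """Map a project's hard constraints to a Wan version, per the table above."""
    if needs_weights:
        # Self-hosting or fine-tuning rules out every API-only release.
        return "Wan 2.2"
    if needs_planning:
        # Thinking Mode and reference control are the 2.7 additions.
        return "Wan 2.7"
    if needs_long_multishot:
        return "Wan 2.6"
    if needs_audio:
        return "Wan 2.5"
    return "no hard constraint: compare hosted rates"


print(pick_wan(needs_weights=True, needs_planning=True))
print(pick_wan(needs_long_multishot=True, needs_audio=True))
```

Note that the first call returns Wan 2.2 even though planning was requested: if the weights are non-negotiable, Thinking Mode is off the table today.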

Where does Wan 2.7 sit against non-Wan models? In open-source video, the Wan line has been the reference point since 2.1 — that lineage is its real claim. Among closed API models, 2.7's differentiator is Thinking Mode plus the shared image-and-video stack, not a proven quality lead over the other current API players. I'm deliberately not putting numbers on a Wan-2.7-vs-Seedance or vs-Kling table here, because there's no neutral benchmark to back one yet and I won't manufacture one. If you want a real head-to-head, that deserves its own tested comparison rather than a throwaway row in an explainer.

How to use a Wan model via API today

Straight talk: if you're reading this on GPT Proto, note that 2.7 itself isn't in the catalog — the closest hosted sibling with the same profile (1080p, multi-shot, native audio) is Wan 2.6, at $0.45 per run. Here's a request that actually runs against it.

There are a few gotchas worth flagging before you paste anything, because they've cost people time. First, auth uses a Bearer token: these Wan endpoints sit on the /api/v3/alibaba/ path and expect Authorization: Bearer $KEY, not a raw key. Second, the website URL files the model under /qwen/ but the API path uses alibaba; different segment, same model. Third, it's a two-step async call: you POST to submit and get back an id, then GET the result endpoint to poll. Fourth, parameters differ between models: 2.6 takes audio and shot_type, while 2.2-plus and 2.5 use enable_prompt_expansion and take neither audio nor shot_type.

# 1. Submit the job
curl --request POST "https://gptproto.com/api/v3/alibaba/wan-2.6/text-to-video" \
  --header "Authorization: Bearer $GPTPROTO_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "A first-person POV gliding through a damp stone tunnel toward a bright exit. [0-3s] fast forward motion, light at the end growing. [3-5s] emerge into a sunlit forest, a waterfall on the right. Sound: heavy breathing, then birdsong.",
    "negative_prompt": "",
    "size": "1920*1080",
    "duration": 5,
    "prompt_extend": true,
    "audio": "",
    "shot_type": "",
    "seed": 1
  }'
# -> returns a prediction id; store it in RESULT_ID for the next step

# 2. Poll for the result (repeat until the job reports a final status)
curl --request GET "https://gptproto.com/api/v3/predictions/$RESULT_ID/result" \
  --header "Authorization: Bearer $GPTPROTO_API_KEY"

The same flow in Python, with the polling loop written out:

import os
import time

import requests

KEY = os.environ["GPTPROTO_API_KEY"]
HEADERS = {"Authorization": f"Bearer {KEY}", "Content-Type": "application/json"}
BASE = "https://gptproto.com/api/v3"

payload = {
    "prompt": (
        "A first-person POV gliding through a damp stone tunnel toward a bright exit. "
        "[0-3s] fast forward motion, light at the end growing. "
        "[3-5s] emerge into a sunlit forest, a waterfall on the right. "
        "Sound: heavy breathing, then birdsong."
    ),
    "negative_prompt": "",
    "size": "1920*1080",
    "duration": 5,
    "prompt_extend": True,
    "audio": "",
    "shot_type": "",
    "seed": 1,
}

# Step 1: submit the job and keep the prediction id.
submit = requests.post(f"{BASE}/alibaba/wan-2.6/text-to-video",
                       headers=HEADERS, json=payload, timeout=30)
submit.raise_for_status()  # surface auth or validation errors right away
result_id = submit.json()["id"]

# Step 2: poll every 5 seconds; the 120-try cap (about 10 minutes) is arbitrary.
for _ in range(120):
    resp = requests.get(f"{BASE}/predictions/{result_id}/result",
                        headers=HEADERS, timeout=30)
    resp.raise_for_status()
    r = resp.json()
    if r.get("status") in ("succeeded", "failed"):
        print(r)
        break
    time.sleep(5)
else:
    raise TimeoutError(f"prediction {result_id} did not finish in time")
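Since the parameter sets differ between models, one way to avoid sending 2.6-only fields to another endpoint is a small payload builder that rejects anything a model doesn't take. This is a sketch under assumptions: the field names come from the request above and the gotchas list, but the "wan-2.5" and "wan-2.2-plus" keys are my guesses at slugs, and I'm assuming the common fields are shared across models. Check each model page before relying on it.

```python
# Fields used by the wan-2.6 request above; assumed to be common to all models.
COMMON = ("prompt", "negative_prompt", "size", "duration", "seed")

# Model-specific switches named in the text. The 2.5 and 2.2-plus keys are
# hypothetical slugs, not confirmed endpoint names.
EXTRA = {
    "wan-2.6": ("prompt_extend", "audio", "shot_type"),
    "wan-2.5": ("enable_prompt_expansion",),
    "wan-2.2-plus": ("enable_prompt_expansion",),
}


def build_payload(model, **fields):
    """Return a request body, refusing fields the chosen model does not take."""
    if model not in EXTRA:
        raise ValueError(f"unknown model: {model}")
    if "prompt" not in fields:
        raise ValueError("prompt is required")
    allowed = set(COMMON) | set(EXTRA[model])
    unknown = sorted(set(fields) - allowed)
    if unknown:
        raise ValueError(f"{model} does not take: {', '.join(unknown)}")
    return dict(fields)


print(build_payload("wan-2.6", prompt="a tunnel", duration=5, shot_type=""))
```

Failing loudly on an unknown field is a deliberate choice here: a silently dropped audio or shot_type setting is much harder to notice in a rendered clip than an exception at submit time.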

Billing scales with the configuration, not a flat per-clip rate. Resolution, duration, and audio all move the number, which is why a 1080p, 10-second configuration quotes higher than the $0.45 base. Each hosted Wan model shows its current rate on its own model page, and they all share one balance with the rest of the catalog, so you're not juggling separate accounts to test 2.5 against 2.6.
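For rough budgeting, the arithmetic is simple enough to script. The sketch below uses the $0.45 base rate quoted in this article as a default and treats it as a floor, since higher resolution, longer clips, and audio push the real rate up; batch_cost and runs_within are hypothetical helper names, and you should pass in the rate shown on the model page for your actual configuration.

```python
from decimal import Decimal

# Base rate for Wan 2.6 quoted in this article, in USD per run. Real jobs at
# higher resolution, longer duration, or with audio will cost more.
BASE_RATE = Decimal("0.45")


def batch_cost(runs, rate=BASE_RATE):
    """Total cost of a batch at a flat per-run rate."""
    return rate * runs


def runs_within(budget, rate=BASE_RATE):
    """How many whole runs fit inside a budget at a flat per-run rate."""
    return int(Decimal(str(budget)) // rate)


print(batch_cost(100))   # 45.00
print(runs_within(20))   # 44
```

Decimal keeps the cents exact, which plain floats would not for a rate like 0.45.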

Prompting for Thinking Mode

Because 2.7 plans before it renders, the prompts that work best read less like a pile of keywords and more like a brief. Give it intent and structure. A timestamped beat sheet — [0-3s] … [3-5s] … — gives the planning pass something concrete to organize around, which is the same pattern the working code above uses. Audio is part of the same prompt: a short Sound: cue ("heavy breathing, then birdsong") drives the native audio track rather than needing a separate call. And use negative_prompt to fence off the failure you actually expect — "no glowing plants, no warped hands" — instead of leaving it blank. None of this is exotic; it's just writing for a model that reads the whole brief before it starts.
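If you write a lot of these briefs, a small helper keeps the beat sheet consistent. This is a sketch, not a required format: build_brief is a hypothetical name, the rule that beats must start at 0 and run back to back is my own sanity check, and the output simply mirrors the prompt, negative_prompt, and duration fields used in the request earlier.

```python
def _sentence(text):
    """Trim the text and end it with exactly one period."""
    return text.strip().rstrip(".") + "."


def build_brief(opening, beats, sound=None, avoid=()):
    """Compose prompt fields from an opening line, timed beats, and cues.

    beats is a list of (start_s, end_s, action) tuples that must start at 0
    and follow on from each other without gaps or overlaps.
    """
    parts = [_sentence(opening)]
    cursor = 0
    for start, end, action in beats:
        if start != cursor or end <= start:
            raise ValueError(f"beat {start}-{end}s does not follow on from {cursor}s")
        parts.append(f"[{start}-{end}s] {_sentence(action)}")
        cursor = end
    if sound:
        parts.append(f"Sound: {_sentence(sound)}")
    return {
        "prompt": " ".join(parts),
        "negative_prompt": ", ".join(f"no {item}" for item in avoid),
        "duration": cursor,  # total length implied by the last beat
    }


brief = build_brief(
    "A first-person POV gliding through a damp stone tunnel",
    [(0, 3, "fast forward motion"), (3, 5, "emerge into a sunlit forest")],
    sound="heavy breathing, then birdsong",
    avoid=["glowing plants", "warped hands"],
)
print(brief["prompt"])
print(brief["negative_prompt"])
```

Deriving duration from the last beat means the timing in the prompt and the duration field can't drift apart when you edit one and forget the other.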

Who should use it — and who shouldn't

If you need open weights — self-hosting, fine-tuning, data that stays in your walls — Wan 2.7 is the wrong pick today, and Wan 2.2 is the right one. If you want controllable image-and-video generation through an API and you can live with a closed model and a slower render, 2.7 is a reasonable choice, with the honest caveat that its quality lead is so far Alibaba's claim, not a measured one. And if you're prototyping on a budget, the cheaper hosted Wan tiers will get you moving for cents per run before you commit. Match the model to the constraint, not to the version number.

FAQ

Is Wan 2.7 open source?

Not on the evidence available. Wan 2.1 and 2.2 are open-weight under Apache 2.0; Wan 2.7's weights are not published on Alibaba's official Hugging Face, GitHub, or ModelScope channels as of writing. Treat it as API-only until first-party weights appear.

When was Wan 2.7 released?

Early April 2026 — the image models around April 1, the video suite over the following days.

Is Wan 2.7 an image model or a video model?

Both. The release includes image models (Wan2.7-Image / Image Pro) and a four-model video suite.

Wan 2.7 vs Wan 2.2 — what's the difference?

2.2 is open-weight, 720p, runs locally. 2.7 is API-only, adds Thinking Mode, first/last-frame control, reference-to-video, and image generation. Different tools for different constraints.

Is Wan 2.7 better than other video models?

Unproven on neutral benchmarks. There's no Artificial Analysis or VBench entry for it yet, so any "better than X" claim right now is vendor-reported.

Can I run Wan 2.7 locally?

Not today — there are no published weights. For local deployment, use Wan 2.2.

How much does it cost via API?

It depends on resolution, length, and audio. The closest hosted sibling, Wan 2.6, starts at $0.45 per run.