GPT Proto
Schuyler Stacy, 2026-06-22

Doubao API: The Complete Guide (2026) — Which Model to Call, and How

No Chinese account needed: call Seedream, Seedance and Seed from one endpoint. Which Doubao model to use, real pricing, and copy-paste code.

Search “doubao api” and you hit a wall fast. The official console is Volcano Engine, the pages default to Chinese, and the signup wants an identity check that most developers outside China can’t clear in an afternoon. Then there’s the naming. Doubao, Dola, Cici, Seed, Seedream, Seedance, Volcengine — all ByteDance, and none of them is obviously the thing you typed into the search box.

I write integration guides for GPTProto, and Doubao is the model family I get asked about most. So this is the guide I wanted to exist: which Doubao model does which job, what each one actually costs, and copy-paste code that calls them — text, image, and video — from a single endpoint, with no Chinese phone number in the loop.

Here is a frame the Seedance 2.0 video model produced from a text prompt, called through the API exactly the way the code further down does it. We build up to it.

What “Doubao API” actually means

Doubao (豆包) is ByteDance’s consumer chatbot. Its international app is Dola, which used to be called Cici. Underneath both sits ByteDance’s Seed research family, and that family — not the chat app — is what an API gives you. Seed is the text side. Seedream generates images. Seedance generates video. Volcano Engine (its international brand is BytePlus) is the cloud that hosts all of them.

The mistake worth avoiding: reading these as one ranked lineup where a bigger number is always better. They aren’t comparable tiers. They’re different jobs. You don’t pick “Seedance over Seed” the way you pick GPT-5 over GPT-4; you pick the one whose output type matches what you’re building. Once you see it that way, “which Doubao model do I call?” stops being a research project and becomes a lookup.

Here is that lookup, mapped to the model IDs GPT Proto actually hosts:

| Your job | Model | Output |
|---|---|---|
| Image generation | Seedream 5.0 (newest), 4.5, 4.0 | Image |
| Video generation | Seedance 2.0, 2.0 Fast, 1.5 Pro | Video (with audio) |
| Text / reasoning | Doubao 1.5 Pro, Seed 1.6 Thinking | Text |
| Cheapest text / high throughput | Seed 1.6 Flash | Text |
| Image understanding, OCR | Doubao 1.5 Vision Pro, Seed 1.6 | Text from image |

One thing about those names: in code you pass the full model ID — for example doubao-seedream-5-0-260128 — exactly as shown in the samples below, hyphens and date suffix included. That long string is the literal API parameter; the short names above are just how I refer to the models in this guide.
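To make the short-name versus full-ID distinction concrete, here is a minimal sketch of the lookup as a Python dictionary. It only includes the full IDs that appear in this guide's samples; the job keys ("image", "video", and so on) and the helper name model_for are my own labels for illustration, not anything the API defines.

```python
# Job-to-model lookup using only the full model IDs shown in this guide.
# The dictionary keys are hypothetical labels, not API parameters.
MODEL_IDS = {
    "image": "doubao-seedream-5-0-260128",
    "video": "doubao-seedance-2-0-260128",
    "text": "doubao-seed-1-6-250615",
    "text_1_5_pro": "doubao-1-5-pro-32k-250115",
}


def model_for(job: str) -> str:
    """Return the full model ID for a job, or raise if none is listed here."""
    try:
        return MODEL_IDS[job]
    except KeyError:
        raise ValueError(
            f"no model ID listed for job {job!r}; look it up on the model page"
        ) from None


print(model_for("image"))  # prints doubao-seedream-5-0-260128
```

Anything not in the dictionary raises instead of guessing, which is the behavior you want when a mistyped ID would otherwise cost you a failed call.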

One gap worth flagging: coding. ByteDance ships a coding-specialized model, Seed 2.0 Code, but on GPT Proto that model is marked deprecated, so I won’t send you to it. For code-leaning work the closest current option is Seed 1.6 Thinking, which is a reasoning model rather than a code specialist. If a dedicated coding model is the whole point of your project, that’s a reason to look elsewhere — I’d rather say so than oversell.

Why not just use Volcano Engine directly?

You can. ByteDance does run an international platform, and a non-China developer can sign up. The question is what each path costs you in friction, so here are the four routes:

The consumer app (Dola) is free, but it’s not an API, video generation is region-locked, and the app isn’t even available in the US, Canada, or Australia.

The VPN-plus-China-account trick works for the app and breaks constantly; it gives you nothing programmatic.

Volcano Engine / BytePlus direct is the real API, but the console defaults to Chinese, signup asks for ID or enterprise verification, and you take on one more billing relationship.

An aggregated endpoint, which is what this guide uses, gives you one key, one base URL, and OpenAI-style calls with no Chinese ID. The cost is that you’re trusting a middle layer, so you should check its uptime and read its pricing before you wire it into production.

The short version: if you just want to call Seedream or Seedance from code today, an aggregated endpoint removes the two real blockers, the ID wall and the Chinese console. It does not remove your job of reading the pricing — which, for video, is where most people get surprised.

What each model actually costs

Images — flat, per image

Seedream is billed per generated image, so the number on the page is the number you pay. Note the prices aren’t ordered by version: 4.5 is more expensive than 5.0, which is more expensive than 4.0.

| Image model | GPT Proto (per image) | Market ref (per image) | Note |
|---|---|---|---|
| Seedream 5.0 | $0.0298 | $0.035 | newest, 15% under ref |
| Seedream 4.5 | $0.034 | $0.04 | priciest of the three |
| Seedream 4.0 | $0.0255 | $0.03 | cheapest, 128K context |
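Because image billing is flat, the arithmetic is simple enough to check yourself. The sketch below takes the prices from the table above and computes the discount against the market reference and the cost of a batch; the function names are mine, and the numbers are only as current as this page.

```python
# (GPT Proto price, market reference) in USD per image, copied from the table above
IMAGE_PRICES = {
    "Seedream 5.0": (0.0298, 0.035),
    "Seedream 4.5": (0.034, 0.04),
    "Seedream 4.0": (0.0255, 0.03),
}


def discount_pct(price: float, reference: float) -> float:
    """Percentage by which price sits under the reference price."""
    return (reference - price) / reference * 100


def batch_cost(model: str, n_images: int) -> float:
    """Flat per-image billing: total is price times image count."""
    price, _reference = IMAGE_PRICES[model]
    return round(price * n_images, 4)


for name, (price, ref) in IMAGE_PRICES.items():
    print(f"{name}: {discount_pct(price, ref):.1f}% under reference")

print(f"1,000 images on Seedream 5.0: ${batch_cost('Seedream 5.0', 1000):.2f}")
```

All three models come out at roughly 15% under their reference price, so the choice between them is about output quality and absolute cost, not about which one is discounted more.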

Video — billed by configuration, not flat

This is the part to get right. Seedance video cost is not a flat per-clip rate; it scales with resolution, aspect ratio, duration, and whether you generate audio. The headline price on the model page reflects a baseline configuration. A heavier configuration costs more — in one real run, a 720p / 16:9 / 5-second clip with audio came out around $0.605, well above the baseline number.

| Configuration (Seedance 2.0 Fast) | Approx. cost |
|---|---|
| 480p / 1:1 / 4s (baseline) | $0.215 |
| 720p / 16:9 / 5s / audio on | ~$0.605 |

Two things to watch. First, Seedance 2.0 here sits about 10% above the market reference, not below it. If you find that a clip is cheaper to render directly on ByteDance’s own Dreamina, that’s accurate; the reason to use the API is stable programmatic access and no credit system, not a lower sticker price. Second, because cost scales with parameters, don’t set a per-video budget off the headline number; check the dashboard estimate for your actual settings.
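To see how far a budget drifts when you plan off the headline number, here is a small sketch using the two Seedance 2.0 Fast data points from the table above. It does not model the real pricing formula, which I don't have; it only compares the two observed configurations. The retries_per_clip padding is my own assumption for illustration.

```python
# Two observed data points for Seedance 2.0 Fast, in USD per clip (from the table above)
BASELINE = 0.215  # 480p / 1:1 / 4s
HEAVIER = 0.605   # 720p / 16:9 / 5s / audio on (approximate)


def multiplier(cost: float, baseline: float = BASELINE) -> float:
    """How many times the baseline price a given configuration costs."""
    return cost / baseline


def budget(n_clips: int, per_clip: float, retries_per_clip: float = 0.0) -> float:
    """Total spend for n_clips, padded by an assumed average number of re-renders."""
    return round(n_clips * per_clip * (1 + retries_per_clip), 2)


print(f"heavier config is {multiplier(HEAVIER):.2f}x the baseline")
print(f"100 clips at baseline: ${budget(100, BASELINE):.2f}")
print(f"100 clips at 720p with audio: ${budget(100, HEAVIER):.2f}")
print(f"same, one retry per clip: ${budget(100, HEAVIER, retries_per_clip=1):.2f}")
```

The gap between the baseline row and the 720p-with-audio row is nearly a factor of three before any retries, which is why the dashboard estimate for your actual settings is the number to plan around.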

Text — per million tokens

Standard token billing. The cheapest entry point is Seed 1.6 Flash; the heaviest are the vision-pro tiers.

| Text model | Input / 1M tokens | Output / 1M tokens | Best for |
|---|---|---|---|
| Seed 1.6 Flash | $0.0172 | $0.1815 | high-throughput, low latency |
| Doubao 1.5 Pro | $0.0965 | $0.2424 | general bilingual reasoning |
| Seed 1.6 Thinking | $0.0965 | $0.9706 | chain-of-thought / math |
| Doubao 1.5 Vision Pro | $0.3641 | $1.0924 | document vision, OCR |

There is no standing free API tier here. You pay per call from the first request, so the cheapest way to experiment is Seed 1.6 Flash at roughly two cents per million input tokens.
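Token billing is easy to estimate once you know your input and output volumes. This sketch applies the rates from the table above to a made-up workload of one million input tokens and 200,000 output tokens; the workload and the function name are assumptions for illustration.

```python
# USD per one million tokens as (input rate, output rate), from the table above
TEXT_PRICES = {
    "Seed 1.6 Flash": (0.0172, 0.1815),
    "Doubao 1.5 Pro": (0.0965, 0.2424),
    "Seed 1.6 Thinking": (0.0965, 0.9706),
    "Doubao 1.5 Vision Pro": (0.3641, 1.0924),
}


def text_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given number of input and output tokens."""
    in_rate, out_rate = TEXT_PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical workload: 1M input tokens, 200K output tokens
for model in TEXT_PRICES:
    print(f"{model}: ${text_cost(model, 1_000_000, 200_000):.4f}")
```

Note that Doubao 1.5 Pro and Seed 1.6 Thinking share an input rate, so the difference between them comes entirely from output tokens, and a reasoning model tends to produce a lot of those.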

Quick start: one key, two API surfaces

All you need to start is a GPT Proto API key from the dashboard. Before the first call, though, learn the one thing that trips people up: there are two API surfaces, and they don’t behave the same.

Text chat is OpenAI-compatible and synchronous, at /v1/chat/completions. Image and video are asynchronous tasks at /api/v3/doubao/… — you submit a request, get back a result id, then poll a second endpoint until the file is ready.
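A quick way to keep the two surfaces apart in your own code is to build their URLs from separate helpers. This is a sketch based only on the paths used in this guide's samples; the helper names are mine, and the task segment (for example text-to-image) should be copied from the model's own page.

```python
BASE_URL = "https://gptproto.com"


def chat_url() -> str:
    """Synchronous, OpenAI-compatible text endpoint."""
    return f"{BASE_URL}/v1/chat/completions"


def submit_url(model_id: str, task: str) -> str:
    """Asynchronous image/video endpoint: POST here to start a task."""
    return f"{BASE_URL}/api/v3/doubao/{model_id}/{task}"


def result_url(result_id: str) -> str:
    """Poll this URL with GET until the task has finished."""
    return f"{BASE_URL}/api/v3/predictions/{result_id}/result"


print(chat_url())
print(submit_url("doubao-seedream-5-0-260128", "text-to-image"))
print(result_url("YOUR_RESULT_ID"))
```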

And the footgun that costs people their first hour: the auth header is not identical across the two. The chat endpoint’s docs pass the key raw; the image and video endpoints want “Bearer ” in front of it. Mix them up and you get a 401 with “Invalid signature.” Keep them straight and everything else is ordinary HTTP.
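One way to avoid mixing the two header formats is to make the surface an explicit argument. The sketch below follows the behavior described above: raw key for chat, "Bearer " prefix for image and video tasks. The function and the surface labels "chat" and "task" are hypothetical names of my own, so check the per-model docs if an endpoint behaves differently.

```python
def auth_headers(api_key: str, surface: str) -> dict:
    """Build request headers for one of the two API surfaces.

    surface: "chat" for /v1/chat/completions, "task" for /api/v3/doubao/...
    """
    if surface == "chat":
        authorization = api_key              # documented chat format: raw key
    elif surface == "task":
        authorization = f"Bearer {api_key}"  # image/video endpoints: Bearer prefix
    else:
        raise ValueError(f"unknown surface {surface!r}; expected 'chat' or 'task'")
    return {"Authorization": authorization, "Content-Type": "application/json"}


print(auth_headers("YOUR_GPTPROTO_API_KEY", "chat")["Authorization"])
print(auth_headers("YOUR_GPTPROTO_API_KEY", "task")["Authorization"])
```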

Calling the text models (OpenAI-compatible)

Because the text surface speaks the OpenAI format, you can point the official OpenAI SDK at it and change two things: the base URL and the model. Here it is in cURL, matching the documented format, then in Python.

curl -X POST "https://gptproto.com/v1/chat/completions" \
  -H "Authorization: YOUR_GPTPROTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seed-1-6-250615",
    "messages": [
      { "role": "user", "content": "Who are you?" }
    ],
    "stream": false
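  }'

from openai import OpenAI

# Note: the OpenAI SDK sends the key as "Authorization: Bearer <key>",
# while the cURL sample above passes it raw. If you get a 401 from one
# form, compare it against the format in the endpoint's docs.
client = OpenAI(
    api_key="YOUR_GPTPROTO_API_KEY",
    base_url="https://gptproto.com/v1",
)

resp = client.chat.completions.create(
    model="doubao-seed-1-6-250615",          # or doubao-1-5-pro-32k-250115
    messages=[{"role": "user", "content": "Who are you?"}],
    stream=False,
)

print(resp.choices[0].message.content)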

Swap in Doubao 1.5 Pro when you want stronger reasoning, or Seed 1.6 Flash for the cheapest, fastest responses — just change the model field to that model’s ID (the samples show the exact strings). The errors you’ll actually meet are documented: 401 “Invalid signature” (key wrong or in the wrong header), 403 “Insufficient balance” (out of credit), and 503 “Content policy violation” (the prompt was blocked). That last one matters: these endpoints have a content policy, so don’t plan around them being unrestricted.
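If you want those three errors to produce something more useful than a bare status code in your logs, a small lookup is enough. This sketch covers only the errors named above; the suggested actions are my own reading of them, and any other status falls through to a generic message rather than a guess.

```python
# Status code -> (documented message, suggested action). The actions are my own notes.
DOCUMENTED_ERRORS = {
    401: ("Invalid signature", "check the key and the header format for this endpoint"),
    403: ("Insufficient balance", "top up credit before retrying"),
    503: ("Content policy violation", "the prompt was blocked; rewrite it instead of retrying as-is"),
}


def explain_error(status_code: int) -> str:
    """Turn a status code into a readable hint for logs or error messages."""
    if status_code in DOCUMENTED_ERRORS:
        message, action = DOCUMENTED_ERRORS[status_code]
        return f"{status_code} {message}: {action}"
    return f"{status_code}: not one of the errors listed in this guide; see the API docs"


print(explain_error(401))
print(explain_error(503))
print(explain_error(500))
```

The 503 case is the one to handle deliberately: a blind retry loop will keep resubmitting a prompt that the content policy has already rejected.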

Generating images with the Seedream API

Seedream uses the asynchronous pattern: POST to submit, then GET to fetch the result. The request body is small — a prompt, a size, and two booleans.

# 1. Submit the task
curl --request POST \
  "https://gptproto.com/api/v3/doubao/doubao-seedream-5-0-260128/text-to-image" \
  --header "Authorization: Bearer YOUR_GPTPROTO_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "Young woman with auburn hair reading in a rustic coffee shop, warm Edison-bulb light, rain on the window, photorealistic, 8k, 35mm lens, f/1.8",
    "size": "2048x2048",
    "enable_base64_output": false,
    "enable_sync_mode": false
  }'
 
# 2. Poll for the result using the id the submit call returned
curl --request GET \
  "https://gptproto.com/api/v3/predictions/YOUR_RESULT_ID/result" \
  --header "Authorization: Bearer YOUR_GPTPROTO_API_KEY"

The same flow in Python, with a small poll loop:

import time
import requests

API_KEY = "YOUR_GPTPROTO_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# 1. Submit the task
submit = requests.post(
    "https://gptproto.com/api/v3/doubao/doubao-seedream-5-0-260128/text-to-image",
    headers=HEADERS,
    json={
        "prompt": "Young woman reading in a rustic coffee shop, warm Edison-bulb light, 8k",
        "size": "2048x2048",
        "enable_base64_output": False,
        "enable_sync_mode": False,
    },
    timeout=30,
)
submit.raise_for_status()         # surface 401/403/503 here instead of a KeyError below
result_id = submit.json()["id"]   # confirm the exact field name in the docs response

# 2. Poll until the image is ready; the 120-second cap is arbitrary, tune it
deadline = time.monotonic() + 120
while True:
    r = requests.get(
        f"https://gptproto.com/api/v3/predictions/{result_id}/result",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    r.raise_for_status()
    data = r.json()
    if data.get("status") in ("succeeded", "failed"):   # confirm status values in the docs
        break
    if time.monotonic() > deadline:
        raise TimeoutError("image task did not finish within 120 seconds")
    time.sleep(2)

print(data)

Two practical notes. The size separator is not the same across versions — the Seedream 5.0 example uses 2048x2048 with an “x,” while the 4.x examples use 2048*2048 with an asterisk — so copy the format from the model you’re actually calling. And enable_sync_mode is the escape hatch from polling: set it to true and the response comes back inline, at the price of holding the connection open longer.
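Since the separator differs by version, it is worth putting the rule in one place instead of hand-typing size strings. This sketch encodes only what the examples in this guide show (an "x" for 5.0, an asterisk for 4.x); the function name and the version argument are my own, and a future version may use a different format.

```python
def seedream_size(width: int, height: int, version: str) -> str:
    """Format the size parameter for a Seedream version such as "5.0" or "4.5"."""
    if version.startswith("5"):
        separator = "x"   # as in the Seedream 5.0 example
    elif version.startswith("4"):
        separator = "*"   # as in the Seedream 4.x examples
    else:
        raise ValueError(f"no size format known for Seedream {version!r}; check its docs")
    return f"{width}{separator}{height}"


print(seedream_size(2048, 2048, "5.0"))  # prints 2048x2048
print(seedream_size(2048, 2048, "4.5"))  # prints 2048*2048
```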

Generating video with the Seedance API

Video is the same submit-then-poll shape, with a richer body: aspect ratio, duration, resolution, an audio toggle, a camera lock, and a seed. The one difference in practice is time — a video takes meaningfully longer than an image, so your poll loop should expect to run for a while rather than a couple of seconds.

# 1. Submit
curl --request POST \
  "https://gptproto.com/api/v3/doubao/doubao-seedance-2-0-260128/text-to-video" \
  --header "Authorization: Bearer YOUR_GPTPROTO_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "Cinematic wide shot of a sun-drenched Maldives beach, friends playing volleyball, turquoise water, 8k, 35mm lens",
    "aspect_ratio": "16:9",
    "duration": 5,
    "resolution": "720p",
    "generate_audio": true,
    "camera_fixed": false,
    "seed": -1
  }'
 
# 2. Poll (expect this to take a while for video)
curl --request GET \
  "https://gptproto.com/api/v3/predictions/YOUR_RESULT_ID/result" \
  --header "Authorization: Bearer YOUR_GPTPROTO_API_KEY"

Reach for Seedance 2.0 Fast when you want lower cost and quicker turnaround and can accept a little less polish; reach for Seedance 1.5 Pro when the newest model is overkill. Remember the pricing rule from earlier: duration, resolution, and audio drive the bill, so a 5-second 720p clip with audio is the ~$0.60 case, not the baseline number.
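For a Python version of the video flow, the two pieces worth isolating are the request body and a poll loop with a deadline. The sketch below builds the same body as the cURL sample and polls through a caller-supplied fetch function, so it runs offline here with a stand-in. The "succeeded" and "failed" status values match the image sample earlier; the "processing" status, the default interval, and the default timeout are my own assumptions.

```python
import time
from typing import Callable


def video_body(prompt: str, aspect_ratio: str = "16:9", duration: int = 5,
               resolution: str = "720p", generate_audio: bool = True,
               camera_fixed: bool = False, seed: int = -1) -> dict:
    """Request body with the same fields as the cURL sample above."""
    return {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "resolution": resolution,
        "generate_audio": generate_audio,
        "camera_fixed": camera_fixed,
        "seed": seed,
    }


def poll(fetch: Callable[[], dict], interval: float = 5.0, timeout: float = 600.0,
         sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until the task reports a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch()
        if data.get("status") in ("succeeded", "failed"):
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("video task still running after the timeout")
        sleep(interval)


# Offline demo: a stand-in for the GET request that finishes on the third call
fake_responses = iter([
    {"status": "processing"},   # hypothetical in-progress status
    {"status": "processing"},
    {"status": "succeeded"},
])
result = poll(lambda: next(fake_responses), interval=0, sleep=lambda seconds: None)
print(result["status"])  # prints succeeded
```

In real use, fetch would wrap a requests.get call to the predictions URL with the Bearer header. Passing it in as a function keeps the loop testable and lets you reuse it for images with a shorter timeout.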

Real things built with the Doubao API

Two outputs from the video model, with the exact prompts that made them. Both run through the same submit-then-poll code above — only the prompt and the model ID change.

F1 broadcast realism — Seedance 2.0

Seedance 2.0 is strongest on broadcast-style realism and motion, and it can carry an identity through from a reference. This prompt leans into all three:

Prompt
Ultra-realistic F1 live TV broadcast screenshot, identity preserved exactly from reference image.
Young woman sitting in the VIP paddock / team garage during a Formula 1 race, shown on the official live race broadcast as the girlfriend of an F1 driver. It is the final lap, listening to the team radio through a professional racing headset, watching the garage monitors nervously, leaning forward with one hand near her mouth, proud tense expression.
She wears a fitted white tank top, oversized racing team jacket draped over her shoulders, large black team-radio headset with boom mic, gold jewelry, soft glam makeup. A slim paddock pass hangs from her neck.
Realistic F1 broadcast graphics: “FINAL LAP” banner, lap counter, driver timing tower on the left, small F1-style logo bug, “LIVE” indicator, lower-third identifying her as paddock guest.
Team staff, headsets, garage screens, mechanics blurred around her. Telephoto broadcast camera from across the garage, compression artifacts, digital noise, bright paddock lighting, natural skin texture, no smoothing, 8k.

Multi-shot emotion — Seedance 2.0 Fast

The fast variant still handles a five-shot sequence with continuity and mood. This is the clip from the top of the guide:

Prompt
An extremely frail elderly ballerina, 80s, in a tattered tutu, performs alone on an abandoned theater stage lit only by a single spotlight.
Shot 1: Close-up on her gnarled, arthritic feet sliding into first position on the dusty stage floor, the sound of creaking wood beneath her.
Shot 2: Wide shot — she raises her arms overhead with trembling elegance, spine straightening inch by inch, empty velvet seats stretching into darkness.
Shot 3: Medium shot — she begins to turn, slowly then faster, her tutu catching the light, dust swirling around her ankles like smoke.
Shot 4: Low-angle shot — she launches into a grand jeté, suspended in the air for a breathless moment, face locked in fierce concentration.
Shot 5: She lands, staggers one step, stands perfectly still — chest heaving, tears streaming silently — and takes a deep, solitary bow to no one.
The mood is bittersweet and haunting, soaked in faded glory and unbroken love for a life lived in motion.

A caveat the model’s own notes admit: on fast, high-motion sequences you can see texture grain and the occasional consistency wobble. Budget a retry or two for hero shots rather than assuming the first render ships.

Text and multimodal, briefly

If you came for the chat side of “doubao api,” here’s where they land. Doubao 1.5 Pro is the bilingual reasoning workhorse; Seed 1.6 Flash is the cheap, fast option; the vision-pro tiers read documents and do OCR. They all speak the OpenAI format shown above.

General frontier chat isn’t where Doubao stands out on this platform — the image and video models are the reason to be here. And the coding specialist, Seed 2.0 Code, is deprecated, so it isn’t a current option. Use the text models when you want cheap bilingual inference alongside your image and video calls under one key, not because they’ll outscore a Western frontier model on English reasoning.

Compliance and procurement notes

Because ByteDance also owns TikTok, Doubao carries procurement questions that a purely technical comparison would miss, and they’re worth naming. For most commercial use — marketing content, prototypes, internal tooling — these models are fine. For regulated industries, government, or defense work, or anywhere data residency is contractually fixed, evaluate carefully or choose a Western model; some buyers will rule it out on policy alone, and that’s a legitimate call, not a knock on the model’s quality.

On the content side, remember the 503 “Content policy violation” error: these endpoints enforce a policy and will refuse some prompts. Plan your application around that rather than around unrestricted generation.

Start building

Pick the model that matches your job and grab the request format from its page: Seedream 5.0 for images, Seedance 2.0 for video. Check live rates on the model page before you wire anything into production, especially for video, where configuration drives the bill.

If you wanted the product-level picture of Doubao — what the app is, how it compares to ChatGPT, what it’s like to use — rather than the API, that’s a different read: see our full Doubao AI review.


FAQs

Can I use the Doubao API outside China?

Yes. Through an aggregated, OpenAI-style endpoint you can call the Seed, Seedream, and Seedance models without a Chinese phone number or ID verification.

Do I need a Volcano Engine account?

Not for this route. The aggregated endpoint takes the place of a Volcano Engine account; you use one GPTProto key for text, image, and video.

Is there a free Doubao API tier?

No standing free tier — you pay per call from the first request. The cheapest way to experiment is Seed 1.6 Flash at about $0.0172 per million input tokens.

Which model do I use for image, video, and chat?

Seedream for images, Seedance for video, and Doubao 1.5 Pro or the Seed 1.6 line for text. See the job-to-model table above.

Is the Doubao API OpenAI-compatible?

The text models are, at /v1/chat/completions. Image and video use a submit-and-poll REST pattern at /api/v3/doubao/… instead.

Where are the full docs?

The per-model API reference, with request and response schemas, lives at docs.gptproto.com.