What “Doubao API” actually means
Doubao (豆包) is ByteDance’s consumer chatbot. Its international app is Dola, which used to be called Cici. Underneath both sits ByteDance’s Seed research family, and that family — not the chat app — is what an API gives you. Seed is the text side. Seedream generates images. Seedance generates video. Volcano Engine (its international brand is BytePlus) is the cloud that hosts all of them.
The mistake worth avoiding: reading these as one ranked lineup where a bigger number is always better. They aren’t comparable tiers. They’re different jobs. You don’t pick “Seedance over Seed” the way you pick GPT-5 over GPT-4; you pick the one whose output type matches what you’re building. Once you see it that way, “which Doubao model do I call?” stops being a research project and becomes a lookup.
Here is that lookup, limited to the models GPT Proto actually hosts:
| Your job | Model | Output |
| --- | --- | --- |
| Image generation | Seedream 5.0 (newest), 4.5, 4.0 | Image |
| Video generation | Seedance 2.0, 2.0 Fast, 1.5 Pro | Video (with audio) |
| Text / reasoning | Doubao 1.5 Pro, Seed 1.6 Thinking | Text |
| Cheapest text / high throughput | Seed 1.6 Flash | Text |
| Image understanding, OCR | Doubao 1.5 Vision Pro, Seed 1.6 | Text from image |
One thing about those names: in code you pass the full model ID — for example doubao-seedream-5-0-260128 — exactly as shown in the samples below, hyphens and date suffix included. That long string is the literal API parameter; the short names above are just how I refer to the models in this guide.
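One way to keep that lookup in code is a small table keyed by job. This is a minimal sketch: the job keys and the `pick_model` helper are names made up for illustration, and the table holds only the model IDs that appear in this guide’s samples. Copy any other ID from its model page rather than guessing the date suffix.

```python
# Job -> full model ID, using only the IDs that appear in this guide's samples.
# The job keys ("image", "video", ...) are illustrative labels, not API values.
MODEL_IDS = {
    "image": "doubao-seedream-5-0-260128",  # Seedream 5.0
    "video": "doubao-seedance-2-0-260128",  # Seedance 2.0
    "text": "doubao-1-5-pro-32k-250115",    # Doubao 1.5 Pro
    "vision": "doubao-seed-1-6-250615",     # Seed 1.6
}


def pick_model(job: str) -> str:
    """Return the full model ID for a job, failing loudly on an unknown job."""
    try:
        return MODEL_IDS[job]
    except KeyError:
        known = ", ".join(sorted(MODEL_IDS))
        raise ValueError(f"unknown job {job!r}; expected one of: {known}") from None


print(pick_model("image"))
```

Failing on an unknown job is deliberate: a silent fallback to some default model would hide exactly the mistake this section warns about.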
One gap worth flagging: coding. ByteDance ships a coding-specialized model, Seed 2.0 Code, but on GPT Proto that model is marked deprecated, so I won’t send you to it. For code-leaning work the closest current option is Seed 1.6 Thinking, which is a reasoning model rather than a code specialist. If a dedicated coding model is the whole point of your project, that’s a reason to look elsewhere — I’d rather say so than oversell.
Why not just use Volcano Engine directly?
You can. ByteDance does run an international platform, and a non-China developer can sign up. The question is what each path costs you in friction, so here are the four routes:
The consumer app (Dola) is free, but it is not an API, video generation is region-locked, and the app is not available in the US, Canada, or Australia.
The VPN-plus-China-account trick works for the app but breaks constantly, and it gives you nothing programmatic.
Volcano Engine / BytePlus direct is the real API, but the console defaults to Chinese, signup asks for ID or enterprise verification, and you take on one more billing relationship.
An aggregated endpoint, which is what this guide uses, gives you one key, one base URL, and OpenAI-style calls with no Chinese ID. The cost is that you are trusting a middle layer, so check its uptime and read its pricing before you wire it into production.
The short version: if you just want to call Seedream or Seedance from code today, an aggregated endpoint removes the two real blockers, the ID wall and the Chinese console. It does not remove your job of reading the pricing — which, for video, is where most people get surprised.
What each model actually costs
Images — flat, per image
Seedream is billed per generated image, so the number on the page is the number you pay. Note the prices aren’t ordered by version: 4.5 is more expensive than 5.0, which is more expensive than 4.0.
| Image model | GPT Proto | Market ref | Note |
| --- | --- | --- | --- |
| Seedream 5.0 | $0.0298 | $0.035 | newest, 15% under ref |
| Seedream 4.5 | $0.034 | $0.04 | priciest of the three |
| Seedream 4.0 | $0.0255 | $0.03 | cheapest, 128K context |
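Because image billing is flat, a batch budget is plain multiplication. The sketch below uses the prices from the table above; the function names are illustrative, and the numbers should be replaced with live rates before you rely on them.

```python
from decimal import Decimal

# USD per image, copied from the table above: (GPT Proto price, market reference).
IMAGE_PRICES = {
    "Seedream 5.0": (Decimal("0.0298"), Decimal("0.035")),
    "Seedream 4.5": (Decimal("0.034"), Decimal("0.04")),
    "Seedream 4.0": (Decimal("0.0255"), Decimal("0.03")),
}


def batch_cost(model: str, images: int) -> Decimal:
    """Cost in USD of generating `images` images at the flat per-image rate."""
    price, _reference = IMAGE_PRICES[model]
    return price * images


def discount_vs_ref(model: str) -> Decimal:
    """Fraction below the market reference (0.15 means 15% cheaper)."""
    price, reference = IMAGE_PRICES[model]
    return (reference - price) / reference


for name in IMAGE_PRICES:
    print(f"{name}: 1,000 images = ${batch_cost(name, 1000):.2f}, "
          f"{discount_vs_ref(name):.1%} under reference")
```

`Decimal` is used instead of `float` so that sub-cent prices multiply out without binary rounding noise.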
Video — billed by configuration, not flat
This is the part to get right. Seedance video cost is not a flat per-clip rate; it scales with resolution, aspect ratio, duration, and whether you generate audio. The headline price on the model page reflects a baseline configuration. A heavier configuration costs more — in one real run, a 720p / 16:9 / 5-second clip with audio came out around $0.605, well above the baseline number.
| Configuration (Seedance 2.0 Fast) | Approx. cost |
| --- | --- |
| 480p / 1:1 / 4s (baseline) | $0.215 |
| 720p / 16:9 / 5s / audio on | ~$0.605 |
Two things to watch. First, Seedance 2.0 here sits about 10% above the market reference, not below it. If a clip looks cheaper to render directly on ByteDance’s own Dreamina, it is; the reason to use the API is stable programmatic access and no credit system, not a lower sticker price. Second, because cost scales with parameters, don’t budget per video from the headline number; check the dashboard estimate for your actual settings.
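To make the gap between the two rows concrete, here is a small budgeting sketch. It only knows the two configurations listed above and refuses to extrapolate to any other, since the pricing formula itself is not given here. The `budget` helper and its `attempts_per_clip` parameter are illustrative names.

```python
from decimal import Decimal

# The two Seedance 2.0 Fast data points from the table above, in USD.
# The second figure is approximate (one observed run).
OBSERVED_COST = {
    "480p / 1:1 / 4s": Decimal("0.215"),
    "720p / 16:9 / 5s / audio on": Decimal("0.605"),
}


def budget(config: str, clips: int, attempts_per_clip: int = 1) -> Decimal:
    """Estimated USD spend for `clips` clips, each rendered `attempts_per_clip` times."""
    if config not in OBSERVED_COST:
        # No extrapolation: the per-parameter pricing formula is not known here.
        raise KeyError(f"no observed cost for {config!r}; check the dashboard estimate")
    return OBSERVED_COST[config] * clips * attempts_per_clip


baseline = OBSERVED_COST["480p / 1:1 / 4s"]
heavier = OBSERVED_COST["720p / 16:9 / 5s / audio on"]
print(f"heavier config costs {heavier / baseline:.2f}x the baseline")
print(f"20 clips, 2 attempts each: ${budget('720p / 16:9 / 5s / audio on', 20, 2):.2f}")
```

For any configuration outside these two, the dashboard estimate remains the source of truth.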
Text — per million tokens
Standard token billing. The cheapest entry point is Seed 1.6 Flash; the most expensive model in this table is Doubao 1.5 Vision Pro.
| Text model | Input /1M | Output /1M | Best for |
| --- | --- | --- | --- |
| Seed 1.6 Flash | $0.0172 | $0.1815 | high-throughput, low latency |
| Doubao 1.5 Pro | $0.0965 | $0.2424 | general bilingual reasoning |
| Seed 1.6 Thinking | $0.0965 | $0.9706 | chain-of-thought / math |
| Doubao 1.5 Vision Pro | $0.3641 | $1.0924 | document vision, OCR |
There is no standing free API tier here. You pay per call from the first request, so the cheapest way to experiment is Seed 1.6 Flash at roughly two cents per million input tokens.
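Token billing is easy to misjudge because input and output are priced differently, and the gap is widest on the reasoning model. A minimal sketch of the arithmetic, using the table’s rates (the `request_cost` helper is an illustrative name, and the token counts in the example are made up):

```python
from decimal import Decimal

# USD per 1M tokens, copied from the table above: (input, output).
TEXT_PRICES = {
    "Seed 1.6 Flash": (Decimal("0.0172"), Decimal("0.1815")),
    "Doubao 1.5 Pro": (Decimal("0.0965"), Decimal("0.2424")),
    "Seed 1.6 Thinking": (Decimal("0.0965"), Decimal("0.9706")),
    "Doubao 1.5 Vision Pro": (Decimal("0.3641"), Decimal("1.0924")),
}
MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """USD cost of one request, given its input and output token counts."""
    price_in, price_out = TEXT_PRICES[model]
    return (price_in * input_tokens + price_out * output_tokens) / MILLION


# Same hypothetical request (2,000 tokens in, 500 out) priced on each model.
for name in TEXT_PRICES:
    print(f"{name}: ${request_cost(name, 2000, 500):.6f}")
```

Note that Doubao 1.5 Pro and Seed 1.6 Thinking share an input price, so the difference between them comes entirely from output tokens.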
Quick start: one key, two API surfaces
All you need to start is a GPT Proto API key from the dashboard. Before the first call, though, learn the one thing that trips people up: there are two API surfaces, and they don’t behave the same.
Text chat is OpenAI-compatible and synchronous, at /v1/chat/completions. Image and video are asynchronous tasks at /api/v3/doubao/… — you submit a request, get back a result id, then poll a second endpoint until the file is ready.
And the footgun that costs people their first hour: the auth header is not identical across the two. The chat endpoint’s docs pass the key raw; the image and video endpoints want “Bearer ” in front of it. Mix them up and you get a 401 with “Invalid signature.” Keep them straight and everything else is ordinary HTTP.
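One way to keep the two header formats straight is to build them in a single place. The helper below is a sketch of that idea: `auth_headers` and the surface labels `"chat"` and `"task"` are names invented here, not part of any SDK.

```python
def auth_headers(api_key: str, surface: str) -> dict:
    """Build request headers for one of the two API surfaces described above.

    `surface` is an illustrative label: "chat" for /v1/chat/completions,
    "task" for the asynchronous image and video endpoints.
    """
    if surface == "chat":
        authorization = api_key              # raw key, as the chat docs show
    elif surface == "task":
        authorization = f"Bearer {api_key}"  # image/video expect the prefix
    else:
        raise ValueError(f"unknown surface {surface!r}; use 'chat' or 'task'")
    return {"Authorization": authorization, "Content-Type": "application/json"}


print(auth_headers("YOUR_GPTPROTO_API_KEY", "task")["Authorization"])
```

Routing every request through one function like this means a 401 points at a single line to check instead of every call site.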
Calling the text models (OpenAI-compatible)
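Because the text surface speaks the OpenAI format, you can point the official OpenAI SDK at it and change two things: the base URL and the model. Here it is in cURL, matching the documented format, then in Python. One caveat: the OpenAI SDK always sends the key as “Bearer <key>”, while the documented cURL format passes it raw. If the SDK call comes back with a 401, send the same JSON body with a plain HTTP client and the raw key instead.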
curl -X POST "https://gptproto.com/v1/chat/completions" \
  -H "Authorization: YOUR_GPTPROTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seed-1-6-250615",
    "messages": [
      { "role": "user", "content": "Who are you?" }
    ],
    "stream": false
  }'
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GPTPROTO_API_KEY",
    base_url="https://gptproto.com/v1",
)
resp = client.chat.completions.create(
    model="doubao-seed-1-6-250615",  # or doubao-1-5-pro-32k-250115
    messages=[{"role": "user", "content": "Who are you?"}],
    stream=False,
)
print(resp.choices[0].message.content)
Swap in Doubao 1.5 Pro when you want stronger reasoning, or Seed 1.6 Flash for the cheapest, fastest responses. Either way, only the model field changes; copy the exact ID string from that model’s page. The errors you’ll actually meet are documented: 401 “Invalid signature” (key wrong or in the wrong header format), 403 “Insufficient balance” (out of credit), and 503 “Content policy violation” (the prompt was blocked). That last one matters: these endpoints have a content policy, so don’t plan around them being unrestricted.
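Those three errors call for three different reactions, so it can help to translate the status code into something actionable at the point of failure. A sketch, assuming you have the HTTP status code in hand: the codes and messages are the documented ones listed above, while `DoubaoApiError`, `raise_for_known_error`, and the suggested actions are this example’s own inventions.

```python
# Status code -> (documented message, suggested action).
# Codes and messages come from the text above; the actions are this sketch's wording.
KNOWN_ERRORS = {
    401: ("Invalid signature", "check the key and the Authorization format for this endpoint"),
    403: ("Insufficient balance", "top up credit before retrying"),
    503: ("Content policy violation", "rewrite the prompt rather than retrying it unchanged"),
}


class DoubaoApiError(RuntimeError):
    """Raised for any HTTP error status; carries the status code."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(f"HTTP {status_code}: {detail}")
        self.status_code = status_code


def raise_for_known_error(status_code: int) -> None:
    """Do nothing on success; raise DoubaoApiError with a hint on an error status."""
    if status_code < 400:
        return
    if status_code in KNOWN_ERRORS:
        message, action = KNOWN_ERRORS[status_code]
        raise DoubaoApiError(status_code, f"{message} ({action})")
    raise DoubaoApiError(status_code, "undocumented error; inspect the response body")


try:
    raise_for_known_error(403)
except DoubaoApiError as err:
    print(err)
```

With `requests`, you would call `raise_for_known_error(response.status_code)` right after each request.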
Generating images with the Seedream API
Seedream uses the asynchronous pattern: POST to submit, then GET to fetch the result. The request body is small — a prompt, a size, and two booleans.
# 1. Submit the task
curl --request POST \
  "https://gptproto.com/api/v3/doubao/doubao-seedream-5-0-260128/text-to-image" \
  --header "Authorization: Bearer YOUR_GPTPROTO_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "Young woman with auburn hair reading in a rustic coffee shop, warm Edison-bulb light, rain on the window, photorealistic, 8k, 35mm lens, f/1.8",
    "size": "2048x2048",
    "enable_base64_output": false,
    "enable_sync_mode": false
  }'
# 2. Poll for the result using the id the submit call returned
curl --request GET \
  "https://gptproto.com/api/v3/predictions/YOUR_RESULT_ID/result" \
  --header "Authorization: Bearer YOUR_GPTPROTO_API_KEY"
The same flow in Python, with a small poll loop:
import time

import requests

API_KEY = "YOUR_GPTPROTO_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# 1. Submit
submit = requests.post(
    "https://gptproto.com/api/v3/doubao/doubao-seedream-5-0-260128/text-to-image",
    headers=HEADERS,
    json={
        "prompt": "Young woman reading in a rustic coffee shop, warm Edison-bulb light, 8k",
        "size": "2048x2048",
        "enable_base64_output": False,
        "enable_sync_mode": False,
    },
    timeout=30,
)
submit.raise_for_status()  # surface 401/403/503 here instead of failing on a missing "id"
result_id = submit.json()["id"]  # confirm the exact field name in the docs response

# 2. Poll until the image is ready
while True:
    r = requests.get(
        f"https://gptproto.com/api/v3/predictions/{result_id}/result",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    r.raise_for_status()
    data = r.json()
    # The "status" field and its values are assumed; confirm them in the docs response
    if data.get("status") in ("succeeded", "failed"):
        break
    time.sleep(2)
print(data)
Two practical notes. The size separator is not the same across versions — the Seedream 5.0 example uses 2048x2048 with an “x,” while the 4.x examples use 2048*2048 with an asterisk — so copy the format from the model you’re actually calling. And enable_sync_mode is the escape hatch from polling: set it to true and the response comes back inline, at the price of holding the connection open longer.
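The separator difference is the kind of detail that is easy to get wrong when you switch models, so it may be worth encoding once. A sketch, assuming the separators are exactly as in the examples described above (“x” for 5.0, “*” for 4.5 and 4.0); `format_size` and its version strings are illustrative.

```python
# Separator per Seedream version, as seen in each version's example request.
SIZE_SEPARATOR = {
    "5.0": "x",
    "4.5": "*",
    "4.0": "*",
}


def format_size(width: int, height: int, version: str) -> str:
    """Render the `size` field in the format the given Seedream version's example uses."""
    if version not in SIZE_SEPARATOR:
        raise ValueError(f"no known size format for Seedream {version!r}")
    return f"{width}{SIZE_SEPARATOR[version]}{height}"


print(format_size(2048, 2048, "5.0"))
print(format_size(2048, 2048, "4.5"))
```

If a future version changes the format again, the lookup raises instead of silently sending a size string the endpoint may reject.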
Generating video with the Seedance API
Video is the same submit-then-poll shape, with a richer body: aspect ratio, duration, resolution, an audio toggle, a camera lock, and a seed. The one difference in practice is time — a video takes meaningfully longer than an image, so your poll loop should expect to run for a while rather than a couple of seconds.
# 1. Submit
curl --request POST \
  "https://gptproto.com/api/v3/doubao/doubao-seedance-2-0-260128/text-to-video" \
  --header "Authorization: Bearer YOUR_GPTPROTO_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "Cinematic wide shot of a sun-drenched Maldives beach, friends playing volleyball, turquoise water, 8k, 35mm lens",
    "aspect_ratio": "16:9",
    "duration": 5,
    "resolution": "720p",
    "generate_audio": true,
    "camera_fixed": false,
    "seed": -1
  }'
# 2. Poll (expect this to take a while for video)
curl --request GET \
  "https://gptproto.com/api/v3/predictions/YOUR_RESULT_ID/result" \
  --header "Authorization: Bearer YOUR_GPTPROTO_API_KEY"
Reach for Seedance 2.0 Fast when you want lower cost and quicker turnaround and can accept a little less polish; reach for Seedance 1.5 Pro when the newest model is overkill. Remember the pricing rule from earlier: duration, resolution, and audio drive the bill, so a 5-second 720p clip with audio is the ~$0.60 case, not the baseline number.
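Since video polls run long, an unbounded loop is the wrong default: a task that never reaches a final state would spin forever. Below is a sketch of a poll helper with a deadline. It takes the fetch step as a callable so it stays independent of any HTTP library; the `status` field and its `succeeded` / `failed` values are assumptions to confirm against a real response, and the default interval and timeout are guesses, not documented values.

```python
import time
from typing import Callable


def poll_until_done(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call `fetch` until it reports a final status or `timeout` seconds elapse.

    `fetch` should return the parsed JSON of the result endpoint. The "status"
    field and its final values are assumed, not taken from the API reference.
    """
    deadline = clock() + timeout
    while True:
        data = fetch()
        if data.get("status") in ("succeeded", "failed"):
            return data
        if clock() >= deadline:
            raise TimeoutError(f"task not finished after {timeout} seconds")
        sleep(interval)


# Offline demonstration with canned responses instead of real HTTP calls.
canned = iter([{"status": "processing"}, {"status": "succeeded"}])
print(poll_until_done(lambda: next(canned), interval=0, timeout=5))
```

In real use, `fetch` would wrap the GET request from step 2, for example `lambda: requests.get(url, headers=headers, timeout=30).json()`. The injectable `sleep` and `clock` exist only so the loop can be tested without waiting.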
Real things built with the Doubao API
Two outputs from the video model, with the exact prompts that made them. Both run through the same submit-then-poll code above — only the prompt and the model ID change.
F1 broadcast realism — Seedance 2.0
Seedance 2.0 is strongest on broadcast-style realism and motion, and it can carry an identity through from a reference. This prompt leans into all three:
Prompt
Ultra-realistic F1 live TV broadcast screenshot, identity preserved exactly from reference image.
Young woman sitting in the VIP paddock / team garage during a Formula 1 race, shown on the official live race broadcast as the girlfriend of an F1 driver. It is the final lap, listening to the team radio through a professional racing headset, watching the garage monitors nervously, leaning forward with one hand near her mouth, proud tense expression.
She wears a fitted white tank top, oversized racing team jacket draped over her shoulders, large black team-radio headset with boom mic, gold jewelry, soft glam makeup. A slim paddock pass hangs from her neck.
Realistic F1 broadcast graphics: “FINAL LAP” banner, lap counter, driver timing tower on the left, small F1-style logo bug, “LIVE” indicator, lower-third identifying her as paddock guest.
Team staff, headsets, garage screens, mechanics blurred around her. Telephoto broadcast camera from across the garage, compression artifacts, digital noise, bright paddock lighting, natural skin texture, no smoothing, 8k.
Multi-shot emotion — Seedance 2.0 Fast
The fast variant still handles a five-shot sequence with continuity and mood. This is the clip from the top of the guide:
Prompt
An extremely frail elderly ballerina, 80s, in a tattered tutu, performs alone on an abandoned theater stage lit only by a single spotlight.
Shot 1: Close-up on her gnarled, arthritic feet sliding into first position on the dusty stage floor, the sound of creaking wood beneath her.
Shot 2: Wide shot — she raises her arms overhead with trembling elegance, spine straightening inch by inch, empty velvet seats stretching into darkness.
Shot 3: Medium shot — she begins to turn, slowly then faster, her tutu catching the light, dust swirling around her ankles like smoke.
Shot 4: Low-angle shot — she launches into a grand jeté, suspended in the air for a breathless moment, face locked in fierce concentration.
Shot 5: She lands, staggers one step, stands perfectly still — chest heaving, tears streaming silently — and takes a deep, solitary bow to no one.
The mood is bittersweet and haunting, soaked in faded glory and unbroken love for a life lived in motion.
A caveat the model’s own notes admit: on fast, high-motion sequences you can see texture grain and the occasional consistency wobble. Budget a retry or two for hero shots rather than assuming the first render ships.
Text and multimodal, briefly
If you came for the chat side of “doubao api,” here’s where the text models land. Doubao 1.5 Pro is the bilingual reasoning workhorse; Seed 1.6 Flash is the cheap, fast option; the vision-pro tiers read documents and do OCR. They all speak the OpenAI format shown above.
General frontier chat isn’t where Doubao stands out on this platform — the image and video models are the reason to be here. And the coding specialist, Seed 2.0 Code, is deprecated, so it isn’t a current option. Use the text models when you want cheap bilingual inference alongside your image and video calls under one key, not because they’ll outscore a Western frontier model on English reasoning.
Compliance and procurement notes
Because ByteDance also owns TikTok, Doubao carries procurement questions that a purely technical comparison would miss, and they’re worth naming. For most commercial use — marketing content, prototypes, internal tooling — these models are fine. For regulated industries, government, or defense work, or anywhere data residency is contractually fixed, evaluate carefully or choose a Western model; some buyers will rule it out on policy alone, and that’s a legitimate call, not a knock on the model’s quality.
On the content side, remember the 503 “Content policy violation” error: these endpoints enforce a policy and will refuse some prompts. Plan your application around that rather than around unrestricted generation.
Start building
Pick the model that matches your job and grab the request format from its page: Seedream 5.0 for images, Seedance 2.0 for video. Check live rates on the model page before you wire anything into production — especially for video, where configuration drives the bill.
If you wanted the product-level picture of Doubao — what the app is, how it compares to ChatGPT, what it’s like to use — rather than the API, that’s a different read: see our full Doubao AI review.