
Explore the Best AI Models Online

Browse a curated directory of cutting-edge AI models for text, images, and more. Compare capabilities, features, and pricing to find the right model for your projects.

Providers: ByteDance, NovelAI, Grok, Claude, OpenAI, Google, Qwen, Kling, MiniMax, Tripo3D, Gptproto, DeepSeek, Higgsfield, Flux, Ideogram, Suno, Midjourney
Models
$0.034 per generation (list price: $0.04)

seedream-4-5-251128/text-to-image is a modern, high-performance multimodal AI model that converts text instructions into detailed and accurate images. Designed as part of the Seedream model family, it delivers reliable, creative, and context-aware results for commercial and research scenarios. Compared to its foundational base, seedream-4-5-251128/text-to-image optimizes speed and accuracy for image generation tasks, supporting seamless integration for developers and businesses. Its advanced architecture ensures fast processing, flexible input handling, and consistent output, distinguishing it from other mainstream models with robust, scalable multimodal workflows.
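Models like this are normally reached over a JSON HTTP API. The sketch below shows how such a text-to-image request might be assembled; the endpoint URL, header names, and payload fields are illustrative assumptions, not the provider's documented contract. Only the model identifier comes from this listing.

```python
import json
import urllib.request

# NOTE: endpoint URL and payload fields are hypothetical; check the
# provider's API reference for the actual contract before sending.
API_URL = "https://api.example.com/v1/images/generations"  # hypothetical
API_KEY = "YOUR_API_KEY"

def build_request(prompt: str, size: str = "1024x1024") -> urllib.request.Request:
    """Assemble a JSON POST request for a text-to-image generation call."""
    payload = {
        "model": "seedream-4-5-251128/text-to-image",
        "prompt": prompt,
        "size": size,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_request("a lighthouse at dusk, oil painting style")
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

The same request-building pattern applies to the other per-generation models on this page; only the model identifier and parameters change.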

$0.034 per generation (list price: $0.04)

seedream-4-5-251128/image-edit is an AI image editing model designed for high-precision visual enhancements, transformations, and content-aware modifications. Part of the seedream model family, it brings advanced deep learning techniques for seamless inpainting, upscaling, and restoration tasks. Compared to general models, seedream-4-5-251128/image-edit delivers focused, efficient image-conditional editing and supports fast, reliable workflows. This model fits professional creative industries, developers, and researchers seeking robust automated image manipulation tools with customizable features and integration support.

$0.0287 per generation (list price: $0.0338)

doubao-seedream-4-5-251128/text-to-image is an advanced AI model optimized for generating high-quality images from textual prompts. As part of the Doubao Seedream family, the model excels in creative and professional visual synthesis, offering developers fast rendering and robust customization. With improved architectural efficiency over its predecessor, this model enables adaptable style, rich detail, and reliable outputs for art, product visualization, or marketing content. It is designed for scalable performance and stable integration, distinguishing itself by industry-grade speed, diversity, and precision in multimodal workflows.

$0.0287 per generation (list price: $0.0338)

doubao-seedream-4-5-251128/image-edit is an advanced multimodal AI model designed for image editing. Developed under the Doubao Seedream series, it delivers high-speed, high-precision edits for diverse image-processing needs. Compared to its base models, it offers refined image understanding, improved prompt adaptation, and efficient output generation. This model is suitable for automation workflows, creative tasks, and production environments, with strengths in both quality and speed. Ideal for developers and industry teams seeking robust, scalable solutions for real-time image editing, content generation, and visual customization.

$0.0348 per generation

NovelAI Diffusion V4.5 Full is a state-of-the-art diffusion model for generating high-resolution images from text prompts. It excels in creative automation, delivering vivid, contextually accurate visuals with a high degree of control and customization. Compared to earlier diffusion models, it offers faster inference, stronger prompt adherence, and broader stylistic flexibility. Its robust architecture supports easy integration into creative and production workflows, making it ideal for concept art, advertising, illustration, and rapid design development.

$0.0348 per generation

nai-diffusion-4-5-full/image-to-image is an advanced AI model specializing in image-to-image conversion and enhancement. Developed by NovelAI, it is part of the powerful nai-diffusion 4.5 family, offering fast, accurate, and creative transformations across diverse visual styles. The model stands out for its reliable processing speed, customizability, and robust multi-modal capabilities. Compared to prior NaiDiffusion generations, it delivers superior resolution and flexibility for professional workflows, making it ideal for creative teams, designers, animators, and developers seeking state-of-the-art image generation.

$0.135 per generation

Grok Imagine v0.9 is xAI's advanced text-to-video AI model powered by the Aurora engine, generating 6-15 second HD videos with native synchronized audio, lip-sync dialogue, music, and cinematic effects at 24 FPS. It supports image-to-video, voice prompts, and rapid rendering (<15 seconds) for marketing, storytelling, and prototyping via X Premium+ or API.

$0.135 per generation

Grok Imagine v0.9 image-to-video transforms static images into 5-15 second HD clips (480p-1080p) with synchronized audio, lip-sync, music, and cinematic motion in under 30 seconds. Features 4 modes (Normal, Fun, Custom, Spicy), natural animations, camera effects, and optional soundtracks—ideal for social media, marketing, and rapid prototyping.

Input: $3.5/1M tokens (list price: $5/1M tokens)
Output: $17.5/1M tokens (list price: $25/1M tokens)

claude-opus-4-5-20251101 is an advanced AI language model from Anthropic’s Claude family. Designed for rapid, high-quality text and code generation, it supports broad use cases from content creation to complex analysis. Compared to previous Claude models, it brings improved reasoning, greater reliability, and more control over context windows and task-specific outputs. Professionals choose claude-opus-4-5-20251101 for its balance of speed, creativity, and precision across enterprise, research, and general productivity applications.
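The per-token prices above make request costs easy to estimate: multiply each token count by its per-million rate and sum. A minimal sketch, using the discounted rates shown for this model on this page:

```python
def token_cost(input_tokens: int, output_tokens: int,
               in_rate: float, out_rate: float) -> float:
    """Cost in USD, given per-1M-token rates as listed above."""
    return input_tokens / 1_000_000 * in_rate + output_tokens / 1_000_000 * out_rate

# claude-opus-4-5-20251101 at the discounted rates on this page:
# $3.5 per 1M input tokens, $17.5 per 1M output tokens.
cost = token_cost(input_tokens=20_000, output_tokens=4_000,
                  in_rate=3.5, out_rate=17.5)
print(f"${cost:.4f}")  # prints $0.1400
```

The same helper works for any of the token-priced models below by swapping in their rates.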

Input: $3.5/1M tokens (list price: $5/1M tokens)
Output: $17.5/1M tokens (list price: $25/1M tokens)

claude-opus-4-5-20251101/file-analysis is an Anthropic Claude Opus family model focused on robust file analysis, document parsing, and code review tasks. It delivers high-speed, accurate text and code interpretation, setting itself apart from general-purpose models through specialized workflow optimizations. It features advanced multi-file handling and context retention, making it an excellent choice for developers, data analysts, and researchers seeking scalable, reliable file-centric AI solutions.

Input: $3.5/1M tokens (list price: $5/1M tokens)
Output: $17.5/1M tokens (list price: $25/1M tokens)

claude-opus-4-5-20251101 is a flagship AI model from Anthropic, built for complex language comprehension, fluent generation, and programmatic reasoning. It outperforms earlier Claude models with faster responses, higher accuracy, and extensive context capabilities. Tailored for professionals in research, coding, data analysis, and customer support, it offers reliable, nuanced outputs. Compared to GPT-4 and Gemini, this release delivers strong alignment, safety, and advanced reasoning while maintaining competitive speed. Developers appreciate its scalable performance and robust integration support for modern workflows.

Input: $0.12/1M tokens (list price: $0.2/1M tokens)
Output: $0.3/1M tokens (list price: $0.5/1M tokens)

Grok-4-1-fast-non-reasoning is a fast and efficient AI language model designed primarily for high-speed content generation and automation. Part of the Grok family, this model emphasizes throughput and reliability over complex reasoning, making it ideal for large-scale workflows, batch processing, and scenarios where rapid responses are critical. Compared to foundational Grok models, grok-4-1-fast-non-reasoning trades deeper reasoning for optimized speed, supporting tasks such as templated copywriting, straightforward summarization, and auto-messaging. It is ideal for developers and enterprises demanding maximum efficiency and scalable performance.

Input: $0.12/1M tokens (list price: $0.2/1M tokens)
Output: $0.3/1M tokens (list price: $0.5/1M tokens)

Grok-4-1-fast-non-reasoning/image-to-text is a specialized AI model designed for ultra-fast image-to-text conversion. As part of the Grok 4.1 fast series, it focuses on quick and accurate extraction of textual information from images, without complex reasoning modules. Distinctively, it prioritizes response speed and throughput, making it ideal for large-scale OCR tasks, rapid document digitization, and developer pipelines needing high-efficiency vision processing. Compared to standard multimodal models, this variant trades deeper semantic interpretation for unmatched speed, making it a practical choice for direct image text extraction.

Input: $0.12/1M tokens (list price: $0.2/1M tokens)
Output: $0.3/1M tokens (list price: $0.5/1M tokens)

Grok-4-1-fast-reasoning is a next-generation AI language model developed by xAI, engineered for high-speed reasoning and rapid response in text-based tasks. It excels at fast, context-rich outputs in scenarios including code generation, analytics, and technical writing. Compared to standard Grok models, grok-4-1-fast-reasoning provides accelerated processing and enhanced performance for real-time applications. This model is ideal for developers and technical professionals seeking reliable and efficient AI for fast-paced workflows and dynamic environments.
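Chat models in this family are commonly called with an OpenAI-style chat-completions request body (xAI's API advertises compatibility with that format, but verify against the current docs). A minimal sketch of assembling such a body:

```python
import json

def chat_payload(model: str, user_msg: str, system: str = "") -> str:
    """Serialize an OpenAI-style chat-completions request body."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_msg})
    return json.dumps({"model": model, "messages": messages})

body = chat_payload(
    "grok-4-1-fast-reasoning",
    "Summarize this stack trace in two sentences.",
    system="You are a concise debugging assistant.",
)
```

The serialized string is what gets POSTed to the provider's chat-completions endpoint with an Authorization header.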

Input: $0.12/1M tokens (list price: $0.2/1M tokens)
Output: $0.3/1M tokens (list price: $0.5/1M tokens)

Grok-4-1-fast-reasoning/image-to-text is a next-generation multimodal AI model from Grok, engineered for rapid image-to-text conversion, robust context handling, and fast reasoning. It enables seamless workflows for professionals who require precise visual content analysis alongside rapid textual interpretation. Compared to the base Grok-4-1 model, this variant uniquely integrates visual understanding with advanced natural language reasoning for efficient feedback. Its optimized speed and cross-modal logic empower developers, data scientists, and analysts to extract structured information from images while maintaining reliable response quality across integrated tasks.

Input: $0.75/1M tokens (list price: $1.25/1M tokens)
Output: $6/1M tokens (list price: $10/1M tokens)

GPT-5.1-Codex is an advanced coding model from OpenAI optimized for sustained, long-horizon software engineering tasks. It features a context compaction mechanism that preserves critical information across multiple sessions to handle large projects coherently. The model offers high token efficiency, long-duration agentic coding workflows, and improved quality in debugging, refactoring, and CI/CD automation, making it well suited to complex, multi-file codebase management.

Input: $0.75/1M tokens (list price: $1.25/1M tokens)
Output: $6/1M tokens (list price: $10/1M tokens)

GPT-5.1-Codex image-to-text is a multimodal capability of the GPT-5.1 family that enables extracting and interpreting text directly from images. It uses advanced AI to analyze layout, fonts, and stylized or handwritten text beyond traditional OCR, supporting complex document structures and multiple languages. This feature is useful for digitizing documents and UI designs, and for extracting code or information embedded in images with high accuracy and contextual understanding.

$0.0335 per generation (list price: $0.134)

Gemini-3-Pro-Image-Preview, also known as Nano Banana Pro (Nano Banana 2), is Google's advanced AI image model built on Gemini 3 Pro. It generates high-fidelity 1K–4K images with accurate text, deep reasoning, and enhanced editing features like 3D object control and localized changes. It enables professional-grade visuals with fast production, watermarking for authenticity, and supports complex multi-step prompts and compositions.

$0.0335 per generation (list price: $0.134)

Gemini-3-Pro-Image-Preview image edit, also known as Nano Banana Pro (nano banana 2) image editing, provides studio-quality control for creating and modifying images. It supports localized edits like adjusting lighting, camera angles, focus, and color grading. Users can transform scenes, blend multiple images, and maintain character consistency. This model excels at generating clear, accurate text in images and supports multi-turn conversational editing by preserving visual context with thought signatures. Advanced AI reasoning and grounding with Google Search improve real-world accuracy in edits.

$0.96 per generation (list price: $1.2)

Veo-3.1-Fast-Generate-Preview is a rapid video generation model from Google DeepMind that enables real-time creation of short, cinematic videos from text, images, or video frames, prioritizing speed and lower latency over maximum fidelity. It supports text-to-video, image-to-video, and video-to-video generation workflows with native audio and is optimized for rapid previews and iterative creative processes.

$0.96 per generation (list price: $1.2)

Veo-3.1-fast-generate-preview image-to-video is a fast AI model that converts static images into high-quality, smooth videos with synchronized audio. It supports resolutions up to 1080p and offers quick generation within seconds, enabling creators to animate images for social media, storytelling, and prototypes with cinematic realism.

$0.96 per generation (list price: $1.2)

Veo-3.1-fast-generate-preview video-to-video creates seamless video transitions by generating intermediate frames between given first and last video frames. It produces short, high-quality video clips with native audio, supporting 1080p and 24fps, ideal for extending scenes, creative video morphing, and rapid video production workflows.

Input: $1.2/1M tokens (list price: $2/1M tokens)
Output: $7.1992/1M tokens (list price: $11.9986/1M tokens)

Gemini 3 Pro was officially released by Google on November 18, 2025. It is the company’s most advanced multimodal AI model, excelling in complex reasoning, long-context understanding, and processing text, images, audio, and video. Gemini 3 Pro powers Google Search, Workspace, and developer tools, setting new standards on AI benchmarks at launch with broad enterprise and consumer integration.

Input: $1.2/1M tokens (list price: $2/1M tokens)
Output: $7.1992/1M tokens (list price: $11.9986/1M tokens)

Gemini 3 Pro’s image-to-text model excels at accurately interpreting and describing images. It processes complex visuals, including photos and documents, to generate precise textual descriptions and extract structured data. This enables superior OCR, video analysis, and content understanding in multilingual, real-world scenarios, making it powerful for enterprise applications requiring high-fidelity vision-to-text conversion.

Input: $1.2/1M tokens (list price: $2/1M tokens)
Output: $7.1992/1M tokens (list price: $11.9986/1M tokens)

gemini-3-pro-preview/file-analysis is a cutting-edge AI model from Google’s Gemini 3 family, focused on robust file and document analysis. It stands out with multimodal capabilities, efficiently processing diverse formats such as text, code, images, and PDFs. Compared to core Gemini models, it adds enhanced document handling and context-aware extraction, making it ideal for technical workflows. Its high processing speed, accuracy and adaptability help developers automate code reviews, analyze reports, and unlock insights from complex files—perfect for those seeking advanced, scalable AI file analysis.

$2.56 per generation (list price: $3.2)

Veo-3.1-generate-preview is an advanced AI video generator by Google offering three main modes: text-to-video, image-to-video, and video-to-video. It creates high-quality 4-8 second videos in 720p/1080p with synchronized audio and realistic visuals. Key features include using up to 3 reference images for consistency, smooth transitions between start/end frames, and video extensions for longer sequences.

$2.56 per generation (list price: $3.2)

Veo-3.1-generate-preview image-to-video lets you input one or more images (up to three reference images) to guide video content, animating objects or scenes from the image and preserving subject consistency across frames. This modality uses the input image as the initial frame to generate smooth video transitions.

$2.56 per generation (list price: $3.2)

Veo-3.1-generate-preview video-to-video supports extending or editing existing videos by specifying first and last frames to generate seamless transitions and continuity. It enhances videos by adding realistic audiovisual elements and narrative control while maintaining coherent scene evolution.

$0.0244 per generation (list price: $0.0375)

Qwen-Image-LoRA is an advanced AI image editing and generation model based on the Qwen-Image foundational model. It supports precise editing of images, including complex bilingual text edits in Chinese and English, multi-image batch processing, and style preservation. It allows custom LoRA models for flexible style control, enabling professionals to perform high-quality, detailed, and customizable image modifications efficiently.

$0.0244 per generation (list price: $0.0375)

Qwen-Image-Plus-Lora extends the Qwen-Image family with LoRA (Low-Rank Adaptation) technology, enabling rapid fine-tuning or customization on specific styles or subjects using LoRA adapters. Developed by Alibaba Cloud’s Qwen team, it maintains core Qwen-Image editing and generation capabilities while supporting efficient, lightweight model adaptation for branded content, stylistic transfers, and specialized creative tasks.

$0.0195 per generation (list price: $0.03)

Qwen-Image-Plus (also known as Qwen-Image-Edit-2509) is an advanced AI image editing model by Alibaba Cloud’s Qwen team. It supports multi-image editing, enhanced consistency in preserving the identities of people and products, advanced text editing, and native ControlNet support for precise image manipulation. It excels in semantic and appearance editing, creative generation, and dynamic pose creation, enabling versatile, high-quality image edits.

Input: $0.06/1M tokens (list price: $0.15/1M tokens)
Output: $0.24/1M tokens (list price: $0.6/1M tokens)

gpt-4o-mini-2024-07-18 is an optimized AI language model from OpenAI’s GPT-4o family, built for fast, scalable natural language understanding and generation. It delivers multichannel support, reliable coding assistance, content creation, and data tasks in a lighter, efficient form. Compared with larger GPT-4o models, gpt-4o-mini-2024-07-18 offers reduced latency and resource demands, making it ideal for developers seeking balance between capability and responsiveness across business, education, and creative applications.

Input: $3/1M tokens (list price: $5/1M tokens)
Output: $9/1M tokens (list price: $15/1M tokens)

ChatGPT-4o-latest is the most recent update of OpenAI’s GPT-4 Omni (4o) model, integrated into ChatGPT as of early 2025. This version emphasizes increased creativity, clearer and more natural communication, better code handling, and more concise, focused responses. It improves instruction following, readability, and reduces clutter in outputs, available both for ChatGPT users and via the API as the current flagship multimodal chat model.

Input: $0.75/1M tokens (list price: $1.25/1M tokens)
Output: $6/1M tokens (list price: $10/1M tokens)

GPT-5.1 is OpenAI's newest GPT-5 series model, designed for developers. It uses adaptive reasoning to dynamically adjust thinking time, speeding up simple tasks by 2-3x without sacrificing intelligence. New features like "reasoning-free" mode, 24-hour caching, and apply_patch/shell tools significantly boost code editing and programming efficiency. This release delivers a powerful and optimized AI experience.

Input: $0.75/1M tokens (list price: $1.25/1M tokens)
Output: $6/1M tokens (list price: $10/1M tokens)

GPT-5.1 image-to-text refers to OpenAI’s GPT-5.1 release with enhanced multimodal capabilities that can process images and text together to generate descriptive text, captions, summaries, or structured data from visual content. It emphasizes improved image understanding, better OCR-like text extraction, and more context-aware reasoning for image inputs, along with customizable output styles and longer context handling.
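Image-to-text requests of this kind pair a question with an inline image inside one user message. The content-part layout below ({"type": "text"} plus {"type": "image_url"} with a base64 data URL) follows OpenAI's documented vision-message format; whether a specific deployment accepts it is something to verify against current docs.

```python
import base64

def vision_message(image_bytes: bytes, question: str, mime: str = "image/png") -> dict:
    """Build one user message pairing a question with an inline image,
    encoded as a base64 data URL in the OpenAI vision-message style."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Placeholder bytes for illustration; pass real file contents in practice.
msg = vision_message(b"\x89PNG-placeholder", "What text appears in this image?")
```

The resulting message goes into the `messages` array of a chat-completions request like any text-only turn.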

$0.035 per generation (list price: $0.07)

Grok-4-image extends Grok 4’s abilities to visual understanding and reasoning. It can interpret and analyze images, supporting multimodal interaction that combines text and vision. Future developments aim to include image generation, enabling rich AI-assisted workflows that unify text, vision, and code capabilities in one powerful system.

Input: $1.5/1M tokens (list price: $2.5/1M tokens)
Output: $4.8/1M tokens (list price: $8/1M tokens)

GPT-image-1-mini is OpenAI’s lightweight model for creating new images directly from textual prompts. It provides fast and affordable image generation up to 1536×1024 resolution, with adjustable quality and fidelity. It’s ideal for bulk creative applications, though its micro-detail and photorealism fall short of premium models.

Input: $1.5/1M tokens (list price: $2.5/1M tokens)
Output: $4.8/1M tokens (list price: $8/1M tokens)

GPT-image-1-mini / image-to-image is an OpenAI model that transforms or edits existing images using text prompts, reference images, and masks. Designed for efficient, affordable workflows, it enables targeted edits and style guidance, but has slightly reduced detail and realism compared to larger models. It’s optimized for production environments needing speed and cost savings.

$1.12 per generation (list price: $1.4)

Kling-v2.1-master is Kuaishou's premium text-to-video and image-to-video AI model, generating 1080p cinematic clips (5-10s) with realistic physics, smooth motion, and temporal consistency. It supports 16:9/9:16/1:1 ratios via API, excels in complex prompts/camera controls, but lacks audio. Ideal for professional storytelling/marketing; costs ~$1.40-$2.80 per clip.

$1.12 per generation (list price: $1.4)

Kling-v2.1-master text-to-video is Kuaishou's premium AI model that generates 1080p cinematic video clips (5-10s) from text prompts. It delivers smooth motion dynamics, realistic physics, temporal consistency, and precise prompt adherence across 16:9/9:16/1:1 ratios. Ideal for storytelling/marketing; no audio support; ~$1.40-$2.80 per clip via API.

$0.392 per generation (list price: $0.49)

Kling-v2.1-pro is Kuaishou's professional-grade image-to-video AI model, generating 1080p clips (5-10s) from static images with enhanced visual fidelity, precise camera movements (pan/zoom/tilt), and smooth motion dynamics. It preserves details/textures, supports motion brush controls, and excels in cinematic storytelling for marketing/product demos. API pricing ~$0.32-$1.40 per clip.

$0.392 per generation (list price: $0.49)

Kling-v2.1-pro "start-end-framed" refers to its Start/End Frame Conditioning feature, which lets users upload images for the video's first and last frames. The AI generates smooth 1080p transitions (5-10s clips) between them, ensuring precise continuity, cinematic motion, and loop effects (using the same image for both frames). Ideal for product reveals, narrative beats, and seamless multi-clip workflows via API.

$0.224 per generation (list price: $0.28)

Kling-v2.1-standard is Kuaishou's entry-level image-to-video and text-to-video AI model, producing 720p clips (5-10s) with reliable motion, prompt adherence, and basic camera controls. More affordable (~$0.18-$0.25 per clip) than Pro/Master tiers, it's suited for social media, previews, and casual content creation via API.

$0.171 per generation (list price: $0.19)

Hailuo 2.3 Fast is a high-speed AI video generation model focused on image-to-video creation. It produces smooth, realistic videos with dynamic motion at 2.5 times the speed of standard models and lower cost. The model supports 768p resolution clips around 6 seconds long, ideal for rapid video creation and iterative testing while maintaining good visual quality and motion fluidity.

$0.441 per generation (list price: $0.49)

Hailuo-2.3-Pro image to video is a MiniMax-developed AI model that converts static images into smooth animated videos. It maintains image composition and color fidelity while adding fluid motion, camera transitions, and scene coherence. This model supports multi-aspect ratios and rapid generation speeds, serving creators who need high-quality video output from images efficiently.

$0.441 per generation (list price: $0.49)

Hailuo-2.3-Pro text to video is an AI video generator developed by MiniMax, a Shanghai-based AI foundation model company. It produces cinematic 6 to 10-second 1080p videos with realistic human motions, detailed facial expressions, and dynamic camera work. The model excels in choreography, artistic style stability, and is optimized for commercial marketing and storytelling use.

$0.252 per generation (list price: $0.28)

Hailuo-2.3-Standard image to video is a MiniMax AI model designed to animate static images into smooth, cinematic 768p videos lasting up to 10 seconds. It maintains image composition, lighting, and character details while adding realistic motion, camera movements, and scene transitions. The model balances quality and cost-effectiveness for fast, high-fidelity video production.

$0.252 per generation (list price: $0.28)

Hailuo-2.3-Standard text to video is an AI model from MiniMax that generates 6 to 10-second videos in 1080p resolution based on text prompts. It features improved motion capture, realistic facial expressions, dynamic camera angles, and artistic style control, making it suitable for marketing, entertainment, and professional storytelling.

$0.252 per generation (list price: $0.28)

Hailuo-02-Standard is a version of MiniMax's AI video generation model designed for producing high-quality videos from images or text prompts. It generates videos at 768p resolution (versus 1080p for the Pro version) in 6- or 10-second lengths at 25 frames per second. The model excels in natural motion synthesis, advanced camera controls, and deep prompt understanding for creating cinematic videos with realistic physics. It balances fast generation times (around 4 minutes) with professional visual quality, making it suitable for social media, marketing, and creative content production.

$0.252 per generation (list price: $0.28)

Hailuo-02-Standard image-to-video is an AI video generation model by MiniMax designed to convert static images into dynamic videos at 768p resolution and 25 frames per second. It features natural motion synthesis that preserves the integrity of the original image while creating smooth, lifelike animations. Processing time is around 4 minutes, and supported image formats include JPG, PNG, GIF, and AVIF. The model suits social media content, marketing, and creative applications, providing consistent output quality with fast generation speed. It supports user prompts to guide the video's motion and style.

$0.441 per generation (list price: $0.49)

Hailuo-02-Pro is a state-of-the-art AI video generation model developed by MiniMax. It produces professional-grade, high-definition 1080p videos up to 10 seconds long from text or image prompts. The model excels in realistic physics simulation, cinematic motions, and director-level controls such as camera angles and timing. It maintains visual and semantic consistency with low hallucination rates and is widely used for marketing, social media content, education, and prototyping.

$0.441 per generation (list price: $0.49)

Hailuo-02-Pro image-to-video is an advanced AI video generation model by MiniMax that creates high-definition 1080p videos from a single input image combined with text prompts. It specializes in producing realistic cinematic motion with physics-based animation, including natural hair, water, and material interactions. The model supports detailed director controls such as camera movement and scene timing for professional-grade videos up to 10 seconds long. It delivers smooth, visually rich video with stable characters and accurate prompt interpretation, ideal for social media, marketing, and creative content. The workflow includes uploading an image, providing a descriptive prompt, choosing motion styles, and adjusting video length and settings for the final output.
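Video generation at these durations is usually asynchronous: the client submits a job, then polls its status until the clip is ready. A generic polling helper is sketched below; the status field names ("status", "url") are assumptions to adapt to the real API, and it is exercised here against a simulated status sequence rather than live HTTP.

```python
import time

def poll_until_done(check_status, interval_s: float = 5.0, timeout_s: float = 600.0):
    """Call `check_status` repeatedly until it reports completion.

    `check_status` is any zero-argument callable returning a dict like
    {"status": "queued" | "processing" | "done", "url": ...}; the field
    names are illustrative and should match the provider's response shape.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = check_status()
        if result.get("status") == "done":
            return result
        time.sleep(interval_s)
    raise TimeoutError("video generation did not finish in time")

# Simulated status sequence standing in for real HTTP polling:
_states = iter([{"status": "queued"}, {"status": "processing"},
                {"status": "done", "url": "https://example.com/clip.mp4"}])
final = poll_until_done(lambda: next(_states), interval_s=0.01)
```

In a real integration, `check_status` would wrap an authenticated GET against the job-status endpoint returned when the job was submitted.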

$0.09 per generation (list price: $0.1)

Hailuo-02-fast is MiniMax’s advanced AI video generation model producing 1080p cinematic-quality videos up to 10 seconds from text or images. It features ultra-realistic physics simulation (fluid dynamics, collision, lighting), precise director-level camera control (pan, zoom, tracking), and consistent character rendering. Ranked #2 globally, it excels in fast, professional-grade video creation with rich motion and visual effects.

$0.09 per generation (list price: $0.1)

WAN-2.2-Plus Text-to-Video is an advanced AI model that transforms text descriptions into professional, cinematic-quality videos. It uses a 5-billion-parameter architecture to generate 720p videos at 24 frames per second. The model features sophisticated controls over lighting, camera angles, and motion dynamics to create visually rich, realistic, and fluid animations. It is fast, user-friendly, and designed for creators and commercial use.

$0.09 per generation (list price: $0.1)

WAN-2.2-Plus Image-to-Video uses similar technology to animate static images, turning them into dynamic videos with natural, smooth motion. It supports complex camera movements and transitions, maintaining visual consistency and stability. The model outputs high-resolution videos and is optimized for consumer GPUs, making it accessible for both creative and professional applications. It enhances images by adding cinematic motion while preserving style and detail.

Input: $0.75/1M tokens (list price: $1.25/1M tokens)
Output: $6/1M tokens (list price: $10/1M tokens)

GPT-5-Chat is OpenAI's flagship multimodal chatbot interface powered by GPT-5, featuring adaptive reasoning (instant or chain-of-thought), 400K token context, reduced hallucinations (~45-80% fewer), and preset personalities (Cynic, Robot, Listener, Nerd). Excels in coding, writing, multilingual support, and real-world tasks via ChatGPT or API.

Input: $0.75/1M tokens (list price: $1.25/1M tokens)
Output: $6/1M tokens (list price: $10/1M tokens)

GPT-5-Chat image-to-text capability enables the model to process and analyze images alongside text inputs, generating accurate and detailed textual descriptions, visual question answers, document analysis, and multimodal reasoning. It supports various image formats and can interpret complex visuals such as charts, screenshots, and photos for use in applications like content creation, accessibility, and interactive assistants.

Input: $0.75/1M tokens (list price: $1.25/1M tokens)
Output: $6/1M tokens (list price: $10/1M tokens)

GPT-5-Codex is developed by OpenAI. It is a version of GPT-5 specifically optimized for software engineering tasks, featuring advanced capabilities like dynamic thinking time adjustment, autonomous coding, and deep integration with developer tools. The model is a key part of OpenAI's efforts to enhance AI-assisted programming and software development.

Input: $0.75/1M tokens (list price: $1.25/1M tokens)
Output: $6/1M tokens (list price: $10/1M tokens)

GPT-5-Codex/image-to-text is an advanced multimodal model from OpenAI, tailored for robust image-to-text conversion. Optimized for developers, it enables high-quality code generation and visual data extraction. Compared to the core GPT-5, this model offers specialized image understanding, faster processing, and enhanced accuracy for technical tasks. Key application areas include code review, documentation automation, and data analysis. Its speed and multimodal capability create unique value for tech-driven workflows.

$0.3 per generation

Tripo3D v2.5 is an advanced AI-powered 3D modeling tool that generates high-quality 3D assets from single images and text prompts. It features improved geometric precision with sharper edges, enhanced PBR rendering for realistic materials, and seamless integration with tools like Blender and ComfyUI. It supports customizable styles, quad mesh topology, and efficient workflows for designers and game developers.

$0.01 per generation

image-watermark-remover/image-to-image is a specialized deep learning AI model designed for removing watermarks from digital images. Leveraging advanced image-to-image translation techniques, it processes visual inputs to produce clean, watermark-free outputs. The model stands apart from baseline image models by its trained ability to detect and remedy visible watermarks, making it essential for media restoration tasks, digital asset management, and visual quality enhancement in both professional and technical sectors.

$0.02 per generation

The image-zoom/image-to-image model is an advanced AI generative tool specialized for transforming and enhancing images. Differing from base image models, it supports high-resolution processing with versatile image-to-image transfer capabilities. Ideal for creative, technical, and professional applications, the model focuses on speed, accuracy, and flexible API integration, making it especially attractive for developers and designers seeking adaptive image solutions.

$0.01/per time

image-upscaler/image-to-image is a modern AI model designed for image enhancement and transformation. It excels at converting low-resolution or noisy images into cleaner, higher-quality versions. Compared to basic upscaling models, it offers advanced processing, faster speeds, and more consistent output, making it ideal for developers in imaging, creative industries, and technical workflows that require fast, accurate results.

$0.001/per time

image-background-remover/image-to-image is an advanced AI model designed for fast and precise background removal from images. It specializes in image-to-image transformation, making it distinct from text-based or multi-task models. Developed to support creative, commercial, and automation workflows, it delivers high-speed processing and reliable output quality for developers. Compared to basic background removal tools, this model provides optimized accuracy, multi-format compatibility, and seamless API integration. Ideal for content creators, e-commerce, and digital design industries.

$0.0156/per time$0.039/per time

Gemini 2.5 Flash Image HD is an advanced AI image generation and editing model with enhanced resolution and creative control. It supports blending multiple images, maintaining character consistency, and precise local edits through natural language prompts. The model enables users to perform tasks like background blurring, object removal, pose alteration, and colorization with real-world understanding.

$0.0156/per time$0.039/per time

Gemini 2.5 Flash Image HD's editing mode allows precise, targeted transformations and local edits via natural language. It enables blending multiple images, maintaining character consistency, altering poses, removing objects, and colorizing photos with fast, high-quality output and real-world understanding for creative workflows.

Input:$0.7/1M tokens$1/1M tokens
Output:$3.5/1M tokens$5/1M tokens

Claude Haiku 4.5 is Anthropic’s fastest, most cost-effective small AI model, offering near-frontier reasoning and coding, a 200K-token context window, and extended “thinking” for deep logic. It excels in real-time applications, supports text and image input, and delivers rapid, reliable output at one-third the cost of larger frontier models.

Input:$0.7/1M tokens$1/1M tokens
Output:$3.5/1M tokens$5/1M tokens

Claude Haiku 4.5 features advanced file analysis capabilities, processing both text and images with a 200,000-token context window. It supports extended thinking for deeper reasoning, context awareness for sustained coherence in multi-session tasks, and the ability to interact with software interfaces. This makes it powerful for analyzing, summarizing, and extracting information from large documents and complex workflows seamlessly. It balances speed, cost, and near-frontier intelligence effectively.
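Extended thinking is enabled per request in Anthropic's Messages API via a `thinking` block with a token budget; a minimal sketch of the request body, with an illustrative model id and budget (only the payload is built here, no call is made):

```python
# Request body for the Anthropic Messages API with extended thinking enabled.
# Sending it requires an API key and the /v1/messages endpoint; the model id
# and budget below are illustrative.
payload = {
    "model": "claude-haiku-4-5",
    "max_tokens": 4096,               # must exceed the thinking budget
    "thinking": {
        "type": "enabled",
        "budget_tokens": 2048,        # tokens reserved for internal reasoning
    },
    "messages": [
        {"role": "user",
         "content": "Summarize this contract clause in two sentences."}
    ],
}
```

Setting `budget_tokens` trades latency and cost for reasoning depth on a per-request basis.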

Input:$0.7/1M tokens$1/1M tokens
Output:$3.5/1M tokens$5/1M tokens

claude-haiku-4-5-20251001 is a highly efficient AI language model from Anthropic’s Claude family. It is optimized for rapid and cost-effective text generation, coding, summarization, and professional workflows. Compared to its larger siblings like Claude Opus, it offers much faster response times with lower compute requirements, making it ideal for scalable chatbot experiences, customer support, and creative writing. Skilled at concise reasoning and dialogue, claude-haiku-4-5-20251001 is designed for developers and businesses who value agility, precise output, and seamless integration into high-volume applications.

$0.5/per time

Veo 3.1 generates smooth, high-quality videos by transforming a single image or multiple reference images into video sequences. It supports start-and-end frame control for seamless transitions, maintaining consistent characters and styles. Videos can be created in 720p or 1080p with synchronized audio, ideal for storytelling, marketing, and social media content creation.

$0.5/per time

Veo 3.1 converts detailed text prompts into vivid videos, demonstrating strong prompt understanding and cinematic style control. It produces realistic motion, character consistency, and audio synchronization with natural sounds and dialogue. This tool empowers creators to quickly generate professional, narrative-driven video content, supporting popular aspect ratios for various platforms.

$0.5/per time

veo3.1/reference-to-video is a multi-modal AI model that generates video guided by reference images. Supplying one or more reference images anchors subject appearance, keeping characters and objects consistent across the generated footage. Its robust architecture and faster processing set it apart from earlier models, giving developers powerful referencing capabilities for content production, branded media, and generative video pipelines.

$2.5/per time

Veo 3.1 Pro is Google's latest advanced AI video generation model designed for creating high-quality 8-second videos at 720p or 1080p with natively synchronized audio. It offers enhanced scene and shot control with features like multi-shot sequencing, reference-image guidance, and cinematic presets including lighting and camera effects. The model supports longer seamless video extensions, richer native audio including dialogue and environmental sounds, and precise editing tools for inserting or removing objects. Veo 3.1 Pro enables creators and enterprises to produce realistic, immersive, and consistent video content efficiently, perfect for media, marketing, and storytelling applications.

$2.5/per time

Veo 3.1 Pro image-to-video is an advanced feature of Google DeepMind’s Veo 3.1 AI video generation model that transforms single still images or pairs of start and end frames into high-fidelity, cinematic 1080p videos with native synchronized audio. It supports up to 3 reference images for maintaining visual consistency and offers rich creative controls including multi-shot sequencing, realistic camera motion, lighting effects, and voice-synced dialogue. This capability is designed for content creators and enterprises needing professional-quality video production with flexible scene management and enhanced prompt adherence.

$0.5/per time

Veo 3.1 Fast is a cost-effective, low-latency version of Google's Veo 3.1 AI video generation model that produces 4-8 second 1080p videos with synchronized native audio in under 60 seconds. It supports both text-to-video and image-to-video workflows for rapid content creation with cinematic motion and ambient sounds.

$0.5/per time

Veo 3.1 Fast image-to-video enables converting a single image into a dynamic video clip with guided motion, narrative, and sound through optional text prompts, delivering smooth and realistic audiovisual experiences quickly.

$0.5/per time

Veo 3.1 Fast reference-to-video allows using 1-3 reference images to maintain subject consistency and appearance throughout the video, ensuring continuity for characters or objects in complex scenes. This is ideal for storytelling and content requiring visual coherence across frames.

$0.096/per time$0.12/per time

Seedance-1-0-pro-250528 is ByteDance's pro-grade Seedance 1.0 video generation model variant, supporting text-to-video (T2V) and image-to-video (I2V) for 5-10s clips at up to 1080p resolution and 24 FPS. It excels in multi-shot cinematic sequences with smooth motion, camera control (pan/zoom/drone), style diversity, and temporal consistency.

$0.096/per time$0.12/per time

Seedance-1-0-pro-250528 image-to-video is a ByteDance AI model that converts images into high-quality 1080p videos with smooth, natural motion and cinematic camera effects like panning and zooming. It supports multi-shot sequences, dynamic scene transitions, and diverse visual styles, ideal for storytelling, branded content, and complex narratives. It offers fine-grained control over motion intensity, video length, and resolution.

$0.035/per time$0.07/per time

Grok-2-image is xAI's multimodal vision model for image analysis, text descriptions, visual Q&A, and content creation. It processes 4K images (JPG/PNG/PDF) with sub-500ms latency, supports real-time apps, and integrates with the X platform, positioning it as more efficient than GPT-4 Vision for e-commerce, healthcare, and marketing use cases.

$0.3243/per time$0.4054/per time

Sora-2-Pro is OpenAI’s most advanced AI video generation model that produces short videos with synchronized visuals and sound from text or image prompts. It enhances realism, motion physics, and audio-video coherence—delivering narrative-driven clips with accurate lip-sync, ambient sound, and expressive motion, making it ideal for creative professionals and content creators.

$0.3243/per time$0.4054/per time

Sora-2-Pro image-to-video is an advanced AI model by OpenAI that generates high-quality videos with synchronized audio from single images and text prompts. It supports resolutions up to 1792x1024 and produces clips up to 25 seconds long. This model excels in realistic motion, physics, lip-sync, and cohesive sound, making it ideal for professional cinematic, marketing, and storytelling uses.

$0.0156/per time$0.039/per time

Gemini 2.5 Flash Image, also known as Nano Banana, is Google’s advanced AI model for fast, high-quality image generation and editing. It supports blending multiple images, consistent character rendering, and precise natural language editing. The model leverages real-world knowledge for context-aware visuals and offers various aspect ratios, making it cost-effective and production-ready.

$0.0156/per time$0.039/per time

Gemini-2.5-flash-image / image-edit enables precise modifications using natural language. It supports object removal, background changes, pose adjustments, and multi-image blending while maintaining character consistency. The model integrates real-world knowledge for context-aware edits and delivers fast, high-quality results.

$0.027/per time

Sora 2 text-to-video is OpenAI’s flagship AI model that generates high-fidelity, realistic videos directly from natural language prompts. It understands and simulates complex scenes, follows script-level instructions, and creates synchronized audio and persistent characters. Sora 2 excels in physical realism, cinematic quality, and multi-shot continuity for rapid content production and storytelling.

$0.027/per time

Sora 2 image-to-video transforms a single image into a dynamic, animated video sequence. It brings still images to life with realistic motion, scene continuity, and sophisticated effects, supporting advanced editing like inpainting or style transfers. The model preserves subject and background while animating the original content for engaging marketing, entertainment, and creative projects.

Input:$2.0991/1M tokens$2.9986/1M tokens
Output:$10.5/1M tokens$15/1M tokens

claude-sonnet-4-5-20250929-thinking/text-to-text is a versatile AI language model from Anthropic, designed for high-quality text understanding and generation. It supports advanced reasoning, creative writing, and code assistance at high speed. Compared to legacy Claude models, it improves context handling, reasoning capability, and accuracy for professional workflows. Its reliability and focused text-to-text processing make it a robust choice for developers, data analysts, and content creators seeking safe, ethical AI assistance.

Input:$2.0991/1M tokens$2.9986/1M tokens
Output:$10.5/1M tokens$15/1M tokens

claude-sonnet-4-5-20250929-thinking/file-analysis is an advanced AI model in the Claude Sonnet family by Anthropic. Designed for multi-modal file analysis, it supports robust natural language processing, code interpretation, document summarization, and contextual reasoning. Its strengths include fast file parsing, accurate data extraction, and seamless integration with complex workflows. Compared to the baseline Claude Sonnet 4.5, this variant emphasizes enhanced file analytic capabilities and developer-centric features. Its ability to process varied formats makes it ideal for technical teams requiring speed, reliability, and depth in business, legal, or research settings.

Input:$2.0991/1M tokens$2.9986/1M tokens
Output:$10.5/1M tokens$15/1M tokens

claude-sonnet-4-5-20250929-thinking is a state-of-the-art AI model from the Claude family by Anthropic. It excels in natural language understanding, code generation, and advanced reasoning. This version stands out for its improved speed, larger context window, and robust multimodal abilities over earlier Sonnet variants. Designed for enterprise-grade scalability, it optimizes task-specific output for technical, creative, and analytical workflows. Compared to base Claude models, it offers larger input capacity and more consistent logic handling, making it an efficient tool for developers, businesses, and educators needing accurate, reliable AI solutions.

Input:$2.0991/1M tokens$2.9986/1M tokens
Output:$10.5/1M tokens$15/1M tokens

Claude Sonnet 4.5 is Anthropic's top AI for coding, reasoning, and complex tasks, sustaining 30+ hours of focused work with a 200K-token context window (1M in beta). It excels in coding accuracy (a 0% error rate on Anthropic's internal code-editing benchmark), finance, law, medicine, and computer use, with strong safety and alignment improvements.

Input:$2.0991/1M tokens$2.9986/1M tokens
Output:$10.5/1M tokens$15/1M tokens

Claude Sonnet 4.5 file-analysis excels at creating and refining professional work deliverables like presentations, spreadsheets, and documents. It improves formula accuracy, formula logic, and layout consistency in spreadsheets, and autonomously interprets, edits, and summarizes complex files, accelerating tasks like vulnerability detection and legal or financial document review with high accuracy and reliability.

Input:$2.0991/1M tokens$2.9986/1M tokens
Output:$10.5/1M tokens$15/1M tokens

claude-sonnet-4-5-20250929/web-search is a cutting-edge AI model from Anthropic's Sonnet family. It provides fast, scalable performance for developers, blending advanced language understanding, code generation, and contextual search capabilities. Compared to base Sonnet models, it supports deep web context integration for richer, real-time outputs across technical and creative tasks. Ideal for businesses and professionals, it stands out in speed, accuracy, and context-driven intelligence.

Input:$10.5/1M tokens$15/1M tokens
Output:$52.5/1M tokens$75/1M tokens

claude-opus-4-1-20250805-thinking is a next-generation AI language model in the Claude family developed by Anthropic. It offers advanced performance for text generation, programming help, and analytical tasks. Compared to its predecessors, this model brings improved context understanding, increased speed, and enhanced multi-turn reasoning. Developers appreciate its reliability, safety-centric design, and scalability. Its strengths make it ideal for creative writing, intelligent automation, and knowledge-based solutions across various industries.

Input:$10.5/1M tokens$15/1M tokens
Output:$52.5/1M tokens$75/1M tokens

claude-opus-4-1-20250805-thinking/file-analysis is a state-of-the-art AI language model built for detailed file analysis, coding workflows, and structured data interpretation. Developed by Anthropic, this model advances the Claude family with faster multi-file processing and improved reasoning. It features robust context understanding and precise content extraction, making it ideal for professionals handling technical documentation, codebases, or large datasets. Compared to previous Claude models, claude-opus-4-1-20250805-thinking/file-analysis delivers enhanced speed and accuracy in file-oriented scenarios, as well as scalable support for complex files and multi-modal data.

Input:$10.5/1M tokens$15/1M tokens
Output:$52.5/1M tokens$75/1M tokens

claude-opus-4-1-20250805-thinking/web-search is a leading AI model from Anthropic’s Claude Opus family, designed for deep reasoning and intelligent synthesis of real-time web data. It supports complex workflows, rapid information extraction, and creative content tasks. This model differentiates itself from the Claude Opus base by enhancing web search integration and analytical aptitude, ideal for situations demanding trusted, up-to-date information and advanced problem solving. Its robust multi-modal processing and scalable response speed make it a top choice for developers, researchers, and business professionals.

$0.024/per time$0.03/per time

Seedream-4-0-250828 is ByteDance’s advanced text-to-image generation model capable of producing highly detailed, ultra-high-resolution (up to 4K) images by interpreting text prompts. It features fast processing, strong prompt adherence, and supports editing and multi-image blending, making it ideal for creative, commercial, and professional visual workflows.

$0.024/per time$0.03/per time

Seedream-4-0-250828 image-edit refers to the model’s advanced image editing capability, powered by natural language instructions. Users can upload an image and describe modifications, such as background replacement, object addition or removal, style changes, or attribute adjustments, and Seedream 4.0 applies these edits at professional quality with high feature retention and strong prompt adherence, all within seconds and up to 4K resolution.

$0.03/per time

Wan 2.5 Text-to-Image generates high-quality, detailed images from text prompts, supporting artistic and realistic styles with resolutions up to 1440x1440. It offers flexible aspect ratios and prompt expansions, catering to creative, commercial, and multimedia applications.

$0.03/per time

Wan 2.5 Image Edit allows instruction-based interactive editing of images or videos, enabling object removal, addition, or repositioning with natural language commands. This AI-powered editing integrates visual reasoning for refined and adaptive modifications.

$0.03/per time

Wan 2.5 Text to Video creates cinematic videos up to 10 seconds long at 1080p from textual descriptions, with realistic motion, lighting, and rich temporal details. It also generates synchronized audio including voice and ambient sound, ideal for storytelling and marketing.

$0.03/per time

Wan 2.5 Image to Video dynamically animates still images into videos, preserving scene structure, lighting, and perspective. It produces smooth, natural camera movements and transitions with audio synchronization, supporting diverse aspect ratios and high visual fidelity.

$0.28/per time$0.35/per time

Kling-v2.5-turbo-pro is a state-of-the-art AI video generator delivering high-quality, cinematic videos with realistic motion, advanced physics, and smooth transitions. It supports up to 10-second HD videos in multiple aspect ratios with up to 2500-character prompts, ideal for marketing, entertainment, education, and professional use.

$0.28/per time$0.35/per time

Kling-v2.5-turbo-pro text-to-video converts detailed text descriptions into dynamic videos featuring lifelike character expressions, natural movements, and advanced camera control. It offers rapid generation with professional-level output, supporting complex multi-step prompts and creative customization, suitable for social media, advertising, and storytelling applications.

Input:$0/1M tokens
Output:$36/1M tokens$60/1M tokens

Speech-2.5-turbo-preview is a high-definition text-to-speech model supporting 40 languages with natural, expressive voices. It offers fast, real-time streaming, precise voice replication, customizable parameters, and is suitable for conversational AI, content creation, and global applications requiring emotional nuance and low latency.

$0.5003/per time$0.8338/per time

Speech-2.5-turbo-preview-voice-clone is MiniMax's fast text-to-speech variant with integrated voice cloning, enabling realistic replication from 6-second audio samples across 40+ languages. It preserves accents, styles, and emotions with ultra-low latency streaming, ideal for real-time apps like personalized assistants and multilingual content.

$0.5003/per time$0.8338/per time

speech-2.5-turbo-preview-voice-clone is a state-of-the-art AI voice model designed for rapid, realistic speech synthesis and precise voice cloning. Built upon the Turbo family’s fast generation engine, this model achieves low-latency performance ideal for real-time applications. Unlike standard speech AI, it features advanced voice reproduction and customization capabilities, making it optimal for customer service, accessibility tools, and interactive media. With robust support for multi-speaker and dynamic modulation, it enables seamless integration into production workflows.

$0.0021/per time$0.0034/per time

Speech-02-turbo is MiniMax's real-time text-to-speech (TTS) AI model designed for ultra-low latency and high-speed audio generation. It supports 100+ voices across 30+ languages with customizable parameters such as pitch, speed, volume, and emotional expression. Ideal for interactive apps like gaming, virtual meetings, and live assistants, it delivers smooth, natural voice output with advanced voice cloning features.

$0.0082/per time$0.0137/per time

Speech-02-HD is a high-definition text-to-speech (TTS) AI model developed by MiniMax, designed for producing natural, human-like voice output with studio-grade clarity. It supports over 30 languages and 300+ voices, offers advanced features like emotion and pitch control, voice cloning from short samples, and real-time streaming with low latency. It is ideal for professional voiceovers, audiobooks, and interactive applications requiring high-quality, expressive speech synthesis.

$0.5003/per time$0.8338/per time

Speech-2.5-hd-preview-voice-clone is an advanced AI speech model by MiniMax that offers ultra-realistic, high-definition voice cloning and text-to-speech synthesis. It can clone a person's voice from just seconds of audio and generate natural-sounding speech in 40+ languages, preserving accent and emotion even across languages. It supports detailed voice customization, real-time synthesis, and produces studio-quality expressive audio for applications like narrations, voiceovers, and interactive voice systems.

$0.5003/per time$0.8338/per time

speech-2.5-hd-preview-voice-clone is an advanced AI model specializing in high-definition voice cloning and speech synthesis. It delivers lifelike, expressive audio outputs suited for entertainment, customer interaction, accessibility, and more. Compared to foundational speech-2.5-hd models, the voice-clone variant offers more nuanced cloning, richer prosody, and flexible adaptation to user voice samples. Its efficient processing supports real-time deployment and precise control, standing out for professionals seeking reliable, high-quality voice generation across multimedia and service applications.

Input:$0/1M tokens
Output:$60/1M tokens$100/1M tokens

Speech-2.5-hd-preview is MiniMax's high-definition text-to-speech (TTS) model preview, featuring ultra-realistic voices, enhanced multilingual support (40+ languages), precise voice cloning (6-second clips), and real-time streaming. It offers customizable pitch, speed, emotion, and natural pronunciation for professional audio generation up to 5000 characters.

Input:$0.12/1M tokens$0.3/1M tokens
Output:$1/1M tokens$2.5/1M tokens

Gemini-2.5-flash-nothinking is a version of Google’s Gemini 2.5 Flash model with the reasoning ("thinking") feature turned off to prioritize speed and low latency. It offers fast, efficient responses suitable for simpler or high-throughput tasks where deep reasoning is unnecessary. Developers can control the "thinking budget" via API to balance quality, cost, and latency, with non-thinking mode delivering quicker outputs at a lower cost.
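The "thinking budget" is set per request through the Gemini API's `generationConfig.thinkingConfig.thinkingBudget` field, where a budget of 0 disables thinking on Gemini 2.5 Flash; a minimal payload sketch with an illustrative prompt (no request is sent):

```python
# generateContent request body for Gemini 2.5 Flash with thinking disabled.
# thinkingBudget = 0 turns reasoning off for lowest latency; a positive value
# caps the tokens the model may spend on internal reasoning.
payload = {
    "contents": [
        {"role": "user", "parts": [{"text": "List three uses for OCR."}]}
    ],
    "generationConfig": {
        "thinkingConfig": {"thinkingBudget": 0}
    },
}
```

Raising the budget at call time re-enables reasoning without switching models, which is the quality/latency/cost trade-off described above.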

Input:$0.12/1M tokens$0.3/1M tokens
Output:$1/1M tokens$2.5/1M tokens

Gemini-2.5-flash-nothinking image-to-text is a mode of the Google DeepMind Gemini 2.5 Flash model that performs fast image understanding and optical character recognition (OCR) with deep reasoning disabled to prioritize speed. It excels at extracting readable text from images quickly for real-time applications. This mode balances multimodal capability with low latency, suiting automation, technical analysis, and developer workflows that need rapid visual text extraction.

Input:$0.12/1M tokens$0.3/1M tokens
Output:$1/1M tokens$2.5/1M tokens

gemini-2.5-flash-nothinking/file-analysis is a next-generation AI model from Google Gemini’s 2.5 family, specialized in fast, multimodal file and text analysis. It delivers rapid, context-aware processing of documents and images, making it ideal for file-heavy workflows and enterprise data tasks. Compared to other Gemini models, it focuses on speed and minimal latency with streamlined reasoning, enabling seamless integration in real-time applications. Its high accuracy and multimodal engine distinguish it from GPT and Claude, supporting scenarios requiring efficient analysis of large or complex files across industries.

$0.0247/per time$0.029/per time

Doubao Seedream 4.0-250828 is a high-speed, multimodal AI image generator from ByteDance’s Doubao team. It produces ultra-high-resolution (up to 4K) images from text and image prompts in seconds, with advanced editing features, multi-image input support, and strong consistency, making it ideal for professional artwork, advertising, and commercial design workflows.

$0.0247/per time$0.029/per time

doubao-seedream-4-0-250828/image-edit is a cutting-edge multimodal AI model from ByteDance’s Doubao Seedream family. It specializes in automated image editing, creative enhancement, and visual content transformation. Featuring advanced neural architectures, the model integrates fast processing and high-fidelity output, making it ideal for design, marketing, and web applications. Compared to standard Seedream models, this variant offers optimized workflows for image inputs, more control over output styles, and extended compatibility with creative pipelines. Its differentiated features and scalability empower developers and creative professionals seeking reliable, adaptable AI-powered image solutions.

Input:$9/1M tokens$15/1M tokens
Output:$72/1M tokens$120/1M tokens

GPT-5 Pro is an advanced variant of GPT-5 designed for the most challenging and complex tasks. It features extended reasoning capabilities, allowing it to think longer and produce more comprehensive and accurate answers than the standard GPT-5. GPT-5 Pro achieves state-of-the-art performance on difficult benchmarks, reduces major errors by 22%, and is aimed at professional and enterprise users requiring maximum AI performance and precision.

Input:$9/1M tokens$15/1M tokens
Output:$72/1M tokens$120/1M tokens

GPT-5 Pro supports image-to-text capabilities, allowing it to analyze and interpret visual content comprehensively. It can generate detailed, descriptive text from images, recognizing objects, scenes, and textual information within images. This feature enables applications like visual content analysis, enhanced image understanding, and multimodal interaction, making GPT-5 Pro highly effective for complex visual and textual tasks.

Input:$0.2432/1M tokens$0.2703/1M tokens
Output:$0.973/1M tokens$1.0811/1M tokens

DeepSeek-V3 is an open-source AI language model with 671 billion parameters and 37 billion activated per token. It uses a Mixture-of-Experts architecture and Multi-head Latent Attention for efficient, cost-effective inference and training. Supporting a 128,000-token context window, it excels in natural language understanding, reasoning, coding, and multilingual tasks, offering fast, accurate, and scalable performance for diverse applications.
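DeepSeek serves this model through an OpenAI-compatible chat endpoint, so the standard chat-completions request shape applies; a minimal sketch with an illustrative prompt (the payload is only built locally, no request is sent):

```python
# OpenAI-compatible chat request for DeepSeek-V3 (served as "deepseek-chat").
# POST this to https://api.deepseek.com/chat/completions with a bearer token.
payload = {
    "model": "deepseek-chat",
    "messages": [
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user",
         "content": "Explain mixture-of-experts routing in one paragraph."},
    ],
    "max_tokens": 512,
}
```

Because the request shape matches OpenAI's, existing OpenAI SDK clients can target it by pointing their base URL at DeepSeek's endpoint.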

$0.0315/per time$0.035/per time

Qwen-Image is a 20 billion parameter multimodal foundation model by Alibaba's Tongyi Qianwen team, specializing in high-quality image generation and precise image editing. It excels at complex text rendering, including multi-line layouts and fine details, supports multilingual input, and offers advanced editing features like style transfer, object replacement, and background generation. Qwen-Image is widely used for creative visual AI applications and available through APIs and open-source platforms.

Input:$0.495/1M tokens$0.55/1M tokens
Output:$1.9703/1M tokens$2.1892/1M tokens

DeepSeek-R1 is an advanced open-source AI model by DeepSeek designed for high-speed logical reasoning, problem-solving, and mathematical tasks. It uses a Mixture of Experts architecture combined with reinforcement learning and supervised fine-tuning to achieve powerful chain-of-thought reasoning, self-verification, and high accuracy. It excels in software development, complex reasoning, data analysis, and educational support across multiple languages.

Input:$1/1M tokens$2.5/1M tokens
Output:$4/1M tokens$10/1M tokens

gpt-4o-2024-08-06/text-to-text is a modern OpenAI language model in the GPT-4o family, designed for high-speed, accurate text generation and processing. It supports advanced content creation, code assistance, and information retrieval for developers. Compared to previous GPT models, it offers improved context handling, faster response times, and robust reliability, making it ideal for technical, business, and educational tasks.

Input:$1/1M tokens$2.5/1M tokens
Output:$4/1M tokens$10/1M tokens

gpt-4o-2024-08-06/image-to-text is OpenAI’s state-of-the-art multimodal model designed for fast and accurate image-to-text (OCR and captioning) tasks. Based on the GPT-4o architecture, it offers lightning-fast processing, robust recognition capabilities, and contextual understanding. Ideal for developers needing scalable solutions for document automation, accessibility, and data extraction. Compared to prior GPT models, it introduces native image handling and enhanced performance for mixed-modality workflows, making it a leading choice for modern multimodal applications.
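In the Chat Completions API, image inputs travel alongside text as a content list inside a single user message; a minimal sketch of the request shape, with a placeholder image URL (no request is sent):

```python
# Chat-completions request mixing an image and a text instruction for gpt-4o.
# The URL below is a placeholder; any publicly reachable image (or a base64
# data URL) works in its place.
payload = {
    "model": "gpt-4o-2024-08-06",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe any text visible in this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/receipt.png"}},
            ],
        }
    ],
}
```

The same message shape drives OCR, captioning, and visual Q&A; only the text instruction changes.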

Input:$1/1M tokens$2.5/1M tokens
Output:$4/1M tokens$10/1M tokens

gpt-4o-2024-08-06/web-search is OpenAI’s latest GPT-4o variant optimized for multi-modal tasks including web search, text generation, coding, and image understanding. Its core upgrade lies in enhanced speed and context handling, integrating more accurate web results and image-to-text capabilities. Compared to prior GPT-4 models, it delivers quicker and richer outputs for developers and professionals across industries seeking powerful, scalable, and flexible AI solutions.

Input:$1/1M tokens$2.5/1M tokens
Output:$4/1M tokens$10/1M tokens

gpt-4o-2024-08-06/file-analysis is a cutting-edge AI model from OpenAI, based on the GPT-4o family, tailored for in-depth, multimodal file analysis. It processes text, code, and image files efficiently. Developers rely on its fast, context-aware output and advanced reasoning to streamline workflows like content extraction, vulnerability identification, and document parsing. Compared to the base GPT-4o, it features enhanced file handling and robust analytical abilities. Ideal for technical teams needing powerful, scalable solutions for diverse input types across industries.

Input:$0.03/1M tokens$0.05/1M tokens
Output:$0.24/1M tokens$0.4/1M tokens

gpt-5-nano/text-to-text is an efficient, compact AI language model built for fast and accurate text processing. As part of the GPT-5 family, it is designed for developers and teams seeking high-throughput natural language tasks, like coding, content generation, and summarization. Compared to larger models, gpt-5-nano/text-to-text delivers optimized resource usage, predictable output, and faster response times, making it ideal for real-world applications where scalability and cost-efficiency are critical.

Input:$0.03/1M tokens$0.05/1M tokens
Output:$0.24/1M tokens$0.4/1M tokens

gpt-5-nano/web-search is a high-performance AI language model in the GPT-5 family, designed to combine fast, accurate text generation with real-time web search capabilities. Tailored for developers and technical professionals, it excels in coding tasks, data retrieval, and contextual responses using up-to-date web information. Compared to the base GPT-5 models, gpt-5-nano/web-search offers enhanced efficiency, a smaller deployment footprint, and superior web integration, making it ideal for dynamic workflows that require seamless access to current data sources.

Input:$0.03/1M tokens$0.05/1M tokens
Output:$0.24/1M tokens$0.4/1M tokens

gpt-5-nano/file-analysis is a lightweight AI model specialized in fast file parsing, intelligent data extraction, and efficient document analysis. Building on the GPT-5-nano foundation, this version introduces optimized algorithms for rapid, scalable, and accurate file-oriented tasks. It is highly suitable for developers needing batch document processing, structured data extraction, and workflow automation. Its differentiating features include superior speed, minimal resource requirements, and seamless integration for enterprise and developer use cases, setting it apart from other large language models.

Input:$0.03/1M tokens$0.05/1M tokens
Output:$0.24/1M tokens$0.4/1M tokens

gpt-5-nano/image-to-text is a fast, compact multimodal AI model from the GPT-5 family, specialized in converting visual data to accurate text descriptions. Designed for developers needing speed and reliability, it blends efficient processing with high output quality. Compared to base GPT-5 models, it offers focused image understanding, faster inference, and optimized resource use. Ideal for document digitization, accessibility, and media workflows, its architecture enables stable API integration and scalable image-to-text conversion across industries.

Input:$0.15/1M tokens$0.25/1M tokens
Output:$1.2/1M tokens$2/1M tokens

gpt-5-mini/text-to-text is a streamlined AI model from the GPT-5 family, designed for quick text generation and code-oriented workflows. Its compact architecture offers faster response times and lower resource requirements than standard GPT-5 models. Ideal for developers, educators, and businesses needing scalable, lightweight solutions for everyday text tasks, it delivers reliable results with reduced infrastructure costs. gpt-5-mini/text-to-text bridges the gap between advanced AI and practical deployment at scale.

Input:$0.15/1M tokens$0.25/1M tokens
Output:$1.2/1M tokens$2/1M tokens

gpt-5-mini/file-analysis is a focused AI model designed for rapid and accurate file analysis tasks. Derived from the GPT-5-mini architecture, it offers optimized performance for text extraction, code review, and structured data parsing. Compared to the full GPT-5 line, gpt-5-mini/file-analysis provides lighter, faster processing and is ideal for situations demanding quick insights from documents, logs, or code files. Its unique differentiation lies in efficient context handling and specialized algorithms for file-based workflows. Suitable for IT, legal, finance, and research applications, it empowers developers and analysts with reliable file-driven AI capabilities.

Input:$0.15/1M tokens$0.25/1M tokens
Output:$1.2/1M tokens$2/1M tokens

gpt-5-mini/web-search is an efficient AI language model designed for high-speed web search, text generation, code help, and data analysis. Part of the GPT-5 family, it stands out for streamlined performance and real-time web integration. Unlike larger models such as GPT-5 or Gemini, gpt-5-mini/web-search specializes in fast queries and lightweight deployments. Its core strengths include quick information retrieval, accurate answers, and contextual web reasoning, making it a reliable solution for developers, researchers, and teams needing instant results. It is highly optimized for modern workflows where speed and relevance matter.

Input:$0.15/1M tokens$0.25/1M tokens
Output:$1.2/1M tokens$2/1M tokens

gpt-5-mini/image-to-text is a specialized AI model from the GPT-5-mini family, designed for rapid image-to-text conversion. Built on GPT's robust architecture, it focuses on delivering concise and accurate text outputs from images, supporting multimodal tasks. Compared to the base GPT-5-mini, this variant offers optimized image processing workflows and a streamlined API for faster performance. Industry professionals value its speed, reliability, and precise extraction—especially in document automation, data entry, and accessibility solutions.

Input:$0.75/1M tokens$1.25/1M tokens
Output:$6/1M tokens$10/1M tokens

gpt-5/text-to-text is OpenAI’s latest-generation language model, optimized for multilingual text transformation, code assistance, and advanced analysis. Faster, smarter, and more context-aware than prior GPT models, it excels in generating accurate, reliable, and creative textual outputs. With improved reasoning and customization features, gpt-5/text-to-text is ideal for developers, enterprises, and researchers seeking scalable, AI-driven solutions. Unlike GPT-4, it offers more precise context handling and enhanced workflow integration for professional use.

Input:$0.75/1M tokens$1.25/1M tokens
Output:$6/1M tokens$10/1M tokens

gpt-5/file-analysis is an advanced AI model tailored for in-depth file content understanding, code analysis, and structured data extraction. As a specialized variant of the GPT-5 model family, it stands out with optimized processing of large documents, accurate code interpretation, and robust data parsing capabilities. Unlike base GPT-5, gpt-5/file-analysis emphasizes targeted file workflows, making it ideal for developers, data analysts, and businesses that demand high-precision document or file-driven automation. It delivers scalable, reliable, and context-aware results across a spectrum of technical and business environments.

Input:$0.75/1M tokens$1.25/1M tokens
Output:$6/1M tokens$10/1M tokens

gpt-5/web-search is an advanced AI model from the fifth-generation GPT family, optimized for real-time web information retrieval and multimodal tasks. It blends state-of-the-art language understanding with the ability to process textual and online data, offering rapid, accurate results for complex queries. Unlike GPT-4 and Claude, it stands out with native web search integration, enhanced speed, and superior context handling. Developers and enterprises use gpt-5/web-search for next-level code generation, business analysis, and dynamic content creation, benefiting from its reliability, scalability, and multi-modal input processing.

Input:$0.75/1M tokens$1.25/1M tokens
Output:$6/1M tokens$10/1M tokens

gpt-5/image-to-text is a next-generation AI model built by OpenAI, focused on converting images into accurate, detailed textual descriptions. As an extension of the GPT-5 family, it merges multi-modal understanding with advanced vision capabilities. It excels in accessibility, content moderation, data labeling, and automated reporting. Unlike standard GPT-5, gpt-5/image-to-text specializes in visual context extraction and structured text generation from image inputs, offering faster inference, expanded compatibility, and robust accuracy for developers seeking seamless integration of multimodal intelligence.

$0.2842/per time$0.406/per time

Higgsfield Turbo is a speed-optimized version of the Higgsfield AI video generation platform. It offers approximately 1.5 times faster rendering speeds and around 30% cost savings compared to standard models. Turbo includes seven new motion styles for enhanced creative flexibility and priority queue access, making it ideal for rapid video creation, quick iterations, and exploring multiple styles efficiently. It maintains high-quality cinematic video outputs with professional camera movements and effects.

$0.0875/per time$0.125/per time

Higgsfield-lite is an advanced AI video generation model by Higgsfield AI, designed to quickly transform static images and text prompts into short, cinematic video clips with lifelike motion and professional-grade camera effects. It enables creators to produce visually engaging videos with sophisticated lighting, smooth transitions, and dynamic animations, all through an intuitive platform that requires no advanced technical skills. Higgsfield-lite emphasizes fast video creation, realistic character animation, and flexible format support optimized for social media and marketing content.

$0.3941/per time$0.563/per time

Higgsfield-Standard is an AI video generation model producing 3–5 second cinematic clips with lifelike movement and professional camera effects. It features over 50 motion presets, style filters, and prompt enhancement via large language models. Designed for creators and marketers, it balances speed and quality, enabling easy video creation from text or images without advanced skills or editing software.

Input:$0.06/1M tokens$0.15/1M tokens
Output:$0.24/1M tokens$0.6/1M tokens

GPT-4o-mini is OpenAI's cost-efficient small model, outperforming GPT-3.5 Turbo on benchmarks with a 128K context window, text/image inputs, and up to 16K output tokens. It excels at reasoning, coding, multilingual tasks, and function calling, and at $0.15/M input and $0.60/M output tokens it is ideal for chatbots, real-time apps, and high-volume use.
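At the list prices quoted above ($0.15 per million input tokens, $0.60 per million output tokens), per-request cost is simple arithmetic. A quick sketch:

```python
# Estimate GPT-4o-mini cost at the quoted list prices
# ($0.15/M input tokens, $0.60/M output tokens).
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at list prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a chatbot turn with 10,000 prompt tokens and 2,000 completion tokens:
cost = request_cost(10_000, 2_000)  # 0.0015 + 0.0012 = 0.0027 USD
```

This kind of back-of-envelope estimate is how the "high-volume use" claim cashes out: a million such turns would cost roughly $2,700.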

Input:$0.06/1M tokens$0.15/1M tokens
Output:$0.24/1M tokens$0.6/1M tokens

GPT-4o-mini supports image-to-text capabilities as part of its multimodal features. It can process image inputs to provide detailed textual descriptions, perform OCR, extract information, and interpret visual content in various applications like document analysis and data extraction. It offers 128K token context with strong accuracy and cost-efficiency for vision-language tasks.

Input:$10.5/1M tokens$15/1M tokens
Output:$52.5/1M tokens$75/1M tokens

claude-opus-4-1-20250805 is an advanced AI language model from Anthropic designed for precise text generation, coding, and data analysis. Building on the Claude 4 architecture, it delivers improved speed, accuracy, and understanding for complex developer workflows. This model stands out through strong reasoning, safe outputs, and adaptive capabilities—making it ideal for business, research, and technical teams requiring context-rich, reliable AI performance. Compared to previous Claude models, the opus-4-1 generation offers enhanced multi-step logic and broader integration support.

Input:$10.5/1M tokens$15/1M tokens
Output:$52.5/1M tokens$75/1M tokens

claude-opus-4-1-20250805/file-analysis is a cutting-edge model in the Claude Opus AI family, specialized for deep file analysis, document parsing, and advanced text processing. Building on Claude Opus 4’s multimodal and scalable architecture, this variant boosts accuracy and speed for complex file-driven workflows. It is engineered for developers and data professionals needing robust solutions for bulk document extraction, code review, and context-aware content analysis. Differentiated by its optimized file handling, this model excels in enterprise, legal, research, and engineering settings, delivering reliable, secure, and detailed outputs even on challenging datasets.

Input:$10.5/1M tokens$15/1M tokens
Output:$52.5/1M tokens$75/1M tokens

claude-opus-4-1-20250805/web-search is a state-of-the-art AI model from Anthropic’s Claude series, engineered for advanced natural language tasks with integrated real-time web search. It blends large-scale reasoning, coding, and enterprise security with rapid access to the latest online data, setting it apart from earlier Claude or GPT generations. The model is designed for developers and professionals seeking highly reliable, up-to-date AI analysis, automated research, and context-enriched content generation.

Input:$0.0953/1M tokens$0.1122/1M tokens
Output:$0.9637/1M tokens$1.1338/1M tokens

Doubao-seed-1-6-thinking-250715 is a ByteDance ARK multimodal LLM variant from the Seed 1.6 series, optimized for deep thinking in reasoning, coding, and math. It supports 256K context (max 224K input), 32K output, text/image/video inputs, and JSON outputs via /v1/chat/completions API.
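The /v1/chat/completions interface mentioned above follows the familiar OpenAI-compatible shape; a minimal request-body sketch, assuming that compatibility extends to multimodal message parts (the image URL is a placeholder, and the exact fields should be checked against the ByteDance ARK documentation):

```python
import json

# Sketch of an OpenAI-compatible chat completions body for the Doubao
# Seed 1.6 thinking model. The image URL is a placeholder; the exact
# multimodal message format should be confirmed against ARK's docs.
payload = {
    "model": "doubao-seed-1-6-thinking-250715",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this chart step by step."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
    "max_tokens": 32_000,  # the model supports up to 32K output tokens
}

body = json.dumps(payload)  # POST body for /v1/chat/completions
```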

Input:$0.0953/1M tokens$0.1122/1M tokens
Output:$0.9637/1M tokens$1.1338/1M tokens

Doubao-seed-1-6-thinking-250715 image-to-text supports multimodal inputs (text, images, video) to generate text outputs like descriptions, OCR, visual reasoning, and chart analysis via /v1/chat/completions API. With 256K context and step-by-step thinking mode, it excels in complex visual tasks such as document processing and exam problem-solving.

Input:$0.6019/1M tokens$0.7081/1M tokens
Output:$1.0843/1M tokens$1.2757/1M tokens

Doubao-1.5-pro-256k is ByteDance’s high-end Doubao 1.5 Pro large language model variant with a 256K-token context window for very long inputs. It uses a sparse Mixture-of-Experts architecture for GPT‑4-class reasoning at lower cost and supports complex tasks like multi-document analysis, coding, and research-scale long-form question answering.

Input:$0.0919/1M tokens$0.1081/1M tokens
Output:$0.9189/1M tokens$1.0811/1M tokens

Doubao-seed-1-6-thinking-250615 is an advanced ByteDance multimodal model variant optimized for deep reasoning and complex problem-solving. It supports 256K-token context, handling text, images, and video inputs with up to 16K tokens output. Key features include a hybrid sparse attention mechanism, enhanced embedding spaces, and extensive multimodal training, enabling superior understanding, logical deduction, and real-time efficiency.

Input:$0.0919/1M tokens$0.1081/1M tokens
Output:$0.9189/1M tokens$1.0811/1M tokens

Doubao-seed-1-6-thinking-250615 image-to-text leverages its native vision-language model (VLM) integration for accurate visual understanding, including detailed descriptions, OCR on high-res images, chart/diagram reasoning, and multimodal chain-of-thought deduction. It processes images with 256K text context for complex queries.

Input:$0.0172/1M tokens$0.0203/1M tokens
Output:$0.1723/1M tokens$0.2027/1M tokens

Doubao-seed-1.6-flash is a high-speed multimodal deep-thinking model supporting low-latency inference (around 10ms) with strong text and image understanding. It handles image-to-text and text-to-text tasks efficiently, with a 256K-token context window and up to 16K output tokens. It's designed for real-time interaction and complex visual/text reasoning.

Input:$0.0172/1M tokens$0.0203/1M tokens
Output:$0.1723/1M tokens$0.2027/1M tokens

Doubao-seed-1.6-flash image-to-text processes images alongside text prompts to generate detailed descriptions, visual reasoning, OCR, chart analysis, and object recognition at ultra-low latency (10ms TPOT). Its visual capabilities match pro-series competitors while supporting 256K context for complex multimodal queries.

Input:$0.0919/1M tokens$0.1081/1M tokens
Output:$0.2297/1M tokens$0.2703/1M tokens

Doubao-seed-1.6 is ByteDance's multimodal deep-thinking LLM family with 256K context, supporting text/images/video inputs and up to 16K outputs. Variants include seed-1.6 (all-round), -thinking (coding/math/logic boost), and -flash (low-latency). Excels in reasoning, tool-calling, and agentic tasks at reduced cost.

Input:$0.0919/1M tokens$0.1081/1M tokens
Output:$0.2297/1M tokens$0.2703/1M tokens

Doubao-seed-1.6 text-to-text refers to the model's ability to understand text inputs and generate high-quality text responses. It supports long contexts (up to 256K tokens), advanced deep reasoning, complex problem-solving, and multi-turn conversations. It excels in language tasks like question answering, summarization, code generation, and insights across diverse topics.

Input:$0.24/1M tokens$0.6/1M tokens
Output:$4.8/1M tokens$12/1M tokens

GPT-4o-mini-tts is OpenAI's text-to-speech model built on GPT-4o mini, generating natural, expressive speech from text with customizable voices, emotions, accents, and multilingual support (50+ languages). It supports real-time streaming, up to 2,000 tokens, and prompt-based styling for audiobooks, voice agents, and interactive apps via API.
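A sketch of the request body such a TTS call typically takes, under the assumption that prompt-based styling is passed via an instructions-style field (confirm the exact field name against the current API reference); the voice and text are illustrative:

```python
import json

# OpenAI-style text-to-speech request body for /v1/audio/speech.
# The "instructions" field for prompt-based styling is an assumption
# here; verify the exact name in the API reference.
payload = {
    "model": "gpt-4o-mini-tts",
    "voice": "alloy",
    "input": "Welcome back! Your order has shipped.",
    "instructions": "Speak warmly, with a slightly upbeat tone.",
}

body = json.dumps(payload)  # POST to /v1/audio/speech; the response is audio bytes
```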

Input:$0.5/1M tokens$1.25/1M tokens
Output:$4/1M tokens$10/1M tokens

Gemini 2.5 Pro excels in complex text generation and understanding, with a massive context window of up to 1 million tokens. It supports nuanced conversation, multi-step reasoning, and API tool integration for dynamic data access. The model is optimized for expressive, coherent interactions across 24+ languages, making it ideal for advanced question answering, writing, summarization, and coding assistance.

Input:$0.5/1M tokens$1.25/1M tokens
Output:$4/1M tokens$10/1M tokens

Gemini 2.5 Pro enables high-quality image generation from text prompts with detailed control over style, composition, and content. It maintains character consistency and supports multi-image blending and precise edits. The model’s real-world knowledge integration ensures context-aware visuals. Available through Gemini API and Google AI Studio, it suits creative tasks and commercial applications needing fast, accurate image rendering.

Input:$0.5/1M tokens$1.25/1M tokens
Output:$4/1M tokens$10/1M tokens

Gemini 2.5 Pro offers powerful file analysis capabilities using an extensive token context window. It can interpret and summarize large documents, extract insights from images, code, and video, and understand multimodal inputs. Its reasoning extends across diverse data types, enabling complex workflows involving research, data mining, and content synthesis. This multimodal understanding enhances productivity in enterprise and research environments.

Input:$2.3995/1M tokens$5.9986/1M tokens
Output:$4/1M tokens$10/1M tokens

GPT-4o-transcribe is OpenAI's advanced speech-to-text model leveraging GPT-4o for superior audio transcription, outperforming Whisper v3 with lower word error rates across 50+ languages. Features 16K token context, 2K output limit, real-time WebSocket streaming, noise cancellation, speaker separation, and semantic understanding for meetings, voice agents, and live captioning via API.

Input:$2.3995/1M tokens$5.9986/1M tokens
Output:$4/1M tokens$10/1M tokens

gpt-4o-transcribe/audio-to-text is a high-performance audio transcription model by OpenAI, designed to convert speech to text with remarkable accuracy in real time. Built on the GPT-4o architecture, it extends core text understanding with advanced audio handling. The model supports multiple languages, fast response, and robust diarization, making it ideal for industries such as media, education, legal, and healthcare. Compared to standard GPT family models, gpt-4o-transcribe/audio-to-text delivers specialized audio recognition, optimized workflows, and scalable deployment for developers seeking seamless multimodal integration and reliable transcription solutions.
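Transcription endpoints of this kind are typically multipart/form-data: an audio file plus a few form fields. A sketch of those fields, with the file path as a placeholder and the optional fields ("language", "response_format") as assumptions to verify against the current API reference:

```python
# Form fields for a multipart transcription request. The audio path is
# a placeholder; optional fields are assumptions to check against the
# API reference.
form_fields = {
    "model": "gpt-4o-transcribe",
    "language": "en",            # optional language hint (assumed field name)
    "response_format": "json",
}
audio_path = "meeting.wav"       # placeholder; attached as the "file" part
```

The file itself is streamed as the `file` part of the multipart body rather than embedded in JSON.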

Input:$1.7992/1M tokens$2.9986/1M tokens
Output:$9/1M tokens$15/1M tokens

Grok 4 is xAI’s most advanced AI language model with 1.7 trillion parameters, offering highly improved reasoning, a massive 130,000-token context window, and multimodal capabilities including text and images. It excels in complex tasks such as scientific research, coding, and real-time data analysis, integrating live data from platforms like X to provide dynamic, accurate responses.

Input:$1.7992/1M tokens$2.9986/1M tokens
Output:$9/1M tokens$15/1M tokens

grok-4/image-to-text is a fourth-generation multimodal AI model from the Grok family, specialized in fast and reliable image-to-text conversion. It supports automated content extraction, object recognition, and enhanced accessibility. Unlike previous Grok models, grok-4/image-to-text delivers improved processing speed and better contextual understanding for visual inputs. Its distinct multimodal capabilities and focus on image interpretation set it apart from text-only models like GPT-4 or Claude, making it a robust choice for developers seeking scalable solutions across media analysis, digital archiving, and workflow automation.

Input:$0.8/1M tokens$2/1M tokens
Output:$3.2/1M tokens$8/1M tokens

gpt-4.1-2025-04-14/text-to-text is an advanced natural language AI model from OpenAI’s latest GPT-4.1 generation, specializing in complex text generation, intelligent code assistance, and nuanced data processing. Designed for enterprise reliability and developer productivity, it delivers more precise outputs, faster inference, and improved context understanding compared to earlier versions. Tailored for text-to-text tasks, it outperforms many general models in structured content creation, professional communication, and scalable document workflows.

Input:$0.8/1M tokens$2/1M tokens
Output:$3.2/1M tokens$8/1M tokens

gpt-4.1-2025-04-14/image-to-text is a state-of-the-art multimodal AI model by OpenAI, designed for fast and accurate image-to-text conversion. Building on the GPT-4 foundation, it features optimized image understanding and detailed textual output, making it ideal for technical, educational, and enterprise workflows. Its efficiency, multi-format support, and robust performance set it apart from traditional language-only models, offering developers superior flexibility and advanced vision-language capabilities.

Input:$0.8/1M tokens$2/1M tokens
Output:$3.2/1M tokens$8/1M tokens

gpt-4.1-2025-04-14/web-search is a next-generation large language model from OpenAI, built for advanced tasks such as dynamic text generation, coding assistance, and in-depth research. Leveraging the GPT-4.1 architecture, it seamlessly integrates up-to-date web search, enabling precise answers with real-time references. This model stands out due to its improved speed, enhanced accuracy, and robust comprehension of complex queries, making it ideal for developers, enterprises, and technical teams seeking accurate, scalable AI-powered insights.

Input:$0.8/1M tokens$2/1M tokens
Output:$3.2/1M tokens$8/1M tokens

gpt-4.1-2025-04-14/file-analysis is a specialized variant of the GPT-4.1 series, engineered for advanced file analysis, document understanding, and intelligent data extraction. It excels in parsing large-format files, extracting insights, and supporting tasks such as code review, report generation, and automated compliance checks. Compared to the base GPT-4.1 model, its core enhancements include optimized context handling for complex documents, more accurate data parsing, and seamless integration within developer workflows. Ideal for technical teams, compliance officers, and knowledge managers, it delivers fast, reliable AI support for file-centric analysis and automation.

Input:$0.0459/1M tokens$0.0541/1M tokens
Output:$0.1149/1M tokens$0.1351/1M tokens

Doubao-1-5-pro-32k-250115 is a specific version of ByteDance’s Doubao 1.5 Pro large language model with a 32K-token context window, tuned for strong reasoning and enterprise use. It uses a sparse Mixture-of-Experts architecture for high performance and efficiency, and the “250115” suffix denotes a particular dated build/release of this 32K variant for stable deployment tracking.

Input:$0.1723/1M tokens$0.2027/1M tokens
Output:$0.5169/1M tokens$0.6081/1M tokens

Doubao-1-5-vision-pro-32k-250115 is a multimodal Doubao 1.5 Vision Pro model variant from ByteDance that supports both text and image input with a 32K-token context window. It is optimized for visual reasoning, document understanding, and detailed image analysis.

Input:$0.1723/1M tokens$0.2027/1M tokens
Output:$0.5169/1M tokens$0.6081/1M tokens

Doubao-1.5-Vision-Pro-32K-250115 is a multimodal model supporting image-to-text, visual reasoning, and OCR. It analyzes images, generates precise descriptions, interprets charts, and answers visual questions. With a 32K context window and advanced vision–language fusion, it delivers reliable professional-grade understanding for captioning, document reading, and complex visual analysis.

Input:$0.12/1M tokens$0.3/1M tokens
Output:$1/1M tokens$2.5/1M tokens

Gemini 2.5 Flash is Google’s lightweight, ultra-fast AI model optimized for real-time, high-volume tasks with up to 1 million tokens context. It prioritizes speed and efficiency while maintaining strong reasoning capabilities and tool integration, making it ideal for quick writing, summarizing, and data extraction.
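A minimal sketch of the REST body the Gemini API's generateContent method expects; the v1beta endpoint pattern shown is the commonly documented one, but verify it against the current Gemini API reference before relying on it:

```python
import json

model = "gemini-2.5-flash"
# Commonly documented v1beta REST pattern; verify against the current
# Gemini API reference.
endpoint = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{model}:generateContent"
)

payload = {
    "contents": [
        {"role": "user",
         "parts": [{"text": "Summarize this contract clause in two sentences."}]}
    ]
}

body = json.dumps(payload)  # POST with an API key header or query parameter
```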

Input:$0.12/1M tokens$0.3/1M tokens
Output:$1/1M tokens$2.5/1M tokens

Gemini 2.5 Flash Image-to-Text processes images to generate detailed, analytical descriptions, enabling advanced vision-language workflows with fast, precise responses. It supports tasks like multi-image fusion, targeted edits, and reading hand-drawn diagrams, leveraging world knowledge for real-world understanding.

Input:$0.12/1M tokens$0.3/1M tokens
Output:$1/1M tokens$2.5/1M tokens

Gemini 2.5 Flash File Analysis specializes in parsing and summarizing complex documents and datasets, accelerating legal, financial, and vulnerability reviews by providing clear, actionable insights with high accuracy and efficiency.

$1.28/per time$3.2/per time

Veo 3 Pro is a subscription tier of Google's Veo 3 AI video generation model that offers up to 100 video generations per month at 720p resolution and 24 FPS with 8-second clip duration. It includes native synchronized audio generation, advanced prompt adherence for cinematic control, and realistic physics-based motion and lighting. Pro users may need third-party tools for watermark removal and video hosting, and it’s optimized for professional-quality short videos.

$1.28/per time$3.2/per time

Veo 3 Pro image-to-video is a premium feature of Google’s Veo 3 AI video generation model that transforms still images into dynamic 8-second videos with synchronized native audio and cinematic motion. It offers advanced creative controls to guide motion, narrative, and sound from the input image and optional text prompts. This capability supports professional-quality video creation with realistic animations, voiceovers, and sound effects via the Gemini API, aimed at creators seeking high fidelity and artistic control.

$0.48/per time$1.2/per time

Veo 3 Fast is a streamlined, speed-optimized version of Google's Veo 3 AI video generation model. It produces high-fidelity, 8-second video clips at 1080p with synchronized native audio in under one minute, significantly faster than the standard Veo 3. Veo 3 Fast supports both text-to-video and image-to-video workflows and is designed for rapid content iteration, enterprise use, and scalable video production. It features embedded SynthID watermarking and legal indemnity for enterprise users.

$0.48/per time$1.2/per time

Veo 3 Fast image-to-video is a rapid, cost-effective AI feature from Google's Veo 3 model that creates high-quality videos from a single still image with synchronized audio. It supports guiding the motion, narrative, and sound via text prompts alongside the image. Veo 3 Fast delivers smooth, cinematic motion sequences in under a minute, ideal for quick iterations and scalable video production through the Gemini API.

$0.48/per time$1.2/per time

veo3-fast/reference-to-video is a next-generation AI model designed for fast, efficient, and high-quality video generation from references. Part of the advanced veo3 family, the 'fast' variant balances speed and visual fidelity, enabling rapid prototyping and real-time creative workflows. Compared to core veo3 or slower variants, veo3-fast/reference-to-video specializes in quick outputs while maintaining robust reference-based content alignment. Ideal for developers, filmmakers, and creative professionals needing adaptable, scalable video solutions with minimal latency.

$0.032/per time$0.04/per time

Flux Kontext Pro is an advanced AI image editing tool designed for precise, context-aware editing using natural language instructions. It supports both local and large-scale scene changes while preserving character consistency and visual quality. Users can modify text, change backgrounds, adjust styles, and perform multi-turn iterative edits. It offers fast, high-quality results with compatibility for various image formats and workflows.

$0.032/per time$0.04/per time

flux-kontext-pro/text-to-image is a next-generation AI model for text-to-image synthesis. Developed by the Flux research team, it specializes in converting textual prompts into detailed visual outputs with high fidelity and speed. It supports scalable workflows and API integration for tech-oriented use cases. The model stands out for its precise rendering, interpretability controls, and flexible deployment options, differing from base models by improved context retention and output quality. Ideal for creative, engineering, and research application scenarios.

$0.064/per time$0.08/per time

Flux Kontext Max (FLUX.1 Kontext [max]) is an advanced AI model by Black Forest Labs for high-resolution, precise image generation and editing. It delivers superior prompt adherence, detailed rendering, and advanced typography control. It supports complex scene transformations, maintains character consistency, and enables high-quality automated creative workflows in enterprise and design applications.

$0.064/per time$0.08/per time

flux-kontext-max/text-to-image is a state-of-the-art model for generating high-quality images from textual input. Built by the Flux AI team, it focuses on speed, multimodal integration, and advanced control. Compared to its foundational variants, flux-kontext-max delivers faster rendering and improved fidelity, making it ideal for creative design, prototyping, and visual content development. It suits industries needing reliable text-to-image capabilities, offering flexible API support and scalable deployment.

Input:$1.8/1M tokens$3/1M tokens
Output:$9/1M tokens$15/1M tokens

Grok-3-reasoner-r is an enhanced reasoning variant of xAI’s Grok 3 model that emphasizes robust, multi-step problem solving with an extended reasoning budget. It dynamically allocates compute to deeply analyze and refine answers, providing highly accurate step-by-step solutions for complex tasks in mathematics, science, and programming. This version offers improved reliability and transparency through detailed reasoning traces and error correction.

Input:$0.18/1M tokens$0.3/1M tokens
Output:$0.3/1M tokens$0.5/1M tokens

Grok-3-mini is a lightweight, cost-effective reasoning model developed by xAI. It supports text-only input and offers a large context window of up to 131,072 tokens. Grok-3-mini excels at logic-based tasks that don't require deep domain knowledge and provides accessible reasoning traces for transparency. It supports function calling, structured outputs, and adjustable "thinking effort" for simple to complex queries, making it ideal for high-volume, cost-sensitive applications requiring scalable reasoning.
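A hedged sketch of what such a call could look like: xAI's API follows the OpenAI-compatible chat shape, and the adjustable "thinking effort" described above is assumed here to map to a reasoning-effort style parameter; confirm the exact field name and accepted values in xAI's API reference:

```python
import json

# OpenAI-compatible chat body for xAI's grok-3-mini. The
# "reasoning_effort" field and its "low"/"high" values are assumptions
# to verify against xAI's API reference.
payload = {
    "model": "grok-3-mini",
    "messages": [
        {"role": "user",
         "content": "Is 2 a valid base for a positional numeral system? One line."}
    ],
    "reasoning_effort": "low",   # assumed values: "low" / "high"
}

body = json.dumps(payload)
```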

Input:$2.0991/1M tokens$2.9986/1M tokens
Output:$10.5/1M tokens$15/1M tokens

claude-sonnet-4-20250514 is the latest generation AI model from Anthropic's Claude family, offering balanced performance between speed and advanced reasoning. It supports both text and multi-modal inputs, provides reliable outputs for coding, data analysis, and business automation, and stands out with improved context windows and creative capabilities over previous Claude models. Designed for developers and enterprises, claude-sonnet-4-20250514 excels in complex tasks, scalable integration, and enhanced content safety. This model delivers a unique combination of fast responses and high accuracy, making it ideal for real-world, professional scenarios.

Input:$2.0991/1M tokens$2.9986/1M tokens
Output:$10.5/1M tokens$15/1M tokens

claude-sonnet-4-20250514/file-analysis is a specialized version of Anthropic’s Claude Sonnet 4 family focused on advanced file understanding, content extraction, and natural language response. This model excels at parsing complex documents, source code, and structured data efficiently, delivering context-rich, high-quality outputs. It stands out from general-purpose models by offering rapid file-specific insights and deeper contextual accuracy. Designed for professionals in data-driven fields, it merges Claude’s core strengths with tailored file analysis capabilities, enabling streamlined workflows for developers, researchers, and analysts seeking precise, scalable AI solutions.

Input: $2.0991/1M tokens (list: $2.9986/1M tokens)
Output: $10.5/1M tokens (list: $15/1M tokens)

claude-sonnet-4-20250514/web-search is a next-generation AI language model from Anthropic's Claude family, designed for advanced text understanding, coding, content generation, and enhanced real-time information retrieval through web search. It delivers high-speed, context-aware responses with a balanced focus on creativity, ethical alignment, and factual accuracy. Compared to previous Sonnet or Claude models, this version features updated training, broader knowledge integration, and more robust support for web-augmented queries, making it a top choice for professionals requiring dependable AI for research, coding, writing, and complex problem solving.

Input: $2.0991/1M tokens (list: $2.9986/1M tokens)
Output: $10.5/1M tokens (list: $15/1M tokens)

claude-sonnet-4-20250514-thinking is a state-of-the-art AI language model from Anthropic's Claude Sonnet series, designed for deep reasoning, creative writing, and advanced code understanding. It features fast, scalable performance, improved context retention, and strong multimodal support. Compared to previous Claude Sonnet and base Claude iterations, this version delivers enhanced logic and accuracy for complex tasks, making it a smart choice for developers, analysts, and enterprise teams tackling intricate workflows.

Input: $2.0991/1M tokens (list: $2.9986/1M tokens)
Output: $10.5/1M tokens (list: $15/1M tokens)

claude-sonnet-4-20250514-thinking/file-analysis is a powerful AI model from the Claude Sonnet 4 series, built to analyze and understand complex files efficiently. Tailored for developers and technical professionals, it offers rapid document parsing, precise code analysis, and advanced reasoning over structured data. Compared to other large language models like GPT, it excels in file-specific processing and delivers contextual, reliable outputs for demanding workflows. Its optimized architecture ensures balanced speed, accuracy, and creativity, making it a top choice for tasks like automated reporting, code generation, and technical research.

Input: $2.0991/1M tokens (list: $2.9986/1M tokens)
Output: $10.5/1M tokens (list: $15/1M tokens)

claude-sonnet-4-20250514-thinking is an advanced AI language model from Anthropic’s Claude family, designed for versatile tasks such as text generation, coding, and data analysis. Compared to base Claude models, it offers improved reasoning, speed, and context management. Its robust architecture delivers stable and creative outputs, making it ideal for developers, enterprises, and content professionals who prioritize reliable and scalable AI solutions.

Input: $1.8/1M tokens (list: $2/1M tokens)
Output: $7.2/1M tokens (list: $8/1M tokens)

o3/text-to-text is a next-generation AI language model specialized in converting prompts to high-quality text outputs. Developed for speed, versatility, and precision, it supports core tasks like content generation, programming help, and structured data transformation. Compared with other foundational models, o3/text-to-text emphasizes efficiency in workflow automation, stronger task-specific adaptation, and reliable output stability. It's ideal for developers and teams who prioritize seamless integration, scalable performance, and reliable linguistic intelligence within digital applications.

Input: $1.8/1M tokens (list: $2/1M tokens)
Output: $7.2/1M tokens (list: $8/1M tokens)

o3/image-to-text is a next-generation AI vision model specialized in converting image content to structured text. Engineered for rapid and accurate Optical Character Recognition (OCR), it enables seamless automation, accessibility, and real-time information extraction across industries. Unlike traditional OCR solutions or generic multimodal models, o3/image-to-text emphasizes speed, reliability, and adaptability, making it ideal for developers seeking robust image-to-text capabilities. It uses advanced neural architectures that excel in diverse scenarios, including document processing, automated workflows, and AI-powered accessibility tools.

Input: $1.8/1M tokens (list: $2/1M tokens)
Output: $7.2/1M tokens (list: $8/1M tokens)

o3/file-analysis is a state-of-the-art AI model designed for efficient and accurate file analysis across diverse formats. Built on OpenAI's o3 foundation, it brings advanced data extraction and interpretation capabilities suited for developers and enterprises. Unlike general text models, o3/file-analysis is optimized to handle structured files, metadata, and complex documents, providing faster insights and higher accuracy for workflow integration, audit, and compliance tasks.

Input: $1.8/1M tokens (list: $2/1M tokens)
Output: $7.2/1M tokens (list: $8/1M tokens)

o3/web-search is an advanced AI model tailored for enhanced web search scenarios and intelligent content creation. It combines AI-driven natural language understanding with real-time web data retrieval, delivering fact-based, up-to-date responses. Compared to standard models, o3/web-search incorporates web integration natively for accuracy and relevancy, making it ideal for research, customer support, technical writing, and SEO-focused applications. Its robust capabilities and responsive design optimize data extraction, answer generation, and workflow automation for developers, businesses, and content professionals.

Input: $0.99/1M tokens (list: $1.1/1M tokens)
Output: $3.96/1M tokens (list: $4.4/1M tokens)

o4-mini/text-to-text is a compact AI language model tailored for rapid and efficient text-based tasks. With a lightweight architecture, it delivers fast inference and reliable outputs, making it suitable for real-time applications such as automated writing, coding assistance, and conversational bots. Compared to the base o4 model, o4-mini/text-to-text focuses on speed and resource savings while maintaining high output quality for most standard use cases. It's particularly valuable for developers and businesses seeking scalable, low-latency AI solutions without extensive hardware requirements.
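At per-1M-token pricing, per-request cost is simple arithmetic. The helper below uses o4-mini's listed discounted rates ($0.99 input, $3.96 output per 1M tokens) purely as an example; the token counts are made up:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_per_m: float, out_per_m: float) -> float:
    """Dollar cost of one call at per-1M-token prices."""
    return (input_tokens * in_per_m + output_tokens * out_per_m) / 1_000_000

# o4-mini at the listed discounted rates: $0.99 input, $3.96 output per 1M
cost = request_cost(20_000, 2_000, 0.99, 3.96)
print(f"${cost:.4f}")  # → $0.0277
```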

Input: $0.99/1M tokens (list: $1.1/1M tokens)
Output: $3.96/1M tokens (list: $4.4/1M tokens)

o4-mini/image-to-text is a fast, compact AI vision model engineered for converting images into descriptive text. Belonging to the o4-mini family, this model focuses on image captioning and visual content description with improved speed and lightweight architecture. It delivers reliable performance for image analysis tasks in real time, distinguishing itself from larger multimodal models through efficiency and lower resource consumption. Its text output is precise and context-aware, making o4-mini/image-to-text ideal for applications in accessibility, content moderation, and automated media annotation. Compared to its base model, o4-mini/image-to-text is optimized for rapid deployment and use in resource-constrained environments.

Input: $0.99/1M tokens (list: $1.1/1M tokens)
Output: $3.96/1M tokens (list: $4.4/1M tokens)

o4-mini/file-analysis is a focused AI model designed for automated file analysis, data extraction, and document understanding across industries. As part of the o4-mini model family, it is optimized for speed, lightweight deployment, and specialized processing of files such as PDFs, spreadsheets, and text documents. It stands apart from base o4-mini models by offering enhanced structure recognition, smarter data parsing, and better support for enterprise workflows. Developers use it to streamline document review, compliance checks, and file-driven automation, benefiting from its precision and efficient operation, especially in technical and business scenarios.

Input: $0.99/1M tokens (list: $1.1/1M tokens)
Output: $3.96/1M tokens (list: $4.4/1M tokens)

o4-mini/web-search is a lightweight AI language model specifically optimized for web search, data extraction, and information retrieval tasks. Designed for speed and efficiency, it is well-suited for real-time indexing, summarization, and knowledge graph building. Compared to its o4-mini family base model, o4-mini/web-search introduces enhanced relevance ranking, faster query resolution, and domain-specific accuracy. Its compact architecture ensures rapid deployment for developers and seamless integration into search-driven workflows.

$0.0081 per call (list: $0.0135 per call)

Grok-3-reasoner is a specialized reasoning variant of xAI’s Grok 3 model designed for deep, multi-step problem solving. It uses test-time compute to dynamically allocate resources, allowing extended reflection, error correction, and exploration of alternative solutions. This mode excels in complex reasoning tasks like advanced mathematics, scientific research, and coding, providing transparent step-by-step thought processes and significantly improved accuracy over generalist models.

$0.048 per call (list: $0.06 per call)

ideogram-replace-background-v3/text-to-image is an advanced generative AI model specialized in transforming text prompts into high-quality images with seamless background manipulation. Building on the Ideogram family, it offers enhanced background replacement, fast processing, and precise scene adaptation. Designed for media, design, and digital marketing, it stands out for its flexibility in complex workflows and integration with enterprise imaging pipelines. Compared to standard text-to-image models, it delivers superior control over scene elements and background context.

$0.048 per call (list: $0.06 per call)

ideogram-remix-v3/text-to-image is an advanced text-to-image AI model designed for high-quality visual content generation. Leveraging diffusion-based architectures, it transforms textual prompts into coherent and detailed images. This model excels in versatility, supporting various creative workflows such as design prototyping, ad visuals, and educational illustration. Compared to its base model, ideogram-remix-v3/text-to-image introduces improvements in rendering speed, prompt adherence, and style consistency. It is ideal for developers, artists, marketers, and educators who require scalable and reliable generative imagery.

$0.048 per call (list: $0.06 per call)

Ideogram Edit v3 is an advanced AI image generation and editing model from Ideogram, focused on producing highly realistic and textually accurate images. It includes powerful editing tools like Magic Fill for adding or changing image areas, and Extend for expanding image boundaries. The model features enhanced text rendering within images, supports style reference images, and allows fine control over image composition, texture, and lighting for professional-quality visuals. It is widely used for marketing content, social media graphics, and creative design workflows.

$0.048 per call (list: $0.06 per call)

Ideogram-Reframe-V3 is an advanced image-to-image AI model designed to extend and adapt existing images to different resolutions and aspect ratios while preserving key visual elements. It enables creative expansions and modifications for various formats like JPEG, PNG, and WebP. The Reframe feature is ideal for responsive design, digital media, and creative automation, allowing developers to efficiently repurpose visuals across platforms with prompt-driven control and style customization.

$0.0865 per call (list: $0.1081 per call)

suno-music/text-to-music is a cutting-edge AI model designed to transform text prompts directly into original musical compositions. Developed by Suno, this model leverages generative audio technology to deliver fast, diverse, and coherent music creation for multimedia, entertainment, and education. Unlike language-based models, it specializes in audio synthesis, producing custom tracks from descriptive text inputs. Its unique text-to-music capability enables rapid prototyping, sound design, and creative exploration, differentiating it from standard AI models focused on language or images.

$0.0022 per call (list: $0.0027 per call)

suno-lyrics/text-to-text is a specialized AI text generation model tailored for songwriting, lyric brainstorming, and music content creation. Unlike general-purpose language models, it focuses on producing rhythmic, engaging, and thematically cohesive lyrics. It seamlessly supports creative writers and musicians by offering fast, context-aware, and genre-adaptable outputs. Built upon advanced language architectures, suno-lyrics/text-to-text features improved prompt understanding for musical contexts, making it a unique tool for modern content creators, songwriting professionals, and entertainment teams.

$0.048 per call (list: $0.06 per call)

Ideogram-Generate-V3 is an advanced AI text-to-image generation model known for high visual fidelity, photorealism, and excellent text rendering within images. Released in 2025, it supports multiple artistic styles and custom aspect ratios, enabling creation of logos, marketing visuals, and creative designs with readable text and detailed compositions. It delivers fast, high-quality images suitable for professional and creative workflows.

$0.0608 per call (list: $0.1014 per call)

Midjourney is an AI-based image generation service that transforms natural language prompts into detailed, artistic images using advanced machine learning models. Its API allows developers to integrate this capability into applications, offering features like image generation, upscaling, inpainting, and blending.

$0.0608 per call (list: $0.1014 per call)

Midjourney Image-to-Image API enables users to submit an existing image as input to generate variations, enhancements, or stylistic changes. This feature facilitates creative editing such as style transfer, background alteration, or generating image continuations, all leveraging powerful AI models to tailor outputs to user needs.

Input: $1/1M tokens (list: $2.5/1M tokens)
Output: $4/1M tokens (list: $10/1M tokens)

gpt-4o/text-to-text is OpenAI’s latest-generation language model designed for high-performance text generation and understanding. It combines optimized speed, improved logic, and multi-turn conversational skills. Ideal for real-time writing, code generation, and data analysis, gpt-4o/text-to-text stands apart from previous models like GPT-4 because of its scalable throughput and context-aware accuracy. Developers rely on it for reliable automation and productivity across business, tech, and education sectors.

Input: $1/1M tokens (list: $2.5/1M tokens)
Output: $4/1M tokens (list: $10/1M tokens)

gpt-4o/image-to-text is OpenAI’s advanced multimodal AI model designed for fast and accurate image-to-text conversion. It excels at extracting information from images, enabling high-quality OCR and contextual visual analysis. Compared to the core GPT-4o model, gpt-4o/image-to-text optimizes workflows for visual content understanding, making it ideal for technical, business, and accessibility applications. Its scalable inference, robust architecture, and multimodal capabilities support rapid integration, helping developers automate document extraction and enhance user experiences with reliable image analysis.
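For OCR-style use, gpt-4o takes images through the standard multimodal chat message shape, mixing `text` and `image_url` content parts. A minimal request-body sketch (transport and auth omitted; the URL and instruction are placeholders):

```python
def image_to_text_request(image_url: str, instruction: str) -> dict:
    """Chat-completions body pairing a text instruction with an image.

    Uses OpenAI's standard multimodal message shape (`text` plus
    `image_url` content parts); transport and auth are omitted.
    """
    return {
        "model": "gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

req = image_to_text_request("https://example.com/invoice.png",
                            "Extract the invoice number and total.")
```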

Input: $1/1M tokens (list: $2.5/1M tokens)
Output: $4/1M tokens (list: $10/1M tokens)

gpt-4o/web-search is a next-generation multimodal AI model from OpenAI designed for fast, accurate web-based queries, code generation, and knowledge retrieval. It improves on the GPT foundation with enhanced real-time web search integration, efficient multi-modal processing for text and images, and superior task adaptability. gpt-4o/web-search is optimized for workflows requiring up-to-date data, context-rich outputs, and high-speed interaction, making it ideal for developers, analysts, and researchers who demand reliable AI-driven solutions with scalable performance.

Input: $1/1M tokens (list: $2.5/1M tokens)
Output: $4/1M tokens (list: $10/1M tokens)

gpt-4o/file-analysis is a cutting-edge multimodal AI model based on the GPT-4o family, designed to analyze, interpret, and generate insights from diverse file types including text, code, and images. Building upon the speed and accuracy of GPT-4o, this model uniquely integrates file understanding, enabling developers to extract structured information and automate document-heavy workflows. Compared to standard GPT-4o, it further streamlines file-centric tasks, making it indispensable for software engineering, research, and business automation.

Input: $6/1M tokens (list: $10/1M tokens)
Output: $24/1M tokens (list: $40/1M tokens)

GPT Image-1 image-edit is a feature of OpenAI's GPT-image-1 model that allows precise editing of images using text prompts and optional masks. Users can modify specific areas by adding or removing elements, adjusting styles, or correcting details, leveraging GPT-image-1's understanding of visual and textual cues for seamless image modifications.

Input: $6/1M tokens (list: $10/1M tokens)
Output: $24/1M tokens (list: $40/1M tokens)

gpt-image-1/text-to-image is a multimodal AI model designed for fast and accurate text-to-image synthesis. Developed by OpenAI as part of the GPT image model family, it brings advanced generative capabilities to image creation. This model stands out by combining the reliable architecture of GPT with adaptation for image generation, supporting industry use in digital media, creative tasks, and automation. Its optimized speed and multimodal input make it a preferred choice for developers and teams seeking robust text-to-image solutions.
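Image generation with gpt-image-1 goes through OpenAI's images endpoint (`POST /v1/images/generations`). A minimal request body, trimmed to the basic parameters and with transport/auth omitted:

```python
def text_to_image_request(prompt: str, size: str = "1024x1024") -> dict:
    """Minimal body for POST /v1/images/generations with gpt-image-1.

    Parameter set is deliberately reduced; see OpenAI's Images API
    reference for the full list of supported options.
    """
    return {"model": "gpt-image-1", "prompt": prompt, "size": size, "n": 1}

req = text_to_image_request("A watercolor lighthouse at dawn")
```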

Input: $0.8/1M tokens (list: $2/1M tokens)
Output: $3.2/1M tokens (list: $8/1M tokens)

gpt-4.1/text-to-text is an advanced AI model from the GPT-4 series, optimized for high-quality natural language processing tasks like text generation, summarization, translation, and code support. It offers robust language understanding, increased response speed, and fine-tuned control compared to base GPT-4. Ideal for developers and businesses, gpt-4.1/text-to-text stands out with its reliable handling of complex, long-form content, efficient scaling for large requests, and deep contextual awareness, making it a preferred choice for technical, creative, and analytical text tasks.

Input: $0.8/1M tokens (list: $2/1M tokens)
Output: $3.2/1M tokens (list: $8/1M tokens)

gpt-4.1/file-analysis is a specialized AI model from the GPT-4.1 family, designed for advanced file interpretation, code review, and data extraction. It excels at automated file processing, supporting varied codebases, document types, and complex workflows. Unlike general GPT-4 engines, gpt-4.1/file-analysis integrates unique file parsing capabilities and high-speed performance suited for technical, developer-focused environments. Its adaptable model architecture ensures reliability and efficiency in file-centric automation, making it a go-to choice for software engineers, data analysts, and IT professionals needing robust, accurate analytics in diverse file formats.

Input: $0.8/1M tokens (list: $2/1M tokens)
Output: $3.2/1M tokens (list: $8/1M tokens)

gpt-4.1/web-search is a highly advanced language model developed by OpenAI, built on the latest GPT-4.1 architecture and enhanced with real-time web browsing capabilities. It delivers precise, up-to-date responses by integrating web search data directly into outputs. Compared to standard GPT models, gpt-4.1/web-search stands out for its ability to access live information, making it ideal for research, development, content creation, and code assistance. It supports multi-modal tasks and is optimized for speed, context depth, and scalability, offering professionals and developers reliable, responsive, and accurate solutions for dynamic digital environments.

Input: $0.8/1M tokens (list: $2/1M tokens)
Output: $3.2/1M tokens (list: $8/1M tokens)

gpt-4.1/image-to-text is an advanced vision-language AI model built by OpenAI for converting images into accurate text descriptions, captions, and structured representations. As a core member of the GPT-4.1 family, it integrates enhanced image understanding with natural language processing, enabling developers to extract, classify and analyze visual data efficiently. Unlike standard GPT-4.1, this variant is optimized for quick, reliable image-to-text workflows, supporting diverse formats with high accuracy. It's widely applied in OCR, accessibility tools, document digitization, and automated QA with superior speed and context-aware outputs.

Input: $0.16/1M tokens (list: $0.4/1M tokens)
Output: $0.64/1M tokens (list: $1.6/1M tokens)

gpt-4.1-mini/text-to-text is a lightweight, high-speed AI model purpose-built for rapid and efficient text processing. As a member of the GPT-4.1 family, it inherits core natural language understanding from the base model but optimizes for minimal latency and resource usage. Suitable for real-time chatbots, summarization, and drafting tasks, it serves developers needing prompt, reliable, and cost-effective solutions. Its main differentiator is its size-to-performance ratio, delivering quality outputs in environments where speed and efficiency are critical, outpacing larger models in throughput while remaining accurate and context-aware.

Input: $0.16/1M tokens (list: $0.4/1M tokens)
Output: $0.64/1M tokens (list: $1.6/1M tokens)

gpt-4.1-mini/image-to-text is a compact multimodal AI model focusing on converting images to accurate text. As part of the GPT-4.1-mini family, it offers efficient visual data extraction and advanced OCR capability while maintaining fast inference speeds. Unlike general-purpose models, gpt-4.1-mini/image-to-text is optimized for real-time document processing, receipts recognition, and visual content parsing, making it highly relevant for developers building solutions in finance, logistics, and automation. Its precision, efficiency, and cost-effective deployment set it apart for teams needing scalable image-to-text workflows.

Input: $0.16/1M tokens (list: $0.4/1M tokens)
Output: $0.64/1M tokens (list: $1.6/1M tokens)

gpt-4.1-mini/web-search is a compact, efficient AI language model based on the GPT-4 family and finely tuned for web search applications and real-time text generation. It delivers fast responses and reliable accuracy, making it ideal for online query handling, coding assistance, and scalable content creation. Compared to full-scale models, gpt-4.1-mini/web-search emphasizes speed and resource efficiency, providing developers and businesses with a lightweight solution for tasks requiring timely, context-aware information. Its streamlined structure ensures seamless integration and precise output while maintaining robust language capabilities.

Input: $0.16/1M tokens (list: $0.4/1M tokens)
Output: $0.64/1M tokens (list: $1.6/1M tokens)

gpt-4.1-mini/file-analysis is a compact AI language model specialized in efficient file analysis, code review, and structured text extraction. Part of the GPT-4.1-mini family, it focuses on delivering fast response times, low resource usage, and high accuracy, making it ideal for developers and teams needing lightweight, reliable AI-powered file intelligence. Its core strengths include advanced code understanding, robust document processing, and seamless integration into automation pipelines—providing an efficient alternative to larger, general-purpose models.

Input: $0.04/1M tokens (list: $0.1/1M tokens)
Output: $0.16/1M tokens (list: $0.4/1M tokens)

gpt-4.1-nano/text-to-text is an efficient AI text generation model built for speed and resource efficiency. Designed on the GPT-4.1 family, it bridges core NLP capabilities and fast deployment. Its differentiator is rapid inference with reduced compute needs, making it an ideal solution for edge devices, quick-response systems, or lightweight applications. Compared to larger GPT variants, it offers faster results with lower overhead, suitable for developers needing reliable summarization, generation, or everyday language processing under strict resource constraints.

Input: $0.04/1M tokens (list: $0.1/1M tokens)
Output: $0.16/1M tokens (list: $0.4/1M tokens)

gpt-4.1-nano/image-to-text is a compact multimodal AI model by OpenAI based on the GPT-4.1-nano architecture. Designed for fast and accurate image-to-text conversion, it excels in optical character recognition, document parsing, and extracting textual content from images. Compared to full-scale GPT-4, this version offers rapid processing and lower resource usage, making it optimal for applications needing real-time results or high deployment scalability. Its speed and focused modality make it ideal for developers and businesses automating image analysis pipelines, digital archiving, accessibility, or mobile scenarios.

Input: $0.04/1M tokens (list: $0.1/1M tokens)
Output: $0.16/1M tokens (list: $0.4/1M tokens)

gpt-4.1-nano/web-search is a high-performance AI language model designed by OpenAI, part of the GPT-4.1 family. Engineered for rapid-response tasks, it excels in low-latency scenarios and integrates real-time web search capabilities. Developers leverage it for text generation, coding support, and factual queries with up-to-date information. Compared to standard GPT-4.1 models, the nano variant is optimized for efficiency, speed, and deployment flexibility, making it suitable for scalable applications, live chatbots, and knowledge-driven workflows in technology and business sectors.

Input: $0.04/1M tokens (list: $0.1/1M tokens)
Output: $0.16/1M tokens (list: $0.4/1M tokens)

gpt-4.1-nano/file-analysis is a compact, next-generation AI model designed for fast, precise document and code file analysis. As part of the GPT-4.1 family, it leverages efficient architecture and lightweight deployment, making it highly suitable for workflow automation, file auditing, and technical review scenarios. Unlike larger GPT models, nano/file-analysis emphasizes speed and resource efficiency, supporting developers and businesses needing reliable file-centric AI capabilities for seamless integration. Its specialized processing mode covers text, code, and structured file formats, ensuring consistent results with minimal overhead.

Input: $1.8/1M tokens (list: $3/1M tokens)
Output: $9/1M tokens (list: $15/1M tokens)

Grok-3 is the third-generation AI language model developed by xAI, designed to compete with leading models like GPT-4 and Gemini 2.0. It features enhanced reasoning, advanced real-time data integration, and a large 131,072-token context window for deep understanding. Grok-3 offers specialized modes like "Big Brain" for complex problem-solving and "DeepSearch" for real-time information synthesis, excelling in coding, research, and multitask AI applications.

$0.0203 per call (list: $0.0338 per call)

GPT-4o-image-vip is a premium variant of OpenAI's GPT-4o multimodal model, specialized for advanced image-to-text and image generation tasks. It offers enhanced image understanding, detailed visual description, and precise text extraction from images. GPT-4o-image-vip supports high fidelity, multi-image processing, and iterative image editing, making it ideal for complex visual workflows in creative design, technical analysis, and interactive applications. It integrates seamlessly with text inputs for rich multimodal conversations and is optimized for both latency and output quality.

$0.0203 per call (list: $0.0338 per call)

GPT-4o-image-vip image-to-image refers to the advanced image generation and editing capabilities of the GPT-4o-image-vip model by OpenAI. This model enables uploading an existing image and providing precise instructions to modify, enhance, or creatively transform it while maintaining coherence in style and details. It supports high-resolution outputs, multi-turn conversational refinement, and accurate rendering of text within images. Ideal for creative workflows, design prototyping, and interactive media, GPT-4o-image-vip excels in generating production-ready visuals with realism and flexibility through natural language commands.

Input: $0.04/1M tokens (list: $0.1/1M tokens)
Output: $0.16/1M tokens (list: $0.4/1M tokens)

Gemini 2.0 Flash is an advanced AI model by Google designed for fast, accurate text processing, with support for complex reasoning, extended context (up to 1 million tokens), and native tool integrations. It excels in multilingual, real-time text generation and advanced coding, research, and conversational applications.
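Gemini models are called through the `generateContent` REST method, whose request body wraps prompts in `contents`/`parts`. A minimal sketch that only constructs the body (API key and URL handling omitted):

```python
import json

def gemini_request(prompt: str) -> dict:
    """Body for the generateContent REST method, e.g.
    POST .../v1beta/models/gemini-2.0-flash:generateContent."""
    return {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}

print(json.dumps(gemini_request("Summarize attention in one sentence.")))
```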

Input: $0.04/1M tokens (list: $0.1/1M tokens)
Output: $0.16/1M tokens (list: $0.4/1M tokens)

Gemini 2.0 Flash Image-to-Text processes images natively to extract and generate descriptive, analytical text, enabling multimodal input for tasks like image analysis, captioning, and combined vision-language workflows. Both the text and image modes are part of Gemini 2.0's multimodal, high-speed AI platform with ongoing API and tool enhancements.

Input: $0.04/1M tokens (list: $0.1/1M tokens)
Output: $0.16/1M tokens (list: $0.4/1M tokens)

gemini-2.0-flash/file-analysis is a highly optimized, multimodal AI model built for fast and accurate file content analysis. Part of the Gemini 2.0 family, it leverages advanced architecture to deliver rapid processing speeds, efficient text and document evaluation, and robust performance. Unlike core Gemini models, it specializes in file input workflows, making it ideal for developers and businesses needing reliable, scalable, and secure file-based AI solutions. Its precision and flexibility drive innovation in sectors like legal, education, and enterprise document management.

$0.48 per call (list: $1.2 per call)

Veo 3 is Google DeepMind's advanced AI video generation model that creates high-definition, realistic videos with synchronized native audio from simple text or image prompts. It combines three specialized systems for visuals, audio, and timing to produce cohesive audiovisual content including dialogue, ambient sounds, and music. Veo 3 supports complex scenes with realistic motion, lighting, and physics, making it a versatile tool for cinematic-quality video creation.

$0.48 per call (list: $1.2 per call)

Veo 3 image-to-video is an AI capability that transforms a single still image into a dynamic, high-quality video clip with consistent motion and native audio. It allows users to guide the generated video’s motion, narrative, and sound by providing an initial image plus optional text prompts. Veo 3 and its faster variant, Veo 3 Fast, power this feature with realistic animation, seamless transitions, and synchronized sound effects, making it ideal for creative video production workflows.

$0.48 per call (list: $1.2 per call)

veo3/reference-to-video is an advanced multi-modal AI model developed for high-fidelity, reference-driven video generation. Leveraging input cues such as reference images or videos, it synthesizes new content while preserving style, structure, or motion. Compared to standard foundational models, veo3/reference-to-video emphasizes precise adherence to references and supports faster, more controllable media creation workflows. Its strengths lie in creative fields like marketing, film, animation, and interactive media, offering unique AI-powered solutions where consistency and visual coherence are essential.