Generative models.
Eleven v3
The most expressive model. Supports 70+ languages. Requires more prompt engineering than previous models.
Multilingual v2
Our most lifelike, emotionally rich model, available in 29 languages. Best for voiceovers, audiobooks, post-production, and other content creation needs.
Flash v2.5
Our ultra-low-latency model in 32 languages. Ideal for conversational use cases.
Gemini 2.5 Flash
Fast and efficient Gemini TTS. Great for real-time applications with natural multilingual output.
Gemini 2.5 Pro
Highest quality Gemini voice model. Supports multi-speaker dialogue and advanced style instructions.
Neural2
Google's highest quality neural voices. Natural-sounding speech with excellent prosody and clarity across 11 languages.
WaveNet
DeepMind WaveNet-based voices. Extends Neural2 with additional language support for Chinese, Indonesian, Russian, and Turkish.
Nano Banana 2
World knowledge, precise text, consistent characters, fast.
Nano Banana Pro
Studio quality control, legible text, incredible consistency.
GPT Image 2
Sharp text rendering, multilingual prompts, 4K detail.
GPT Image 1.5
Faithful prompt following with crisp details, fast turnaround.
Seedream 4.5
Up to 4K, multi-image edits, dense text accuracy.
Seedream 5 Lite
Up to 3K, multi-image edits. Strong on product shots.
FLUX.2 Pro
Realistic lighting, accurate spaces, character continuity.
FLUX.1 Kontext Pro
Generate and edit with scene cohesion and style control.
FLUX.2 Flex
Flexible high-detail generate + edit, up to 2K.
Grok Image
Cinematic styling, witty edits via image refs.
Qwen Z-Image
Ultra-cheap, fast photorealistic text-to-image.
Qwen Image
Versatile generate + edit, strong prompt following.
Topaz Image Upscale
Up to 8× resolution boost. Detail-preserving, no prompt.
Google Veo 3.1
Cinematic quality, 720p–4K, native audio.
Google Veo 3.1 Fast
Optimized Veo 3.1, 720p–4K, with audio.
Google Veo 3.1 Lite
Most affordable Veo tier, 720p–4K, with audio.
Grok Imagine
Quick, expressive video from text or image. 6–30s.
Kling 2.5 Turbo Pro
Cinematic motion, fluid action. 5s or 10s.
Kling 2.6
Newer Kling with optional native sound. 5s or 10s.
Kling 3.0
Latest Kling, up to 4K with optional sound. 3–15s.
Kling 2.6 Motion Control
Animate a character image with a motion video. Up to 30s.
Kling 3.0 Motion Control
Premium motion transfer, character image + motion video. Up to 30s.
Kling 2.1 Standard
Image-to-video, smooth motion. 5s or 10s.
Kling 2.1 Pro
Image-to-video with optional end frame. 5s or 10s.
Kling 2.1 Master
Premium Kling 2.1, text or image to video. 5s or 10s.
Kling AI Avatar Standard
Lip-synced avatar, 720p. Up to 5 min of audio.
Kling AI Avatar Pro
Premium lip-synced avatar, 1080p 48fps. Up to 5 min.
InfiniteTalk
Talking video from an image + audio. 480p/720p.
Topaz Video Upscale
Detail-preserving video upscale up to 4×.
suno-v5_5
Suno's most advanced model. Highest audio fidelity, complex lyric structures, and incredible genre versatility.
suno-v5
Next-generation music model. Rich vocal harmonies, extended track length, and improved instrument separation.
suno-v4_5plus
Enhanced v4.5 model with style boosts and advanced vocal tuning. Optimized for clean studio mixes.
suno-v4_5all
The balanced workhorse model. Excellent structure, versatile genre adherence, and fast generation.
suno-v4_5
Suno's popular standard model. Great for typical pop, EDM, and acoustic structures with high fidelity.
suno-v4
Legacy model. Fast generation, lower cost, and ideal for quick drafts and melodic ideas.