ElevenLabs Text to Speech — Complete Guide for Creators
Everything you need to know about ElevenLabs on Sonna: Eleven v3, Multilingual v2, Flash v2.5 — which model to pick, credit costs, and real-world use cases.
ElevenLabs is one of the world's leading text-to-speech (TTS) providers, and Sonna integrates three of their best models directly into the platform — without requiring your own ElevenLabs account. Simply select a voice, type your text, and generate studio-quality audio in seconds.
This guide explains the three ElevenLabs models available on Sonna, the differences between them, and when to use each one.
ElevenLabs Models on Sonna
Sonna provides three ElevenLabs models:
| Model | Model ID | Languages | Character Limit | Cost per Character |
|---|---|---|---|---|
| Eleven v3 | eleven-v3 | 70+ | 5,000 | 2.10 credits |
| Multilingual v2 | eleven-multilingual-v2 | 29 | 10,000 | 2.10 credits |
| Flash v2.5 | eleven-flash-v2-5 | 32 | 40,000 | 1.05 credits |
All ElevenLabs models require a Pro/Max plan or PAYG credits. Free users can only use Google Cloud TTS (Neural2 and WaveNet).
If you use the Sonna Developer API, all ElevenLabs models automatically receive a 10% discount.
Eleven v3 — The Most Expressive Model
Eleven v3 is the latest and most advanced ElevenLabs model. Released to general availability in March 2026, it brings unprecedented expressiveness to TTS.
Key Advantages of Eleven v3
Audio Tags — a unique Eleven v3 feature that allows you to control emotions and styles directly within the text:
[excited] Welcome to Sonna! [whispers] Your audio generation is ready.
[sighs] This is not easy... [confidently] But we can do it.
[laughs] Absolutely incredible! [slowly] Let us start from the beginning.
Available tags include: [excited], [whispers], [sighs], [laughs], [slowly], [angry], [sad], [surprised], and many more.
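Since the summary table in this guide lists audio tags as an Eleven v3-only feature, a script written for v3 may need its tags removed before it is sent to another model. The following is a minimal sketch, assuming tags are lowercase words in square brackets as in the examples above. The helper names `find_audio_tags` and `strip_audio_tags` are hypothetical and are not part of Sonna or ElevenLabs.

```python
import re

# Assumption: an audio tag is one or more lowercase words inside square
# brackets, e.g. [excited] or [whispers]. Other bracketed text is left alone.
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")


def find_audio_tags(text: str) -> list[str]:
    """Return the tag names found in the text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def strip_audio_tags(text: str) -> str:
    """Remove tags, then collapse runs of whitespace (including newlines)."""
    without_tags = TAG_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_tags).strip()


sample = "[excited] Welcome to Sonna! [whispers] Your audio generation is ready."
print(find_audio_tags(sample))   # ['excited', 'whispers']
print(strip_audio_tags(sample))  # Welcome to Sonna! Your audio generation is ready.
```

How models without audio-tag support treat leftover tags is not stated in this guide, so stripping them first is a cautious default rather than a documented requirement.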
Complex Text Accuracy — Eleven v3 features a 68% reduction in errors for text containing numbers, URLs, formulas, and code. It is ideal for technical and educational content.
70+ Languages — supports English, Indonesian, Spanish, Mandarin, Arabic, Japanese, Korean, and dozens of others.
When to Use Eleven v3
Use Eleven v3 when:
- You need detailed emotional control within the narration
- The content contains many numbers, URLs, or technical terms
- You are creating dramatic content like podcasts, audiobooks, or ads
- Expressive quality is more important than cost
Limitations of Eleven v3
- 5,000 character limit per request — the smallest among the three models
- Costs 2.10 credits per character, the same as Multilingual v2 and double the rate of Flash v2.5
- Requires more prompt engineering for optimal results
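One way to work within the 5,000-character limit is to split a long script into several requests. The sketch below packs whole sentences into chunks that stay under a given limit. It is an illustration only: `split_into_chunks` is a hypothetical helper, the sentence detection is deliberately naive, and this guide does not say whether audio tags count toward the limit or how separate audio results should be joined.

```python
import re


def split_into_chunks(text: str, limit: int = 5000) -> list[str]:
    """Greedily pack sentences into chunks of at most `limit` characters."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Small limit used here only to make the behaviour visible.
print(split_into_chunks("One. Two! Three? Four.", limit=10))
```

Each chunk can then be sent as its own request. Splitting at sentence boundaries keeps each chunk readable on its own, though prosody may still differ slightly between separately generated segments.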
Multilingual v2 — The Top Choice for Long Narration
Multilingual v2 is ElevenLabs' "workhorse" model — mature, stable, and excellent for professional content production. This model is trusted by millions of creators worldwide.
Key Advantages of Multilingual v2
- Rich Emotional Expression — voices sound natural and lifelike, not robotic
- 10,000 Character Limit — double that of Eleven v3, perfect for longer narrations
- 29 Languages including English, Spanish, French, German, Indonesian, Portuguese, Japanese, and Korean
- High Stability — delivers consistent results for bulk content production
- Best for voice-overs, audiobooks, e-learning, and post-production
When to Use Multilingual v2
Use Multilingual v2 when:
- You need long narration exceeding 5,000 characters
- Your content consists of audiobooks, online courses, or long-form videos
- You want high quality without learning audio tags
- You are producing in bulk where consistency is more important than variation
Example Usage
{
  "text": "Welcome to the Python programming course for beginners. In this first module, we will learn the basics of the Python language, starting from variables and data types to control structures.",
  "voice": "YOUR_VOICE_ID",
  "ttsModel": "eleven-multilingual-v2",
  "stability": 0.6,
  "similarity_boost": 0.75
}
Flash v2.5 — The Fastest and Most Affordable
Flash v2.5 is designed for speed and efficiency. With a latency of around 75ms, it is the premier model for real-time applications and voice chatbots.
Key Advantages of Flash v2.5
- 50% Lower Cost than Eleven v3 and Multilingual v2 — only 1.05 credits per character
- 40,000 Character Limit — the largest among all ElevenLabs models on Sonna
- ~75ms Latency — optimized for conversational and real-time applications
- 32 Languages including all major global languages plus additional ones
When to Use Flash v2.5
Use Flash v2.5 when:
- Cost is your priority — high-volume content, bulk generation
- You require extremely long text (over 10,000 characters) in a single request
- Your application requires rapid responses (chatbots, voice assistants, instant alerts)
- "Excellent" quality is sufficient and you do not need "outstanding"
Cost Comparison
For 100,000 characters (about 15,000 words or a short book):
| Model | Total Credits |
|---|---|
| Eleven v3 | 210,000 credits |
| Multilingual v2 | 210,000 credits |
| Flash v2.5 | 105,000 credits |
Flash v2.5 saves you 50% for high-volume content.
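The figures in the table can be reproduced with a short calculation. The sketch below uses the per-character rates from this guide and optionally applies the 10% Developer API discount mentioned earlier. How Sonna rounds or applies that discount internally is not documented here, so applying it to the total is an assumption, and `estimate_credits` is a hypothetical helper.

```python
from decimal import Decimal

# Credits per character, taken from the pricing table in this guide.
CREDITS_PER_CHAR = {
    "eleven-v3": Decimal("2.10"),
    "eleven-multilingual-v2": Decimal("2.10"),
    "eleven-flash-v2-5": Decimal("1.05"),
}

# Assumption: the 10% Developer API discount is applied to the total.
API_DISCOUNT = Decimal("0.10")


def estimate_credits(model_id: str, characters: int, via_api: bool = False) -> Decimal:
    """Estimate the credit cost of synthesizing `characters` characters."""
    total = CREDITS_PER_CHAR[model_id] * characters
    if via_api:
        total *= 1 - API_DISCOUNT
    return total


for model_id in CREDITS_PER_CHAR:
    print(model_id, estimate_credits(model_id, 100_000))
```

`Decimal` is used instead of floats so that rates such as 2.10 are represented exactly and the totals match the table without rounding noise.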
Model Selection Guide
Questions to Help You Choose:
1. How long is your text?
- < 5,000 characters → Any model will work
- 5,000 – 10,000 characters → Multilingual v2 or Flash v2.5
- > 10,000 characters → Flash v2.5 only
2. Do you need detailed emotional control?
- Yes → Eleven v3 (use audio tags)
- No → Multilingual v2 or Flash v2.5
3. Is cost a primary consideration?
- Yes → Flash v2.5
- No, quality is more important → Eleven v3 or Multilingual v2
4. Is this for real-time or batch processing?
- Real-time/conversational → Flash v2.5
- Batch/production → Eleven v3 or Multilingual v2
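The questions above can be folded into a small decision helper. This is a sketch under stated assumptions: the guide does not rank the four questions, so the order used here (hard requirements first, then preferences) is one reasonable reading, and `choose_model` is a hypothetical function. The default of Multilingual v2 follows this guide's own recommendation for most use cases.

```python
# Per-request character limits from the model table in this guide.
CHAR_LIMITS = {
    "eleven-v3": 5_000,
    "eleven-multilingual-v2": 10_000,
    "eleven-flash-v2-5": 40_000,
}


def choose_model(
    characters: int,
    needs_audio_tags: bool = False,
    cost_sensitive: bool = False,
    real_time: bool = False,
) -> str:
    """Pick a model ID based on text length and the caller's priorities."""
    # Hard requirement: audio tags are only available on Eleven v3.
    if needs_audio_tags:
        if characters > CHAR_LIMITS["eleven-v3"]:
            raise ValueError("Eleven v3 accepts at most 5,000 characters; split the text")
        return "eleven-v3"
    # Hard requirement: nothing on the list accepts more than 40,000 characters.
    if characters > CHAR_LIMITS["eleven-flash-v2-5"]:
        raise ValueError("text exceeds the largest per-request limit; split the text")
    # Only Flash v2.5 handles more than 10,000 characters, and it is also
    # the guide's pick when cost or latency matters most.
    if characters > CHAR_LIMITS["eleven-multilingual-v2"] or cost_sensitive or real_time:
        return "eleven-flash-v2-5"
    return "eleven-multilingual-v2"


print(choose_model(3_000, needs_audio_tags=True))  # eleven-v3
print(choose_model(12_000))                        # eleven-flash-v2-5
print(choose_model(8_000))                         # eleven-multilingual-v2
```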
How to Use in Sonna Creative
1. Go to Sonna Creative and select Text to Speech
2. In the right panel, select your desired voice
3. Scroll down the voice settings panel to find Model, then choose Eleven v3, Multilingual v2, or Flash v2.5
4. Enter your text in the main input area
5. Click Generate. Your audio will be ready in 1–3 seconds
For Eleven v3, embed audio tags directly inside your text to guide the voice expressions.
How to Use via API
All ElevenLabs models are available via the Sonna Developer API. The endpoint remains the same for all models — only the ttsModel parameter varies:
curl -X POST https://sonnalabs.app/api/v1/tts/synthesize \
  -H "Authorization: Bearer sona_sk_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Text to convert into speech.",
    "voice": "VOICE_ID",
    "ttsModel": "eleven-v3"
  }'
Set ttsModel to one of the following values:
- "eleven-v3" for Eleven v3
- "eleven-multilingual-v2" for Multilingual v2
- "eleven-flash-v2-5" for Flash v2.5
You can generate API keys in the Sonna Console → API Keys. Access requires a Pro/Max plan or PAYG credits.
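For readers calling the API from code rather than the command line, the curl request above might be expressed in Python roughly as follows. The endpoint, headers, and field names are copied from the curl example. The function names are hypothetical, and because this guide does not describe the response format, the sketch simply returns the raw response body without interpreting it.

```python
import json
import urllib.request

API_URL = "https://sonnalabs.app/api/v1/tts/synthesize"


def build_tts_request(text: str, voice: str, tts_model: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = {"text": text, "voice": voice, "ttsModel": tts_model}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, voice: str, tts_model: str, api_key: str) -> bytes:
    """Send the request and return the raw response body.

    Assumption: the response format is not documented in this guide, so the
    bytes are returned as-is for the caller to inspect.
    """
    request = build_tts_request(text, voice, tts_model, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

A call such as `synthesize("Text to convert into speech.", "VOICE_ID", "eleven-flash-v2-5", api_key)` would switch models by changing only the third argument, mirroring the single-parameter change described above.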
Summary
| | Eleven v3 | Multilingual v2 | Flash v2.5 |
|---|---|---|---|
| Expression Quality | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Character Limit | 5,000 | 10,000 | 40,000 |
| Languages | 70+ | 29 | 32 |
| Cost per Character | 2.10 cr | 2.10 cr | 1.05 cr |
| Latency | Fast | Fast | Ultra-fast |
| Audio Tags | ✅ | ❌ | ❌ |
| Best For | Expressive content | Long narrations | High-volume workloads |
Not sure where to start? Multilingual v2 is the safest choice for most use cases. Upgrade to Eleven v3 when you need highly detailed expressions, or switch to Flash v2.5 when cost efficiency and latency are your main priorities.
See all generative models available on Sonna on the Models page. For developer API integration guides, visit the Sonna Console.