ElevenLabs Text to Speech — Complete Guide for Creators

ElevenLabs is one of the world's leading text-to-speech (TTS) providers, and Sonna integrates three of their best models directly into the platform — without requiring your own ElevenLabs account. Simply select a voice, type your text, and generate studio-quality audio in seconds.

This guide explains the three ElevenLabs models available on Sonna, the differences between them, and when to use each one.

ElevenLabs Models on Sonna

Sonna provides three ElevenLabs models:

Model	Model ID	Languages	Character Limit	Cost per Character
Eleven v3	`eleven-v3`	70+	5,000	2.10 credits
Multilingual v2	`eleven-multilingual-v2`	29	10,000	2.10 credits
Flash v2.5	`eleven-flash-v2-5`	32	40,000	1.05 credits

All ElevenLabs models require a Pro/Max plan or PAYG credits. Free users can only use Google Cloud TTS (Neural2 and WaveNet).

If you use the Sonna Developer API, all ElevenLabs models automatically receive a 10% discount.
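As a rough illustration of how the per-character rates and the API discount combine, the sketch below estimates the credits for a given text length. The rates come from the table above. The helper name `estimate_credits` is hypothetical, and the way the 10% discount is applied (as a simple multiplier on the total) is an assumption, not confirmed Sonna billing behavior.

```python
from decimal import Decimal

# Credits per character, taken from the table above.
CREDITS_PER_CHAR = {
    "eleven-v3": Decimal("2.10"),
    "eleven-multilingual-v2": Decimal("2.10"),
    "eleven-flash-v2-5": Decimal("1.05"),
}

# Assumption: the 10% Developer API discount is applied to the total.
API_DISCOUNT = Decimal("0.10")


def estimate_credits(model_id: str, num_chars: int, via_api: bool = False) -> Decimal:
    """Estimate the credits for synthesizing num_chars characters (hypothetical helper)."""
    total = CREDITS_PER_CHAR[model_id] * num_chars
    if via_api:
        total *= 1 - API_DISCOUNT
    return total


print(estimate_credits("eleven-v3", 1000))
print(estimate_credits("eleven-v3", 1000, via_api=True))
```

Under these assumptions, 1,000 characters on Eleven v3 come to 2,100 credits in Sonna Creative and 1,890 credits through the API.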

Eleven v3 — The Most Expressive Model

Eleven v3 is the latest and most advanced ElevenLabs model. Released to general availability in March 2026, it brings unprecedented expressiveness to TTS.

Key Advantages of Eleven v3

Audio Tags — a unique Eleven v3 feature that allows you to control emotions and styles directly within the text:

[excited] Welcome to Sonna! [whispers] Your audio generation is ready.
[sighs] This is not easy... [confidently] But we can do it.
[laughs] Absolutely incredible! [slowly] Let us start from the beginning.

Available tags include: [excited], [whispers], [sighs], [laughs], [slowly], [angry], [sad], [surprised], and many more.
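Because audio tags live inside the text itself, it can help to inspect a script before sending it. The sketch below lists the tags a script uses and produces a tag-free copy, which may be useful when reusing the same script with a model that does not support audio tags. The helper names and the regular expression are illustrative assumptions; they treat any lowercase word or phrase in square brackets as a tag.

```python
import re

# Matches bracketed audio tags such as [excited] or [whispers].
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")


def list_audio_tags(script: str) -> list[str]:
    """Return the audio tags in the order they appear (hypothetical helper)."""
    return TAG_PATTERN.findall(script)


def strip_audio_tags(script: str) -> str:
    """Remove audio tags and collapse the leftover whitespace."""
    without_tags = TAG_PATTERN.sub("", script)
    return " ".join(without_tags.split())


script = "[excited] Welcome to Sonna! [whispers] Your audio generation is ready."
print(list_audio_tags(script))
print(strip_audio_tags(script))
```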

Complex Text Accuracy — Eleven v3 features a 68% reduction in errors for text containing numbers, URLs, formulas, and code. It is ideal for technical and educational content.

70+ Languages — supports English, Indonesian, Spanish, Mandarin, Arabic, Japanese, Korean, and dozens of others.

When to Use Eleven v3

Use Eleven v3 when:

You need detailed emotional control within the narration
The content contains many numbers, URLs, or technical terms
You are creating dramatic content like podcasts, audiobooks, or ads
Expressive quality is more important than cost

Limitations of Eleven v3

5,000 character limit per request — the smallest among the three models
Priced at 2.10 credits per character, the same as Multilingual v2 and twice the rate of Flash v2.5
Requires more prompt engineering for optimal results
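One way to work within the 5,000 character limit is to split a longer script into several requests. The sketch below is a minimal example that breaks text after sentence-ending punctuation and keeps each chunk at or under the limit. The function name `split_into_chunks` is hypothetical, and the sentence detection is deliberately simple.

```python
import re


def split_into_chunks(text: str, limit: int = 5000) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking after sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Fallback: a single sentence longer than the limit is cut hard.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


long_script = "Hello world. " * 1000  # 13,000 characters
chunks = split_into_chunks(long_script)
print(len(chunks), [len(chunk) for chunk in chunks])
```

Each chunk would then be sent as its own request. How the resulting audio files are joined afterwards is not covered in this guide.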

Multilingual v2 — The Top Choice for Long Narration

Multilingual v2 is ElevenLabs' "workhorse" model — mature, stable, and excellent for professional content production. This model is trusted by millions of creators worldwide.

Key Advantages of Multilingual v2

Rich Emotional Expression — voices sound natural and lifelike, not robotic
10,000 Character Limit — double that of Eleven v3, perfect for longer narrations
29 Languages including English, Spanish, French, German, Indonesian, Portuguese, Japanese, and Korean
High Stability — delivers consistent results for bulk content production
Best for voice-overs, audiobooks, e-learning, and post-production

When to Use Multilingual v2

Use Multilingual v2 when:

You need long narration of between 5,000 and 10,000 characters in a single request
Your content consists of audiobooks, online courses, or long-form videos
You want high quality without learning audio tags
You are producing in bulk where consistency is more important than variation

Example Usage

{
  "text": "Welcome to the Python programming course for beginners. In this first module, we will learn the basics of the Python language, starting from variables and data types to control structures.",
  "voice": "YOUR_VOICE_ID",
  "ttsModel": "eleven-multilingual-v2",
  "stability": 0.6,
  "similarity_boost": 0.75
}

Flash v2.5 — The Fastest and Most Affordable

Flash v2.5 is designed for speed and efficiency. With a latency of around 75ms, it is the premier model for real-time applications and voice chatbots.

Key Advantages of Flash v2.5

50% Lower Cost than Eleven v3 and Multilingual v2 — only 1.05 credits per character
40,000 Character Limit — the largest among all ElevenLabs models on Sonna
~75ms Latency — optimized for conversational and real-time applications
32 Languages, three more than Multilingual v2

When to Use Flash v2.5

Use Flash v2.5 when:

Cost is your priority — high-volume content, bulk generation
You require extremely long text (over 10,000 characters) in a single request
Your application requires rapid responses (chatbots, voice assistants, instant alerts)
Very good quality is enough and you do not need the most expressive output

Cost Comparison

For 100,000 characters (about 15,000 words or a short book):

Model	Total Credits
Eleven v3	210,000 credits
Multilingual v2	210,000 credits
Flash v2.5	105,000 credits

Flash v2.5 halves the credit cost, which adds up quickly for high-volume content.
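The figures in the table can be reproduced with a short calculation. The sketch below also derives the minimum number of requests each model would need for the same text, based on the per-request character limits listed earlier. The `compare` function is a hypothetical helper, and it ignores any API discount.

```python
import math

# (credits per character, per-request character limit), from the tables above.
MODELS = {
    "Eleven v3": (2.10, 5_000),
    "Multilingual v2": (2.10, 10_000),
    "Flash v2.5": (1.05, 40_000),
}


def compare(total_chars: int) -> dict[str, tuple[int, int]]:
    """Return {model: (total credits, minimum number of requests)}."""
    result = {}
    for name, (rate, limit) in MODELS.items():
        credits = round(rate * total_chars)
        requests = math.ceil(total_chars / limit)
        result[name] = (credits, requests)
    return result


for name, (credits, requests) in compare(100_000).items():
    print(f"{name}: {credits:,} credits in at least {requests} requests")
```

For 100,000 characters this gives at least 20 requests on Eleven v3, 10 on Multilingual v2, and 3 on Flash v2.5.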

Model Selection Guide

Questions to Help You Choose:

1. How long is your text?

Up to 5,000 characters → Any model will work
5,001 – 10,000 characters → Multilingual v2 or Flash v2.5
Over 10,000 characters (up to 40,000) → Flash v2.5 only

2. Do you need detailed emotional control?

Yes → Eleven v3 (use audio tags)
No → Multilingual v2 or Flash v2.5

3. Is cost a primary consideration?

Yes → Flash v2.5
No, quality is more important → Eleven v3 or Multilingual v2

4. Is this for real-time or batch processing?

Real-time/conversational → Flash v2.5
Batch/production → Eleven v3 or Multilingual v2
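The four questions above can be sketched as a small filter. This is one possible reading of the guide, not an official selection rule: the function name `suggest_models` and the priority order (text length first, then audio tags, then cost or real-time needs) are assumptions made for illustration.

```python
# Per-request character limits, from the model table.
LIMITS = {
    "eleven-v3": 5_000,
    "eleven-multilingual-v2": 10_000,
    "eleven-flash-v2-5": 40_000,
}


def suggest_models(
    num_chars: int,
    needs_audio_tags: bool = False,
    cost_first: bool = False,
    real_time: bool = False,
) -> list[str]:
    """Return the model IDs that fit the answers to the four questions."""
    # Question 1: keep only models whose per-request limit fits the text.
    candidates = [m for m, limit in LIMITS.items() if num_chars <= limit]
    if needs_audio_tags:
        # Question 2: only Eleven v3 supports audio tags.
        candidates = [m for m in candidates if m == "eleven-v3"]
    elif cost_first or real_time:
        # Questions 3 and 4: both point to Flash v2.5.
        candidates = [m for m in candidates if m == "eleven-flash-v2-5"]
    return candidates


print(suggest_models(8_000))
print(suggest_models(3_000, needs_audio_tags=True))
```

An empty list means no single request fits the constraints, for example audio tags on an 8,000 character script. In that case the text would need to be split into smaller requests.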

How to Use in Sonna Creative

1. Go to Sonna Creative and select Text to Speech
2. In the right panel, select your desired voice
3. Scroll down the voice settings panel to find Model, then choose Eleven v3, Multilingual v2, or Flash v2.5
4. Enter your text in the main input area
5. Click Generate. Your audio will be ready in 1–3 seconds

For Eleven v3, embed audio tags directly inside your text to guide the voice expressions.

How to Use via API

All ElevenLabs models are available via the Sonna Developer API. The endpoint remains the same for all models — only the ttsModel parameter varies:

curl -X POST https://sonnalabs.app/api/v1/tts/synthesize \
  -H "Authorization: Bearer sona_sk_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Text to convert into speech.",
    "voice": "VOICE_ID",
    "ttsModel": "eleven-v3"
  }'

Set the ttsModel value to one of:

"eleven-v3" for Eleven v3
"eleven-multilingual-v2" for Multilingual v2
"eleven-flash-v2-5" for Flash v2.5

You can generate API keys in the Sonna Console → API Keys. Access requires a Pro/Max plan or PAYG credits.
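For readers who prefer Python over curl, the sketch below builds the same request with the standard library. The endpoint, headers, and field names mirror the curl example above. The helper name `build_tts_request` is hypothetical, and the request is only constructed, not sent, because the response format is not described in this guide.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://sonnalabs.app/api/v1/tts/synthesize"


def build_tts_request(
    api_key: str, text: str, voice: str, tts_model: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {"text": text, "voice": voice, "ttsModel": tts_model}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_tts_request(
    "sona_sk_YOUR_API_KEY",
    "Text to convert into speech.",
    "VOICE_ID",
    "eleven-flash-v2-5",
)
print(request.get_method(), request.full_url)

# Sending is left commented out; how the audio is returned is an open assumption.
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

Switching models only requires changing the last argument to one of the three model IDs listed above.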

Summary

	Eleven v3	Multilingual v2	Flash v2.5
Expression Quality	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐
Character Limit	5,000	10,000	40,000
Languages	70+	29	32
Cost per Character	2.10 cr	2.10 cr	1.05 cr
Latency	Fast	Fast	Ultra-fast
Audio Tags	✅	❌	❌
Best For	Expressive content	Long narrations	High-volume workloads

Not sure where to start? Multilingual v2 is the safest choice for most use cases. Upgrade to Eleven v3 when you need highly detailed expressions, or switch to Flash v2.5 when cost efficiency and latency are your main priorities.

See all generative models available on Sonna on the Models page. For developer API integration guides, visit the Sonna Console.
