Audio generation

Generate speech audio from text using Azerion Intelligence models with multi-language support.

Endpoint:

POST https://api.azerion.ai/v1/audio/speech

Description

This endpoint generates realistic speech audio from text input using the Kokoro open-source TTS model. It supports multiple languages and allows you to convert text into natural-sounding speech using various voice options and customizable playback settings.

Authentication

This endpoint requires authentication using an API key.

Request

{
  "model": "kokoro",
  "input": "Hello and Welcome to Azerion Intelligence. How are you today?",
  "voice": "af_bella",
  "response_format": "mp3",
  "speed": 1.0
}

Request Parameters

  • model (string, required): ID of the model to use for speech generation. Use kokoro for the Kokoro TTS model. You can use the List Models endpoint to see available models.
  • input (string, required): The text to generate audio for. Maximum length is 4096 characters.
  • voice (string, required): The voice to use when generating the audio. See the Voice Options section below for available voices organized by language.
  • response_format (string, optional): The format of the generated audio. Supported formats are mp3, opus, aac, flac, wav, and pcm. Defaults to mp3.
  • speed (number, optional): The speed of the generated audio. Select a value from 0.25 to 4.0. Defaults to 1.0.
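
The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch rather than part of the Azerion API: the helper name build_speech_payload and its error messages are hypothetical, the documented bounds (4096 characters, speed 0.25 to 4.0) are assumed to be inclusive, and rejecting empty input is an assumption of this sketch.

```python
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(text, voice, model="kokoro", response_format="mp3", speed=1.0):
    """Return a request body dict, raising ValueError on out-of-range values.

    Hypothetical helper: the limits mirror the parameter list above and are
    assumed to be inclusive. Rejecting empty input is an assumption.
    """
    if not text:
        raise ValueError("input must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} characters; maximum is {MAX_INPUT_CHARS}")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dict can be passed as the JSON body of the POST request. The server remains the authority on what it accepts, so this check only catches obvious mistakes early.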

Voice Options

Kokoro TTS supports multiple languages with various voice options. Voices are named using a prefix system: language code + gender + name.
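
The naming scheme can be illustrated with a short parser. This is a sketch under assumptions: the names parse_voice and VoiceId are hypothetical, and the code assumes every ID is a two-character prefix (language letter, then f or m), an underscore, and a name, which holds for the IDs listed in this section.

```python
from typing import NamedTuple

GENDERS = {"f": "female", "m": "male"}


class VoiceId(NamedTuple):
    language_code: str  # first letter of the prefix, e.g. "a" or "b"
    gender: str         # "female" or "male"
    name: str           # e.g. "bella"


def parse_voice(voice: str) -> VoiceId:
    """Split an ID such as "af_bella" into its prefix parts and name."""
    prefix, sep, name = voice.partition("_")
    if not sep or len(prefix) != 2 or prefix[1] not in GENDERS or not name:
        raise ValueError(f"unexpected voice ID: {voice!r}")
    return VoiceId(prefix[0], GENDERS[prefix[1]], name)
```

For example, parse_voice("bm_daniel") yields language code "b", gender "male", and name "daniel".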

Supported Languages

English (American)

  • Female voices: af_alloy, af_aoede, af_bella, af_heart, af_jadzia, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky, af_v0, af_v0bella, af_v0irulan, af_v0nicole, af_v0sarah, af_v0sky
  • Male voices: am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa, am_v0adam, am_v0gurney, am_v0michael

English (British)

  • Female voices: bf_alice, bf_emma, bf_lily, bf_v0emma, bf_v0isabella
  • Male voices: bm_daniel, bm_fable, bm_george, bm_lewis, bm_v0george, bm_v0lewis

French

  • Female voices: ff_siwis

Spanish

  • Female voices: ef_dora
  • Male voices: em_alex, em_santa

Hindi

  • Female voices: hf_alpha, hf_beta
  • Male voices: hm_omega, hm_psi

Italian

  • Female voices: if_sara
  • Male voices: im_nicola

Japanese

  • Female voices: jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro
  • Male voices: jm_kumo

Portuguese

  • Female voices: pf_dora
  • Male voices: pm_alex, pm_santa

Chinese

  • Female voices: zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi
  • Male voices: zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang

Choosing the Right Voice

  • For English content, af_bella and am_echo are popular choices.
  • For professional presentations, consider bf_emma or bm_daniel for British English.
  • For multilingual projects, select voices that match your target audience's language.
  • v0 versions are alternative voice variants with different characteristics.

Response

A successful request returns a 200 OK status code with the generated audio as binary data in the response body. The audio format matches the requested response_format parameter.

Response Headers

Content-Type: audio/mpeg
Content-Length: [size in bytes]

Usage Statistics

Usage information is included in the response headers:

X-Usage-Input-Tokens: 15
X-Usage-Output-Tokens: 0
X-Usage-Total-Tokens: 15
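
These headers can be gathered into a small dictionary for logging or cost tracking. The sketch below is illustrative only: parse_usage and the output field names are hypothetical, and it assumes the three header values are plain non-negative integers as in the sample above.

```python
def parse_usage(headers):
    """Collect X-Usage-*-Tokens headers as integers, ignoring header-name case.

    Hypothetical helper; missing or non-numeric headers are skipped.
    """
    names = {
        "x-usage-input-tokens": "input_tokens",
        "x-usage-output-tokens": "output_tokens",
        "x-usage-total-tokens": "total_tokens",
    }
    usage = {}
    for key, value in headers.items():
        field = names.get(key.lower())
        if field is not None and str(value).strip().isdigit():
            usage[field] = int(value)
    return usage
```

The function accepts any mapping with an items() method, so a plain dict or the headers object of an HTTP response should both work.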

Example Request (cURL)

curl https://api.azerion.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "kokoro",
    "input": "Hello and Welcome to Azerion Intelligence. How are you today?",
    "voice": "af_bella",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3

Replace Placeholder

Replace YOUR_ACCESS_TOKEN with your actual API key or access token. Refer to the Authentication guide for details on obtaining and using your credentials.

Example Request (Python)

import os

import requests

api_key = os.environ.get("AZERION_API_KEY")  # Or AZERION_ACCESS_TOKEN
url = "https://api.azerion.ai/v1/audio/speech"  # Replace with your actual base URL if different

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}

data = {
    "model": "kokoro",
    "input": "Hello and Welcome to Azerion Intelligence. How are you today?",
    "voice": "af_bella",
    "response_format": "mp3",
    "speed": 1.0,
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # Raise on a 4xx/5xx status instead of saving an error body

# Save the audio file
with open("speech.mp3", "wb") as f:
    f.write(response.content)

print("Audio generated successfully!")
print(f"Content-Type: {response.headers.get('Content-Type')}")
print(f"Content-Length: {response.headers.get('Content-Length')} bytes")
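
Because input is limited to 4096 characters, longer text has to be sent as several requests. The following is one possible way to split it, offered as a sketch: split_text is a hypothetical helper, and breaking after sentence-ending punctuation is a design choice of this example, not something the API prescribes.

```python
import re

MAX_INPUT_CHARS = 4096  # limit stated for the input parameter


def split_text(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters.

    Breaks after sentence-ending punctuation where possible; a single
    sentence longer than the limit is cut at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-cut any sentence that cannot fit in one chunk on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, producing one audio file per chunk. Joining the resulting files is outside the scope of this sketch.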

Example Request (Node.js)

// Requires node-fetch v2; v3 is ESM-only and cannot be loaded with require().
const fetch = require('node-fetch');
const fs = require('fs');

const apiKey = process.env.AZERION_API_KEY; // Or AZERION_ACCESS_TOKEN
const url = 'https://api.azerion.ai/v1/audio/speech'; // Replace with your actual base URL if different

const headers = {
  'Content-Type': 'application/json',
  'Authorization': `Bearer ${apiKey}`
};

const data = {
  model: 'kokoro',
  input: 'Hello and Welcome to Azerion Intelligence. How are you today?',
  voice: 'af_bella',
  response_format: 'mp3',
  speed: 1.0
};

fetch(url, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(data)
})
  .then(response => {
    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }
    // Read the binary audio body as a Buffer
    return response.buffer();
  })
  .then(audioBuffer => {
    fs.writeFileSync('speech.mp3', audioBuffer);
    console.log('Audio generated successfully!');
    console.log('File saved as speech.mp3');
  })
  .catch(error => console.error('Error:', error));