Audio generation
Generate speech audio from text using Azerion Intelligence models with multi-language support.
Endpoint:
POST https://api.azerion.ai/v1/audio/speech
Description
This endpoint generates realistic speech audio from text input using the Kokoro open-source TTS model. It supports multiple languages and allows you to convert text into natural-sounding speech using various voice options and customizable playback settings.
Authentication
This endpoint requires authentication using an API key.
Request
{
  "model": "kokoro",
  "input": "Hello and Welcome to Azerion Intelligence. How are you today?",
  "voice": "af_bella",
  "response_format": "mp3",
  "speed": 1.0
}
Request Parameters
- model (string, required): ID of the model to use for speech generation. Use kokoro for the Kokoro TTS model. You can use the List Models endpoint to see available models.
- input (string, required): The text to generate audio for. Maximum length is 4096 characters.
- voice (string, required): The voice to use when generating the audio. See the Voice Options section below for available voices organized by language.
- response_format (string, optional): The format of the generated audio. Supported formats are mp3, opus, aac, flac, wav, and pcm. Defaults to mp3.
- speed (number, optional): The speed of the generated audio. Select a value from 0.25 to 4.0. Defaults to 1.0.
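The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name build_speech_payload and its error messages are hypothetical, and rejecting empty input is an assumption (the parameter list only states that input is required).

```python
# Limits taken from the Request Parameters list.
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(text, voice, model="kokoro", response_format="mp3", speed=1.0):
    """Return a request body dict, raising ValueError for out-of-range values.

    Hypothetical client-side helper; the server remains the source of truth.
    """
    if not text:
        # Assumption: an empty string is treated as a missing required input.
        raise ValueError("input must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(
            f"input is {len(text)} characters; the maximum is {MAX_INPUT_CHARS}"
        )
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dict can be passed directly as the JSON body of the POST request. Voice names are not validated here, since the set of available voices may change.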
Voice Options
Kokoro TTS supports multiple languages with various voice options. Voices are named using a prefix system: language code + gender + name.
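To illustrate the naming scheme, the sketch below splits a voice ID into its parts. It assumes the layout seen in the voice names on this page: a one-letter language code, a one-letter gender marker (f or m), an underscore, then the name. The function parse_voice_id and the VoiceInfo type are hypothetical helpers, not part of the API.

```python
from typing import NamedTuple


class VoiceInfo(NamedTuple):
    language_code: str  # first letter of the prefix, e.g. "a" or "b"
    gender: str         # "female" or "male"
    name: str           # everything after the underscore, e.g. "bella"


def parse_voice_id(voice_id: str) -> VoiceInfo:
    """Split a voice ID such as "af_bella" into language code, gender, and name."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    genders = {"f": "female", "m": "male"}
    if prefix[1] not in genders:
        raise ValueError(f"unknown gender marker in voice ID: {voice_id!r}")
    return VoiceInfo(prefix[0], genders[prefix[1]], name)
```

For example, parse_voice_id("bm_daniel") yields language code "b", gender "male", and name "daniel".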
Supported Languages
English (American)
- Female voices: af_alloy, af_aoede, af_bella, af_heart, af_jadzia, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky, af_v0, af_v0bella, af_v0irulan, af_v0nicole, af_v0sarah, af_v0sky
- Male voices: am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa, am_v0adam, am_v0gurney, am_v0michael
English (British)
- Female voices: bf_alice, bf_emma, bf_lily, bf_v0emma, bf_v0isabella
- Male voices: bm_daniel, bm_fable, bm_george, bm_lewis, bm_v0george, bm_v0lewis
French
- Female voices: ff_siwis
Spanish
- Female voices: ef_dora
- Male voices: em_alex, em_santa
Hindi
- Female voices: hf_alpha, hf_beta
- Male voices: hm_omega, hm_psi
Italian
- Female voices: if_sara
- Male voices: im_nicola
Japanese
- Female voices: jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro
- Male voices: jm_kumo
Portuguese
- Female voices: pf_dora
- Male voices: pm_alex, pm_santa
Chinese
- Female voices: zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi
- Male voices: zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang
- For English content, af_bella and am_echo are popular choices
- For professional presentations, consider bf_emma or bm_daniel for British English
- For multilingual projects, select voices that match your target audience's language
- v0 versions are alternative voice variants with different characteristics
Response
A successful request returns a 200 OK status code with the generated audio as binary data in the response body. The audio format matches the requested response_format parameter.
Response Headers
Content-Type: audio/mpeg
Content-Length: [size in bytes]
Usage Statistics
Usage information is included in the response headers:
X-Usage-Input-Tokens: 15
X-Usage-Output-Tokens: 0
X-Usage-Total-Tokens: 15
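These headers can be read from the response like any other HTTP header. The following is a minimal sketch, assuming the three header names shown above and integer values; the function name read_usage is hypothetical, and returning None for an absent header is a design choice of this sketch rather than documented behavior.

```python
# Header names as listed in the Usage Statistics section.
USAGE_HEADERS = {
    "input_tokens": "X-Usage-Input-Tokens",
    "output_tokens": "X-Usage-Output-Tokens",
    "total_tokens": "X-Usage-Total-Tokens",
}


def read_usage(headers):
    """Return token counts as integers; a missing header yields None for that field."""
    usage = {}
    for key, header_name in USAGE_HEADERS.items():
        raw = headers.get(header_name)
        usage[key] = int(raw) if raw is not None else None
    return usage


# Example with the header values shown above.
example_headers = {
    "X-Usage-Input-Tokens": "15",
    "X-Usage-Output-Tokens": "0",
    "X-Usage-Total-Tokens": "15",
}
print(read_usage(example_headers))
# {'input_tokens': 15, 'output_tokens': 0, 'total_tokens': 15}
```

With the requests library, read_usage(response.headers) works as-is because response.headers is a case-insensitive mapping. A plain dict, as in the example, must use the exact header casing.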
Example Request (cURL)
curl https://api.azerion.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "kokoro",
    "input": "Hello and Welcome to Azerion Intelligence. How are you today?",
    "voice": "af_bella",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3
Replace YOUR_ACCESS_TOKEN with your actual API key or access token. Refer to the Authentication guide for details on obtaining and using your credentials.
Example Request (Python)
import os

import requests

api_key = os.environ.get("AZERION_API_KEY")  # Or AZERION_ACCESS_TOKEN
url = "https://api.azerion.ai/v1/audio/speech"  # Replace with your actual base URL if different

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}
data = {
    "model": "kokoro",
    "input": "Hello and Welcome to Azerion Intelligence. How are you today?",
    "voice": "af_bella",
    "response_format": "mp3",
    "speed": 1.0,
}

response = requests.post(url, headers=headers, json=data)
# Raise on a non-2xx status so an error body is never saved as audio
response.raise_for_status()

# Save the audio file
with open("speech.mp3", "wb") as f:
    f.write(response.content)

print("Audio generated successfully!")
print(f"Content-Type: {response.headers.get('Content-Type')}")
print(f"Content-Length: {response.headers.get('Content-Length')} bytes")
Example Request (Node.js)
// Requires node-fetch v2; v3 is ESM-only and cannot be loaded with require()
const fetch = require('node-fetch');
const fs = require('fs');

const apiKey = process.env.AZERION_API_KEY; // Or AZERION_ACCESS_TOKEN
const url = 'https://api.azerion.ai/v1/audio/speech'; // Replace with your actual base URL if different

const headers = {
  'Content-Type': 'application/json',
  'Authorization': `Bearer ${apiKey}`
};
const data = {
  model: 'kokoro',
  input: 'Hello and Welcome to Azerion Intelligence. How are you today?',
  voice: 'af_bella',
  response_format: 'mp3',
  speed: 1.0
};

fetch(url, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(data)
})
  .then(response => {
    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }
    // Read the binary audio body as a Buffer
    return response.buffer();
  })
  .then(audioBuffer => {
    // Save the audio file
    fs.writeFileSync('speech.mp3', audioBuffer);
    console.log('Audio generated successfully!');
    console.log('File saved as speech.mp3');
  })
  .catch(error => console.error('Error:', error));