Audio generation
Generate speech audio from text using Azerion Intelligence models with multi-language support.
Endpoint:
POST https://api.azerion.ai/v1/audio/speech
Description
This endpoint converts text input into natural-sounding speech using the open-source Kokoro TTS model. It supports multiple languages, several voices per language, and a configurable output format and playback speed.
Authentication
This endpoint requires authentication using an API key.
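As a minimal sketch, the request headers might be built as follows. The Bearer scheme mirrors the example requests on this page; the build_headers helper and the AZERION_API_KEY environment variable name are illustrative assumptions, not part of the API.

```python
import os


def build_headers(api_key=None):
    """Return request headers carrying the API key as a Bearer token.

    build_headers and the AZERION_API_KEY variable name are illustrative
    assumptions; only the Bearer scheme comes from the documented examples.
    """
    key = api_key or os.environ.get("AZERION_API_KEY")
    if not key:
        raise ValueError("No API key provided and AZERION_API_KEY is not set")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```

Failing early on a missing key avoids sending a request with the literal header value "Bearer None".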
Request
{
  "model": "kokoro",
  "input": "Hello and Welcome to Azerion Intelligence. How are you today?",
  "voice": "af_bella",
  "response_format": "mp3",
  "speed": 1.0
}
Request Parameters
- model (string, required): ID of the model to use for speech generation. Use kokoro for the Kokoro TTS model. You can use the List Models endpoint to see available models.
- input (string, required): The text to generate audio for. Maximum length is 4096 characters.
- voice (string, required): The voice to use when generating the audio. See the Voice Options section below for available voices organized by language.
- response_format (string, optional): The format to return the audio in. Supported formats are mp3, opus, aac, flac, wav, and pcm. Defaults to mp3.
- speed (number, optional): The speed of the generated audio. Select a value from 0.25 to 4.0. Defaults to 1.0.
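The limits above can be checked on the client before a request is sent. The following is a sketch only: build_speech_payload is a hypothetical helper, it assumes the 4096-character limit corresponds to Python's len(), and it treats an empty input as invalid, which the API may or may not do.

```python
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(text, voice, model="kokoro", response_format="mp3", speed=1.0):
    """Validate arguments against the documented limits and return the JSON body.

    Hypothetical helper: the server remains the authority on what is accepted.
    """
    if not text:
        raise ValueError("input must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} characters; the limit is {MAX_INPUT_CHARS}")
    if not voice:
        raise ValueError("voice is required")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dictionary can be passed as the JSON body of the POST request.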
Voice Options
Kokoro TTS supports multiple languages with various voice options. Voices are named using a prefix system: language code + gender + name.
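To illustrate the naming scheme, a voice ID can be split into its parts. This sketch assumes the layout seen in the voice IDs listed on this page: a two-letter prefix (language letter, then f or m for gender), an underscore, and the voice name. parse_voice and VoiceId are hypothetical names.

```python
from typing import NamedTuple


class VoiceId(NamedTuple):
    language_code: str  # first prefix letter, e.g. "a" or "b"
    gender: str         # "female" or "male"
    name: str           # e.g. "bella"


_GENDERS = {"f": "female", "m": "male"}


def parse_voice(voice):
    """Split a voice ID such as 'af_bella' into language code, gender, and name."""
    prefix, sep, name = voice.partition("_")
    if not sep or len(prefix) != 2 or not name or prefix[1] not in _GENDERS:
        raise ValueError(f"unrecognised voice ID: {voice!r}")
    return VoiceId(prefix[0], _GENDERS[prefix[1]], name)
```

For example, parse_voice("bm_george") yields language code "b", gender "male", and name "george".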
Supported Languages
English (American)
- Female voices: af_alloy, af_aoede, af_bella, af_heart, af_jadzia, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky, af_v0, af_v0bella, af_v0irulan, af_v0nicole, af_v0sarah, af_v0sky
- Male voices: am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa, am_v0adam, am_v0gurney, am_v0michael
English (British)
- Female voices: bf_alice, bf_emma, bf_lily, bf_v0emma, bf_v0isabella
- Male voices: bm_daniel, bm_fable, bm_george, bm_lewis, bm_v0george, bm_v0lewis
French
- Female voices: ff_siwis
Spanish
- Female voices: ef_dora
- Male voices: em_alex, em_santa
Hindi
- Female voices: hf_alpha, hf_beta
- Male voices: hm_omega, hm_psi
Italian
- Female voices: if_sara
- Male voices: im_nicola
Japanese
- Female voices: jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro
- Male voices: jm_kumo
Portuguese
- Female voices: pf_dora
- Male voices: pm_alex, pm_santa
Chinese
- Female voices: zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi
- Male voices: zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang
Voice Selection Tips
- For English content, af_bella and am_echo are popular choices.
- For professional presentations, consider bf_emma or bm_daniel for British English.
- For multilingual projects, select voices that match your target audience's language.
- Voices with v0 in the name are alternative variants with different characteristics.
Response
A successful request returns a 200 OK status code with the generated audio as binary data in the response body. The audio format matches the requested response_format parameter.
Response Headers
Content-Type: audio/mpeg
Content-Length: [size in bytes]
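Before writing the body to disk, a client can sanity-check the response against these headers. The sketch below uses a hypothetical check_audio_response helper; it assumes Content-Length, when present, counts the raw audio bytes (that is, the body is not sent with a content encoding such as gzip).

```python
def check_audio_response(status_code, headers, body):
    """Return the Content-Type of a successful response, or raise on a mismatch.

    Hypothetical helper. `headers` is any mapping of header names to values
    and `body` is the response body as bytes.
    """
    if status_code != 200:
        raise RuntimeError(f"expected 200 OK, got {status_code}")
    declared = headers.get("Content-Length")
    # Content-Length may be absent (for example with chunked transfer), so only
    # compare when the server declared a size.
    if declared is not None and int(declared) != len(body):
        raise RuntimeError(f"body is {len(body)} bytes but Content-Length is {declared}")
    return headers.get("Content-Type", "")
```

With the requests library this could be called as check_audio_response(response.status_code, response.headers, response.content).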
Usage Statistics
Usage information is included in the response headers:
X-Usage-Input-Tokens: 15
X-Usage-Output-Tokens: 0
X-Usage-Total-Tokens: 15
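These headers can be collected into a small dictionary for logging or cost tracking. A minimal sketch, assuming the values are always decimal integers; parse_usage is a hypothetical helper name.

```python
USAGE_HEADERS = {
    "input_tokens": "X-Usage-Input-Tokens",
    "output_tokens": "X-Usage-Output-Tokens",
    "total_tokens": "X-Usage-Total-Tokens",
}


def parse_usage(headers):
    """Extract the usage headers as integers; a missing header maps to None."""
    usage = {}
    for field, header in USAGE_HEADERS.items():
        value = headers.get(header)
        usage[field] = int(value) if value is not None else None
    return usage
```

Passing response.headers from the requests library works here because that mapping looks header names up case-insensitively.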
Example Request (cURL)
curl https://api.azerion.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "kokoro",
    "input": "Hello and Welcome to Azerion Intelligence. How are you today?",
    "voice": "af_bella",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3
Replace YOUR_ACCESS_TOKEN with your actual API key or access token. Refer to the Authentication guide for details on obtaining and using your credentials.
Example Request (Python)
import os

import requests

api_key = os.environ.get("AZERION_API_KEY")  # Or AZERION_ACCESS_TOKEN
url = "https://api.azerion.ai/v1/audio/speech"  # Replace with your actual base URL if different

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}
data = {
    "model": "kokoro",
    "input": "Hello and Welcome to Azerion Intelligence. How are you today?",
    "voice": "af_bella",
    "response_format": "mp3",
    "speed": 1.0,
}

response = requests.post(url, headers=headers, json=data, timeout=60)
# Raise on 4xx/5xx so an error body is never saved as audio
response.raise_for_status()

# Save the audio file
with open("speech.mp3", "wb") as f:
    f.write(response.content)

print("Audio generated successfully!")
print(f"Content-Type: {response.headers.get('Content-Type')}")
print(f"Content-Length: {response.headers.get('Content-Length')} bytes")
Example Request (Node.js)
// Uses node-fetch v2; v3 is ESM-only and cannot be loaded with require()
const fetch = require('node-fetch');
const fs = require('fs');

const apiKey = process.env.AZERION_API_KEY; // Or AZERION_ACCESS_TOKEN
const url = 'https://api.azerion.ai/v1/audio/speech'; // Replace with your actual base URL if different

const headers = {
  'Content-Type': 'application/json',
  'Authorization': `Bearer ${apiKey}`
};
const data = {
  model: 'kokoro',
  input: 'Hello and Welcome to Azerion Intelligence. How are you today?',
  voice: 'af_bella',
  response_format: 'mp3',
  speed: 1.0
};

fetch(url, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(data)
})
  .then(response => {
    // Reject on non-2xx responses so an error body is never saved as audio
    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }
    return response.buffer();
  })
  .then(audioBuffer => {
    // Write the binary audio to disk
    fs.writeFileSync('speech.mp3', audioBuffer);
    console.log('Audio generated successfully!');
    console.log('File saved as speech.mp3');
  })
  .catch(error => console.error('Error:', error));