Audio generation

Generate speech audio from text using Azerion Intelligence models with multi-language support.

Endpoint:

POST https://api.azerion.ai/v1/audio/speech

Description

This endpoint generates realistic speech audio from text input using the Kokoro open-source TTS model. It supports multiple languages and allows you to convert text into natural-sounding speech using various voice options and customizable playback settings.

Authentication

This endpoint requires authentication using an API key, sent as a Bearer token in the Authorization header.

Request

{
  "model": "kokoro",
  "input": "Hello and Welcome to Azerion Intelligence. How are you today?",
  "voice": "af_bella",
  "response_format": "mp3",
  "speed": 1.0
}

Request Parameters

model (string, required): ID of the model to use for speech generation. Use kokoro for the Kokoro TTS model. You can use the List Models endpoint to see available models.
input (string, required): The text to generate audio for. Maximum length is 4096 characters.
voice (string, required): The voice to use when generating the audio. See the Voice Options section below for available voices organized by language.
response_format (string, optional): The format of the generated audio. Supported formats are mp3, opus, aac, flac, wav, and pcm. Defaults to mp3.
speed (number, optional): The speed of the generated audio. Select a value from 0.25 to 4.0. Defaults to 1.0.
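The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: the function name build_speech_request and the empty-input check are assumptions made for illustration, while the numeric limits and format names are taken from the parameter list.

```python
# Limits copied from the parameter list above; the helper itself is hypothetical.
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_request(text, voice, model="kokoro", response_format="mp3", speed=1.0):
    """Return a request body dict, raising ValueError if a documented limit is violated."""
    if not text:
        # Assumption: an empty input is treated as invalid on the client side.
        raise ValueError("input must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} characters; the maximum is {MAX_INPUT_CHARS}")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dict can be passed as the JSON body of the POST request. Validating locally avoids a round trip for requests that would likely be rejected, though the server remains the authority on what is accepted.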

Voice Options

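Kokoro TTS supports multiple languages with various voice options. Voice IDs follow a prefix convention: one letter for the language, one letter for the gender (f for female, m for male), an underscore, and then the voice name. For example, af_bella is an American English (a) female (f) voice named bella.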
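The prefix convention can be split apart in code, for example to group voices by language or gender in a picker. This is a small sketch under the assumption that every voice ID has the two-letter prefix, an underscore, and a name, as the IDs listed in this section do; parse_voice_id is a hypothetical helper name.

```python
GENDERS = {"f": "female", "m": "male"}


def parse_voice_id(voice):
    """Split a voice ID such as 'af_bella' into (language_prefix, gender, name)."""
    prefix, sep, name = voice.partition("_")
    # Expect exactly two prefix letters: language code, then gender code.
    if not sep or len(prefix) != 2 or prefix[1] not in GENDERS or not name:
        raise ValueError(f"unexpected voice ID format: {voice!r}")
    return prefix[0], GENDERS[prefix[1]], name
```

Calling parse_voice_id("bm_daniel") yields the language prefix "b", the gender "male", and the name "daniel".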

Supported Languages

English (American)

Female voices: af_alloy, af_aoede, af_bella, af_heart, af_jadzia, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky, af_v0, af_v0bella, af_v0irulan, af_v0nicole, af_v0sarah, af_v0sky
Male voices: am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa, am_v0adam, am_v0gurney, am_v0michael

English (British)

Female voices: bf_alice, bf_emma, bf_lily, bf_v0emma, bf_v0isabella
Male voices: bm_daniel, bm_fable, bm_george, bm_lewis, bm_v0george, bm_v0lewis

French

Female voices: ff_siwis

Spanish

Female voices: ef_dora
Male voices: em_alex, em_santa

Hindi

Female voices: hf_alpha, hf_beta
Male voices: hm_omega, hm_psi

Italian

Female voices: if_sara
Male voices: im_nicola

Japanese

Female voices: jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro
Male voices: jm_kumo

Portuguese

Female voices: pf_dora
Male voices: pm_alex, pm_santa

Chinese

Female voices: zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi
Male voices: zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang

Choosing the Right Voice

For English content, af_bella and am_echo are popular choices.
For professional presentations, consider bf_emma or bm_daniel for British English.
For multilingual projects, select voices that match your target audience's language.
Voices with v0 in the name are alternative variants with different characteristics.

Response

A successful request returns a 200 OK status code with the generated audio as binary data in the response body. The audio format matches the requested response_format parameter.
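Because the body is raw audio in the requested format, it helps to keep the saved file's extension in step with response_format. The sketch below assumes the format name doubles as the file extension (for example "wav" becomes ".wav"); save_audio is a hypothetical helper, not part of the API.

```python
import os


def save_audio(audio_bytes, basename, response_format="mp3", directory="."):
    """Write the binary response body to '<basename>.<response_format>' and return the path."""
    # Assumption: the response_format value is used directly as the file extension.
    path = os.path.join(directory, f"{basename}.{response_format}")
    with open(path, "wb") as f:
        f.write(audio_bytes)
    return path
```

With the requests library, this would typically be called as save_audio(response.content, "speech", "mp3") after confirming that the request succeeded.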

Response Headers

Content-Type: audio/mpeg
Content-Length: [size in bytes]

Usage Statistics

Usage information is included in the response headers:

X-Usage-Input-Tokens: 15
X-Usage-Output-Tokens: 0
X-Usage-Total-Tokens: 15
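These headers can be collected into a small dictionary for logging or cost tracking. The following sketch assumes the header values are plain integers, as in the example above; parse_usage is a hypothetical helper name, and headers that are absent are returned as None rather than guessed.

```python
USAGE_HEADERS = {
    "input_tokens": "X-Usage-Input-Tokens",
    "output_tokens": "X-Usage-Output-Tokens",
    "total_tokens": "X-Usage-Total-Tokens",
}


def parse_usage(headers):
    """Extract token counts from X-Usage-* response headers; missing headers become None."""
    usage = {}
    for key, header_name in USAGE_HEADERS.items():
        value = headers.get(header_name)
        usage[key] = int(value) if value is not None else None
    return usage
```

The headers object returned by the requests library is case-insensitive, so response.headers can be passed in directly. A plain dict needs the exact header capitalization shown here.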

Example Request (cURL)

curl https://api.azerion.ai/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "kokoro",
    "input": "Hello and Welcome to Azerion Intelligence. How are you today?",
    "voice": "af_bella",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3

Replace Placeholder

Replace YOUR_ACCESS_TOKEN with your actual API key or access token. Refer to the Authentication guide for details on obtaining and using your credentials.

Example Request (Python)

import requests
import os

api_key = os.environ.get("AZERION_API_KEY") # Or AZERION_ACCESS_TOKEN
url = "https://api.azerion.ai/v1/audio/speech" # Replace with your actual base URL if different

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

data = {
    "model": "kokoro",
    "input": "Hello and Welcome to Azerion Intelligence. How are you today?",
    "voice": "af_bella",
    "response_format": "mp3",
    "speed": 1.0
}

response = requests.post(url, headers=headers, json=data, timeout=60)
response.raise_for_status()  # Fail on a 4xx/5xx response instead of saving an error body as audio

# Save the audio file
with open("speech.mp3", "wb") as f:
    f.write(response.content)

print("Audio generated successfully!")
print(f"Content-Type: {response.headers.get('Content-Type')}")
print(f"Content-Length: {response.headers.get('Content-Length')} bytes")

Example Request (Node.js)

const fetch = require('node-fetch'); // Requires node-fetch v2; v3 is ESM-only and cannot be loaded with require()
const fs = require('fs');

const apiKey = process.env.AZERION_API_KEY; // Or AZERION_ACCESS_TOKEN
const url = 'https://api.azerion.ai/v1/audio/speech'; // Replace with your actual base URL if different

const headers = {
  'Content-Type': 'application/json',
  'Authorization': `Bearer ${apiKey}`
};

const data = {
  model: 'kokoro',
  input: 'Hello and Welcome to Azerion Intelligence. How are you today?',
  voice: 'af_bella',
  response_format: 'mp3',
  speed: 1.0
};

fetch(url, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(data)
})
.then(response => {
  if (!response.ok) {
    throw new Error(`HTTP error! status: ${response.status}`);
  }
  return response.buffer();
})
.then(audioBuffer => {
  fs.writeFileSync('speech.mp3', audioBuffer);
  console.log('Audio generated successfully!');
  console.log('File saved as speech.mp3');
})
.catch(error => console.error('Error:', error));
