
Chat completions

Create chat completions from a sequence of messages using Azerion Intelligence models.

Endpoint:

POST https://api.azerion.ai/v1/chat/completions

Description

This endpoint generates a model response for a conversation. You provide the conversation so far as a list of messages, and the model returns the next assistant message.

Authentication

This endpoint requires authentication using an API key.
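
Pass the key as a Bearer token in the Authorization header, as the examples below do:

Authorization: Bearer YOUR_API_KEY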

Request

{
  "model": "meta.llama3-1-405b-instruct-v1:0",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"}
  ],
  "temperature": 0.01,
  "max_tokens": 4096,
  "top_p": 0.001,
  "stream": true
}

Request Parameters

  • model (string, required): ID of the model to use. You can use the List Models endpoint to see available models.
  • messages (array, required): A list of messages describing the conversation so far (a multi-turn sketch follows this parameter list).
    • Each message is an object with role and content fields:
      • role can be "system", "user", or "assistant"
      • content is the content of the message
  • temperature (number, optional, default: 1): Controls randomness. Lower values like 0.2 make responses more focused and deterministic, while higher values like 0.8 make output more random. In the example, it is set to 0.01.
  • max_tokens (integer, optional): The maximum number of tokens to generate in the response. In the example, it is set to 4096.
  • stream (boolean, optional, default: false): If set to true, partial message deltas will be sent as they are generated. In the example, it is set to true.
  • top_p (number, optional, default: 1): Controls diversity via nucleus sampling. A value of 0.1 means only the tokens comprising the top 10% probability mass are considered. In the example, it is set to 0.001.
  • stop (string or array of strings, optional): Up to 4 sequences where the API will stop generating further tokens.
  • presence_penalty (number, optional, default: 0): Positive values penalize new tokens based on whether they appear in the text so far, increasing the likelihood of the model discussing new topics.
  • frequency_penalty (number, optional, default: 0): Positive values penalize tokens based on their frequency in the text so far, decreasing the likelihood of repetition.
  • user (string, optional): A unique identifier representing your end-user, which can help monitor and detect abuse.
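
To illustrate how these parameters fit together, here is a minimal sketch of a request body for a multi-turn conversation, written in Python. The model ID is the one used throughout this page; the assistant turn and the optional values are illustrative assumptions, not required settings.

# A minimal sketch of a request body for a multi-turn conversation.
# The model ID is the one used in the examples on this page; substitute any
# model returned by the List Models endpoint.
payload = {
    "model": "meta.llama3-1-405b-instruct-v1:0",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        # Earlier assistant turns are included so the model sees the full conversation.
        {"role": "assistant", "content": "The Los Angeles Dodgers won the 2020 World Series."},
        {"role": "user", "content": "Where was it played?"},
    ],
    "temperature": 0.2,    # lower values make the reply more deterministic
    "max_tokens": 512,     # cap on generated tokens
    "stop": ["\n\n"],      # optional: stop generating at the first blank line
    "user": "user-1234",   # optional: stable identifier for your end user
}
# This dict is sent as the JSON body of the POST request (see the full examples below).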

Using stream: true

Setting the stream parameter to true enables streaming responses, where partial message deltas are sent as they are generated. This is useful for applications that need to display responses in real-time.

Response

A successful request returns a 200 OK status code with a JSON response body.

Standard Response (Non-Streaming)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699901000,
  "model": "meta.llama3-1-405b-instruct-v1:0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris. It's known as the 'City of Light' and is famous for landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 32,
    "total_tokens": 55
  }
}

Response Fields

  • id (string): A unique identifier for the completion.
  • object (string): Always "chat.completion".
  • created (integer): Unix timestamp of when the completion was created.
  • model (string): The model used for the completion.
  • choices (array): An array of completion choices. Usually one choice unless multiple are specifically requested.
    • index (integer): The index of the choice.
    • message (object): The message generated by the model.
      • role (string): The role of the message (always "assistant" for responses).
      • content (string): The content of the message.
    • finish_reason (string): The reason generation stopped, which can be:
      • "stop": API returned complete message
      • "length": Maximum tokens reached
      • "content_filter": Content was omitted due to a content filter
  • usage (object): An object containing token usage information.
    • prompt_tokens (integer): Number of tokens in the prompt.
    • completion_tokens (integer): Number of tokens in the generated completion.
    • total_tokens (integer): Total number of tokens used (prompt + completion).
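
As a quick illustration of these fields, a non-streaming response body can be unpacked like this (a minimal sketch in Python; completion is assumed to be the parsed JSON returned by one of the example requests below):

# completion = response.json()  -- the parsed body of a non-streaming response
choice = completion["choices"][0]
print(choice["message"]["content"])          # the assistant's reply

if choice["finish_reason"] == "length":
    # The reply was cut off at max_tokens; consider raising max_tokens
    # or sending a follow-up request to continue.
    pass

usage = completion["usage"]
print(f'{usage["prompt_tokens"]} prompt + {usage["completion_tokens"]} completion '
      f'= {usage["total_tokens"]} tokens')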

Streaming Response

When stream is set to true, the API returns a stream of server-sent events. Each event is a JSON object containing a partial completion.

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699901000,"model":"meta.llama3-1-405b-instruct-v1:0","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699901000,"model":"meta.llama3-1-405b-instruct-v1:0","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699901000,"model":"meta.llama3-1-405b-instruct-v1:0","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699901000,"model":"meta.llama3-1-405b-instruct-v1:0","choices":[{"index":0,"delta":{"content":" of"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699901000,"model":"meta.llama3-1-405b-instruct-v1:0","choices":[{"index":0,"delta":{"content":" France"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699901000,"model":"meta.llama3-1-405b-instruct-v1:0","choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699901000,"model":"meta.llama3-1-405b-instruct-v1:0","choices":[{"index":0,"delta":{"content":" Paris."},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1699901000,"model":"meta.llama3-1-405b-instruct-v1:0","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
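
One way to consume this stream is to read the response body line by line, skip anything that is not a data: line, stop at the [DONE] sentinel, and concatenate the content deltas. The sketch below uses Python's requests library with the endpoint and model from this page; the AZERION_API_KEY environment variable is an assumption about how you store your key.

import json
import os
import requests

payload = {
    "model": "meta.llama3-1-405b-instruct-v1:0",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "stream": True,
}

with requests.post(
    "https://api.azerion.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['AZERION_API_KEY']}"},
    json=payload,
    stream=True,                      # keep the HTTP connection open while chunks arrive
) as response:
    full_reply = []
    for line in response.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue                  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":          # sentinel marking the end of the stream
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            print(delta["content"], end="", flush=True)
            full_reply.append(delta["content"])
    print()
    # "".join(full_reply) now holds the complete assistant message.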

Example Request (cURL)

# Replace the model value with any model ID returned by the List Models endpoint.
curl https://api.azerion.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "meta.llama3-1-405b-instruct-v1:0",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who won the world series in 2020?"}
    ],
    "temperature": 0.01,
    "max_tokens": 4096,
    "top_p": 0.001,
    "stream": true
  }'

Replace Placeholder

Replace YOUR_ACCESS_TOKEN with your actual API key or access token. Refer to the Authentication guide for details on obtaining and using your credentials.

Example Request (Python)

import requests
import os

api_key = os.environ.get("AZERION_API_KEY")  # Or AZERION_ACCESS_TOKEN
url = "https://api.azerion.ai/v1/chat/completions"  # Replace with your actual base URL if different

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

data = {
    "model": "meta.llama3-1-405b-instruct-v1:0",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"}
    ],
    "temperature": 0.01,
    "max_tokens": 4096,
    "top_p": 0.001,
    "stream": False  # response.json() below expects a non-streaming body; see Streaming Response above
}

response = requests.post(url, headers=headers, json=data)
print(response.json())

Example Request (Node.js)

// Node.js 18+ provides a global fetch; node-fetch is only needed on older versions.
const fetch = require('node-fetch');

const apiKey = process.env.AZERION_API_KEY; // Or AZERION_ACCESS_TOKEN
const url = 'https://api.azerion.ai/v1/chat/completions'; // Replace with your actual base URL if different

const headers = {
  'Content-Type': 'application/json',
  'Authorization': `Bearer ${apiKey}`
};

const data = {
  model: 'meta.llama3-1-405b-instruct-v1:0',
  messages: [
    {role: 'system', content: 'You are a helpful assistant.'},
    {role: 'user', content: 'Who won the world series in 2020?'}
  ],
  temperature: 0.01,
  max_tokens: 4096,
  top_p: 0.001,
  stream: false // response.json() below expects a non-streaming body; see Streaming Response above
};

fetch(url, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(data)
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error));