Tool & Function Calling
Function calling enables AI models to interact with external systems and APIs by generating structured requests for specific tools or functions. When you provide function definitions to a model, it can intelligently determine when to use these functions based on user input and generate the appropriate parameters needed to execute them.
The process works as a collaborative workflow: the model identifies the need for external data or actions, formats the function call with the correct parameters, and you execute the actual function in your application. The results are then fed back to the model, which incorporates this information into its final response.
Azerion Intelligence standardizes the tool calling interface across models and providers. In an API call, you can describe functions and have the model intelligently choose to output a JSON object containing arguments to call one or many functions.
The Chat Completions API does not call the function; instead, the model generates JSON that you can use to call the function in your code.
Supported models
Not all model families currently support function calling. The models below do; the Anthropic Claude models support it with both streaming and non-streaming responses, while the others are currently non-streaming only.
Anthropic Claude Models:
anthropic.claude-opus-4-20250514-v1:0
anthropic.claude-sonnet-4-20250514-v1:0
claude-3-7-sonnet-20250219-v1:0
claude-3-haiku-20240307-v1:0
Other Models (Non-streaming only):
qwen2_5-14b-instruct
meta.llama3-1-70b-instruct-v1:0
meta.llama3-1-405b-instruct-v1:0
Models marked "Non-streaming only" support function calling in non-streaming mode; streaming support for these models is under development.
Common use cases
Function calling is useful for creating assistants that can answer questions by calling external APIs, converting natural language into API calls, and extracting structured data from text.
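For example, the extraction use case maps naturally onto a tool whose only job is to capture structured fields. The extract_people definition below is a hypothetical illustration, not a built-in:

tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_people",  # hypothetical extraction tool
            "description": "Extract all people mentioned in the text",
            "parameters": {
                "type": "object",
                "properties": {
                    "people": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "role": {"type": "string"},
                            },
                            "required": ["name"],
                        },
                    }
                },
                "required": ["people"],
            },
        },
    }
]

The model then "calls" extract_people with the extracted records as its arguments; no real function needs to run, and you can simply json.loads the arguments and store them.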
Basic example
import json

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.azerion.ai/v1",
)

def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    if "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "10", "unit": unit})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "72", "unit": unit})
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": unit})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

def run_conversation():
    # Step 1: send the conversation and available functions to the model
    messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ]
    response = client.chat.completions.create(
        model="anthropic.claude-sonnet-4-20250514-v1:0",
        messages=messages,
        tools=tools,
        tool_choice="auto",  # auto is the default, but we'll be explicit
    )
    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls
    # Step 2: check if the model wanted to call a function
    if tool_calls:
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        available_functions = {
            "get_current_weather": get_current_weather,
        }  # only one function in this example, but you can have multiple
        messages.append(response_message)  # extend conversation with assistant's reply
        # Step 4: send the info for each function call and function response to the model
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            function_response = function_to_call(
                location=function_args.get("location"),
                unit=function_args.get("unit", "fahrenheit"),
            )
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )  # extend conversation with function response
        second_response = client.chat.completions.create(
            model="anthropic.claude-sonnet-4-20250514-v1:0",
            messages=messages,
        )  # get a new response from the model where it can see the function response
        return second_response

print(run_conversation())
Function calling behavior
The default behavior (tool_choice: "auto") is for the model to decide on its own whether to call a function and, if so, which function to call. You can also set tool_choice: "none" to force the model not to call a function, or force a call to a specific function by setting tool_choice: {"type": "function", "function": {"name": "my_function"}}.
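A minimal sketch of the three settings, reusing the client, messages, and tools from the basic example above:

# Default: let the model decide whether to call a function
response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

# Force a plain text reply, never a function call
response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=messages,
    tools=tools,
    tool_choice="none",
)

# Force a call to one specific function
response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
)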
Parallel function calling
The model can call multiple functions in a single response. For example, the model might call functions to get the weather in three different locations at the same time, resulting in a message with three entries in the tool_calls array, each with a unique id.
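For illustration, such an assistant message might look like the sketch below (the ids are invented):

{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {"id": "call_1", "type": "function", "function": {"name": "get_current_weather", "arguments": "{\"location\": \"San Francisco, CA\"}"}},
    {"id": "call_2", "type": "function", "function": {"name": "get_current_weather", "arguments": "{\"location\": \"Tokyo\"}"}},
    {"id": "call_3", "type": "function", "function": {"name": "get_current_weather", "arguments": "{\"location\": \"Paris\"}"}}
  ]
}

Append one "role": "tool" message per call, each carrying the matching tool_call_id, before requesting the model's final answer, as the basic example above does.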
Function calling with structured outputs
Function calling is also supported with structured outputs. When you supply strict: true for a function, the model will always follow the exact schema you provide.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for a location",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"],
                "additionalProperties": False
            }
        }
    }
]
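A minimal sketch of using this strict definition, reusing the client (and json import) from the basic example above:

response = client.chat.completions.create(
    model="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,
    tool_choice="auto",
)

# With tool_choice "auto", check that the model actually made a call
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    # With strict: true, the arguments always match the declared schema,
    # so this parse yields exactly the declared properties and no extras
    args = json.loads(tool_call.function.arguments)
    print(args["location"])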
Tokens
Under the hood, functions are injected into the system message in a syntax the model has been trained on. This means functions count against the model's context limit and are billed as input tokens. If you run into context limits, we suggest limiting the number of functions or the length of the documentation you provide for function parameters.
It is also possible to use fine-tuning to reduce the number of tokens used if you have many functions defined.
Example with curl
curl https://api.azerion.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZERION_API_KEY" \
  -d '{
    "model": "anthropic.claude-sonnet-4-20250514-v1:0",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Boston?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
Response format
When the model decides to call a function, the response includes a tool_calls array in the message:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699896916,
  "model": "anthropic.claude-sonnet-4-20250514-v1:0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\n\"location\": \"Boston, MA\"\n}"
            }
          }
        ]
      },
      "logprobs": null,
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 82,
    "completion_tokens": 17,
    "total_tokens": 99
  }
}
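In application code, the finish_reason is a convenient dispatch point. A sketch, assuming response and the json import from the basic example above:

choice = response.choices[0]
if choice.finish_reason == "tool_calls":
    # The model is requesting one or more function calls
    for tool_call in choice.message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        print(tool_call.id, tool_call.function.name, args)
else:
    # The model answered directly; content holds the text
    print(choice.message.content)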