Veo 3

Generate high-quality videos with native audio from text or image prompts using Google DeepMind's Veo 3 model.

Endpoint

POST https://api.azerion.ai/v1/videos/generation

Description

Veo 3 is Google DeepMind's video generation model featuring native audio generation with synchronized dialogue, sound effects, and ambient sounds. It produces videos at up to 1080p resolution, with durations of 4, 6, or 8 seconds, realistic physics simulation, and improved prompt fidelity.

Veo 3 supports text-to-video, image-to-video, and video extension.

Authentication

This endpoint requires authentication using an API key.
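
The cURL examples on this page pass the key as a Bearer token in the Authorization header. A minimal sketch of building those headers in Python follows. It assumes the key is stored in an environment variable named AZERION_API_KEY (the name used in the Python example on this page). The helper name build_headers is hypothetical and not part of any SDK.

```python
import os


def build_headers(api_key=None):
    """Return request headers carrying the API key as a Bearer token."""
    # Fall back to the environment so the key never has to be hard-coded.
    key = api_key or os.environ.get("AZERION_API_KEY")
    if not key:
        raise ValueError("No API key given and AZERION_API_KEY is not set")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```

Raising early on a missing key avoids sending a request with a literal "Bearer None" header.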

Request

{
  "model": "google-veo-3",
  "prompt": "A street musician plays acoustic guitar on a cobblestone sidewalk at sunset, the warm strumming echoes off nearby buildings as passersby drop coins into an open case",
  "n": 1,
  "image": {
    "mime_type": "image/jpeg",
    "url": "https://example.com/images/input-image.jpg",
    "base_64_encoded": "BASE64_ENCODED_IMAGE_DATA"
  },
  "video": {
    "mime_type": "video/mp4",
    "url": "https://example.com/videos/input-video.mp4",
    "base_64_encoded": "BASE64_ENCODED_VIDEO_DATA"
  },
  "generate_audio": true,
  "duration": 8,
  "resolution": "1080p",
  "aspect_ratio": "16:9",
  "seed": 42,
  "negative_prompt": "blurry, low quality, distorted",
  "person_generation": "allow_adult",
  "resize_mode": "crop",
  "compression_quality": "optimized",
  "output_format": "url"
}

This example shows every supported field at once for reference. As the parameter descriptions below state, the image and video objects each take either url or base_64_encoded, not both.

Request Parameters

Core Parameters

model (string, required): Model ID. Use google-veo-3.
prompt (string, required for text-to-video): Text description of the desired video. Maximum 4000 characters. Optional but recommended for image-to-video.
n (integer, optional): Number of videos to generate. Range: 1–4. Default: 1.

Image-to-Video

image (object, optional): Input image for image-to-video generation. Recommended: 720p+ resolution with 16:9 or 9:16 aspect ratio. Provide either url or base_64_encoded.
- mime_type (string): MIME type of the image. Accepted: "image/jpeg", "image/png".
- url (string): URL of the input image.
- base_64_encoded (string): Base64-encoded image data.
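
As an illustration of the "either url or base_64_encoded" rule, the sketch below builds the image object from a URL or a local file. The helper name image_input is hypothetical, and guessing the MIME type from the file extension is an assumption made for brevity: it will not work for URLs whose path lacks a .jpg/.jpeg/.png suffix.

```python
import base64
import mimetypes
from pathlib import Path

ACCEPTED_IMAGE_TYPES = {"image/jpeg", "image/png"}


def image_input(source):
    """Build the `image` object from an http(s) URL or a local file path."""
    source = str(source)
    # Infer the MIME type from the extension (an assumption, see lead-in).
    mime_type, _ = mimetypes.guess_type(source)
    if mime_type not in ACCEPTED_IMAGE_TYPES:
        raise ValueError(f"Unsupported image type: {mime_type}")
    if source.startswith(("http://", "https://")):
        # Remote image: pass the URL through and send no inline data.
        return {"mime_type": mime_type, "url": source}
    # Local image: inline the file contents as base64 text.
    data = Path(source).read_bytes()
    return {
        "mime_type": mime_type,
        "base_64_encoded": base64.b64encode(data).decode("ascii"),
    }
```

The returned dictionary can be assigned to the image key of the request body.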

Video Extension

video (object, optional): A previously Veo-generated video to extend. Adds approximately 7 seconds to the original video. Provide either url or base_64_encoded.
- mime_type (string): MIME type of the video. Accepted: "video/mp4".
- url (string): URL of the input video.
- base_64_encoded (string): Base64-encoded video data.
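
One way to use video extension is to feed the URL of an earlier result back in as the video input. The sketch below only builds the request body. The function name extension_request is hypothetical, and sending a prompt with an extension request is an assumption, since this page does not say whether the prompt is required in that mode.

```python
def extension_request(video_url, prompt):
    """Build a request body that extends an earlier Veo-generated clip."""
    return {
        "model": "google-veo-3",
        # Assumption: a prompt describing the continuation is sent along.
        "prompt": prompt,
        # video/mp4 is the only accepted input type listed above.
        "video": {"mime_type": "video/mp4", "url": video_url},
        # Ask for a URL back so the result can be extended again if needed.
        "output_format": "url",
    }
```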

Generation Settings

generate_audio (boolean, optional): Enable native audio generation including dialogue, sound effects, and ambient sounds. Default: true.
duration (integer, optional): Video duration in seconds. Accepted: 4, 6, 8. Default: 8.
resolution (string, optional): Output resolution. Accepted: "720p", "1080p". Default: "720p". Veo 3+ models only.
aspect_ratio (string, optional): Video aspect ratio. Accepted: "16:9", "9:16". Default: "16:9".
seed (integer, optional): Seed for reproducible results. Range: 0–4294967295.
negative_prompt (string, optional): Describe elements to exclude from the generated video.
person_generation (string, optional): Control person generation. Accepted: "allow_adult" (default), "allow_all", "dont_allow".
resize_mode (string, optional): How the input image is resized/cropped. Image-to-video only.
compression_quality (string, optional): Controls output video compression.
output_format (string, optional): Response format. Accepted: "base64", "url". Default: "base64". When set to "url", the response returns a URL instead of base64-encoded data.
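
The accepted values and ranges listed above can be checked client-side before a request is sent. The following is a minimal sketch, not an official validator: the function name validate_settings is hypothetical, and it covers only the constraints stated on this page (resize_mode and compression_quality are skipped because their accepted values are not listed).

```python
def validate_settings(payload):
    """Return a list of problems found in a request payload (empty if none)."""
    allowed = {
        "duration": {4, 6, 8},
        "resolution": {"720p", "1080p"},
        "aspect_ratio": {"16:9", "9:16"},
        "person_generation": {"allow_adult", "allow_all", "dont_allow"},
        "output_format": {"base64", "url"},
    }
    problems = []
    # Enumerated fields: only checked when the caller actually set them.
    for field, values in allowed.items():
        if field in payload and payload[field] not in values:
            problems.append(f"{field} must be one of {sorted(values)}")
    # Numeric ranges and length limits from the parameter descriptions.
    if not 1 <= payload.get("n", 1) <= 4:
        problems.append("n must be between 1 and 4")
    if "seed" in payload and not 0 <= payload["seed"] <= 4294967295:
        problems.append("seed must be between 0 and 4294967295")
    if len(payload.get("prompt", "")) > 4000:
        problems.append("prompt must be at most 4000 characters")
    # enhance_prompt is documented as Veo 2 only.
    if "enhance_prompt" in payload:
        problems.append("enhance_prompt is not supported on Veo 3")
    return problems
```

Returning a list instead of raising on the first error lets a caller report every problem in one pass.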

Note

enhance_prompt is not supported on Veo 3. It is only available for Veo 2 models.

Writing Audio-Aware Prompts

When generate_audio is enabled, include audio cues in your prompt for best results. Describe dialogue, sound effects, and ambient sounds alongside the visual scene. For example: "A barista steams milk with a loud hiss, then pours latte art while soft jazz plays in the background and customers chat quietly."
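
To keep audio cues from being forgotten, a prompt can be assembled from separate visual and audio parts. The sketch below is one possible approach. The function name audio_aware_prompt and the "Sound effects:" / "Ambient sound:" labels are illustrative assumptions, not a prompt syntax documented for Veo 3; a free-form sentence like the barista example works just as well.

```python
def audio_aware_prompt(visual, sound_effects=(), ambience=None, dialogue=None):
    """Join a visual description with optional audio cues into one prompt."""
    sentences = [visual]
    if dialogue:
        sentences.append(f'Someone says: "{dialogue}"')
    if sound_effects:
        sentences.append("Sound effects: " + ", ".join(sound_effects))
    if ambience:
        sentences.append(f"Ambient sound: {ambience}")
    # Strip trailing periods so every part ends with exactly one.
    return " ".join(s.strip().rstrip(".") + "." for s in sentences)
```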

Response

A successful request returns a 200 OK status code with a JSON response body.

When output_format is "base64" (default)

{
  "videos": [
    {
      "base64_encoded": "AAAAIGZ0eXBpc29tAAACAGlzb21pc..."
    }
  ]
}

When output_format is "url"

{
  "videos": [
    {
      "url": "https://storage.example.com/generated-video.mp4"
    }
  ]
}

Response Fields

videos (array): An array of generated video objects.
- base64_encoded (string): The base64-encoded video data (MP4, 24 FPS). Present when output_format is "base64".
- url (string): URL to the generated video. Present when output_format is "url".
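
Because each video object carries either base64_encoded or url depending on output_format, client code usually needs to handle both shapes. A minimal sketch, assuming the response body has already been parsed into a dictionary; the function name extract_videos is hypothetical.

```python
import base64


def extract_videos(result):
    """Return (kind, value) pairs: ("bytes", data) or ("url", link)."""
    items = []
    for video in result.get("videos", []):
        if "base64_encoded" in video:
            # Default response shape: decode the MP4 bytes.
            items.append(("bytes", base64.b64decode(video["base64_encoded"])))
        elif "url" in video:
            # Shape returned when output_format is "url".
            items.append(("url", video["url"]))
        else:
            raise ValueError(f"Unrecognised video object keys: {sorted(video)}")
    return items
```

Entries tagged "bytes" can be written straight to an .mp4 file, while "url" entries still need to be downloaded.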

Working with Base64 Video Data

The response returns videos as base64-encoded strings by default. For details on decoding, saving, and displaying videos, see the Video Generation page.

Example Requests

Text-to-Video (cURL)

curl -X POST https://api.azerion.ai/v1/videos/generation \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "google-veo-3",
    "prompt": "A street musician plays acoustic guitar on a cobblestone sidewalk at sunset, the warm strumming echoes off nearby buildings as passersby drop coins into an open case",
    "generate_audio": true,
    "duration": 8,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "n": 1
  }'

Image-to-Video (cURL)

curl -X POST https://api.azerion.ai/v1/videos/generation \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "google-veo-3",
    "prompt": "The camera slowly zooms in as the scene comes to life with gentle motion",
    "image": {
      "mime_type": "image/jpeg",
      "url": "https://example.com/images/input-image.jpg"
    },
    "generate_audio": true,
    "duration": 8,
    "resolution": "720p"
  }'

Replace Placeholder

Replace YOUR_ACCESS_TOKEN with your actual API key or access token. Refer to the Authentication guide for details on obtaining and using your credentials.

Text-to-Video (Python)

import requests
import base64
import os

# Fail fast with a KeyError if the key is not set, rather than sending "Bearer None"
api_key = os.environ["AZERION_API_KEY"]
url = "https://api.azerion.ai/v1/videos/generation"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

data = {
    "model": "google-veo-3",
    "prompt": "A street musician plays acoustic guitar on a cobblestone sidewalk at sunset, the warm strumming echoes off nearby buildings as passersby drop coins into an open case",
    "generate_audio": True,
    "duration": 8,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "n": 1
}

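# Generation can take a while; the 600-second timeout is an arbitrary choice, adjust as needed
response = requests.post(url, headers=headers, json=data, timeout=600)
print(f"Status code: {response.status_code}")
response.raise_for_status()  # raise on 4xx/5xx instead of failing later on a missing key
result = response.json()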

# Save the video file
video_data = base64.b64decode(result["videos"][0]["base64_encoded"])
with open("generated_video.mp4", "wb") as f:
    f.write(video_data)
print("Video saved as generated_video.mp4")

Text-to-Video (Node.js)

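// Node.js 18+ provides a global fetch. On older versions, install node-fetch v2
// and uncomment the next line (node-fetch v3 is ESM-only and cannot be loaded with require).
// const fetch = require('node-fetch');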
const fs = require('fs');

const apiKey = process.env.AZERION_API_KEY;
const url = 'https://api.azerion.ai/v1/videos/generation';

const headers = {
  'Content-Type': 'application/json',
  'Authorization': `Bearer ${apiKey}`
};

const data = {
  model: 'google-veo-3',
  prompt: 'A street musician plays acoustic guitar on a cobblestone sidewalk at sunset, the warm strumming echoes off nearby buildings as passersby drop coins into an open case',
  generate_audio: true,
  duration: 8,
  resolution: '1080p',
  aspect_ratio: '16:9',
  n: 1
};

fetch(url, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(data)
})
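.then(response => {
  // Surface HTTP errors instead of failing later on a missing "videos" field
  if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
  return response.json();
})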
.then(result => {
  // Save the video file
  const videoData = Buffer.from(result.videos[0].base64_encoded, 'base64');
  fs.writeFileSync('generated_video.mp4', videoData);
  console.log('Video saved as generated_video.mp4');
})
.catch(error => console.error('Error:', error));

Output Specifications

Spec	Value
Output Format	MP4
Frame Rate	24 FPS
Native Audio	Dialogue, SFX, ambient sounds
Image-to-Video	Supported
Video Extension	Supported
Reference Images	Not supported (see Veo 3.1)
First/Last Frame Control	Not supported (see Veo 3.1)

Related Resources

Veo 3.1: adds reference images and first/last frame control
Google Veo Documentation
List Models
