Veo 3

Generate high-quality videos with native audio from text or image prompts using Google DeepMind's Veo 3 model.

Endpoint

POST https://api.azerion.ai/v1/videos/generation

Description

Veo 3 is Google DeepMind's video generation model featuring native audio generation with synchronized dialogue, sound effects, and ambient sounds. It produces 4-, 6-, or 8-second videos at up to 1080p resolution, with realistic physics simulation and improved prompt fidelity.

Veo 3 supports text-to-video, image-to-video, and video extension.

Authentication

This endpoint requires authentication using an API key, sent as a Bearer token in the Authorization header (see Example Requests).
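
The header construction can be sketched as follows. This is a minimal sketch, assuming the key is sent as a Bearer token in the Authorization header, as the example requests on this page do. build_headers is a hypothetical helper, and the AZERION_API_KEY variable name is taken from the Python example on this page.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return request headers carrying the API key as a Bearer token."""
    if not api_key:
        raise ValueError("API key is missing")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }


# Read the key from the environment rather than hard-coding it.
# The placeholder only keeps this sketch runnable when the variable is unset.
headers = build_headers(os.environ.get("AZERION_API_KEY") or "YOUR_ACCESS_TOKEN")
```

Keeping the key in an environment variable avoids committing credentials to source control.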

Request

{
  "model": "google-veo-3",
  "prompt": "A street musician plays acoustic guitar on a cobblestone sidewalk at sunset, the warm strumming echoes off nearby buildings as passersby drop coins into an open case",
  "n": 1,
  "image": {
    "mime_type": "image/jpeg",
    "url": "https://example.com/images/input-image.jpg",
    "base_64_encoded": "BASE64_ENCODED_IMAGE_DATA"
  },
  "video": {
    "mime_type": "video/mp4",
    "url": "https://example.com/videos/input-video.mp4",
    "base_64_encoded": "BASE64_ENCODED_VIDEO_DATA"
  },
  "generate_audio": true,
  "duration": 8,
  "resolution": "1080p",
  "aspect_ratio": "16:9",
  "seed": 42,
  "negative_prompt": "blurry, low quality, distorted",
  "person_generation": "allow_adult",
  "resize_mode": "crop",
  "compression_quality": "optimized",
  "output_format": "url"
}

Request Parameters

Core Parameters

  • model (string, required): Model ID. Use google-veo-3.
  • prompt (string, required): Text description of the desired video. Maximum 4000 characters. Required for text-to-video; optional but recommended for image-to-video.
  • n (integer, optional): Number of videos to generate. Range: 1–4. Default: 1.

Image-to-Video

  • image (object, optional): Input image for image-to-video generation. Recommended: 720p+ resolution with 16:9 or 9:16 aspect ratio. Provide either url or base_64_encoded.
    • mime_type (string): MIME type of the image. Accepted: "image/jpeg", "image/png".
    • url (string): URL of the input image.
    • base_64_encoded (string): Base64-encoded image data.
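
As an illustration of the "either url or base_64_encoded" rule, the sketch below builds the image object in both forms. The helper names image_from_bytes and image_from_url are hypothetical; only the field names and accepted MIME types come from the parameter list above.

```python
import base64

ACCEPTED_IMAGE_TYPES = {"image/jpeg", "image/png"}


def _check_mime_type(mime_type: str) -> None:
    if mime_type not in ACCEPTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type: {mime_type}")


def image_from_bytes(data: bytes, mime_type: str) -> dict:
    """Build the image object from raw image bytes (inline base64 form)."""
    _check_mime_type(mime_type)
    return {
        "mime_type": mime_type,
        "base_64_encoded": base64.b64encode(data).decode("ascii"),
    }


def image_from_url(url: str, mime_type: str) -> dict:
    """Build the image object from a hosted image (URL form)."""
    _check_mime_type(mime_type)
    return {"mime_type": mime_type, "url": url}
```

Each helper sets exactly one of the two source fields, so a payload built this way never carries both.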

Video Extension

  • video (object, optional): A previously Veo-generated video to extend. Adds approximately 7 seconds to the original video. Provide either url or base_64_encoded.
    • mime_type (string): MIME type of the video. Accepted: "video/mp4".
    • url (string): URL of the input video.
    • base_64_encoded (string): Base64-encoded video data.
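
A video extension request might be assembled as in the sketch below. It assumes a prompt is sent along with the video (this page does not state whether prompt is required for extension), and it treats the "approximately 7 seconds" figure as a rough estimate only. Both function names are hypothetical.

```python
# Rough figure taken from the video parameter description; not an exact value.
APPROX_EXTENSION_SECONDS = 7


def extension_payload(video_url: str, prompt: str) -> dict:
    """Build a request body that extends a previously generated video by URL."""
    return {
        "model": "google-veo-3",
        "prompt": prompt,  # assumption: a prompt accompanies the extension
        "video": {"mime_type": "video/mp4", "url": video_url},
    }


def estimated_length(original_seconds: float) -> float:
    """Estimate the length of the extended video in seconds."""
    return original_seconds + APPROX_EXTENSION_SECONDS
```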

Generation Settings

  • generate_audio (boolean, optional): Enable native audio generation including dialogue, sound effects, and ambient sounds. Default: true.
  • duration (integer, optional): Video duration in seconds. Accepted: 4, 6, 8. Default: 8.
  • resolution (string, optional): Output resolution. Accepted: "720p", "1080p". Default: "720p". Veo 3+ models only.
  • aspect_ratio (string, optional): Video aspect ratio. Accepted: "16:9", "9:16". Default: "16:9".
  • seed (integer, optional): Seed for reproducible results. Range: 0–4294967295.
  • negative_prompt (string, optional): Describe elements to exclude from the generated video.
  • person_generation (string, optional): Control person generation. Accepted: "allow_adult" (default), "allow_all", "dont_allow".
  • resize_mode (string, optional): How the input image is resized/cropped. Image-to-video only.
  • compression_quality (string, optional): Controls output video compression.
  • output_format (string, optional): Response format. Accepted: "base64", "url". Default: "base64". When set to "url", the response returns a URL instead of base64-encoded data.

Note: enhance_prompt is not supported on Veo 3. It is only available for Veo 2 models.
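
The documented ranges and accepted values can be checked on the client before sending a request. The following is a minimal sketch: validate_settings is a hypothetical helper, it covers only the constraints listed on this page, and the API may enforce additional rules.

```python
from typing import List

MAX_PROMPT_CHARS = 4000
ACCEPTED = {
    "duration": {4, 6, 8},
    "resolution": {"720p", "1080p"},
    "aspect_ratio": {"16:9", "9:16"},
    "person_generation": {"allow_adult", "allow_all", "dont_allow"},
    "output_format": {"base64", "url"},
}


def validate_settings(payload: dict) -> List[str]:
    """Return a list of problems found in the payload (empty if none)."""
    errors = []
    if len(payload.get("prompt", "")) > MAX_PROMPT_CHARS:
        errors.append("prompt exceeds 4000 characters")
    if not 1 <= payload.get("n", 1) <= 4:
        errors.append("n must be between 1 and 4")
    if "seed" in payload and not 0 <= payload["seed"] <= 4294967295:
        errors.append("seed must be between 0 and 4294967295")
    for name, accepted in ACCEPTED.items():
        if name in payload and payload[name] not in accepted:
            errors.append(f"{name} must be one of {sorted(accepted, key=str)}")
    if "enhance_prompt" in payload:
        errors.append("enhance_prompt is not supported on Veo 3")
    return errors
```

Returning all problems at once, rather than raising on the first, lets a caller report every invalid field in a single pass.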

Writing Audio-Aware Prompts

When generate_audio is enabled, include audio cues in your prompt for best results. Describe dialogue, sound effects, and ambient sounds alongside the visual scene. For example: "A barista steams milk with a loud hiss, then pours latte art while soft jazz plays in the background and customers chat quietly."
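
One way to keep audio cues from being forgotten is to assemble the prompt from a visual description plus a list of sound descriptions. The helper below is a hypothetical convenience, not part of the API; it only joins the pieces into one prompt string.

```python
def _sentence(text: str) -> str:
    """Trim the text, drop a trailing period, and capitalize the first letter."""
    text = text.strip().rstrip(".")
    return text[:1].upper() + text[1:]


def compose_prompt(visual: str, audio_cues: list) -> str:
    """Join a visual description and audio cues into a single prompt."""
    parts = [_sentence(p) for p in [visual, *audio_cues] if p.strip()]
    return ". ".join(parts) + "."


prompt = compose_prompt(
    "A barista pours latte art",
    ["milk steams with a loud hiss", "soft jazz plays in the background"],
)
print(prompt)
```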

Response

A successful request returns a 200 OK status code with a JSON response body.

When output_format is "base64" (default)

{
  "videos": [
    {
      "base64_encoded": "AAAAIGZ0eXBpc29tAAACAGlzb21pc..."
    }
  ]
}

When output_format is "url"

{
  "videos": [
    {
      "url": "https://storage.example.com/generated-video.mp4"
    }
  ]
}

Response Fields

  • videos (array): An array of generated video objects.
    • base64_encoded (string): The base64-encoded video data (MP4, 24 FPS). Present when output_format is "base64".
    • url (string): URL to the generated video. Present when output_format is "url".
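
Because each video object carries either base64_encoded or url depending on output_format, client code can handle both shapes in one place. A minimal sketch, with extract_videos as a hypothetical helper:

```python
import base64
from typing import List, Union


def extract_videos(result: dict) -> List[Union[bytes, str]]:
    """Return decoded MP4 bytes for base64 entries and the URL string for url entries."""
    items = []
    for video in result.get("videos", []):
        if "base64_encoded" in video:
            items.append(base64.b64decode(video["base64_encoded"]))
        elif "url" in video:
            items.append(video["url"])
        else:
            raise ValueError("video object has neither base64_encoded nor url")
    return items
```

Callers can then branch on the item type: bytes can be written straight to an .mp4 file, while a string needs to be downloaded first.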

Working with Base64 Video Data

The response returns videos as base64-encoded strings by default. For details on decoding, saving, and displaying videos, see the Video Generation page.

Example Requests

Text-to-Video (cURL)

curl -X POST https://api.azerion.ai/v1/videos/generation \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "google-veo-3",
    "prompt": "A street musician plays acoustic guitar on a cobblestone sidewalk at sunset, the warm strumming echoes off nearby buildings as passersby drop coins into an open case",
    "generate_audio": true,
    "duration": 8,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "n": 1
  }'

Image-to-Video (cURL)

curl -X POST https://api.azerion.ai/v1/videos/generation \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "google-veo-3",
    "prompt": "The camera slowly zooms in as the scene comes to life with gentle motion",
    "image": {
      "mime_type": "image/jpeg",
      "url": "https://example.com/images/input-image.jpg"
    },
    "generate_audio": true,
    "duration": 8,
    "resolution": "720p"
  }'
Replace Placeholder

Replace YOUR_ACCESS_TOKEN with your actual API key or access token. Refer to the Authentication guide for details on obtaining and using your credentials.

Text-to-Video (Python)

import base64
import os

import requests

# Read the API key from the environment rather than hard-coding it
api_key = os.environ["AZERION_API_KEY"]
url = "https://api.azerion.ai/v1/videos/generation"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}

data = {
    "model": "google-veo-3",
    "prompt": "A street musician plays acoustic guitar on a cobblestone sidewalk at sunset, the warm strumming echoes off nearby buildings as passersby drop coins into an open case",
    "generate_audio": True,
    "duration": 8,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "n": 1,
}

response = requests.post(url, headers=headers, json=data)
print(f"Status code: {response.status_code}")

# Stop on an error status instead of treating an error body as video data
response.raise_for_status()
result = response.json()

# output_format defaults to "base64", so decode the first video and save it
video_data = base64.b64decode(result["videos"][0]["base64_encoded"])
with open("generated_video.mp4", "wb") as f:
    f.write(video_data)
print("Video saved as generated_video.mp4")

Text-to-Video (Node.js)

// Works with node-fetch v2 (v3 is ESM-only and cannot be loaded with require).
// On Node.js 18+, this line can be dropped in favor of the built-in fetch.
const fetch = require('node-fetch');
const fs = require('fs');

const apiKey = process.env.AZERION_API_KEY;
const url = 'https://api.azerion.ai/v1/videos/generation';

const headers = {
  'Content-Type': 'application/json',
  'Authorization': `Bearer ${apiKey}`
};

const data = {
  model: 'google-veo-3',
  prompt: 'A street musician plays acoustic guitar on a cobblestone sidewalk at sunset, the warm strumming echoes off nearby buildings as passersby drop coins into an open case',
  generate_audio: true,
  duration: 8,
  resolution: '1080p',
  aspect_ratio: '16:9',
  n: 1
};

fetch(url, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(data)
})
  .then(response => {
    // Fail early on an error status instead of parsing an error body as video data
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return response.json();
  })
  .then(result => {
    // output_format defaults to "base64", so decode the first video and save it
    const videoData = Buffer.from(result.videos[0].base64_encoded, 'base64');
    fs.writeFileSync('generated_video.mp4', videoData);
    console.log('Video saved as generated_video.mp4');
  })
  .catch(error => console.error('Error:', error));

Output Specifications

| Spec | Value |
|---|---|
| Output Format | MP4 |
| Frame Rate | 24 FPS |
| Native Audio | Dialogue, SFX, ambient sounds |
| Image-to-Video | Supported |
| Video Extension | Supported |
| Reference Images | Not supported (see Veo 3.1) |
| First/Last Frame Control | Not supported (see Veo 3.1) |
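
Given the fixed 24 FPS frame rate, the number of frames in a clip follows directly from the requested duration. A small sketch of the arithmetic, with frame_count as a hypothetical helper:

```python
FRAME_RATE = 24  # frames per second, from the output specifications
ACCEPTED_DURATIONS = (4, 6, 8)  # seconds, from the duration parameter


def frame_count(duration_seconds: int) -> int:
    """Return the number of frames in a clip of the given duration."""
    if duration_seconds not in ACCEPTED_DURATIONS:
        raise ValueError("duration must be 4, 6, or 8 seconds")
    return duration_seconds * FRAME_RATE


for seconds in ACCEPTED_DURATIONS:
    print(seconds, frame_count(seconds))  # prints "4 96", "6 144", "8 192"
```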