Veo 3
Generate high-quality videos with native audio from text or image prompts using Google DeepMind's Veo 3 model.
Endpoint
POST https://api.azerion.ai/v1/videos/generation
Description
Veo 3 is Google DeepMind's video generation model featuring native audio generation with synchronized dialogue, sound effects, and ambient sounds. It produces videos up to 1080p resolution with 4–8 second duration, realistic physics simulation, and improved prompt fidelity.
Veo 3 supports text-to-video, image-to-video, and video extension.
Authentication
This endpoint requires authentication using an API key.
Request
```json
{
  "model": "google-veo-3",
  "prompt": "A street musician plays acoustic guitar on a cobblestone sidewalk at sunset, the warm strumming echoes off nearby buildings as passersby drop coins into an open case",
  "n": 1,
  "image": {
    "mime_type": "image/jpeg",
    "url": "https://example.com/images/input-image.jpg",
    "base_64_encoded": "BASE64_ENCODED_IMAGE_DATA"
  },
  "video": {
    "mime_type": "video/mp4",
    "url": "https://example.com/videos/input-video.mp4",
    "base_64_encoded": "BASE64_ENCODED_VIDEO_DATA"
  },
  "generate_audio": true,
  "duration": 8,
  "resolution": "1080p",
  "aspect_ratio": "16:9",
  "seed": 42,
  "negative_prompt": "blurry, low quality, distorted",
  "person_generation": "allow_adult",
  "resize_mode": "crop",
  "compression_quality": "optimized",
  "output_format": "url"
}
```
Request Parameters
Core Parameters
- `model` (string, required): Model ID. Use `google-veo-3`.
- `prompt` (string, required): Text description of the desired video. Maximum 4000 characters. Required for text-to-video; optional but recommended for image-to-video.
- `n` (integer, optional): Number of videos to generate. Range: 1–4. Default: `1`.
Image-to-Video
- `image` (object, optional): Input image for image-to-video generation. Recommended: 720p+ resolution with a 16:9 or 9:16 aspect ratio. Provide either `url` or `base_64_encoded`.
  - `mime_type` (string): MIME type of the image. Accepted: `"image/jpeg"`, `"image/png"`.
  - `url` (string): URL of the input image.
  - `base_64_encoded` (string): Base64-encoded image data.
Video Extension
- `video` (object, optional): A previously Veo-generated video to extend. Extension adds approximately 7 seconds to the original video. Provide either `url` or `base_64_encoded`.
  - `mime_type` (string): MIME type of the video. Accepted: `"video/mp4"`.
  - `url` (string): URL of the input video.
  - `base_64_encoded` (string): Base64-encoded video data.
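The example requests below cover text-to-video and image-to-video only, so here is a minimal sketch of building a video-extension request body in Python. The field names follow the reference above; the helper name and the input URL are illustrative, and only one of `url` or `base_64_encoded` is supplied, as the reference requires:

```python
import json

def build_extension_request(video_url: str, prompt: str) -> str:
    """Sketch: serialize a video-extension request body for this endpoint."""
    payload = {
        "model": "google-veo-3",
        "prompt": prompt,
        "video": {
            "mime_type": "video/mp4",
            "url": video_url,  # must point to a previously Veo-generated video
        },
        "generate_audio": True,
        "output_format": "url",
    }
    return json.dumps(payload)

body = build_extension_request(
    "https://example.com/videos/previous-output.mp4",
    "The musician finishes the song and bows as the crowd applauds",
)
```

The resulting string can be sent as the POST body with the same headers as the cURL examples below.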
Generation Settings
- `generate_audio` (boolean, optional): Enable native audio generation, including dialogue, sound effects, and ambient sounds. Default: `true`.
- `duration` (integer, optional): Video duration in seconds. Accepted: `4`, `6`, `8`. Default: `8`.
- `resolution` (string, optional): Output resolution. Accepted: `"720p"`, `"1080p"`. Default: `"720p"`. Veo 3+ models only.
- `aspect_ratio` (string, optional): Video aspect ratio. Accepted: `"16:9"`, `"9:16"`. Default: `"16:9"`.
- `seed` (integer, optional): Seed for reproducible results. Range: 0–4294967295.
- `negative_prompt` (string, optional): Describes elements to exclude from the generated video.
- `person_generation` (string, optional): Controls person generation. Accepted: `"allow_adult"` (default), `"allow_all"`, `"dont_allow"`.
- `resize_mode` (string, optional): How the input image is resized or cropped. Image-to-video only.
- `compression_quality` (string, optional): Controls output video compression.
- `output_format` (string, optional): Response format. Accepted: `"base64"`, `"url"`. Default: `"base64"`. When set to `"url"`, the response returns a URL instead of base64-encoded data.
`enhance_prompt` is not supported on Veo 3. It is only available for Veo 2 models.
When `generate_audio` is enabled, include audio cues in your prompt for best results. Describe dialogue, sound effects, and ambient sounds alongside the visual scene. For example: "A barista steams milk with a loud hiss, then pours latte art while soft jazz plays in the background and customers chat quietly."
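The accepted values above can be checked client-side before sending a request, which surfaces typos without spending an API call. A minimal sketch (the helper name, lookup table, and error messages are illustrative, not part of the API):

```python
# Illustrative client-side validation of the generation settings listed above.
ACCEPTED = {
    "duration": {4, 6, 8},
    "resolution": {"720p", "1080p"},
    "aspect_ratio": {"16:9", "9:16"},
    "person_generation": {"allow_adult", "allow_all", "dont_allow"},
    "output_format": {"base64", "url"},
}

def validate_settings(settings: dict) -> None:
    """Raise ValueError if any setting falls outside its documented range."""
    for key, value in settings.items():
        allowed = ACCEPTED.get(key)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{key}={value!r} not in {sorted(allowed)}")
    seed = settings.get("seed")
    if seed is not None and not (0 <= seed <= 4294967295):
        raise ValueError("seed out of range 0-4294967295")

validate_settings({"duration": 8, "resolution": "1080p", "seed": 42})  # passes
```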
Response
A successful request returns a 200 OK status code with a JSON response body.
When output_format is "base64" (default)
```json
{
  "videos": [
    {
      "base64_encoded": "AAAAIGZ0eXBpc29tAAACAGlzb21pc..."
    }
  ]
}
```
When output_format is "url"
```json
{
  "videos": [
    {
      "url": "https://storage.example.com/generated-video.mp4"
    }
  ]
}
```
Response Fields
- `videos` (array): An array of generated video objects.
  - `base64_encoded` (string): The base64-encoded video data (MP4, 24 FPS). Present when `output_format` is `"base64"`.
  - `url` (string): URL to the generated video. Present when `output_format` is `"url"`.
Working with Base64 Video Data
The response returns videos as base64-encoded strings by default. For details on decoding, saving, and displaying videos, see the Video Generation page.
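As a sketch, a single helper can handle both response shapes, decoding the base64 payload or downloading the returned URL. The helper name is illustrative; only the standard library is used:

```python
import base64
import urllib.request

def save_first_video(result: dict, path: str = "generated_video.mp4") -> str:
    """Save the first video from a response body, whichever shape it has."""
    video = result["videos"][0]
    if "base64_encoded" in video:
        # output_format "base64" (the default): decode the inline data
        data = base64.b64decode(video["base64_encoded"])
    else:
        # output_format "url": download the generated file
        with urllib.request.urlopen(video["url"], timeout=60) as resp:
            data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Pass it the parsed JSON body from either example response above.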
Example Requests
Text-to-Video (cURL)
```shell
curl -X POST https://api.azerion.ai/v1/videos/generation \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "google-veo-3",
    "prompt": "A street musician plays acoustic guitar on a cobblestone sidewalk at sunset, the warm strumming echoes off nearby buildings as passersby drop coins into an open case",
    "generate_audio": true,
    "duration": 8,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "n": 1
  }'
```
Image-to-Video (cURL)
```shell
curl -X POST https://api.azerion.ai/v1/videos/generation \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "google-veo-3",
    "prompt": "The camera slowly zooms in as the scene comes to life with gentle motion",
    "image": {
      "mime_type": "image/jpeg",
      "url": "https://example.com/images/input-image.jpg"
    },
    "generate_audio": true,
    "duration": 8,
    "resolution": "720p"
  }'
```
Replace YOUR_ACCESS_TOKEN with your actual API key or access token. Refer to the Authentication guide for details on obtaining and using your credentials.
Text-to-Video (Python)
```python
import requests
import base64
import os

api_key = os.environ.get("AZERION_API_KEY")

url = "https://api.azerion.ai/v1/videos/generation"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}
data = {
    "model": "google-veo-3",
    "prompt": "A street musician plays acoustic guitar on a cobblestone sidewalk at sunset, the warm strumming echoes off nearby buildings as passersby drop coins into an open case",
    "generate_audio": True,
    "duration": 8,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "n": 1
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # fail fast on non-2xx responses
result = response.json()
print(f"Status code: {response.status_code}")

# Decode and save the video file
video_data = base64.b64decode(result["videos"][0]["base64_encoded"])
with open("generated_video.mp4", "wb") as f:
    f.write(video_data)
print("Video saved as generated_video.mp4")
```
Text-to-Video (Node.js)
```javascript
const fetch = require('node-fetch');
const fs = require('fs');

const apiKey = process.env.AZERION_API_KEY;
const url = 'https://api.azerion.ai/v1/videos/generation';

const headers = {
  'Content-Type': 'application/json',
  'Authorization': `Bearer ${apiKey}`
};
const data = {
  model: 'google-veo-3',
  prompt: 'A street musician plays acoustic guitar on a cobblestone sidewalk at sunset, the warm strumming echoes off nearby buildings as passersby drop coins into an open case',
  generate_audio: true,
  duration: 8,
  resolution: '1080p',
  aspect_ratio: '16:9',
  n: 1
};

fetch(url, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(data)
})
  .then(response => {
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  })
  .then(result => {
    // Decode and save the video file
    const videoData = Buffer.from(result.videos[0].base64_encoded, 'base64');
    fs.writeFileSync('generated_video.mp4', videoData);
    console.log('Video saved as generated_video.mp4');
  })
  .catch(error => console.error('Error:', error));
```
Output Specifications
| Spec | Value |
|---|---|
| Output Format | MP4 |
| Frame Rate | 24 FPS |
| Native Audio | Dialogue, SFX, ambient sounds |
| Image-to-Video | Supported |
| Video Extension | Supported |
| Reference Images | Not supported (see Veo 3.1) |
| First/Last Frame Control | Not supported (see Veo 3.1) |
Related Resources
- Veo 3.1 — Adds reference images and first/last frame control
- Google Veo Documentation
- List Models