Veo 3.1

Generate high-quality videos with advanced control using Google DeepMind's Veo 3.1 model.

Endpoint

POST https://api.azerion.ai/v1/videos/generation

Description

Veo 3.1 builds on Veo 3 with richer and better-synchronized audio, first/last frame control for precise transitions, and subject reference images for consistent character and object appearance across generations. It supports text-to-video, image-to-video, and video extension.

Authentication

This endpoint requires authentication using an API key. Send the key as a Bearer token in the Authorization header, as shown in the example requests.

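Request

The body below shows every supported field for reference. In each media object (image, last_frame, video, and each reference image), provide either url or base_64_encoded, not both.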

{
  "model": "google-veo-3-1",
  "prompt": "A detective in a trench coat walks down a rain-soaked neon-lit alley, footsteps echoing against the wet pavement",
  "n": 1,
  "image": {
    "mime_type": "image/jpeg",
    "url": "https://example.com/images/first-frame.jpg",
    "base_64_encoded": "BASE64_FIRST_FRAME_DATA"
  },
  "last_frame": {
    "mime_type": "image/jpeg",
    "url": "https://example.com/images/last-frame.jpg",
    "base_64_encoded": "BASE64_LAST_FRAME_DATA"
  },
  "reference_images": [
    {
      "image": {
        "mime_type": "image/jpeg",
        "url": "https://example.com/images/reference.jpg",
        "base_64_encoded": "BASE64_REFERENCE_IMAGE_DATA"
      },
      "reference_type": "subject"
    }
  ],
  "video": {
    "mime_type": "video/mp4",
    "url": "https://example.com/videos/input-video.mp4",
    "base_64_encoded": "BASE64_ENCODED_VIDEO_DATA"
  },
  "generate_audio": true,
  "duration": 8,
  "resolution": "1080p",
  "aspect_ratio": "16:9",
  "seed": 42,
  "negative_prompt": "blurry, low quality, distorted",
  "person_generation": "allow_adult",
  "resize_mode": "crop",
  "compression_quality": "optimized",
  "output_format": "url"
}

Request Parameters

Core Parameters

  • model (string, required): Model ID. Use google-veo-3-1.
  • prompt (string, required for text-to-video): Text description of the desired video. Maximum 4000 characters. Optional but recommended for image-to-video.
  • n (integer, optional): Number of videos to generate. Range: 1–4. Default: 1.

Image-to-Video

  • image (object, optional): Input image for image-to-video generation. Also acts as the first frame when used with last_frame. Recommended: 720p+ resolution with 16:9 or 9:16 aspect ratio. Provide either url or base_64_encoded.
    • mime_type (string): MIME type of the image. Accepted: "image/jpeg", "image/png".
    • url (string): URL of the input image.
    • base_64_encoded (string): Base64-encoded image data.
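
When the input image is a local file rather than a hosted URL, the image object can be built by base64-encoding the file contents. The following is a minimal sketch of that step. The helper name media_from_file is hypothetical and not part of the API, and the sketch assumes the file extension reflects the real image type.

```python
import base64
import mimetypes
from pathlib import Path

# MIME types this page lists as accepted for image inputs.
ACCEPTED_IMAGE_TYPES = {"image/jpeg", "image/png"}


def media_from_file(path):
    """Build an image object (mime_type + base_64_encoded) from a local file.

    Hypothetical helper, not part of the API. The MIME type is guessed
    from the file extension only; the file contents are not inspected.
    """
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type not in ACCEPTED_IMAGE_TYPES:
        raise ValueError(f"Unsupported image type for {path}: {mime_type}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"mime_type": mime_type, "base_64_encoded": encoded}
```

The returned dictionary can be assigned to the image field of the request body. The same object shape is used for last_frame and for the image field inside each reference_images entry.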

Last Frame Control

  • last_frame (object, optional): Specifies the final frame of the video. Use with image (first frame) to create controlled transitions between two states. Provide either url or base_64_encoded.
    • mime_type (string): MIME type of the image. Accepted: "image/jpeg", "image/png".
    • url (string): URL of the last frame image.
    • base_64_encoded (string): Base64-encoded image data.

Reference Images

  • reference_images (array, optional): Up to 3 reference images to guide character or object appearance. Each object contains:
    • image (object): The reference image. Provide either url or base_64_encoded.
      • mime_type (string): MIME type. Accepted: "image/jpeg", "image/png".
      • url (string): URL of the reference image.
      • base_64_encoded (string): Base64-encoded image data.
    • reference_type (string): Accepted: "subject". Guides character/object appearance.

Warning: Style reference images ("style" type) are not supported on Veo 3.1. Only the "subject" reference type is available.
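
The constraints above (at most 3 images, "subject" type only) can be enforced on the client before sending a request. This is a sketch under those stated limits. The helper build_reference_images is hypothetical, and it assumes all images are referenced by URL and share one MIME type.

```python
MAX_REFERENCE_IMAGES = 3  # limit stated on this page


def build_reference_images(urls, mime_type="image/jpeg"):
    """Return a reference_images array for the given image URLs.

    Hypothetical helper. Every entry uses reference_type "subject",
    the only type this page lists as supported on Veo 3.1.
    """
    urls = list(urls)
    if len(urls) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"At most {MAX_REFERENCE_IMAGES} reference images are allowed"
        )
    return [
        {
            "image": {"mime_type": mime_type, "url": url},
            "reference_type": "subject",
        }
        for url in urls
    ]
```

The result can be placed directly under the reference_images key of the request body.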

Video Extension

  • video (object, optional): A previously Veo-generated video to extend. Adds approximately 7 seconds to the original video. Provide either url or base_64_encoded.
    • mime_type (string): MIME type of the video. Accepted: "video/mp4".
    • url (string): URL of the input video.
    • base_64_encoded (string): Base64-encoded video data.
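
A video extension request might be assembled as in the sketch below. The helper build_extension_request is hypothetical. This page does not say whether a prompt is required for extension, so the sketch always sends one, and it assumes the source video is available at a URL.

```python
def build_extension_request(video_url, prompt, output_format="url"):
    """Build a request body that extends a previously Veo-generated video.

    Hypothetical helper. Sending a prompt with an extension request is
    an assumption; the page does not state whether it is required.
    """
    return {
        "model": "google-veo-3-1",
        "prompt": prompt,
        "video": {"mime_type": "video/mp4", "url": video_url},
        "output_format": output_format,
    }


payload = build_extension_request(
    "https://example.com/videos/input-video.mp4",
    "The detective turns the corner and disappears into the fog",
)
```

The payload is then posted to the endpoint in the same way as any other request on this page.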

Generation Settings

  • generate_audio (boolean, optional): Enable native audio generation with richer and better-synchronized audio than Veo 3. Default: true.
  • duration (integer, optional): Video duration in seconds. Accepted: 4, 6, 8. Default: 8.
  • resolution (string, optional): Output resolution. Accepted: "720p", "1080p". Default: "720p". Veo 3+ models only.
  • aspect_ratio (string, optional): Video aspect ratio. Accepted: "16:9", "9:16". Default: "16:9".
  • seed (integer, optional): Seed for reproducible results. Range: 0–4294967295.
  • negative_prompt (string, optional): Describe elements to exclude from the generated video.
  • person_generation (string, optional): Control person generation. Accepted: "allow_adult" (default), "allow_all", "dont_allow".
  • resize_mode (string, optional): How the input image is resized or cropped. Image-to-video only. The request example above uses "crop".
  • compression_quality (string, optional): Controls output video compression. The request example above uses "optimized".
  • output_format (string, optional): Response format. Accepted: "base64", "url". Default: "base64". When set to "url", the response returns a URL instead of base64-encoded data.

Note: enhance_prompt is not supported on Veo 3.1. It is only available for Veo 2 models.
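
Because each setting has a small set of accepted values, a request body can be checked locally before it is sent. The sketch below encodes only the limits listed on this page. The function validate_settings is hypothetical, and the server remains the authority on what is accepted.

```python
# Accepted values as listed on this page.
ALLOWED = {
    "duration": {4, 6, 8},
    "resolution": {"720p", "1080p"},
    "aspect_ratio": {"16:9", "9:16"},
    "person_generation": {"allow_adult", "allow_all", "dont_allow"},
    "output_format": {"base64", "url"},
}


def validate_settings(payload):
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    for field, allowed in ALLOWED.items():
        if field in payload and payload[field] not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed, key=str)}")
    if not 1 <= payload.get("n", 1) <= 4:
        problems.append("n must be between 1 and 4")
    if len(payload.get("prompt", "")) > 4000:
        problems.append("prompt must be at most 4000 characters")
    seed = payload.get("seed")
    if seed is not None and not 0 <= seed <= 4294967295:
        problems.append("seed must be between 0 and 4294967295")
    if "enhance_prompt" in payload:
        problems.append("enhance_prompt is not supported on Veo 3.1")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.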

Writing Audio-Aware Prompts

Veo 3.1 generates audio natively, so the prompt should describe what is heard as well as what is seen. For best results, include audio cues for dialogue, sound effects, and ambient sound alongside the visual scene.
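
One way to keep audio cues from being forgotten is to assemble the prompt from separate visual and audio parts. The sketch below is illustrative only. The helper compose_prompt and its labels ("Sound effects:", "Ambient sound:") are assumptions, since this page does not define a required prompt syntax.

```python
def compose_prompt(visual, dialogue=None, sound_effects=None, ambience=None):
    """Join a visual description with optional audio cues into one prompt.

    Hypothetical helper; the wording of the audio labels is an
    illustrative choice, not a documented prompt format.
    """
    def clean(text):
        # Drop surrounding whitespace and a trailing period so that
        # sentences join without doubled punctuation.
        return text.strip().rstrip(".")

    parts = [clean(visual)]
    if dialogue:
        parts.append(f'A voice says: "{clean(dialogue)}"')
    if sound_effects:
        parts.append(f"Sound effects: {clean(sound_effects)}")
    if ambience:
        parts.append(f"Ambient sound: {clean(ambience)}")
    return ". ".join(parts) + "."


prompt = compose_prompt(
    "A detective in a trench coat walks down a rain-soaked neon-lit alley",
    sound_effects="footsteps echoing against the wet pavement",
    ambience="steady rain and distant traffic",
)
```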

Response

A successful request returns a 200 OK status code with a JSON response body.

When output_format is "base64" (default)

{
  "videos": [
    {
      "base64_encoded": "AAAAIGZ0eXBpc29tAAACAGlzb21pc..."
    }
  ]
}

When output_format is "url"

{
  "videos": [
    {
      "url": "https://storage.example.com/generated-video.mp4"
    }
  ]
}

Response Fields

  • videos (array): An array of generated video objects.
    • base64_encoded (string): The base64-encoded video data (MP4, 24 FPS). Present when output_format is "base64".
    • url (string): URL to the generated video. Present when output_format is "url".
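
Since the shape of each video object depends on output_format, client code may want to handle both cases. The sketch below assumes the response has already been parsed into a dictionary. The helper save_videos and its file naming are hypothetical. Note that the response field is spelled base64_encoded, while request objects on this page use base_64_encoded.

```python
import base64
from pathlib import Path


def save_videos(result, prefix="generated_video"):
    """Handle both response shapes described above.

    Base64 entries are decoded and written to <prefix>_<index>.mp4.
    URL entries are returned unchanged for the caller to download.
    Returns a list of file paths and/or URLs in response order.
    """
    outputs = []
    for index, video in enumerate(result.get("videos", [])):
        if "base64_encoded" in video:
            path = Path(f"{prefix}_{index}.mp4")
            path.write_bytes(base64.b64decode(video["base64_encoded"]))
            outputs.append(str(path))
        elif "url" in video:
            outputs.append(video["url"])
    return outputs
```

With n greater than 1, the index in the file name keeps the generated videos from overwriting each other.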

Working with Base64 Video Data

The response returns videos as base64-encoded strings by default. For details on decoding, saving, and displaying videos, see the Video Generation page.

Example Requests

Text-to-Video (cURL)

curl -X POST https://api.azerion.ai/v1/videos/generation \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "google-veo-3-1",
    "prompt": "A detective in a trench coat walks down a rain-soaked neon-lit alley, footsteps echoing against the wet pavement",
    "generate_audio": true,
    "duration": 8,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "n": 1
  }'

First/Last Frame Transition (cURL)

curl -X POST https://api.azerion.ai/v1/videos/generation \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "google-veo-3-1",
    "prompt": "A flower blooms from bud to full blossom in a time-lapse style",
    "image": {
      "mime_type": "image/jpeg",
      "url": "https://example.com/images/bud.jpg"
    },
    "last_frame": {
      "mime_type": "image/jpeg",
      "url": "https://example.com/images/blossom.jpg"
    },
    "duration": 6,
    "resolution": "1080p"
  }'

Subject Reference Images (cURL)

curl -X POST https://api.azerion.ai/v1/videos/generation \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "google-veo-3-1",
    "prompt": "The character walks through a bustling marketplace, stopping to examine colorful fabrics",
    "reference_images": [
      {
        "image": {
          "mime_type": "image/jpeg",
          "url": "https://example.com/images/character-ref.jpg"
        },
        "reference_type": "subject"
      }
    ],
    "generate_audio": true,
    "duration": 8,
    "resolution": "1080p"
  }'

Note: Replace YOUR_ACCESS_TOKEN with your actual API key or access token. Refer to the Authentication guide for details on obtaining and using your credentials.

Text-to-Video (Python)

import base64
import os

import requests

# Read the API key from the environment rather than hard-coding it
api_key = os.environ.get("AZERION_API_KEY")
url = "https://api.azerion.ai/v1/videos/generation"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}

data = {
    "model": "google-veo-3-1",
    "prompt": "A detective in a trench coat walks down a rain-soaked neon-lit alley, footsteps echoing against the wet pavement",
    "generate_audio": True,
    "duration": 8,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "n": 1,
}

response = requests.post(url, headers=headers, json=data)
print(f"Status code: {response.status_code}")

# Stop on an HTTP error instead of failing later on a missing "videos" key
response.raise_for_status()
result = response.json()

# output_format defaults to "base64", so decode the first video and save it
video_data = base64.b64decode(result["videos"][0]["base64_encoded"])
with open("generated_video.mp4", "wb") as f:
    f.write(video_data)
print("Video saved as generated_video.mp4")

Text-to-Video (Node.js)

// require() works with node-fetch v2; on Node.js 18+ the built-in
// global fetch can be used instead and this line removed.
const fetch = require('node-fetch');
const fs = require('fs');

// Read the API key from the environment rather than hard-coding it
const apiKey = process.env.AZERION_API_KEY;
const url = 'https://api.azerion.ai/v1/videos/generation';

const headers = {
  'Content-Type': 'application/json',
  'Authorization': `Bearer ${apiKey}`
};

const data = {
  model: 'google-veo-3-1',
  prompt: 'A detective in a trench coat walks down a rain-soaked neon-lit alley, footsteps echoing against the wet pavement',
  generate_audio: true,
  duration: 8,
  resolution: '1080p',
  aspect_ratio: '16:9',
  n: 1
};

fetch(url, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(data)
})
  .then(response => {
    // Surface HTTP errors instead of failing later on a missing "videos" key
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return response.json();
  })
  .then(result => {
    // output_format defaults to "base64", so decode the first video and save it
    const videoData = Buffer.from(result.videos[0].base64_encoded, 'base64');
    fs.writeFileSync('generated_video.mp4', videoData);
    console.log('Video saved as generated_video.mp4');
  })
  .catch(error => console.error('Error:', error));

Output Specifications

Spec                      | Value
Output Format             | MP4
Frame Rate                | 24 FPS
Native Audio              | Richer, better-synchronized than Veo 3
Image-to-Video            | Supported
Video Extension           | Supported
Reference Images          | Up to 3 ("subject" type only)
First/Last Frame Control  | Supported

Veo 3 vs Veo 3.1

Feature           | Veo 3                   | Veo 3.1
Native Audio      | Dialogue, SFX, ambient  | Richer, better-synchronized
Reference Images  | Not supported           | Up to 3 ("subject" type)
First/Last Frame  | Not supported           | Supported
Video Extension   | Supported               | Supported
Max Resolution    | 1080p                   | 1080p
Duration          | 4, 6, 8 seconds         | 4, 6, 8 seconds