Veo 3.1

Generate high-quality videos with advanced control using Google DeepMind's Veo 3.1 model.

Endpoint

POST https://api.azerion.ai/v1/videos/generation

Description

Veo 3.1 builds on Veo 3 with richer and better-synchronized audio, first/last frame control for precise transitions, and subject reference images for consistent character and object appearance across generations. It supports text-to-video, image-to-video, and video extension.

Authentication

This endpoint requires an API key, sent as a Bearer token in the Authorization header (Authorization: Bearer YOUR_ACCESS_TOKEN).

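Request

The example below lists every supported field for reference. A real request normally includes only the fields needed for one generation mode (text-to-video, image-to-video, first/last frame transition, or video extension), and each media object should carry either url or base_64_encoded, not both.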

{
  "model": "google-veo-3-1",
  "prompt": "A detective in a trench coat walks down a rain-soaked neon-lit alley, footsteps echoing against the wet pavement",
  "n": 1,
  "image": {
    "mime_type": "image/jpeg",
    "url": "https://example.com/images/first-frame.jpg",
    "base_64_encoded": "BASE64_FIRST_FRAME_DATA"
  },
  "last_frame": {
    "mime_type": "image/jpeg",
    "url": "https://example.com/images/last-frame.jpg",
    "base_64_encoded": "BASE64_LAST_FRAME_DATA"
  },
  "reference_images": [
    {
      "image": {
        "mime_type": "image/jpeg",
        "url": "https://example.com/images/reference.jpg",
        "base_64_encoded": "BASE64_REFERENCE_IMAGE_DATA"
      },
      "reference_type": "subject"
    }
  ],
  "video": {
    "mime_type": "video/mp4",
    "url": "https://example.com/videos/input-video.mp4",
    "base_64_encoded": "BASE64_ENCODED_VIDEO_DATA"
  },
  "generate_audio": true,
  "duration": 8,
  "resolution": "1080p",
  "aspect_ratio": "16:9",
  "seed": 42,
  "negative_prompt": "blurry, low quality, distorted",
  "person_generation": "allow_adult",
  "resize_mode": "crop",
  "compression_quality": "optimized",
  "output_format": "url"
}

Request Parameters

Core Parameters

model (string, required): Model ID. Use google-veo-3-1.
prompt (string, conditionally required): Text description of the desired video, up to 4000 characters. Required for text-to-video; optional but recommended for image-to-video.
n (integer, optional): Number of videos to generate. Range: 1–4. Default: 1.

Image-to-Video

image (object, optional): Input image for image-to-video generation. When used with last_frame, it also serves as the first frame. For best results, use an image of at least 720p with a 16:9 or 9:16 aspect ratio. Provide either url or base_64_encoded.
- mime_type (string): MIME type of the image. Accepted: "image/jpeg", "image/png".
- url (string): URL of the input image.
- base_64_encoded (string): Base64-encoded image data.
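Every media object on this page (image, last_frame, the image inside each reference, and video) has the same shape: a mime_type plus exactly one of url or base_64_encoded. The sketch below builds such an object from either a URL or a local file. The helper name media_object is hypothetical, and it assumes base_64_encoded takes a plain base64 string without a "data:" URI prefix.

```python
import base64
from pathlib import Path

# MIME types listed for image, last_frame, reference_images, and video
ALLOWED_MIME_TYPES = {"image/jpeg", "image/png", "video/mp4"}


def media_object(mime_type, url=None, path=None):
    """Hypothetical helper: return {"mime_type", "url"} or {"mime_type", "base_64_encoded"}."""
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"unsupported mime_type: {mime_type}")
    # Exactly one source must be given: a remote URL or a local file path
    if (url is None) == (path is None):
        raise ValueError("provide exactly one of url or path")
    if url is not None:
        return {"mime_type": mime_type, "url": url}
    # Assumption: the API expects plain base64 text (no "data:image/...;base64," prefix)
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"mime_type": mime_type, "base_64_encoded": encoded}
```

For example, data["image"] = media_object("image/jpeg", path="first-frame.jpg") attaches a local first frame, while media_object("image/jpeg", url="https://example.com/images/first-frame.jpg") references a hosted one.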

Last Frame Control

last_frame (object, optional): Specifies the final frame of the video. Use with image (first frame) to create controlled transitions between two states. Provide either url or base_64_encoded.
- mime_type (string): MIME type of the image. Accepted: "image/jpeg", "image/png".
- url (string): URL of the last frame image.
- base_64_encoded (string): Base64-encoded image data.
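Because last_frame is described as working together with image (the first frame), a transition request is easiest to build as a pair. The following sketch assembles one and refuses to build it if either frame is missing; the function name transition_request is hypothetical, and treating a lone last_frame as an error is this sketch's own choice rather than documented API behavior.

```python
def transition_request(prompt, first_frame, last_frame, duration=8, resolution="720p"):
    """Hypothetical helper: build a first/last frame transition request.

    first_frame and last_frame are media objects such as
    {"mime_type": "image/jpeg", "url": "https://example.com/images/bud.jpg"}.
    """
    # last_frame is documented for use with image (the first frame), so require both
    if not first_frame or not last_frame:
        raise ValueError("a transition needs both a first frame (image) and a last_frame")
    # Accepted durations from the Generation Settings list
    if duration not in (4, 6, 8):
        raise ValueError("duration must be 4, 6, or 8 seconds")
    return {
        "model": "google-veo-3-1",
        "prompt": prompt,
        "image": first_frame,
        "last_frame": last_frame,
        "duration": duration,
        "resolution": resolution,
    }
```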

Reference Images

reference_images (array, optional): Up to 3 reference images to guide character or object appearance. Each object contains:
- image (object): The reference image. Provide either url or base_64_encoded.
  - mime_type (string): MIME type. Accepted: "image/jpeg", "image/png".
  - url (string): URL of the reference image.
  - base_64_encoded (string): Base64-encoded image data.
- reference_type (string): "subject" — guides character/object appearance.

Warning

Style reference images ("style" type) are not supported on Veo 3.1. Only "subject" reference type is available.
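The two constraints above (at most 3 references, "subject" type only) can be enforced on the client before sending a request. This is a minimal sketch; the function name reference_images is hypothetical, and it expects each input to already be a media object with mime_type and either url or base_64_encoded.

```python
MAX_REFERENCE_IMAGES = 3  # limit stated for Veo 3.1


def reference_images(images):
    """Hypothetical helper: wrap media objects as "subject" references."""
    if not 1 <= len(images) <= MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"expected 1-{MAX_REFERENCE_IMAGES} reference images, got {len(images)}"
        )
    # Veo 3.1 accepts only "subject"; "style" references are not supported
    return [{"image": img, "reference_type": "subject"} for img in images]
```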

Video Extension

video (object, optional): A video previously generated by Veo, to be extended. Extension adds approximately 7 seconds to the original video. Provide either url or base_64_encoded.
- mime_type (string): MIME type of the video. Accepted: "video/mp4".
- url (string): URL of the input video.
- base_64_encoded (string): Base64-encoded video data.
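A common pattern is to feed the output of one request back in as the video to extend. The sketch below assumes the "base64_encoded" string from an earlier response can be passed unchanged as the request's "base_64_encoded" (note the different spelling of the two field names), and that a prompt may accompany the video to describe how the scene continues. The function name extension_request is hypothetical.

```python
def extension_request(previous_video_b64, prompt=None):
    """Hypothetical helper: build a request extending a previously generated video.

    previous_video_b64 is assumed to be result["videos"][0]["base64_encoded"]
    from an earlier call made with output_format "base64".
    """
    payload = {
        "model": "google-veo-3-1",
        # Request field is "base_64_encoded"; the response field is "base64_encoded"
        "video": {"mime_type": "video/mp4", "base_64_encoded": previous_video_b64},
    }
    if prompt:
        # Assumption: a prompt can describe how the extended segment continues
        payload["prompt"] = prompt
    return payload
```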

Generation Settings

generate_audio (boolean, optional): Enables native audio generation (dialogue, sound effects, and ambient sound). Default: true.
duration (integer, optional): Video duration in seconds. Accepted: 4, 6, 8. Default: 8.
resolution (string, optional): Output resolution. Accepted: "720p", "1080p". Default: "720p".
aspect_ratio (string, optional): Video aspect ratio. Accepted: "16:9", "9:16". Default: "16:9".
seed (integer, optional): Seed for reproducible results. Range: 0–4294967295.
negative_prompt (string, optional): Describe elements to exclude from the generated video.
person_generation (string, optional): Control person generation. Accepted: "allow_adult" (default), "allow_all", "dont_allow".
resize_mode (string, optional): How the input image is resized or cropped, for example "crop". Applies to image-to-video only.
compression_quality (string, optional): Controls output video compression, for example "optimized".
output_format (string, optional): Response format. Accepted: "base64", "url". Default: "base64". When set to "url", the response returns a URL instead of base64-encoded data.

Note

enhance_prompt is not supported on Veo 3.1. It is only available for Veo 2 models.
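Most request errors come from out-of-range settings, so a local pre-flight check can save a round trip. The sketch below encodes only the limits listed in this section and under Core Parameters; the function name validate_settings is hypothetical, and the server remains the authority on what it accepts.

```python
# Accepted values copied from the parameter lists above
ACCEPTED = {
    "duration": {4, 6, 8},
    "resolution": {"720p", "1080p"},
    "aspect_ratio": {"16:9", "9:16"},
    "person_generation": {"allow_adult", "allow_all", "dont_allow"},
    "output_format": {"base64", "url"},
}


def validate_settings(payload):
    """Hypothetical helper: return a list of problems (empty list means no issues found)."""
    problems = []
    for field, allowed in ACCEPTED.items():
        if field in payload and payload[field] not in allowed:
            problems.append(f"{field}={payload[field]!r} not in {sorted(allowed, key=str)}")
    n = payload.get("n", 1)
    if not (isinstance(n, int) and 1 <= n <= 4):
        problems.append(f"n={n!r} must be an integer in 1-4")
    seed = payload.get("seed")
    if seed is not None and not (isinstance(seed, int) and 0 <= seed <= 4_294_967_295):
        problems.append(f"seed={seed!r} must be an integer in 0-4294967295")
    if len(payload.get("prompt", "")) > 4000:
        problems.append("prompt exceeds 4000 characters")
    if "resize_mode" in payload and "image" not in payload:
        problems.append("resize_mode only applies to image-to-video requests")
    if "enhance_prompt" in payload:
        problems.append("enhance_prompt is not supported on Veo 3.1")
    return problems
```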

Writing Audio-Aware Prompts

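Veo 3.1 produces richer and better-synchronized audio than Veo 3. For best results, include audio cues in your prompt: describe dialogue, sound effects, and ambient sounds alongside the visual scene. For example, the detective prompt used on this page pairs a visual ("walks down a rain-soaked neon-lit alley") with a sound cue ("footsteps echoing against the wet pavement").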
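One way to keep audio cues consistent across prompts is to assemble them from separate parts. The sketch below is a prompting convention of our own, not an API requirement: the labels ("Sound effects:", "Ambient sound:") and the function name audio_aware_prompt are hypothetical, and only the 4000-character limit comes from the prompt parameter description.

```python
MAX_PROMPT_CHARS = 4000  # documented prompt limit


def audio_aware_prompt(scene, dialogue=None, sound_effects=(), ambience=None):
    """Hypothetical helper: combine a visual scene with audio cues into one prompt."""
    parts = [scene.rstrip(".")]
    if dialogue:
        parts.append(f'A character says: "{dialogue}"')
    if sound_effects:
        parts.append("Sound effects: " + ", ".join(sound_effects))
    if ambience:
        parts.append(f"Ambient sound: {ambience}")
    prompt = ". ".join(parts) + "."
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt is {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}")
    return prompt
```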

Response

A successful request returns a 200 OK status code with a JSON response body.

When `output_format` is `"base64"` (default)

{
  "videos": [
    {
      "base64_encoded": "AAAAIGZ0eXBpc29tAAACAGlzb21pc..."
    }
  ]
}

When `output_format` is `"url"`

{
  "videos": [
    {
      "url": "https://storage.example.com/generated-video.mp4"
    }
  ]
}

Response Fields

videos (array): An array of generated video objects.
- base64_encoded (string): The base64-encoded video data (MP4, 24 FPS). Present when output_format is "base64".
- url (string): URL to the generated video. Present when output_format is "url".
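Since each entry in videos carries either base64_encoded or url depending on output_format, client code can handle both shapes in one place, and for every video when n is greater than 1. This is a minimal sketch; the function name extract_videos is hypothetical. Note that the response field is spelled base64_encoded, while request media objects use base_64_encoded.

```python
import base64


def extract_videos(result):
    """Hypothetical helper: return ("bytes", mp4_bytes) or ("url", str) for each video."""
    out = []
    for video in result.get("videos", []):
        if "base64_encoded" in video:
            # output_format "base64" (default): decode to raw MP4 bytes
            out.append(("bytes", base64.b64decode(video["base64_encoded"])))
        elif "url" in video:
            # output_format "url": return the download URL unchanged
            out.append(("url", video["url"]))
        else:
            raise ValueError(f"unexpected video object keys: {sorted(video)}")
    return out
```

A caller can then write each ("bytes", data) pair to its own file, for example video_0.mp4, video_1.mp4, and download any ("url", link) entries separately.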

Working with Base64 Video Data

The response returns videos as base64-encoded strings by default. For details on decoding, saving, and displaying videos, see the Video Generation page.

Example Requests

Text-to-Video (cURL)

curl -X POST https://api.azerion.ai/v1/videos/generation \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "google-veo-3-1",
    "prompt": "A detective in a trench coat walks down a rain-soaked neon-lit alley, footsteps echoing against the wet pavement",
    "generate_audio": true,
    "duration": 8,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "n": 1
  }'

First/Last Frame Transition (cURL)

curl -X POST https://api.azerion.ai/v1/videos/generation \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "google-veo-3-1",
    "prompt": "A flower blooms from bud to full blossom in a time-lapse style",
    "image": {
      "mime_type": "image/jpeg",
      "url": "https://example.com/images/bud.jpg"
    },
    "last_frame": {
      "mime_type": "image/jpeg",
      "url": "https://example.com/images/blossom.jpg"
    },
    "duration": 6,
    "resolution": "1080p"
  }'

Subject Reference Images (cURL)

curl -X POST https://api.azerion.ai/v1/videos/generation \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -d '{
    "model": "google-veo-3-1",
    "prompt": "The character walks through a bustling marketplace, stopping to examine colorful fabrics",
    "reference_images": [
      {
        "image": {
          "mime_type": "image/jpeg",
          "url": "https://example.com/images/character-ref.jpg"
        },
        "reference_type": "subject"
      }
    ],
    "generate_audio": true,
    "duration": 8,
    "resolution": "1080p"
  }'

Replace Placeholder

Replace YOUR_ACCESS_TOKEN with your actual API key or access token. Refer to the Authentication guide for details on obtaining and using your credentials.

Text-to-Video (Python)

import base64
import os

import requests

# Read the API key from the environment; raises KeyError if it is not set
api_key = os.environ["AZERION_API_KEY"]
url = "https://api.azerion.ai/v1/videos/generation"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

data = {
    "model": "google-veo-3-1",
    "prompt": "A detective in a trench coat walks down a rain-soaked neon-lit alley, footsteps echoing against the wet pavement",
    "generate_audio": True,
    "duration": 8,
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "n": 1
}

# Generation can take a while; the 600-second client timeout is an arbitrary choice
response = requests.post(url, headers=headers, json=data, timeout=600)
print(f"Status code: {response.status_code}")
response.raise_for_status()  # stop on HTTP errors before parsing the body
result = response.json()

# Save the video file (output_format defaults to "base64")
video_data = base64.b64decode(result["videos"][0]["base64_encoded"])
with open("generated_video.mp4", "wb") as f:
    f.write(video_data)
print("Video saved as generated_video.mp4")

Text-to-Video (Node.js)

// Uses the fetch API built into Node.js 18+, so no node-fetch dependency is needed
const fs = require('fs');

const apiKey = process.env.AZERION_API_KEY;
const url = 'https://api.azerion.ai/v1/videos/generation';

const headers = {
  'Content-Type': 'application/json',
  'Authorization': `Bearer ${apiKey}`
};

const data = {
  model: 'google-veo-3-1',
  prompt: 'A detective in a trench coat walks down a rain-soaked neon-lit alley, footsteps echoing against the wet pavement',
  generate_audio: true,
  duration: 8,
  resolution: '1080p',
  aspect_ratio: '16:9',
  n: 1
};

fetch(url, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(data)
})
  .then(async response => {
    // Surface HTTP errors instead of failing later on a missing "videos" field
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}: ${await response.text()}`);
    }
    return response.json();
  })
  .then(result => {
    // Save the video file (output_format defaults to "base64")
    const videoData = Buffer.from(result.videos[0].base64_encoded, 'base64');
    fs.writeFileSync('generated_video.mp4', videoData);
    console.log('Video saved as generated_video.mp4');
  })
  .catch(error => console.error('Error:', error));

Output Specifications

Spec	Value
Output Format	MP4
Frame Rate	24 FPS
Native Audio	Richer, better-synchronized than Veo 3
Image-to-Video	Supported
Video Extension	Supported
Reference Images	Up to 3 (`"subject"` type only)
First/Last Frame Control	Supported

Veo 3 vs Veo 3.1

Feature	Veo 3	Veo 3.1
Native Audio	Dialogue, SFX, ambient	Richer, better-synchronized
Reference Images	Not supported	Up to 3 (`"subject"` type)
First/Last Frame	Not supported	Supported
Video Extension	Supported	Supported
Max Resolution	1080p	1080p
Duration	4, 6, 8 seconds	4, 6, 8 seconds

Related Resources

Veo 3: Base model without reference images or frame control
Google Veo Documentation