
Vidsheet Anatomy

A vidsheet (Auto Content Engine) has two sides that produce one thing: a final rendered video per row.

┌──────────────────────────────────────────────────────────────────┐
│  Auto Content Engine (one per campaign or workflow)              │
│                                                                  │
│  ┌─────────── Sheet side ───────────┐   ┌── Video side ────┐     │
│  │                                  │   │                  │     │
│  │  Row 1 → cells → final_video ────┼──▶│  Layers stack    │     │
│  │  Row 2 → cells → final_video ────┼──▶│  Layers stack    │     │
│  │  Row 3 → cells → final_video ────┼──▶│  Layers stack    │     │
│  │                                  │   │                  │     │
│  └──────────────────────────────────┘   └──────────────────┘     │
└──────────────────────────────────────────────────────────────────┘

Each row is one video. If you want 20 videos in a batch, create 20 rows.

  • POST /v1/autocontentengine/:id/rows?agent_id= — create
  • POST /v1/autocontentengine/:id/rows/:row_id/duplicate — clone a row
  • PUT /v1/autocontentengine/:id/rows/mass_update — bulk update
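Since each row renders one video, a batch of 20 videos means 20 calls to the create endpoint. The sketch below only plans those calls and sends nothing. The host `api.example.com`, the integer engine id, and the helper names are placeholders for illustration, and the request body for row creation is not documented on this page, so it is left out.

```python
from urllib.parse import urlencode

# Placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.com"


def create_row_url(engine_id: int, agent_id: str) -> str:
    """URL for POST /v1/autocontentengine/:id/rows?agent_id=..."""
    query = urlencode({"agent_id": agent_id})
    return f"{BASE_URL}/v1/autocontentengine/{engine_id}/rows?{query}"


def plan_batch(engine_id: int, agent_id: str, video_count: int) -> list:
    """Return one (method, url) pair per video, because each row is one video."""
    url = create_row_url(engine_id, agent_id)
    return [("POST", url) for _ in range(video_count)]


if __name__ == "__main__":
    plan = plan_batch(engine_id=42, agent_id="agent_123", video_count=20)
    print(len(plan), "row-creation calls planned")
    print(plan[0])
```

Once one row is set up, the duplicate endpoint may be a cheaper way to produce the remaining rows than repeating the create call.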

Each column defines what kind of content its cells hold. Common columns in a template:

Column       Type   Role         Example
TEXT 1       text   ingredient   Topic / subject
SCRIPT       text   ingredient   Full VO script
VOICE        audio  ingredient   Generated TTS
VIDEO        video  video        Avatar / b-roll clip
FINAL VIDEO  video  final_video  Composite output
STATS        stats  stats        Credit cost + metadata

Types: text / image / video / audio. Roles: ingredient (user-editable), video, final_video, stats (system-managed).
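One way to keep the column model straight in client code is to treat each column as a small record of name, type, and role. The sketch below encodes the example template from the table and picks out the user-editable columns. The `Column` class and `user_editable` helper are illustrative names, not part of the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    type: str
    role: str


# The example template columns, copied from the table above.
TEMPLATE_COLUMNS = [
    Column("TEXT 1", "text", "ingredient"),
    Column("SCRIPT", "text", "ingredient"),
    Column("VOICE", "audio", "ingredient"),
    Column("VIDEO", "video", "video"),
    Column("FINAL VIDEO", "video", "final_video"),
    Column("STATS", "stats", "stats"),
]


def user_editable(columns):
    """Names of columns whose role is 'ingredient', the user-editable role."""
    return [c.name for c in columns if c.role == "ingredient"]


if __name__ == "__main__":
    print(user_editable(TEMPLATE_COLUMNS))
    # ['TEXT 1', 'SCRIPT', 'VOICE']
```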

A cell is one piece of content for one row / column pair. Cells hold:

  • A value (text, or a content resource URL)
  • A creation card (the generation_type + data that generates its value)
  • Zero or more layers (if the column has the video role)

Set a cell's value directly:

PATCH /v1/autocontentengine/:id/cells/:cell_id?agent_id=
body: { "spreadsheet_cell": { "value": "the script text" } }
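As a rough illustration, the PATCH call above could be assembled with Python's standard library as follows. The request is built but not sent. The host, the `Content-Type` header, and the `build_cell_update` helper are assumptions, and whatever authentication the API requires is not shown here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_cell_update(engine_id: int, cell_id: int, agent_id: str, value: str) -> Request:
    """Build (but do not send) the PATCH request that sets a cell's value."""
    query = urlencode({"agent_id": agent_id})
    url = f"{BASE_URL}/v1/autocontentengine/{engine_id}/cells/{cell_id}?{query}"
    body = json.dumps({"spreadsheet_cell": {"value": value}}).encode("utf-8")
    # Content-Type is an assumption; add the auth header your account needs.
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="PATCH")


if __name__ == "__main__":
    req = build_cell_update(42, 1001, "agent_123", "the script text")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it.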

Every cell has a creation card declaring how to generate its content. There are 9 types — see the full reference.

Example — a text cell that uses Gemini 2.5 Pro:

{
  "generation_type": "text",
  "data": {
    "prompt": "Write a 12-second TikTok hook for {{topic}}",
    "model": "gemini_2_5_pro",
    "variables": { "topic": "San Antonio tacos" }
  }
}
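To preview what the model will receive, the {{topic}} placeholder in the card above can be expanded locally. This is a minimal sketch that assumes substitution is a plain text replacement of each {{name}}; the engine's actual rules (escaping, handling of unknown names) are not described here, and `render_prompt` is an illustrative helper.

```python
import re

# Matches {{name}} where name is letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_prompt(prompt: str, variables: dict) -> str:
    """Replace each {{name}} with variables[name]; unknown names stay as-is."""
    def substitute(match):
        name = match.group(1)
        return variables.get(name, match.group(0))
    return PLACEHOLDER.sub(substitute, prompt)


card = {
    "generation_type": "text",
    "data": {
        "prompt": "Write a 12-second TikTok hook for {{topic}}",
        "model": "gemini_2_5_pro",
        "variables": {"topic": "San Antonio tacos"},
    },
}

if __name__ == "__main__":
    print(render_prompt(card["data"]["prompt"], card["data"]["variables"]))
    # Write a 12-second TikTok hook for San Antonio tacos
```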

Example — a video cell that animates an image:

{
  "generation_type": "video_from_image",
  "data": {
    "image_resource_id": 4821,
    "prompt": "zoom in slowly, handheld feel",
    "model": "kling_2_6",
    "aspect_ratio": "9:16",
    "duration": 5
  }
}

A video-role cell composites layers into the final output. Each layer has:

  • type — image / video / audio / text / captions
  • position — z-index (lower = further back)
  • Timing — start_at, duration, fade-ins
  • Source — either its own generation, or a reference to another cell

Typical talking-avatar layer stack (bottom to top):

z=0 background (video or image)
z=1 avatar (video_from_image)
z=2 audio (speech_from_text)
z=3 captions (captions generated from audio)
z=4 text overlay (CTA, watermark)

Create a layer:

POST /v1/autocontentengine/:id/cells/:cell_id/layers?agent_id=
body: { "video_layer": { "name": "captions", "type": "captions", "position": 3 } }
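The typical talking-avatar stack could be turned into one create-layer body per entry, with the list index doubling as the z-index. In this sketch the layer names and the mapping of each entry to a layer type are an interpretation of the stack listed above (for example, the avatar is treated as a video layer); only the body shape comes from the request shown.

```python
import json

# (name, type) from bottom to top. The type chosen for each entry is an
# interpretation of the stack description, not a documented mapping.
STACK = [
    ("background", "video"),
    ("avatar", "video"),
    ("audio", "audio"),
    ("captions", "captions"),
    ("text overlay", "text"),
]


def layer_bodies(stack):
    """Build one create-layer request body per entry; position is the z-index."""
    return [
        {"video_layer": {"name": name, "type": layer_type, "position": z}}
        for z, (name, layer_type) in enumerate(stack)
    ]


if __name__ == "__main__":
    for body in layer_bodies(STACK):
        print(json.dumps(body))
```

Each body would be sent in its own POST to the layers endpoint of the target cell.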

Reorder by updating positions:

PUT /v1/autocontentengine/:id/cells/:cell_id/layers/update_positions?agent_id=
body: { "layers": [{ "id": 1, "position": 0 }, { "id": 2, "position": 1 }] }
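Because position is just a z-index, a reorder body can be derived from the desired bottom-to-top order of layer ids. A small sketch, with `positions_payload` as an illustrative helper name; the duplicate check is a local safeguard, not a documented API rule:

```python
def positions_payload(layer_ids_bottom_to_top):
    """Assign position 0 to the first id, 1 to the next, and so on."""
    if len(set(layer_ids_bottom_to_top)) != len(layer_ids_bottom_to_top):
        raise ValueError("each layer id may appear only once")
    return {
        "layers": [
            {"id": layer_id, "position": position}
            for position, layer_id in enumerate(layer_ids_bottom_to_top)
        ]
    }


if __name__ == "__main__":
    print(positions_payload([1, 2]))
    # {'layers': [{'id': 1, 'position': 0}, {'id': 2, 'position': 1}]}
```

Swapping two layers is then a matter of swapping their ids in the list and sending the new body.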

Global variables live on the engine and substitute into any prompt containing {{variable_name}}. Use them for brand name, CTA, tone, color, aspect ratio preferences, and anything else reused across cells.

GET /v1/autocontentengine/:id/global_variables?agent_id=
POST /v1/autocontentengine/:id/import_global_variables (XLSX upload)

See Variables for the full flow.
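Before generating a batch, it can help to check that every {{variable_name}} used in your prompts has a value. The sketch below scans prompts for placeholders and reports the names that are missing. It assumes the global variables have already been fetched and flattened into a plain name-to-value dict; the actual response shape of the GET endpoint is not shown on this page, and the variable names in the example are made up.

```python
import re

# Matches {{name}} where name is letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def missing_variables(prompts, global_variables):
    """Return the sorted placeholder names that have no global value."""
    used = {name for prompt in prompts for name in PLACEHOLDER.findall(prompt)}
    return sorted(used - set(global_variables))


if __name__ == "__main__":
    prompts = [
        "Write a hook for {{brand_name}} in a {{tone}} tone",
        "End with this call to action: {{cta}}",
    ]
    # Hypothetical values, assumed to be flattened from the API response.
    global_variables = {"brand_name": "Acme", "tone": "playful"}
    print(missing_variables(prompts, global_variables))
    # ['cta']
```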

The sheet side is human-friendly: it looks like a spreadsheet, so non-devs can drive it. The video side is renderer-friendly: a timeline of layers is exactly what Remotion compositions expect.

The API exposes the two sides independently. You can drive the sheet without ever touching layers and let the template’s layer stack do its job, or you can ignore the sheet and drive layers directly. Most workflows do a bit of both.