Captions
The Captions card transcribes audio or video into timestamped captions using Gemini. The output includes word-level timing data that can be used for subtitle overlays in the final video.
Generate captions
`POST /v1/autocontentengine/{id}/cells/{cell_id}/generate?agent_id={agent_id}`

Path parameters

| Parameter | Type | Description |
|---|---|---|
| `id` | integer | The sheet ID. |
| `cell_id` | integer | The cell ID. |

Request body

| Field | Type | Required | Description |
|---|---|---|---|
| `generation_type` | string | Yes | `"captions"`. |
| `data.model` | string | Yes | `"gemini"`. |
| `data.source_resource_id` | string | Yes | ID of a previously uploaded audio or video content resource. |

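As a rough illustration of how the fields above fit together, the following Python sketch assembles the request without sending it. The function name `build_captions_request` and the `BASE_URL` constant are hypothetical; the host and the `X-API-Key` header are copied from the curl example on this page, and the client-side check simply anticipates the documented `validation_error`.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Host as it appears in the curl example on this page.
BASE_URL = "https://api.gen.pro"


def build_captions_request(sheet_id, cell_id, agent_id, source_resource_id, api_key):
    """Build (but do not send) the POST request for a captions generation.

    Hypothetical helper: only the path, query parameter, headers, and body
    fields come from this page; everything else is illustrative.
    """
    if not source_resource_id:
        # The API reports a validation_error when no resource is provided,
        # so fail early on the client side.
        raise ValueError("source_resource_id is required")

    url = (
        f"{BASE_URL}/v1/autocontentengine/{int(sheet_id)}"
        f"/cells/{int(cell_id)}/generate?agent_id={quote(str(agent_id))}"
    )
    body = {
        "generation_type": "captions",
        "data": {
            "model": "gemini",
            "source_resource_id": source_resource_id,
        },
    }
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call; keeping construction separate from sending makes the payload easy to inspect or unit test.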
Example

    curl -X POST "https://api.gen.pro/v1/autocontentengine/101/cells/3000/generate?agent_id=42" \
      -H "X-API-Key: your-api-key" \
      -H "Content-Type: application/json" \
      -d '{
        "generation_type": "captions",
        "data": {
          "model": "gemini",
          "source_resource_id": "res_abc123"
        }
      }'

You can also generate captions on a layer:

`POST /v1/autocontentengine/{id}/cells/{cell_id}/layers/{layer_id}/generate?agent_id={agent_id}`

Response

Poll the Check generation status endpoint until the generation is complete. The finished generation includes the `captions` array, with per-word start and end timestamps, in the `result` field.

    {
      "id": 9003,
      "status": "completed",
      "result": {
        "captions": [
          {"word": "Welcome", "start": 0.0, "end": 0.45},
          {"word": "back", "start": 0.45, "end": 0.72},
          {"word": "to", "start": 0.72, "end": 0.85},
          {"word": "another", "start": 0.85, "end": 1.20},
          {"word": "episode", "start": 1.20, "end": 1.65}
        ]
      }
    }

Errors

| Status | Error code | Description |
|---|---|---|
| 422 | `validation_error` | No audio or video resource provided. |
| 404 | `not_found` | Sheet or cell not found. |

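One way a client might surface the two documented errors is to map each status and error code pair to its own exception type. The sketch below is only illustrative: the class names and the `error_for` helper are hypothetical, and because the shape of the error response body is not described here, the helper takes the status and error code as already-extracted values.

```python
class CaptionsApiError(Exception):
    """Generic failure for a captions generation request (hypothetical)."""

    def __init__(self, status, code, detail=""):
        super().__init__(f"{status} {code}: {detail}".rstrip(": "))
        self.status = status
        self.code = code


class MissingSourceError(CaptionsApiError):
    """422 validation_error: no audio or video resource provided."""


class NotFoundError(CaptionsApiError):
    """404 not_found: sheet or cell not found."""


# Only the two pairs documented in the table above are mapped.
_KNOWN_ERRORS = {
    (422, "validation_error"): MissingSourceError,
    (404, "not_found"): NotFoundError,
}


def error_for(status, error_code, detail=""):
    """Return an exception instance for a failed call, or None for 2xx."""
    if 200 <= status < 300:
        return None
    # Fall back to the generic class for anything not listed on this page.
    cls = _KNOWN_ERRORS.get((status, error_code), CaptionsApiError)
    return cls(status, error_code, detail)
```

A caller could then write `err = error_for(status, code)` and `raise err` when it is not `None`, catching `MissingSourceError` separately to prompt for an upload.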
Cell configuration

| Field | Type | Description |
|---|---|---|
| `autoGenerate` | boolean | Auto-trigger generation when source audio/video changes. |

Caption source and styling are passed in the `data` object of the generation request.
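The word-level result shown in the Response section lends itself to subtitle files. The sketch below groups words into short cues and renders them as SubRip (SRT) text. It assumes that `start` and `end` are expressed in seconds, which this page does not state explicitly; the function names and the `max_words` grouping rule are illustrative choices, not part of the API.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def group_words(captions, max_words=3):
    """Group consecutive word entries into cues of at most max_words words."""
    cues = []
    for i in range(0, len(captions), max_words):
        chunk = captions[i:i + max_words]
        cues.append({
            "text": " ".join(entry["word"] for entry in chunk),
            "start": chunk[0]["start"],  # cue starts with its first word
            "end": chunk[-1]["end"],     # and ends with its last word
        })
    return cues


def to_srt(captions, max_words=3):
    """Render word-level captions as SRT text (times assumed to be seconds)."""
    blocks = []
    for number, cue in enumerate(group_words(captions, max_words), start=1):
        blocks.append(
            f"{number}\n"
            f"{format_timestamp(cue['start'])} --> {format_timestamp(cue['end'])}\n"
            f"{cue['text']}\n"
        )
    # Each block already ends with a newline, so joining with another
    # newline leaves the blank separator line that SRT expects.
    return "\n".join(blocks)
```

Called with `result["captions"]` from a completed generation, `to_srt` returns a string that can be written to a `.srt` file. Grouping by a fixed word count is the simplest rule; splitting on pauses between words or on punctuation would be a natural refinement.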