Captions

The Captions card transcribes audio or video into timestamped captions using Gemini. The output includes word-level timing data that can be used for subtitle overlays in the final video.

POST /v1/autocontentengine/{id}/cells/{cell_id}/generate?agent_id={agent_id}

| Parameter | Type | Description |
|---|---|---|
| id | integer | The sheet ID. |
| cell_id | integer | The cell ID. |

| Field | Type | Required | Description |
|---|---|---|---|
| generation_type | string | Yes | "captions". |
| data.model | string | Yes | "gemini". |
| data.source_resource_id | string | Yes | ID of a previously uploaded audio or video content resource. |

curl -X POST "https://api.gen.pro/v1/autocontentengine/101/cells/3000/generate?agent_id=42" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "generation_type": "captions",
    "data": {
      "model": "gemini",
      "source_resource_id": "res_abc123"
    }
  }'
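
The same request can be assembled from Python. This is a minimal sketch using only the standard library, not an official client: the function name `build_captions_request` is hypothetical, and the base URL, headers, and body fields are copied from the curl example above. The request is built but not sent, so the caller decides how to send it and handle the response.

```python
import json
import urllib.request

BASE_URL = "https://api.gen.pro"  # taken from the curl example above


def build_captions_request(sheet_id, cell_id, agent_id, source_resource_id, api_key):
    """Build (but do not send) a captions generation request for a cell."""
    url = (
        f"{BASE_URL}/v1/autocontentengine/{sheet_id}"
        f"/cells/{cell_id}/generate?agent_id={agent_id}"
    )
    body = {
        "generation_type": "captions",
        "data": {"model": "gemini", "source_resource_id": source_resource_id},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_captions_request(101, 3000, 42, "res_abc123", "your-api-key")
# Sending is left to the caller, for example: urllib.request.urlopen(req)
```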

You can also generate captions on a layer:

POST /v1/autocontentengine/{id}/cells/{cell_id}/layers/{layer_id}/generate?agent_id={agent_id}
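
The cell and layer endpoints differ only in the `/layers/{layer_id}` path segment, so one helper can produce both. The sketch below assumes nothing beyond the two paths shown above; `generate_path` is a hypothetical name, and the layer ID `7` in the usage lines is made up for illustration.

```python
def generate_path(sheet_id, cell_id, agent_id, layer_id=None):
    """Return the generate path for a cell, or for a layer when layer_id is given."""
    path = f"/v1/autocontentengine/{sheet_id}/cells/{cell_id}"
    if layer_id is not None:
        path += f"/layers/{layer_id}"
    return f"{path}/generate?agent_id={agent_id}"


cell_path = generate_path(101, 3000, 42)
layer_path = generate_path(101, 3000, 42, layer_id=7)  # 7 is an illustrative layer ID
```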

Poll the Check generation status endpoint until the generation completes. The finished generation includes the captions array, with word-level timestamps, in the result field.
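
A polling loop might look like the sketch below. Because the status endpoint is documented elsewhere, the loop takes a caller-supplied `fetch_status` function (hypothetical) that returns the generation as a dict. Only the `"completed"` status comes from this page; the `"failed"` and `"error"` statuses, the interval, and the timeout are assumptions to adjust against the real API.

```python
import time


def wait_for_generation(fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_status() until it reports "completed" or the timeout passes.

    fetch_status is a caller-supplied function returning the generation as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        generation = fetch_status()
        status = generation.get("status")
        if status == "completed":
            return generation
        # Assumed failure statuses; they are not listed on this page.
        if status in ("failed", "error"):
            raise RuntimeError(f"generation ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("generation did not complete in time")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.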

{
  "id": 9003,
  "status": "completed",
  "result": {
    "captions": [
      {"word": "Welcome", "start": 0.0, "end": 0.45},
      {"word": "back", "start": 0.45, "end": 0.72},
      {"word": "to", "start": 0.72, "end": 0.85},
      {"word": "another", "start": 0.85, "end": 1.20},
      {"word": "episode", "start": 1.20, "end": 1.65}
    ]
  }
}
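
To illustrate how the word-level timings can drive a subtitle overlay, the sketch below groups the words from the response above into SubRip (SRT) cues. The grouping rule (a fixed number of words per cue) and the names `srt_timestamp` and `captions_to_srt` are assumptions made for the example, not part of the API. It also assumes `start` and `end` are in seconds, as the sample values suggest.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def captions_to_srt(captions, words_per_cue=3):
    """Group word-level captions into numbered SRT cues of up to words_per_cue words."""
    cues = []
    for i in range(0, len(captions), words_per_cue):
        chunk = captions[i:i + words_per_cue]
        text = " ".join(item["word"] for item in chunk)
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}")
    if not cues:
        return ""
    return "\n\n".join(cues) + "\n"


captions = [
    {"word": "Welcome", "start": 0.0, "end": 0.45},
    {"word": "back", "start": 0.45, "end": 0.72},
    {"word": "to", "start": 0.72, "end": 0.85},
    {"word": "another", "start": 0.85, "end": 1.20},
    {"word": "episode", "start": 1.20, "end": 1.65},
]
srt_text = captions_to_srt(captions)
```

With the sample data this yields two cues: "Welcome back to" from 00:00:00,000 to 00:00:00,850, and "another episode" from 00:00:00,850 to 00:00:01,650.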

| Status | Error code | Description |
|---|---|---|
| 422 | validation_error | No audio or video resource provided. |
| 404 | not_found | Sheet or cell not found. |
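
A client can turn these statuses into exceptions. The sketch below is one possible approach: the `CaptionsError` class and `raise_for_captions_error` helper are hypothetical, the two known entries are copied from the error table, and the fallback for other statuses is an assumption because the error response body is not described here.

```python
# Status codes, error codes, and descriptions copied from the error table.
KNOWN_ERRORS = {
    422: ("validation_error", "No audio or video resource provided."),
    404: ("not_found", "Sheet or cell not found."),
}


class CaptionsError(Exception):
    """Raised when a captions request returns an error status."""

    def __init__(self, status_code, code, description):
        super().__init__(f"{status_code} {code}: {description}")
        self.status_code = status_code
        self.code = code
        self.description = description


def raise_for_captions_error(status_code):
    """Raise CaptionsError for HTTP statuses of 400 and above; otherwise return None."""
    if status_code < 400:
        return None
    # Fallback values for undocumented statuses are placeholders, not API output.
    code, description = KNOWN_ERRORS.get(status_code, ("unknown", "Undocumented error."))
    raise CaptionsError(status_code, code, description)
```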

| Field | Type | Description |
|---|---|---|
| autoGenerate | boolean | Auto-trigger generation when source audio/video changes. |

Caption source and styling are passed in the data field of the generation request.