
Voice

The Voice API lives alongside Agent Core under /v1/agents/{agent_id}/voice/*. It unifies voice library listing, prompt-based voice design, audio sample cloning, TTS preview, and voice deletion.

GET /v1/agents/{agent_id}/voice/library

Query params:

  • source (optional) — filter by origin: public, user_designed, user_trained, or user_elevenlabs

Response 200:

{
"voices": [
{ "voice_id": "21m00...", "name": "Rachel", "source": "public", "preview_url": null },
{ "voice_id": "huv-1", "name": "Custom", "source": "user_designed", "preview_url": null },
{ "voice_id": "huv-2", "name": "Santiago", "source": "user_trained", "preview_url": null },
{ "voice_id": "el-abc", "name": "My Voice", "source": "user_elevenlabs", "preview_url": null }
],
"total": 4
}

The user_elevenlabs source only appears when the agent has a connected ElevenLabs API key (see below).
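
As a rough illustration of the source filter, the sketch below builds the library URL and groups a response by source. The base URL passed in and the helper names library_url and group_by_source are hypothetical; only the path, the source values, and the response fields come from this reference.

```python
from urllib.parse import urlencode

# Source values listed in this reference.
VALID_SOURCES = {"public", "user_designed", "user_trained", "user_elevenlabs"}


def library_url(base_url, agent_id, source=None):
    """Build the voice library URL, optionally filtered by source.

    base_url is an assumption: this reference does not state the API host.
    """
    url = f"{base_url.rstrip('/')}/v1/agents/{agent_id}/voice/library"
    if source is None:
        return url
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    return f"{url}?{urlencode({'source': source})}"


def group_by_source(response):
    """Group voice names by the source field of a library response."""
    grouped = {}
    for voice in response["voices"]:
        grouped.setdefault(voice["source"], []).append(voice["name"])
    return grouped
```

Validating source on the client is optional; it simply catches typos before a request is sent.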

Users can connect their own ElevenLabs API key to unlock their personal voice library and draw on their own usage quota. The key is validated against the ElevenLabs /v1/user endpoint before it is saved on the agent record.

GET /v1/agents/{agent_id}/voice/integrations/elevenlabs

Response 200:

{ "connected": true, "masked_key": "***xyz1" }

POST /v1/agents/{agent_id}/voice/integrations/elevenlabs

Body: {"api_key": "sk_..."}

Response 200: {"connected": true, "user": { "subscription": { "tier": "creator" } }} on success. Returns 400 with invalid_key if ElevenLabs rejects the key.

POST /v1/agents/{agent_id}/voice/integrations/elevenlabs/test

Body: {"api_key": "sk_..."}

Response 200: {"valid": true | false}

DELETE /v1/agents/{agent_id}/voice/integrations/elevenlabs

Returns 204 No Content.
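
The four integration endpoints share one path and differ only in method and body, which the following sketch tries to make concrete. It builds requests without sending them. The function name elevenlabs_request, the action labels, the base URL, and the JSON Content-Type header are assumptions; the paths, the api_key body field, and the X-API-Key header come from this reference.

```python
import json
from urllib.request import Request

# Hypothetical action labels mapped to the HTTP methods documented here.
METHODS = {"status": "GET", "connect": "POST", "test": "POST", "disconnect": "DELETE"}


def elevenlabs_request(base_url, agent_id, api_key, action, elevenlabs_key=None):
    """Build (but do not send) a request for an ElevenLabs integration endpoint."""
    if action not in METHODS:
        raise ValueError(f"unknown action: {action!r}")
    url = f"{base_url.rstrip('/')}/v1/agents/{agent_id}/voice/integrations/elevenlabs"
    if action == "test":
        url += "/test"
    headers = {"X-API-Key": api_key}
    data = None
    if action in ("connect", "test"):
        if not elevenlabs_key:
            raise ValueError("connect and test need an ElevenLabs key")
        data = json.dumps({"api_key": elevenlabs_key}).encode("utf-8")
        # Assumption: bodies are sent as JSON.
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=METHODS[action])
```

A caller could pass the result to urllib.request.urlopen, for example testing a key first and connecting only when the response reports valid as true.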

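Build a voice from a text description in four steps: generate a script, generate a description, generate samples, then save the chosen sample. Steps 1–3 are cheap (LLM calls only); step 4 persists the voice.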

POST /v1/agents/{agent_id}/voice/design/generate-script

Body: {"language": "en"} (optional)

Response 200: {"voice_sample": "Hi there, I'm your voice, and I'm going to tell you about..."}

POST /v1/agents/{agent_id}/voice/design/generate-description

Body: {"gender": "male", "voice_description": "warm and confident", "language": "en", "script": "..."}

  • gender is required
  • voice_description, language, script are optional hints

Response 200: {"voice_description": "A warm, confident male voice with a measured pace..."}

POST /v1/agents/{agent_id}/voice/design/generate-samples

Body: {"text": "...", "description": "..."}

  • text is required (typically the script from step 1)

Response 200: {"samples": [{"generation_id": "g1", "audio": "base64..."}, {"generation_id": "g2", ...}, {"generation_id": "g3", ...}]}

Pick one of the three generation_id values and pass it to step 4. Treat it as opaque.

POST /v1/agents/{agent_id}/voice/design

Body:

{
"generation_id": "g2",
"name": "Santiago",
"gender": "male",
"language": "en",
"description": "warm, confident"
}

Response 201: The created voice resource. Now visible in the library under source: user_designed.
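
The four design steps can be chained roughly as follows. This is a sketch under stated assumptions: post is a hypothetical callable that takes a path and a JSON body and returns the decoded response, pick stands in for a user listening to the samples and choosing one, and passing the generated description to the final step is a guess rather than documented behavior.

```python
def design_voice(post, agent_id, name, gender, hint=None, language="en", pick=0):
    """Run the script, description, samples, and save steps in order.

    post(path, body) -> dict is a hypothetical transport supplied by the caller.
    """
    base = f"/v1/agents/{agent_id}/voice/design"

    # Step 1: get a script for the voice to read.
    script = post(f"{base}/generate-script", {"language": language})["voice_sample"]

    # Step 2: expand the optional hint into a full description (gender is required).
    body = {"gender": gender, "language": language, "script": script}
    if hint:
        body["voice_description"] = hint
    description = post(f"{base}/generate-description", body)["voice_description"]

    # Step 3: generate candidate samples; each carries an opaque generation_id.
    samples = post(
        f"{base}/generate-samples", {"text": script, "description": description}
    )["samples"]
    generation_id = samples[pick]["generation_id"]

    # Step 4: persist the chosen sample as a voice.
    return post(base, {
        "generation_id": generation_id,
        "name": name,
        "gender": gender,
        "language": language,
        "description": description,  # assumption: reuse the generated description
    })
```

In a real client the samples would be decoded and played back before choosing, so steps 3 and 4 would usually be separate user actions.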

Clone a voice from an existing audio sample. Returns the created voice immediately — no polling.

POST /v1/agents/{agent_id}/voice/clone

Body: Provide exactly one of audio_url or audio_base64.

{
"name": "Santiago Clone",
"audio_url": "https://cdn.example.com/sample.mp3",
"gender": "male",
"language": "en",
"description": "A warm, natural speaking voice"
}

Or with inline bytes for small clips:

{
"name": "Santiago Clone",
"audio_base64": "SUQzBAAAAAAA...",
"gender": "male"
}

audio_url is preferred for anything larger than a few seconds — the server downloads the file directly, which is faster and avoids base64 overhead.

Response 201: The created voice resource. Visible in the library under source: user_trained.
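
Because the endpoint rejects requests that carry both audio fields, it can help to enforce the rule before sending. The sketch below builds a clone body from either a URL or raw bytes. The helper name clone_payload and its audio_bytes parameter are hypothetical, and treating gender, language, and description as optional is an assumption based on the second example above.

```python
import base64


def clone_payload(name, audio_url=None, audio_bytes=None,
                  gender=None, language=None, description=None):
    """Build a clone request body with exactly one audio source."""
    if not name:
        raise ValueError("name is required")
    if (audio_url is None) == (audio_bytes is None):
        raise ValueError("provide exactly one of audio_url or audio_bytes")

    payload = {"name": name}
    if audio_url is not None:
        payload["audio_url"] = audio_url
    else:
        # Inline clips are sent base64-encoded in audio_base64.
        payload["audio_base64"] = base64.b64encode(audio_bytes).decode("ascii")

    # Only include optional fields the caller actually set.
    for key, value in (("gender", gender), ("language", language),
                       ("description", description)):
        if value is not None:
            payload[key] = value
    return payload
```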

DELETE /v1/agents/{agent_id}/voice/{voice_id}

Returns 204 No Content. Only works on voices the agent owns (user_designed, user_trained, user_elevenlabs). Public voices cannot be deleted.
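
One way to avoid a rejected delete is to filter a library response down to agent-owned voices first. The helper names below are hypothetical; the owned source values and the delete path are the ones given in this reference.

```python
# Sources this reference lists as owned by the agent and therefore deletable.
OWNED_SOURCES = {"user_designed", "user_trained", "user_elevenlabs"}


def deletable_voice_ids(library_response):
    """Return the voice_id of every voice that is not public."""
    return [
        voice["voice_id"]
        for voice in library_response["voices"]
        if voice["source"] in OWNED_SOURCES
    ]


def delete_path(agent_id, voice_id):
    """Path for the DELETE request that removes one voice."""
    return f"/v1/agents/{agent_id}/voice/{voice_id}"
```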

Audition a voice saying specific text before assigning it via PATCH /core.

POST /v1/agents/{agent_id}/voice/{voice_id}/preview

Body: {"text": "Hello world"}

Response 202 Accepted: {"user_job_id": 138860}

This enqueues a UserJob on the GEN backend. The job charges credits and writes output audio as a ContentResource when complete.

GET /v1/agents/{agent_id}/voice/preview/{job_id}

Response 200: The full user_job record. Check the status field:

{
"id": 138860,
"status": "completed",
"output_resources": [
{ "id": 99, "type": "audio", "file_url": "https://cdn.gen.pro/.../preview.mp3" }
]
}

Possible statuses: pending, processing, completed, failed.
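
Since preview is asynchronous, a client has to poll the job until it reaches a terminal status. A minimal polling loop might look like the sketch below. fetch_job is a hypothetical callable that performs the GET above and returns the decoded job record; the interval and timeout defaults are arbitrary choices, not documented limits.

```python
import time


def wait_for_preview(fetch_job, job_id, interval=1.0, timeout=60.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll a preview job and return the audio file URLs once it completes.

    fetch_job(job_id) -> dict is supplied by the caller (hypothetical).
    Raises RuntimeError on a failed job and TimeoutError if it never finishes.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        status = job["status"]
        if status == "completed":
            return [
                resource["file_url"]
                for resource in job.get("output_resources", [])
                if resource.get("type") == "audio"
            ]
        if status == "failed":
            raise RuntimeError(f"preview job {job_id} failed")
        # pending or processing: wait and try again until the deadline passes.
        if clock() >= deadline:
            raise TimeoutError(f"preview job {job_id} still {status}")
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real delays.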

Code  Meaning
400   Missing required field (e.g. text on preview, name on clone, gender on generate-description)
400   invalid_key on ElevenLabs connect: the key was rejected by ElevenLabs
400   Both audio_url and audio_base64 provided on clone (supply exactly one)
401   Missing or invalid X-API-Key
404   Voice not found (on delete or preview)

See the Agent Core reference for identity, overview, personality, and other non-voice sections.