Audio nodes generate sound from a prompt — speech, music, or sound effects — and can also transcribe speech or modify existing voice clips. All audio models are served by ElevenLabs.
Speech (text-to-speech)
- ElevenLabs v3 — highest-quality, most expressive voice model.
- ElevenLabs Multilingual v2 — broad language coverage.
- ElevenLabs Turbo v2.5 — fastest, lowest cost.
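When a workflow picks a speech model in code, the trade-off listed above can be kept as a small lookup table. This is a minimal Python sketch that assumes you route by a single priority. The priority labels and the function pick_speech_model are hypothetical, and the values are the display names from this page, not API variant IDs. Use model_list to resolve the exact IDs.

```python
# Hypothetical priority labels mapped to the speech model display names on this page.
# These are NOT variant IDs; resolve exact IDs with model_list.
SPEECH_MODEL_BY_PRIORITY = {
    "quality": "ElevenLabs v3",                  # highest quality, most expressive
    "languages": "ElevenLabs Multilingual v2",   # broad language coverage
    "speed": "ElevenLabs Turbo v2.5",            # fastest and lowest cost
}


def pick_speech_model(priority: str) -> str:
    """Return the speech model display name for 'quality', 'languages', or 'speed'."""
    try:
        return SPEECH_MODEL_BY_PRIORITY[priority]
    except KeyError:
        options = ", ".join(sorted(SPEECH_MODEL_BY_PRIORITY))
        raise ValueError(f"unknown priority {priority!r}; expected one of: {options}") from None


print(pick_speech_model("speed"))
```

A table keeps the choice explicit and easy to update if the catalog changes. An unknown priority raises an error instead of silently falling back to a default model.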
Sound effects
- ElevenLabs Sound Effects — short non-musical clips (1–22 seconds) from a text prompt. Use for foley, ambience, and quick sound cues.
Music
- ElevenLabs Music — instrumental tracks (10–300 seconds) from a text prompt.
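The sound effects and music ranges above are hard bounds, so it can help to check a requested duration before configuring a node. This is a minimal Python sketch that assumes you validate on the client side. The labels "sound_effects" and "music" and the function check_duration are hypothetical, and the limits are copied from this page rather than read from the service.

```python
# Duration limits in seconds, copied from the ranges documented on this page.
# The keys are illustrative labels, not real model variant IDs.
DURATION_LIMITS = {
    "sound_effects": (1.0, 22.0),   # ElevenLabs Sound Effects
    "music": (10.0, 300.0),         # ElevenLabs Music
}


def check_duration(kind: str, seconds: float) -> float:
    """Return seconds unchanged if it lies in the documented range; otherwise raise ValueError."""
    try:
        low, high = DURATION_LIMITS[kind]
    except KeyError:
        raise ValueError(f"unknown audio kind: {kind!r}") from None
    if not low <= seconds <= high:
        raise ValueError(
            f"{kind} duration must be between {low:g} and {high:g} seconds, got {seconds:g}"
        )
    return seconds


print(check_duration("music", 120))
```

For example, a 30-second sound effect fails the check, while a 30-second music track passes. Rejecting out-of-range values, rather than silently clamping them, makes a mismatched request visible to whoever built the node.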
Transcription (speech-to-text)
- ElevenLabs Scribe v2 — transcribes spoken audio into text.
Voice change (audio-to-audio)
- ElevenLabs Voice Changer — re-voices an existing audio clip with a different speaker while preserving timing and inflection.
Use model_list for the live catalog and exact variant IDs.

Last modified on May 19, 2026