Audio models - Melius

Audio nodes generate sound from a prompt — speech, music, or sound effects — and can also transcribe speech or modify existing voice clips.

Speech (text-to-speech)

ElevenLabs v3 — highest-quality, most expressive voice model.
ElevenLabs Multilingual v2 — broad language coverage.
ElevenLabs Turbo v2.5 — fastest, lowest cost.
Seed Audio 1.0 — speech generation from ByteDance.

Sound effects

ElevenLabs Sound Effects — short non-musical clips (1–22 seconds) from a text prompt. Use for foley, ambience, and quick sound cues.

Music

ElevenLabs Music — instrumental tracks (10–300 seconds) from a text prompt.
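
The duration ranges quoted for sound effects (1–22 seconds) and music (10–300 seconds) can be checked before a request is sent. The following is a minimal Python sketch, not part of Melius: the `DURATION_LIMITS` table and the `clamp_duration` helper are hypothetical names, and the dictionary keys are illustrative labels rather than real variant IDs.

```python
# Duration limits in seconds, copied from the ranges listed on this page.
# Keys are illustrative labels (assumption), not Melius variant IDs.
DURATION_LIMITS = {
    "sound_effects": (1, 22),
    "music": (10, 300),
}


def clamp_duration(kind: str, seconds: float) -> float:
    """Return `seconds` clamped into the listed range for `kind`.

    Raises ValueError when no range is listed for `kind`.
    """
    if kind not in DURATION_LIMITS:
        raise ValueError(f"no listed duration range for {kind!r}")
    low, high = DURATION_LIMITS[kind]
    return max(low, min(high, seconds))
```

For example, `clamp_duration("sound_effects", 30)` falls back to the 22-second ceiling, while a 120-second music request passes through unchanged. Clamping is only one option; rejecting out-of-range values outright may suit workflows where a silently shortened clip would be a surprise.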

Transcription (speech-to-text)

ElevenLabs Scribe v2 — transcribes spoken audio into text.

Voice change (audio-to-audio)

ElevenLabs Voice Changer — re-voices an existing audio clip with a different speaker while preserving timing and inflection.
ElevenLabs Voice Isolator — separates speech from background noise in an existing audio clip.

Use model_list for the live catalog and exact variant IDs.

Last modified on July 8, 2026
