Our model faithfully reproduces the original voice in each voice clone, emulating everything from tone, speed, inflection, and accent to breathing patterns, mouth clicks, and noises in the surrounding environment. Below, we describe our two versions of voice cloning (instant and professional) and provide some tips for optimizing input quality, which in turn shapes output quality.

Instant voice cloning lets you create a voice clone from a very short audio sample (≥15 seconds) and get a result immediately, making it a great option for people who want their voice clone running as soon as possible. Unlike competitors’ tools, our instant voice cloning tool does not match voice inputs to pre-recorded voices. Instead, it generates the clone by extracting and mimicking the style, intonation, and other characteristics of the voice. This means that speakers of any language, in any accent, are supported.

Professional voice cloning is the better option for people who want best-in-class voice clones. It requires ≥5 minutes of speech input and delivers high-quality output in just 30 minutes. We require orders of magnitude less audio data and produce professional voice clones in significantly less time than competitors while maintaining superior quality. Here are some tips specific to optimizing professional voice cloning quality:

  • Microphone position: Position yourself 6-12 inches away from the microphone
  • Equipment: Use premium recording equipment, such as an XLR microphone (e.g., Shure) connected to a high-quality audio interface (e.g., Focusrite). Pairing this setup with a pop filter can help eliminate unwanted sounds and ensure high-quality audio. That said, your computer’s built-in microphone will still work very well!
  • Acoustic environment: An acoustically treated room or padding of some sort (e.g., blankets) can help reduce echoes and background noise
  • Iteration: If you are unhappy with your clone’s voice, feel free to go back and record another sample of 5+ minutes to help calibrate the type of voice you’re trying to create. For example, if the clone sounds too high-pitched, speak with a deeper intonation in the next recording!
  • Duration: The longer the recording, the better!
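
The minimum input lengths mentioned above (≥15 seconds for instant cloning, ≥5 minutes for professional cloning) are easy to check before uploading. The following is a minimal sketch, assuming your recording is an uncompressed WAV file; the names `MIN_SECONDS`, `wav_duration_seconds`, and `meets_minimum` are illustrative and not part of our product.

```python
import wave

# Minimum input lengths taken from the text above, in seconds.
MIN_SECONDS = {"instant": 15.0, "professional": 5 * 60.0}


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds (assumes uncompressed WAV)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def meets_minimum(duration_s, mode):
    """Return True if a recording is long enough for the given cloning mode.

    `mode` is a hypothetical label: "instant" or "professional".
    """
    return duration_s >= MIN_SECONDS[mode]
```

For example, `meets_minimum(wav_duration_seconds("sample.wav"), "professional")` would tell you whether a hypothetical file `sample.wav` reaches the 5-minute mark. Meeting the minimum is only a floor: for professional cloning, longer recordings are better.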

Tips helpful for both instant and professional voice cloning:

  • Clear audio: Record clear audio with minimal background noise
  • Consistency: If using multiple recordings, ensure consistent audio quality, tone, and loudness across recordings (ideally between -23 dB and -18 dB RMS, with an absolute maximum of -3 dB)
  • Speech mannerisms: Speak at roughly the same speed that you’d like the voice clone to speak, with a wide array of emotions (especially those you want the voice clone to adopt), and in the same accent you want the AI to speak in
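
To compare loudness across recordings, you can measure each one against the targets in the consistency tip. The sketch below makes several assumptions that the text does not state: samples are floats normalized to the range -1.0 to 1.0, levels are measured in dB relative to full scale, and the -3 dB maximum is read as a peak limit. The function names are illustrative only.

```python
import math

# Targets from the consistency tip above.
RMS_RANGE_DB = (-23.0, -18.0)  # ideal RMS loudness window
PEAK_MAX_DB = -3.0             # assumed to be a peak limit


def to_db(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to decibels."""
    return 20.0 * math.log10(amplitude) if amplitude > 0 else float("-inf")


def rms_db(samples):
    """RMS level of normalized samples, in dB relative to full scale."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(s * s for s in samples) / len(samples)
    return to_db(math.sqrt(mean_square))


def peak_db(samples):
    """Highest absolute sample value, in dB relative to full scale."""
    if not samples:
        raise ValueError("no samples")
    return to_db(max(abs(s) for s in samples))


def levels_ok(samples):
    """True if RMS is inside the target window and the peak is under the limit."""
    low, high = RMS_RANGE_DB
    return low <= rms_db(samples) <= high and peak_db(samples) <= PEAK_MAX_DB
```

Running `rms_db` on each recording and comparing the results gives a quick view of how consistent your loudness is; recordings that fall outside the window can be re-recorded or adjusted in an audio editor before upload.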