In general, you can put any text into LMNT's speech models and get something usable out. But if you spend a little time tailoring your text prompt to your use case, your generated speech can go to the next level.
This living reference covers the elements you should think about and control for when crafting your text prompts.
Find some examples of speech online that give the feeling you're looking for, and use them as a reference as you walk through this guide.
Text prompting basics
Punctuation
Use punctuation where you want to explicitly direct pacing. Imagine you're writing a script you want the model to perform.
Paragraph breaks
Use paragraph breaks to indicate larger, paragraph-level pauses. The model will pause appropriately, usually a little bit longer than it would between sentences.
Prompting for speaking style
Spontaneous vs read speech
These days, when people say generated speech feels robotic, they're usually not talking about the acoustic quality or even the vocal quality.
They're usually saying that the generated speech feels out of context. The biggest source of that wrong-context feeling is mixing up spontaneous speech and read speech.
Read speech
Read speech is what you hear in audiobooks, scripted ads, and news broadcasts: someone reading words off a page. Because the speaker can look ahead and knows where they're going, the pacing is even, the intonation is predictable, and there are no ums and uhs. It sounds polished and performed.
Spontaneous speech
Spontaneous speech is what you hear in podcasts, interviews, and ordinary conversation. Pacing is uneven, intonation more dynamic, and the signs of thinking-on-the-fly show up: ums, restarts, breaths, and hesitations.
Prompting for spontaneous speech
Use contractions and casual language
Use words like "don't" instead of "do not", or "I'll" instead of "I will".
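If your text comes from a template or another system that writes formally, a small preprocessing pass can swap in contractions before the text reaches the speech model. This is a hypothetical sketch: `contract` and the `CONTRACTIONS` table are illustrative names, not part of any LMNT API, and the table is deliberately short.

```python
import re

# Hypothetical, deliberately small table. Negatives come first so that
# "I will not" becomes "I won't" rather than "I'll not".
CONTRACTIONS = {
    "do not": "don't",
    "does not": "doesn't",
    "did not": "didn't",
    "will not": "won't",
    "cannot": "can't",
    "is not": "isn't",
    "I will": "I'll",
    "I am": "I'm",
}


def contract(text: str) -> str:
    """Replace formal phrases with contractions, keeping a leading capital."""
    for formal, casual in CONTRACTIONS.items():
        pattern = re.compile(r"\b" + re.escape(formal) + r"\b", re.IGNORECASE)

        def repl(match: re.Match, casual: str = casual) -> str:
            # "Do not" -> "Don't"; "do not" -> "don't".
            if match.group(0)[0].isupper():
                return casual[0].upper() + casual[1:]
            return casual

        text = pattern.sub(repl, text)
    return text


print(contract("I will not lie. Do not worry, I am here."))
# -> I won't lie. Don't worry, I'm here.
```

Blind substitution can misfire where a contraction is ungrammatical, for example a sentence-final "I will" ("Yes, I will." becomes "Yes, I'll."), so review the output or limit the table to negatives like "do not", which contract safely anywhere.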
Use filler words
When people need time to think of the next thing to say, they give themselves more time by adding filler words.
Add filler words like "um", "uh", "well", "you know", and "I mean" to your prompts.
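When responses are generated programmatically, one option is to sprinkle fillers in at clause boundaries during post-processing. The sketch below is an assumption, not an LMNT feature: the hypothetical `add_fillers` helper turns some ", " boundaries into ", um, " style interjections, with `rate` controlling how often and `seed` making the choice reproducible.

```python
import random
import re
from typing import Optional

# Hypothetical filler list; tune it to the speaker you want.
FILLERS = ["um", "uh", "you know", "I mean"]


def add_fillers(text: str, rate: float = 0.5, seed: Optional[int] = None) -> str:
    """Insert a filler word at some comma boundaries."""
    rng = random.Random(seed)

    def maybe_insert(match: re.Match) -> str:
        # With probability `rate`, turn ", " into ", um, " (or another filler).
        if rng.random() < rate:
            return f", {rng.choice(FILLERS)}, "
        return match.group(0)

    return re.sub(r",\s+", maybe_insert, text)


print(add_fillers("So, the thing is, it depends.", rate=1.0, seed=0))
```

Inserting only at existing commas avoids having to re-capitalize sentence starts. Listen to the results, though: too many fillers sound just as unnatural as none.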
Signal thinking pauses and hesitations
Use ellipses (...) and other punctuation to interrupt the flow.
Use natural transitions
When something requires a mental context switch, add filler sentences in addition to filler words.
For example: "So, um... the thing about..." or "Well, actually, that's a great question."
Keep text short & light
A conversation is a back-and-forth. People generally don't monologue at each other, so keep each turn brief.
Written form vs spoken form escape hatch
Language is written differently than it's spoken. For example, "$1" is actually spoken as "one dollar".
In general, the model does a pretty good job of translating written form to spoken form. But if you're running into trouble, try converting your text into a more explicit spoken form.
Phone numbers
"1-800-555-1234" is spoken more like "one eight hundred; five five five; one two three four".
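If you handle many phone numbers, the conversion can be automated. This is a minimal sketch assuming North American style numbers; `phone_to_spoken` and `speak_group` are hypothetical helpers. It reads "0" as "zero" (many speakers say "oh" instead), reads groups like "800" as "eight hundred", and joins a leading "1" to the next group, reproducing the spoken form above.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def speak_group(group: str) -> str:
    """Read one digit group aloud."""
    # Groups like "800" are usually said as "eight hundred".
    if len(group) == 3 and group.endswith("00") and group[0] != "0":
        return f"{DIGIT_WORDS[int(group[0])]} hundred"
    # Everything else is read digit by digit ("555" -> "five five five").
    return " ".join(DIGIT_WORDS[int(d)] for d in group)


def phone_to_spoken(number: str) -> str:
    """Convert a phone number string to an explicit spoken form."""
    groups = re.findall(r"\d+", number)
    spoken = [speak_group(g) for g in groups]
    # A leading "1" is read together with the next group ("one eight hundred").
    if len(groups) > 1 and groups[0] == "1":
        spoken = [f"{spoken[0]} {spoken[1]}"] + spoken[2:]
    # Semicolons mark the pauses between groups.
    return "; ".join(spoken)


print(phone_to_spoken("1-800-555-1234"))
# -> one eight hundred; five five five; one two three four
```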
Email addresses
"alice@example.com" is spoken more like "alice at example dot com".
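Email addresses can be handled the same way. In this minimal sketch, `email_to_spoken` is a hypothetical helper. The readings for "@" and "." follow the example above, while "underscore" and "dash" are assumptions you may want to change.

```python
import re

# Hypothetical symbol table; "underscore" and "dash" are assumed readings.
SPOKEN_SYMBOLS = {"@": "at", ".": "dot", "_": "underscore", "-": "dash"}


def email_to_spoken(address: str) -> str:
    """Spell out the symbols in an email address as words."""
    # The capturing group keeps the symbols in the split result:
    # "alice@example.com" -> ["alice", "@", "example", ".", "com"]
    parts = re.split(r"([@._-])", address.strip())
    # Map symbols to words and drop empty strings left by adjacent symbols.
    return " ".join(SPOKEN_SYMBOLS.get(part, part) for part in parts if part)


print(email_to_spoken("alice@example.com"))  # -> alice at example dot com
```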