Prompt engineering

Text prompting

Comprehensive guide to text prompt engineering for LMNT's latest models

In general, you can put any text into LMNT's speech models and get something usable out. But if you spend a little time tailoring your text prompt to your use case, the generated speech improves noticeably.

This living reference covers the elements you should think about and control for when crafting your text prompts.

Find some examples of speech online that give the feeling you're looking for, and use them as a reference as you walk through this guide.


Text prompting basics

Punctuation

Use punctuation where you want to explicitly direct pacing. Imagine you're writing a script you want the model to perform.

Paragraph breaks

Use paragraph breaks to indicate larger, paragraph-level pauses. The model will pause appropriately, usually a little bit longer than it would between sentences.
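
If you assemble prompts in code, a small helper can keep paragraph breaks consistent. The sketch below is a minimal illustration, not part of any LMNT SDK: build_prompt is a hypothetical name, and it assumes a blank line (two newlines) is how a paragraph break is written in your prompt text.

```python
def build_prompt(paragraphs: list[str]) -> str:
    """Join paragraphs with a blank line between them.

    Assumption: a blank line ("\n\n") marks a paragraph break in the prompt.
    """
    # Collapse stray whitespace inside each paragraph and drop empty entries.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)


prompt = build_prompt([
    "Thanks for calling. I found your order.",
    "Now, about the refund. It should arrive in a few days.",
])
print(prompt)
```

Dropping empty entries avoids accidental runs of several blank lines in the prompt text.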


Prompting for speaking style

Spontaneous vs read speech

These days, when people say generated speech feels robotic, they're not usually talking about the acoustic quality or even the vocal quality.

They're usually saying that the generated speech feels out of context. The biggest wrong-context feeling comes from mixing up spontaneous speech vs read speech.

Read speech

Read speech is what you hear in audiobooks, scripted ads, and news broadcasts — someone reading words off a page. They can look ahead and know where they're going, so the pacing is even, the intonation is predictable, and there are no ums and uhs. It sounds polished and performed.

Spontaneous speech

Spontaneous speech is what you hear in podcasts, interviews, and ordinary conversation. Pacing is uneven, intonation more dynamic, and the signs of thinking-on-the-fly show up: ums, restarts, breaths, and hesitations.

Prompting for spontaneous speech

Use contractions and casual language

Use words like "don't" instead of "do not", or "I'll" instead of "I will".
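
If your source text is formal, you could apply contractions before sending it to the model. The following is a rough sketch under stated assumptions: the CONTRACTIONS table and the contract function are hypothetical names, and the mapping is deliberately tiny. Some phrases should not always be contracted (for example "it is" at the end of a sentence), so review the output rather than trusting a blanket replacement.

```python
import re

# Illustrative mapping only; extend it with care for your own scripts.
CONTRACTIONS = {
    "do not": "don't",
    "I will": "I'll",
    "cannot": "can't",
}


def contract(text: str) -> str:
    """Replace formal phrases with contractions, keeping a leading capital."""
    for formal, casual in CONTRACTIONS.items():
        # \b anchors the match to whole words; matching ignores case.
        pattern = re.compile(r"\b" + re.escape(formal) + r"\b", re.IGNORECASE)

        def repl(match: re.Match, casual: str = casual) -> str:
            if match.group(0)[0].isupper():
                return casual[0].upper() + casual[1:]
            return casual

        text = pattern.sub(repl, text)
    return text


print(contract("I will call you, but do not wait up."))
# I'll call you, but don't wait up.
```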

Use filler words

When people need time to think of the next thing to say, they give themselves more time by adding filler words.

Add filler words such as "um", "uh", "well", "you know", and "I mean" to your prompts.

Signal pauses to think and hesitations

Use ellipses (...), commas (,), and other punctuation to interrupt the flow.

Use natural transitions

When something requires a mental context switch, add filler sentences in addition to filler words.

For example, "So, um... the thing about..." or "Well, actually, that's a great question."

Keep text short & light

A conversation is a back and forth. People generally don't monologue at each other.


Written form vs spoken form escape hatch

Language is written differently than it's spoken. For example, "$1" is actually spoken as "one dollar".

In general, the model does a pretty good job at translating written form to spoken form. But if you're running into trouble, try converting your text into a more explicit spoken form.

Phone numbers

"1-800-555-1234" is spoken more like "one eight hundred; five five five; one two three four"
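
If you need to convert many phone numbers, the rewrite can be scripted. This is a minimal sketch, assuming numbers whose digit groups are separated by dashes, spaces, or other non-digits; speak_group and speak_phone_number are hypothetical helpers, and the rule that reads a code like 800 as "eight hundred" is a convention chosen to match the example above.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def speak_group(group: str) -> str:
    """Spell one digit group; read codes such as 800 as 'eight hundred'."""
    if len(group) == 3 and group.endswith("00") and group[0] != "0":
        return f"{DIGIT_WORDS[int(group[0])]} hundred"
    return " ".join(DIGIT_WORDS[int(d)] for d in group)


def speak_phone_number(number: str) -> str:
    """Turn a written phone number into an explicit spoken form."""
    groups = re.findall(r"[0-9]+", number)
    spoken = [speak_group(g) for g in groups]
    # Join a one-digit country code to the next group ("one eight hundred").
    if len(groups) > 1 and len(groups[0]) == 1:
        spoken[:2] = [f"{spoken[0]} {spoken[1]}"]
    # Semicolons separate the groups, as in the example above.
    return "; ".join(spoken)


print(speak_phone_number("1-800-555-1234"))
# one eight hundred; five five five; one two three four
```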

Email addresses

"alice@example.com" is spoken more like "alice at example dot com"
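
The same idea works for email addresses. The sketch below is one possible approach, not an official conversion: speak_email is a hypothetical helper, and the spoken words chosen for "_", "-", and "+" are assumptions you may want to change.

```python
# Spoken replacements for symbols; entries other than "@" and "." are assumptions.
SYMBOL_WORDS = {
    "@": " at ",
    ".": " dot ",
    "_": " underscore ",
    "-": " dash ",
    "+": " plus ",
}


def speak_email(address: str) -> str:
    """Turn a written email address into an explicit spoken form."""
    if address.count("@") != 1:
        raise ValueError(f"not a simple email address: {address!r}")
    spoken = "".join(SYMBOL_WORDS.get(ch, ch) for ch in address)
    # Normalize the spacing introduced by the replacements.
    return " ".join(spoken.split())


print(speak_email("alice@example.com"))
# alice at example dot com
```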

