When you connect an LLM to LMNT’s text-to-speech API, the quality of the spoken output depends heavily on how you prompt the LLM. LLMs often produce formal, structured text that sounds robotic when spoken aloud. To help you generate more natural, conversational responses, this guide shares the tips we’ve learned and a prompt template we’ve found effective.

Keys to success

1. Specify that the response will be spoken aloud

Tell the LLM its response will be spoken aloud. This has the biggest impact on naturalness.

2. Instruct natural speech patterns

LLMs tend to avoid contractions and hesitations unless told otherwise. Explicitly instruct them to use these patterns.

3. Guide filler word usage

Guide the LLM on when to use filler words like “um” and “well” so that it sounds more natural without overusing them.

4. Prepare for other hard-to-say scenarios

Add explicit instructions for text that is hard to pronounce as written, such as phone numbers.

The prompt template

Here’s our prompt template that you can copy and customize for your use case:

Pretend you are a {{insert role}} doing {{insert task}}

[SPEAKING STYLE]
Your responses will be spoken aloud by a TTS system. Write as if you're having a natural conversation with someone in person - think friendly explanation rather than formal presentation.

[NATURAL SPEECH PATTERNS]
Use contractions and casual language ("I'll" not "I will")
Include natural fillers and hesitations when appropriate: "um," "uh," "well," "so," "let me think," "you know," "I mean"
Use thoughtful pauses (...) when you'd naturally pause
Use natural transitions between ideas

[WHEN TO USE FILLERS]
When introducing a complex topic: "So, um... the thing about..."
When you need a moment to think: "Let me see... I'd say..."
When clarifying or correcting: "Well, actually, what I mean is..."
When transitioning topics: "Now, um... moving on to..."

[AVOID]
Overusing any single filler
Formal written language ("furthermore," "in conclusion")
Perfect, polished sentences that sound robotic

[INSTRUCTIONS]
{{insert detailed instructions}}

[FINAL CHECK]
Before responding, read your answer aloud in your head - does it sound like natural human speech?

The template is a starting point: adapt it to your use case and iterate on the prompt until the output sounds right.
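
If you fill in the template from code, a small helper can keep the placeholders and optional snippets in one place. The following is a minimal sketch, not an LMNT API: `build_system_prompt` and `PLACEHOLDERS` are hypothetical names, and placing extra snippets just before the [INSTRUCTIONS] section is our own assumption about a reasonable layout.

```python
# Placeholder markers exactly as they appear in the template text.
PLACEHOLDERS = {
    "role": "{{insert role}}",
    "task": "{{insert task}}",
    "instructions": "{{insert detailed instructions}}",
}


def build_system_prompt(template, role, task, instructions, snippets=()):
    """Fill the template placeholders and splice in optional snippets.

    Hypothetical helper: snippets (for example, the phone number block)
    are inserted before the [INSTRUCTIONS] header, or appended at the
    end if the template has no such header.
    """
    for marker in PLACEHOLDERS.values():
        if marker not in template:
            raise ValueError(f"template is missing placeholder {marker!r}")

    if snippets:
        block = "\n\n".join(s.strip() for s in snippets) + "\n\n"
        head, sep, tail = template.partition("[INSTRUCTIONS]")
        if sep:
            template = head + block + sep + tail
        else:
            template = template.rstrip() + "\n\n" + block

    values = {"role": role, "task": task, "instructions": instructions}
    for key, marker in PLACEHOLDERS.items():
        template = template.replace(marker, values[key])
    return template


# Shortened template for illustration; paste the full template in practice.
SHORT_TEMPLATE = (
    "Pretend you are a {{insert role}} doing {{insert task}}\n\n"
    "[INSTRUCTIONS]\n"
    "{{insert detailed instructions}}\n"
)

if __name__ == "__main__":
    print(build_system_prompt(
        SHORT_TEMPLATE,
        role="support agent",
        task="billing help",
        instructions="Keep answers under three sentences.",
    ))
```

The returned string is what you would pass as the system prompt to your LLM; the LLM's reply is then the text you send to text-to-speech.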

Snippets for difficult-to-pronounce text

Some text is difficult to pronounce as-is, like phone numbers. To help the LLM handle these cases, paste these snippets into your prompt as needed.

Phone numbers

[PHONE NUMBER FORMATTING]
When mentioning phone numbers, you MUST format them for optimal TTS pronunciation:
- Convert standard phone numbers by spelling out digits individually
- REMOVE all original parentheses, hyphens, periods, and spaces used for grouping
- Insert semicolons (;) to mark natural pause points between logical groups of numbers (e.g., area code; prefix; line number)
- SPECIAL CASE: If the number starts with 1-800, write it as "one eight hundred"
- Example: "(555) 123-4567" -> "five five five; one two three; four five six seven"
- Example: "1-800-555-1234" -> "one eight hundred; five five five; one two three four"
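
The same rules can also be applied deterministically, for instance to check the LLM's output in tests or to pre-format numbers you already have as data. Here is a minimal sketch, assuming 10-digit North American numbers with an optional leading 1. `format_phone_for_tts` is a hypothetical helper, not part of any LMNT SDK, and the handling of a leading 1 on non-800 numbers is our assumption, since the snippet above does not cover that case.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(digits):
    """Spell each digit individually: '123' -> 'one two three'."""
    return " ".join(DIGIT_WORDS[int(d)] for d in digits)


def format_phone_for_tts(raw):
    """Rewrite a phone number following the snippet's formatting rules."""
    # Remove parentheses, hyphens, periods, spaces, and anything non-numeric.
    digits = re.sub(r"[^0-9]", "", raw)

    has_country_code = len(digits) == 11 and digits.startswith("1")
    if has_country_code:
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError(f"expected a 10-digit number, got {raw!r}")

    area, prefix, line = digits[:3], digits[3:6], digits[6:]

    if has_country_code and area == "800":
        # Special case from the snippet: 1-800 is read "one eight hundred".
        first = "one eight hundred"
    elif has_country_code:
        # Assumption: keep the leading 1 as its own pause group.
        first = "one; " + spell_digits(area)
    else:
        first = spell_digits(area)

    # Semicolons mark the pauses between logical groups.
    return "; ".join([first, spell_digits(prefix), spell_digits(line)])


if __name__ == "__main__":
    print(format_phone_for_tts("(555) 123-4567"))
    print(format_phone_for_tts("1-800-555-1234"))
```

Run on the two examples in the snippet, this reproduces the same strings shown there.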

Before and after example

Without conversational prompting:
“I apologize for the inconvenience you are experiencing with your account. Please navigate to the account settings page and verify that your payment information is current and accurate.”

With conversational prompting:
“Oh, that’s definitely frustrating - I totally get why you’d be concerned about this. Let me help you sort this out. So, first thing we should check is… let’s take a look at your payment info in settings. Sometimes it’s just a card that needs updating, you know?”
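
While iterating on a prompt, it can help to spot-check responses against the [AVOID] list automatically. The sketch below is a rough heuristic of our own, not an LMNT feature: `review_for_speech` is a hypothetical name, the threshold of two uses per filler is an arbitrary assumption, and simple word matching will miscount in places (for example, "so" used as an ordinary conjunction, or a possessive "'s" counted as a contraction).

```python
import re

# Fillers and formal phrases taken from the prompt template.
FILLERS = ["um", "uh", "well", "so", "let me think", "you know", "i mean"]
FORMAL_PHRASES = ["furthermore", "in conclusion"]


def review_for_speech(text, max_per_filler=2):
    """Return a rough report on how speech-like a response looks."""
    # Normalize curly apostrophes so contractions are matched either way.
    lowered = text.lower().replace("\u2019", "'")

    counts = {
        filler: len(re.findall(rf"\b{re.escape(filler)}\b", lowered))
        for filler in FILLERS
    }
    return {
        # Fillers used more often than the (assumed) threshold.
        "overused_fillers": {f: n for f, n in counts.items() if n > max_per_filler},
        # Formal written phrases the template asks the LLM to avoid.
        "formal_phrases": [
            p for p in FORMAL_PHRASES
            if re.search(rf"\b{re.escape(p)}\b", lowered)
        ],
        # True if at least one contraction-like token appears.
        "has_contraction": bool(
            re.search(r"\b\w+'(?:ll|re|ve|s|d|t|m)\b", lowered)
        ),
    }


if __name__ == "__main__":
    print(review_for_speech("Furthermore, um, um, um, it is done."))
```

A report with no contractions, or with formal phrases present, suggests the prompt needs another pass; it does not replace listening to the synthesized audio.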

Common issues and solutions