Text-to-speech (TTS) technology does just what it sounds like: It allows machines to speak written words aloud. This capability has a broad range of applications, from web accessibility to voicebots to hands-free content consumption. If you’re listening to this article right now, we don’t need to tell you the benefits of TTS.
But not all TTS voices sound particularly natural—and human-sounding text to speech leads to better experiences in most cases, such as:
- Listening to long texts. If you listen to your morning newspaper—online or with a print-document reader—you probably prefer a lifelike TTS voice. Long texts tend to become grating when spoken by stilted, robotic voices, at least if you’re not used to consuming lots of content using TTS.
- Interacting with a company contact center. When people call a customer service line or IT helpdesk, they usually want to reach a live agent. Intelligent voice assistants (IVAs) are the next-best thing—and they’re available 24/7, so they bring a lot of value. But when IVAs sound clearly robotic, callers distrust their abilities. Human-like TTS voices create better caller experiences and more trustworthy brand interactions.
- Using a web-reading service. Internet users with dyslexia, vision impairments, or low literacy may use TTS software we call web readers to consume online content. Many users who simply prefer listening over reading do the same. Human-sounding TTS voices provide better pronunciation and more natural prosody. That aids comprehension, and can even be a prerequisite to understanding.
Regardless of how you plan to use TTS, you may struggle to find human-sounding text-to-speech voices that meet your expectations. Of course, the TTS service you choose will depend on what you plan to do with TTS. There are lots of commercial uses for it, including:
- Call-center IVAs
- Voicebots and virtual assistants
- Smart-home user interfaces
- Smart speaker skills
- Video game voiceovers
- Vehicle infotainment systems
- Accessibility features for websites, devices, and more
All these examples benefit from human-sounding TTS—and they all require a professional TTS provider that offers top-quality voices, ongoing pronunciation support, and the languages and dialects to match your market.
We use AI technology and our 20+ years of experience to create incredibly lifelike TTS voices—including custom voices that represent your brand alone. If you’re curious, you can listen to ReadSpeaker’s TTS with our interactive demo.
But if you’re just curious about the state of TTS technology today, start your exploration with some free TTS sites. Below are a few places to experiment with it. (Note that these options offer varying degrees of quality.)
7 Free Sites for Exploring Human-Sounding Text to Speech
1. NaturalReader
While NaturalReader locks its most human-sounding text-to-speech voices behind a paywall, the free version offers reasonably lifelike TTS in 16 languages, including English. The free plan is marketed as an accessibility overlay, and includes a dyslexia font option for the text-entry window. NaturalReader offers in-browser TTS, MP3 downloads, and a Chrome extension that reads webpages, emails, PDFs, Google Docs, and even Kindle ebooks. Commercial licenses are available, with access to higher-quality voices starting at $49 per month for a single user.
2. Free TTS
With a name like Free TTS, you might not expect this service to offer the most human-like voices in the industry, and you’d be right. Voice quality in the Free TTS demo is decent, thanks to the use of Google’s TTS engine. But given the odd prosody (unexpected pauses, uneven pitch control), few would mistake these TTS voices for a live speaker. That said, Free TTS does live up to its name, offering up to 6,000 characters of text-to-speech conversion per week. Beyond that, you’ll pay $6 for 24-hour access to 1 million characters, or $19 for month-long access to 2 million. Use it in the browser or download MP3s.
3. Voicemaker
Voicemaker is unique among free TTS services in a few ways. It makes roughly human-sounding TTS voices available to all users (though many of these voices are marked “premium,” only available with a subscription). It provides numerous performance controls, including the ability to add pauses, adjust speed, change volume, and format pronunciation for dates, times, and more with a click of the mouse. You can even change the sample rate, which affects audio quality, between 8,000 Hz and 24,000 Hz, and go higher still with a premium (paid) plan. But the free version of Voicemaker is intended only for “testing,” and to get more than 250 characters per conversion, you’ll need to upgrade. Basic plans are $5 per month; premium plans are $10 per month; and business plans are $20 per month.
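To put those sample-rate figures in perspective, here is a rough sketch of what 8,000 Hz and 24,000 Hz mean for the audio itself. It assumes uncompressed 16-bit mono PCM, which is our assumption for illustration and not a description of Voicemaker’s actual output format; the function names are hypothetical. The one fixed fact it relies on is that a sample rate can only represent frequencies up to half its value (the Nyquist limit).

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM, uncompressed


def highest_frequency_hz(sample_rate_hz: int) -> float:
    """Nyquist limit: the highest frequency a given sample rate can capture."""
    return sample_rate_hz / 2


def bytes_per_minute(sample_rate_hz: int) -> int:
    """Raw audio size for one minute at the assumed 16-bit mono format."""
    return sample_rate_hz * BYTES_PER_SAMPLE * 60


for rate in (8_000, 24_000):
    print(rate, highest_frequency_hz(rate), bytes_per_minute(rate))
```

At 8,000 Hz, nothing above 4,000 Hz survives, which is roughly the sound of a traditional phone line. At 24,000 Hz, frequencies up to 12,000 Hz are kept, at three times the raw data size under the same assumptions.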
4. Vanilla Voice
Vanilla Voice offers at least 12 English TTS voices that do sound pretty human, though, again, they struggle with natural prosody, at least on the demo page. And make no mistake: Vanilla Voice’s free offering is essentially a demo. It allows you to convert a limited amount of text to speech within the browser. It lets you download an MP3, though size limits are unclear (and there are certainly size limits). The trouble is, as we publish, Vanilla Voice is in “private beta.” The demo is the service. Still, you might find it useful for downloading small bits of generated TTS, and you can always sign up to get notifications when the service goes public.
5. Uberduck
Go see what Uberduck can do, by all means, but don’t use it for commercial purposes. Uberduck leverages the TalkNet TTS engine, inviting its user base to build datasets that emulate real speakers’ voices. And users delivered, offering synthesized speech based on everyone from Eminem to Sesame Street’s Cookie Monster. (You can see why commercial use is a lawsuit waiting to happen.) Still, Uberduck is a fascinating example of neural voice cloning in the hands of a decentralized creative community. That’s one perspective. Another is that Uberduck is a case study in poor TTS ethics, featuring the cloned voices of beloved and departed figures like Tupac Shakur and Biggie Smalls without, as far as we can tell, permissions of any kind. Strictly for research, this one.
6. The Festival Speech Synthesis System
Festival is unique on our list. It’s not a demo (though a 70-character demo is available). It’s not a browser-based TTS interface. It’s certainly not a voice-cloning tool. Instead, the Festival Speech Synthesis System is an open-source software framework, created and managed by the University of Edinburgh’s Centre for Speech Technology Research. Whatever you build with it, it’s free and legal to use. Its creators distribute the software under an X11-type license, which gives users permission to use the software for both personal and commercial projects. This isn’t your choice if you’re just looking to transform a block of text into audible speech; but it’s a powerful tool in the hands of system developers and curious amateurs alike.
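For readers who want to try Festival from a script, the following is a minimal sketch. It assumes Festival is already installed and that its `festival` and `text2wave` command-line programs are on your PATH. The helper names and file names are our own illustrations, not part of Festival.

```python
import shutil
import subprocess


def speak_command() -> list[str]:
    """Command that reads text from standard input and speaks it aloud."""
    return ["festival", "--tts"]


def wav_command(text_file: str, wav_file: str) -> list[str]:
    """Command that renders a text file to a WAV file with text2wave."""
    return ["text2wave", text_file, "-o", wav_file]


def speak(text: str) -> bool:
    """Speak `text` if Festival is installed; return False if it is not."""
    if shutil.which("festival") is None:
        return False
    # Festival reads the text to speak from standard input in --tts mode.
    subprocess.run(speak_command(), input=text, text=True, check=True)
    return True
```

Calling `speak("Hello from Festival.")` should play the sentence through the default audio device on a machine with Festival installed, and simply return `False` elsewhere. To save audio instead of playing it, pass the list from `wav_command("input.txt", "output.wav")` to `subprocess.run` in the same way.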
7. iSpeech
Few would describe iSpeech’s publicly available TTS voices as “human-like,” but it’s worth exploring for its 27 languages and dialects and simple, three-speed interface. You can even download MP3 clips thanks to an embedded Audiobook.ai widget. But, alas, the free TTS on iSpeech is another demo, limited to just 150 characters, based on our testing. iSpeech describes its offering as a “TTS-as-a-service” product; it charges per word, with web and mobile usage billed at $50 for 2,000 words, $200 for 10,000 words, and $1,000 for 100,000 words.
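The per-word cost implied by those bundles is easy to work out. The short sketch below uses only the three iSpeech prices quoted above; the names are ours, and actual billing terms may differ.

```python
# (price in US dollars, words included), as quoted in this article
BUNDLES = [(50, 2_000), (200, 10_000), (1_000, 100_000)]


def cents_per_word(price_usd: int, words: int) -> float:
    """Effective cost of one word, in cents."""
    return price_usd * 100 / words


for price, words in BUNDLES:
    rate = cents_per_word(price, words)
    print(f"${price:,} for {words:,} words: {rate} cents per word")
```

That comes to 2.5, 2, and 1 cent per word, so the largest bundle costs 60 percent less per word than the smallest.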
Choosing a TTS Provider for Lifelike Synthetic Speech
This is hardly an exhaustive list, and many more free TTS options are available. They’re interesting for research, but to bring TTS to your website, service, or device, you’ll need a proven technology partner.
Look for a TTS provider that can provide human-like TTS voices, as well as:
- Ongoing support. Every TTS system must pronounce new words, whether that’s the latest industry jargon or the name of a new public figure. Choose a TTS provider that continues to update its speech engine to ensure proper pronunciation, no matter how your system’s vocabulary may change.
- Control over prosody. “Prosody” includes all the non-word elements of speech, such as pauses and intonation. To sound human, the TTS engine must adjust prosody to match the utterance. Look for a TTS system with prosody controls that keep utterances natural.
- A wide variety of human-like TTS voices. If you’re creating an ebook for children, you may want a child’s voice. If you’re creating a banking voicebot, you may want a more professional tone. When it comes to TTS services, the more lifelike voices on offer, the better.
- Languages that match your audience. While multilingual TTS voices are possible, they’re not yet widely available—which means you need a TTS partner with voices that speak your audience’s language.
- Custom TTS voices. Brands need to stand out, and that’s as true in voice channels as anywhere else. Few TTS providers offer top-quality, custom TTS voices for brands and creators. ReadSpeaker AI does.
In fact, ReadSpeaker AI checks all these boxes. Long story short:
If you need human-sounding TTS from a partner you can trust, start the conversation with ReadSpeaker AI today.