I am participating in a beta test using AI voices to generate audiobooks. Interesting. And not altogether a bad experience, but a time-consuming one: most audiobooks run about 20 to 26 hours, so there you go. But if it works, it makes audiobook creation available to authors without the loss of an arm and/or a leg.
So, things I’m learning.
Choosing a Voice
Listen to all the voice options: male, female, American or British, then pick the narrator your reader would expect. Right now, the options are limited. I chose a twenty-something female for one book and a late-twenties-sounding male for another.
A real plus is that you can change narrators mid-stream without losing your edits.
Don’t watch the screen — Listen
The AI I’m trying out has a marker on the screen that moves from word to word with the narration. If you watch the screen while listening, the marker’s steady rhythm drains all meaning from the sentence: dah dah dah dah. So shut your eyes and listen for the modulation in each phrase and for places where inflection changes the intended meaning of a sentence.
As you listen to the very human voice, it is easy to forget it is machine-produced. Yet it can’t interpret the narrative the way a voice actor or a reader would, so you will need to intercede. Currently, the choices are limited: speed a word up, slow it down, or add long or short pauses. Speeding up or slowing down words can affect modulation in unexpected ways, requiring several passes to get the inflection just right. It would be nice to be able to adjust the pitch of the voice for those occasions when a question needs to end on something other than an upswing, but that option is not available.
Adding long and short pauses is critical to pacing and understanding. For instance, rapid banter, easily understood on the printed page, needs pauses between speakers to assist the listener. Without a pause, the exchanged dialog becomes a jumble, losing its spice as the listener struggles to figure out who said what to whom.
Listen to each word — Carefully.
The voice replicates standard English; is there such a thing? So homographs, words spelled the same but pronounced differently, are an issue. For instance, bow (pronounced like beau) is consistently read as bow (as in bowed before the king), no matter the context. And the verb does is pronounced as though it were more than one female deer.
The pronunciation pop-up doesn’t interpret diacritical marks, which means you have to find a set of letters that produces the correct sound. For instance, duz gets you does and keeps a herd of deer from roaming your book. The good news is that you can apply a pronunciation change to every instance of the word in the text. The bad news? Well, read on.
Users are warned to listen to the complete book before accepting the audiobook conversion; heed that warning. Otherwise, this could happen. Crappie fish may be croppy to you and me, but not to the AI voice, which happily asserted that crappy fish swam in a pond. And, making no sense at all, the voice insists that bass is pronounced base, as in bass violin, and will not say bass, as in fish. The letters bass, which should produce the correct sound by every rule of English, don’t. Nor do b ass, baz, bahz, or, well, anything. English is a minefield of weirdness. But as it turns out, the AI voice is very good with French, hence beau.
Then there are em dashes. Imagine my surprise when the voice opined: yes, dash, she changed. If the dash is set off with spaces, the voice says dash; if it isn’t, the voice runs the wordstogether. Using the pronunciation feature, I tried substituting a fast uh, but that, uh, isn’t always appropriate. So, what do you do? Sometimes I add a word to make a stutter. Sometimes I fill in the blank with the missing word(s). It is a conundrum.
What I’ve learned — Mostly
Listen. Listen twice. Learn your editing options (fast, slow, and pauses) and how they affect pace and modulation. Watch out for homographs; some are truly unexpected, as the female deer attest. Watch also for possessives: the voice tends to hesitate at apostrophes (Eliza s) and needs to be overridden. Watch foreign names and words, unless they are French. Be chary with em dashes, though this issue should be addressed by the programmers; after all, the voice doesn’t have a problem with ellipses. I know. Weird, huh?
And finally, if you find errors in your manuscript while creating the audiobook, don’t be afraid to correct them. The AI I am tinkering with automatically updates the audio text along with the manuscript text. Not bad, that.
See all my books at dzchurch.com where you can also sign up for my newsletter.




