…The global audiobook market could be poised to reach heights few in the industry today would dare admit to imagining.


Pretty much since smartphones became mainstream, audio content in the form of podcasts and audiobooks has been gathering momentum as a significant format in the global publishing industry.

Even with the à la carte and monthly-credit subscription models, audio has taken off big time with consumers. Meanwhile, in markets where publishers are amenable to unlimited subscription, audiobooks have quickly become a format to rival, and in the case of Sweden even exceed, the popularity of print.

But the brake on audio, and especially on longform audiobooks, has always been production costs: studios, sound engineers and narrators can add thousands of dollars to the cost of turning a book into a sound product, deterring many publishers and making some titles financially unviable.

Lurking in the background as the audio industry discovered and embraced digital was AI (artificial intelligence), with its futuristic promise and premise that one day an entire book could be narrated by a robot and no-one would know any better.

Well, we’re not there yet, but anyone who follows developments in this arena will know that the quality of text-to-speech (TTS) is improving fast, driven by the proven global demand for digital audio.

As an author, I love the idea that one day I might, at the click of a mouse, convert my novels to saleable-quality audiobooks. And as an industry commentator writing TNPS, I fantasise about the day I might click the mouse and have my TNPS posts converted into podcasts.

In the real world it seemed the latter might happen sooner, as TTS seems to be developing fastest in the non-fiction arena, where delivery relies less on emotion and more on conveying information.

But the reality is that when I try the latest sample AI offerings, I hit one major obstacle: TNPS posts are so full of “foreign” names (that is, names not in the AI’s English-language name database) that the resulting audio is quite unacceptable. Another couple of years and it might be a different story.

But for fiction, where conveying emotion and tone has been the problem, progress has been palpable. This week brought news that one AI-audio operator, UK-based DeepZen, has partnered with US distributor Ingram to offer its AI-audio services to a no doubt cautiously optimistic publishing industry.

Per the DeepZen press release:

The service uses innovative technology that replicates the human voice to create a listening experience that is virtually indistinguishable from the real thing. Developed specifically for audiobooks and long form content, it incorporates artificial intelligence, natural language processing, and next generation algorithms.

DeepZen’s AI voices are licensed from voice actors and narrators, capturing all of the elements of the human voice, such as pacing and intonation, and a wide range of emotions that produce more realistic speech patterns. They are benchmarked against human narration, and are a world away from the robotic, monotone, voice assistants with which we are all familiar.

But that still raises the question: are they a world away enough to be acceptable to paying consumers?

The 49-second sample DeepZen offers via the press release really isn’t enough to make that call, but check it out here and see (or rather hear) for yourself.

For publishers, the process is supposedly as simple as uploading a text, choosing a narrator voice, agreeing to the estimate and paying the fee, with a turnaround time of three weeks.

That three-week turnaround seems to confirm the press release’s indication that this is not a fully automated process but involves some level of manual quality control, which may help assuage publisher reservations.

DeepZen CEO Taylan Kamis said:

We are delighted to be teaming with Ingram to provide publishers with a new, cost-effective and convenient way to create high-quality audiobooks.

Our technology is breaking down barriers, making audiobook production more cost-effective for everyone, and much more accessible for small and mid-size publishers.

The proof will be in the eating, or rather, the listening.

But what publishers, big and small, need to be asking at this stage is:

a) What will this technology be like in 3, 5 or 10 years’ time, and
b) Will consumers be comfortable with non-celebrity, imperfect but acceptable audio content if the price is right?

If the answer to b) is yes (and that’s a pretty safe bet), then the tech quality will advance exponentially, and the global audiobook market could be poised to reach heights few in the industry today would dare admit to imagining.