Top TTS Trends for 2021
If you’re reading this, it will come as no surprise that text-to-speech (TTS) is changing the way people consume digital text. According to a July 2020 report by Global Industry Analysts, even amid the worldwide pandemic, the global market for voice technology is expected to grow almost sevenfold, from US $2.84bn in 2019 to US $19.39bn in 2027, as people increasingly choose to listen to their content rather than read it.
To achieve that projection, TTS speech needs to sound more natural in its pronunciation, timing and emotional expression if it is really to catch on with the masses. This is especially important for brands that are now investing in voice to increase audience engagement.
Improvements in TTS quality now allow the latest systems to adapt their delivery to context much as a human speaker would, across different languages and speaking styles, from newsreader delivery to conversational speech and colloquial dialects. This development aims to improve the overall experience of all TTS consumers, and the big trend for 2021 is for AI to help provide even more localised speech based on region and location. This is exciting progress for those working in the industry.
Reading With Your Ears
Another big audio trend predicted for 2021 is audiobooks, which have given a real boost to a publishing industry that has experienced a steady decline in physical book sales over the past decade. TTS is fuelling this audiobook expansion.
Readers who don’t have the time to sit down and read a book can now listen to their favourite author via an audiobook while they’re at the gym or taking a walk, for example. They can also keep up to date with current events by listening to the latest news headlines on a smartphone while out shopping.
Talking books, smart books or digital audiobooks also give visually impaired listeners easy access to digital text, with a personalised experience that lets them choose the speed of the voice and bookmark passages. Indeed, digital audiobooks are the fastest growing sector in the publishing industry, with major players like Audible, Penguin Random House, Simon & Schuster and Hachette Audio investing in more studios and narration.
A study by the UK’s Publishers Association found that 54% of the country’s audiobook consumers use them because they can listen to books when reading isn’t possible. More than half of those consumers said they don’t have the time to sit and read physical books. Others added that listening to an audiobook provides them with a more intimate and immersive experience.
Audiobooks have also given children, many of whom have been missing school due to coronavirus restrictions across the globe, a key resource to help with their literacy. Listening to audiobooks can help children who struggle with vocabulary to express themselves while learning, because it makes a book’s content easier to understand. This is a particular help to boys, who are diagnosed with conditions like dyslexia and ADHD at notably higher rates, because listening gives them extra time away from a desk and relief from the anxiety of trying to read and write.
TTS-enabled books are also much easier and faster to produce than audiobooks narrated by a voiceover artist, which means more of the books that students need will be readily available to them. Impressive improvements in speech synthesis have given TTS applications control over the voice, tone, pitch, language structure, grammar, vocal composition and speaking rate, to the point that publishers’ files can be converted with far greater ease and speed in this new era of voice solutions for the industry.
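Controls such as pitch and speaking rate are commonly expressed in Speech Synthesis Markup Language (SSML), a W3C standard that many TTS engines accept. The following is a minimal sketch, not any particular vendor’s conversion pipeline: the function name build_ssml and the example values are hypothetical, and which attributes are honoured varies between engines and voices.

```python
import xml.etree.ElementTree as ET


def build_ssml(text: str, rate: str = "medium", pitch: str = "medium") -> str:
    """Wrap plain text in an SSML <speak>/<prosody> envelope.

    `rate` and `pitch` are passed through as SSML prosody attributes
    (for example "slow", "fast", "low", "high"). The function name and
    defaults are illustrative assumptions, not part of any TTS product.
    """
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "pitch": pitch})
    # ElementTree escapes characters such as & and < in the text for us.
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    # A slower, lower-pitched reading of a chapter heading.
    print(build_ssml("Chapter One: The Arrival", rate="slow", pitch="low"))
```

A publisher’s file could be split into paragraphs and each passed through a helper like this before being sent to a TTS engine, so that headings, dialogue and narration each get their own delivery.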
Not surprisingly, the 25-34 age group of Xennials and older Gen Y millennials is the largest market of audiobook users, mainly because they’re the first highly tech-driven generation. Consumers in this age range prefer to listen to their favourite novels and biographies rather than buy traditional physical books and e-books, and genres like suspense and thrillers are among the most listened-to audiobooks. According to research by Harris Interactive, the average 18-34 year-old male listens to at least four audiobooks every 12 months, for entertainment and brain stimulation, while working, commuting and running outdoors. Men in this age group also consume far more audiobooks than women of the same age, despite not previously having been a strong book-buying group.
Speak(er) To Me
And how are these users listening to their audiobooks? Through another big TTS trend: smart speakers such as Amazon’s Echo, Google Home and Apple’s HomePod. Although the car remains the number one place where audiobooks are listened to, smart speakers are used to play them for longer durations.
According to Business Insider’s 2020 ‘The Smart Speaker Report’, the smart speaker market could grow faster than that of other gadgets, even smartphones, because companies in a range of industries, including media, e-commerce and banking, are looking to connect with their consumers more directly. Smart speakers have also fundamentally changed the way people get their needs met, and this trend is likely to continue.
Something else to look out for is how businesses and brands are developing smart speaker apps that you can activate with your voice. Known as Skills on Amazon’s Alexa, they are downloaded like apps and help brands reach wider audiences. Once downloaded, you can use them to command Alexa to do your bidding. Audiobook lovers, for instance, can use Alexa’s Audible Skill not only to listen to their current read by a real narrator, but also to control it with their voice. In a little over three years to January 2020, the number of Skills available for Alexa, the artificial intelligence virtual assistant used in Amazon’s Echo smart speakers, rose from just 130 to more than 100,000.
And then there’s Amazon Polly, the TTS service that helps Alexa convert text into recognisable synthesised speech, so that you can hear its responses to your commands and requests via your smart speaker. Amazon Polly uses Neural Text-to-Speech (NTTS) technology, an advanced step towards better speech quality. Through machine learning, NTTS lets you choose a persona that produces natural-sounding speech, which can even include accents and can be used to create a Skill. Neural voices are also provided by Microsoft Azure and will be available to our customers at Voice Intuitive too.
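For readers who want to try a neural voice themselves, here is a rough sketch of a request to Amazon Polly using boto3, the AWS SDK for Python. It shows one way to call the service directly, not how Alexa uses it internally. The helper names polly_request and synthesize_to_file are hypothetical, the voice "Joanna" is only an example choice, and the call assumes AWS credentials and a region where neural voices are offered have already been configured.

```python
def polly_request(text: str, voice_id: str = "Joanna", neural: bool = True) -> dict:
    """Build the keyword arguments for Polly's synthesize_speech call.

    Hypothetical helper: the voice and defaults are illustrative assumptions.
    """
    return {
        "Text": text,
        "VoiceId": voice_id,
        "Engine": "neural" if neural else "standard",
        "OutputFormat": "mp3",
    }


def synthesize_to_file(text: str, path: str, **options) -> None:
    """Send the request to Amazon Polly and save the returned audio."""
    # Imported here so the request builder above works without the SDK.
    import boto3  # third-party; requires configured AWS credentials and region

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**polly_request(text, **options))
    # The audio comes back as a stream of bytes under "AudioStream".
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


if __name__ == "__main__":
    # Inspect the request without contacting AWS.
    print(polly_request("Welcome back. Shall I continue your audiobook?"))
```

Switching the Engine value between "neural" and "standard" for the same text is a simple way to hear the difference in speech quality that neural voices bring.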
Since smart speakers are still rather bulky and haven’t really been designed for use outside the home, TTS and speech recognition are used most widely on smartphones instead. So far, most of us know and use audio technology mainly for consuming content such as audiobooks. The more we get used to the easy access that audiobooks offer, the more likely we are to look for similar options elsewhere. In fact, the way we use technology and consume digital content is going through a revolution, with an ever-increasing trend towards voice-based consumption. Who knows, we may soon be customising the voices we listen to content with.