Outlook 2023: Nigel Cannings, Intelligent Voice

Nigel Cannings is CTO and founder of Intelligent Voice.

What has been the best/most innovative technology in the speech tech space in the past year?

OpenAI continues to astonish the market with a range of advances to existing technologies. While GPT-3 and GPT “3.5” have hit the headlines (the latter, better known as ChatGPT, being capable of generating article-length content), it is the release of Whisper that has caught the attention of the speech community. Whisper is probably the most capable speech recognition model available today, judging by its breadth of language coverage and its accuracy. While not state-of-the-art in some languages, it provides a really interesting benchmark of where current transformer technologies can go. It is certainly not ready for a production-grade deployment, but it is of enormous interest to researchers.
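As an illustration of why researchers have taken such an interest, a transcript can be produced in a few lines with OpenAI's open-source whisper Python package. This is a minimal sketch rather than a production recipe; the audio filename and model size below are placeholders.

```python
# Minimal sketch: transcribing an audio file with OpenAI's open-source
# "whisper" package (pip install openai-whisper). "meeting.wav" is a
# placeholder path; model sizes range from "tiny" up to "large".
import whisper

model = whisper.load_model("base")        # downloads weights on first use
result = model.transcribe("meeting.wav")  # language is auto-detected
print(result["text"])                     # plain-text transcript
print(result["language"])                 # detected language code
```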

Have any changes in regulations (such as the online safety bill) impacted how companies can utilise speech tech or AI?

The Information Commissioner’s Office came in a bit from left field in late October with its analysis of “biometric” technology, into which it has lumped a wide variety of technologies, from gaze tracking to wearables to sentiment analysis. The analysis was scathing, stating that “Developments in the biometrics and emotion AI market are immature. They may not work yet, or indeed ever,” which damns a huge swathe of technologies, including the widely used “my voice is my password” systems deployed by banks and even, by extension, the entire US immigration system, which relies on facial biometrics to streamline entry at US airports and dramatically reduce queues. The ICO promises further guidance in spring 2023, but notwithstanding its stated position as an “advocate for genuine innovation and business growth”, the language so far is hostile towards an industry that has demonstrated genuine advances.

What do you see as the biggest potential issue that speech tech can look to solve next year? 

Consumer speech tech is in turmoil at the moment, with Google and Amazon laying off large swathes of employees and analysis of the Alexa business showing that it will lose $10 billion this year. However, one bright spot may be the ability to reduce “Zoom fatigue”. Innovations in the last 12 months have allowed us, for the first time, to accurately summarise whole Zoom and Teams meetings, making it easier for people to engage properly in long meetings without the need to keep taking notes. Or even not to engage at all, and just read the summary afterwards!
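To illustrate the general idea (and not Intelligent Voice's own pipeline, which is not described here), a meeting transcript can in principle be condensed with an off-the-shelf summarisation model. The sketch below uses Hugging Face Transformers; the model choice and the toy transcript are assumptions, and real meeting transcripts would need to be chunked to fit a model's input limit.

```python
# Illustrative sketch only: condensing a short meeting transcript with an
# off-the-shelf summarisation model via Hugging Face Transformers.
# "facebook/bart-large-cnn" is an example model choice, not a product detail.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

transcript = (
    "Alice opened the meeting with a review of Q3 numbers. Bob raised "
    "concerns about the release schedule, and the team agreed to move the "
    "launch to January. Carol will own the updated test plan."
)

summary = summarizer(transcript, max_length=60, min_length=15, do_sample=False)
print(summary[0]["summary_text"])
```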

What are your predictions for AI innovations in the next year?

There will be a lot more of the same, that is for certain. There is something of an “arms race” going on in the Large Language Model space, the latest iteration of this being the release of ChatGPT (or “GPT-3.5”). However, this is all more of the same.  True innovation means we have to look at different types of networks, and I see that the next stage of evolution is in biologically inspired networks, like “spiking” neural networks.  These are a form of self-organising networks that solve problems in an organic fashion. This could be as simple as noise reduction in an audio signal, to solving the infamous travelling salesman problem (already solved in nature by the slime mould Physarum polycephalum). By using fewer but more complex neurons in an artificial network, we can break away from needing networks trained on vast amounts of data, and maybe pave the way towards a more general form of artificial intelligence.  If we combine this with current technologies like transformer networks, we can take the best of the current range of “knowledge” networks, but give them a little more actual intelligence. The C Elegans roundworm manages to find food and manoeuvre with only 10 neurons, unlike the 175 billion in GPT-3!
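For readers unfamiliar with the term, the classic building block of a spiking neural network is the leaky integrate-and-fire neuron: membrane potential leaks towards a resting value, integrates incoming current, and emits a discrete spike when a threshold is crossed. The toy simulation below is only a sketch of that mechanism; the parameter values are arbitrary and chosen purely for illustration.

```python
# Toy leaky integrate-and-fire (LIF) neuron, the classic building block of
# spiking neural networks. Parameter values are arbitrary, for illustration.
import numpy as np

def simulate_lif(input_current, dt=1.0, tau=20.0, v_rest=0.0,
                 v_threshold=1.0, v_reset=0.0):
    """Return the membrane potential trace and spike times for an input trace."""
    v = v_rest
    potentials, spikes = [], []
    for t, i_in in enumerate(input_current):
        # Leaky integration: potential decays toward rest, driven by input.
        v += (dt / tau) * (v_rest - v) + i_in * dt
        if v >= v_threshold:          # threshold crossed: emit a spike
            spikes.append(t)
            v = v_reset               # reset after spiking
        potentials.append(v)
    return np.array(potentials), spikes

# A constant drive produces a regular spike train.
current = np.full(200, 0.06)
trace, spike_times = simulate_lif(current)
print(f"{len(spike_times)} spikes, first at steps {spike_times[:5]}")
```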