Researchers at Cornell University have developed a silent speech recognition interface that uses acoustic sensing and artificial intelligence to continuously recognize up to 31 non-vocalized commands based on lip and mouth movements.
The low-power wearable interface, called EchoSpeech, requires just a few minutes of user training data before it can recognize commands and can be run on a smartphone.
Dr. Ruidong Zhang of information science is the lead author of “EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing,” which will be presented at the Association for Computing Machinery’s Conference on Human Factors in Computing Systems (CHI) this month in Hamburg, Germany.
“For people who cannot produce voice, this silent speech technology can be a great addition to the voice synthesizer. It can give patients their voice back,” Zhang said of the potential future uses of the technology.
In its current form, EchoSpeech can be used to communicate with others via smartphone in places where speech is uncomfortable or inappropriate, such as a noisy restaurant or a quiet library. The silent speech interface can also be paired with a pen and used with design software such as CAD, eliminating the need for a keyboard and mouse.
Equipped with a pair of microphones and speakers smaller than a pencil eraser, the EchoSpeech glasses become a wearable AI-powered sonar system that sends and receives sound waves through the face and senses mouth movements. A deep learning algorithm then analyzes these response profiles in real time with an accuracy of around 95%.
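The core signal-processing idea behind such acoustic sensing can be illustrated with a toy echo-profile computation: cross-correlating a transmitted inaudible chirp with the received signal yields peaks at the delays of each echo path, and changes in that profile over time reflect skin and mouth movement. The sketch below is purely illustrative, not the EchoSpeech pipeline; the sample rate, chirp band, and echo delays are all assumed values.

```python
import numpy as np

def echo_profile(tx: np.ndarray, rx: np.ndarray) -> np.ndarray:
    """Cross-correlate the transmitted chirp with the received signal.

    Peaks in the result correspond to echoes at different path lengths;
    a deep model would classify how this profile changes frame to frame.
    """
    corr = np.correlate(rx, tx, mode="valid")
    return corr / (np.linalg.norm(tx) * np.linalg.norm(rx) + 1e-9)

# Assumed transmit parameters: 50 kHz sampling, 10 ms chirp sweeping 16-20 kHz
fs = 50_000
t = np.arange(0, 0.01, 1 / fs)
tx = np.sin(2 * np.pi * (16_000 * t + 2e5 * t * t))

# Simulated received signal: two delayed, attenuated copies plus noise
rx = np.zeros(len(tx) + 200)
rx[30:30 + len(tx)] += 0.8 * tx    # strong echo, 30-sample delay
rx[120:120 + len(tx)] += 0.3 * tx  # weaker echo, 120-sample delay
rx += 0.01 * np.random.default_rng(0).standard_normal(len(rx))

profile = echo_profile(tx, rx)
print(int(np.argmax(profile)))  # → 30 (delay of the strongest echo)
```

In a real system, a sequence of such profiles, one per frame, would be the input a deep learning model classifies into silent-speech commands.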
“We’re carrying sonar on the body,” said Cheng Zhang, assistant professor of information sciences and director of Cornell’s Smart Computer Interfaces for Future Interactions (SciFi) lab.
“We’re very excited about this system,” he said, “because it really pushes the boundaries of performance and privacy. It’s small, low-power, and privacy-sensitive, all of which are important features for bringing new wearable technologies into the real world.”
Most silent speech recognition technologies are limited to a select set of predetermined commands and require the user to face or wear a camera, which is neither practical nor feasible, Cheng Zhang said. There are also serious privacy concerns with wearable cameras, both for the user and those with whom the user interacts, he said.
Acoustic sensing technology like EchoSpeech eliminates the need for wearable cameras. And because audio data is much smaller than image or video data, it requires less bandwidth to process and can be transmitted in real time to a smartphone via Bluetooth, said information science professor François Guimbretière.
“And because data is processed locally on your smartphone instead of being uploaded to the cloud,” he said, “privacy-sensitive information never leaves your control.”
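The bandwidth gap between audio and video that makes local, real-time processing feasible can be shown with a back-of-envelope calculation. The figures below are assumed illustrative parameters (16-bit mono audio at 50 kHz versus uncompressed 640x480 grayscale video at 30 fps), not specifications from the EchoSpeech paper.

```python
# Raw data rates under assumed parameters, in bits per second
audio_bps = 50_000 * 16            # 16-bit mono audio at 50 kHz
video_bps = 640 * 480 * 8 * 30     # 8-bit 640x480 grayscale at 30 fps

print(audio_bps)                   # → 800000 (~0.8 Mbit/s)
print(video_bps)                   # → 73728000 (~74 Mbit/s)
print(round(video_bps / audio_bps))  # → 92: video is ~92x larger, uncompressed
```

Even an ultrasound-band audio stream fits comfortably within Bluetooth's throughput, while an uncompressed video stream does not, which is one reason a camera-free acoustic approach can run entirely on a phone.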