Machine Translates Thoughts Into Speech In Real Time

December 21, 2009, by Lisa Zyga

Figure: Model of the brain-machine interface for real-time synthetic speech production. The stroke-induced lesion (red X) disables speech output, but speech motor planning in the cerebral cortex remains intact. Signals collected from an electrode in the speech motor cortex are amplified and sent wirelessly across the scalp as FM radio signals. The Neuralynx System amplifies, converts, and sorts the signals. The neural decoder then translates the signals into speech commands for the speech synthesizer. Credit: Guenther, et al.

By implanting an electrode into the brain of a person with locked-in syndrome, scientists have demonstrated how to wirelessly transmit neural signals to a speech synthesizer. The “thought-to-speech” process takes about 50 milliseconds, the same amount of time it takes a non-paralyzed, neurologically intact person to speak their thoughts. The study marks the first successful demonstration of a permanently installed, wireless implant for real-time control of an external device.

The study is led by Frank Guenther of the Department of Cognitive and Neural Systems and the Sargent College of Health and Rehabilitation Sciences at Boston University, as well as the Division of Health Science and Technology at Harvard University-Massachusetts Institute of Technology. The research team includes collaborators from Neural Signals, Inc., in Duluth, Georgia; StatsANC LLC in Buenos Aires, Argentina; the Georgia Tech Research Institute in Marietta, Georgia; the Gwinnett Medical Center in Lawrenceville, Georgia; and Emory University Hospital in Atlanta, Georgia.
The team published their results in a recent issue of PLoS ONE.

“The results of our study show that a brain-machine interface (BMI) user can control sound output directly, rather than having to use a (relatively slow) typing process,” Guenther said.

In their study, the researchers tested the technology on a 26-year-old male who had a brain stem stroke at age 16. The brain stem stroke caused a lesion between the volunteer’s motor neurons that carry out actions and the rest of the brain; while his consciousness and cognitive abilities are intact, he is paralyzed except for slow vertical movement of the eyes. The rare condition is called locked-in syndrome.

Five years ago, when the volunteer was 21 years old, the scientists implanted an electrode near the boundary between the speech-related premotor and primary motor cortex (specifically, the left ventral premotor cortex). Neurites began growing into the electrode and, in three or four months, the neurites produced signaling patterns on the electrode wires that have been maintained indefinitely.

Three years after implantation, the researchers began testing the brain-machine interface for real-time synthetic speech production. The system is “telemetric”: it requires no wires or connectors passing through the skin, eliminating the risk of infection. Instead, the electrode amplifies and converts neural signals into frequency-modulated (FM) radio signals. These signals are wirelessly transmitted across the scalp to two coils, which are attached to the volunteer’s head using a water-soluble paste. The coils act as receiving antennas for the RF signals. The implanted electrode is powered by an induction power supply via a power coil, which is also attached to the head.

The signals are then routed to an electrophysiological recording system that digitizes and sorts them.
The sorted spikes, which contain the relevant data, are sent to a neural decoder that runs on a desktop computer. The neural decoder’s output becomes the input to a speech synthesizer, also running on the computer. Finally, the speech synthesizer generates synthetic speech (in the current study, only three vowel sounds were tested). The entire process takes an average of 50 milliseconds.

As the scientists explained, there are no previous electrophysiological studies of neuronal firing in speech motor areas. In order to develop an accurate neural coding scheme, they had to rely on an established neurocomputational model of speech motor control. According to this model, neurons in the left ventral premotor cortex represent intended speech sounds in terms of “formant frequency trajectories.”

In an intact brain, these frequency trajectories are sent to the primary motor cortex, where they are transformed into motor commands to the speech articulators. However, in the current study, the researchers had to interpret these frequency trajectories in order to translate them into speech. To do this, the scientists developed a two-dimensional formant frequency space, in which different vowel sounds can be plotted based on two formant frequencies (whose values are represented on the x and y axes).

“The study supported our hypothesis (based on the DIVA model, our neural network model of speech) that the premotor cortex represents intended speech as an ‘auditory trajectory,’ that is, as a set of key frequencies (formant frequencies) that vary with time in the acoustic signal we hear as speech,” Guenther said. “In other words, we could predict the intended sound directly from neural activity in the premotor cortex, rather than try to predict the positions of all the speech articulators individually and then try to reconstruct the intended sound (a much more difficult problem given the small number of neurons from which we recorded).
This result provides our first insight into how neurons in the brain represent speech, something that has not been investigated before since there is no animal model for speech.”

To confirm that the neurons in the implanted area were able to carry speech information in the form of formant frequency trajectories, the researchers asked the volunteer to attempt to speak in synchrony with a vowel sequence that was presented auditorily. In later experiments, the volunteer received real-time auditory feedback from the speech synthesizer. During 25 sessions over a five-month period, the volunteer significantly improved the thought-to-speech accuracy. His average hit rate increased from 45% to 70% across sessions, reaching a high of 89% in the last session.

Although the current study focused only on producing a small set of vowels, the researchers think that consonant sounds could be achieved with improvements to the system. While this study used a single three-wire electrode, the use of additional electrodes at multiple recording sites, as well as improved decoding techniques, could lead to rapid, accurate control of a speech synthesizer that could generate a wide range of sounds.

“Our immediate plans involve the implementation of a new synthesizer that can
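
For readers who want a concrete picture of the two-dimensional formant space described in the article, here is a minimal Python sketch. It is an illustration under stated assumptions, not the study's decoder: a plain linear map stands in for the neural decoder (the article does not say which method was used), the weights and bias are made-up numbers, and the vowel targets are approximate textbook averages for adult male speakers rather than values from the study. The function names `decode_formants` and `nearest_vowel` are hypothetical.

```python
import math

# Approximate average (F1, F2) targets in Hz for three vowels.
# ASSUMPTION: textbook-style values for adult male speakers, chosen for
# illustration; the article does not list the vowels or targets used.
VOWEL_TARGETS = {
    "a": (730.0, 1090.0),  # roughly the vowel in "hot"
    "i": (270.0, 2290.0),  # roughly the vowel in "heat"
    "u": (300.0, 870.0),   # roughly the vowel in "hoot"
}


def decode_formants(rates, weights, bias):
    """Map firing rates (spikes/s) to a point (F1, F2) in Hz.

    ASSUMPTION: a simple linear map, used here only as a stand-in for
    the study's neural decoder. `weights` has two rows (one per
    formant) and `bias` has two entries.
    """
    f1 = bias[0] + sum(w * r for w, r in zip(weights[0], rates))
    f2 = bias[1] + sum(w * r for w, r in zip(weights[1], rates))
    return f1, f2


def nearest_vowel(f1, f2, targets=VOWEL_TARGETS):
    """Return the vowel whose target lies closest in the F1-F2 plane."""
    return min(targets, key=lambda v: math.dist((f1, f2), targets[v]))


# Hypothetical decoder parameters for two sorted units.
EXAMPLE_WEIGHTS = [[10.0, -5.0], [-20.0, 40.0]]
EXAMPLE_BIAS = (500.0, 1500.0)

f1, f2 = decode_formants([5.0, 20.0], EXAMPLE_WEIGHTS, EXAMPLE_BIAS)
print(f1, f2, nearest_vowel(f1, f2))
```

In a real-time system of the kind described, a step like `decode_formants` would run on each new batch of sorted spikes and the resulting formant pair would drive the synthesizer continuously; the nearest-target lookup is only one simple way to score whether an attempted vowel landed on its target.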
