Real-time machine translation may soon be upon us, opening up a slew of possibilities for systems that break down language barriers around the world. The current state of speech recognition technology suggests that though there is still much work to be done, such systems could start appearing over the next few years.

Microsoft Research recently demoed a functional speech recognition system showing that not only can spoken English be machine-translated and spoken back in another language, but the spoken translation can also be rendered in the speaker’s original tone and cadence. Watch the real-time machine translation demo from English to Chinese below.

Computer scientists have been working on speech recognition systems for over sixty years. The first approach was simple pattern matching: comparing incoming speech waveforms against reference waveforms associated with specific words. The next breakthrough came in the late 1970s, when Carnegie Mellon University researchers used hidden Markov models to build robust statistical speech models. With faster computers thanks to Moore’s Law, the ability to process far more data, and better methods, speech systems have steadily improved.
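The hidden Markov model approach treats recognition as finding the most likely sequence of hidden states (e.g. phonemes) behind a sequence of observed acoustic events, typically with the Viterbi algorithm. The sketch below is a toy illustration of that idea; the states, observation labels, and probabilities are made-up assumptions, nothing like a real acoustic model.

```python
# Toy HMM decoding with the Viterbi algorithm. Every state name,
# observation symbol, and probability here is an illustrative
# assumption, not real acoustic modeling data.

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, state sequence) of the most likely path."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    path = {s: [s] for s in states}

    for obs in observations[1:]:
        V.append({})
        new_path = {}
        for s in states:
            # Pick the best previous state to have transitioned from.
            prob, prev = max(
                (V[-2][p] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path

    best_prob, best_state = max((V[-1][s], s) for s in states)
    return best_prob, path[best_state]

# Two hypothetical phoneme-like states emitting coarse acoustic labels.
states = ("vowel", "consonant")
start_p = {"vowel": 0.5, "consonant": 0.5}
trans_p = {
    "vowel": {"vowel": 0.3, "consonant": 0.7},
    "consonant": {"vowel": 0.7, "consonant": 0.3},
}
emit_p = {
    "vowel": {"loud": 0.8, "quiet": 0.2},
    "consonant": {"loud": 0.2, "quiet": 0.8},
}

prob, seq = viterbi(("loud", "quiet", "loud"), states, start_p, trans_p, emit_p)
print(seq)  # → ['vowel', 'consonant', 'vowel']
```

Real systems chain thousands of such states together and learn the probabilities from hours of recorded speech, but the decoding principle is the same.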

Microsoft Research’s recent breakthrough, developed in conjunction with the University of Toronto, uses deep neural networks, a technique that models speech recognition loosely on the behavior of the human brain. This method enables even more discriminating and reliable speech recognition systems. Rick Rashid, Microsoft’s Chief Research Officer, shared his thoughts on machine translation and recapped his presentation in a guest post.
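In a deep neural network, layers of simple units each take a weighted sum of their inputs and apply a nonlinearity, gradually transforming raw acoustic features into phoneme probabilities. The following sketch shows a single forward pass through a tiny two-layer network; the feature values and weights are made-up illustrations, not a trained model.

```python
# Minimal forward pass of a two-layer neural network over one
# acoustic feature frame. All numbers below are hypothetical.
import math

def relu(v):
    # Nonlinearity: negative activations are clipped to zero.
    return [max(0.0, x) for x in v]

def dense(v, weights, biases):
    # Fully connected layer: each output unit is a weighted sum
    # of all inputs plus a bias term.
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, biases)]

def softmax(v):
    # Convert raw scores into probabilities that sum to 1.
    exps = [math.exp(x) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-dimensional acoustic feature frame.
frame = [0.2, -1.0, 0.5]

# Hidden layer (2 units), then output layer (2 phoneme classes).
hidden = relu(dense(frame,
                    [[0.5, -0.3, 0.8], [-0.6, 0.9, 0.1]],
                    [0.1, 0.0]))
probs = softmax(dense(hidden,
                      [[1.2, -0.7], [-1.2, 0.7]],
                      [0.0, 0.0]))
print(probs)  # two class probabilities summing to 1
```

Production acoustic models stack many such layers and learn the weights from large speech corpora, which is what makes the "deep" networks more discriminating than earlier statistical models.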

Find out more about Microsoft Research’s machine translation projects on their website, or watch the full nine-minute presentation on the state of Microsoft’s speech recognition projects, including a discussion of the research above.

(Cover image: Find Biometrics)