The Babbel Blog

language learning in the digital age

Tech Background: Babbel Speech Recognition

Posted on June 28, 2010

Interview with Technical Director Thomas Holl


Speech recognition is the exciting new feature at Babbel. It’s not only fun; it’s also amazingly efficient for learning a new language. But how does it work? I got the lowdown from our Technical Director Thomas.

Crisi: What does the new speech recognition tool do?

Thomas: Basically, we use pronunciation samples recorded by our native-speaking course editors and compare your pronunciation to theirs. As always with Babbel, you get instant feedback. The closer your pronunciation is to the reference sample, the more points you get on a scale from 0 to 100. If you get more than 50 points, you’re good enough to be generally understood.
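
Editor's sketch: the interview does not say how Babbel turns a comparison into points, so the following is only an illustration of the idea. It assumes some non-negative "distance" between your recording and the reference (0 meaning identical) and maps it to the 0 to 100 scale with an exponential curve. The names `similarity_score`, `is_understandable`, and the `scale` parameter are hypothetical; only the 0 to 100 range and the 50-point threshold come from Thomas's description.

```python
import math

# From the interview: more than 50 points means "generally understood".
PASS_THRESHOLD = 50


def similarity_score(distance: float, scale: float = 1.0) -> int:
    """Map a non-negative distance to a score from 0 to 100.

    A distance of 0 (identical to the reference) gives 100 points;
    larger distances decay towards 0. The exponential curve and the
    `scale` parameter are assumptions made for this illustration.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return round(100 * math.exp(-distance / scale))


def is_understandable(score: int) -> bool:
    """Apply the 50-point rule of thumb mentioned in the interview."""
    return score > PASS_THRESHOLD
```

With these assumptions, `similarity_score(0.0)` gives the full 100 points, and any score above 50 passes `is_understandable`.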

Crisi: But if you just compare two sounds, is that really speech recognition?

Thomas: Sure, we recognize what you say. We’re now sitting in front of the screen and we are talking but you see that the score is 0 all the time. Now, try saying arrivederci.

Crisi: Arrivederci

Thomas: Nice, 78 points. Better than Aldo Raine in “Inglourious Basterds”. Remember the hilarious scene where Brad Pitt is trying to speak Italian? We ran his pronunciation through our analysis and, as you might expect, he scored pretty low. But I’m digressing, sorry. Back to our little test. Your pronunciation is about a 78% match to our reference sample. That’s pretty good.

Crisi: Still, it’s only about comparing sounds, not about understanding what I say.

Thomas: Well, there are different sub-types of speech recognition. One is speech-to-text or voice control. That’s what you’d use to enter text or commands if you can’t use a keyboard. Recognizing words and evaluating their pronunciation is another sub-type, and that’s the technology that makes sense for language learning. We can use it for pronunciation training and for building new interactive exercises.

Crisi: So, what’s the technical challenge in this sub-type of speech recognition?

Thomas: Well, it’s not as easy as it sounds – no pun intended. It’s actually not enough to just compare two sounds. It’s a little like telling how similar two people look in two different photos. The audio samples are usually pretty different: a woman has a higher voice than a man and the tempo of speech also differs a lot. And then you have a number of artifacts…
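
Editor's sketch: one classic way to compare two recordings spoken at different speeds is dynamic time warping (DTW), which stretches or compresses one sequence so it lines up with the other before measuring the difference. The interview does not say which method Babbel uses, so this is an illustration of the general problem, not their algorithm. It assumes each recording has already been reduced to a list of numbers (for example, one loudness or pitch value per short frame); the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Returns the minimum total |a[i] - b[j]| cost over all monotonic
    alignments, so the same shape spoken slower or faster still
    yields a small distance.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # prev[j] is the best cost of aligning the rows seen so far with b[:j].
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            # Extend the cheapest of: repeat b[j-1], repeat x, or advance both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]
```

For example, `dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3])` is 0: the second sequence is the first one "spoken" at half speed, and the warping absorbs the tempo difference.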

Crisi: Artifacts?

Thomas: Noises and characteristics that are caused by the environment or the technical setup: rumbling, hissing, other sounds mixing into the voice. Most people don’t have a high-end microphone connected to their computer and in our case we just use the built-in mic on my laptop. The audio quality of what the system is hearing is pretty poor.

Crisi: So to make the speech recognition work properly, our users need to have a good mic and be in a quiet room?

Thomas: No, that’s the point: we can also work with cheap microphones and filter out noise in the immediate environment. That’s part of the challenge.

Crisi: Sounds like a lot of filtering and levelling…

Thomas: Yes, that also, but there’s more: We have to distil the “core” of the voice sample and then match that to the original. To do that, the system needs to figure out when you start and stop speaking. You don’t have to press any key to start and stop recording; we do the matching in real-time.
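
Editor's sketch: a simple way to work out when someone starts and stops speaking is to cut the audio into short frames and look for frames whose energy rises above a threshold. This is a textbook approach shown for illustration; the interview does not describe Babbel's actual detector. The function name `find_speech_bounds`, the frame size, and the threshold value are all assumptions, and samples are assumed to be floats between -1.0 and 1.0.

```python
def find_speech_bounds(samples, frame_size=160, threshold=0.02):
    """Locate speech in a list of audio samples by frame energy.

    Splits `samples` into non-overlapping frames of `frame_size`
    samples, computes the RMS level of each, and returns a
    (start, end) pair of sample indices spanning the first to the
    last frame louder than `threshold`. Returns None if no frame
    is loud enough. Trailing samples that do not fill a whole
    frame are ignored.
    """
    active_starts = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = (sum(s * s for s in frame) / frame_size) ** 0.5
        if rms > threshold:
            active_starts.append(start)
    if not active_starts:
        return None
    return active_starts[0], active_starts[-1] + frame_size
```

A real-time system would run the same check frame by frame as audio arrives instead of on a finished recording, and would need a threshold that adapts to the background noise of a cheap microphone.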

Crisi: So everything we say into the system here is somehow analyzed?

Thomas: Right. Just look at the level: every sound input is analyzed and matched to the sound we’re looking for. In this case, arrivederci.

Crisi: 55 points

Thomas: Ok, yours is better than mine. But you see that the word was recognized among all the other things we said.

Crisi: Is this unique technology? Are there other software products that do this?

Thomas: There are a number of software products that have speech recognition. Some of them are also of decent quality.

Crisi: So what’s so special about the Babbel speech recognition?

Thomas: Well, it’s online and works in your browser.

Crisi: Does this mean that everything we say here is sent to the Babbel servers and analyzed there?

Thomas: No, the whole audio processing is done instantly, directly in the browser. We don’t have to send the audio to the server and that’s why we can give instant feedback.

Crisi: Do I have to install a plugin or something?

Thomas: You don’t. It’s all done in Flash. 97% of all browsers have the Flash plugin pre-installed. As we use the latest version, you might have to do an update, but that’s very quick. Other than that, you just need a microphone like the one that’s built into my laptop.

Crisi: Babbel has been online since January 2008. Why did it take so long to add this feature?

Thomas: We needed the new Flash Player 10.1 because before that it wasn’t possible to do audio processing locally. It would have been necessary to either send all the audio to the server for analysis or to use a custom browser plugin.

Crisi: What’s wrong with a custom browser plugin?

Thomas: First of all, you have to install new software on your computer. And then you have compatibility issues. There are some rare solutions that offer real-time speech recognition in a browser plugin, but most of them won’t work on your Mac and none of them are compatible with all browsers. Flash is already there, the plugin works fine and it’s available for all platforms.

Crisi: How about the iPhone? You can’t use Flash technology on that platform, can you?

Thomas: No, but the Babbel iPhone apps work natively on the iPhone anyway.

Crisi: Natively?

Thomas: The Babbel apps are built specifically for the iPhone and don’t need a browser or plugin to work. That’s called a “native” application. We can build our algorithm directly into the app.

Crisi: That’s not related to Native Instruments, the software company you used to work for?

Thomas (laughs): No, not directly. But for an audio software company, Native Instruments is definitely a great name, because the software works natively on the computer.

Crisi: I guess we don’t have to understand that completely. But speaking of audio software: has your audio expertise (along with that of the other Babbel founders) been crucial for this new feature, or is it something entirely different from building DJ tools?

Thomas: Both. Of course, working on beat detection and time stretching for music and building a speech recognition tool are two different things. On the other hand, we couldn’t have done this in-house without our background.

Crisi: So who actually implemented the new feature?

Thomas: Most of it was done by Toine Diepstraten, one of the Babbel founders. He and I started working together on audio software in our first company, d-lusion, more than 10 years ago. Toine is one of the best developers and audio specialists I’ve ever met. It’s fantastic to have him on board for this project. He did have to do quite a bit of research, but without his expertise this would never have been possible. This way we have state-of-the-art technology that can compare with any other implementation.

Crisi: You sound very convinced.

Thomas: From a technical point of view, this is a great piece of software. We actually got some recognition from Adobe, the makers of the Flash Player. They were pretty impressed by our solution.

Crisi: Will this be a focus for Babbel from now on, or do you plan to work on other types of features?

Thomas: It is a very important feature because now we can do everything online that traditional e-learning software can do locally. And we don’t need installation or updates and we have a very lively online community that goes together with the self-directed learning…

Crisi: But?

Thomas: It’s important but it’s not the end. We’ll keep working and adding new features.

Crisi: Can you say what’s next for Babbel?

Thomas: Sorry, but for that we’ll have to turn off the mic.

Crisi: No problem.



I am speaking and saying words, but it’s not recognizing me. I have headphones. It should work; they are not the cheapest ones.

I have followed the instructions but my mike is not picked up by BABBEL. The mike works on SKYPE etc. Where is the problem?

My Flash Player is the latest edition.

How can I get voice recognition working on my Mac laptop? I have enabled the internal microphone via system preferences.

Hi Philip, please check our FAQ and the guide on how to set up speech recognition. If this doesn’t help, please contact our support team. Thank you!

Sorry, but I am terribly disappointed by your voice recognition. I used to be good at Spanish and figured I’d give it a try to recall all that I have forgotten, and I kept getting 0. Then I checked against French, which is my native language. Guess what… zeroes, unless I pull out a dumb pronunciation to sound like the recording.

Hello François,
thank you for your message. We are working on improving our speech recognition tool. It needs to be used in a quiet place and must be calibrated before use.

The Babbel Team

I’m getting no feedback from speech recognition. My microphone works well and I’ve set it up for Babbel, but still no joy. I’m probably doing something stupid. Can you help?


Yeah! I definitely need an iPhone!

That’s another reason to have an iPhone.
