The Babbel Blog

Online Language Learning

Tech Background: Babbel Speech Recognition

Posted on June 28, 2010 by

Interview with Technical Director Thomas Holl

 

Speech recognition is the exciting new feature at Babbel. It’s not only fun, it’s also amazingly efficient for learning a new language. But how does it work? I got the lowdown from our Technical Director Thomas.

Crisi: What does the new speech recognition tool do?

Thomas: Basically, we use pronunciation samples recorded by our native-speaker course editors and compare your pronunciation to theirs. As always with Babbel, you get instant feedback: the closer your pronunciation is to the reference sample, the more points you get on a scale from 0 to 100. If you score more than 50 points, you’re good enough to be generally understood.
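The interview doesn’t reveal how a comparison turns into points. Purely as an illustration, suppose some matcher yields a non-negative distance between your recording and the reference (0 for a perfect match). One simple, hypothetical way to map that onto the 0 to 100 scale is an exponential decay, with the 50-point “generally understood” threshold checked separately. The names `pronunciation_score` and `is_understandable` and the `scale` constant are invented for this sketch and are not taken from Babbel’s code.

```python
import math


def pronunciation_score(distance: float, scale: float = 1.0) -> int:
    """Map a learner-vs-reference distance onto a 0..100 score.

    Hypothetical mapping (not Babbel's): distance 0 gives 100 points, and the
    score halves every time the distance grows by scale * ln(2).
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return round(100 * math.exp(-distance / scale))


def is_understandable(score: int, threshold: int = 50) -> bool:
    # From the interview: more than 50 points means generally understood.
    return score > threshold


if __name__ == "__main__":
    for d in (0.0, 0.25, 1.0, 3.0):
        s = pronunciation_score(d)
        print(f"distance={d:.2f} score={s} understandable={is_understandable(s)}")
```

Where exactly the curve crosses 50 is a tuning decision; a real system would calibrate it against recordings from actual learners.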

Crisi: But if you just compare two sounds, is that really speech recognition?

Thomas: Sure, we recognize what you say. We’re sitting in front of the screen talking right now, but as you can see, the score stays at 0 the whole time. Now, try saying arrivederci.

Crisi: Arrivederci

Thomas: Nice, 78 points. Better than Aldo Raine in “Inglourious Basterds” (see our post on his score). Remember the hilarious scene where Brad Pitt tries to speak Italian? We ran his pronunciation through our analysis and, as you might expect, he scored pretty low. But I digress, sorry. Back to our little test: your pronunciation is about 78% exact compared to our reference sample. That’s pretty good.

Crisi: Still, it’s only about comparing sounds, not about understanding what I say.

Thomas: Well, there are different sub-types of speech recognition. One is speech-to-text or voice control. That’s what you’d use to enter text or commands if you can’t use a keyboard. Recognizing words and evaluating their pronunciation is another sub-type, and that’s the technology that makes sense for language learning. We can use it for pronunciation training and for building new interactive exercises.

Crisi: So, what’s the technical challenge in this sub-type of speech recognition?

Thomas: Well, it’s not as easy as it sounds (no pun intended). It’s actually not enough to just compare two sounds. It’s a little like judging how similar two people look in two different photos. The audio samples are usually quite different: a woman has a higher voice than a man, and speaking tempo also varies a lot. And then you have a number of artifacts…
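Differences in tempo are the classic use case for dynamic time warping (DTW), which aligns two sequences of audio features by letting one stretch or compress in time before summing frame-by-frame distances. The interview doesn’t say whether Babbel uses DTW; this is a minimal sketch of the general technique, assuming each recording has already been turned into a list of feature vectors (for example, spectral-envelope features, which depend less on pitch than the raw waveform does). The mean normalization step removes a constant per-feature offset between recordings, which for cepstral-style features is roughly what a different microphone introduces.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def mean_normalize(frames: Sequence[Frame]) -> List[tuple]:
    """Subtract each feature dimension's mean across the whole recording."""
    dims = len(frames[0])
    means = [sum(f[k] for f in frames) / len(frames) for k in range(dims)]
    return [tuple(f[k] - means[k] for k in range(dims)) for f in frames]


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Dynamic time warping distance, normalized by the combined length."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b's frame is repeated
                cost[i][j - 1],      # b advances, a's frame is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m] / (n + m)


if __name__ == "__main__":
    reference = [(0.0,), (1.0,), (2.0,), (3.0,)]
    # Same "word", spoken twice as slowly: every frame is held for two steps.
    slow_learner = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,), (3.0,), (3.0,)]
    print(dtw_distance(reference, slow_learner))
```

A plain frame-by-frame comparison would fail here because the two lists have different lengths; DTW finds the alignment where the slow version matches exactly.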

Crisi: Artifacts?

Thomas: Noises and characteristics caused by the environment or the technical setup: rumbling, hissing, other sounds mixing into the voice. Most people don’t have a high-end microphone connected to their computer, and right now we’re just using the built-in mic on my laptop. The audio quality of what the system hears is pretty poor.

Crisi: So to make the speech recognition work properly, our users need to have a good mic and be in a quiet room?

Thomas: No, that’s the point: we can also work with cheap microphones and filter out noise in the immediate environment. That’s part of the challenge.
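The interview doesn’t describe Babbel’s actual filters. As a generic illustration of cleaning up a cheap microphone signal, here are two textbook filters, assuming mono samples as floats: a one-pole high-pass that removes DC offset and low-frequency rumble, and a short moving average that acts as a crude low-pass against high-frequency hiss. The `alpha` and `width` defaults are arbitrary assumptions; a real system would choose cutoffs for its sample rate, and a moving average also dulls the upper range of the voice itself.

```python
from typing import List, Sequence


def high_pass(samples: Sequence[float], alpha: float = 0.95) -> List[float]:
    """One-pole high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).

    Constant or slowly drifting input (DC offset, rumble) decays toward zero,
    while faster changes pass through.
    """
    out: List[float] = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def moving_average(samples: Sequence[float], width: int = 2) -> List[float]:
    """Crude low-pass: mean of the last `width` samples (zeros before the start)."""
    out: List[float] = []
    window_sum = 0.0
    for n, x in enumerate(samples):
        window_sum += x
        if n >= width:
            window_sum -= samples[n - width]
        out.append(window_sum / width)
    return out
```

Chaining the two, for example `moving_average(high_pass(samples))`, gives a rough band-pass that keeps the middle of the spectrum where most speech energy sits.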

Crisi: Sounds like a lot of filtering and levelling…

Thomas: Yes, that too, but there’s more: we have to distil the “core” of the voice sample and then match it to the reference. To do that, the system needs to figure out when you start and stop speaking. You don’t have to press any key to start and stop recording; we do the matching in real time.
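Figuring out when someone starts and stops speaking is usually called endpoint detection or voice activity detection. A common energy-based approach, shown here as a minimal sketch rather than as Babbel’s algorithm, measures the loudness (RMS) of short frames, estimates the background noise from the first few frames (assumed to be silence), and marks frames well above that noise floor as speech. A short “hangover” keeps brief pauses inside a word from ending the segment. The frame length, the factor of 3 and the hangover length are assumed values; 160 samples would be 10 ms at 16 kHz.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS loudness of consecutive, non-overlapping frames (a partial last frame is dropped)."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_endpoints(
    samples: Sequence[float],
    frame_len: int = 160,
    noise_frames: int = 5,
    factor: float = 3.0,
    hangover: int = 3,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the first spoken segment, or None."""
    rms = frame_rms(samples, frame_len)
    if len(rms) <= noise_frames:
        return None
    # Assume the first frames contain background noise only.
    noise_floor = sum(rms[:noise_frames]) / noise_frames
    threshold = factor * max(noise_floor, 1e-6)
    start = last_voiced = None
    for idx in range(noise_frames, len(rms)):
        if rms[idx] > threshold:
            if start is None:
                start = idx
            last_voiced = idx
        elif start is not None and idx - last_voiced > hangover:
            break  # silence lasted longer than the hangover: speech has ended
    if start is None:
        return None
    return start * frame_len, (last_voiced + 1) * frame_len
```

The returned indices could then be used to cut out just the spoken part before it is compared with the reference.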

Crisi: So everything we say into the system here is somehow analyzed?

Thomas: Right. Just look at the level: every sound input is analyzed and matched to the sound we’re looking for. In this case, arrivederci.
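Because there is no record button, the audio has to be examined continuously as it arrives. One way to sketch that, assuming the audio comes in small chunks of arbitrary size (as it would from a microphone callback), is a small state machine that cuts the stream into frames and emits each finished utterance so it can be handed to a matcher. The class name, the fixed threshold and the other defaults are hypothetical; a real system would adapt the threshold to the noise in the room.

```python
import math
from typing import List, Sequence


class StreamingDetector:
    """Hypothetical real-time segmenter: feed microphone chunks, get back utterances."""

    def __init__(self, frame_len: int = 160, threshold: float = 0.05, hangover: int = 3):
        self.frame_len = frame_len
        self.threshold = threshold        # fixed RMS threshold (assumed value)
        self.hangover = hangover          # silent frames tolerated inside an utterance
        self._pending: List[float] = []   # samples not yet forming a full frame
        self._segment: List[float] = []   # utterance currently being recorded
        self._silent_frames = 0

    def feed(self, chunk: Sequence[float]) -> List[List[float]]:
        """Add a chunk of any size; return the utterances that ended within it."""
        finished: List[List[float]] = []
        self._pending.extend(chunk)
        while len(self._pending) >= self.frame_len:
            frame = self._pending[:self.frame_len]
            del self._pending[:self.frame_len]
            rms = math.sqrt(sum(s * s for s in frame) / self.frame_len)
            if rms > self.threshold:
                self._segment.extend(frame)
                self._silent_frames = 0
            elif self._segment:
                self._segment.extend(frame)  # keep short pauses inside a word
                self._silent_frames += 1
                if self._silent_frames > self.hangover:
                    # Silence lasted too long: drop the trailing silent frames
                    # and emit the utterance (e.g. to a matcher that scores it).
                    cut = len(self._segment) - self._silent_frames * self.frame_len
                    finished.append(self._segment[:cut])
                    self._segment = []
                    self._silent_frames = 0
        return finished


if __name__ == "__main__":
    detector = StreamingDetector()
    signal = [0.0] * 480 + [0.5] * 640 + [0.0] * 800
    for i in range(0, len(signal), 100):
        for utterance in detector.feed(signal[i:i + 100]):
            print(f"utterance of {len(utterance)} samples ready for matching")
```

Since each chunk is handled as soon as it arrives, a score can be shown right after the speaker falls silent, without any start or stop button.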

Crisi: 55 points

Thomas: Ok, yours is better than mine. But you see that the word was recognized among all the other things we said.

Crisi: Is this unique technology? Are there other software products that do this?

Thomas: There are a number of software products with speech recognition, and some of them are of decent quality.

Crisi: So what’s so special about the Babbel speech recognition?

Thomas: Well, it’s online and works in your browser.

Crisi: Does this mean that everything we say here is sent to the Babbel servers and analyzed there?

Thomas: No, the whole audio processing is done instantly, directly in the browser. We don’t have to send the audio to the server and that’s why we can give instant feedback.

Crisi: Do I have to install a plugin or something?

Thomas: You don’t. It’s all done in Flash, and 97% of all browsers already have the Flash plugin installed. Since we use the latest version, you might have to update, but that’s very quick. Other than that, you just need a microphone, like the one built into my laptop.

Crisi: Babbel has been online since January 2008. Why did it take so long to add this feature?

Thomas: We needed the new Flash Player 10.1, because before that it wasn’t possible to process audio locally. We would have had to either send all the audio to the server for analysis or use a custom browser plugin.

Crisi: What’s wrong with a custom browser plugin?

Thomas: First of all, you have to install new software on your computer. And then you have compatibility issues. There are a few solutions that offer real-time speech recognition in a browser plugin, but most of them won’t work on a Mac and none of them are compatible with all browsers. Flash is already there, the plugin works fine and it’s available for all platforms.

Crisi: How about the iPhone? You can’t use Flash technology on that platform, can you?

Thomas: No, but the Babbel iPhone apps work natively on the iPhone anyway.

Crisi: Natively?

Thomas: The Babbel apps are built specifically for the iPhone and don’t need a browser or plugin to work. That’s called a “native” application. We can build our algorithm directly into the app.

Crisi: That’s not related to Native Instruments, the software company you used to work for?

Thomas (laughs): No, not directly. But Native Instruments is definitely a great name for an audio software company, because the software works natively on the computer.

Crisi: I guess we don’t have to understand that completely. But speaking of audio software: has your audio expertise (along with that of the other Babbel founders) been crucial for this new feature, or is it something entirely different from building DJ tools?

Thomas: Both. Of course, working on beat detection and time stretching for music and building a speech recognition tool are two different things. On the other hand, we couldn’t have done this in-house without our background.

Crisi: So who actually implemented the new feature?

Thomas: Most of it was done by Toine Diepstraten, one of the Babbel founders. He and I started working together on audio software in our first company, d-lusion, more than 10 years ago. Toine is one of the best developers and audio specialists I’ve ever met, and it’s fantastic to have him on board for this project. He had to do quite a bit of research, and without his expertise this would never have been possible. As a result, we have state-of-the-art technology that can hold its own against any other implementation.

Crisi: You sound very convinced.

Thomas: From a technical point of view, this is a great piece of software. We actually got some recognition from Adobe, the makers of the Flash Player. They were pretty impressed by our solution.

Crisi: Will this be a focus for Babbel from now on, or do you plan to work on other types of features?

Thomas: It is a very important feature, because now we can do everything online that traditional e-learning software does locally. We don’t need installation or updates, and we have a very lively online community that goes hand in hand with self-directed learning…

Crisi: But?

Thomas: It’s important but it’s not the end. We’ll keep working and adding new features.

Crisi: Can you say what’s next for Babbel?

Thomas: Sorry, but for that we’ll have to turn off the mic.

Crisi: No problem.

Brad Pitt (as Lt. Aldo Raine) Scores 57 in Italian

Posted on June 24, 2010 by

After we released the new speech recognition feature yesterday and got so much good feedback, we found ourselves in what you might call a funny mood, and we ended up testing some celebrities with our new feature.

It was loads of fun running Brad Pitt’s pronunciation through the tool to see what score he would have gotten for his Americanized Italian in Tarantino’s movie “Inglourious Basterds”. His buongiorno and arrivederci are understandable, but you have to admit the pronunciation is far from perfect! According to our new tool for evaluating pronunciation, Lt. Aldo Raine scores 53 for his buongiorno and 57 for the famous arrivederci. A little practice probably would have helped…

You can test your own pronunciation in any of seven languages on Babbel.com. To practice Italian greetings, just select the beginner’s course. The first lesson is free after a simple registration.

Practice your Pronunciation: New Speech Recognition Tool

Posted on June 23, 2010 by

It’s called learning to “speak” a language, but the sad truth is that few self-directed learners get up the courage to actually open their mouths. Lack of speaking practice can lead to shying away from interactions with real people, which certainly doesn’t help with the final goal of communication.

The idea of our new integrated speech recognition tool, however, is to get learners talking. By prompting learners to say words out loud and test their own pronunciation, Babbel now covers all the bases for truly effective foreign language learning: reading, writing, listening comprehension and speaking.

Some traditional e-learning software includes this sort of tool, but none of it works online in quite the same way. We have built a tool that performs speech recognition in real time without requiring a custom browser plugin. The technology is complex, but using it is easy. For more background, check out the interview with Technical Director Thomas Holl.

To try out the speech recognition tool, just log into Babbel.com, set up the audio in 2 quick steps, and start any vocabulary package or beginner’s course. If you don’t have a user account yet, you can register for free and take one trial lesson. Speak up!
