AI-102 Microsoft Azure AI – Translate language
- Azure Translator Services
Now the next section of the exam has to do with the Translator service. Microsoft Azure Cognitive Services supports text-to-text translation as well as speech-to-speech and speech-to-text translation, that is, going from one language to another. If we look at the documentation for this, we can see that modern translation services use a technique called neural machine translation. So this is not just a simple dictionary lookup of one word for another, which is what caused the typically poor machine translations of the past. Over time this has gotten better and better as machines have started to not only understand the intention of the speech but also translate it so that the intention remains. So we're training this using machine learning. There are over 90 text translation languages and dialects available.
So if we go down here and look at the translations, we can see that there's a list here of all the languages, not just the popular ones. Once you get to 90 languages, you're getting into some less widely spoken languages around the world. In fact, if you look closely among the languages, you'll notice at least two entries that stand out: two forms of Klingon. The rest of them, I believe, are languages spoken by humans on the planet Earth. So that is the ability to do translations, which we've already seen. It also has the ability to detect written languages.
And so we can see a list here of all of the written languages that it can detect. Give it some text and it will tell you which of these languages it is. One of the interesting features is the transliteration method. Transliteration keeps the text in the same language and only changes the script. So for instance, you have some non-Latin scripts such as the Arabic character set, and you can convert back and forth. You can see the arrows here indicate you can go from the Arabic script to a Latin script, still in Arabic, or back again. And it has that for a number of languages, which I'm sure is a great help to many people. There is also a dictionary lookup feature. So if you want to know the definitions of words, again, that's available across many different languages.
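To make the detection and transliteration features a bit more concrete, here is a minimal Python sketch of how those two REST endpoints are typically called. The key, region, and sample strings are placeholders of my own, not values taken from the course.

```python
import requests

# Placeholders for your own Translator resource (not real values).
KEY = "YOUR_TRANSLATOR_KEY"
REGION = "YOUR_RESOURCE_REGION"
ENDPOINT = "https://api.cognitive.microsofttranslator.com"

headers = {
    "Ocp-Apim-Subscription-Key": KEY,
    "Ocp-Apim-Subscription-Region": REGION,
    "Content-Type": "application/json",
}

# Language detection: the service guesses which language the text is written in.
detect = requests.post(
    f"{ENDPOINT}/detect?api-version=3.0",
    headers=headers,
    json=[{"text": "Bonjour tout le monde"}],
)
print(detect.json())  # e.g. a result indicating French with a confidence score

# Transliteration: same language (Arabic), different script (Arabic -> Latin).
translit = requests.post(
    f"{ENDPOINT}/transliterate?api-version=3.0&language=ar&fromScript=Arab&toScript=Latn",
    headers=headers,
    json=[{"text": "مرحبا"}],
)
print(translit.json())
```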
Finally, you can basically customize a language. Now the purpose of that is if you have words, phrases, or terms within your language that are specific to your industry, then you can basically define those words here. So let's say, in terms of Microsoft Azure, perhaps the word Azure is the same in every language; you don't translate the word Azure into the color blue in other languages. And so you can set up a customization that says even if you're translating the word Azure from English into Bulgarian, you retain the word Azure. You don't try to translate it into the Bulgarian word for a light sky blue.
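As a hedged aside, a custom model trained this way is normally referenced from the ordinary translate call by adding a category parameter to the request. The sketch below uses a made-up category ID and placeholder credentials, purely to show where that parameter goes.

```python
import requests

# Placeholders: your key/region, and a made-up Custom Translator category ID.
KEY, REGION = "YOUR_TRANSLATOR_KEY", "YOUR_RESOURCE_REGION"
CATEGORY = "YOUR-CUSTOM-CATEGORY-ID"

url = ("https://api.cognitive.microsofttranslator.com/translate"
       f"?api-version=3.0&from=en&to=bg&category={CATEGORY}")
headers = {
    "Ocp-Apim-Subscription-Key": KEY,
    "Ocp-Apim-Subscription-Region": REGION,
    "Content-Type": "application/json",
}

# "Azure" should come back untranslated if the custom model was trained that way.
print(requests.post(url, headers=headers, json=[{"text": "Microsoft Azure"}]).json())
```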
So that could be very helpful if you've got industry-specific translations in the medical field, the technology field, et cetera. Now, if we want to see this in action, Microsoft does have a Microsoft Translator GitHub repository, and there are some samples in there. And so if we scroll into this one, we can see the same REST API endpoint that we do with some other samples. In this case, it's got the from language and the to language. This example actually has two to languages, so it's taking English text and translating it to German and to Italian; you get both results in your function call. So basically this is the REST API for that. And then when you call it, passing in this "Hello World", you're going to get a response back that contains both the German and Italian versions of Hello World. Very simple and easy to call the translation API. Now, believe it or not, the pricing is quite reasonable. There is a free plan that allows you 2 million characters a month for free.
And so if you just want to play around with translation for demo, testing, training, or development purposes, you can do all of that translation for free; 2 million is quite a lot of characters. If you are moving to a pay-as-you-go plan, then you're looking at around $10 per million characters for any of these function calls. Again, that covers text translation, language detection, the bilingual dictionary, and that transliteration feature. You get charged per character.
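The GitHub sample itself may not be in Python, but a rough Python equivalent of the call just described (English in, German and Italian out) would look something like this. The key and region are placeholders, not values from the repository.

```python
import requests
import uuid

# Placeholders for your own Translator resource.
KEY, REGION = "YOUR_TRANSLATOR_KEY", "YOUR_RESOURCE_REGION"
ENDPOINT = "https://api.cognitive.microsofttranslator.com"

# One call, one from language, two to languages: German and Italian.
params = "?api-version=3.0&from=en&to=de&to=it"
headers = {
    "Ocp-Apim-Subscription-Key": KEY,
    "Ocp-Apim-Subscription-Region": REGION,
    "Content-Type": "application/json",
    "X-ClientTraceId": str(uuid.uuid4()),
}
body = [{"text": "Hello World"}]

response = requests.post(ENDPOINT + "/translate" + params, headers=headers, json=body)
# The response has one entry per input text, with one translation per to language.
print(response.json())
```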
- Speech-to-Speech Audio Translation
So we saw that text-to-text translation is interesting, but pretty straightforward; the code wasn't that long. Now we're moving to the more complicated world of speech-to-speech translation. This is under my own repository of AI-102 files, in the translate directory: the speech-to-speech translation Python script. So this is a bit of a longer script, about 85 lines. You can see that it is using the Cognitive Services SDK, the Speech API in particular. Now we're setting up a couple of different things. We're going to use the Speech API to input some speech, and then there's a translation as well.
So these are sort of two different things. We get the speech configuration, which is built from the key and the region, and we set up a speech synthesizer; that's for the talking part. We are saying the source language is going to be English (US), and the destination languages are going to be two languages, one being French and the other being Indonesian.
So we're translating audio from English to French and Indonesian. Now, this SDK translation recognizer is what we're setting up here. We're creating a recognizer, with our translation configuration carrying the keys. All right, so there's now a point where we're going to ask the speaker to say something, because the default input is the microphone, since we haven't specified another input. So the recognize once command, same as it was before, is going to trigger the audio input. And again, we're passing this into our translation recognizer object. So that's going to return a result, we're going to check that the result exists, and then we're going to go through the two translations.
So we go through them one at a time. Going from English to French is the first one, and we're using the Julie voice for the French. So we're invoking the Speech API speech synthesizer, using the configuration from earlier, with Julie saying the result. And then similarly for Indonesian, we are choosing Andika as the voice, and we're going to speech synthesize that result.
So you can see we've already used a number of APIs here, including the translation and then the speaking of it, the speech-to-audio output effectively. So let's look at how this runs within Python. We switch over to PyCharm; this is our script again. We set it up with the right keys and such, then we can execute it within PyCharm. We're expecting it to ask for input: "The quick brown fox jumps over the lazy dog." And now it returns a result. So we've seen it take the English audio input and translate it to French and Indonesian, as well as using the audio player to play those, which is pretty cool.
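For reference, here is a condensed sketch of what that speech-to-speech flow looks like with the Python Speech SDK. This is not the exact repository script; the key, region, and neural voice names below are my own placeholders, chosen only to illustrate the pattern of recognize once, then synthesize each translation.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders for your own Speech resource.
KEY, REGION = "YOUR_SPEECH_KEY", "YOUR_SPEECH_REGION"

# Translation config: recognize US English, translate to French and Indonesian.
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=KEY, region=REGION)
translation_config.speech_recognition_language = "en-US"
translation_config.add_target_language("fr")
translation_config.add_target_language("id")

# No audio config is given, so the recognizer uses the default microphone.
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config)

print("Say something...")
result = recognizer.recognize_once()  # stops after a single utterance

if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    # result.translations is a dict keyed by target language code.
    voices = {"fr": "fr-FR-DeniseNeural", "id": "id-ID-ArdiNeural"}  # assumed names
    for lang, text in result.translations.items():
        print(f"{lang}: {text}")
        # Speak each translation through the default speaker.
        speech_config = speechsdk.SpeechConfig(subscription=KEY, region=REGION)
        speech_config.speech_synthesis_voice_name = voices[lang]
        synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
        synthesizer.speak_text_async(text).get()
```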
- Speech-to-Text Translation
So we've seen how to do speech-to-speech translation; this is speech-to-text. Now, we've seen speech-to-text when it comes to transcription earlier in this course, so this is going to be very similar, except it's going to change the language along the way. We're using the SDK, the Cognitive Services Speech API, with all the same setup. We're basically setting up the translation configuration, setting the from language to English and the to language to German.
And so by setting up this translation configuration, we're basically telling it what we want to use: we want to take audio in and turn it into text. Now, this recognize once command is going to take input from the default, which happens to be your microphone. This command, actually, I haven't mentioned it before, but we've seen this recognize once a few times so far in this course. The reason why it's called "once" is because it stops after a single utterance is recognized.
So as soon as you say a sentence or a partial sentence, it will stop, up to a maximum of 15 seconds. Okay? It listens for silence. So if you want to handle long-running speech, then you're going to use continuous recognition instead. All right? So we call recognize once, it gets the audio input from the microphone, and then it performs the translation. And what you're expecting for success is that it's going to recognize the text that we input, and it's going to translate it, in this case into German.
There are other error conditions that are tracked as well. Pretty simple code; we're down to about 40 lines compared to the 85 lines of the previous example. So in PyCharm, we take the script and we run it: "The quick brown fox jumps over the lazy dog." It takes the audio input and translates it into German. Just don't ask me to speak it in German.
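And for completeness, here is a minimal sketch of that speech-to-text translation pattern, again with placeholder key and region rather than values from the course repository. It recognizes a single English utterance from the default microphone and prints the German translation, with the common error branches handled.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders for your own Speech resource.
KEY, REGION = "YOUR_SPEECH_KEY", "YOUR_SPEECH_REGION"

# From language: English (US). To language: German.
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=KEY, region=REGION)
translation_config.speech_recognition_language = "en-US"
translation_config.add_target_language("de")

recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config)

print("Speak now...")
result = recognizer.recognize_once()  # single utterance, default microphone

if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    print("Recognized:", result.text)            # the English transcription
    print("German:", result.translations["de"])  # the translated text
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized.")
elif result.reason == speechsdk.ResultReason.Canceled:
    print("Recognition canceled:", result.cancellation_details.reason)
```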