Google's Translatotron can translate speech in the speaker's voice

NurPhoto via Getty Images

Speaking another language may be getting easier. Google is showing off Translatotron, a first-of-its-kind translation model that can directly convert speech from one language into another while maintaining a speaker’s voice and cadence. The tool forgoes the usual step of translating speech to text and back to speech, which can often introduce errors along the way. Instead, the end-to-end technique translates a speaker’s voice directly into another language. The company hopes the work will spur further research built on the direct translation model.

According to Google, Translatotron uses a sequence-to-sequence network model that takes a voice input, processes it as a spectrogram — a visual representation of frequencies — and generates a new spectrogram in a target language. The result is a much faster translation with less likelihood of something getting lost along the way. The model also accepts an optional speaker encoder component that works to preserve the speaker’s voice. The translated speech is still synthesized and sounds a bit robotic, but it can effectively maintain some elements of the original voice. You can listen to samples of Translatotron’s attempts to keep a speaker’s voice as it completes translations on Google Research’s GitHub page. Some are certainly better than others, but it’s a start.
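As a rough illustration of that description (and nothing more), here is a toy, pure-Python sketch of the overall shape: an encoder consumes a source spectrogram, an optional speaker encoder condenses reference audio into a fixed embedding, and a decoder emits target-language spectrogram frames. Every function here is a hypothetical stand-in using simple averages, not the real attention-based networks.

```python
# Toy sketch of Translatotron's shape, per the description above.
# All "networks" are stand-ins (simple averaging), purely illustrative.

def encode(source_spectrogram):
    # Stand-in encoder: one "state" per input frame (here, the frame mean).
    return [sum(frame) / len(frame) for frame in source_spectrogram]

def speaker_embedding(reference_spectrogram):
    # Stand-in speaker encoder: collapse reference audio to one number.
    states = encode(reference_spectrogram)
    return sum(states) / len(states)

def decode(states, n_target_frames, n_bins, speaker=0.0):
    # Stand-in decoder: each output frame mixes the pooled encoder
    # states with the speaker embedding.
    pooled = sum(states) / len(states)
    return [[pooled + speaker] * n_bins for _ in range(n_target_frames)]

src = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # 2 frames x 3 frequency bins
ref = [[0.2, 0.2, 0.2]]                     # reference audio for the voice
out = decode(encode(src), n_target_frames=4, n_bins=3,
             speaker=speaker_embedding(ref))
print(len(out), len(out[0]))  # 4 3
```

The point of the sketch is only the data flow: spectrogram in, spectrogram out, with the speaker embedding injected into the decoder rather than any intermediate text.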

Model architecture of Translatotron

Google has been fine-tuning its translations in recent months. Last year, the company introduced accents in Google Translate that can speak a variety of languages in region-based pronunciations and added more languages to its real-time translation feature. Earlier this year, Google Assistant got an “interpreter mode” for smart displays and speakers that can translate between 26 languages.


Google’s Translatotron converts one spoken language to another, no text involved

Every day we creep a little closer to Douglas Adams’ famous and prescient Babel fish. A new research project from Google takes spoken sentences in one language and outputs spoken words in another — but unlike most translation techniques, it uses no intermediate text, working solely with the audio. This makes it quick, but more importantly lets it more easily reflect the cadence and tone of the speaker’s voice.

Translatotron, as the project is called, is the culmination of several years of related work, though it’s still very much an experiment. Google’s researchers, and others, have been looking into the possibility of direct speech-to-speech translation for years, but only recently have those efforts borne fruit worth harvesting.

Translating speech is usually done by breaking down the problem into smaller sequential ones: turning the source speech into text (speech-to-text, or STT), turning text in one language into text in another (machine translation), and then turning the resulting text back into speech (text-to-speech, or TTS). This works quite well, really, but it isn’t perfect; each step has types of errors it is prone to, and these can compound one another.
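To see why those errors compound, suppose (purely for illustration; these are made-up numbers, not measured accuracies) that each stage passes a sentence through intact with some probability. The chance of surviving all three stages is the product of the per-stage figures:

```python
# Hypothetical per-stage accuracies for a cascaded speech translation
# pipeline. The numbers are illustrative, not measurements.
STAGES = {
    "speech-to-text": 0.92,
    "machine-translation": 0.90,
    "text-to-speech": 0.95,
}

def cascade_accuracy(stage_accuracies):
    """If each stage hands its errors to the next, the chance a sentence
    survives the whole pipeline intact is the product of the per-stage
    accuracies."""
    total = 1.0
    for acc in stage_accuracies:
        total *= acc
    return total

overall = cascade_accuracy(STAGES.values())
print(f"overall: {overall:.3f}")  # 0.92 * 0.90 * 0.95 = 0.7866
```

Even with each stage performing quite well on its own, the end-to-end success rate falls below any individual stage, which is the compounding the paragraph above describes.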

Furthermore, it’s not really how multilingual people translate in their own heads, as testimony about their own thought processes suggests. How exactly it works is impossible to say with certainty, but few would say that they break down the text and visualize it changing to a new language, then read the new text. Human cognition is frequently a guide for how to advance machine learning algorithms.

Spectrograms of source and translated speech. The translation, let us admit, is not the best. But it sounds better!

To that end, researchers began looking into converting spectrograms, detailed frequency breakdowns of audio, of speech in one language directly to spectrograms in another. This is a very different process from the three-step one, and has its own weaknesses, but it also has advantages.
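A spectrogram of this kind can be computed with a short-time Fourier transform: slice the audio into overlapping frames and take the magnitude of each frame's DFT. Here is a minimal pure-Python sketch (no audio libraries, a naive O(n^2) DFT; the frame length and hop are arbitrary illustrative values):

```python
import cmath
import math

def dft_magnitudes(frame):
    # Naive DFT; keep only the first half of the bins (real input).
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2)]

def spectrogram(signal, frame_len=64, hop=32):
    """Split the signal into overlapping frames and take the magnitude
    of each frame's DFT: one spectrogram column per frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return [dft_magnitudes(f) for f in frames]

# A 1 kHz tone sampled at 8 kHz: the energy should land in a single bin.
sr, freq = 8000, 1000.0
signal = [math.sin(2 * math.pi * freq * t / sr) for t in range(512)]
spec = spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin)  # 1000 Hz falls in bin 1000 * 64 / 8000 = 8
```

Real systems use a windowed FFT on a mel scale rather than this raw DFT, but the structure is the same: the resulting grid of frame-by-frequency magnitudes is the "detailed frequency breakdown" the researchers map from one language to another.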

One is that, while complex, it is essentially a single-step process rather than multi-step, which means, assuming you have enough processing power, Translatotron could work quicker. But more importantly for many, the process makes it easy to retain the character of the source voice, so the translation doesn’t come out robotically, but with the tone and cadence of the original sentence.

Naturally this has a huge impact on expression, and someone who relies on translation or voice synthesis regularly will appreciate that not only what they say comes through, but how they say it. It’s hard to overstate how important this is for regular users of synthetic speech.

The accuracy of the translation, the researchers admit, is not as good as the traditional systems, which have had more time to hone their accuracy. But many of the resulting translations are (at least partially) quite good, and being able to include expression is too great an advantage to pass up. In the end, the team modestly describes their work as a starting point demonstrating the feasibility of the approach, though it’s easy to see that it is also a major step forward in an important domain.

The paper describing the new technique was published on arXiv, and you can browse samples of speech, from source to traditional translation to Translatotron, at this page. Just be aware that these are not all selected for the quality of their translation; they serve more as examples of how the system retains expression while getting the gist of the meaning.


eBay’s improved AI translation boosts Spanish-language sales

jejim via Getty Images

Since eBay added artificial intelligence translations for product listings in 2014, sales from the US to Spanish-speaking Latin American nations increased by almost 11 percent, according to a study. Researchers from MIT and Washington University in St. Louis scraped data from the ecommerce platform, and found that the translations helped buyers and sellers overcome language barriers.

Before 2014, eBay had some automatic translation options, but the AI-based system appears to have been a boon for sellers: it’s said to have improved translation quality by around 10 percent. The AI only affected search queries and listing titles (not product descriptions), but the more accurate translations evidently helped people find the items they were looking for, leading to increased sales.

The economists behind the study also looked at AI translations for French, Italian and Russian. They found similar results, though they focused on Latin America because of the way eBay rolled out the translation tools.

The researchers controlled for confounding factors, such as the length of listing titles and marketing spend. They also noted that when eBay added AI translations for Latin America, it added other localization options, including prices in buyers’ own currencies and local deals and promotions.

It makes sense that when people have access to more accurate translations, it’s easier for them to conduct business with folks in other countries. The researchers noted that since they completed their research, Google has released a translation tool that’s even more powerful and has “significantly improved translation quality” as it looks at the context of entire sentences instead of simply converting individual words into another language. Google’s translation services are naturally used more broadly than those on eBay, and as such the economists suggest that, based on their findings, its effect “on cross-border trade could be large.”


Samsung imagines a wraparound smartphone display

WIPO/Samsung

If that whole folding smartphone thing doesn’t work out, Samsung has a lot of other ideas cooking. It recently received patent approval for a continuous display that covers the front while folding around the top and part of the rear of the phone, as spotted by Let’s Go Digital. That would make for some interesting applications, like letting subjects see how they look before you take a photo or showing live language translations on the rear display.

You could choose which part of the continuous display (front or back) to use by hovering your hand or an S Pen stylus over it. Rather than being stuck with a basic front-facing camera, you’d be able to use the superior rear camera for selfies thanks to the rear display. The language translation part is a particularly interesting idea, as it would let each party speak while the other sees the translation — all without the need to flip the screen around.

Samsung continuous screen patent

Because it folds around the device, there could also be a display on the top that shows notifications, messages and so on. That way, it could function like a glorified pager, letting you see messages without even removing the phone from your pocket. If you wanted to reply, you could simply pull out the phone and drag the message from the top to the front display.

Samsung is actually a bit late to this party, as the recently released Vivo NEX Dual Display phone already has a rear screen, although it uses two separate displays, not a continuous one. Vivo has even advanced the posing feature pretty far, introducing a Pose Director that can give your subject pose suggestions from an image library.

However, the translation app is an interesting idea, and a phone like this might look pretty cool thanks to the seamless display that wraps around the top. There’s a chance we might see something like this, but don’t bet any money on it — patents often turn out to be duds.


Google Lens may add translation and restaurant 'filters'


Cherlynn Low/Engadget

As clever as Google Lens can be, it’s still quite limited in what it can do before it points you to another app. You might not have to lean on those other apps quite so often in the near future. In the wake of an initial discovery earlier in April, the 9to5Google team has spotted evidence that Lens could soon include a host of “filters” aimed at fulfilling specific augmented reality tasks. A “translate” filter, for instance, might auto-detect one language and offer to convert it to another instead of simply copying text and asking to launch Google Translate.

There are also references to a “dining” filter that would search nearby restaurants, including popular dishes. A “shopping” filter appears to focus on more generic goods. Combined with the translation feature, it appears as if Google wants to offer a range of specialized searches instead of a one-size-fits-all function.

It’s not certain when the upgraded Lens might arrive, assuming it does at all. With I/O starting on May 7th, though, it wouldn’t be surprising if Google revealed or even released the feature as its developer conference got underway.
