Every day we creep a little closer to Douglas Adams’ famous and prescient Babel fish. A new research project from Google takes spoken sentences in one language and outputs spoken words in another — but unlike most translation techniques, it uses no intermediate text, working solely with the audio. This makes it quick, but more importantly lets it more easily reflect the cadence and tone of the speaker’s voice.
Translatotron, as the project is called, is the culmination of several years of related work, though it’s still very much an experiment. Google’s researchers, and others, have been looking into the possibility of direct speech-to-speech translation for years, but only recently have those efforts borne fruit worth harvesting.
Translating speech is usually done by breaking down the problem into smaller sequential ones: turning the source speech into text (speech-to-text, or STT), turning text in one language into text in another (machine translation), and then turning the resulting text back into speech (text-to-speech, or TTS). This works quite well, really, but it isn’t perfect; each step has types of errors it is prone to, and these can compound one another.
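To make the compounding concrete, here is a minimal sketch of a three-stage cascade. The stage functions, the lookup tables, and the accuracy figures are all hypothetical stand-ins chosen for illustration, not anything from Google's systems, and the arithmetic assumes each stage fails independently of the others, which real systems do not guarantee.

```python
from typing import Callable, Dict, Sequence

Stage = Callable[[str], str]

# Hypothetical toy data standing in for real models.
TOY_TRANSCRIPTS: Dict[str, str] = {"<audio:hola>": "hola"}
TOY_DICTIONARY: Dict[str, str] = {"hola": "hello"}


def speech_to_text(audio: str) -> str:
    """Stand-in STT stage: look up a transcript for an audio token."""
    return TOY_TRANSCRIPTS[audio]


def machine_translate(text: str) -> str:
    """Stand-in MT stage: word-for-word dictionary lookup."""
    return " ".join(TOY_DICTIONARY[word] for word in text.split())


def text_to_speech(text: str) -> str:
    """Stand-in TTS stage: wrap text in an audio token."""
    return f"<audio:{text}>"


def cascade(stages: Sequence[Stage]) -> Stage:
    """Chain stages so each one consumes the previous stage's output."""
    def run(value: str) -> str:
        for stage in stages:
            value = stage(value)
        return value
    return run


def end_to_end_accuracy(stage_accuracies: Sequence[float]) -> float:
    """Chance that every stage is right, assuming independent errors."""
    result = 1.0
    for accuracy in stage_accuracies:
        result *= accuracy
    return result


translate = cascade([speech_to_text, machine_translate, text_to_speech])
translated = translate("<audio:hola>")

# Illustrative per-stage accuracies, not measured values.
overall = end_to_end_accuracy([0.95, 0.90, 0.98])
```

With these made-up figures, three stages that each look strong on their own are all correct together only about 84 percent of the time.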
Furthermore, it’s not really how multilingual people translate in their own heads, as testimony about their own thought processes suggests. How exactly it works is impossible to say with certainty, but few would say that they break down the text and visualize it changing to a new language, then read the new text. Human cognition is frequently a guide for how to advance machine learning algorithms.
Spectrograms of source and translated speech. The translation, let us admit, is not the best. But it sounds better!
To that end, researchers began looking into converting spectrograms, detailed frequency breakdowns of audio, of speech in one language directly to spectrograms in another. This is a very different process from the three-step one, and has its own weaknesses, but it also has advantages.
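For readers unfamiliar with spectrograms, the sketch below computes a plain magnitude spectrogram from raw samples using a naive discrete Fourier transform. It is only meant to show what a "frequency breakdown" is. The frame size, hop length, and test tone are assumptions for illustration, and the researchers' actual feature pipeline is not described here and may well differ.

```python
import cmath
import math
from typing import List


def spectrogram(samples: List[float], frame_size: int, hop: int) -> List[List[float]]:
    """Return one row per frame; columns are frequency bins 0..frame_size // 2."""
    # A Hann window tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        row = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT: correlate the frame with a complex sinusoid at bin k.
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            row.append(abs(total))
        rows.append(row)
    return rows


# A 1,000 Hz test tone sampled at 8,000 Hz (illustrative values).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(256)]
spec = spectrogram(tone, frame_size=64, hop=32)

# Each row's strongest bin, converted back to Hz.
peak_bins = [row.index(max(row)) for row in spec]
peak_hz = [b * SAMPLE_RATE / 64 for b in peak_bins]
```

Each row is a snapshot of which frequencies are present in a short slice of time. Stacked together, the rows form the image-like representation that a direct speech-to-speech model maps from one language to the other.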
One is that, while complex, it is essentially a single-step process rather than a multi-step one, which means that, assuming you have enough processing power, Translatotron could work more quickly. But more importantly for many, the process makes it easy to retain the character of the source voice, so the translation doesn’t come out robotically, but with the tone and cadence of the original sentence.
Naturally this has a huge impact on expression, and someone who relies on translation or voice synthesis regularly will appreciate that not only what they say comes through, but how they say it. It’s hard to overstate how important this is for regular users of synthetic speech.
The accuracy of the translation, the researchers admit, is not as good as the traditional systems, which have had more time to hone their accuracy. But many of the resulting translations are (at least partially) quite good, and being able to include expression is too great an advantage to pass up. In the end, the team modestly describes their work as a starting point demonstrating the feasibility of the approach, though it’s easy to see that it is also a major step forward in an important domain.
The paper describing the new technique was published on arXiv, and you can browse samples of speech, from source to traditional translation to Translatotron, at this page. Just be aware that these are not all selected for the quality of their translation, but serve more as examples of how the system retains expression while getting the gist of the meaning.
If it feels like technological change is happening faster than it used to, that’s because it is.
It took around 12,000 years to move from the agrarian to the industrial revolution but only a couple of hundred years to go from the industrial to the information revolution that’s now propelling us in a short number of decades into the artificial intelligence revolution. Each technological transformation enables the next as the time between these quantum leaps becomes shorter.
That’s why if you are looking backwards to get a sense of how quickly the world around you will change, you won’t realize how quickly our radically different future is approaching. But although this can sometimes feel frightening, there’s a lot we can do now to help make sure we ride this wave of radical change rather than get drowned by it.
Here’s my essential list:
Do what you can to preserve your youth
Scientists are discovering new ways to slow the biological process of aging. It won’t be too long before doctors start prescribing pills, gene therapies, and other treatments to manage getting old as a partly curable disease. Because most of the terrible afflictions we now fear are correlated with age, medically treating aging will push off the date when we might have otherwise developed cancers, heart disease, dementia, and other killers. To maximally benefit from the new treatments for aging tomorrow, we all, no matter what our current age, need to do what we can to take care of our bodies today. That means exercising around 45 minutes a day, eating a healthy and mostly plant-based diet, trying to sleep at least seven hours a night, avoiding too much sun, not smoking, building and maintaining strong communities and support networks, and living a purposeful life. The healthier you are when the anti-age treatments arrive, the longer you’ll be able to maintain your vitality into your later years.
Quantify and monitor your health
You can’t monitor what you can’t measure. If you want to maintain optimal health, you need a way to regularly assess if you are on the right track. Monitoring your health through regular broad-spectrum blood and stool tests, constant feedback about your heart rate and sleep patterns from devices like your Apple Watch or Fitbit, having your genome sequenced, getting a full body MRI, and having a regular colonoscopy may seem like overkill to most people. But waiting until you have a symptom to start assessing your health status is like waiting until your car is careening down a hill to check if the brakes are in order. Some smart people worry that this kind of monitoring of “healthy” people will waste money, overwhelm our already overburdened healthcare system, and cause people unnecessary anxiety. But even the healthiest among us are in the early stages of developing one disease or another. Society will inevitably shift from a model of responsive sick care of people already in trouble to the predictive healthcare trying to keep people out of it. Do you want to be a dinosaur-like victim of the old model or a proactive pioneer of the new one?
Freeze your essential biological materials
Our bodies are a treasure trove of biological materials that could save us in the future, but every morning we still flush gold down the toilet. That gold, our stool, could potentially be frozen so we could repopulate our essential gut bacteria if our microbiome were to take a dangerous hit from antibiotics or illness. Skin cells could be transformed into potentially life-saving stem cells and stored for future use to help rejuvenate various types of aging cells. Our future treatments will be personalized using our own biological materials, but we’ll need to have stored these materials earlier in life to receive the full benefit of these advances. We put money in the bank to ensure our financial security, so why wouldn’t we put some of our biological materials in a bio-bank to have our youngest possible rescue cells waiting for us when we need them and to help ensure our physiological security?
If you plan on ever having children, freeze your eggs or your sperm
More people will soon shift from conceiving children through sex to conceiving them through IVF and embryo selection. The preliminary driver of this will be parents’ increasing recognition that they can reduce the roughly 3% chance their future children will be born with dangerous genetic mutations by having their embryos screened in a lab prior to implantation in the mother. This may seem less exciting than making babies in the back seat of a car, but the health and longevity benefits of screening embryos will ultimately overpower conception by sex kind of like how vaccinating our children has (mostly) overpowered the far more natural option of not doing so. If you are likely to conceive via IVF and embryo selection, why not freeze your eggs, sperm, or embryos when you are at your biological peak and when the chance of passing on genetic abnormalities is lower than it may be later in life?
Manage your public identity
The days of living incognito are over. No matter how aggressively some of us may try to avoid it, our lives leave massive digital footprints that are becoming an essential part of our very identities. The authoritarian government in China is planning to give “social credit” scores evaluating the digitally monitored behavior of each citizen in a creepy and frightening way. But even in more liberal societies we will all be increasingly judged at work, at home, and in our commercial interactions based on our aggregated digital identities. These identities will be based on what we buy, what we post, what we seek, and how and with whom we interact online. Some societies and individuals are smartly trying to exert a level of control over the collection and use of this personal data, but even this won’t change the new reality that our digital identities will significantly influence what options are available to us in life and represent us after we die. Given this, and perhaps sadly, we all need to protect our privacy but also think of our public selves as brands, managing our digitally recorded activity from early on to present ourselves to the world the way we consciously want the world to know us.
Learn the language of code
Our lives will be increasingly manipulated by algorithms few of us understand. Most people who were once good at finding their way now just use their GPS-guided smartphones to get where they need to go. As algorithms touching many different aspects of our lives get better, we will increasingly rely on them to make plans, purchasing decisions, and even significant life choices for us. Pretty much every job we might do and many other aspects of our lives will be guided by artificial intelligence and big data analytics. Fully understanding every detail of how each of these algorithms functions may be impossible, but we’ll be even more at their mercy if we don’t each acquire at least a rudimentary understanding of what code is and how it works. If you can read one book about code, that’s a start. Learning the fundamentals of coding will do even more to help you navigate the fast-arriving algorithmic world.
Become multicultural
Pretty much wherever you were in the 18th century, you needed to understand Europe to operate effectively because European power then defined so many parts of the world. The same was true of the United States in the 20th century: understanding America was imperative for most people living outside of the United States because US actions influenced so many aspects of their lives. For many people living in 20th-century America, understanding the rest of the world was merely interesting. As China rises and global power decentralizes in the 21st century, we’ll all need to learn more about China, India, and other new power, population, and culture centers than ever before. This won’t just help you become a more well-rounded person; it will give you a far greater chance of success in most anything you’ll be doing. Although machine translation will make communicating across languages pretty seamless, you’ll need cultural fluidity and fluency to succeed in the 21st-century world. The good news is that people motivated to learn about other groups and societies now have more resources than ever before to do so. If you want to be ready for our multicultural, multinational future, you’d better start doing all you can to learn about other cultures and societies now.
Become an obsessive learner
Technological change has been a constant throughout human history, but the pace of change is today accelerating far more rapidly than ever before. As innovations across the spectrum of science and technology empower, inspire, and reinforce each other, multiple technological transformations are converging into a revolutionary whole far greater than the sum of its parts. This unprecedented rate of change will mean that much of your knowledge will start becoming obsolete as soon as you acquire it. To keep up in your career and life, you’ll need to dedicate yourself to a lifetime of never-ending, aggressive, continuous, and creativity-driven learning. The only skill worth having in an exponential world will be knowing how to learn, along with a passion for doing it. Call me an old-fashioned futurist, but this learning process must include reading lots of books to help you understand where we have come from and how the disparate pieces of information fit together to create a larger story. This type of knowledge will be an essential foundation of the wisdom we’ll each and all need to navigate our fast-changing world.
Invest in physical community
We humans are a social species. A primary reason we rose to the top of the food chain and built civilization is that our brains are optimized for collaborating with those around us. When we bond with our partners and friends, we realize one of our essential core needs as humans. That’s why people in solitary confinement tend to go a bit crazy. But although our sense of connection, belonging, and community has expanded from the level of clan to village to city to country to, in some ways, the world, we are still not virtual beings. We may get a little dopamine hit whenever someone likes our tweet or Facebook post, but most of us still need a connected physical community around us in order to be happy and to realize our best potential. With all of the virtual options that will surround us (chatbots engaging us in witty repartee, virtual assistants managing our schedules, and even friends messaging from faraway lands among them), our virtual future must remain grounded in our physical world. To build your essential community of flesh and blood people, you must invest in deep and meaningful relationships with the people physically around you.
Don’t get stuck in today
The olden days were, at least in most people’s minds, always better. We used to have better values, a better work ethic, better communities. We used to walk to school uphill in both directions! But while we do need to hold on to the best of the past, we also need to march boldly into the future. Because the coming world will feel like science fiction, we’ll all need to be like science fiction writers, imagining the world ahead and positioning ourselves to shape it for the better. The technologies of the future will be radically new, but we’ll need to draw on the best of our ancient value systems to use them wisely. The exponential future is coming faster than most of us appreciate or are ready for. Like it or not, we are now all futurists.
The ability to quickly and automatically translate anything you see using a web service is a powerful one, yet few expect much from it other than a tolerable version of a foreign article, menu, or street sign. Shouldn’t this amazing tool be put to better use? It can be, and a company called Lilt is quietly doing so — but crucially, it isn’t even trying to leave the human element behind.
By combining the expertise of human translators with the speed and versatility of automated ones, you get the best of both worlds — and potentially a major business opportunity.
The problem with machine translation, when you really get down to it, is that it’s bad. Sure, it won’t mistake “tomato” for “potato,” but it can’t be trusted to do anything beyond accurately translate the literal meaning of a series of words. In many cases that’s all you need — for instance, on a menu — but for a huge amount of content it simply isn’t good enough.
This is much more than a convenience problem; for many, language poses serious professional and personal barriers.
“Information on a huge number of topics is only available in English,” said Lilt co-founder and CEO Spence Green; he encountered this while doing graduate work in the Middle East, simultaneously learning Arabic and the limitations placed on those who didn’t speak English.
Much of this information is not amenable to machine translation, he explained. Imagine if you were expected to operate heavy machinery using instructions run through Google Translate, or perform work in a country where immigration law is not available in your language.
“Books, legal information, voting materials… when quality is required, you need a human in the loop,” he said.
Working on translation projects there and later at Google, where he interned in 2011, Green found himself concerned with how machine translation could improve access to information without degrading it — as most of the systems do.
His realization, which he pursued with co-founder John DeNero, was that machine learning systems worked well not simply as a tool for translation, but as a tool for translators. Working in concert with a translation system makes them faster and better at their work, lightening the cognitive load.
The basic idea of Lilt’s tool is that the system provides translations for the next sentence or paragraph, as a reference for structure, tense, idiom, and so on that the translator can consult and, at least potentially, use to work faster and better. Lilt claims a 5x increase in words per hour translated, and says the results are as good as or better than a strictly human translation.
“We published papers — we knew the technology worked. We’d worked with translators and had done some large-scale experiments,” Green said, but the question was how to proceed.
Talk to a big company and get them interested? “We went through this process of realizing that the big companies are really focused on the consumer applications — not anywhere there’s a quality threshold, which is really the entire translation industry,” Green said.
Stay in academic research, get a grant and open-source it? “The money kind of dried up,” Green explained: money was lavishly allocated after 9/11 with the idea of improving intelligence and communication, but a decade later the sense of urgency had departed, and with it much of the grant cash.
Start a company? “We knew the technology was inevitable,” he said. “The question was who would bring it to market.” So they decided it would be them.
Interestingly, a major change in language translation took place around the time they were really getting to work on it. Statistical neural network systems gave way to attention-based ones; these have a natural sort of affinity to efficiently and effectively parsing things like sentences, where each word exists not like a pixel in an image, but is dependent on the words nearby it in a structured way. They basically had to reinvent their core translation system, but it was ultimately for the better.
“These systems have much better fluency — they’re just a better model of language. Second, they learn much faster; you need fewer updates to adapt to a domain,” Green said. That is to say, as far as domains, that the system can quickly accommodate jargon and special rules found in, say, technical writing or real estate law.
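As a rough illustration of what "attention-based" means, the sketch below computes scaled dot-product attention for a single query word over a short sentence. The word vectors are made up for the example, and nothing here reflects Lilt's actual system; it is only a minimal instance of the general technique, in which each word's representation becomes a weighted blend of the words around it.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query: Vector, keys: List[Vector],
              values: List[Vector]) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention for one query over a sequence."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The context vector is the weighted average of the value vectors.
    context = [sum(w * value[i] for w, value in zip(weights, values))
               for i in range(dim)]
    return weights, context


# Hypothetical 2-D vectors for a three-word sentence.
word_vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]  # a query that resembles the first word most closely

weights, context = attention(query, word_vectors, word_vectors)
```

The weights show how strongly the query "looks at" each word. In a real translation model these vectors are learned, and many such attention computations run in parallel.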
Of course, you can’t just sprint into the midst of the translation business, which spans publishing, real-time stuff, technical documents, and a dozen other verticals, and say “here, use AI!”
“There’s enormous structural resistance in the industry to automating in any real way,” Green said. There was no way a major publishing house was going to change the way it worked.
“We tried several business models before we found one that works. There really hasn’t been a company that has decided ‘Okay, this human-in-the-loop method is the fundamental way to solve this problem, let’s just build a company around that.’ So we’re vertically integrated, we work with big enterprises and governments, and we just own the entire translation workflow for them.”
A faster method that doesn’t adversely affect translation quality is basically an efficiency multiplier: catnip for organizations that have a lot of content that needs accurate translation but need to get the most for their money.
Think about it like this: if you’re a company that puts out products in 20 countries that speak as many languages, translation of packaging, advertising, documentation, and so on is a task that’s essentially never done. The faster and cheaper you can get it done, the better, and if you have a single company that can handle it all, that’s just a cherry on top.
“We work with Zendesk, Snap, Sprinklr… we just take over the whole localization workflow for them. That helps with international go-to-market,” said Green. If a company’s translation budget and process before using Lilt limited it to targeting 5 or 6 new markets in a given period, that number could double or triple for the same price and staff, depending on efficiency gains.
Right now the company is working on acquiring customers, naturally. “In Q4 last year we built our first sales team,” Green admitted. But initial work with governments especially has been heartening, since they have “more idiosyncratic language needs” and a large volume of text. The 29 languages Lilt supports right now will be 43 by the end of the year. A proofreading feature is in the works to improve the efficiency of editors as well as translators.
They’re also working hard on connecting with academics and building the translation community around Lilt. Academics are both a crucial source of translators and language experts and a major market. A huge majority of scientific literature is only published in English because it would be onerous to translate this highly technical text for others.
Green’s pet peeve seems to be that brilliant researchers are being put to work on boring consumer stuff: “Tech companies are kind of sucking up all the talent and putting them on Assistant or Alexa or something.” It’s a common refrain in frontier tech like AI and robotics.
Finally, Green said, “it’s my great hope that we can close this circle and get into book translation as we go on. It’s less lucrative work but it’s the third part of the vision. If we’re able to, it’s a choice where we’ll feel like we’ve done something meaningful.”
Although it may start out as support documents for apps and random government contracts, the types of content and markets amenable to Lilt’s type of human-in-the-loop process seem likely to only increase. And a future where AI and people work in cooperation is certainly more reassuring than one where humans are replaced. With translation at least, the human touch is nowhere near ready to be excluded.
The Internet is of course amazing if you want to send messages across borders. But different languages can still put a wrinkle in your conversational flow, even with all the handy translation apps also on tap to help turn zut alors into shucks!
So Microsoft-owned SwiftKey is probably still onto something with a new feature launching today in its Android app that bakes two-way translation right into the keyboard, which should save a lot of tedious copy-pasting, at least if you’re frequently conversing across language barriers.
It’s not clear whether the translation feature will be coming to SwiftKey on iOS too (we’ve asked and will update with any additional details).
Microsoft Translator is the underlying technology powering the core linguistic automagic. So SwiftKey’s parent is intimately involved in this feature addition.
Microsoft’s tech does continue to exist in a standalone app form too, though. And that app is getting a cross-promotional push, via the SwiftKey addition, with the company touting an added benefit for users if they install Microsoft Translator — as the keyboard translation feature will then work offline.
(SwiftKey had some 300M active users at the time of its acquisition by Microsoft, three years ago, so the size of that promotional push for Translator is potentially pretty large.)
The translation option is being added to SwiftKey via a relatively recently launched Toolbar that lets users customize the keyboard — such as by adding stickers, location or calendar.
To access the Toolbar (and the various add-ons nested within it) users tap on the ‘+’ in the upper left corner.
With translation enabled, users of the next word predicting keyboard can then switch between input and output languages to turn incoming missives from one of more than 60 languages into another tongue at the tap of a button, as well as translate their outgoing replies back the other way without needing to know how to write in that other language.
Supported languages include Italian, Spanish, German, Russian and Turkish, to name a few.
And while the machine translation technology is doing away with the immediate need for human foreign language expertise, there’s at least a chance app users will learn a bit as they go along — i.e. as they watch their words get rendered in another tongue right before their eyes.
As tech magic goes, translation is hard to beat, even though machine translation can often still be very rough round the edges. But here, for helping with everyday chatting on mobile messaging apps, there’s no doubt it will be a great help.
Commenting on the new feature in a statement, Colleen Hall, senior product manager at SwiftKey, said: “The integration of Microsoft Translator into SwiftKey is a great, natural fit, enhancing the raft of language-focused features we know our users love to use.”