Data: Machine and Professional Human Translations Identical in 5-20% of Cases
Our latest data shows to what extent generic machine translation (MT) engines can help professional translators. Already 5-20% of MT suggestions are good enough to use as the final translation without any changes. Up to 40% of the suggestions are usable after edits, and for 80% of segments MT can provide data for autocompletion.
Memsource is a translation tool for professional translators and translation teams. Users can add a machine translation engine to provide quick suggestions as they go through the text segment by segment.
This year we added the Kibana analytics suite (learn how to set it up here), which allows users to visualize their data.
With the new analytics suite we were able to measure exactly how much MT helps humans translate. In the chart above:
- Match 100: segments where the professional human translation is identical to the MT suggestion
- Match 85 - 95: MT suggestions are close enough to use after edits
- Match 50 - 75: MT is useful for autocompletion of individual words, but not whole segments
- Match 0: segments with a match score of 0-49 against the human translation
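The grouping above can be sketched in a few lines of Python. This is only an illustration, not the scoring used in Memsource's post-editing analysis: the character-level similarity from the standard `difflib` module stands in for the real match score, the band edges (100, 85-99, 50-84, 0-49) are assumed, and the function names `match_score`, `match_band` and `band_shares` are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def match_score(mt_suggestion: str, final_translation: str) -> int:
    """Return a 0-100 similarity score between MT output and the final text.

    Assumption: a character-level difflib comparison stands in for the
    post-editing analysis score; the real metric may differ.
    """
    matcher = SequenceMatcher(None, mt_suggestion, final_translation, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(mt_suggestion) + len(final_translation)
    if total == 0:
        return 100  # two empty segments are trivially identical
    # Integer form of difflib's ratio (2 * matched / total), rounded down,
    # so only identical strings reach 100.
    return (200 * matched) // total


def match_band(score: int) -> str:
    """Map a score to one of the four groups in the list above.

    Assumption: the band edges used here are illustrative.
    """
    if score == 100:
        return "Match 100"
    if score >= 85:
        return "Match 85 - 95"
    if score >= 50:
        return "Match 50 - 75"
    return "Match 0"


def band_shares(pairs):
    """Return the share of segments per band for (MT, final) text pairs."""
    counts = Counter(match_band(match_score(mt, final)) for mt, final in pairs)
    total = sum(counts.values())
    return {band: count / total for band, count in counts.items()}
```

Calling `band_shares` on a list of (MT suggestion, final translation) pairs gives the kind of per-band percentages the chart reports, under the assumptions stated above.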
For this chart we could only track projects where users first enabled machine translation and then performed a post-editing analysis. The resulting sample is about 38 million words in size. It is statistically representative, but of course it is quite small in comparison to the total number of words translated in Memsource, which is currently 600-800 million per month.
MT Works Better with English, French, Spanish and Portuguese
French, Portuguese, Spanish and English machine translation engines have the highest rates of potential MT leverage. English to French stands out with more than 20% of the translations a complete match to MT suggestions, and almost 90% of segments having at least some coherence with the MT.
In comparison, Russian, Polish and Korean have much lower leverage rates: below 40% or even 20% fuzzy matches, and 5% complete matches.
The difference is probably due to the morphological typology of the languages. French, Portuguese, Spanish and English are analytic languages that rely on word order and auxiliary words such as “are” or “will” to convey meaning. Russian, Polish and Korean are synthetic, which means they rely on inflection much more. Machine translation still struggles with inflection.
Should You Use MT?
Not everyone is a fan of machine translation. Some translators and language services companies believe it sometimes compromises quality, or interferes with the linguist's mental focus and self-discipline, which are necessary to deliver a perfect translation.
However, online conversation today calls for fast-paced translation. Customers often need faster turnaround times more than they need high quality. “Done is better than perfect,” as the famous Facebook slogan says. At the same time, MT accuracy is constantly improving. It’s not yet human-level, but it is getting there. Hence there are more and more arguments for enabling MT. This year, Memsource users added machine translation support to 31.8% of their projects.
How Post-Editing Machine Translation Works in Memsource
The Memsource Web Editor window, with suggestions from the translation memory, terminology base and machine translation on the right.
While translating in Memsource, linguists can choose between suggestions from a machine translation engine, a translation memory, or a terminology base, whichever fits best. It is generally advisable to reuse a past translation from the translation memory. Translation memory keeps the language uniform throughout the document, which is great for contracts, interfaces, manuals, and support and technical documentation. However, when translation memory suggestions are not accurate or not available, MT can help.
A translator quickly inserts a machine translation suggestion with a hotkey combination, Ctrl + 3 in the example above. The translator can then either confirm the raw MT suggestion or edit it first.
This process is called “interactive MT post-editing”, and it increases productivity significantly. Translators don’t have to type sentences from scratch, which saves time and effort. Interactive post-editing differs from “classic post-editing” because linguists are not obliged to use MT suggestions. If a suggestion is too far off, they can simply ignore it. Translators commonly use Google Translate or Microsoft Translator, which are generic engines. But there is also an option to work with customised engines fine-tuned to the terminology of a specific customer.
Final Word: On the Big Data series
This article opens the “Translation Big Data” series of posts. We plan to publish more useful statistics about the translation industry based on anonymized data from Memsource Cloud.
Some of the world’s largest translation companies and in-house translation departments use Memsource every day to deliver translations. The volume of documents processed in the system has increased from 100 million words per month throughout 2015 to 0.8 billion words in May 2016. This already represents a noticeable slice of the market, and both the volumes and the user base are growing constantly.
We hope that the data in these posts will offer our users and readers valuable insight into the trends in the industry. If you have a suggestion for a chart or an indicator, please contact Konstantin Dranch.
About the author
Konstantin Dranch is the former head of marketing at Memsource.
His background is in market research. Konstantin has surveyed translation services markets in Russia and Ukraine since 2011, and has recently started similar surveys in the United Kingdom and France.