David's Desk: Machine Translation Post-editing Revisited
It’s been almost six years since we launched Post-editing Analysis, a somewhat revolutionary feature when it was first introduced in 2011. Post-editing Analysis measures the post-editing effort based on an edit distance (the machine translation output vs. the final post-edited translation) and introduces the concept of “machine translation matches” that work very much like “translation memory matches”.
In Memsource, we extended the widely used translation memory matches to machine translation and combined the results in the Post-editing Analysis. This similarity to the well-established translation memory matches was one of the main reasons why it was so well received by our users and why some translation software developers have adopted this approach as well.
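To make the idea concrete, here is a minimal sketch of how an edit distance could be turned into a “machine translation match”. It is an illustration only, not Memsource's actual formula: the function names, the character-level Levenshtein distance, the normalization by the longer segment, and the match bands are all assumptions chosen to resemble typical translation memory match bands.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def match_score(mt_output: str, post_edited: str) -> float:
    """Similarity in percent: 100 means the MT output was left untouched."""
    longest = max(len(mt_output), len(post_edited))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(mt_output, post_edited) / longest)


# Hypothetical bands, modeled on common translation memory match bands.
BANDS = [
    (100, "100%"),
    (95, "95-99%"),
    (85, "85-94%"),
    (75, "75-84%"),
    (50, "50-74%"),
    (0, "0-49%"),
]


def match_band(score: float) -> str:
    """Map a similarity score to the first band whose threshold it reaches."""
    for threshold, label in BANDS:
        if score >= threshold:
            return label
    return BANDS[-1][1]


if __name__ == "__main__":
    mt = "The cat sat on the mat."
    final = "The cat sat on a mat."
    score = match_score(mt, final)
    print(f"{score:.1f}% -> band {match_band(score)}")
```

A segment the translator accepts unchanged lands in the 100% band, while a heavily rewritten one falls into the lowest band, which is what allows machine translation results to be reported and priced alongside translation memory matches.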
Here is a video from 2012 that shows the main functionality of the Post-editing Analysis (by the way, I produced the video myself; at that time we had a staff of five and obviously didn’t have a marketing department!):
The post-editing analysis proved to be extremely useful, not only to establish machine translation post-editing efforts, but also to capture translation memory matches in projects on which multiple translators collaborated.
The standard practice of analyzing translation memory matches before translation, introduced along with desktop translation tools, does not work in a scenario where translators contribute to a translation memory in real time. In that scenario, it is not known in advance whether Translator A will contribute a translation to the translation memory and Translator B will then re-use it, or vice versa. The post-editing analysis feature helped project managers establish who translated what and determine the impact on billing.
Freelance Translators as Early Adopters of Machine Translation
Another interesting by-product of the Post-editing Analysis was the ability to get machine translation a bit more under control. At some point, translation providers and translation buyers realized that some of the more innovative and tech-friendly translators (kudos to them) had decided to start using machine translation as a way to increase their productivity.
Given the prevalence of desktop translation tools at that time, there was almost no way for a translation agency or translation buyer to actually find out whether some of their translators were using MT or not. Adopting machine translation in a professional translation workflow was a lengthy process with a number of technical, process, financial, and other pitfalls; adopting MT as a freelance translator, however, was remarkably easy.
This created a bit of a paradox: there were a number of mainly theoretical discussions at numerous industry conferences asking whether and how to adopt machine translation. At the same time, many innovative freelance translators jumped on the MT bandwagon and just started using it as an additional support tool that greatly increased their translation productivity.
Here is a screenshot from a 2012 webinar I did on machine translation with (mainly) translation agencies and translation buyers confirming they mostly did not know whether their translators used MT or not when translating for them:
Machine translation has moved from being an academic topic at conferences to being a major element of translation production. There is an ever-growing number of machine translation providers to choose from. While Microsoft and Google may win on scale, smaller providers focus on customization and professional services. At Memsource, we have been adding new MT providers on a quarterly basis and have seen a surge in the use of custom-trained machine translation over the last couple of years. We have also seen some major translation providers adopt the post-editing analysis as the default input data for their billing.
Additionally, drawing on customer feedback, we’ve implemented a few improvements to the Post-editing Analysis, such as an option to include translation memory post-editing in the overall results (when translation memory matches are not accurate, they need to be edited and the translator should be paid for that). We have also made some technical and security changes to how the post-editing data is stored, so that it is tamper-proof and can be collected when working offline.
I feel like we are still at the beginning in many ways and there is a lot to come, including from Memsource. Stay tuned…
David Čaněk is the CEO and Head of Product at Memsource.