Post-editing Machine Translation Best Practices
Machine translation (MT) keeps getting faster, cheaper, and more accurate. Yet it still has some way to go before it achieves parity with human translators. Post-editing machine translation (PEMT) combines the best of both worlds: the ability of MT engines to handle large volumes of text quickly, and the skill and sensitivity of trained linguists. Read on to find helpful tips for implementing post-editing in your translation workflows.
Looking for a more general introduction to machine translation? Check out our Beginner’s Guide to Machine Translation.
Set yourself up for success
Perhaps the most important step in post-editing happens right at the start: making sure that you have a high-quality MT output. The better the original engine output, the less work will be required during the post-editing step, leading to faster turnaround times at a lower cost. Here are two points to consider:
1. Start with the Source Text
As with all translation projects, the source text should be carefully created or pre-edited to ensure that both the post-editors and the MT engine are working with the best possible material. Early errors can compound and create problems down the line. That’s why it’s important to make sure that the original text has as few spelling and grammatical errors as possible, and that the terminology and formatting are consistent. The source text should, at the very least, be prepared as if it were going to be handled by a human translator.
For the best results you should consider preparing your text specifically for machine translation. Although MT is rapidly improving, there are certain steps you can take to increase output quality.
Generally, MT works best with input that is clear and concise. An ideal sentence should be under 20 words and have simple grammar. Complex sentences or headline-style sentence fragments do not work well.
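A length check like this is easy to automate before the text goes to the engine. The following is a minimal sketch, assuming a naive sentence splitter and the 20-word limit mentioned above; the function names are made up for illustration, and a real project would likely use a proper sentence segmenter.

```python
import re

MAX_WORDS = 20  # limit taken from the guideline above; adjust to taste

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def long_sentences(text: str, max_words: int = MAX_WORDS) -> list[tuple[int, str]]:
    """Return (word_count, sentence) for every sentence at or over the limit."""
    flagged = []
    for sentence in split_sentences(text):
        count = len(sentence.split())
        if count >= max_words:
            flagged.append((count, sentence))
    return flagged
```

Anything the function returns is a candidate for splitting or simplifying before translation. Abbreviations such as "e.g." will confuse the splitter, so treat the results as hints rather than hard errors.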
MT also tends to struggle with nuance and likes to be as literal as possible. Avoid sarcasm (machines are really good at picking it up), avoid double negatives (they won’t do no good), and where possible write dates in a non-numeric format. A date such as 01/05/2020 is ambiguous: is it the 1st of May or the 5th of January?
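Ambiguous dates can also be caught automatically during pre-editing. Here is a rough sketch, assuming dates are written with /, . or - as separators; the pattern and the function name are illustrative, not part of any particular tool.

```python
import re

# Matches numeric dates such as 01/05/2020, 1-5-20 or 01.05.2020.
# Mixed separators (01/05-2020) are matched too, which is acceptable for a warning.
NUMERIC_DATE = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4}|\d{2})\b")

def ambiguous_dates(text: str) -> list[str]:
    """Return numeric dates whose first two fields could both be a month."""
    found = []
    for match in NUMERIC_DATE.finditer(text):
        first, second = int(match.group(1)), int(match.group(2))
        # 25/12/2020 can only be read one way; 01/05/2020 cannot.
        if 1 <= first <= 12 and 1 <= second <= 12 and first != second:
            found.append(match.group(0))
    return found
```

Each date it reports is worth rewriting in words (for example, "1 May 2020") before the text is sent for translation.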
For other useful guidelines, consider having a look at IBM’s Machine translation tips.
2. Choose the Optimal Translation Engine
There is a large number of MT engines to choose from, and new engines are being developed all the time. Not all engines are created equal: some simply perform better than others, and some are more suitable for specific language pairs or subject matter (domains). Choosing the most effective engine for your project can save a lot of time and effort.
Consider a range of generic MT engines and evaluate them using samples or past experience. Running your own comparison can be time-consuming, but it can lead to cost-effective solutions in the long run. Another option is to consider a customized translation engine, trained using your own data. This will generally produce high-quality results for the content that you are used to working with.
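To give a feel for what a sample-based comparison might look like, here is a small sketch. It assumes you already have a handful of trusted reference translations and the corresponding output from each engine. The engine names are placeholders, and the character-level similarity from Python's standard library is only a crude stand-in for established MT metrics or human judgment.

```python
from difflib import SequenceMatcher
from statistics import mean

def similarity(candidate: str, reference: str) -> float:
    """Character-level similarity in [0, 1]; a crude proxy, not a real MT metric."""
    return SequenceMatcher(None, candidate, reference).ratio()

def rank_engines(outputs: dict[str, list[str]],
                 references: list[str]) -> list[tuple[str, float]]:
    """Average each engine's similarity to the references, best engine first."""
    scores = {
        engine: mean(similarity(c, r) for c, r in zip(candidates, references))
        for engine, candidates in outputs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sample: one reference sentence, two placeholder engines.
references = ["The cat sits on the mat."]
outputs = {
    "engine_a": ["The cat sits on the mat."],
    "engine_b": ["Cat is mat on sitting."],
}
ranking = rank_engines(outputs, references)
```

A ranking like this is only as good as the sample behind it, so use text that is representative of the content and language pairs you actually translate.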
Memsource Translate offers an interesting solution to the problem of MT choice. Developed by the AI team at Memsource, this feature dynamically selects an MT engine for your content: it considers the text’s domain and its source and target languages, then picks the engine that has performed best in the past. Find out more about Memsource Translate here.
Once the MT engine produces its translation, post-editing can begin. How much post-editing is required varies from project to project, so it is important to define your expectations early. The three main considerations are time, quality, and cost. Build your post-editing strategy around these by choosing the right approach.
Light Post-editing (LPE)
With LPE, raw MT is only modified where absolutely necessary to ensure that the output is legible and accurately conveys the meaning of the source document. The post-editor should be especially mindful of errors that might hinder the document’s purpose or outright subvert it. Without review, raw MT can create embarrassing results, as one tech giant discovered recently. The editor should aim to make as few edits as possible. This approach can lead to fast and cost-effective results.
Full Post-editing (FPE)
With FPE, raw MT is thoroughly reviewed and modified to ensure that there are no errors whatsoever. Where LPE focuses on the bare essentials of accuracy and legibility, FPE considers a number of factors, including but not limited to:
- stylistic and tonal consistency within the document (and with other appropriate documents)
- the absence of all grammatical errors
- appropriate cultural adjustments for the target language (such as idiomatic expressions)
A document that has gone through FPE should convince its reader that it was originally written in the target language. This approach is slower and more expensive than LPE, but will achieve a high-quality output.
Requirements should always be tailored to the specific translation project. It can be useful to think of LPE and FPE as being on a spectrum rather than a binary choice. Pick and choose your post-editing priorities based on considerations of time, cost, and desired quality. It can be effective to selectively prioritize certain segments that have a higher business value than others.
Another option to consider is bypassing post-editing completely. For certain projects this is possible, for example with internal documents where MT output is expected to be good and the consequences of bad translation are negligible.
Both editors and managers should be aware of the various tools that can make post-editing easier. Virtually all CAT platforms now offer support for post-editing. Here are a few tools to keep in mind:
- Set up terminology management systems to help post-editors ensure consistency. This includes translation memories, term bases, and any useful reference documents. These should be kept up-to-date for future projects.
- Use Machine Translation Quality Estimation (MTQE). In the Memsource Editor this AI-powered feature can provide quality scores for all MT output, giving editors an indication of how much work is required for each specific segment. This can help linguists and project managers identify segments that they should prioritize for editing.
- Use a QA tool, either integrated or standalone. These will help dynamically identify any issues in the original output that were overlooked or new errors that were introduced during the post-editing step.
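Quality scores become most useful when they drive prioritization. The sketch below assumes each segment carries a hypothetical quality estimate from 0 to 100. The thresholds and bucket names are arbitrary illustrations, not values taken from Memsource or any other tool.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    quality: float  # hypothetical estimate: 0 (unusable) to 100 (publishable)

def triage(segments, light_threshold=75.0, spot_check_threshold=95.0):
    """Group segments by the editing effort their score suggests, worst first."""
    buckets = {"full": [], "light": [], "spot_check": []}
    for seg in sorted(segments, key=lambda s: s.quality):
        if seg.quality >= spot_check_threshold:
            buckets["spot_check"].append(seg)   # likely fine, sample only
        elif seg.quality >= light_threshold:
            buckets["light"].append(seg)        # fix meaning and legibility
        else:
            buckets["full"].append(seg)         # needs a thorough rewrite
    return buckets
```

In practice the thresholds would be tuned per project, and segments with a high business value could be pushed into a more thorough bucket regardless of their score.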
Post-editing Training and Qualifications
It is important to address perhaps the largest misconception about PEMT: that post-editing is simply translation under another name. The post-editor is not a translator. Though there is some overlap in the nature of the work they perform, the exact skills required differ in a few important ways. In 2017, ISO 18587 defined some of the key aspects of PEMT, including the specific skills and competencies of the post-editor.
Best results are always obtained with qualified and experienced post-editors. An ideal post-editor will have practical experience working in the specified language pairs, with the content, and with the relevant tools. It can be helpful to train post-editors for specific tasks before they start on a project. Formal training courses are now available, such as TAUS’ Post-Editing Course.
To help improve the results of PEMT, it is important to continuously evaluate the process and results using data and feedback. Consider post-editing to be an iterative process that can be improved with time and experience.
A range of tools are available to help with post-editing analysis. For example, the Memsource Editor can help calculate post-editing effort, which shows how much work was required by linguists to finalize the translation. This information can be as granular as required. Knowing, for example, that certain segments require a disproportionate amount of post-editing can help future projects: perhaps the source text can be adjusted, or post-editors can be provided with useful reference documents.
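One common way to approximate post-editing effort is to measure how far the final text is from the raw MT output. The sketch below uses a word-level edit distance, normalized by the longer of the two segments. This is an illustrative formula only, not the calculation used by the Memsource Editor or any other tool.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two token lists."""
    previous = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        current = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            current.append(min(previous[j] + 1,          # delete token_a
                               current[j - 1] + 1,       # insert token_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

def post_editing_effort(raw_mt: str, post_edited: str) -> float:
    """Word edits divided by the longer length: 0.0 untouched, 1.0 fully rewritten."""
    mt_tokens, pe_tokens = raw_mt.split(), post_edited.split()
    longest = max(len(mt_tokens), len(pe_tokens))
    if longest == 0:
        return 0.0
    return edit_distance(mt_tokens, pe_tokens) / longest
```

Averaging this value per document, per domain, or per engine gives a rough picture of where post-editors spend their time. Segments with consistently high values are the ones worth investigating.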
Besides gathering data, it is invaluable to proactively seek out feedback from all the key stakeholders. This can include the content creators, post-editors, clients, and project managers. Ask them about their experience with the project and identify what worked and what can be improved.
Post-editing is a new process that is still defining and improving itself. If you are interested in pursuing post-editing and machine translation further, consider reading our e-book which provides more practical information and case studies that can help you and your business.