Memsource Translate: Interview with Dalibor Frívaldský, Memsource Chief Technology Officer
Memsource has released the newest version of Memsource Translate, a machine translation management solution that is set to change the way you approach MT. We asked Memsource’s CTO, Dalibor Frívaldský, what exactly Memsource Translate is, how it works, and why you should try it for yourself.
Q: Why did Memsource decide to create Memsource Translate and what problem does it solve?
A: Machine translation has been integrated with Memsource for quite some time. At the moment we provide integrations with over 30 MT engines and we’re adding a new one almost every month. The MT space is therefore quite fragmented and it’s become very challenging for organizations to evaluate all the options and make an optimal choice. In addition, performance of these MT engines can differ not only across language pairs, but also for different domains in a language pair. The different MT systems also evolve over time, so any evaluation can become obsolete within a year.
We wanted to make using MT technology as simple as possible, without the need to go through the complex process of choosing a single MT provider. Memsource Translate tries to solve this problem by picking the optimal MT engine for each individual document you are translating, based on the language combination and domain of the document.
Q: When should you use Memsource Translate?
A: Always! But more seriously, it doesn’t matter if you are just starting with MT or you already have it integrated into your workflows. Memsource Translate allows you to step into the MT world without the need to evaluate all the options out there. You can take advantage of the know-how accumulated by the algorithm over time and work with the optimal MT engines right away. If you’re already an MT user, you are likely using only one provider. As our data shows, that is not an optimal solution if you are working with multiple language pairs or domains. Memsource Translate will understand your content and pick a better MT engine where appropriate.
Q: What’s new in the latest version of Memsource Translate?
A: The new version brings four important innovations.
- The MT engine is selected for each individual document, not just for the entire language pair of a project.
- The algorithm has improved - the system learns about the performance of different MT engines for each language pair and domain of the text continuously, in real time.
- Domain identification - initially for English source text, but soon for the top 10 source languages in Memsource, we can automatically detect and categorize the content into 11 domains, such as Legal, Industrial, Software documentation, and so on. This allows the system to work at a finer granularity.
- Support for customizable MT engines has been added as well. With a custom MT engine, we will evaluate it only on your content and compare it to the other MT engines. If it does indeed perform better than the other solutions, it will become the most recommended MT engine for you.
Q: How does the optimal machine translation selection work?
A: Each document that gets post-edited provides feedback to the system on how well the MT engine performed. The algorithm behind Memsource Translate learns from this data and takes it into account when next recommending an MT engine for a document. The data is collected in real time, so the recommendation for your next job can already take advantage of feedback from the translation you just finished. The algorithm needs around 50-100 documents to produce a reliable estimate. Memsource Translate has already processed over 22,000 documents and that number is growing continuously. The more feedback data we have, the more confident the algorithm is that the recommendation is truly the optimal one.
At the same time, we know that MT engine quality evolves over time. When an MT provider informs us about a new release of their system that increases the quality, we can nudge the algorithm to explore the MT engine again. But even if that doesn’t happen, the algorithm takes into account feedback data only from the last 6 months, so any change in quality will eventually be noticed and taken into account for future recommendations.
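The feedback loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Memsource's actual algorithm, which the interview does not specify: the `Feedback` record, the `score` field (a hypothetical post-editing quality score between 0 and 1), the `explore_rate` parameter, and the simple explore-or-exploit rule are all invented for the example. Only the ideas of per-language-pair and per-domain feedback, occasional re-exploration, and the 6-month window come from the text.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

# Roughly 6 months: only feedback inside this window is considered.
WINDOW = timedelta(days=182)


@dataclass
class Feedback:
    engine: str
    language_pair: str
    domain: str
    score: float          # hypothetical quality score in [0, 1]
    timestamp: datetime


def recent_scores(feedback, language_pair, domain, now):
    """Group recent scores by engine for one language pair and domain."""
    scores = defaultdict(list)
    for item in feedback:
        same_context = (item.language_pair == language_pair
                        and item.domain == domain)
        if same_context and now - item.timestamp <= WINDOW:
            scores[item.engine].append(item.score)
    return scores


def recommend(feedback, engines, language_pair, domain, now,
              explore_rate=0.1, rng=None):
    """Pick an engine: try engines with no recent data first, explore
    occasionally, otherwise use the best recent average score."""
    rng = rng or random.Random()
    scores = recent_scores(feedback, language_pair, domain, now)
    untried = [e for e in engines if not scores.get(e)]
    if untried:
        return untried[0]
    if rng.random() < explore_rate:
        return rng.choice(engines)
    return max(engines, key=lambda e: sum(scores[e]) / len(scores[e]))
```

Because feedback older than the window is ignored, an engine whose data has gone stale is treated as untried and gets sampled again, which is one simple way a quality change would "eventually be noticed".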
We can illustrate the behavior of the learning algorithm with the chart above. In one specific language combination and domain, Amazon Translate performs better than the other two MT engines (note that the data is taken from a pilot run; Memsource Translate now integrates with more MT systems). As more feedback was gathered over time (a period of roughly two weeks in this case), the algorithm learned about the performance of the optimal MT system and became more confident in this knowledge, recommending that system for most new documents created for this combination of languages and domain.
Q: How easy is it to use Memsource Translate?
A: Very easy. Out of the box, you will get access to three MT engines without needing to set up anything apart from having a Memsource account. However, the more MT engines you use, the better. For generic MT engines that don’t come enabled out of the box, you will only be required to provide your API keys or credentials (depending on how the MT engine authorizes the requests). These will then automatically become part of the set from which Memsource Translate recommends the optimal engine for each document.
Q: How does a user know whether the optimal machine translation engine is being used?
A: Users can get a general overview of engine performance through our Machine Translation reports, which cover engine performance on a quarterly basis.
While not part of the initial release, we are looking into other ways to let users know which MT engine was picked for each individual document, how much better it is for this particular language combination and domain than the other MT engines the user has enabled, and potentially also how much better the performance could be with additional MT engines enabled on the platform.
Q: Is the data shared with MT providers?
A: Users don’t have to be concerned about Memsource sharing their data. Memsource Translate does not send any post-edits to the MT providers; all the recommendation and evaluation logic happens within the Memsource environment. If you’re also concerned about sending source text to an MT provider to get a machine translation, you can disable that provider in Memsource Translate and it will not be recommended for any new jobs you upload.
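Put together, the set of engines that can be recommended to a user is the out-of-the-box engines plus any engines with credentials supplied, minus any the user has disabled. A minimal sketch of that rule, with placeholder engine names and a hypothetical `candidate_engines` helper (neither is taken from the Memsource product or API):

```python
# Placeholder names for the three engines available out of the box.
DEFAULT_ENGINES = {"engine_a", "engine_b", "engine_c"}


def candidate_engines(credentials, disabled):
    """Return the engines eligible for recommendation.

    credentials: mapping of engine name to API key (empty means not set up)
    disabled:    engine names the user has switched off
    """
    configured = {name for name, key in credentials.items() if key}
    return sorted((DEFAULT_ENGINES | configured) - set(disabled))
```

For example, `candidate_engines({"engine_d": "my-key"}, ["engine_b"])` would leave `engine_a`, `engine_c`, and `engine_d` in the pool, so no source text would be sent to the disabled `engine_b`.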
Memsource Translate is available for all Memsource users. Learn how easy it is to start using it here.