Factoring Humans into the Machine Translation Equation
(Originally published May 2012)
Why is it necessary to post-edit?
The “raw” output from machine translation needs review and post-editing by a qualified linguist before publication for several reasons. Machine translation (MT) systems use automated formulas to render the translation. Our use of language is so widely varied that these formulas cannot capture it all, so MT systems make mistakes that the systems themselves cannot currently detect. In addition, as with human translation, it is important for a qualified translator/editor to look at the final translation with “the eyes of the client” to ensure that it has been rendered accurately and follows the client’s style, terminology and other preferences.
What does a typical workflow including MT look like?
When introducing machine translation, if the client has an archive of previously translated documents, for example stored in a translation memory (TM) database, then MT can be used together with the TMs to produce draft translations. This means that all of the client’s approved translations and terminology are leveraged first, for both productivity and quality. For each sentence, if an approved translation of that exact sentence is already available, it is used as-is for the draft translation. If there is a very similar sentence that can be leveraged, its translation is offered for revision. Otherwise, the sentence is machine translated to produce a draft. The resulting translation is then provided to a linguist, who post-edits it as a whole text. From there, an editor reviews and polishes it to the required quality level.
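The sentence-by-sentence decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular TM or MT product: the translation memory is modeled as a plain dictionary, similarity is measured with the standard library's difflib, the 0.75 threshold is an arbitrary example value, and machine_translate is a hypothetical placeholder for a real MT engine.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.75  # assumed cutoff for a "very similar" sentence


def machine_translate(sentence):
    """Hypothetical stand-in for a call to a real MT engine."""
    return f"[MT draft] {sentence}"


def draft_sentence(sentence, tm, mt=machine_translate, threshold=FUZZY_THRESHOLD):
    """Return a draft translation and a label saying where it came from.

    tm maps approved source sentences to their approved translations.
    """
    # 1. Exact match: reuse the approved translation as-is.
    if sentence in tm:
        return {"source": sentence, "draft": tm[sentence], "origin": "exact"}

    # 2. Fuzzy match: find the most similar approved source sentence.
    best_source, best_score = None, 0.0
    for candidate in tm:
        score = SequenceMatcher(None, sentence, candidate).ratio()
        if score > best_score:
            best_source, best_score = candidate, score
    if best_source is not None and best_score >= threshold:
        # The linguist revises this translation to fit the new sentence.
        return {"source": sentence, "draft": tm[best_source], "origin": "fuzzy"}

    # 3. No usable match: fall back to machine translation.
    return {"source": sentence, "draft": mt(sentence), "origin": "mt"}


def draft_document(sentences, tm, mt=machine_translate):
    """Build the full draft that is handed to the post-editor."""
    return [draft_sentence(s, tm, mt) for s in sentences]
```

The origin label matters in practice because it tells the post-editor how much scrutiny each sentence needs: exact matches are already approved, fuzzy matches need revision, and MT drafts need full post-editing.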
What is the near-term payoff of MT post-editing?
For the right types of texts, even in the course of a single project, MT and post-editing pay off with shorter turnaround and very cost-effective translation. Machine translation produces the best results with more factual, objective texts and relatively short sentences. User manuals are a classic example. RFPs and technical proposals are often good candidates as well.
What is the longer-term payoff?
When machine translation is used over time – for many projects, or for a very large project – there are many more benefits.
The approved post-edited translations are added to the translation memory, and increase the number of approved translated sentences that can be reused on future projects.
The post-editing process provides the opportunity to choose appropriate translations for terminology in the document context. The approved terminology is added back into the machine translation system as ongoing customization. After the project is completed and approved, just as with conventional translation, the post-edited machine translation can be added to the translation memory database if it is of sufficiently high quality. In this way, the terminology decisions and other client preferences are available for reference and reuse via the translation memory.
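The feedback step described above, in which only approved post-edited work is stored for reuse, might look like the following sketch. The segment fields (source, target, approved) and the dictionary-based translation memory are assumptions made for illustration; real TM tools have their own formats and approval workflows.

```python
def update_translation_memory(tm, segments):
    """Add approved post-edited segments to the TM and return how many were stored.

    tm maps source sentences to approved translations.
    Each segment is a dict with "source", "target" and "approved" keys
    (a hypothetical structure used only for this example).
    """
    added = 0
    for seg in segments:
        source = seg["source"].strip()
        target = seg["target"].strip()
        # Quality gate: skip anything unapproved or incomplete, so raw or
        # unfinished MT output never reaches the translation memory.
        if not seg.get("approved") or not source or not target:
            continue
        if tm.get(source) != target:
            tm[source] = target
            added += 1
    return added
```

Keeping the approval check inside the update function means there is a single place where translations can enter the memory, which makes the quality requirement harder to bypass.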
It sometimes happens that organizations decide to adopt machine translation for types of materials that they have not translated before. For this reason, machine translation projects may result in starting a new translation memory archive. The MT and post-editing process can enable rapid accumulation of translation memory for reuse, which will further accelerate the translation process as more text is available.
Whatever the process or circumstances, it is absolutely critical that quality be proactively managed from beginning to end. It is easy to end up with low MT output quality if the MT system is not trained with good, clean data. Likewise, MT-produced translations, whether raw or post-edited, that are saved to translation memories without reaching the right level of quality can produce very low-quality, hard-to-maintain translation memories full of unusable or barely usable translation units. Factoring trained linguists into the MT equation and allowing your LSP to effectively manage this process will allow you to reap the benefits of using MT without the pitfalls that can be costly and damaging to your organization’s brand and reputation.
In the article “Machine Translation Guide and Tips,” we highlighted the importance of customizing machine translation in order to get high-quality output that can be cost-effectively post-edited. There are several types of machine translation, two of which (statistical MT and hybrid MT) use statistical algorithms to learn automatically from your previously translated texts. The algorithms learn your terminology and style as they are captured in your approved translations. With this approach, post-edited translations may be used directly for customization of an MT system, incrementally adding your improvements to the system.
Whether you use statistical MT, rule-based MT or a hybrid, customization of the system is the key to quality drafts and cost-effective post-editing. And in all cases, customization is about terminology – the words and phrases used in and by your company and industry. Inappropriate use of terminology or a failure to use precise industry-specific terms undermines credibility within that industry.
Even if your company has not invested in a terminology management or terminology definition effort, you still have a rich lexicon of terminology in your documentation. Getting that terminology into the machine translation system, either by a dictionary-based customization effort or by statistical learning, is the primary goal of MT customization.
If you embark on using machine translation and already have multilingual glossaries or terminology lists, you have much of the customization problem solved! It is a relatively straightforward process to import the terminology into the MT system and test it. If you have English-only glossaries, this is also a significant advantage, as there is a defined list of words and phrases to use as the basis for a multilingual glossary.
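Testing imported terminology can start with a very simple check: wherever a glossary term appears in the source sentence, the approved translation should appear in the draft. The sketch below assumes a bilingual glossary held as a dictionary and uses naive case-insensitive substring matching, so it does not handle inflected forms or word boundaries; it illustrates the idea rather than any MT system's built-in feature.

```python
def find_term_violations(source, translation, glossary):
    """List glossary terms whose approved translation is missing from the draft.

    glossary maps source-language terms to approved target-language terms.
    Matching is a case-insensitive substring test (a deliberate simplification).
    """
    src = source.lower()
    tgt = translation.lower()
    return [
        (term, approved)
        for term, approved in glossary.items()
        if term.lower() in src and approved.lower() not in tgt
    ]
```

An empty result means every glossary term found in the source was translated with its approved equivalent; any entries returned point the post-editor to the terms that need attention.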
Even if you simply have the texts that have to be translated, the post-editing process provides a context for making decisions about terminology: which words and phrases constitute terms, and what their official translations are in the target language. In Syntes’ MT process, each of these decisions is saved and incorporated into the machine translation system, so that terms are translated consistently, as each client needs them translated.
So, factoring humans into the machine translation equation is a win-win combination. You benefit from the efficiency of MT while leveraging the skills of qualified linguists to determine which materials are viable candidates for automated translation, to develop quality multilingual glossaries, to post-edit the MT output, and to build, over the long term, an MT system specifically customized to meet your communication needs in other languages. Contact us today for information or questions.