Measuring the Quality of Machine Translation

More and more, organizations are relying on machine translation in order to meet the evolving preferences of global customers. 2020 data from CSA Research demonstrates that 76% of consumers prefer to buy products from websites offering information in their native language, while 40% of consumers will never buy from websites in other languages. Furthermore, 74% of customers are more likely to do business with an organization if support is provided in their native language (Zendesk).

Because hiring a team of translators or multilingual support agents to cover each language that customers may speak is generally cost-prohibitive and difficult to scale, it’s no surprise that organizations with global customer bases are turning to machine translation to meet this growing demand for multilingual content. But though machine translation is becoming more widespread, concerns about the quality and accuracy of its output have existed since it was first introduced–in many cases, for good reason.

We’ve all seen our fair share of translation “fails” circulating around the web. These translation errors can range from amusing–think of menu incidents such as “jerk chicken” being translated into “chicken rude and unreasonable”–to downright catastrophic, as HSBC Bank discovered when their 2009 tagline of “Assume Nothing” was mistranslated into “Do Nothing” in several countries. Such a simple error cost the bank $10 million in rebranding costs that could have been avoided had a human translator been consulted.

Understandably, incidents like these give organizations pause when employing machine translation, and have resulted in a need to reliably measure the quality of machine translation output. But without having a team of human translators to cover fluency across every possible language spoken by a customer, how can businesses verify the quality and accuracy of machine translation?

Measuring Machine Translation Quality

There are a number of quality estimation frameworks used to evaluate the quality of machine translation output. These include BLEU (Bilingual Evaluation Understudy), WER (word error rate), METEOR (Metric for Evaluation of Translation with Explicit ORdering), and many more. Each of these frameworks differs in exactly how it scores a translation output, but each is used to evaluate how well machine translation engines perform on various aspects of language understanding.

One common theme across many of these quality estimation methods is their accuracy at the corpus level (that is, on a larger body of text) vs. at the sentence level (shorter content, more conversational). These methods generally tend to perform at a much higher level when analyzing larger amounts of text, and may become less accurate when analyzing a shorter piece of content.

To give an example of how small linguistic differences may not be evaluated correctly: if a machine-translated sentence were to read “we like to go at school,” anyone reading it would understand the general sentiment, namely that the speakers like to go to school. But if the translation were to read “we like to go to building,” a human interpreting this statement would have a much more difficult time. Any framework that hinges on counting the number or percentage of erroneous words in a sentence would generate the same “score” for those two translations, despite the fact that one’s meaning is clear and the other’s is not.
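To make this concrete, here is a minimal sketch of a word error rate calculation in Python. It assumes the intended translation is “we like to go to school” (the article does not give a reference sentence), and the function name word_error_rate is our own illustration, not taken from any particular library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Assumed reference translation (not stated in the article).
reference = "we like to go to school"
clear_output = "we like to go at school"      # wrong preposition, meaning still clear
unclear_output = "we like to go to building"  # wrong noun, meaning lost

clear_score = word_error_rate(reference, clear_output)
unclear_score = word_error_rate(reference, unclear_output)
print(f"{clear_score:.3f} {unclear_score:.3f}")  # prints "0.167 0.167"
```

Both outputs differ from the assumed reference by exactly one word out of six, so a purely word-counting score cannot tell the understandable sentence from the confusing one.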

This makes it difficult to rely on automated quality estimation to make decisions about how to handle machine translations of conversational content. An automated quality estimate may say that an agent’s translated response is fine when, in reality, incorrect word choices make the response look unprofessional. On the flip side, a model might say that a translated agent response is low quality when it’s perfectly fine. That false positive may cause a company to incur additional cost to fix a translation that never needed fixing, for example by sending it to a human for correction.

So if these quality estimation frameworks are limited in their capability to analyze translation quality on real-time customer conversations, how can organizations use other methods to help improve translation quality on that type of content?

Improving Quality on Conversational Content

Customer service teams ultimately care about helping customers find solutions and making sure that their responses sound professional no matter what language they are being translated into. Machine translation helps organizations service customers in new geographies faster than ever, but may require some assistance to get the context right.

Language I/O’s Glossary Imposition technology, for example, helps make sure that the right context is applied to the right terms, especially when those terms may be tricky to translate. This helps companies to leverage the speed and cost advantages of machine translation, while increasing the support agent and customer’s understanding of one another.

Returning to the example from before, applying a glossary to an agent’s translated reply can help make sure that “school” isn’t translated to the equivalent of “building.” This small change can have a big impact on the customer’s perception of the support interaction, and help avoid adding confusion to the conversation.
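As a rough illustration of the idea (and not a description of how Language I/O’s Glossary Imposition technology actually works), a glossary can be thought of as a list of source terms paired with the translation each one must receive. The Python sketch below only checks a translation against such a list. The function name, the English-to-Spanish glossary entry, and the sample sentences are all hypothetical, and the simple substring matching ignores word boundaries and inflection.

```python
def glossary_violations(source: str, translation: str, glossary: dict[str, str]) -> list[str]:
    """Return source terms whose required target term is missing from the translation."""
    source_lower = source.lower()
    translation_lower = translation.lower()
    return [
        term
        for term, required in glossary.items()
        # Naive check: the term appears in the source text, but the required
        # rendering does not appear anywhere in the translated text.
        if term.lower() in source_lower and required.lower() not in translation_lower
    ]


# Hypothetical glossary: "school" must always be rendered as "escuela".
glossary = {"school": "escuela"}
source = "We like to go to school"

good_translation = "Nos gusta ir a la escuela"
bad_translation = "Nos gusta ir al edificio"  # "edificio" means "building"

print(glossary_violations(source, good_translation, glossary))  # prints []
print(glossary_violations(source, bad_translation, glossary))   # prints ['school']
```

A check like this flags the “building” mistake directly, which a word-counting score alone would not single out as more serious than a wrong preposition.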

Evaluating Machine Translation Quality at Your Organization

If your organization uses or plans to implement machine translation as a means of enabling your monolingual support team to converse with your global customer base, then your overall goal is to provide excellent customer support regardless of languages spoken and, ultimately, boost customer satisfaction. Relying on automated estimation frameworks to make decisions on how to handle conversational translations may not help that cause, and may even hurt further if incorrect estimates lead to higher translation costs.

Quality estimation methods are always evolving, and as a result, Language I/O is always evaluating how best to apply these technologies. Our focus always has been, and always will be, on finding the best ways to achieve better business outcomes on conversational content.

Is your organization looking for a machine translation solution to support your global customer base? Contact Language I/O to learn more about our unique approach to breaking through language barriers.
