Overcoming the Challenges and Security Risks of Translating User Generated Content

Every day, customer support representatives face a host of challenges. Between working with unhappy customers, navigating several different tools and technologies to provide support, and managing multiple conversations at once, the last thing a customer support agent needs is another complication to add to the mix. Unfortunately, for businesses with a global customer base, accurately translating user generated content is just another challenge requiring a solution.

As its name suggests, user generated content (UGC) is any content created by the user of a product or solution. This can include reviews left on product listings or even submissions to brand-sponsored contests, such as the Starbucks White Cup Contest. While those are highly coveted examples of UGC, not all instances of UGC are promotional or even public. Submitted support tickets and requests for help via chat are also considered UGC and, unlike the public-facing formats above, they require real-time responses.

For brands that rely on machine translation to serve multilingual customer bases, this real-time, private UGC presents a distinct set of challenges.

Challenges of Translating User Generated Content

Machine translation (MT) technology is built on the foundation of well-written, structured content. For the world’s top MT engines to work most effectively, content submitted for translation should contain full sentences and be free of syntax errors or misspellings. 

Customer service representatives know that expecting this from user-submitted chats and tickets is, at best, wishful thinking.

Unlike professionally written content, UGC is often riddled with grammatical errors, slang, acronyms, and industry-specific jargon, all of which are the bane of machine translation’s existence. While humans can fairly easily deduce what a misspelled word is supposed to say or piece together a grammatically incorrect sentence, the same cannot reliably be said for machines.

As an example of this, let’s look at the gaming industry. Video and tabletop games alike tend to come with their own universes, characters, and sometimes even invented languages. Simply put, this means that a lot of the words used in games are completely made up. Understandably, traditional machine translation is almost certainly going to fail to accurately translate these invented words.

For games where online support is required, these fabricated words pose a major threat to proper translation, as seen in this example of a Chinese-speaking Magic: The Gathering player requesting support from an English-speaking agent.

In this example, “毁世奥札奇牌” is the completely made-up name of an in-game card; there is no literal translation for it. An engine like Google Translate (as demonstrated on the left) fails to translate this term to its equally made-up English name, Eldrazi Devastator. As a result, the English-speaking support agent would have no way of knowing which card isn’t working properly.

Furthermore, the original message includes slang: “瞎菜了” is an expression that communicates confusion, but it literally translates to “blind dishes.” Without that context, an agent who doesn’t speak Chinese has no way to parse its meaning. Similarly, Google Translate fails to properly translate the recently coined Chinese abbreviation “gkd,” which means “do it now.”

The ideal solution for this particular conundrum is either to have a Chinese-speaking agent (or team of agents) available to speak with Chinese-speaking customers, or to have an English- and Chinese-speaking human translator available for instances of tricky syntax. But hiring human translators or multilingual support teams to cover conversational text is rarely viable. 

Imagine having to staff a team that is available at every hour your customers may be contacting support, and that covers every possible language that may arise. The cost to fund that team alone makes this idea a non-starter for the vast majority of organizations.

Security Risks of Translating User Generated Content

UGC submitted via chat or support ticket doesn’t just lead to translation issues; it can also pose security risks.

Conversations with customer support often include sensitive or personally identifiable information (PII), including a customer’s full name, date of birth, home address, login credentials, and credit card information. How companies handle this data is of critical importance, not just to preserve the privacy of their customers, but also to ensure compliance with legal requirements.

Running sensitive information through a free engine like Google Translate poses a threat to privacy due to Google’s terms, which state that Google has the right to “save […] content on [its] systems” and use submitted content to influence its algorithm.

Similarly, any solution that relies on training a neural machine translation (NMT) model is going to hold onto content that is submitted for translation, including personal data, to continually train and improve its translation quality. As a result, that data is stored in log files and databases, where it is exposed in the event of a breach.
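One vendor-neutral way to reduce this exposure is to mask obvious PII before a message is ever written to a log. The sketch below is a minimal illustration in Python, not a description of any particular product. The pattern set and the `redact_pii` helper are assumptions made for this example, and three regular expressions are nowhere near complete PII detection.

```python
import logging
import re

# Illustrative patterns only (an assumption for this sketch).
# Real PII detection needs far broader coverage than three regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    # ISO-style dates such as 1990-04-12.
    "DOB": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each pattern match with a bracketed label such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    message = (
        "My card 4111 1111 1111 1111 was charged twice, "
        "email jane.doe@example.com, born 1990-04-12."
    )
    # Only the redacted form is logged; the raw message is never written.
    logging.getLogger("support").info(redact_pii(message))
```

Masking at the point of logging keeps the raw message in memory only for as long as the translation request needs it, which is the property that matters if a log file is later exposed.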

With so many concerns related to translation quality and security, the question becomes: how do organizations with global customer bases reliably service their multilingual customers in real time?

Effectively and Securely Translating User Generated Content

Proper, secure machine translation of UGC requires the following:

  • Access to the world’s best machine translation engines; having multiple engines is important, as some MT engines handle certain language pairs better than others
  • A comprehensive translation glossary that identifies jargon, slang, and acronyms and instructs the technology on how to handle these terms
  • Verified data flow and encryption practices that ensure personally identifiable information is not stored in log files or in the database
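The first two requirements can be sketched together in a few lines of Python. This is a hedged illustration of the general idea, not Language I/O’s implementation: the `GLOSSARY` structure, the `translate_with_glossary` function, the placeholder token format, and the `echo_engine` stand-in are all assumptions made for this example, and no real MT engine API is called. Glossary terms are swapped for placeholder tokens before translation, the engine is chosen by language pair, and the approved terms are restored afterward.

```python
from typing import Callable, Dict, Tuple

# An "engine" here is any callable: (text, source_lang, target_lang) -> translation.
Engine = Callable[[str, str, str], str]
LangPair = Tuple[str, str]

# Hypothetical glossary: approved target terms, keyed by language pair.
GLOSSARY: Dict[LangPair, Dict[str, str]] = {
    ("zh", "en"): {
        "毁世奥札奇牌": "Eldrazi Devastator",
        "gkd": "do it now",
    },
}


def echo_engine(text: str, source: str, target: str) -> str:
    """Stand-in for a real MT engine; it returns the text unchanged."""
    return text


def translate_with_glossary(
    text: str,
    source: str,
    target: str,
    engines: Dict[LangPair, Engine],
    default_engine: Engine,
) -> str:
    terms = GLOSSARY.get((source, target), {})
    restored: Dict[str, str] = {}
    # Longest terms first so a short term never splits a longer one.
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        if term in text:
            token = f"__TERM{i}__"
            text = text.replace(term, token)
            restored[token] = terms[term]
    # Route to the engine registered for this pair, else the default.
    engine = engines.get((source, target), default_engine)
    translated = engine(text, source, target)
    # Put the approved glossary terms back in place of the tokens.
    for token, approved in restored.items():
        translated = translated.replace(token, approved)
    return translated


if __name__ == "__main__":
    engines = {("zh", "en"): echo_engine}
    print(translate_with_glossary("毁世奥札奇牌 gkd", "zh", "en", engines, echo_engine))
```

A real engine may alter or drop placeholder tokens, so a production system would need a protection mechanism that the chosen engine is known to respect; the token format above is only a convenient choice for the sketch.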

At Language I/O, we have built our technology to align with those three requirements. Our solution differs from other MT tools in that we don’t build a unique NMT model for every customer, language, or industry; instead, we aggregate the leading NMT engines and layer a customizable real-time translation glossary on top of them. Because we don’t train a new NMT model for every customer, we don’t have to store and use data to improve the quality of our translations. That means personally identifiable information shared by customers is never stored in our database, log files, or elsewhere.

Want to learn more about how Language I/O’s unique solution can take your support team’s global communications to the next level? Contact us to speak with a specialist or see a demo.