Monday, August 12, 2013

Obfuscation in Chinese

I recently saw the following status posted on a social networking site:


If you run this through Google Translate, you get gibberish, and rightly so: read character by character, it is gibberish. The key is that each character stands in for another that sounds the same (or nearly the same). The plaintext message is:

bìxū qízhì xiānmíngde fǎnduì dòngluàn
"We must take a clear-cut stand against disorder"

This kind of obfuscation seems trivial at first, because any speaker of Chinese can probably see right through it--especially since the poster used characters that both looked similar and sounded similar. But it has an advantage:  Humans can understand it, yet simple search algorithms cannot.

If you needed to send a text message to another human being across a communication network that is monitored by a (potentially) hostile authoritarian government, without the opportunity to exchange keys or encryption algorithms ahead of time, this approach would let you slip past monitoring software that searches for keywords.
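To make the point concrete, here is a minimal sketch of a keyword filter that matches exact strings. The keyword list, the function name, and the two sample messages are assumptions made up for illustration; only the pair 反对 / 返怼 ("oppose") comes from the example discussed in this post, and real monitoring software is presumably more elaborate.

```python
# Hypothetical keyword list; what a real filter looks for is not known.
SENSITIVE_KEYWORDS = ["反对"]  # fǎnduì, "oppose"


def naive_filter(message: str) -> bool:
    """Return True if the message contains any listed keyword verbatim."""
    return any(keyword in message for keyword in SENSITIVE_KEYWORDS)


# Illustrative messages: the second swaps 反对 for the sound-alike 返怼.
plain = "反对动乱"       # fǎnduì dòngluàn
obfuscated = "返怼动乱"  # reads the same aloud, but uses different characters

print(naive_filter(plain))       # True: the exact keyword is present
print(naive_filter(obfuscated))  # False: no listed keyword appears verbatim
```

A human reader sounds out both messages the same way, but the substring match only sees the characters.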

But it has a huge disadvantage if it is overused. Once the authorities figure out what you are doing, all they have to do is flag messages that use obfuscated versions of sensitive words (like 返怼 for "oppose"), and they will be able to pick out exactly the subset of messages that both contain sensitive content and are trying to hide it.
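The countermeasure is just as easy to sketch. Assuming the authorities have collected a table of known sound-alike spellings (the table, the function name, and the sample messages here are hypothetical; only 返怼 for 反对 comes from this post), flagging becomes another substring search:

```python
# Hypothetical table mapping known obfuscated spellings to the real keyword.
KNOWN_VARIANTS = {"返怼": "反对"}  # both read fǎnduì, "oppose"


def find_obfuscated(message: str) -> list[tuple[str, str]]:
    """Return (variant, keyword) pairs for each known variant in the message."""
    return [(variant, keyword)
            for variant, keyword in KNOWN_VARIANTS.items()
            if variant in message]


# Illustrative messages: one uses the variant, one is an unrelated remark.
messages = ["返怼动乱", "今天天气很好"]  # the second says the weather is nice today

flagged = [m for m in messages if find_obfuscated(m)]
print(flagged)  # ['返怼动乱']
```

Because an ordinary writer has little reason to type 返怼, a hit on the variant is arguably a stronger signal than a hit on the plain keyword.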
