This post on Krebs on Security describes an unusual and potentially very dangerous attack technique that can be used to sneak evil code past code reviews and into the supply chain. Briefly, it allows evildoers to write code that looks one way to a human reviewer and another way to a compiler. It should probably come as no surprise that it involves Unicode, the same character-encoding standard that lets you make blog posts that include inline emoji, or mix text in English and Arabic.

In particular, it's the latter ability that the attack exploits, specifically Unicode's "Bidi" (bidirectional) algorithm for presenting a mix of left-to-right and right-to-left text. (Read the Bidi article for details and examples -- I'm not going to try plopping random text in languages I don't know into the middle of a blog post.)

Now go read the "Trojan Source Attacks" website, and the associated paper [PDF] and GitHub repo. Observe, in particular, the Warning about bidirectional Unicode text that GitHub now attaches to files like this one in C++. Observe also that GitHub does not flag files that mix homoglyphs, for example "H" (the usual ASCII version) and "Н" (the similar-looking Cyrillic letter that sounds like "N"; how similar it looks depends on what font your browser is using). If you're unlucky, you might have clicked on a URL containing one or more of these and been taken someplace unexpected and almost certainly malicious.
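Since homoglyphs don't get a warning banner, here is a rough sketch of how you might flag them yourself. It is a heuristic, not what GitHub or any real tool does: Python's standard library has no Unicode script property, so this guesses the script from the first word of each letter's Unicode name (for example "LATIN" or "CYRILLIC"). The function names scripts_in and is_mixed_script are made up for illustration.

```python
import unicodedata


def scripts_in(word):
    """Approximate the set of scripts used by the letters in `word`.

    Assumption: the first word of a letter's Unicode name ("LATIN",
    "CYRILLIC", "GREEK", ...) is a good-enough stand-in for its script.
    That breaks down for names such as "FULLWIDTH LATIN ...".
    """
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_mixed_script(word):
    """True if the letters of `word` appear to come from more than one script."""
    return len(scripts_in(word)) > 1


if __name__ == "__main__":
    # "\u041d" is CYRILLIC CAPITAL LETTER EN, which looks like an ASCII "H".
    for word in ("Hello", "\u041dello"):
        print(repr(word), sorted(scripts_in(word)), is_mixed_script(word))
```

A mixed-script identifier or hostname isn't proof of malice (plenty of legitimate text mixes scripts), so treat a hit as a reason to look closer rather than a verdict.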

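The Trojan Source attack works by making use of invisible Bidi control characters such as U+202B RIGHT-TO-LEFT EMBEDDING (RLE) and U+202A LEFT-TO-RIGHT EMBEDDING (LRE), which explicitly set the direction of the text that follows them. The related override and isolate characters can be abused the same way. A compiler reads the characters in the order they are stored, while an editor or code viewer displays them in the reordered form, which is how the two views come apart.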
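Because these characters are invisible when displayed, the simplest defense is to search for them in the raw text. The following is a minimal sketch of such a scan, assuming the source has already been decoded to a Python string. The function name find_bidi_controls is hypothetical, and the set covers the embedding, override, and isolate controls rather than just the two named above.

```python
import unicodedata

# Unicode's explicit directional formatting characters.
BIDI_CONTROLS = {
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING (LRE)
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING (RLE)
    "\u202c",  # POP DIRECTIONAL FORMATTING (PDF)
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE (LRO)
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE (RLO)
    "\u2066",  # LEFT-TO-RIGHT ISOLATE (LRI)
    "\u2067",  # RIGHT-TO-LEFT ISOLATE (RLI)
    "\u2068",  # FIRST STRONG ISOLATE (FSI)
    "\u2069",  # POP DIRECTIONAL ISOLATE (PDI)
}


def find_bidi_controls(text):
    """Return (line, column, code point, name) for each Bidi control in `text`.

    Line and column numbers are 1-based and count characters, not bytes.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append(
                    (line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch))
                )
    return hits


if __name__ == "__main__":
    # A made-up line with controls hidden inside a string literal.
    sample = "if level != 'user\u202e \u2066# check\u2069 \u2066':\n    pass\n"
    for hit in find_bidi_controls(sample):
        print(hit)
```

Writing the controls as \u escapes keeps the scanner's own source free of the characters it is looking for, so it won't trip its own alarm.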

And remember: ШYSINAШYG - What You See Is Not Always What You've Got!

Resources