Meltdown and the Spectre of Speculation

TL;DR: Patch your computer NOW! (Or as soon as you can, if you're running Windows or Ubuntu and reading this on Monday -- the official release date for this information was supposed to have been Tuesday January 9th.)

Unless you've been hiding under a rock all weekend, you probably know that Meltdown and Spectre have nothing to do with either nuclear powerplants or shady investments: they are, instead, recently-revealed, dangerous design flaws in almost all recent computers. Meltdown affects primarily Intel processors (i.e. most desktops, laptops, and servers), and will be mitigated (Don't you just love that word? It doesn't mean "fixed", it means "made less severe". That's accurate.) by the recent patches to Linux, Windows, and MacOS. Spectre is harder to exploit, but also harder to fix, and may well present serious problems going forward.

But what the heck are they? I'm going to try to explain that in terms a non-geek can understand. Geeks can find the rest of the details in the links, if they haven't already chased them down themselves. (And if you're in software or IT and you haven't, you haven't been paying attention.)

Briefly, these bugs are hardware design problems that allow programs to get at information belonging to other programs. In the case of Meltdown, the other program is the operating system; with Spectre, it's other application programs. The information at risk includes things like passwords, credit card and bank account numbers, and cryptographic keys. Scared yet?

Basically, it all comes down to something called "speculative execution", which means something like "getting stuff done ahead of time just in case it's needed." And carefully putting things back the way they were if it turned out you didn't. That's where it gets tricky.

Modern computers are superscalar, which means that they achieve a lot of their impressive speed by doing more than one operation at once, and playing fast-and-loose with the order they do them in when it doesn't matter. Sometimes they make tests (like, "is this number greater than zero?", or "is that a location the program doesn't have permission to read?"), and do something different depending on the result. That's called a "branch", because the program can take either of two paths.

But if the computer is merrily going along executing instructions before it needs their results, it doesn't know which path to take. So, in the case of Spectre, it speculates that it's going to be the same path as last time. If it guesses wrong (and Spectre makes sure that it will by going down the safe path first), the computer will get an instruction or two down the wrong path before it has to turn back and throw away any results it got. Spectre makes it do something with those results that leaves a trace.

In the case of Meltdown, the test that's going down the wrong path is to see whether the program is trying to read from memory that belongs to the operating system kernel -- that's the part of the OS that's always there, managing resources like memory and files, creating and scheduling processes, and keeping programs from getting into places where they aren't permitted. (There's a lot of information in the kernel's memory, including personal data and passwords; for this discussion you just need to know that leaking it would be BAD.) When this happens, the memory-management hardware interrupts the program before it receives its ill-gotten data; normally the result is that the program is killed. End of story. On Intel processors, though, there's a way the program can say something like "if this instruction causes an interrupt, just pretend it never happened." The illegally-loaded data is, of course, thrown away.

Meltdown works because the operating system's memory is -- or was -- part of the same "address space" as the application program. The application can try to read the kernel's memory; it just gets stopped if it tries. After Tuesday's patch, the two address spaces are going to be completely separate, so the program can't even try -- the kernel's address space simply isn't there. (There's a performance hit, because switching between the two address spaces takes time -- that's why they were together in the first place.)

At this point you know what Spectre and Meltdown do, but you may be wondering how they manage to look at data that simply isn't there any more, because the instruction that loaded it was canceled. (If you're not wondering that, you can stop here.) The key is in the phrase "any more". During the brief time when the data is there, the attacker can do something with it that can still be detected later. The simplest way is by warming the cache.

Suppose you go out to your car on an icy morning and the hood feels warm. Maybe one of the local hoodlums took it out for a joyride, or maybe one of the neighbor's cows was sitting on it. You can tell which it was by starting the engine and seeing whether it's already warmed up. (We're assuming that the cow doesn't know how to hotwire a car.) The attack program does almost the same thing.

The computer's CPU (Central Processing Unit) chip is really fast. It can execute an instruction in less than a nanosecond. Memory, on the other hand, is comparatively slow, in part because it's not part of the CPU chip -- electrical signals travel at pretty close to the speed of light, which is roughly a foot per nanosecond. There's also some additional hardware in the way (including the protection stuff that Meltdown is sneaking past), which slows things down even further. We can get into page tables another time.

The solution is for the CPU to load more memory than it needs and stash (or cache) it away in very fast memory that it can get to quickly, on the very sensible grounds that if it needs something from location X now, it's probably going to want the data at X+1 or somewhere else in the neighborhood pretty soon. The cache is divided into chunks called "lines" that are all loaded into the cache together. (Main memory is divided into "pages", but as I mentioned in the previous paragraph that's another story.)

When it starts a load operation, the first thing the CPU does is check to see whether the data it's loading is in the cache. If it is, that's great. Otherwise the computer has to go load it and the other bytes in the cache line from wherever it is in main memory, "warming up" the cache line in the process so that the next access will be fast. (If it turns out not to be anyplace the program has access to, we get the kind of "illegal access exception" that Meltdown takes advantage of.)

The point is, it takes a lot longer to load data if it's not in the cache. If one of the instructions that got thrown away loaded data that wasn't in the cache, that cache line will still be warm and it will take less time to load data from it. So one thing the attack program can do is to look at a bit in the data it's not supposed to see, and if it's a "1", load something that it knows isn't in the cache. That takes only two short instructions, so it can easily sneak in and get pre-executed.

Then, the attack program measures how long it takes to load data from that cache line again. (One of the mitigations for the spectre attack is to keep Javascript programs -- which might come from anywhere, and shouldn't be able to read your browser's stored passwords and cookies -- from getting at the high-resolution timers that would let them measure load time.)

Here under the cut are a basic set of references, should you wish to look further. Good stuff to read while your patches are loading.

Notes & links:

    Chip Flaws 'Meltdown' and 'Spectre' Loom Large Good intro.

    Researchers Discover Two Major Flaws in the World’s Computers - New York Times
    " on Tuesday, news of the Meltdown flaw began to leak through various news websites,
      including The Register, a science and technology site based in Britain. So the
      researchers released papers describing the flaws on Wednesday, much earlier than
      they had planned. "

    Kernel-memory-leaking Intel processor design flaw forces Linux, Windows redesign •
    The Register One of the aforementioned leaks, followed by this update:
    Meltdown, Spectre: The password theft bugs at the heart of Intel CPUs
    " Spectre allows, among other things, user-mode applications to extract information
      from other processes running on the same system. Alternatively, it can be used by
      code to extract information from its own process. Imagine malicious JavaScript in a
      webpage churning away using Spectre bugs to extract login cookies for other sites
      from the browser's memory. "

    Nice bit of speculation, but actually rather wrong about the mechanism:
    python sweetness -- The mysterious case of the Linux Page Table... 

    Cyberus Technology Blog - Meltdown good explanation of how the exploit works

    Detailed information on the official websites.
    Every named attack needs its own website these days.
    spectreattack.com:  Meltdown and Spectre \ these two sites have
    meltdownattack.com: Meltdown and Spectre / identical content
       meltdown.pdf
       spectre.pdf

    CERT: Vulnerability Note VU#584653 - CPU hardware vulnerable to side-channel attacks