Multitasking
Most humans multitask rather badly -- studies have shown that when one tries to do two tasks at the same time, both tasks suffer. That's why many states outlaw using a cell phone while driving. Some people are much better than others at switching between tasks, especially similar tasks, and so give the appearance of multitasking. There is still a cost to switching context, though. The effect is much smaller if one of the tasks requires very little attention -- knitting during a conversation, say, or sipping coffee while programming. (Although I have noticed that if I get deeply involved in a programming project my coffee tends to get cold.) It may surprise you to learn that computers have the same problem.
Your computer isn't really responding to your keystrokes and mouse clicks, playing a video from YouTube in one window while running a word processor in another, copying a song to a thumb drive, fetching pages from ten different web sites, and downloading the next Windows update, all at the same time. It's just faking it by switching between tasks really fast. (That's only partially true. We'll get to that part later, so if you already know about multi-core processors and GPUs, please be patient. Or skip ahead. Like a computer, my output devices can only type one character at a time.)
Back when computers weighed thousands of pounds, cost millions of dollars, and were about a million times slower than they are now, people started to notice that their expensive machines were idle a lot of the time -- they were waiting for things to happen in the "real world", and while the computer was reading in the next punched card it wasn't getting much else done. As computers got faster -- and cheaper -- the effect grew more and more noticeable, until some people realized that they could make use of that idle time to get something else done. The first operating systems that did this were called "foreground/background" systems -- they used the time when the computer was waiting for I/O to switch to a background task that did a lot of computation and not much I/O.
Once when I was in college I took advantage of the fact that the school's IBM 1620 was just sitting there most of the night to write a primitive foreground/background OS that consisted of just two instructions and a sign. The instructions dumped the computer's memory onto punched cards and then halted. The sign told whoever wanted to use the computer to flip a switch, wait for the dump to be punched out, and load it back in when they were done with whatever they were doing. I got a solid week of computation done. (It would take much less than a second on your laptop or even your phone, but we had neither laptop computers nor cell phones in 1968.)
By the end of the 1950s computers were getting fast enough, and had enough memory, that people could see where things were headed, and several people wrote papers describing how one could time-share a large, fast computer among several people to give them each the illusion that they had a (perhaps somewhat less powerful) computer all to themselves. The users would type programs on a teletype machine or some other glorified typewriter, and since it took a long time for someone to type in a program or make a change to it, the computer had plenty of time to do actual work. The first such systems were demonstrated in 1961.
I'm going to skip over a lot of the history, including minicomputers, which were cheap enough that small colleges could afford them (Carleton got a PDP-8 the year after I graduated). Instead, I'll say a little about how timesharing actually works.
A computer's operating system is there to manage resources, and in a timesharing OS the goal is to manage them fairly and to switch contexts quickly enough that each user thinks they have the whole machine to themselves. There are three main resources to manage: time (on the CPU), space (memory), and attention (all those users typing at their keyboards).
There are two ways to manage attention: polling all of the attached devices to see which ones have work to do, and letting the devices interrupt whatever is going on. If only a small fraction of the devices needs attention at any given moment, it's a lot more efficient to let them interrupt the processor, so that's how almost everything works these days.
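To make that concrete, here's a toy sketch in Python -- every name in it is invented for illustration, and real device drivers live far below anything like this. The polling version asks every device whether it has anything to report, even when most of them don't; the interrupt-style version just registers a handler and waits to be called.

    # Toy sketch: polling vs. interrupt-style attention management.
    # The Device class and the handler registry are invented for the example.

    class Device:
        def __init__(self, name):
            self.name = name
            self.pending = []              # data waiting to be handled

    devices = [Device("keyboard"), Device("mouse"), Device("network")]

    def handle(dev, data):
        print(f"{dev.name}: {data}")

    # Polling: ask everyone, all the time, even if nothing is happening.
    def poll_once():
        for dev in devices:
            while dev.pending:
                handle(dev, dev.pending.pop(0))

    # Interrupt style: the device announces that it needs attention.
    handlers = {}

    def register_handler(dev, handler):
        handlers[dev.name] = handler

    def raise_interrupt(dev, data):
        handlers[dev.name](dev, data)      # jump straight to the handler

    poll_once()                            # nothing pending, so this loop does no useful work
    for dev in devices:
        register_handler(dev, handle)
    raise_interrupt(devices[0], "key 'a' pressed")

The point is just that polling spends time on devices with nothing to say, while interrupts don't.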
When an interrupt comes in, the computer has to save whatever it was working on, do whatever work is required, and then put things back the way they were and get back to what it was doing before. This takes time. So does writing about it, so I'll just mention it briefly before getting back to the interesting stuff.
See what I did there? This is a lot like what I'm doing writing this post, occasionally switching tasks to eat lunch, go shopping, sleep, read other blogs, or pet the cat that suddenly sat on my keyboard demanding attention.
Let's look at time next. Many programs spend much of their time waiting for I/O operations to finish, and the computer can use that waiting time to look around and see whether there's another program ready to run. Another good time to switch is when an interrupt comes in -- the program's state already has to be saved to handle the interrupt. There's a bit of a problem with programs that never do I/O -- these days they're usually mining bitcoin. So there's a clock that generates an interrupt every so often. In the early days that was usually 60 times per second (50 in Britain); a sixtieth of a second was sometimes called a "jiffy". That way of managing time is often called "time-slicing".
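Here's a minimal sketch of the idea in Python, with generators standing in for programs and a fixed number of steps standing in for the clock interrupt. None of the names are real, and a real kernel does this with hardware help, but the shape is the same: run a program for one slice, remember where it was, and move on to the next one in line.

    # Toy round-robin time-slicing. Each "program" is a Python generator;
    # every `yield` is a point where the clock interrupt could arrive.

    from collections import deque

    def program(name, steps):
        for i in range(steps):
            print(f"{name}: step {i}")
            yield                      # give the scheduler a chance to switch

    SLICE = 3                          # our "jiffy": steps per time slice

    def scheduler(tasks):
        ready = deque(tasks)
        while ready:
            task = ready.popleft()     # pick the next runnable program
            try:
                for _ in range(SLICE): # let it run for one time slice
                    next(task)
            except StopIteration:
                continue               # it finished; don't requeue it
            ready.append(task)         # its slice is up; back of the line

    scheduler([program("A", 7), program("B", 5), program("C", 2)])

The saving and restoring of state is hidden inside the generators here; in a real OS it's the register and memory juggling described above.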
The other way of managing time is multiprocessing: using more than one processor at the same time. (Told you I'd get to that eventually.) The amount of circuitry you can put on a chip keeps increasing, but the amount of circuitry required to make a CPU (a computer's Central Processing Unit) stays pretty much the same. The natural thing to do is to add another CPU. That's the point at which CPUs on a chip started being called "cores"; multi-core chips started hitting the consumer market in the mid-2000s.
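If you want to watch your own cores earn their keep, a sketch like this will do it. The busy_work function is just made-up filler standing in for a genuinely compute-heavy job; Pool starts one worker process per core by default.

    # Spread a compute-heavy job across however many cores the machine has.
    import os
    from multiprocessing import Pool

    def busy_work(n):
        """Stand-in for real computation: sum a lot of squares."""
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        jobs = [5_000_000] * 8
        print(f"This machine reports {os.cpu_count()} cores")
        with Pool() as pool:                    # one worker process per core
            results = pool.map(busy_work, jobs) # the jobs run in parallel
        print(results[:2])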
There is a complication that comes in when you have more than one CPU, and that's keeping them from getting in one another's way. Think about what happens when you and your family are making a big Thanksgiving feast in your kitchen. Even if it's a pretty big kitchen and everyone's working on a different part of the counter, you're still occasionally going to have times when more than one person needs to use the sink or the stove or the fridge. When this happens, you have to take turns or risk stepping on one another's toes.
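In software, taking turns usually means a lock. I'll say no more about locking here (see below), but just to attach something concrete to the kitchen analogy, here's a toy Python sketch -- the cooks and the sink are invented for the example: the lock guarantees that only one cook uses the sink at a time.

    # Two cooks sharing one sink: a lock makes them take turns.
    import threading
    import time

    sink = threading.Lock()            # only one thread may hold this at a time

    def cook(name):
        for _ in range(3):
            with sink:                 # wait here if someone else is at the sink
                print(f"{name} is using the sink")
                time.sleep(0.1)        # pretend to wash something
            time.sleep(0.05)           # do some chopping away from the sink

    threads = [threading.Thread(target=cook, args=(n,)) for n in ("Alice", "Bob")]
    for t in threads: t.start()
    for t in threads: t.join()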
You might think that the simplest way to do that is to run a completely separate program on each core. That works until you have more programs than processors, and it happens sooner than you might think because many programs need to do more than one thing at a time. Your web browser, for example, starts a new process every time you open a tab. (I am not going to discuss the difference between programs, processes, and threads in this post. I'm also not going to discuss locking, synchronization, and scheduling. Maybe later.)
The other thing you can do is to start adding specialized processors for offloading the more compute-intensive tasks. For a long time that meant graphics -- a modern graphics card has more compute power than the computer it's attached to, because the more power you throw at making pretty pictures, the better they look. Realistic-looking images used to take hours to compute. In 1995 the first computer-animated feature film, Toy Story, was produced on a fleet of 117 Sun Microsystems computers running around the clock. They got about three minutes of movie per week.
Even a mediocre graphics card can now generate images of comparable or better quality at 75 frames per second. It's downright scary. In fairness, most of that performance comes from specialization. Rather than being general-purpose computers, graphics cards mostly just do the computations required for simulating objects moving around in three dimensions.
The other big problem, in more ways than one, is space. Programs use memory, both for code and for data. In the early days of timesharing, if a program that was ready to run didn't fit in the available memory, some other program got "swapped out" onto disk. All of it. Of course, memory wasn't all that big at the time -- a megabyte was considered a lot of memory in those days -- but swapping a whole program still took a lot of time.
Eventually, however, someone hit on the idea of splitting memory up into equal-sized chunks called "pages". A program doesn't use all of its memory at once, and most operations tend to be pretty localized. So a program runs until it needs a page that isn't in memory. The operating system then finds some other page to evict -- usually one that hasn't been used for a while. The OS writes out the old page (if it has to; if it hasn't been modified and it's still around in swap space, you win), and schedules the I/O operation needed to read the new page in. And because that takes a while, it goes off and runs some other program while it's waiting.
There's a complication, of course: the program thinks of its memory as one simple sequence of consecutive locations, so you need to keep track of where each of its pages actually ended up. That means you need a "page table" or "memory map" to record the correspondence between the pages scattered around the computer's real memory and the simple virtual memory that the program thinks it has.
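Here's a toy pager in Python that puts the last few paragraphs together -- the page size, the number of real-memory frames, and all the names are made up for illustration. The page table maps the program's virtual page numbers to real frames; touching a page that isn't resident causes a fault, which evicts the page that hasn't been used for the longest time and brings the needed one in.

    # A toy pager: virtual pages, a small "real memory", and LRU eviction.
    from collections import OrderedDict

    PAGE_SIZE = 4096
    NUM_FRAMES = 3                     # pretend real memory holds only 3 pages

    page_table = OrderedDict()         # virtual page number -> frame number
                                       # (kept in least-to-most recently used order)
    free_frames = list(range(NUM_FRAMES))
    faults = 0

    def access(virtual_address):
        """Translate a virtual address, faulting the page in if necessary."""
        global faults
        page = virtual_address // PAGE_SIZE
        offset = virtual_address % PAGE_SIZE
        if page not in page_table:             # page fault!
            faults += 1
            if free_frames:
                frame = free_frames.pop()
            else:
                # Evict the least recently used page; a real OS would also
                # write it out to swap space here if it had been modified.
                victim, frame = page_table.popitem(last=False)
            page_table[page] = frame           # "read" the page into the frame
        page_table.move_to_end(page)           # mark it as recently used
        return page_table[page] * PAGE_SIZE + offset

    for addr in (0, 5000, 9000, 0, 20000, 5000):
        print(f"virtual {addr:6d} -> physical {access(addr):6d}")
    print(f"{faults} page faults")

The OrderedDict is doing double duty here as both the page table and the least-recently-used bookkeeping; real hardware keeps the map in dedicated structures and only approximates LRU, but the behavior is recognizably the same.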
There's another complication: it's perfectly possible (and sometimes useful) for a program to allocate more virtual memory than the computer has space for in real memory. And it's even easier to have a collection of programs that, between them, take up more space than you have.
As long as each program only uses a few separate regions of its memory at a time, you can get away with it. The memory that a program needs at any given time is called its "working set", and for most programs it's pretty small and doesn't jump around too much. But not every program is this well-behaved, and sometimes, even when they all are, there can be too many of them. At that point you're in trouble. Even if there is plenty of swap space, there isn't enough real memory for every program to keep its whole working set swapped in. The OS ends up frantically swapping pages in and out, and everything slows to a crawl. It's called "thrashing". You may have noticed this when you have too many browser tabs open.
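You can watch thrashing happen in miniature with a simulation along the same lines as the pager above (again, everything here is invented for illustration): two programs take turns touching their working sets, and the only thing that changes between the two runs is how much "real memory" there is.

    # Thrashing in miniature: two programs alternating, with LRU paging.
    from collections import OrderedDict

    def run(num_frames, working_sets):
        resident = OrderedDict()       # page -> True, ordered by recency of use
        faults = 0
        for _ in range(100):           # interleave the programs' accesses
            for pages in working_sets:
                for page in pages:     # each program touches its working set
                    if page not in resident:
                        faults += 1
                        if len(resident) >= num_frames:
                            resident.popitem(last=False)   # evict the LRU page
                        resident[page] = True
                    resident.move_to_end(page)
        return faults

    # Two programs, each with a three-page working set (six pages total).
    working_sets = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
    print("8 frames:", run(8, working_sets), "faults")   # plenty of room
    print("4 frames:", run(4, working_sets), "faults")   # thrashing

With eight frames the six pages fault in once and stay put; with four frames the two programs keep evicting each other's pages, and nearly every access becomes a fault.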
The only things you can do when that happens are to kill some large programs (Firefox is my first target these days) or re-boot. (When you restart, even if your browser restores its session with all the tabs you had open, you're not immediately back in trouble, because it only starts a new process for a tab when you actually look at it.)
And at this point, I'm going to stop because I think I've rambled far enough. Please let me know what you think of it. And let me know which parts I ought to expand on in later posts. Also, tell me if I need to cut-tag it.