I've always been a little uncomfortable about build systems and languages that start the build by going out to a package repository and pulling down the most recent (minor or patch) version of every one of the package's dependencies, followed by all of their dependencies. The best-known of these are probably Python's pip package manager, JavaScript's npm (node package manager), and Ruby's gems. They're quite impressive to watch, as they fetch package after package from their repository and include it in the program or web page being built. What could possibly go wrong?

Plenty, as it turns out.

The best-known technique for taking advantage of a package manager is typosquatting -- picking a name for a malware package that's a plausible misspelling of a real one, and waiting for someone to make a typo. (It's an adaptation of the same technique from DNS -- picking a domain name close to that of some popular site in hopes of siphoning off some of the legitimate site's traffic. These days it's common for companies to typosquat their own domains before somebody else does -- facbook.com redirects to FB, for example.)

A few days ago, Alex Birsan published "Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies", describing a new attack. It relies on two things: the way package managers like npm resolve dependencies, by looking for and fetching the most recent compatible version (i.e. with the same major version) of every package, and the fact that they can be made to look in more than one repository.

Fetching the most recent minor version of a package is usually perfectly safe; packages have owners, and only the owner can upload a new version to the repository. (There have been a few cases where somebody has gotten tired of maintaining a popular package, and transferred ownership to someone who turned out to be, shall we say, less than reliable.)

The problem comes if, like most large companies and many small ones, you have a private repository that some of your packages come from. The package manager looks in both places, public and private, for the most recent version. If an attacker somehow gets the name and version number of a private package that doesn't exist in the public repository, they can upload a bogus package with the same name and a later version. The next build sees the later version in the public repository, decides it's the most recent compatible one, and fetches the attacker's package instead of yours.
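Here's a toy model of that resolution rule, just to make the failure concrete. It is a sketch under my own assumptions, not how npm or pip is actually implemented: the feeds are plain dictionaries, and the package name `acme-auth`, the version numbers, and the `resolve` function are all made up for illustration.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    source: str                    # which feed the version came from
    version: Tuple[int, int, int]  # (major, minor, patch)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Turn '1.3.1' into (1, 3, 1) so versions compare numerically."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def resolve(name: str, wanted_major: int,
            feeds: Dict[str, Dict[str, List[str]]]) -> Optional[Candidate]:
    """Pick the highest version with the wanted major number, across ALL feeds."""
    candidates = []
    for source, packages in feeds.items():
        for version_text in packages.get(name, []):
            version = parse_version(version_text)
            if version[0] == wanted_major:
                candidates.append(Candidate(source, version))
    return max(candidates, key=lambda c: c.version) if candidates else None


# Hypothetical feeds: the private package exists only in the private feed.
feeds = {
    "private": {"acme-auth": ["1.2.0", "1.3.1"]},
    "public": {},
}
before = resolve("acme-auth", 1, feeds)

# The attacker uploads the same name with a later version to the public feed.
feeds["public"]["acme-auth"] = ["1.99.0"]
after = resolve("acme-auth", 1, feeds)

print(before.source, after.source)
```

Nothing in `resolve` cares where a version came from, only how big its number is, and that is the whole attack.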

It turns out that the names and versions of private packages can be leaked in a wide variety of ways. The simplest is to look in your target's web apps -- apparently it's not uncommon to find a copy of a `package.json` left in the app's JavaScript by the build process. Birsan goes into detail on this and other sources of information.
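To show how little work is left once a `package.json` has leaked, here is a minimal sketch that lists the dependencies whose names don't appear in the public repository. Those are the likely private packages. The manifest contents, the `acme-*` names, and the `known_public_names` set are invented for the example; a real check would have to query the actual registry.

```python
import json

# A made-up package.json, standing in for one found inside a web app's JavaScript.
leaked_package_json = """
{
  "name": "example-web-app",
  "dependencies": {
    "express": "^4.17.1",
    "acme-auth": "^1.3.0",
    "acme-billing-client": "^2.0.4"
  }
}
"""

# Stand-in for "names that exist in the public repository" (an assumption).
known_public_names = {"express", "lodash", "react"}


def likely_private(package_json_text, public_names):
    """Return {name: version spec} for dependencies not found publicly."""
    manifest = json.loads(package_json_text)
    deps = {}
    for section in ("dependencies", "devDependencies"):
        deps.update(manifest.get(section, {}))
    return {name: spec for name, spec in deps.items()
            if name not in public_names}


print(likely_private(leaked_package_json, known_public_names))
```

It's worth running the same kind of check against your own manifests, before somebody else does.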

Microsoft has published "3 Ways to Mitigate Risk When Using Private Package Feeds", so that's a good place to look if you have this problem and want to fix it. (Hint: you really want to fix it.) Tl;dr: by far the simplest fix is to have one private repo that includes both your private packages and all of the public packages your software depends on. Point your package manager at that. Updating the repo to get the most recent public versions is left as an exercise for the reader; if I were doing it I'd just make a set of dummy packages that depend on them.
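The single-repo fix can be sketched in the same toy style. Again, this is only an illustration under assumed names (`build_mirror`, the `approved` list, and the package names are all hypothetical), not a description of any real repository manager. The point is that the build only ever sees the merged feed, and a public package only gets into it if you put it there.

```python
def build_mirror(private, public, approved):
    """Merge private packages and explicitly approved public ones into one feed."""
    mirror = {name: list(versions) for name, versions in private.items()}
    for name in approved:
        if name in mirror:
            # A public name colliding with a private one is exactly the
            # situation to catch, so refuse rather than merge.
            raise ValueError(f"{name} is a private package; not mirroring it")
        mirror[name] = list(public[name])
    return mirror


private_feed = {"acme-auth": ["1.2.0", "1.3.1"]}
public_feed = {
    "express": ["4.17.1"],
    "acme-auth": ["1.99.0"],  # the attacker's upload
}

# Only public packages that someone has deliberately approved get copied in.
mirror = build_mirror(private_feed, public_feed, approved=["express"])

print(sorted(mirror))
print(mirror["acme-auth"])
```

The attacker's `1.99.0` is still sitting in the public repository, but the build never looks there.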

Happy hacking!

Resources