Scarf Blog - Are Package Registries Holding Open-Source Hostage?

A few days ago, I received an email from Docker about a change I already knew was coming:‍

Docker will begin enforcing rate limits on container pulls for Anonymous and Free users.

To many, this came as no surprise. For years, Docker Hub has offered free hosting of container images, which typically range in size from a few megabytes to many gigabytes. Docker workflows as a result use a lot of bandwidth, and that bandwidth costs money.

Should we OSS (open-source software) developers have to think about the business and financial models of the platforms we host our software on? In a perfect world, we wouldn't have to—but in the real world we very much do. The incentives between OSS maintainers and the registries they use are often misaligned.

Docker Hub, npm, and other comparable registries are incentivized to create lock-in, even if it makes the product experience worse for their users and customers. This is especially true of the for-profit companies behind the registries, but we see similar issues from many of the not-for-profit registries.

Maintainers, on the other hand, are incentivized to choose the best product for their needs at the lowest cost, which depends on being able to switch providers when a better service comes on the market.

This situation is fundamentally at odds with the today's package management ecosystem, where immutability is paramount in order to achieve stability. We avoid breaking things at all costs, on principle, since OSS packages are the nuts and bolts of the software ecosystem, the internet, and thus society itself.

Mechanics of registry lock-in

The registry where you host your packages and containers might be free today, but if that changes later, as is the case now with Docker, you and your users might be stuck paying whatever price the vendor chooses to set. Your users might even agree to access your package without the rate limit, but you will not be seeing any of that revenue. Access to your open-source software was effectively just sold for a profit, and you, the author of that software, were cut out of the transaction.

While you could in theory just host your software somewhere else, can you really do that without breaking things for your current users? If you maintain and distribute a popular Docker image, switching the package registry is likely difficult.

Currently, any image on Docker Hub is installable as:

‍If you decide later you actually want to move your container hosting somewhere else - let's say Google Container Registry for this example - the Docker client is reasonable and lets you pull down images by their URL:

‍‍The problem here is that once you've changed the URL to your images, all of your existing users will stop getting updates! Even worse, this can break builds or pipelines for your users whenever they hit the new rate limits, which are not under your control.

At the point where your container has a sizable user-base all going through one of the existing container registries, your lock-in is substantial. Moving platforms will be painful. The crux of the problem here is that you don't own the distribution channel. The registry is the first place the web traffic goes, and everything that happens after that is at the registry vendor’s discretion and to their advantage, not yours.

Effects of registry lock-in

Some might respond: "This still seems like more of a theoretical problem than a practical one."

There are several practical downstream effects of the misaligned incentive structures to open-source package hosting. One major effect of registry lock-in is that maintainers cannot access their usage data. The data that registries naturally collect from package downloads can be quite useful to maintainers in a myriad of ways, yet registries typically don't share anything beyond a download count.

Registries know where the downloads are coming from, the devices, the package versions, which other packages are installed alongside, and a whole lot more. Little to none that information is shared with maintainers. Thus, maintainers are effectively locked out from observing the usage traffic.

Why is this the case? It's not because developers don't ask for it (https://github.com/npm/npm/issues/279). It's because the registries have no incentive to do so. It would cost the registries money to build and maintain the features to provide this data. Some registries even claim that exposing this data publicly would incentivize maintainers to game the system. Meanwhile, the extreme levels of inertia in software distribution keep maintainers locked in.

The registries' demonstrated distrust of maintainers seems counterproductive in a space where there's opportunity to work together cooperatively. If registry incentives were aligned accordingly, a registry like npm, for instance, would be in a great position to empower maintainers to leverage their own distribution data to deliver the best software possible.

What makes npm’s particular scenario even worse is that they've made it so difficult to use a registry that is not npm. There's no way to pull a single package from an alternate registry without switching to that registry. Which makes it quite impossible to actually publish a widely used JavaScript package without putting it on npm.

Contrast this scenario to Docker: Docker Hub creates different tradeoffs that both help and harm OSS maintainers. They've loosened their grip on OSS maintainers by making it user-friendly to pull containers down from alternative registries besides Docker Hub. However, even if you switch away from Docker Hub, you're still jumping from one company to another. This is because, at the end of the day, the registries—not the maintainers—own the distribution URL. The power imbalance continues.

My argument is not intended to dismiss the efforts of the registries as a whole. Package registries serve an essential role in software distribution, and have collectively serviced billions upon billions of package downloads. They’ve made it easy for anyone in the world to interact with open-source, and as a result have helped push open-source forward. Astonishingly, they have, for most part, remained free to use! But as software continues to eat the world and the distribution of that software becomes more important, conflicting interests in this space become increasingly problematic.

Looking forward

How do we solve this? Ultimately, package registries need to align their incentives with those of maintainers. Registries should build products maintainers want to use rather than products they have to use. The entire OSS community can benefit.

Part of this means registries must be more intentional about giving maintainers back control over the distribution of their own software, even when it means the maintainers could take their packages elsewhere. As a community, we should be empowering maintainers to do their best work rather than constraining them to work within a specific platform or framework.

For the health of the open source ecosystem, it's critical to ensure that maintainers are not locked out from accessing the data about their own software distribution. Maintainers must be able to make data-informed decisions and treat distribution data as something that rightfully belongs to them, instead of just the registry providers.

Unfortunately, the current software distribution model works to cut maintainers out, making all downstream actions more difficult and strictly less informed. When we as a community decide to better align ourselves with open-source maintainers and build platforms to empower them, the result will be better software for the entire ecosystem.

Scarf is working on new tooling to address these problems, so stay tuned! Follow @scarf_oss on Twitter or subscribe below for periodic updates.