Scarf Case Study: Apache Superset

Apache Superset is an open-source modern data exploration and visualization platform that makes it easy for users of all skill sets to explore and visualize their data. Apache Superset is part of The Apache Software Foundation, one of the largest open source foundations.
We spoke with Maxime Beauchemin, founder & CEO of Preset, and the original creator of both Apache Superset and Apache Airflow. As a prominent open source developer and advocate himself, Maxime believes in open source as a better way to collaborate, deliver and iterate on software.
Challenges: Understanding Adoption and Usage
Prior to working with Scarf, Apache Superset, like most other OSS communities, had very limited ability to track adoption and usage. The team found it challenging to understand who was using Superset, which versions were in use, and how they were installed. The little data they did have was scattered across multiple platforms, such as GitHub, npm, and Google Analytics, and getting foundation approval to collect more meaningful metrics was difficult. Efforts to go beyond this stalled because both Superset’s and the ASF’s commitment to privacy ruled out most traditional approaches to analytics.
“I feel like I had accepted that pre-Scarf, there was just a lot of stuff we [didn’t] know. In a lot of ways, what I knew about the community was very much the tip of the iceberg…”

Scarf enabled Superset to move forward on this problem
“That’s where Scarf comes in, where the guarantees that you provide around data management and privacy checked all the boxes for the ASF – and actors in the community.” In 2023, Scarf was officially approved for use within the ASF, ensuring that all Apache projects can leverage Scarf’s analytics while remaining compliant with ASF guidelines and respecting end-user privacy.

Getting started with Scarf
Following its governance model, the Superset team proposed Scarf to its community very carefully, first introducing a Superset Improvement Proposal (SIP). “The ASF has a very opinionated governance model, but they are trying to protect everyone. The SIP was thought through – all the rationale, how we’re doing it, …, alternatives we’ve considered, value provided for the community, who’s going to have access to this data. I think doing that upfront — if you do all that work up front, then people will read [and give] that thumbs up.”
Better visibility gained
Maxime highlights several challenges in tracking the adoption and usage of open source software, in particular the difficulty of understanding usage patterns and the nuanced information beneath the surface. How many users were building Superset from source vs. running the pre-built container? Which companies were relying most heavily on Superset? Which versions were most used?
Scarf was able to provide privacy-conscious yet effective analytics for Superset across its website, npm installations, Docker downloads, and web application.
Using Scarf helped the team understand the value that Apache Superset brings to its users. For Superset, a particularly valuable aspect of Scarf was the ability to embed a no-cookie pixel in the Superset UI, which ultimately set Scarf apart from other tracking tools. This feature let them draw on data from active users, moving beyond a basic count of downloads.
Enhanced Visibility and Decision-Making for the Community
“I think every community should understand themselves. Who’s using us, and what features are they using and which versions they use… analytics is helpful to understand … if you want to make an educated decision, I think that’s clear.”
Usage metrics gathered through Scarf provide Apache Superset with valuable insights they can use to drive decision-making, from making more educated decisions about long-term support for older releases to better understanding which installation paths are most common in practice. As a result, Apache Superset is able to allocate resources more effectively and ensure the software meets the needs of its users and community.
Scarf data is not just used by the Superset maintainers themselves; it is also becoming part of the broader discussion within the community. At Superset’s monthly meetup, various subgroups present high-level metrics to give an overview of the most important Superset developments of the past month. Maxime remarked: “Now we get together as a Superset community and we look at some of the metrics. Scarf enables us to better monitor what’s going on in the [Superset] community.”

Another Form of Contributing
“I think people often ask ‘how do I contribute to open source?’, ‘I’ve got to get into the code’, or ‘I’ve got to be an engineer.’ Actually, the very simplest thing that you can do is just say, ‘my organization gets real value from this piece of software.’ There are a bunch of ways to let the people know about it – and now Scarf is there. If your organization is getting a lot of value from a piece of open source software, make sure the devs know about it.”
Benefits for Commercial Open Source Businesses
“For a commercial open source company, there’s a win-win situation where the company invests into the OSS [for everyone] and the open source drives funnel and relevance for the commercial entity around the project. And so for that entity to know what’s happening in the open source community is useful, helpful and vital. To serve the symbiotic relationship with the open source project, you need to know what’s happening. For people betting big on open source – whether with commercial intent or just enjoyment and passion – you win from knowing what’s happening there. If you care about open source, you should care about the metrics of your project.”
