Scarf Case Study: Apache Superset
Published January 31, 2024
This article was originally posted on Hackernoon.
Apache Superset is a modern, open source data exploration and visualization platform that makes it easy for users of all skill levels to explore and visualize their data. Superset is a project of the Apache Software Foundation (ASF), one of the largest open source foundations.
We spoke with Maxime Beauchemin, founder and CEO of Preset and the original creator of both Apache Superset and Apache Airflow. A prominent open source developer and advocate, Maxime sees open source as a better way to collaborate on, deliver, and iterate on software.
Challenges: Understanding Adoption and Usage
Before working with Scarf, Apache Superset, like most other OSS communities, had very limited insight into adoption and usage. The team struggled to understand who was using Superset, which versions were in use, and how they were installed. The little data they did have was scattered across multiple platforms, such as GitHub, npm, and Google Analytics. Going further was difficult: both Superset's and the ASF's commitment to privacy ruled out most traditional analytics approaches, which made it hard to get foundation approval to collect meaningful metrics.
“I feel like I had accepted that pre-Scarf, there was just a lot of stuff we [didn’t] know. In a lot of ways, what I knew about the community was very much the tip of the iceberg…”
Scarf enabled Superset to move forward on this problem
“That's where Scarf comes in, where the guarantees that you provide around data management and privacy checked all the boxes for the ASF – and actors in the community.” In 2023, Scarf was officially approved for use within the ASF, ensuring that all Apache projects can leverage Scarf’s analytics while remaining compliant with ASF guidelines and respecting end-user privacy.
Getting started with Scarf
Following its governance model, the Superset team proposed Scarf to its community very carefully, first introducing a Superset Improvement Proposal (SIP). “The ASF has a very opinionated governance model, but they are trying to protect everyone. The SIP was thought through – all the rationale, how we're doing it, …, alternatives we've considered, value provided for the community, who's going to have access to this data. I think doing that upfront -- if you do all that work up front, then people will read [and give] that thumbs up.”
Better visibility gained
Maxime highlighted several challenges in tracking the adoption and usage of open source software: understanding usage patterns and the nuanced information beneath the surface. How many users were building Superset from source versus running the pre-built container? Which companies relied most heavily on Superset? Which versions were used most?
Scarf provided privacy-conscious yet effective analytics for Superset across its website, npm installations, Docker downloads, and web application.
Using Scarf helped the team understand the value Apache Superset brings to its users. For Superset, a particularly valuable capability was embedding a cookie-free pixel in the Superset UI, which set Scarf apart from other tracking tools. This let the team draw on data from active users instead of relying on basic download counts.
Enhanced Visibility and Decision-Making for the Community
“I think every community should understand themselves. Who's using us, and what features are they using and which versions they use… analytics is helpful to understand … if you want to make an educated decision, I think that's clear.”
Usage metrics gathered through Scarf give Apache Superset valuable insights to drive decision-making, from more informed choices about long-term support for older releases to a better understanding of which installation paths are most common in practice. As a result, the project can allocate resources more effectively and make sure the software meets the needs of its users and community.
Scarf data is not used only by the Superset maintainers; it is also becoming part of the broader discussion within the community. At Superset's monthly meetup, various subgroups present high-level metrics to give an overview of the most important Superset developments of the past month. Maxime remarked: “Now we get together as a Superset community and we look at some of the metrics. Scarf enables us to better monitor what’s going on in the [Superset] community.”
Another Form of Contributing
“I think people often ask ‘how do I contribute to open source?’, ‘I've got to get into the code’, or ‘I’ve got to be an engineer.’ Actually, the very simplest thing that you can do is just say, ‘my organization gets real value from this piece of software.’ There are a bunch of ways to let the people know about it – and now Scarf is there. If your organization is getting a lot of value from a piece of open source software, make sure the devs know about it.”
Benefits for Commercial Open Source Businesses
“For a commercial open source company, there’s a win-win situation where the company invests into the OSS [for everyone] and the open source drives funnel and relevance for the commercial entity around the project. And so for that entity to know what's happening in the open source community is useful, helpful and vital. To serve the symbiotic relationship with the open source project, you need to know what's happening. For people betting big on open source – whether with commercial intent or just enjoyment and passion – you win from knowing what’s happening there. If you care about open source, you should care about the metrics of your project.”