Mastering Telemetry in Open Source: A Simple Guide to Building Lightweight Call Home Functionality
Published
August 3, 2023
This article was originally posted on Hackernoon.
Implementing a call-home functionality or telemetry within open-source software often raises privacy concerns within the community. Many parties, including enterprise security teams, customer advocates, and developers, express rightful apprehensions about the transmission, storage, and usage of data.
However, if you're a project maintainer or a team member at a commercial open-source company, understanding basic telemetry is often required. It helps you determine whether people are testing your software and continuing its use over time. Such insights not only confirm if the developed software meets users' needs but also help identify which versions are being adopted and which might be vulnerable to the latest bugs or other issues.
How can you strike a balance? The solution is to be as lightweight as possible. Let's explore how to build a minimal, privacy-focused call home functionality using a simple version check and Scarf.
Firstly, it's important to offer something to your users when attempting to gather basic telemetry data, however simple it might be. In my experience, a free version check has proven beneficial. This checks, either at startup or after a certain period, whether your users are running the latest software version and if they potentially have any vulnerabilities. Even this basic functionality should include an opt-out option, as some users may choose not to use this feature.
In the following example, we'll aim for simplicity and minimal invasiveness. I will use a JSON file on GitHub to store the latest version:
In the real world, you could host this on your website, embed the response in a public API, or put it on your preferred CDN. Where it is hosted matters less than having a URL we can redirect to. Next, wherever your users install or run your software, there will be a local record of the installed version. For my example, I created a file called current_version.json:
The process of comparing these two files is straightforward. I'll use Python for this example (you'll find a few different examples in the repository):
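The full script is in the repository linked at the end of this article. As a rough idea of what such a comparison can look like, here is a minimal sketch. It assumes both JSON files hold a single "version" field, and it uses a placeholder URL; the function names are hypothetical and not taken from the repository.

```python
import json
import urllib.request
from pathlib import Path

# Placeholder: point this at wherever your version.json is actually hosted.
LATEST_VERSION_URL = "https://example.com/version.json"
# The local file that records the installed version.
LOCAL_VERSION_FILE = Path("current_version.json")


def parse_version(raw_json: str) -> str:
    """Extract the version; assumes a payload shaped like {"version": "0.99"}."""
    return str(json.loads(raw_json)["version"])


def fetch_latest_version(url: str = LATEST_VERSION_URL, timeout: float = 5.0) -> str:
    """Download the remote JSON file and return the version it advertises."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_version(response.read().decode("utf-8"))


def read_local_version(path: Path = LOCAL_VERSION_FILE) -> str:
    """Read the version recorded in the local JSON file."""
    return parse_version(path.read_text(encoding="utf-8"))


def describe(local: str, latest: str) -> str:
    """Build the message shown to the user for a match or a mismatch."""
    if local == latest:
        return f"Version match: running {local}"
    return f"Version mismatch: running {local}, latest is {latest}"


def main() -> None:
    print(describe(read_local_version(), fetch_latest_version()))
```

Calling main() performs the check once. The comparison here is plain string equality, which is enough to detect a mismatch but does not tell you which version is newer.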
When I run this script it simply says if there is a match or a mismatch in versions:
You could use this information to log to a file, post a message to the admin console, or even send an email to the user via the application. The choice of user notification will heavily depend on the type of software being used.
Setting up Scarf:
To track telemetry for these installs, we'll use Scarf. Scarf is a service that enables open source projects, their maintainers, and the companies that support them to gather growth and adoption statistics securely and privately across multiple endpoints.
Assuming you have a Scarf account and are logged in, go to packages. Here, we'll create a new package, which essentially amounts to a URL redirect. In this case, our URL redirect will point to the version.json that is currently hosted on GitHub.

Click “New Package”:

After creating this, we can confirm that the URL is operational by opening the redirect in a web browser. We can also verify the setup in the Scarf dashboard and view the analytics.
View the setup:

Checking to see if our test was logged:

You can see my one view from the redirect, coming from Chrome. Now let's modify our Python script to use the new URL redirect:
Changing:
To:
I have created a separate version of call_home_example.py, named call_home_example_scarf.py, to include this change.
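As a sketch of that change, only the URL needs to move from the direct link to the Scarf redirect. The direct URL below is a placeholder rather than the real GitHub address, the https scheme on the Scarf URL is an assumption, and the "version" field name is assumed as well.

```python
import json
import urllib.request

# Before: a direct link to the hosted file (placeholder, not the real address).
DIRECT_URL = "https://example.com/version.json"
# After: the Scarf redirect. Host and path come from the example URL used
# later in this article; the https scheme is an assumption.
SCARF_URL = "https://theyonk.gateway.scarf.sh/callhome/version.json"


def fetch_latest_version(url: str = SCARF_URL, timeout: float = 5.0) -> str:
    """Fetch the version file; assumes a payload like {"version": "0.99"}."""
    # urlopen follows HTTP redirects on its own, so the response body is the
    # JSON file that the redirect ultimately points to.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return str(json.load(response)["version"])
```

Because the redirect is transparent to the client, the rest of the version check stays the same.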
After making these changes, you can see two different "downloads" listed, one coming from our Python script, the other from Chrome.

Each time this script (or code snippet) is run, the event is logged. We can also see the geographical origin of the call, gather basic company information, and more.
Tracking Versions:
To further enhance the script, you could track the version your user is currently using by adding a new route to our file and including the version.

We can try the redirect in curl, wget, or a browser: theyonk.gateway.scarf.sh/callhome/version.json/0.97
Now let's adjust the Python code in a new file called call_home_example_scarf_version.py to pass the version:
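A minimal sketch of what passing the version can look like is shown here. The helper names are hypothetical, the https scheme is assumed, and the response is assumed to contain a "version" field; the actual file is in the repository.

```python
import json
import urllib.request
from urllib.parse import quote

# Redirect base taken from the example URL above; the https scheme is assumed.
SCARF_BASE_URL = "https://theyonk.gateway.scarf.sh/callhome/version.json"


def build_version_url(current_version: str, base_url: str = SCARF_BASE_URL) -> str:
    """Append the installed version to the redirect URL as a path segment."""
    return f"{base_url}/{quote(current_version, safe='')}"


def is_up_to_date(current_version: str, timeout: float = 5.0) -> bool:
    """Call home with our version and compare it to the advertised one."""
    url = build_version_url(current_version)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        latest = str(json.load(response)["version"])
    return latest == current_version
```

For example, build_version_url("0.99") produces the same shape of URL as the one tried in the browser, with 0.99 in place of 0.97.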
Now, in the dashboard I can see the version 0.97 I called from my browser as well as the 0.99 I passed from Python.

You can add other variables to track other data points as needed by just adding more to the URL you are calling.
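One way to sketch this is a small helper that appends any number of values as path segments. It assumes your Scarf route has been set up to accept the extra segments; the helper name and the "platform" data point are purely illustrative.

```python
from urllib.parse import quote

# Redirect base from the example above; the https scheme is assumed.
SCARF_BASE_URL = "https://theyonk.gateway.scarf.sh/callhome/version.json"


def build_callhome_url(*values: str, base_url: str = SCARF_BASE_URL) -> str:
    """Append each value, in order, as one URL-escaped path segment."""
    segments = [quote(str(value), safe="") for value in values]
    return "/".join([base_url, *segments])


# Illustration only: the version first, then a hypothetical platform value.
example_url = build_callhome_url("0.99", "linux")
```

Escaping each value keeps a stray slash or space in the data from changing the shape of the URL. Whatever you add here should stay within the minimum set of data you actually need.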
Ideas, next steps, and considerations:
There are several ways to enhance this functionality and build up a more robust telemetry setup. But there are some things I would recommend you think through first.
- How often will you call home? If you only call home on startup, some server processes may stay up for months or years; how will that affect the flow of data? Conversely, some applications live for only seconds. Will the volume of data be too much for you?
- How will you add an opt-out for your users? This is critical to instill trust.
- Make your calls non-blocking. A service outage on something as simple as a version check must not impact users (or slow them down).
- What minimum set of data do you need to be successful? Which variables and routes will you add to support them?
- How will you introduce this to your customers and users? This is a touchy subject even if it's lightweight.
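Several of these points (call frequency, opt-out, and non-blocking calls) can be sketched together. This is one possible approach under stated assumptions, not a definitive implementation: the environment variable name, the one-day interval, and the function names are all hypothetical.

```python
import os
import threading
import time
from typing import Callable, Mapping, Optional

# Hypothetical names and values: pick whatever fits your project.
OPT_OUT_ENV_VAR = "MYAPP_DISABLE_VERSION_CHECK"
CHECK_INTERVAL_SECONDS = 24 * 60 * 60  # at most one call home per day

_last_check: Optional[float] = None  # in-memory only; resets on restart


def telemetry_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Respect an explicit opt-out set by the user."""
    value = environ.get(OPT_OUT_ENV_VAR, "").strip().lower()
    return value not in {"1", "true", "yes"}


def check_due(now: float, last_check: Optional[float],
              interval: float = CHECK_INTERVAL_SECONDS) -> bool:
    """A check is due if none has run yet or the interval has elapsed."""
    return last_check is None or now - last_check >= interval


def maybe_check_in_background(check: Callable[[], None]) -> Optional[threading.Thread]:
    """Run `check` on a daemon thread so it never blocks the caller."""
    global _last_check
    now = time.monotonic()
    if not telemetry_enabled() or not check_due(now, _last_check):
        return None
    _last_check = now

    def _safe_check() -> None:
        try:
            check()
        except Exception:
            pass  # a failed version check must never affect the application

    thread = threading.Thread(target=_safe_check, name="version-check", daemon=True)
    thread.start()
    return thread
```

A long-running server could call maybe_check_in_background from a periodic task so the check repeats daily instead of only at startup. For applications that live for seconds, the in-memory timestamp is not enough; you would need to persist the last check time somewhere, such as a small file, to keep the volume of calls down.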
All the code for the above examples is available here: https://github.com/TheYonk/scarf-examples/tree/master/call_home