Unstructured: Understanding an Open Source Project’s Impact on Commercial Success
Customer: Unstructured
Industry: Data Science
Product: Platform for transforming unstructured data into a normalized schema
Challenge: Unstructured needed to validate product-market fit for their open source offering and identify potential paying customers from that usage.
Solution: Scarf - analytics for open source
“I’m Dave Donahue, Head of Strategy for Unstructured. My role takes me across the entirety of the business from product, to sales, to marketing, to engineering, and on both the government and the commercial sides of the business. I work directly for the CEO.”

About Unstructured
A lot of data engineers and data scientists spend their time wrangling unstructured data. Unstructured was born to build a single platform that would be agnostic to file types and able to transform those file types into a normalized schema that is readable by large language models.
“In just one year, we had over 8 million downloads of our open source package, reinforcing that we had found a real problem space and had product market fit.”

Challenge
Unstructured had so much usage of their open source, but so little data. Prior to Scarf, they mostly had GitHub information for things like downloads and stars. Without more specific information, it was difficult to separate signal from noise, to target this large and growing open source user base, or to gather data that could influence their product roadmap.
“With Scarf, for the first time, we were able to see which organizations were generating value from our open source packages, allowing us to develop unparalleled product feedback and chart a course toward commercialization. The user journey is really brought to life by the Scarf product.”

Solution
They knew they needed to get better usage analytics on their open source artifacts. Then they came across Scarf. Scarf allows them to derive key usage metrics and identify new user organizations.
“There’s real value for our sales team where Scarf can help us identify a company’s download behavior, providing a proxy for adoption.”

The Setup
Until then, open source usage served only as a broad signal to show their board, among others, that they were seeing heavy adoption by large organizations. Using Scarf’s automatic funnel stage analysis, they were able to determine the interest level of the organizations using their open source project and decide which ones to reach out to.
“Once we started seeing Fortune 500 logos in the funnel stages, we knew who was in experimentation, investigation, or ongoing usage. That allowed us to put those logos onto some of our board decks which just made things come to life for folks. We were floored that the largest enterprises in the world were using us and using us a lot.”

Result
Using the company information gathered by Scarf, Unstructured was able to put together marketing campaigns targeting their personas at the companies that were using their open source, but not their commercial product. These campaigns resulted in open rates of 2 to 2.5 times the industry standard.
The data isn’t just used by sales and marketing; the whole company is benefiting from what Scarf provides. The product team has even used it to ask for hand-raisers for a new beta program and got thousands of people asking to participate, creating an invaluable feedback loop.
Key Outcomes
- Increased confidence in the product and its market fit
- Improved sales and marketing efficiencies
- Enhanced investor relations
- Established a foundation for future commercial success
- Outbound outreach to Open Source Qualified Leads saw 2x higher response rates compared to their outreach campaigns without Scarf data