When I was at Yammer, we built many internal data tools. We were a 16-person analytics team making Bay Area salaries, so building its own tools was an expensive investment for the company. One reason we built them is that equivalent tools didn’t exist in 2010. But if the tools that exist today had been available then, making the same investments would have been a huge mistake.
In most cases, buying a data stack is the best solution — especially for start-ups.
Why it’s best to buy
Reserve your limited resources for focusing on your core competencies
As an early-stage company, you should focus on your core competency or differentiator and then buy or outsource everything else. When you have limited resources, you should put them into building your product.
A big part of our product at Mozart Data is ETL. Even with that being one of our core competencies, we’re still going to outsource it because plenty of other companies offer ETL. It’s a solved problem. Also, what makes us unique isn’t ETL. Our secret sauce is how we think about data pipelines.
Unless you’re a data company, your engineers shouldn’t be building any data infrastructure. Instead, they should be solving your specific data problems using tools that others have built, until you’ve got a specific need that can’t be met with an existing solution.
Get far really quickly
If you’re worried that in a few months or years your data stack will no longer meet your needs, that’s still not a good argument against buying. Building up front to guard against those assumptions about the future can easily be a waste of time.
Buying a data stack buys you the ability to get pretty far with your data infrastructure in a very short amount of time (and with Mozart Data, it takes as little as an hour to set up a data stack). You should buy, and then if those assumptions are still true later on, you can then build. If those assumptions were wrong though, you’ll be really glad you didn’t invest your engineering resources to build.
You’ll get a better quality data stack
A data stack built by data stack experts is going to be better. You’re buying from a company whose core competency is building that tool, which means it has lots of experts in-house. Your engineers could build anything you can buy, but it’s almost certain that another company has invested far more engineering, product, and design hours into solving that problem, so its solution will be better.
It’s the cheapest option
Buying a data stack and setting it up yourself is the cheapest option because vendors build the technology once and sell it to many customers. It’s even cheaper than outsourcing your data stack to a consultant or agency.
The general consulting model is to charge a large markup against a full-time rate because consultants spend a lot of time selling themselves and getting context for projects. Then, that knowledge disappears after the work is done. If you outsource your data stack, you might spend two to 10 times what it would take to have someone internally do the work — or it might be even more than that.
Buying can lead to choosing the wrong tools though
When you don’t have expertise in the tooling and a lot of the value props sound the same, how are you supposed to know which one to buy? That confusion can lead you to choose the wrong tools for your data stack, and that has long-term consequences. Even though consultants are expensive, they can be helpful here by filling the gap in expertise and helping you set up your infrastructure.
Although outsourcing provides expertise, it can create problems
One of the general problems with the consultant model is that your incentives aren’t necessarily aligned. Often, consultants have your best interests at heart, but at the end of the day, their incentive is to extend their contract and keep working for you. One way to do that is to add a lot of value, but another is to embed themselves so that they end up doing more work. So hiring an expert to implement your data stack is only advisable if buying is difficult for you, and even then you want to be sure you find the right consultant.
While you get all of the expertise the consultant has built up over time, you don’t build up that expertise internally. The consultant model works well if you don’t need to develop internal expertise. This can go poorly with your data stack. If somebody implements a big part of your system as an outsider and then they leave, you’ll either need to go back to them or you’re not that much better off than if you had invested in someone internal learning along the way.
When to build your data stack
While buying makes sense most of the time, there are a few exceptions. There are benefits to building a data stack that you can’t get with other options.
You have a lot of edge cases
When you have something very specific and unique about your data needs, generic data solutions aren’t enough.
You might have specific connectors that are core to your business, or you might have strict real-time needs. Maybe you’re an algorithmic trading platform that requires cutting-edge capabilities, such as near-real-time updates of customer data and customer interactions. Out-of-the-box solutions aren’t tailored for that because the best ones focus on solving core problems really well.
You’re betting you’ll achieve massive scale
Very famously, Amazon, Uber, Facebook, and Airbnb built a ton of internal data tooling. Founders like Jeff Bezos, Brian Chesky, and Mark Zuckerberg had huge ambitions for their companies, so they were betting on the scale they would achieve. The data solutions that provided 99% of what they needed weren’t good enough because the core value they deliver is in the remaining 1%.
These companies have probably spent hundreds of millions of dollars on their data tools and none of them are crying about it. They’re all very excited about it because the extra 1% they’re getting from their data tools is worth it. They’re at such an incredible scale that the specifics of their tooling make a huge difference and tools on the market don’t work for their volume of data.
But it’s important to understand that these companies wouldn’t have been successful if they had built data tools at the beginning of their lifetimes. Airbnb would have failed six months into its existence if it had been building Airflow instead of its core product of apartment rentals.
You need absolute control
When you build your own data stack, you’ll have complete control over it. This gives you flexibility. You can change it as needed and decide how things get prioritized.
If you’re ready to set up a data stack but aren’t sure where to start, check out The Start-Up’s Guide to the Modern Data Stack for everything you need to know.
Thanks to my co-founder, Dan Silberman, for co-writing this article with me.