There’s a lot of talk about data – how it’s the “new oil” and why every business needs to use it. But an organization’s investment in data should be about one thing – gaining an edge. That might mean knowing which advertising is working, seeing which parts of the product offering are being used, alerting the team to customers who might be at risk of churning, or knowing where the business stands relative to the plan. All of these have corresponding decisions and actions for the company, and those decisions are better made with the right information.
Most organizations use data in some form – typically a spreadsheet to manage the revenues, costs, or projections. This is a great starting point.
But the spreadsheet is ill-prepared to go much further:
It can’t join a lot of data together
It has limits on data size (1,048,576 rows × 16,384 columns per sheet in Excel; 10 million cells and 18,278 columns per spreadsheet in Google Sheets)
It can be painfully slow, especially with calculations over large amounts of data
It often takes a number of manual steps to update a spreadsheet when data changes upstream – downloading, copying and pasting, applying formulas, and checking the output
Don’t get us wrong, we love spreadsheets. They’re still where most analysis actually takes place. But they’re the wrong place to assemble data into a meaningful format. That’s why all roads eventually lead to the data warehouse.
Any company with ambitions to scale its data advantage needs a warehouse. What most people don’t know is that anyone can set one up, even without data warehouse experience. Companies no longer need to hire a data engineer, invest chunks of money upfront, and deploy tons of time and resources in order to use a data warehouse effectively. The lingering belief that they do is precisely why the spreadsheet is still the default tool for data tasks it wasn’t meant for.
By querying the warehouse, you introduce repeatability into the process. It may take a few hours to get an initial result, but the second time you simply rerun the same query against fresh data. So why do you need a warehouse, as opposed to querying the sources directly or just downloading CSVs? A well-designed data warehouse has 3 critical advantages:
Complex joins and calculations are lightning fast: Column-oriented databases often serve as the data warehouse. They organize data by field rather than by record, keeping all values of a field next to each other in storage, so a query that aggregates a few fields only has to read those fields instead of every full row. Columnar databases, like Snowflake, are growing in popularity because they are optimized for querying and analysis. This means users can run complex queries and generate reports much faster than they could against raw source data or a spreadsheet.
Standardize data, increase accuracy: Because a data warehouse centralizes the raw data, you begin to work off common data sets, rather than definitions imposed in one-off spreadsheets. This reduces inconsistent answers, which are one of the ways data loses trust in an organization. When there is high belief in the accuracy of the data, teams can focus on the applicability of the insight.
Make segmentation simple: Another benefit of centralized data is that it is easier to segment. You can join across datasets to identify anomalies or trends that might be missing from a single source. Unsurprisingly, more data sources tend to provide a richer picture of the business, but they also help avoid misleading conclusions – another source of distrust.
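To make the joining and repeatability points concrete, here is a minimal sketch. It uses an in-memory SQLite database from Python's standard library as a stand-in for a real warehouse, and every table, column, and function name in it (customers, product_events, feature_usage_by_plan) is a hypothetical chosen for illustration, not something a particular warehouse provides. The idea is simply that two sources live side by side, one saved query joins them into a segment, and the same query can be rerun whenever the data changes.

```python
import sqlite3

# SQLite in memory stands in for the warehouse here; a real warehouse
# would be queried with similar SQL. All names below are hypothetical.


def build_demo_warehouse() -> sqlite3.Connection:
    """Load two small 'sources' (billing and product usage) into one place."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, plan TEXT);
        CREATE TABLE product_events (customer_id INTEGER, feature TEXT);
        INSERT INTO customers VALUES (1, 'free'), (2, 'pro'), (3, 'pro');
        INSERT INTO product_events VALUES
            (1, 'export'), (2, 'export'), (2, 'dashboard'), (3, 'dashboard');
        """
    )
    return conn


# The saved query is the repeatable part: it joins the two sources and
# counts distinct customers per (plan, feature) segment.
SEGMENT_QUERY = """
    SELECT c.plan, e.feature, COUNT(DISTINCT c.customer_id) AS customers
    FROM customers AS c
    JOIN product_events AS e ON e.customer_id = c.customer_id
    GROUP BY c.plan, e.feature
    ORDER BY c.plan, e.feature
"""


def feature_usage_by_plan(conn: sqlite3.Connection) -> list:
    """Run the saved segmentation query and return (plan, feature, customers) rows."""
    return conn.execute(SEGMENT_QUERY).fetchall()


if __name__ == "__main__":
    warehouse = build_demo_warehouse()
    for row in feature_usage_by_plan(warehouse):
        print(row)
```

When new events arrive upstream, nothing is downloaded, pasted, or re-formulated: calling feature_usage_by_plan again returns the updated segments. Neither source alone could answer "which plans use which features", which is the kind of question that only becomes easy once the data sits in one place.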
Is it worth building a data warehouse when you’re trying to solve a problem quickly? Historically, it was a significant investment in time and money to create a well-designed, well-maintained data warehouse. That has changed.
The modern data stack makes the time and resource investment in setting up your warehouse trivial. Rather than taking months, setup can be done in under an hour. And rather than a large up-front commitment, most pricing is usage-based. So the point at which it makes sense to graduate from spreadsheets to modern tools designed for trustworthy, repeatable insights now arrives much earlier.
When is the right time to set up your data infrastructure? Now. And that’s going to give you an edge over anyone just relying on spreadsheets.