- Extracts data from a system (like an analytics tool or marketing platform)
- Transforms the extracted data into a usable form (standardizes, deduplicates, sorts)
- Loads the data into a data warehouse
But when you implement one, and which one you choose, depend on your specific needs. Here are the factors to consider when deciding.
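The three steps listed above can be sketched in a few lines of Python. This is only an illustration, not how any particular ETL product works: the sample rows, the `customers` table, and the function names are all made up for the example, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical rows, standing in for an export from an analytics or marketing tool.
RAW_ROWS = [
    {"email": " Ana@Example.com ", "plan": "pro"},
    {"email": "ana@example.com", "plan": "pro"},
    {"email": "bo@example.com", "plan": "free"},
]


def extract():
    """Extract: return raw rows from the source system (hard-coded here)."""
    return list(RAW_ROWS)


def transform(rows):
    """Transform: standardize emails, drop duplicates, and sort."""
    cleaned = {(row["email"].strip().lower(), row["plan"]) for row in rows}
    return sorted(cleaned)


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS customers (email TEXT, plan TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory SQLite as a stand-in warehouse
load(transform(extract()), conn)
loaded = conn.execute("SELECT email, plan FROM customers ORDER BY email").fetchall()
print(loaded)
```

The two spellings of the same email address collapse into one row during the transform step, so only two rows reach the table.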
When You Need an ETL Tool
There are multiple schools of thought on when it makes the most sense to put an ETL tool in place. Ethan Aaron, Founder & CEO of Portable, doesn’t recommend putting an ETL tool in place too early. For most new companies, the majority of the data they need is in a single product database, which makes an ETL tool unnecessary.
Aaron notes, “It’s very much a question of maturity and how much valuable information lives in each of these systems. Most of the time, very early on, it’s a database, then you add some applications. Then, over time, as you add more and more and you have more requirements, that’s where [an ETL tool] comes into play.”
But there are benefits to putting an ETL tool in place as early as possible, too. Peter Fishman, Co-Founder of Mozart Data, says that when you choose an ETL tool and other pieces of your data stack early on, “you tend to select tooling that will evolve better, that will make you more successful in the long run. If you have this mindset early on, it feeds the other parts of your decision making.”
How to Evaluate ETL Tools
If you don’t have expertise in the data space, evaluating different tools can seem complex. Your specific ETL needs will depend heavily on the type of business you have and the type of data you need to load into your warehouse, but there are a few things everyone should keep in mind when comparing ETL tools.
Data warehouse compatibility
Unless you want to change your data warehouse, you must choose an ETL tool that’s compatible with the one you’re using. Since it’s so critical, data warehouse compatibility should be among the first things you look for when comparing ETL tools. If you decide to invest in setting up a data stack with an all-in-one vendor like Mozart Data, you know this kind of compatibility is built in, so that’s one less thing to worry about.
Available connectors
Assuming the tools you’re evaluating are all compatible with your data warehouse, the next thing to look at is the data connectors each one offers.
What you need now
You should choose an ETL tool that already supports all the tools you’re currently using to collect data. If you find an ETL solution that you like and it doesn’t support all your current applications, you’ll want to confirm that the vendor is able to spin up the connectors you need quickly.
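One way to make this check concrete is to compare the list of applications you use today against a vendor's connector catalog. The sketch below assumes you have both lists as plain names; the source and connector names are placeholders, not any real vendor's catalog.

```python
def missing_connectors(sources_in_use, vendor_connectors):
    """Return the sources the vendor does not support, sorted by name."""
    supported = {name.lower() for name in vendor_connectors}
    return sorted(s for s in sources_in_use if s.lower() not in supported)


# Placeholder names for illustration only.
sources_in_use = ["product_db", "billing_app", "crm_app"]
vendor_connectors = ["Product_DB", "billing_app", "support_app"]

gaps = missing_connectors(sources_in_use, vendor_connectors)
print(gaps)
```

Anything left in `gaps` is a connector to ask the vendor about before you commit.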
What you’ll need in the future
Some ETL tools are more static than others. If you plan to add more applications that generate data as your company grows, you’ll want to be sure to select a tool that supports that. Some solutions might have a long list of existing connectors, but they don’t add more on a regular basis. This puts you in a bind when you want to use a tool they don’t currently support.
As Aaron puts it, “the other way to think about it is shipping connectors and how quickly those catalogs change. What we pride ourselves on is not necessarily the fact that we have X number of connectors but rather, we’ve built 200 of them in the last year so that we can actually build [a new connector] for the next client when they need it.”
Total cost of ownership
The price you pay for an ETL tool is just one piece to consider when you’re looking at cost. The Total Cost of Ownership (TCO) for an ETL tool also includes how much time someone on your team will spend managing the tool.
Lauren Balik, Principal and Owner of Upright Analytics, explains, “In a 40-hour week, if your employee is spending six hours a week [managing your ETL tool], you’ve now bought a product and you’ve bought something that is going to take up a person’s time. This is not what you should be focused on when you’re starting a company and building a product.”
How much time can you afford to invest in your ETL tool and how much are you able to entrust to your vendor?
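As a rough illustration of the arithmetic, the sketch below adds the cost of staff time to the price of the tool. The six hours a week comes from Balik's example; the license price and hourly cost are assumptions chosen only to show the calculation.

```python
def annual_tco(annual_license, hours_per_week, hourly_cost, weeks_per_year=52):
    """Annual total cost of ownership: license price plus staff time."""
    labor = hours_per_week * hourly_cost * weeks_per_year
    return annual_license + labor


# Assumed figures: a $6,000/year license and a $75/hour loaded staff cost.
total = annual_tco(annual_license=6000, hours_per_week=6, hourly_cost=75)
print(total)
```

With these assumed numbers, staff time costs $23,400 a year, nearly four times the license itself, for a total of $29,400.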
Pricing structure
ETL tools tend to fall into one of two common pricing structures: a flat rate per connector or a variable per-record cost.
Smaller companies that aren’t sending much data through will usually benefit from a per-record pricing structure, where they pay for the number of rows of data they move each month. But costs add up quickly as soon as you start adding more applications and generating more data. For larger companies that are moving huge amounts of data, an ETL tool that charges per connector used is usually the more cost-efficient choice.
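To see where the two pricing structures cross over, you can compare them for your own volumes. The prices below ($200 per connector per month, $100 per million rows) are invented for the example and do not reflect any vendor's actual rates.

```python
def flat_cost(connectors, price_per_connector):
    """Monthly cost under flat per-connector pricing."""
    return connectors * price_per_connector


def per_record_cost(rows_per_month, price_per_million_rows):
    """Monthly cost under per-record pricing."""
    return rows_per_month / 1_000_000 * price_per_million_rows


def cheaper_plan(connectors, rows_per_month,
                 price_per_connector=200, price_per_million_rows=100):
    """Name the cheaper pricing structure for the given usage (assumed prices)."""
    flat = flat_cost(connectors, price_per_connector)
    variable = per_record_cost(rows_per_month, price_per_million_rows)
    return "per-record" if variable < flat else "per-connector"


small_company = cheaper_plan(connectors=3, rows_per_month=500_000)
large_company = cheaper_plan(connectors=10, rows_per_month=50_000_000)
print(small_company, large_company)
```

Under these assumed prices, the small company pays $50 per record versus $600 flat, while the large company pays $5,000 per record versus $2,000 flat.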
Speed
The speed you need from your ETL tool depends on the kind of business you have and how you’re leveraging your data.
As Balik notes, it’s about “how out of date can the data be? When you’re doing internal operations and BI, it’s pretty rare that you need real time streaming, for example, but you’ll pay more for that. Usually you can get away with ‘as of midnight last night’ or ‘as of an hour ago’ in the marketing world.”
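The question "how out of date can the data be?" can be written down as a simple freshness check. In this sketch the function name, the timestamps, and the staleness limits are all illustrative; a real pipeline would read the last load time from its own metadata.

```python
from datetime import datetime, timedelta, timezone


def is_fresh_enough(last_loaded_at, max_staleness, now=None):
    """Return True if the last load is within the allowed staleness window."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= max_staleness


# Illustrative timestamps: data loaded at midnight, checked at 9 a.m.
now = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
midnight_load = datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)

daily_ok = is_fresh_enough(midnight_load, timedelta(hours=24), now)
hourly_ok = is_fresh_enough(midnight_load, timedelta(hours=1), now)
print(daily_ok, hourly_ok)
```

A midnight load satisfies an "as of midnight last night" requirement but not an "as of an hour ago" one, which is the kind of gap that determines how much speed you need to pay for.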
Security and privacy
When selecting an ETL tool, you want to think about the security of your data, your customers’ information, and any regulatory concerns, particularly around moving data between the EU and other countries. While most ETL vendors follow basic security practices, if you have a lot of complexity around regulations, that’s something to discuss before diving in with a new tool.
You’ll also want to look at how the vendor handles your data while processing it. Is it accessible to their employees? Where and how is it encrypted? How is it stored and when is it deleted? The less access your ETL vendor has to your actual data, the lower the risk to your customers’ privacy and your company.
The other piece in evaluating security is how the vendor handles your credentials, specifically API tokens and authentication credentials, which are extremely sensitive. It’s essential that they not only have security measures in place for your credentials but also verify those measures in some way, such as through regular audits.
While every company should be in the business of protecting and securing its data, there are certain instances where these concerns go even further. For example, Aaron suggests, “If you work in financial services, if you work at one of the largest banks, or you work with healthcare and with HIPAA data, it is very difficult to use SaaS vendors in general. Open source is a phenomenal solution in that scenario.”
Should You Go Open Source?
While open-source ETL can offer a cost-effective solution, there are some drawbacks to relying on it. One of the biggest is maintenance. Balik says, “If you’re using something that’s open source, what you risk is something going down and then not being corrected until someone gets around to it.” When you work with an ETL vendor, though, you know that when a problem arises, you’re not the only organization affected, and your vendor has an incentive to fix it quickly.
Or as Aaron put it, “If you want to be on call at 2 a.m. when things break in your pipeline, use open source.”
Adding an ETL tool to your data stack is one of the best ways to start leveraging your data effectively. For more expert opinions on when and how to evaluate ETL tools for your organization, watch the recording of our webinar How to Choose the Right ETL Tool.