You may have been hearing buzz recently around the cloud-based data-warehousing company Snowflake, and for good reason. Its IPO back in September was the largest software IPO ever, and the stock closed its first day at $254 per share, more than twice its IPO price. This put the company at a staggering ~$70 billion valuation. As of this post (12/1/20), the stock is worth $305 per share.
Snowflake leverages cloud-based hardware and software to allow customers to store and analyze their data all in the cloud. Businesses use Snowflake for data warehousing, data lakes, data engineering, data science, data application development, and for securely sharing and consuming shared data.
For this blog post, we have Mozart Data CTO and co-founder Dan Silberman discussing why Mozart chose to make Snowflake our data warehouse provider.
Dan Silberman: One thing we pride ourselves on here at Mozart Data is working with best-in-class partners for our managed data pipeline, namely Fivetran for our ETL tools and Snowflake for our managed data warehouse.
Before I get into why we chose Snowflake specifically, let’s quickly discuss why column-oriented databases are better suited to modern data stacks than row-oriented databases.
Row-oriented Database vs. Column-oriented Database
There are two ways to organize relational databases — row-oriented and column-oriented.
Row-oriented databases organize data by record and keep all data associated with a record next to each other in storage. This is the traditional way of organizing data: it allows data to be stored quickly, and is optimized for reading and writing rows efficiently. PostgreSQL and MySQL are examples of popular row-oriented databases.
Column-oriented databases organize data by field and keep all data associated with a field next to each other in storage. Column-oriented databases are newer and are growing in popularity because they provide performance benefits and are optimized for computing on columns efficiently. Snowflake, BigQuery, Redshift, and Presto are examples of popular column-oriented systems.
In row-oriented databases, all the values in a row are stored together on disk, whereas in column-oriented databases, all the values in a column are stored together. For analytical queries that touch only a few columns, this column-oriented organization reduces the amount of data that has to be read from disk, and minimizes the amount of extra data held in memory during the process. This can substantially increase the speed of computations and lead to cost savings, which is why we chose to work with a column-oriented database provider: it works best for the type of data pipeline creation and querying setup we want to make.
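To make the difference concrete, here is a toy sketch in Python. It is only an illustration of the two layouts, not how Snowflake or any real database stores data; the table, field names, and helper functions are made up for the example.

```python
# Toy illustration only: real warehouses use compressed on-disk blocks,
# not Python lists. The table and field names are hypothetical.
rows = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
    {"order_id": 3, "customer": "acme", "amount": 30.0},
]

# Row-oriented layout: each record's fields sit together.
row_store = [(r["order_id"], r["customer"], r["amount"]) for r in rows]

# Column-oriented layout: each field's values sit together.
column_store = {key: [r[key] for r in rows] for key in rows[0]}


def total_amount_row_store(store):
    # Every full record is visited even though only one field is needed.
    return sum(record[2] for record in store)


def total_amount_column_store(store):
    # Only the "amount" column is read; the other columns are never touched.
    return sum(store["amount"])


print(total_amount_row_store(row_store))
print(total_amount_column_store(column_store))
```

Both functions return the same total, but the column-oriented version only looks at the one column it needs, which is the property that matters for analytical queries over wide tables.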
Check out this great article from Chartio for more info on these types of databases.
Strengths of Snowflake
Security
The security of our customers’ data is the biggest priority for us at Mozart, and Snowflake is great for ensuring their data is secure. Here are some aspects of Snowflake’s security model that we love:
Snowflake’s cloud data platform uses best-in-class security technologies, including dynamic data masking and end-to-end encryption for data in transit and at rest. This makes their service super secure and resilient even for the most demanding data workloads.
Snowflake has high levels of government and industry data security compliance, such as FedRAMP ATO at the Moderate level, SOC 2 Type 2, PCI DSS compliance, and HIPAA compliance.
Snowflake’s multi-tenant options are really clear. Every customer using Mozart can have a different database, and it’s secure by default.
Separation of Compute and Storage / Controlling Costs
The way that Snowflake is architected, you define storage separately from compute. This means you can scale your compute up and down as needed without touching your stored data. That allows you to match the speed with which your data is being processed to your workloads as they come in, and therefore control costs.
For example, if all of your ETL tools and data transforms are going to be running every night at midnight, you can just instantly spin up a ton of compute, have it complete its job, and then spin it all down. You don’t have to pay for 24 hours of giant compute amounts. That works really well for us since we have many different types of jobs running at different cadences. We want to minimize the lag for our customers and be able to spin up the compute as needed. Our customers are running ad hoc jobs in BI tools or directly in Mozart, and Snowflake allows us to maintain flexibility while lowering the cost for our customers.
Additionally, for some data warehouse solutions, it’s possible to accidentally write an inefficient query that racks up a huge compute bill. That’s much harder to do on Snowflake, since you control the compute resources you’re giving to each job.
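The nightly-job example above comes down to simple arithmetic, sketched here in Python. The credits-per-hour table follows the doubling pattern Snowflake documents for standard warehouse sizes, but the price per credit is a made-up placeholder (real pricing depends on edition, region, and contract), and the function name is hypothetical.

```python
# Credits per hour double with each warehouse size step.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}

# Hypothetical price, for illustration only.
PRICE_PER_CREDIT_USD = 3.00


def daily_cost(size, hours_running):
    """Estimated daily compute cost for one warehouse of the given size."""
    return CREDITS_PER_HOUR[size] * hours_running * PRICE_PER_CREDIT_USD


# Keeping a big warehouse up all day vs. spinning it up for a one-hour nightly job.
always_on = daily_cost("XLARGE", 24)
nightly_burst = daily_cost("XLARGE", 1)

print(f"always on:     ${always_on:,.2f} per day")
print(f"nightly burst: ${nightly_burst:,.2f} per day")
```

Under these assumed numbers the burst approach costs 1/24th as much for the same job, and the warehouse size puts a ceiling on how fast any single query can spend.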
Ease of Use
We want to make our customers’ lives simpler by reducing complexity, and Snowflake helps us do that in a few different ways:
Snowflake supports standard SQL and a subset of analytic extensions, so analysts who are familiar with PostgreSQL and MySQL can pick up the Snowflake query language with little or no training.
It has good support for semi-structured data, like JSON. Since we’re pulling in data from hundreds of different sources, that data will often come in as nested, semi-structured JSON. Snowflake has really easy ways to query and transform that data into more usable tables.
Snowflake is cross-platform. Our customers’ data lives in AWS, GCP, Azure, and many other platforms. Snowflake isn’t tied to any one cloud platform. It can be deployed close to our customers’ data and services, and it will be able to interact with each customer, minimizing data lag and cost.
There’s no need to define indices. Snowflake handles how data is organized and how queries are optimized automatically for you.
It has simple integrations with BI tools like Tableau, Looker, and Mode. It also has great interfaces for programmatic access from Python, Spark, Node.js, etc. Pretty much anything can hook up to it.
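To illustrate the semi-structured data point from the list above, here is a small Python sketch of what "transforming JSON into a more usable table" means. This is plain Python, not Snowflake's API; in Snowflake the same idea is expressed in SQL over its semi-structured column type. The sample record and the `flatten` helper are made up for the example.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, using dotted names as column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column name.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            # Scalars and lists are kept as-is in this simple sketch.
            flat[name] = value
    return flat


# Hypothetical event payload, as it might arrive from a source system.
raw = '{"id": 7, "user": {"name": "Ada", "plan": {"tier": "pro"}}, "tags": ["a", "b"]}'
row = flatten(json.loads(raw))
print(row)
```

Each nested field becomes its own column (`user.name`, `user.plan.tier`), which is the shape analysts and BI tools generally expect.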
Is your company interested in using Snowflake as its managed warehouse? Request a demo to learn more about how Mozart Data can set that up for you in no time, and provide other useful tools in the process.
If you’re interested in improving the state of data tooling, we’re hiring! Check out the open roles on our careers page.