While I enjoyed this read, I think one key point was missed: databases don’t just store data, they also process it. The main processing paradigm in many of them, including Snowflake, is SQL, and it isn’t used just to get data out of databases for end-user queries. In the analytics world, huge volumes of grunt transformation work, some of it quite sophisticated, are done in SQL. There are decades of history in optimising this, which means engineers have to worry far less about tuning than in other processing paradigms. Even in an ML pipeline, perhaps 80–90% of the work is most easily done in SQL.
Of course, not everyone likes working in that paradigm (though it’s telling that almost every data lake alternative has felt the need to bolt on SQL-like processing), which is where the ability to run Scala and Java directly on or in Snowflake fleshes out the capability.
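To make the "grunt transformation in SQL" point concrete, here is a minimal sketch using Python's built-in sqlite3 with a hypothetical `events` table: the per-user aggregates that often make up ML feature engineering are expressed declaratively, and the engine handles the execution strategy.

```python
import sqlite3

# In-memory database with a toy events table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10.0), (1, 30.0), (2, 5.0)],
)

# Typical transformation step in a feature pipeline: per-user aggregates,
# written as SQL and left to the engine to plan and optimise.
features = conn.execute("""
    SELECT user_id,
           COUNT(*)    AS n_events,
           AVG(amount) AS avg_amount
    FROM events
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print(features)  # [(1, 2, 20.0), (2, 1, 5.0)]
```

The same query shape runs unchanged on a warehouse engine like Snowflake; only the connection layer differs, which is much of why SQL carries so much of the transformation load.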
A critical point to keep in mind is ‘ownership of data’: companies and other entities want to own their data without having to move it around. A platform like Databricks provides that to its customers.
Snowflake customers own their data and don't have to move it around. Data lake and data warehouse are a single technology, not separate silos, governed by the same access policies. Not only that, but Snowflake enables direct sharing: not only do you have a single place for your data, you can securely share it with downstream and upstream partners in a well-governed way. No APIs. No SFTP. No latency. This opens up use cases like data clean rooms that are difficult or impossible with other technologies.
Saying Databricks is only moving into the storage layer is a bit misleading. Databricks SQL offers a visualization layer with extensibility for low-latency, high-concurrency serving to BI tools like Tableau and Power BI. Databricks already has stream and batch processing with Spark Structured Streaming, with pipeline management on the way via Delta Live Tables and multi-task jobs.
The real question to ask is: how much innovation is Snowflake investing in its core product without leaning on partners? External Functions will not scale efficiently, since you have to pay another vendor for that functionality, and costs can increase rapidly if you try to use them for ML/AI.
Snowflake is an excellent data warehouse, but the world has evolved to require more from its data platforms, and I fear Snowflake lacks the innovation required to deliver the future.
I wouldn't worry about Snowflake's level of innovation. 90% of Databricks revenue is data engineering, and Snowflake already crushes Databricks on most of those workloads using SQL instead of Spark (which customers tend to prefer for those workloads). And Snowflake's AI/ML partners tend to be ranked higher on the Gartner Magic Quadrant, so with Databricks diluting its investment by playing catch-up to Snowflake, it will continue to fall behind on that front as well. Dataiku and DataRobot (among others) are already delivering a better AI/ML experience than Databricks.
Excited for Part 2! Do you see either company appealing to different market segments / verticals that might distinguish their strategy, or are they all in on enterprise and going to collide more and more on deals? Great article.