TL;DR
Spoiler alert: your first data hire should be a senior data engineer who can help you architect, build, and mature your data ecosystem, and do some basic analytics.
Let’s do data!
Congratulations! You’re a leader who’s been put in charge of getting your company to build out a data function. That means you’re the one tasked with buying tools and making the first hire - an exciting prospect. Woot! 🎉
Beware Analysis Paralysis
As exciting as standing up a data team is, it can also be pretty daunting.
🧑‍💻 Who do you hire first? A data analyst? An analytics engineer? A data engineer? A business analyst? A data leader?
💰 Where do you allocate your budget? Should you buy some shiny new BI tools and convince one of your software engineers to “start doing data”? Or should you blow all of your budget on a super expensive data scientist and trust that you’ll be able to make do with your existing tools?
😳 Or do you just curl up under your desk with a severe case of analysis paralysis?
Where should you start?
First: chill out, it’s going to be OK.
Second: let’s look at things in the context of Data Maturity. We define this as a function of a company's people 🤓, tools 🛠, and readiness 🏁. You've identified the people need, and let's assume you have at least decent readiness, since you're the business leader driving this effort.
Now it's time to think about the tools you'll need, and that will very much depend on where your critical data lives -- so it’s worth starting by taking an inventory of that:
What kind of data do you have? It could be transactional, sales history, customer data, inventory, clickstream, marketing, etc.
What are the key use cases that are driving this data need? Are there certain business-level KPIs or OKRs relating back to specific datasets?
Is it already all in one place and pretty clean, or do you have several legacy systems spread across a variety of sources?
Is it cloud-based or in a data center?
Is it stuck in SaaS applications like Salesforce and HubSpot?
Or, god forbid, is most of your data in spreadsheets on people’s laptops? 😱
Welcome to Data Engineering! 👋
Now that you’ve surveyed your data, you need to focus on how to make it actionable. Data engineering is how you get your data into a state where it can be analyzed and used to answer business questions.
Data engineering will get your data house in order. The point of data engineering is to use tools - whether homegrown processes, the “modern data stack” [1], or more cutting-edge technologies - to give anyone at your company a straightforward way to get data-driven answers to business questions.
In practice this will probably look like:
Cataloging and assessing your company’s data, both native and external 🔎
Building pipelines and processes to move and join data sources 🔗
Designing a system to let your company do analyses from reporting to deep learning 📈
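To make the "move and join data sources" step a bit more concrete, here's a minimal sketch in Python. Everything in it is made up for illustration: the two sample extracts (a CRM customer list and a billing order list), the table names, and the helper functions build_warehouse and revenue_by_customer. It uses an in-memory SQLite database as a stand-in for whatever warehouse you end up choosing, so treat it as a toy rather than a recommended architecture.

```python
import sqlite3

# Hypothetical extracts from two separate systems (made-up sample data).
crm_customers = [
    (1, "Acme Corp", "enterprise"),
    (2, "Bobs Bikes", "smb"),
]
billing_orders = [
    (101, 1, 1200.0),
    (102, 1, 300.0),
    (103, 2, 80.0),
]


def build_warehouse(customers, orders):
    """Load both sources into one in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, segment TEXT)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    conn.commit()
    return conn


def revenue_by_customer(conn):
    """Join the two sources and total the revenue per customer."""
    query = """
        SELECT c.name, c.segment, SUM(o.amount) AS revenue
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name, c.segment
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


conn = build_warehouse(crm_customers, billing_orders)
report = revenue_by_customer(conn)
for name, segment, revenue in report:
    print(f"{name} ({segment}): {revenue:.2f}")
```

Once both sources sit in one place, the business question ("who are our biggest customers?") becomes a single SQL query, which is the whole point of getting the plumbing right first.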
But who should I hire?
There are actually a lot of people who can do some data engineering to varying degrees. You could tap a software engineer who’s interested in data and is already on your payroll. You could hire an analytics engineer, who will focus on preparing data for the analytics use case. Or, you could hire a data engineer, who will focus on building a data ecosystem that can support analytics, ML, real-time data analysis, etc.
Regardless of the title, you need someone who can own and support the data ecosystem. 👉 Me, I’d hire a senior data engineer. You’ll want someone who can assess the existing data ecosystem, bring in modern technologies, and partner with the business to understand where the most valuable, high-priority data lives.
Senior data engineers are typically fluent in SQL, the lingua franca of data and analytics, and have solid basic analytics skills because they understand the data itself. They’re also familiar with the data space and BI tools, and can help build out the toolset to support what the business needs from data.
What about Data Science?
I’ve seen a lot of companies start by hiring an analyst or data scientist. More often than not, they quickly realize a core piece is missing: they’re trying to write SQL, build visualizations, or even train models on data that is messy or unreliable.
I’ve said it before and I’ll say it again, folks:
90% of data science is data work, not modeling.
So let’s set up those folks for success by first focusing on data engineering and then bringing in more advanced analytics. Then when you bring in a rockstar data scientist, they can focus on building models that solve real business problems, rather than troubleshooting data pipelines and cleaning bad data.
Senior data engineers are typically part architect, part analyst, part engineer. This is who I’d recommend you hire first, and I’m happy to say the market is starting to figure this out, too.
By focusing from the start on data maturity, you’ll be able to get your data into good shape, and then build out reporting, BI, analytics, and data science. You know, the sexy stuff 🔥
What do you think, and who would you hire first? Find me on Twitter @ctartow to discuss. I’d love to hear your take!
[1] I have a lot of thoughts about the “modern data stack”, but that’s another post for another time 😉