« The Third Wave Data Scientist
April 6, 2019 • ☕️ 5 min read
An update of the data science skill portfolio
Introduction
Drew Conway’s visualization of the data science skill set is an often cited classic. Different opinions and the versatility of the role have spawned numerous variations:
There seems to be no consensus on the data science skill set. Additionally, as the field evolves, shortcomings become obvious and new challenges arise. How can we describe this evolution?
The first wave of data scientists happend before data became big and before data science was actually a thing (pre-2010s): Statisticians and analysts who had always been around, doing a lot of what modern data scientists are doing, but accompanied by less hype.
Second wave: Large-scale data collection created a demand for smart minds who can work magic and turn all this big data into big money. Companies were still figuring out what kind of people to employ and often turned to science graduates. While the second wave data scientists did a lot right, their carefully crafted models often ended up as PoCs and failed to bring about actual change.
Now, at the end of the 2010s, amidst the hype around deep learning and AI, enter the third wave of data scientists: Experimenting and innovating, efficiently seeking out business value und bridging the deployment gap to create great data products. What skills are required here?
1. Business Mindset
The business mindset is the centerpiece of the data science skill set, as it sets goals and applies the other skills to reach them. Patrick McKenzie states in this blog post:
Engineers are hired to create business value, not to program things: Businesses do things for irrational and political reasons all the time […], but in the main they converge on doing things which increase revenue or reduce costs.
Likewise, data scientists are hired to create business value, not just to build models. Ask yourself: How will the outcome of my work influence company decisions? What do I have to do to maximize this effect? With this entrepreneurial spirit, the third wave data scientist does not only produce actionable insights, but also seeks that they bring about real change.
Look where the money flows in your organization — the divisions with the largest cost or revenue will likely offer the highest financial leverage. However, business value is a fuzzy concept: It goes beyond cost and revenue of the current fiscal year. Experimenting and creating an innovative data culture will increase a company’s long-term competitiveness.
Prioritizing your work and knowing when to stop is the key to efficiency. Think of diminishing returns: Is it worth spending weeks to tweak a model for another 0.2% of precision? Quite often, good enough is the real perfect.
Domain expertise, which makes up a third of Conway’s skill set, is by no means to be neglected — however, you’ll almost everywhere have to learn it on the job. This includes knowledge about your industry as well as all the company processes, naming schemes and peculiarities. This knowledge does not only set the frame conditions for your work, but it is often indispensable to understand and interpret your data.
Keep it simple, stupid
Look out for the low hanging fruit and quick wins. A simple SQL query on existing data warehouse might yield valuable insights unbeknownst to product managers or executives. Don’t fall into the trap of doing “buzzword-driven data science”, focusing on state-of-the-art deep learning where a beautifully simple regression model would be sufficient — and much less work to build, implement and maintain. Know the complicated things, but do not overcomplicate things.
2. Software Engineering Craftsmanship
The notion of (second wave) data scientists needing only “hacking skills” instead of proper software engineering has been repeatedly critizised. Lack of readability, modularity or versioning hinders collaboration, reproducibility and productionizing.
Instead, learn the craft from proper software engineers. Test your code and use version control. Follow an established coding style (e.g. PEP8) and learn how to use an IDE (e.g. PyCharm). Try pair programming. Modularize and document your code, use meaningful variable names and refactor, refactor, refactor.
Bridge the deployment gap for agile prototyping of data products: Learn to use tools for logging and monitoring. Know how to build a REST API (e.g. using Flask) to provide your results to others. Learn how to ship your work inside a Docker container or deploy it to a platform like Heroku. Instead of letting your models rot on your laptop, wrap them into data-driven services that fit snugly into your company’s IT landscape.
3. Statistics and Algorithms Toolbox
Data scientists have to thoroughly understand the basic concepts in statistics and particularly in machine learning (A STEM university education is probably the best way to acquire this foundation). There are tons of resources on what’s important, so I’m not gonna delve further into this here. You will often have to explain algorithms or concepts like statistical uncertainty to your clients, or red-flag an insight because of a confusion between correlation and causation.
4. Soft Skills
As people skills are as important to productivity as technical skills, the third wave data scientist makes conscious efforts to improve in these areas.
Work well with others
Consult your peers — most people are happy to help or give advice. Treat others on par: You might have a fancy degree and an understanding of sophisticated algorithms, but others have experience that you don’t (This sounds like basic social advice, but who hasn’t met an arrogant IT professional?).
Understand your client
Ask the right follow-up questions. If a client or your boss wants you to calculate some key figures or create some chart, ask “Why? What for? What do you want to achieve? What actions will you take, depending on the result?” to better understand the core of the problem. Then figure out how to get there together — is there another, better way to reach that goal than the one proposed?
Navigate company politics
Network, not because you expect others to benefit you in your career, but because you are an approachable person. Connect with people with similar work topics. If there are no platforms for this in your company, create them.Identify key stakeholders and figure out how to help them with their problems. Invite others early on and make them part of the change process. Keep in mind: A company is not a rational entity, but an assembly of often irrational human beings.
Communicate your results
Ramp up your visualization and presentation skills. Communicate from a client perspective: How can I precisely answer their question? Learn to communicate at different levels and sum up the details of your work. People are easily intrigued by fancy multidimensional plots, but often a simple bar chart conveys a message more efficiently. Showcase your results: When people see what you’re doing, and see that you’re doing great work, they will trust you.
Evaluate yourself
Communicate your goals and problems and actively seek out advice. Find role models inside and outside the data science community and learn from them.
Do you miss anything? Disagree entirely? Share your opinion in the comments!