Increasing the Impact of Your Machine Learning Model

June 21, 2019 • ☕️ 2 min read

Machine Learning, Data Science, Business, Productivity, Workflow

About Balancing Efforts

With all the media hype and the latest developments in the academic machine learning community, it sometimes feels like this is everything that matters:

Or, in the case of word embeddings for natural language processing, for example:

What’s the problem with that?

Typically, industry machine learning projects aren’t based on a fixed, preexisting reference dataset like MNIST. A lot of effort goes into procuring and cleaning training data. As these tasks are highly project-specific and can’t be generalized, they are rarely talked about and receive no media attention.

The same is true for the post-modelling steps: How do you bring your model into production? How will the model outputs create actual business value? And by the way, shouldn’t you have been thinking about these questions beforehand? While model serving workflows are somewhat transferable, monetization strategies are usually project-specific and not made public.

With these considerations, we can paint a more accurate picture. This is what actually influences the impact of your machine learning project:

Now, how do I best increase model impact?

The key is balancing your efforts. More precisely: be aware of which step currently yields the highest return on investment, that is, the most overall impact for the time you spend. This will change as you progress: once you’ve sufficiently improved one component, working on another becomes relatively more effective. Keep in mind that you don’t need to work through the steps in their final execution order: the monetization phase, which is the last step in the diagram above, is likely the one you should think about first.
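To make the balancing idea a bit more tangible, here is a minimal Python sketch. It assumes you can put rough numbers on the expected gain and the effort of each step, which in practice will be educated guesses at best. The `Step` class, the step names and all numbers are hypothetical and only serve as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    expected_gain: float  # hypothetical estimate of added business value
    effort_days: float    # hypothetical estimate of the time required

    @property
    def roi(self) -> float:
        # Estimated gain per day of effort
        return self.expected_gain / self.effort_days


def next_step(steps):
    """Return the step with the highest estimated gain per day of effort."""
    return max(steps, key=lambda s: s.roi)


# Made-up estimates for the pipeline steps
steps = [
    Step("data quality", expected_gain=8.0, effort_days=4.0),
    Step("model architecture", expected_gain=3.0, effort_days=6.0),
    Step("deployment", expected_gain=5.0, effort_days=5.0),
    Step("monetization", expected_gain=9.0, effort_days=3.0),
]

first = next_step(steps)
print("Work on first:", first.name)

# After working on a step, the remaining gain shrinks, so re-estimate
# and pick again. The priority shifts to another component.
first.expected_gain = 2.0
second = next_step(steps)
print("Work on next:", second.name)
```

With these made-up numbers, monetization comes first and data quality second. The point is not the arithmetic but the habit of re-estimating after every iteration instead of sticking to one component.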

Procuring more training data or increasing its quality might be more tedious and less glamorous than boasting about state-of-the-art deep learning architectures, but it often pays off more in terms of overall model performance. Don’t focus on fancy technologies if you haven’t worked out how you will get a machine learning system into production and how it will actually influence company decisions.

So, what does that mean when starting a new project? You could

Take the data and dump it into Keras. If model performance is unsatisfactory, use more layers. Tune all parameters.
Weeks pass.

Or you could

Think about how this project will actually help your company. Bring the necessary people on board early on and make them feel part of the effort. Devise a plan for all steps. Make sure a key decision maker supports you. Dive into the data, do some visualization and cleaning, and get a good feel for averages, outliers, correlations, missing fields and the underlying business processes. Try a decision tree or logistic regression as a simple, explainable baseline model. Use a cloud platform, for example, for prototype deployment and test your workflow.
Iterate from there. During each step, improve whatever component you think currently gives you the highest ROI in terms of business value and time. Get feedback.
You’ll have a prototype to show early on. This will build trust and unlock resources. Further gradual, balanced improvements make sure you don’t waste time tightening the wrong screws.
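As an illustration of the simple, explainable baseline mentioned above, here is a rough sketch of a decision stump, i.e. a decision tree with a single split, in plain Python. In a real project you would more likely reach for a library implementation; this hand-rolled version only shows how little is needed to get a first, interpretable result. The churn-style toy data, the feature meanings and all function names are made up for this example.

```python
def majority(labels):
    """Most frequent label; ties go to the smallest label."""
    return max(sorted(set(labels)), key=labels.count)


def fit_stump(rows, labels):
    """Find the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # not a real split
            l_pred, r_pred = majority(left), majority(right)
            errors = sum(y != l_pred for y in left) + sum(y != r_pred for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_pred, r_pred)
    if best is None:
        raise ValueError("no valid split found")
    return best[1:]  # (feature index, threshold, left label, right label)


def predict(stump, row):
    f, t, l_pred, r_pred = stump
    return l_pred if row[f] <= t else r_pred


# Hypothetical toy data: (monthly usage hours, support tickets) -> churned (1) or not (0)
train_rows = [(40, 0), (35, 1), (5, 4), (8, 3), (50, 1), (3, 5)]
train_labels = [0, 0, 1, 1, 0, 1]
test_rows = [(30, 0), (6, 2)]
test_labels = [0, 1]

stump = fit_stump(train_rows, train_labels)
accuracy = sum(predict(stump, r) == y for r, y in zip(test_rows, test_labels)) / len(test_labels)

f, t, l_pred, r_pred = stump
print(f"Rule: if feature {f} <= {t} predict {l_pred}, else predict {r_pred}")
print(f"Held-out accuracy: {accuracy:.2f}")
```

The resulting rule fits in one sentence, which makes it easy to discuss with stakeholders. It also gives every later, fancier model a number to beat.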

That’s it! Feel like something is missing? Disagree entirely? Share your opinion in the comments!