Five Principles for AI Success
Despite the vast potential of AI to transform business, many enterprises today struggle to deliver working AI applications. By some estimates, over 80 percent of all AI projects fail. Whilst there are multiple factors that contribute to these failures, one factor is ubiquitous – making machine learning work well for a new application remains a fundamentally difficult task.
With over 20 years' experience in applying machine learning to challenging time series problems, we know what it takes to deliver successful AI products. We've boiled this experience down to five core principles for AI success.
Start Simple
It can be tempting when starting work on a new AI project to dive into the deep end with a cutting-edge ML model or sophisticated training procedure that seems well suited to the task at hand. This is almost always a mistake and more often than not results in considerable wasted effort and delayed timelines. Instead, it is important to begin with the simplest “vanilla” approach for the ML problem to be solved, e.g. logistic regression on the raw data values (suitably normalized) for a classification task.[1]
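To make this concrete, here is a minimal sketch of such a baseline in Python with scikit-learn. The synthetic data and all variable names are illustrative stand-ins for a real dataset:

```python
# A minimal "vanilla" baseline: normalize the raw values, then fit a
# logistic regression classifier. The synthetic X and y below are
# placeholders for real data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                # raw feature values
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Using a pipeline ensures the scaler is fit on the training split only.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)
print(f"baseline test accuracy: {baseline.score(X_test, y_test):.3f}")
```

Whatever this pipeline scores becomes the reference point against which anything more sophisticated is judged.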
This strategy offers several important benefits. Firstly, it establishes a baseline level of performance for the task at hand. This provides a valuable reference point when evaluating the performance of more complex models and learning procedures (e.g. ‘why does my bidirectional RNN with windowed FFT features perform a lot worse than my simple baseline model, when it “should” work much better …?’).
Secondly, with this approach the initial train-test code can typically be implemented relatively quickly, whereas the equivalent code for a more sophisticated approach is likely to be more difficult to implement and almost certainly a lot harder to debug. Hence, starting simple allows an initial set of results to be obtained in a reasonable time frame and with a high degree of confidence.[2] This is far preferable to spending many months of effort to come up with potentially unreliable results that leave more questions than answers.
Finally, the performance of the baseline model may come as a pleasant surprise – perhaps it is good enough to move forward with, or is sufficiently encouraging as to suggest a clear path towards a potential solution (e.g. via refinements to the features, or the training procedure). It is always worth keeping in mind that ML is fundamentally an empirical discipline – don’t make the mistake of assuming up front which approaches will work well and which will not…
Summary: Data Scientists and ML Engineers are often eager to get their hands dirty with the latest advances in ML, but it’s important to remember that the aim when embarking on an AI project should not be to solve the problem in a single step, but rather to work efficiently towards an effective solution in a scientific manner.
Visualization is Key
Data visualization is arguably the most underused tool in the Data Scientist’s toolbox. Visualization is more than just a way to display high-level results from ML model experiments – when used properly it is a powerful tool that can quickly reveal issues with source data, features, labels, training procedures and more.
When embarking on an ML project it’s important at the outset to perform an in-depth visualization of the raw data.[3] The aim here is to begin building familiarity with the data and to spot potential issues at an early stage, before significant effort is spent on data modeling. An additional benefit of this approach is that it gives the Data Scientist or ML Engineer the opportunity to provide feedback about the source data to the relevant stakeholders early in the project lifecycle. Putting plots together may not be the most exciting work, but the payoff is worth the effort, and the visualizations themselves will often be useful to refer back to throughout the duration of the project.
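As a sketch of what this might look like in practice for time series data (the synthetic `series` below stands in for real source data), one could plot a handful of randomly sampled segments:

```python
# Sketch: plot randomly sampled segments of a raw time series to build
# familiarity with the data before any modeling. `series` is synthetic.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
series = rng.normal(size=100_000).cumsum()  # stand-in for real data

n_panels, seg_len = 6, 500
fig, axes = plt.subplots(n_panels, 1, figsize=(10, 9))
for ax in axes:
    start = int(rng.integers(0, len(series) - seg_len))
    ax.plot(series[start:start + seg_len])
    ax.set_title(f"segment starting at t={start}", fontsize=8)
fig.tight_layout()
plt.show()
```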
Data visualization should be used liberally throughout an ML project – creating new plots is almost never a bad idea. Visualization gives you the ability to see what is going on “under the hood” of an ML pipeline, whether it is an issue with batch sampling, computed features, or a poorly functioning model. If in doubt, visualize.
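For example, one quick way to look under the hood of a batch-sampling pipeline is to plot the label balance and a feature distribution for a few consecutive batches. The `sample_batch` function below is a hypothetical placeholder for your own sampling code:

```python
# Sketch: inspect sampled batches by plotting a feature distribution and
# the label balance per batch. `sample_batch` is a hypothetical stand-in
# for the pipeline's real batch sampler.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

def sample_batch(batch_size=256):
    """Placeholder batch sampler returning (features, labels)."""
    X = rng.normal(size=(batch_size, 8))
    y = rng.integers(0, 2, size=batch_size)
    return X, y

fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax in axes:
    X, y = sample_batch()
    ax.hist(X[:, 0], bins=30)  # distribution of one feature
    ax.set_title(f"positive rate: {y.mean():.2f}")
fig.tight_layout()
plt.show()
```

A skewed positive rate or an oddly shaped feature distribution in any one panel is an immediate signal that something upstream deserves a closer look.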
Summary: Data Scientists and ML Engineers often fail to appreciate that one of the most powerful tools at their disposal is human pattern recognition. When it comes to making progress in AI, there is often no substitute for staring at plots and thinking.
Beware of Averages
A corollary to the previous principle is to beware of averages. Alarm bells should go off whenever you find yourself looking at an aggregate statistic such as a mean or a median. It’s not that averages are bad per se, but rather if you don’t have a good idea of what the underlying data looks like, you might be in for a surprise.
The solution to this is simple – always plot the underlying data first, before taking an average. This will provide reassurance that you’re not taking an average of “junk data” (which may not be apparent from the average itself). More interestingly, it may reveal clusters or outliers in the data that offer insight into the problem at hand.
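A small sketch of the point, using purely illustrative synthetic data: with a bimodal distribution, the mean lands in a region where almost no data actually lives.

```python
# Sketch: plot a histogram before trusting the average. The synthetic data
# is bimodal, so the mean (~0) describes a region containing almost no
# actual data points.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])

print(f"mean = {values.mean():.2f}")  # misleading on its own
plt.hist(values, bins=50)
plt.axvline(values.mean(), color="red", label="mean")
plt.legend()
plt.show()
```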
Drill Down at the Extremes
When trying to improve the performance of an ML model, it can be challenging to know exactly where to begin. The results on a test set might consist of a huge number of individual predictions; given this, where is the best place to start digging into the data to gain a deeper understanding of the model's performance?
A useful rule of thumb is to start by focusing on the examples with the highest prediction errors. Plotting the raw data (and/or features) for these examples can often reveal surprising insights into the data or limitations of the model.[4] On the flip side, it can also be helpful to plot the data for examples with the lowest prediction errors. Understanding the types of examples where the model performs particularly well can offer insights into how to improve the model (or features) to get better performance across the board.
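As a sketch (the synthetic regression setup and all names below are illustrative only), this amounts to ranking the test examples by absolute error and then plotting both tails:

```python
# Sketch: rank test examples by absolute prediction error, then plot the
# raw inputs for the worst and best cases. The data, model, and names
# here are illustrative stand-ins for a real train-test setup.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 50)).cumsum(axis=1)  # toy time series windows
y = X.mean(axis=1) + rng.normal(scale=0.1, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = Ridge().fit(X_train, y_train)

errors = np.abs(model.predict(X_test) - y_test)
order = np.argsort(errors)

k = 4
fig, axes = plt.subplots(2, k, figsize=(3 * k, 5))
for ax, i in zip(axes[0], order[-k:]):  # highest-error examples
    ax.plot(X_test[i])
    ax.set_title(f"worst: err={errors[i]:.2f}", fontsize=8)
for ax, i in zip(axes[1], order[:k]):   # lowest-error examples
    ax.plot(X_test[i])
    ax.set_title(f"best: err={errors[i]:.2f}", fontsize=8)
fig.tight_layout()
plt.show()
```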
Do The Simple Things Well
Success in AI often comes down to doing the simple things well. At a high level this requires a relentless focus on clear thinking and avoiding rabbit holes (which are pervasive in ML). From a technical perspective it involves the use of simple statistical tools and visualization techniques to see through the "data fog". In theory it sounds easy; in practice it is anything but.
Machine learning is fundamentally hard because it requires the Data Scientist or ML Engineer to undertake what is in essence a high dimensional search problem. Doing the simple things well is generally the best strategy to make progress, but this in turn requires the intuition and judgement that come with experience.[5]
Nick Patterson, erstwhile code breaker and Renaissance Technologies quant whiz, offers an insightful perspective on this topic [6]:
"It's funny that I think the most important thing to do on data analysis is to do the simple things right. So, here's a kind of non-secret about what we did at Renaissance – in my opinion, our most important statistical tool was simple regression with one target and one independent variable. It's the simplest statistical model you can imagine. Any reasonably smart high school student could do it.
Now we have some of the smartest people around, working in our hedge fund, we have string theorists we recruited from Harvard, and they're doing simple regression. Is this stupid and pointless? Should we be hiring stupider people and paying them less? And the answer is no. And the reason is nobody tells you what the variables you should be regressing [are]. What's the target? Should you do a nonlinear transform before you regress? What's the source? Should you clean your data? Do you notice when your results are obviously rubbish? And so on. And the smarter you are the less likely you are to make a stupid mistake. And that's why I think you often need smart people who appear to be doing something technically very easy, but actually, usually it's not so easy."
[1] Similarly, when working with large datasets it is usually a good idea to start with only a subset of the full dataset to avoid lengthy train-test cycles that can impede the learning process (both human and machine).
[2] Management will thank you.
[3] For large datasets it's a good idea to start by visualizing a random sample of the full dataset. For time series this might involve plotting randomly sampled segments, or plotting the data for a sample of “entities” (e.g. patients for medical data or machines for IIoT data).
[4] This is also a very effective way to uncover labeling errors.
[5] The aphorism "good judgement comes from experience, and experience comes from bad judgement" is particularly apt for machine learning.