Reluctance to trust and rely on machine-based decisions is widespread. That is understandable; how can one be sure the automated decision system takes into account all the factors it should? Employees first struggle to learn the new technology. Then, even after they make great progress and produce a promising model, decision-makers can still prove extremely reluctant to risk a new approach, no matter how clearly tests demonstrate its effectiveness. Still, in today’s competitive work environment, having a positive relationship with machines is essential to increasing profits and building return on investment (ROI).
Establishing a Baseline
Measuring ROI involves examining how increased gains compare to costs. To increase ROI, you need to establish a baseline against which to compare progress, and a metric of goodness (a criterion of merit). Also, shockingly, your current performance may not be what you think it is. A clear understanding of your baseline and metric tells you where you are; combined with business understanding, it tells you where you want to go. After that, you can brainstorm on how to get there.
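To make the comparison of gains to costs concrete, here is a minimal Python sketch. It treats ROI as net gain divided by cost and measures a model's lift over the baseline. The function names and all dollar figures are hypothetical, chosen only to illustrate the arithmetic.

```python
def roi(gain: float, cost: float) -> float:
    """Return on investment: net gain as a fraction of cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


def lift_over_baseline(baseline: float, candidate: float) -> float:
    """Relative improvement of a candidate metric over the baseline metric."""
    if baseline == 0:
        raise ValueError("baseline must be nonzero")
    return (candidate - baseline) / abs(baseline)


# Hypothetical figures, for illustration only.
baseline_profit = 1_000_000.0  # annual profit under the current process
model_profit = 1_150_000.0     # annual profit with the model in use
project_cost = 60_000.0        # cost of building and deploying the model

extra_gain = model_profit - baseline_profit
print(f"Lift over baseline: {lift_over_baseline(baseline_profit, model_profit):.1%}")
print(f"ROI on the project: {roi(extra_gain, project_cost):.1%}")
```

Neither number means much without the baseline: the same model profit would imply a very different lift, and a very different ROI, if current performance turned out to be higher or lower than assumed.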
Data analytics increases performance in many ways, chief of which are:
- Improving profit, efficiency, accuracy, and satisfaction
- Reducing cost and delays
- Finding new, better opportunities
- Eliminating substandard practices
Machine learning and data science can help businesses attain their goals, but only if they are actually implemented. To increase ROI with data science, you must master two very different skills: the technology, and the ability to convince others to trust and use it.
Become an Expert in the Technology
Learning the uses, strengths, and weaknesses of data science methods is a challenge for many business leaders. Experienced data science consultants can help those who want to apply the latest developments in data science and machine learning identify good projects and their needed components. At Elder Research we are trusted advisors to many large firms, and we can begin with a Readiness Assessment as a first step. Jeff Deal and Gerhard Pilcher have also written a book, Mining Your Own Business, to guide managers of data mining (data science) projects. Elder Research also provides sophisticated analytics training (short- and long-term) and mentoring, so that a client’s staff can grow in data science skills and eventually perform all the tasks independently.
Self-Driving Car Example
Much hype currently surrounds artificial intelligence (AI), and the recent breakthroughs (in driving cars, playing Go and poker, writing articles, etc.) are impressive. But few know that the greatest AI advances depend on data science. AI typically uses a top-down, deductive, logic-based approach, while data science is inductive; it figures out new theories (logic and equations) from the bottom up, that is, from examples of individual labeled cases. It turns out that data science is what is responsible for the “AI” achievements of late. With AI alone, you can often follow the logic, but with data science, you likely won’t be able to interpret exactly how the model is working. To some degree, you must have faith in the process, accepting that the machine knows what it’s doing when strict statistical tests show good accuracy on out-of-sample data.
Self-driving cars are steadily increasing in ability. In March of 2018 a widely reported pedestrian death gave pause to self-driving car fans, but it is very likely that, once the technology is widely used, the accidents it prevents will vastly outnumber the crashes that still occur.
The key to the success so far is “deep learning,” which trains multi-layered neural networks whose equations are largely inexplicable. For example, if a self-driving car turns left, it is because an equation indicated that this was the best action to take, after being trained on a vast array of data consisting of actual driving and the decisions made by an expert. The equation trained by thousands of examples takes the place of, or complements or confirms, AI’s logic rules.
Deep learning is different from classic AI. Merged together, inductive modeling methods from the data science world, and logic-based methods from the AI world, are becoming very powerful.
Merging of Methods
In data science, there are many different data modeling techniques, or ways to “connect the dots” of the data to form answers. The most popular are:
- Regression
- Decision Trees
- Neural Networks
- Nearest Neighbors
- Naïve Bayes
… but there are dozens of interesting algorithms. So, which is the best? In an experiment I performed with Stephen Lee of the University of Idaho on a test set of six popular challenge problems, we found that Neural Networks won the contest, but that all five algorithms tested came in 1st or 2nd place on at least two of the problems. So each algorithm was useful and could be the best, depending on the business problem you are trying to solve, and Neural Networks are worth trying.
But we also tried the challenge using four different ensemble techniques, such as averaging and voting. An ensemble uses all the competing models to come up with a consensus estimate. And all four of those ensemble methods beat even the best individual method! So even if you have success with one technique, you should definitely try ensembles to improve performance.
Further, if you compare the standard deviation (the spread) of the original techniques’ results to that of any ensemble technique, you’ll see that the model ensembles are much more reliable and less variable.
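The two ensemble techniques named above, averaging and voting, can be sketched in a few lines of Python. This is a toy illustration, not the experiment described above: the five simulated "models," their independent noise, and the function names are all assumptions made for the example.

```python
import random
import statistics
from collections import Counter
from typing import List


def average_ensemble(predictions: List[List[float]]) -> List[float]:
    """Average the numeric predictions of several models, case by case."""
    return [statistics.fmean(case) for case in zip(*predictions)]


def majority_vote(labels: List[str]) -> str:
    """Most common label among the models' votes (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]


def mse(predicted: List[float], actual: List[float]) -> float:
    """Mean squared error between predictions and true values."""
    return statistics.fmean((p - a) ** 2 for p, a in zip(predicted, actual))


# Hypothetical setup: five "models" that each see the truth plus their own noise.
rng = random.Random(0)
truth = [float(i) for i in range(50)]
models = [[t + rng.gauss(0, 2.0) for t in truth] for _ in range(5)]

individual_errors = [mse(m, truth) for m in models]
ensemble_error = mse(average_ensemble(models), truth)

print("individual MSEs:", [round(e, 2) for e in individual_errors])
print("ensemble MSE:   ", round(ensemble_error, 2))
print("vote:", majority_vote(["fraud", "ok", "fraud"]))
```

The squared error of an averaged prediction can never exceed the average of the individual squared errors, which is one reason averaging is a safe thing to try. Because the simulated errors here are independent, averaging helps more than it usually would with real models, whose errors tend to be correlated.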
Convince People to Deploy Data Science Models
Elder Research is in its 25th year, so we have a lot of experience, and have learned some things over the decades. In our first decade, we had a 90% rate of project technical success, yet only 65% of our models were actually used! We were not happy with this, even though research shows that only 33% of data science or machine learning projects are implemented nationwide. We found the critical break point to be “carbon-based life forms”; that is, humans. Managers would fail to actually use the hard work they’d commissioned even though it had met all their requirements. What was going on?
We learned it is the problem of change. The technology often dictates a major change in how people act. Under pressure, and fearing to stand out from others, many people revert to the old way of doing things instead of trusting the model. We studied this, and found many ways to improve the environment of trust – both technical and interpersonal. It paid off; in our second decade, our technical success rose to 98% and our adoption success soared to 92%.
The Capital One Example
In the late ’90s, when Capital One had only 100 employees with master’s degrees and PhDs analyzing credit scoring (the last figure I saw, years ago, was more than 300), they hired Elder Research to see if the new-fangled data mining techniques could add anything. They were skeptical that a small, generalist company could contribute anything to their core area of expertise. I noticed the difference in expectations and offered a bet: if we couldn’t improve their models, we would charge them only half price, but if we did better, we would charge double. They happily agreed to the wager.
Using expert modeling and our not-so-secret weapon of ensembling, we were able to improve performance. The leverage (the increase in profit for even incremental improvements in accuracy) is so high with credit scoring that Capital One’s ROI shot up, and they happily paid double our price.
I have seen this happen so many times that to this day, I like to offer clients deals where they can opt to pay nothing up-front, and pay only a share of the additional profit after the model is implemented. I am that confident in the technology and in our expertise. (Let us know if that is intriguing to you!)
The Art of Persuasion
Data science is a rational approach to decision-making, but humans tend to be more irrational than we like to admit. In his book Thinking, Fast and Slow, Daniel Kahneman describes several disturbing examples of how often our judgments are influenced by arbitrary things such as hunger, or recent positive accidents, or irrational associations. We believe it’s irrational if clients struggle to accept our results, but we’ve learned that it’s normal and that the challenge deserves respect and attention.
I’ll talk more about the “soft” issues later, but I want to briefly point to a powerful technical tool I developed that can reveal how reliable results are: “target shuffling.” Target shuffling quantifies the likelihood that your result could have occurred by chance. It accounts for the “vast search effect,” where many (say, millions of) alternative hypotheses are tested; something also known as “p-hacking.” With target shuffling you can extract real meaning from data without being fooled by coincidences.
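The idea can be illustrated with a small Python sketch. It is a simplified stand-in, not the implementation used in our projects: the "search" is just picking the candidate feature with the strongest correlation to the target, and the data, function names, and add-one correction are assumptions made for the example. The key step is that the same search is rerun on shuffled targets, so the comparison accounts for how many hypotheses were tried.

```python
import math
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_score(features: List[List[float]], target: Sequence[float]) -> float:
    """The 'search': try every candidate feature and keep the strongest |r|."""
    return max(abs(pearson(f, target)) for f in features)


def target_shuffle_p_value(features: List[List[float]], target: Sequence[float],
                           n_shuffles: int = 200, seed: int = 0) -> float:
    """Estimate how often the same search does as well on shuffled targets."""
    rng = random.Random(seed)
    observed = best_score(features, target)
    shuffled = list(target)
    at_least_as_good = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between inputs and target
        if best_score(features, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction (a common convention) so the estimate is never exactly 0.
    return (at_least_as_good + 1) / (n_shuffles + 1)


# Hypothetical data: 20 pure-noise features plus one that truly drives the target.
rng = random.Random(1)
n_rows = 40
signal = [rng.gauss(0, 1) for _ in range(n_rows)]
noise_features = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(20)]
target = [s + rng.gauss(0, 0.1) for s in signal]

p_real = target_shuffle_p_value(noise_features + [signal], target)
p_noise = target_shuffle_p_value(noise_features, target)
print(f"with the real signal: p = {p_real:.3f}")
print(f"noise features only:  p = {p_noise:.3f}")
```

A small value means the shuffled data almost never produced a result as strong as the real one, so luck is an unlikely explanation. With only noise features, the best correlation found may still look impressive on its face, which is exactly the coincidence this test is meant to expose.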
An Oil and Gas Exploration Example
We’ve used target shuffling in many real-world challenges. One memorable project showed an international oil and gas production company how it could save millions of dollars per year. We had built predictive analytics models to identify which of their gas wells in a certain region were at risk of ceasing production (due to, for example, above-ground lines freezing, or below-ground lines clogging). They wanted to know this 4-6 months in advance, so they could prioritize preventive maintenance and best employ their valuable staff and maintenance resources. With a great deal of hard work and expertise, my colleagues were successful at predicting the need for maintenance 3x better than their baseline method. Thus, with no additional expenditure on staff or machines, they could recover tens of millions of dollars of gas just by routing their staff according to our models. Target shuffling showed that this result was so reliable and significant that there was only a 1-in-2,500 chance it was due to luck. And, if they could shorten their decision window by a month or two, the savings would rise dramatically due to increased accuracy.
The reality is that companies across many industries can achieve huge gains by extracting useful information from their data. It takes trust and some patience, but Elder Research can deliver proven results, typically within just a few months. We identify the baseline and business metrics, gather data, and transform it into valuable information to help you make better decisions that will deliver incredible ROI and can transform your business.