Mining data to extract useful and enduring patterns remains a skill that is arguably more art than science. Pressure heightens the appeal of early, apparent results, but it is all too easy to fool yourself. How can you resist the siren songs of the data and maintain an analysis discipline that will lead to robust results?
In two decades of mining data from diverse fields, we have made many mistakes, which may yet lead to wisdom. In this eBook, we briefly describe, and illustrate with examples, what we believe are the “Top 10” mistakes of data science in terms of frequency and seriousness. Most are basic, though a few are subtle. All have, when undetected, left analysts worse off than if they’d never looked at their data.