The Goldilocks Principle in Predictive Modeling and Data Science

In the field of statistics, much has been written about statistical fallacies, logical fallacies, and fallacious reasoning. The following long list of fallacies is one I like to use in my undergraduate data science courses, particularly in my Data Ethics class, where I teach students about “lying with statistics”:

http://en.wikipedia.org/wiki/List_of_fallacies

Many of these fallacies are relevant to data science modeling, including Circular Reasoning, in which the reasoner “begins with what he or she is trying to end up with; sometimes called assuming the conclusion.”

A broken clock is, quite literally, an example of circular reasoning (its dial is circular, and it reports a measurement that repeats in a circular cycle): “Even a broken clock is right twice a day.”

(source: http://tvtropes.org/pmwiki/pmwiki.php/Main/StoppedClock)

In the following article, I use the broken clock analogy for circular reasoning to describe the importance of verification and validation in predictive analytics models: “Are your predictive models like broken clocks? Here’s how to fix them.” The article also discusses the importance of training vs. test data sets, the bias-variance tradeoff in data science modeling, underfitting vs. overfitting, and the Goldilocks Principle applied to data science.
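The Goldilocks idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method from the linked article: it uses a toy dataset (noisy samples of sin(x), chosen here for illustration) and k-nearest-neighbors regression, where the number of neighbors k serves as the complexity knob. The helpers make_data, knn_predict, mse, and sweep are hypothetical names for this sketch. Scoring a model on its own training data is the circular step: with k=1 every training point is its own nearest neighbor, so training error is exactly zero, much like a broken clock checked only at the moments it happens to be right. Only the held-out test set shows that k=1 overfits, k=200 (predicting the global mean) underfits, and an intermediate k is "just right."

```python
import math
import random


def make_data(n, seed):
    """Toy data (an assumption for illustration): noisy samples of y = sin(x)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    """k-nearest-neighbors regression: average the targets of the k closest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN model (fit on train_x/train_y) evaluated on (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def sweep(ks, n=200):
    """Return {k: (train MSE, test MSE)} using separate training and test samples."""
    train_x, train_y = make_data(n, seed=1)
    test_x, test_y = make_data(n, seed=2)
    return {
        k: (mse(train_x, train_y, train_x, train_y, k),
            mse(train_x, train_y, test_x, test_y, k))
        for k in ks
    }


if __name__ == "__main__":
    # k=1: too flexible (overfits); k=200: uses every point, i.e. the mean (underfits).
    for k, (train_err, test_err) in sweep([1, 15, 200]).items():
        print(f"k={k:3d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The exact numbers depend on the random seeds and on the arbitrary middle value k=15, but the pattern should hold: training error is smallest at k=1, while test error is lowest at a moderate k.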

(continue reading here)

Follow Kirk Borne on Twitter @KirkDBorne

Rocket-Powered Data Science

Data Reflections by Dr. Kirk Borne @KirkDBorne
