The Goldilocks Principle in Predictive Modeling and Data Science

In the field of statistics, there has been a lot written about statistical fallacies, logical fallacies, and fallacious reasoning. The following big list of fallacies is one that I like to use in my own undergraduate data science courses, particularly in my Data Ethics class where I teach my students about “lying with statistics”:

http://en.wikipedia.org/wiki/List_of_fallacies

Many of these fallacies are relevant to data science modeling, including this one: Circular Reasoning, where the reasoner “begins with what he or she is trying to end up with; sometimes called assuming the conclusion.”

A broken clock is truly an example of circular reasoning (as the dial is circular, and the clock represents a particular measurement in a repeating circular perspective): “Even a broken clock is right twice a day.”

(source: http://tvtropes.org/pmwiki/pmwiki.php/Main/StoppedClock)

In the following article, I use the broken clock analogy for circular reasoning in describing the importance of verification and validation in predictive analytics models: Are your predictive models like broken clocks? Here’s how to fix them.”  The article also discusses the importance of training vs. test data sets, the bias-variance tradeoff in data science modeling, underfitting vs. overfitting, and the Goldilocks Principle applied to data science.

(continue reading here) 

Follow Kirk Borne on Twitter @KirkDBorne

Leave a Reply

Your email address will not be published. Required fields are marked *