(The following article was first published in July of 2013 at analyticbridge.com. At least 3 of the links in the original article are now obsolete and/or broken. I re-post the article here with the correct links. A lot of things in the Big Data, Data Science, and IoT universe have changed dramatically since that first publication, but I did not edit the article accordingly, in order to preserve the original flavor and context. The central message is still worth repeating today.)
The ongoing Big Data media hype stirs up a lot of passionate voices. There are naysayers (“it is nothing new”), doomsayers (“it will disrupt everything”), and soothsayers (e.g., Predictive Analytics experts). In my humble opinion, the naysayers are the most bothersome. (Note: I am not talking about skeptics, whom we definitely and desperately need during any period of maximized hype!)
We frequently encounter statements of the “naysayer” variety telling us that even the ancient Romans had big data. Okay, I understand that such statements follow logically from one of the standard definitions of big data: data sets that are larger, more complex, and generated more rapidly than your current resources (computational, data management, analytic, and/or human) can handle, i.e., data sets whose characteristics correspond to the 3 V’s of Big Data (Volume, Variety, and Velocity). This definition of Big Data could just as well describe my first discoveries in a dictionary or my first encounters with an encyclopedia. But those “data sets” are hardly “Big Data”: they are universally accessible, easily searchable, and completely “manageable” by their handlers. They are therefore SMALL DATA, and it is a myth to label them “Big Data”. By contrast, we cannot ignore the overwhelming fact that in today’s real Big Data tsunami, each one of us generates enormous collections of data on our own. In addition, the correlations, associations, and links between each person’s digital footprint and all other persons’ digital footprints produce an exponential (actually, combinatorial) explosion of additional data products.
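To put rough numbers on that combinatorial explosion, here is a minimal sketch that is not part of the original article. It assumes a deliberately simple model: a “link” is any pair of people, and a “group” is any subset of two or more people whose footprints might be correlated. The function names `pairwise_links` and `possible_groups` are hypothetical, chosen only for illustration.

```python
from math import comb


def pairwise_links(n: int) -> int:
    """Hypothetical helper: distinct person-to-person links among n people, i.e. n choose 2."""
    return comb(n, 2)


def possible_groups(n: int) -> int:
    """Hypothetical helper: distinct groups of two or more people among n people.

    All 2**n subsets, minus the empty set and the n single-person subsets.
    """
    return 2 ** n - n - 1


# Compare quadratic growth of links with exponential growth of groups.
for n in (10, 100, 1000):
    groups_digits = len(str(possible_groups(n)))
    print(f"{n} people: {pairwise_links(n)} links, groups count has {groups_digits} digits")
```

Under this toy model, the number of pairwise links grows only quadratically, while the number of possible groups roughly doubles with every person added. That doubling is what makes even modest populations intractable to enumerate.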
Nevertheless, despite all of these clear signs that today’s big data environment is something radically new, the naysayers are undeterred. With the above standard definition of big data in their quiver, they are fond of shooting arrows through every discussion suggesting that big data are changing society, business, science, media, government, retail, medicine, cyber-anything, etc. I believe that this naysayer type of conversation is unproductive, unhelpful, and unscientific. The volume, complexity, and speed of data today are vastly different from anything that we have ever previously experienced, and those differences will be even more pronounced next year, more so the following year, and so on. In every sector of life, business, and government, the data sets are becoming increasingly off-scale and exponentially unmanageable. The 2011 McKinsey report “Big Data: The Next Frontier for Innovation, Competition, and Productivity” made this abundantly clear. When the Internet of Things and machine-to-machine applications really become established, the big data V’s of today will seem like child’s play.
To illustrate the enormous scale of today’s (and tomorrow’s) big data, I discussed the exponential explosion of data in my TEDx talk “Big Data, small world” (you can fast-forward to my comments on this topic, starting at approximately the 9:00 mark in the video). You can also read more about this topic in the article “Big Data Growth – Compound Interest on Steroids”, where I elaborate on the compound growth rate of big data. The numbers will blow your mind, and they should blow away the naysayers’ arguments. Read all about it at http://rocketdatascience.org/?p=204.
Follow Kirk Borne on Twitter @KirkDBorne