Tag Archives: Big Data

Big Data Growth — Compound Interest on Steroids

(This article was originally published on BigDataRepublic.com in June 2013 — that site no longer exists.)

Could a simple math formula be responsible for all of modern civilization? An article in 2013 hypothesized that there is one, and the Formula for Compound Interest is it. The formula is actually quite straightforward, but the mathematical consequences are huge and potentially impossible to assimilate. Let us illustrate this with a simple example, and then we will see the consequences for the current Big Data revolution.

Assuming an annual period of compounding, if your principal (asset or debt) P grows at an annual rate R, then your net accumulation A after one year is P*(1+R). The accumulation A grows by an additional (1+R) factor for each additional year. Therefore, your accumulation after N years is equal to A = P*(1+R)^N.

The fact that the number of compounding periods N is in the exponent of the compound interest growth formula means two things: (1) the growth is exponential (by definition); and (2) because the growth is exponential, the total accumulation A after a modest number of compounding periods can easily dwarf the initial value P, particularly for values of R equal to several percent per annum (or greater).

Many people have experienced the power of this compound interest growth through their own personal long-term retirement contributions. If you make a one-time investment of $5000 at age 20 (with no other contributions for the rest of your working career), then an annual return rate R=8% will yield a balance of $160,000 at 65 years old (a net gain of over 3000%).  If you make more modest but systematic contributions (for example $400 each year), then the final value of your retirement fund would also be $160,000 (from a total personal investment of $18,000 over 45 years – a net gain of 800%). This compound interest growth is amazing and impressive. Most people can understand these numbers and can relate them to normal life experience.
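As a rough check on these figures, the short Python sketch below applies the compound interest formula to both cases. The function names are illustrative, and the timing of the yearly $400 deposits (start versus end of each year) is an assumption, since the text does not specify it.

```python
def lump_sum(principal, rate, years):
    """Accumulation A = P * (1 + R)**N for a single deposit."""
    return principal * (1 + rate) ** years


def annual_deposits(deposit, rate, years, at_start=True):
    """Future value of equal yearly deposits, each compounded annually.

    at_start=True assumes every deposit is made at the beginning of its
    year (an assumption; the article does not state the timing).
    """
    total = 0.0
    for year in range(years):
        # Number of full years this particular deposit has to grow.
        periods = years - year if at_start else years - year - 1
        total += deposit * (1 + rate) ** periods
    return total


one_time = lump_sum(5000, 0.08, 45)
systematic = annual_deposits(400, 0.08, 45)
print(f"one-time $5000 at age 20: ${one_time:,.0f}")
print(f"$400 every year for 45 years: ${systematic:,.0f}")
```

Under these assumptions the one-time investment comes out just under $160,000, while the yearly deposits land between roughly $155,000 and $167,000 depending on deposit timing, in line with the rounded figure quoted above.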

But consider what happens if the annual rate R is not a few percent, but double-digit or triple-digit percent. For example: if R=100%, then a $1 investment each year starting at age 20 would produce a net accumulation of $1024 after 10 years (from just $10 total personal investment). The net accumulation after 45 years at age 65 (from a total personal investment of $45) would equal $35,000,000,000,000 – that is, thirty-five trillion dollars! In this case, the mathematical consequences are enormous and too mind-boggling to comprehend. It is off-the-charts and unbelievable, and yet it is a mathematical certainty – the number (1+R) in the compound interest formula when R=100% is 2, and 2^45 is a truly huge number.
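The doubling arithmetic can be checked directly with exact integers. This is only a sketch: the function names are made up for illustration, and the end-of-year deposit timing in the second function is an assumption.

```python
def single_dollar(years, rate=1):
    """Value of one dollar after `years` periods; rate=1 means R = 100%."""
    return (1 + rate) ** years


def dollar_each_year(years, rate=1):
    """Balance when $1 is added at the end of every year (assumed timing)."""
    balance = 0
    for _ in range(years):
        # Grow the existing balance for one year, then add the new dollar.
        balance = balance * (1 + rate) + 1
    return balance


for n in (10, 45):
    print(n, single_dollar(n), dollar_each_year(n))
```

A single dollar doubling every year gives exactly 2^10 = 1024 after 10 years and 2^45 = 35,184,372,088,832 after 45 years. With a dollar added at the end of every year, the totals are one less than those powers of two, so the quoted figures hold to within a dollar.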

Finally, let us connect the original historical hypothesis to our current Big Data environment. Some conservative estimates suggest that the world’s data volume doubles every year. That is a growth rate R=100%. Does that look familiar? Annual data-doubling corresponds to 2^10 times more data after every 10 years: from zettabytes now to geopbytes in a few decades (similar to investing $45 to get $35 trillion)! The Big Data explosion is truly enormous growth on steroids! This is why Big Data is not simply “more data”, but it is something completely different, mind-boggling, and off-the-charts impossible to grasp. Nearly every government entity, corporate decision-maker, business strategist, marketing specialist, statistician, domain scientist, news service, digital publisher, and social media guru is talking “Big Data”. However, most of us involved in those conversations cannot begin to assimilate how the current growth in Big Data and a simple math formula will be responsible for radically transforming modern civilization all over again.
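To put "a few decades" in numbers, here is a small sketch of the doubling arithmetic. It assumes a one-year doubling time, as in the estimate above, and it treats a geopbyte as 10^30 bytes, which is an informal usage rather than an official unit.

```python
import math


def growth_factor(years, doubling_time_years=1.0):
    """How many times larger the data volume is after `years`."""
    return 2 ** (years / doubling_time_years)


def years_to_grow(factor, doubling_time_years=1.0):
    """Years needed for the data volume to grow by `factor`."""
    return math.log2(factor) * doubling_time_years


ZETTABYTE = 10 ** 21  # bytes
GEOPBYTE = 10 ** 30   # bytes; informal unit, value assumed for this sketch

print(growth_factor(10))                      # growth over one decade
print(years_to_grow(GEOPBYTE / ZETTABYTE))    # zettabyte to geopbyte
```

With annual doubling, one decade multiplies the volume by 1024, and going from one zettabyte to one geopbyte takes just under 30 doublings, that is, about three decades.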

Therefore, don’t believe people when they say “We have always had Big Data!” That statement completely misses the point of today’s data revolution and trivializes the massive disruptive forces that are now transforming our digital world. Today’s big data is not yesterday’s big data!

Follow Kirk Borne on Twitter @KirkDBorne

Standards in the Big Data Analytics Profession

A sign of maturity for most technologies and professions is the appearance of standards. Standards are used to enable, to promote, to measure, and perhaps to govern the use of that technology or the practice of that profession across a wide spectrum of communities. Standardization increases independent applications and comparative evaluations of the tools and practices of a profession.

Standards often apply to processes and codes of conduct, but standards also apply to digital content, including: (a) interoperable data exchange (such as GIS, CDF, or XML-based data standards); (b) data formats (such as ASCII or IEEE 754); (c) image formats (such as GIF or JPEG); (d) metadata coding standards (such as ICD-10 for the medical profession, or the Dublin Core for cultural, research, and information artifacts); and (e) standards for the sharing of models (such as PMML, the predictive model markup language, for data mining models).

Standards are ubiquitous.  This abundance causes some folks to quip: “The nice thing about standards is that there are so many of them.”  So, it should not be surprising to note that standards are now beginning to appear also in the worlds of big data and data science, providing evidence of the growing maturity of those professions…

(continue reading here: https://www.mapr.com/blog/raising-standard-big-data-analytics-profession)


My Data Science Declaration for 2015

Here it is… my Data Science Declaration for 2015 (posted to Twitter on January 14, 2015):

“Now is the time to begin thinking of Data Science as a profession not a job, as a corporate culture not a corporate agenda, as a strategy not a stratagem, as a core competency not a course, and as a way of doing things not a thing to do.”




Top 10 Conversations That You Don’t Want to Have on Data Innovation Day

On January 22, the world celebrates Data Innovation Day. Here are the top 10 conversations that you don’t want to have on that day. Let the countdown begin….

10.  CDO (Chief Data Officer) speaking to Data Innovation Day event manager who is trying to re-schedule the event for Father’s Day: “Hey! It’s pronounced ‘Day-tuh’, not ‘Dadda’.”

9.  CDO speaking at the company’s Data Innovation Day event regarding an acronym that was used to list his job title in the event program guide: “I am the company’s Big Data ‘As A Service’ guru, not the company’s Big Data ‘As Software Service’ guru.”  (Hint: that’s BigData-aaS, not BigData-aSS)

8.  Data Scientist speaking to Data Innovation Day session chairperson: “Why are all of these cows on stage with me? I said I was planning to give a LASSO demonstration.”

7.  Anyone speaking to you: “Our organization has always done big data.”

6.  You speaking to anyone: “Seriously? The title of our Data Innovation Day Event is ‘Big Data is just Small Data, Only Bigger’?”

5.  New cybersecurity administrator (fresh from college) sends this e-mail to company’s Data Scientists at 4:59pm: “The security holes in our Hadoop system are now fixed. It will now automatically block all ports from accepting incoming data access requests between 5:00pm and 9:00am the next day.  Gotta go now.  Have a nice evening.  From your new BFF.”

4.  Data Scientist to new HR Department Analytics Specialist regarding the truckload of tree seedlings that she received as her end-of-year company bonus:  “I said in my employment application that I like Decision Trees, not Deciduous Trees.”

3.  Organizer for the huge Las Vegas Data Innovation Day Symposium speaking to the conference keynote speaker: “Oops, sorry.  I blew your $100,000 speaker’s honorarium at the poker tables in the Grand Casino.”

2.  Over-zealous cleaning crew speaking to Data Center Manager arriving for work in the morning after Data Innovation Day event that was held in the company’s Exascale Data Center: “We did a very thorough job cleaning your data center. And we won’t even charge you for the extra hours that we spent wiping the dirty data from all of those disk drives that you kept talking about yesterday.”

1.  Announcement to University staff regarding the Data Innovation Day event:  “Dan Ariely’s keynote talk ‘Big Data is Like Teenage Sex’ is being moved from room B002 in the Physics Department to the Campus Football Stadium due to overwhelming student interest.”



When Big Data Gets Local, Small Data Gets Big

We often hear that small data deserves at least as much attention in our analyses as big data. While there may be as many interpretations of that statement as there are definitions of big data, there are at least two situations where “small data” applications are worth considering. I will label these “Type A” and “Type B” situations.

In “Type A” situations, small data refers to having a razor-sharp focus on your business objectives, not on the volume of your data. If you can achieve those business objectives (and “answer the mail”) with small subsets of your data mountain, then do it, at once, without delay!

In “Type B” situations, I believe that “small” can be interpreted to mean that we are relaxing at least one of the 3 V’s of big data: Velocity, Variety, or Volume:

  1. If we focus on a localized time window within high-velocity streaming data (in order to mine frequent patterns, find anomalies, trigger alerts, or perform temporal behavioral analytics), then that is deriving value from “small data.”
  2. If we limit our analysis to a localized set of features (parameters) in our complex high-variety data collection (in order to find dominant segments of the population, or classes/subclasses of behavior, or the most significant explanatory variables, or the most highly informative variables), then that is deriving value from “small data.”
  3. If we target our analysis on a tight localized subsample of entries in our high-volume data collection (in order to deliver one-to-one customer engagement, personalization, individual customer modeling, and high-precision target marketing, all of which still require use of the full complexity, variety, and high-dimensionality of the data), then that is deriving value from “small data.”
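The three "Type B" cases above can be sketched as three simple filters over a collection of records. This is a minimal illustration only: the event records, field names, and function names are all hypothetical and are not taken from the article.

```python
from datetime import datetime, timedelta

# Hypothetical event records; every field name here is invented for illustration.
events = [
    {"customer": "c1", "time": datetime(2015, 1, 1, 9, 0), "amount": 20.0, "channel": "web"},
    {"customer": "c2", "time": datetime(2015, 1, 1, 9, 5), "amount": 75.5, "channel": "store"},
    {"customer": "c1", "time": datetime(2015, 1, 1, 9, 58), "amount": 12.0, "channel": "mobile"},
    {"customer": "c3", "time": datetime(2015, 1, 1, 10, 0), "amount": 300.0, "channel": "web"},
]


def time_window(records, end, width):
    """Relax velocity: keep only records inside the window (end - width, end]."""
    start = end - width
    return [r for r in records if start < r["time"] <= end]


def feature_subset(records, features):
    """Relax variety: keep only the chosen features of every record."""
    return [{name: r[name] for name in features} for r in records]


def entries_for(records, customer):
    """Relax volume: keep every feature, but only one customer's entries."""
    return [r for r in records if r["customer"] == customer]


recent = time_window(events, datetime(2015, 1, 1, 10, 0), timedelta(minutes=10))
narrow = feature_subset(events, ["amount", "channel"])
one_customer = entries_for(events, "c1")
print(len(recent), len(narrow[0]), len(one_customer))
```

Each filter shrinks the data along one dimension while leaving the other two untouched, which is the sense in which "small" relaxes only one of the 3 V's at a time.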

(continue reading here: https://www.mapr.com/blog/when-big-data-goes-local-small-data-gets-big-part-1)


Local Linear Embedding (Image source**: http://mdp-toolkit.sourceforge.net/examples/lle/lle.html)

**Zito, T., Wilbert, N., Wiskott, L., Berkes, P. (2009). Modular toolkit for Data Processing (MDP): a Python data processing framework, Front. Neuroinform. (2008) 2:8. doi:10.3389/neuro.11.008.2008

New Directions for Big Data and Analytics in 2015

The world of big data and analytics is remarkably vibrant and marked by incredible innovation, and there are advancements on every front that will continue into 2015. These include increased data science education opportunities and training programs, in-memory analytics, cloud-based everything-as-a-service, innovations in mobile (business intelligence and visual analytics), broader applications of social media (for data generation, consumption and exploration), graph (linked data) analytics, embedded machine learning and analytics in devices and processes, digital marketing automation (in retail, financial services and more), automated discovery in sensor-fed data streams (including the internet of everything), gamification, crowdsourcing, personalized everything (medicine, education, customer experience and more) and smart everything (highways, cities, power grid, farms, supply chain, manufacturing and more).

Within this world of wonder, where will we wander with big data and analytics in 2015? I predict two directions for the coming year…

(continue reading here: http://www.ibmbigdatahub.com/blog/new-directions-big-data-and-analytics-2015)


Outlier Detection Gets a New Look – Surprise Discovery in Big Data

Novelty and surprise are two of the more exciting aspects of science – finding something totally new and unexpected can lead to a quick research paper, or it can make your career. As scientists, we all yearn to make a significant discovery. Petascale big data collections potentially offer a multitude of such opportunities. But how do we find that unexpected thing? These discoveries come under various names: interestingness, outlier, novelty, anomaly, surprise, or defect (depending on the application). Outlier? Anomaly? Defect? How did they get onto this list? Well, those features are often the unexpected, interesting, novel, and surprising aspects (patterns, points, trends, and/or associations) in the data collection. Outliers, anomalies, and defects might be insignificant statistical deviants, or else they could represent significant scientific discoveries.

(continue reading here: http://stats.cwslive.wiley.com/details/feature/6597751/Outlier-Detection-Gets-a-Makeover—Surprise-Discovery-in-Scientific-Big-Data.html)


The Power of Three: Big Data, Hadoop, and Finance Analytics

Big data is a universal phenomenon. Every business sector and aspect of society is being touched by the expanding flood of information from sensors, social networks, and streaming data sources. The financial sector is riding this wave as well. We examine here some of the features and benefits of Hadoop (and its family of tools and services) that enable large-scale data processing in finance (and consequently in nearly every other sector).

Three of the greatest benefits of big data are discovery, improved decision support, and greater return on innovation. In the world of finance, these also represent critical business functions….

(continue reading here:  https://www.mapr.com/blog/potent-trio-big-data-hadoop-and-finance-analytics)


IBM Insight 2014 – Day 2: The “One Thing” – Watson Analytics

The highlight of Day 2 at IBM Insight 2014 was the presentation of numerous examples, new features, powerful capabilities, and strategic vision for Watson Analytics.  This was the “one thing” – (to borrow the phrase from the movie “City Slickers”) – the one thing that seems to matter the most, that will make the biggest impact, and that has captured the essence of big data and analytics technologies for the future, rapidly approaching world of data everywhere, sensors everywhere, and the Internet of Things.

(continue reading more about Watson Analytics here:  http://ibm.co/10zEl6S)


IBM Insight 2014 – Day 1 Soundbites: Carpe Datum

There are big data meetups, workshops, conferences, and symposia. And then… there is IBM Insight 2014! There’s only one word to describe this happenin’ event: “Wow!”

The content of the event is focused on IBM’s products, services, corporate strengths, and partnerships. But the theme and message are laser-focused on the light-speed transformation of business in 2014 that has been achieved through insights from big data and analytics. From the Day 1 opening laser light show and film clip that featured DataKind founder Jake Porway along with Sensemaking evangelist Jeff Jonas, to their spectacular, well-timed entrance into the packed 12,000-seat Mandalay Bay Arena, continuing into a vast array of workshops and hands-on labs, the first day of Insight 2014 has been like a rapid tour through multiple parallel “Alice in Wonderland” universes.

If you are not able to attend the event, you can watch at InsightGO. You can also watch participant interviews on TheCube from SiliconANGLE and Wikibon: http://siliconangle.tv/ibm-insight-2014/

Ideas and insights have filled the arena and convention center in every conversation. Attached below are some of the soundbites (harvested from presentations and conversations).

What are people talking about at IBM Insight 2014?

  • Analytics take big data from information to insights to innovation.
  • The new data-driven business is built around “Systems of Insight” that inform every decision, interaction, and process.
  • Systems of Insight involve more people, more places, and more data.
  • Big data analytics drive business integration, intelligence, and innovation.
  • Watson Analytics reinvents the analytics experience in the cloud — its brilliant human-computer interface gives a whole new meaning to “human factors engineering”.
  • Cognitive Analytics with Watson generates (in real-time) the questions that you should be asking your data, through natural language dialogue, guided discovery, and fully automated intelligence.
  • IBM has released a suite of new services for big data and analytics, including Watson Curator, DataWorks, DashDB, and Cloudant.
  • The new quest for business is personalized engagement that incorporates immersive user experiences: fusing the physical world with digital interactions of all kinds.
  • In the era of digital marketing and real-time customer analytics, battles are won or lost in minutes (or even seconds).
  • A paradox in digital marketing has emerged:  outward-facing customer-centric analytics (personalization, segment of one) have forced organizations into more inward focus on big data operations.  We believe that this paradox evaporates when we realize that the focus on operations is in response to the urgent need to focus on the customer, at the right time, with the right offer, at the right place, in the right context.  That’s the 360 view, and that’s cognitive analytics at its best!
  • Fast data (big data velocity) is fast becoming the number 1 challenge, source of innovation, and revenue-generator for business.  Big data volume is so “2012”, and big data variety is so “2013” (though I personally think that we have yet to see the real power and revolution in data-driven business discovery through high-variety data, particularly via fast complex streaming data emanating from multiple sensors, sources,  and signals).
  • The real “big data analytics” talent shortage is in finding folks who know both the analytics (data science) and the business.
  • The Chief Data Officer is an agent for business transformation and change in the big data era.
  • IBM Insight might just be the 2014 World Series of Big Data Analytics.
  • Perhaps the real insight at IBM Insight 2014 is that what you really need to do is “to dress for success” with the right T-shirt …:

(Caption: Kirk Borne, Cortnie Abercrombie, and Jake Porway sharing a moment)

  • Carpe datum!
