Tag Archives: Big Data

Standards in the Big Data Analytics Profession

A sign of maturity for most technologies and professions is the appearance of standards. Standards are used to enable, to promote, to measure, and perhaps to govern the use of that technology or the practice of that profession across a wide spectrum of communities. Standardization increases independent applications and comparative evaluations of the tools and practices of a profession.

Standards often apply to processes and codes of conduct, but standards also apply to digital content, including: (a) interoperable data exchange (such as GIS, CDF, or XML-based data standards); (b) data formats (such as ASCII or IEEE 754); (c) image formats (such as GIF or JPEG); (d) metadata coding standards (such as ICD-10 for the medical profession, or the Dublin Core for cultural, research, and information artifacts); and (e) standards for the sharing of models (such as PMML, the predictive model markup language, for data mining models).

Standards are ubiquitous.  This abundance causes some folks to quip: “The nice thing about standards is that there are so many of them.”  So, it should not be surprising to note that standards are now beginning to appear also in the worlds of big data and data science, providing evidence of the growing maturity of those professions…

(continue reading here: https://www.mapr.com/blog/raising-standard-big-data-analytics-profession)

Follow Kirk Borne on Twitter @KirkDBorne

My Data Science Declaration for 2015

Here it is… my Data Science Declaration for 2015 (posted to Twitter on January 14, 2015):

“Now is the time to begin thinking of Data Science as a profession not a job, as a corporate culture not a corporate agenda, as a strategy not a stratagem, as a core competency not a course, and as a way of doing things not a thing to do.”




Top 10 Conversations That You Don’t Want to Have on Data Innovation Day

On January 22, the world celebrates Data Innovation Day. Here are the top 10 conversations that you don’t want to have on that day. Let the countdown begin….

10.  CDO (Chief Data Officer) speaking to Data Innovation Day event manager who is trying to re-schedule the event for Father’s Day: “Hey! It’s pronounced ‘Day-tuh’, not ‘Dadda’.”

9.  CDO speaking at the company’s Data Innovation Day event regarding an acronym that was used to list his job title in the event program guide: “I am the company’s Big Data ‘As A Service’ guru, not the company’s Big Data ‘As Software Service’ guru.”  (Hint: that’s BigData-aaS, not BigData-aSS)

8.  Data Scientist speaking to Data Innovation Day session chairperson: “Why are all of these cows on stage with me? I said I was planning to give a LASSO demonstration.”

7.  Anyone speaking to you: “Our organization has always done big data.”

6.  You speaking to anyone: “Seriously? The title of our Data Innovation Day Event is ‘Big Data is just Small Data, Only Bigger’?”

5.  New cybersecurity administrator (fresh from college) sends this e-mail to company’s Data Scientists at 4:59pm: “The security holes in our Hadoop system are now fixed. It will now automatically block all ports from accepting incoming data access requests between 5:00pm and 9:00am the next day.  Gotta go now.  Have a nice evening.  From your new BFF.”

4.  Data Scientist to new HR Department Analytics Specialist regarding the truckload of tree seedlings that she received as her end-of-year company bonus:  “I said in my employment application that I like Decision Trees, not Deciduous Trees.”

3.  Organizer for the huge Las Vegas Data Innovation Day Symposium speaking to the conference keynote speaker: “Oops, sorry.  I blew your $100,000 speaker’s honorarium at the poker tables in the Grand Casino.”

2.  Over-zealous cleaning crew speaking to Data Center Manager arriving for work in the morning after Data Innovation Day event that was held in the company’s Exascale Data Center: “We did a very thorough job cleaning your data center. And we won’t even charge you for the extra hours that we spent wiping the dirty data from all of those disk drives that you kept talking about yesterday.”

1.  Announcement to University staff regarding the Data Innovation Day event:  “Dan Ariely’s keynote talk ‘Big Data is Like Teenage Sex’ is being moved from room B002 in the Physics Department to the Campus Football Stadium due to overwhelming student interest.”



When Big Data Gets Local, Small Data Gets Big

We often hear that small data deserves at least as much attention in our analyses as big data. While there may be as many interpretations of that statement as there are definitions of big data, there are at least two situations where “small data” applications are worth considering. I will label these “Type A” and “Type B” situations.

In “Type A” situations, small data refers to having a razor-sharp focus on your business objectives, not on the volume of your data. If you can achieve those business objectives (and “answer the mail”) with small subsets of your data mountain, then do it, at once, without delay!

In “Type B” situations, I believe that “small” can be interpreted to mean that we are relaxing at least one of the 3 V’s of big data: Velocity, Variety, or Volume:

  1. If we focus on a localized time window within high-velocity streaming data (in order to mine frequent patterns, find anomalies, trigger alerts, or perform temporal behavioral analytics), then that is deriving value from “small data.”
  2. If we limit our analysis to a localized set of features (parameters) in our complex high-variety data collection (in order to find dominant segments of the population, or classes/subclasses of behavior, or the most significant explanatory variables, or the most highly informative variables), then that is deriving value from “small data.”
  3. If we target our analysis on a tight localized subsample of entries in our high-volume data collection (in order to deliver one-to-one customer engagement, personalization, individual customer modeling, and high-precision target marketing, all of which still require use of the full complexity, variety, and high-dimensionality of the data), then that is deriving value from “small data.”
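The three "Type B" localizations above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the `events` records, their field layout, and the function names `sliding_window_means`, `select_features`, and `customer_slice` are all hypothetical, invented here to show the idea, and are not taken from any particular tool.

```python
from collections import deque
from statistics import mean

# Hypothetical records: (timestamp_seconds, customer_id, amount, channel)
events = [
    (0, "c1", 10.0, "web"),
    (5, "c2", 12.0, "store"),
    (12, "c1", 11.0, "web"),
    (20, "c3", 95.0, "mobile"),
    (26, "c1", 9.0, "web"),
]


def sliding_window_means(stream, window_seconds):
    """Velocity: analyze only a localized time window of the stream."""
    window = deque()
    means = []
    for ts, _customer, amount, _channel in stream:
        window.append((ts, amount))
        # Drop entries that have aged out of the time window.
        while window[0][0] <= ts - window_seconds:
            window.popleft()
        means.append(mean(a for _, a in window))
    return means


def select_features(stream, indices):
    """Variety: keep only a localized set of features (columns)."""
    return [tuple(row[i] for i in indices) for row in stream]


def customer_slice(stream, customer_id):
    """Volume: keep a tight subsample of entries, with every feature intact."""
    return [row for row in stream if row[1] == customer_id]
```

Calling `sliding_window_means(events, 10)` yields a running average over the last ten seconds only, which is where a spike such as the 95.0 purchase would stand out. `customer_slice(events, "c1")` returns one customer's full records, the starting point for the one-to-one modeling described in item 3.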

(continue reading here: https://www.mapr.com/blog/when-big-data-goes-local-small-data-gets-big-part-1)


Local Linear Embedding (Image source**: http://mdp-toolkit.sourceforge.net/examples/lle/lle.html)

**Zito, T., Wilbert, N., Wiskott, L., Berkes, P. (2009). Modular toolkit for Data Processing (MDP): a Python data processing framework, Front. Neuroinform. (2008) 2:8. doi:10.3389/neuro.11.008.2008

New Directions for Big Data and Analytics in 2015

The world of big data and analytics is remarkably vibrant and marked by incredible innovation, and there are advancements on every front that will continue into 2015. These include increased data science education opportunities and training programs, in-memory analytics, cloud-based everything-as-a-service, innovations in mobile (business intelligence and visual analytics), broader applications of social media (for data generation, consumption and exploration), graph (linked data) analytics, embedded machine learning and analytics in devices and processes, digital marketing automation (in retail, financial services and more), automated discovery in sensor-fed data streams (including the internet of everything), gamification, crowdsourcing, personalized everything (medicine, education, customer experience and more) and smart everything (highways, cities, power grid, farms, supply chain, manufacturing and more).

Within this world of wonder, where will we wander with big data and analytics in 2015? I predict two directions for the coming year…

(continue reading here: http://www.ibmbigdatahub.com/blog/new-directions-big-data-and-analytics-2015)


Outlier Detection Gets a New Look – Surprise Discovery in Big Data

Novelty and surprise are two of the more exciting aspects of science – finding something totally new and unexpected can lead to a quick research paper, or it can make your career. As scientists, we all yearn to make a significant discovery. Petascale big data collections potentially offer a multitude of such opportunities. But how do we find that unexpected thing? These discoveries come under various names: interestingness, outlier, novelty, anomaly, surprise, or defect (depending on the application). Outlier? Anomaly? Defect? How did they get onto this list? Well, those features are often the unexpected, interesting, novel, and surprising aspects (patterns, points, trends, and/or associations) in the data collection. Outliers, anomalies, and defects might be insignificant statistical deviants, or else they could represent significant scientific discoveries.

(continue reading here: http://stats.cwslive.wiley.com/details/feature/6597751/Outlier-Detection-Gets-a-Makeover—Surprise-Discovery-in-Scientific-Big-Data.html)


The Power of Three: Big Data, Hadoop, and Finance Analytics

Big data is a universal phenomenon. Every business sector and aspect of society is being touched by the expanding flood of information from sensors, social networks, and streaming data sources. The financial sector is riding this wave as well. We examine here some of the features and benefits of Hadoop (and its family of tools and services) that enable large-scale data processing in finance (and consequently in nearly every other sector).

Three of the greatest benefits of big data are discovery, improved decision support, and greater return on innovation. In the world of finance, these also represent critical business functions….

(continue reading here:  https://www.mapr.com/blog/potent-trio-big-data-hadoop-and-finance-analytics)


IBM Insight 2014 – Day 2: The “One Thing” – Watson Analytics

The highlight of Day 2 at IBM Insight 2014 was the presentation of numerous examples, new features, powerful capabilities, and strategic vision for Watson Analytics.  This was the “one thing” – (to borrow the phrase from the movie “City Slickers”) – the one thing that seems to matter the most, that will make the biggest impact, and that has captured the essence of big data and analytics technologies for the future, rapidly approaching world of data everywhere, sensors everywhere, and the Internet of Things.

(continue reading more about Watson Analytics here:  http://ibm.co/10zEl6S)


IBM Insight 2014 – Day 1 Soundbites: Carpe Datum

There are big data meetups, workshops, conferences, and symposia. And then… there is IBM Insight 2014! There’s only one word to describe this happenin’ event: “Wow!”

The content of the event is focused on IBM’s products, services, corporate strengths, and partnerships. But the theme and message are laser-focused on the light-speed transformation of business in 2014 that has been achieved through insights from big data and analytics. From the Day 1 opening laser light show and film clip that featured DataKind founder Jake Porway along with Sensemaking evangelist Jeff Jonas, to their spectacular, well-timed entrance into the packed 12,000-seat Mandalay Bay Arena, continuing into a vast array of workshops and hands-on labs, the first day of Insight 2014 has been like a rapid tour through multiple parallel “Alice in Wonderland” universes.

If you are not able to attend the event, you can watch at InsightGO. You can also watch participant interviews on TheCube from SiliconANGLE and Wikibon: http://siliconangle.tv/ibm-insight-2014/

Ideas and insights have filled the arena and convention center in every conversation. Attached below are some of the soundbites (harvested from presentations and conversations).

What are people talking about at IBM Insight 2014?

  • Analytics take big data from information to insights to innovation.
  • The new data-driven business is built around “Systems of Insight” that inform every decision, interaction, and process.
  • Systems of Insight involve more people, more places, and more data.
  • Big data analytics drive business integration, intelligence, and innovation.
  • Watson Analytics reinvents the analytics experience in the cloud — its brilliant human-computer interface gives a whole new meaning to “human factors engineering”.
  • Cognitive Analytics with Watson generates (in real-time) the questions that you should be asking your data, through natural language dialogue, guided discovery, and fully automated intelligence.
  • IBM has released a suite of new services for big data and analytics, including Watson Curator, DataWorks, DashDB, and Cloudant.
  • The new quest for business is personalized engagement that incorporates immersive user experiences: fusing the physical world with digital interactions of all kinds.
  • In the era of digital marketing and real-time customer analytics, battles are won or lost in minutes (or even seconds).
  • A paradox in digital marketing has emerged:  outward-facing customer-centric analytics (personalization, segment of one) have forced organizations into more inward focus on big data operations.  We believe that this paradox evaporates when we realize that the focus on operations is in response to the urgent need to focus on the customer, at the right time, with the right offer, at the right place, in the right context.  That’s the 360 view, and that’s cognitive analytics at its best!
  • Fast data (big data velocity) is fast becoming the number 1 challenge, source of innovation, and revenue-generator for business.  Big data volume is so “2012”, and big data variety is so “2013” (though I personally think that we have yet to see the real power and revolution in data-driven business discovery through high-variety data, particularly via fast complex streaming data emanating from multiple sensors, sources,  and signals).
  • The real “big data analytics” talent shortage is in finding folks who know both the analytics (data science) and the business.
  • The Chief Data Officer is an agent for business transformation and change in the big data era.
  • IBM Insight might just be the 2014 World Series of Big Data Analytics.
  • Perhaps the real insight at IBM Insight 2014 is that what you really need to do is “to dress for success” with the right T-shirt …:

(Caption: Kirk Borne, Cortnie Abercrombie, and Jake Porway sharing a moment)

  • Carpe datum!



Chief Data Officer as Business Change Agent

Deriving business value from, leveraging, protecting, and promoting an organization’s rapidly growing data assets are now coming under the corporate executive sponsorship of a new member of the executive suite – the CDO (Chief Data Officer).  This role should be considered as distinctly different from other similarly defined roles: (a) the CIO, whose responsibilities now revolve primarily around information technologies and information security; (b) the CDS (Chief Data Scientist), whose role is evolving, but should be primarily that of Chief Scientist, specifically related to Data Science, exploring new business models and discovering insights from the data resources; and (c) the CAO (Chief Analytics Officer), whose role is also evolving and who may be roughly equivalent to the CDS, though the CAO’s focus should be more on mapping the data science capabilities (championed by the CDS) and the data assets (sponsored by the CDO) onto the data-to-decisions, data-to-discovery, and data-to-insights goals of the line of business.

We also see a lot of overlap in this set of roles with those of the CMO (Chief Marketing Officer) and the Chief Innovation Strategy Officer.  We are not suggesting that each and every business will need all of these, but the organization should identify what their corporate strategy and business goals require, and then create the roles that will drive change in those directions.

In this evolving leadership landscape within the growing Big Data era, the CDO is definitely creating a lot of buzz.  Since Big Data and Analytics are now listed as the top drivers of innovation, revenue, and change within organizations, the CDO should be there to drive that change.  Here are two sources of case studies and information regarding the CDO:

(1) See the new IBM Chief Data Officer website at http://ibm.com/services/c-suite/cdo. Related to this effort, see also the Institute for Business Value within IBM’s Center for Applied Insights. Listen also to Cortnie Abercrombie of IBM as she provides further insights and recommendations for the CDO role in her online interviews: here and here!

(2) Download the Innovation Enterprise’s white paper “Rise of the Chief Data Officer – An Executive Whose Time Has Come”, by George Hill and Chris Towers.  I was fortunate to write the Foreword for this booklet.  Here is an excerpt from my Foreword:

Many now believe that Big Data has matured, moving beyond the peak of its initial hype and is moving ahead into its promised plateau of productivity. Data has come of age in the corporate boardroom as well.  The enormous potential for new wealth, new products, new customers, new insights, and new entrepreneurial business lines has caused a cataclysmic shift in the power of “information” in the corporate executive suite. The existing CIO’s role seems to have solidified in the past decade to that of “Chief Information Technology Officer,” with an emphasis primarily on technology and infrastructure.  The new CxO in the boardroom is the data person (the “data lover”). This may be the Chief Data Scientist (focused on the analytics objectives, opportunities, and obsessions that arise in this era of Big Data).  But, we also see the CDO (Chief Data Officer) coming into the inner circle of executive power.

The CDO is focused on the data – acquisition, governance, quality, management, integration, policies (including privacy, preservation, deduplication, curation), value creation, recruiting skilled data professionals, establishing a data-driven corporate culture, team-building around data-centric business objectives, and acquisition and oversight of corporate data technologies (not I.T. in the historical sense). The responsibilities are enormous, the requisite skills are CxO-worthy, the challenges are many, and the opportunities to create and define the role are very attractive.

(continue reading here: http://ie.theinnovationenterprise.com/event_justify_your_rois/Rise-of-the-Chief-Data-Officer.pdf)
