Category Archives: Big Data

Feature Mining in Big Data

We love features in our data, lots of features, in the same way that we love features in our toys, mobile phones, cars, and other gadgets. Good features in our big data collection empower us to build accurate predictive models, identify the most informative trends in our data, discover insightful patterns, and select the most descriptive parameters for data visualizations. Therefore, it is no surprise that feature mining is one aspect of data science that appeals to all data scientists. Feature mining includes: (1) feature generation (from combinations of existing attributes), (2) feature selection (for mining and knowledge discovery), and (3) feature extraction (for operational systems, decision support, and reuse in various analytics processes, dashboards, and pipelines).
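As a rough illustration of the three activities listed above, here is a minimal sketch in plain Python. It rests on assumptions that are not in the original post: generation is shown as pairwise products of existing columns, selection as ranking by absolute Pearson correlation with a target, and extraction as packaging the chosen columns into rows for downstream reuse. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def generate_features(columns):
    """Feature generation: add the product of every pair of existing attributes."""
    out = dict(columns)
    for a, b in combinations(sorted(columns), 2):
        out[f"{a}*{b}"] = [x * y for x, y in zip(columns[a], columns[b])]
    return out


def select_features(columns, target, k):
    """Feature selection: names of the k columns most correlated with the target."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:k]


def extract_features(columns, names):
    """Feature extraction (in the post's sense): package chosen columns as rows for reuse."""
    n_rows = len(columns[names[0]]) if names else 0
    return [tuple(columns[name][i] for name in names) for i in range(n_rows)]


# Hypothetical toy data: the target happens to equal width * height.
data = {"width": [1, 2, 3, 4, 5], "height": [5, 3, 4, 1, 2]}
target = [5, 6, 12, 4, 10]

candidates = generate_features(data)
best = select_features(candidates, target, k=2)
rows = extract_features(candidates, best)
```

In this toy case the generated product feature ranks first because it reproduces the target exactly. Real feature selection would normally use held-out data and a method suited to the model, so treat the correlation ranking as a placeholder.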

Learn more about feature mining and feature selection for Big Data Analytics in these publications:

Feature-Rich Toys and Data
Interactive Visualization-enabled Feature Selection and Model Creation
Feature Selection (available on the National Science Bowl blog site)
Feature Selection Methods used with different Data Mining algorithms
(and for heavy data science pundits) Computational Methods of Feature Selection

Follow Kirk Borne on Twitter @KirkDBorne

Outlier Detection Gets a New Look – Surprise Discovery in Big Data

Novelty and surprise are two of the more exciting aspects of science – finding something totally new and unexpected can lead to a quick research paper, or it can make your career. As scientists, we all yearn to make a significant discovery. Petascale big data collections potentially offer a multitude of such opportunities. But how do we find that unexpected thing? These discoveries come under various names: interestingness, outlier, novelty, anomaly, surprise, or defect (depending on the application). Outlier? Anomaly? Defect? How did they get onto this list? Well, those features are often the unexpected, interesting, novel, and surprising aspects (patterns, points, trends, and/or associations) in the data collection. Outliers, anomalies, and defects might be insignificant statistical deviants, or else they could represent significant scientific discoveries.
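One simple way to flag candidate surprises in a single numeric attribute is a robust score based on the median and the median absolute deviation (MAD). The sketch below is an assumption for illustration, not the method of the linked article; the function name, the 3.5 cutoff (a commonly quoted rule of thumb for this modified z-score), and the sample readings are hypothetical.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical; the score is undefined here.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # when the data are roughly normal.
    return [
        (i, v)
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical sensor readings with one surprising value.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 25.0, 10.2]
flagged = mad_outliers(readings)
```

Whether a flagged point is an insignificant deviant or a real discovery still has to be decided by looking at it in context; the score only narrows down where to look.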

(continue reading here … http://stats.cwslive.wiley.com/details/feature/6597751/Outlier-Detection-Gets-a-Makeover—Surprise-Discovery-in-Scientific-Big-Data.html)

The Power of Three: Big Data, Hadoop, and Finance Analytics

Big data is a universal phenomenon. Every business sector and aspect of society is being touched by the expanding flood of information from sensors, social networks, and streaming data sources. The financial sector is riding this wave as well. We examine here some of the features and benefits of Hadoop (and its family of tools and services) that enable large-scale data processing in finance (and consequently in nearly every other sector).

Three of the greatest benefits of big data are discovery, improved decision support, and greater return on innovation. In the world of finance, these also represent critical business functions….

(continue reading here: https://www.mapr.com/blog/potent-trio-big-data-hadoop-and-finance-analytics)

IBM Insight 2014 – Day 2: The “One Thing” – Watson Analytics

The highlight of Day 2 at IBM Insight 2014 was the presentation of numerous examples, new features, powerful capabilities, and strategic vision for Watson Analytics. This was the “one thing” – (to borrow the phrase from the movie “City Slickers”) – the one thing that seems to matter the most, that will make the biggest impact, and that has captured the essence of big data and analytics technologies for the future, rapidly approaching world of data everywhere, sensors everywhere, and the Internet of Things.

(continue reading more about Watson Analytics here: http://ibm.co/10zEl6S)

IBM Insight 2014 – Day 1 Soundbites: Carpe Datum

There are big data meetups, workshops, conferences, and symposia. And then… there is IBM Insight 2014! There’s only one word to describe this happenin’ event: “Wow!”

The content of the event is focused on IBM’s products, services, corporate strengths, and partnerships. But the theme and message are laser-focused on the light-speed transformation of business in 2014 that has been achieved through insights from big data and analytics. From the Day 1 opening laser light show and film clip that featured DataKind founder Jake Porway along with Sensemaking evangelist Jeff Jonas, to their spectacular, well-timed entrance into the packed 12,000-seat Mandalay Bay Arena, continuing into a vast array of workshops and hands-on labs, the first day of Insight 2014 has been like a rapid tour through multiple parallel “Alice in Wonderland” universes.

If you are not able to attend the event, you can watch at InsightGO. You can also watch participant interviews on TheCube from SiliconANGLE and Wikibon: http://siliconangle.tv/ibm-insight-2014/

Ideas and insights have filled the arena and convention center in every conversation. Attached below are some of the soundbites (harvested from presentations and conversations).

What are people talking about at IBM Insight 2014?

Analytics take big data from information to insights to innovation.
The new data-driven business is built around “Systems of Insight” that inform every decision, interaction, and process.
Systems of Insight involve more people, more places, and more data.
Big data analytics drive business integration, intelligence, and innovation.
Watson Analytics reinvents the analytics experience in the cloud — its brilliant human-computer interface gives a whole new meaning to “human factors engineering”.
Cognitive Analytics with Watson generates (in real-time) the questions that you should be asking your data, through natural language dialogue, guided discovery, and fully automated intelligence.
IBM has released a suite of new services for big data and analytics, including Watson Curator, DataWorks, DashDB, and Cloudant.
The new quest for business is personalized engagement that incorporates immersive user experiences: fusing the physical world with digital interactions of all kinds.
In the era of digital marketing and real-time customer analytics, battles are won or lost in minutes (or even seconds).
A paradox in digital marketing has emerged: outward-facing customer-centric analytics (personalization, segment of one) have forced organizations into more inward focus on big data operations. We believe that this paradox evaporates when we realize that the focus on operations is in response to the urgent need to focus on the customer, at the right time, with the right offer, at the right place, in the right context. That’s the 360 view, and that’s cognitive analytics at its best!
Fast data (big data velocity) is fast becoming the number 1 challenge, source of innovation, and revenue-generator for business. Big data volume is so “2012”, and big data variety is so “2013” (though I personally think that we have yet to see the real power and revolution in data-driven business discovery through high-variety data, particularly via fast complex streaming data emanating from multiple sensors, sources, and signals).
The real “big data analytics” talent shortage is in finding folks who know both the analytics (data science) and the business.
The Chief Data Officer is an agent for business transformation and change in the big data era.
IBM Insight might just be the 2014 World Series of Big Data Analytics.
Perhaps the real insight at IBM Insight 2014 is that what you really need to do is “to dress for success” with the right T-shirt …:

(Caption: Kirk Borne, Cortnie Abercrombie, and Jake Porway sharing a moment)

Carpe datum!

Chief Data Officer as Business Change Agent

Deriving business value from, leveraging, protecting, and promoting an organization’s rapidly growing data assets are now coming under the corporate executive sponsorship of a new member of the executive suite: the CDO (Chief Data Officer). This role should be considered as distinctly different from other similarly defined roles: (a) the CIO, whose responsibilities now revolve primarily around information technologies and information security; (b) the CDS (Chief Data Scientist), whose role is evolving, but should be primarily that of Chief Scientist, specifically related to Data Science, exploring new business models and discovering insights from the data resources; and (c) the CAO (Chief Analytics Officer), whose role is also evolving and who may be roughly equivalent to the CDS, though the CAO’s focus should be more on mapping the data science capabilities (championed by the CDS) and the data assets (sponsored by the CDO) onto the data-to-decisions, data-to-discovery, and data-to-insights goals of the line of business.

We also see a lot of overlap in this set of roles with those of the CMO (Chief Marketing Officer) and the Chief Innovation Strategy Officer. We are not suggesting that each and every business will need all of these, but each organization should identify what its corporate strategy and business goals require, and then create the roles that will drive change in those directions.

In this evolving leadership landscape within the growing Big Data era, the CDO is definitely creating a lot of buzz. Since Big Data and Analytics are now listed as the top drivers of innovation, revenue, and change within organizations, the CDO should be there to drive that change. Here are two sources of case studies and information regarding the CDO:

(1) See the new IBM Chief Data Officer website at http://ibm.com/services/c-suite/cdo. Related to this effort, see also the Institute for Business Value within IBM’s Center for Applied Insights. For more, listen to Cortnie Abercrombie of IBM as she offers insights and recommendations for the CDO role in her online interviews: here and here!

(2) Download the Innovation Enterprise’s white paper “Rise of the Chief Data Officer – An Executive Whose Time Has Come”, by George Hill and Chris Towers. I was fortunate to write the Foreword for this booklet. Here is an excerpt from my Foreword:

Many now believe that Big Data has matured, moving beyond the peak of its initial hype and is moving ahead into its promised plateau of productivity. Data has come of age in the corporate boardroom as well. The enormous potential for new wealth, new products, new customers, new insights, and new entrepreneurial business lines has caused a cataclysmic shift in the power of “information” in the corporate executive suite. The existing CIO’s role seems to have solidified in the past decade to that of “Chief Information Technology Officer,” with an emphasis primarily on technology and infrastructure. The new CxO in the boardroom is the data person (the “data lover”). This may be the Chief Data Scientist (focused on the analytics objectives, opportunities, and obsessions that arise in this era of Big Data). But, we also see the CDO (Chief Data Officer) coming into the inner circle of executive power.

The CDO is focused on the data – acquisition, governance, quality, management, integration, policies (including privacy, preservation, deduplication, curation), value creation, recruiting skilled data professionals, establishing a data-driven corporate culture, team-building around data-centric business objectives, and acquisition and oversight of corporate data technologies (not I.T. in the historical sense). The responsibilities are enormous, the requisite skills are CxO-worthy, the challenges are many, and the opportunities to create and define the role are very attractive.

(continue reading here … http://ie.theinnovationenterprise.com/event_justify_your_rois/Rise-of-the-Chief-Data-Officer.pdf)

IBM Insight 2014 – The Big Data World Series (or something like that)

I am attending the IBM Insight 2014 conference this year, along with many(!) other big data analytics luminaries, including Jake Porway (@jakeporway, of DataKind), James Kobielus (@jameskobielus, the IBM Big Data Evangelist; I love that title!), Carla Gentry (@data_nerd, of Analytical Solution), Lillian Pierson (@BigDataGal, data journalist and author of the new book “Data Science for Dummies”), and thousands (maybe millions) more!

There are many new developments happening in 2014 across the big data universe that will be discussed at #IBMinsight. These include cloud-based analytics, cognitive analytics, Big SQL, the Internet of Things, machine-to-machine analytics, real-time actionable insights from data, content-based customer experience management, and predictive everything (e.g., customer intelligence; manufacturing; biomedical conditions; etc.). This is the place to be right now in order to learn about these rapidly evolving advancements.

For the Twitter crowd, you can follow and jump into one of the Insight Tweet Chats from anywhere in the world. For example, join the discussion on Sunday October 26 at 11:00pm (New York time): https://www.crowdchat.net/IBMInsight

Even if you cannot be in Las Vegas(**) for this great event (perhaps “The 2014 World Series of Big Data”), you can still watch the presentations online and learn all about the latest Big Data Analytics and Data Science insights via “InsightGO”. The IBM InsightGO site describes it like this:

InsightGO is IBM’s interactive digital platform, streaming live broadcasts straight to your laptop or mobile device. It’s the next best thing to being in Vegas.

InsightGO is specially designed for both offsite and onsite attendees and features:

General sessions and client keynotes
Live interviews with experts and influencers
Live product demos from the EXPO floor
Moderated chats and trending topics

InsightGO will be hosted by writer, gamer and video star Veronica Belmont.

InsightGO registration is complimentary.

(**) What happens in Vegas stays in my Twitter timeline!

When Big Data Goes Local, Small Data Gets Big

This two-part series focuses on the value of doing small data analyses on a big data collection. In Part 1 of the series, we describe the applications and benefits of “small data” in general terms from several different perspectives. In Part 2 of the series, we’ll spend some quality time with one specific algorithm (Locally Linear Embedding) that enables local subsets of data (i.e., small data) to be used in developing a global understanding of the full big data collection.

We often hear that small data deserves at least as much attention in our analyses as big data. While there may be as many interpretations of that statement as there are definitions of big data (and see more here), there are at least two situations where “small data” applications are worth considering…

(continue reading here … https://www.mapr.com/blog/when-big-data-goes-local-small-data-gets-big-part-1)

Apervi’s Conflux Gives a Big Boost to a Confluence of Big Data Workflows

Data-driven workflows are the life and existence of big data professionals everywhere: data scientists, data analysts, and data engineers. We perform all types of data functions in these workflow processes: archive, discover, access, visualize, mine, manipulate, fuse, integrate, transform, feed models, learn models, validate models, deploy models, etc. It is a dizzying day’s work. We start manually in our workflow development, identifying what needs to happen at each stage of the process, what data are needed, when they are needed, where the data need to be staged, what the inputs and outputs are, and more. If we are really good, we can improve our efficiency in performing these workflows manually, but not substantially. A better path to success is to employ a workflow platform that is scalable (to larger data), extensible (to more tasks), more efficient (shorter time-to-solution), more effective (better solutions), adaptable (to different user skill levels and to different business requirements), comprehensive (providing a wide scope of functionality), and automated (to break the time barrier of manual workflow activities).

(continue reading here … http://www.bigdatanews.com/group/bdn-daily-press-releases/forum/topics/apervi-s-conflux-gives-a-big-boost-to-a-confluence-of-big-data-wo)

Rocket-Powered Data Science

Data Reflections by Dr. Kirk Borne @KirkDBorne
