Category Archives: Machine Learning

Data Makes Possible Many Things: Insights Discovery, Innovation, and Better Decisions

In the early days of the big data era (at the peak of the big data hype), we would often hear about the 3 V’s of big data (Volume, Variety, and Velocity). Then, people started adding more V’s, including Veracity and Value, plus many more! I was guilty of adding several more in my article “Top 10 Big Data Challenges – A Serious Look at 10 Big Data V’s“.

Through the years, I have decided on the following “4 V’s of Big Data” summary in my own presentations:

  • Volume = the most annoying V
  • Velocity = the most challenging V
  • Variety = the most rich V for insights discovery, innovation, and better decisions
  • Value = the most important V

As Dez Blanchfield once said, “You don’t need a data scientist to tell you big data is valuable. You need one to show its value.”

A series of articles that drive home the all-important value of data is being published on the DataMakesPossible.com site. The site’s domain name says it all: Data Makes Possible!

What does data make possible? I occasionally discuss these things in articles that I write for the MapR blog series, where I have summarized the big data value proposition very simply in my own list of the 3 D2D‘s of big data:

  • Data-to-Discovery (Class Discovery, Correlation Discovery, Novelty Discovery, Association Discovery)
  • Data-to-Decisions
  • Data-to-Dollars (or Data-to-Dividends)

The story is richer and much more impactful these days than those trivial 1-letter (V) or 3-letter (D2D) mnemonics would suggest. We are far past the peak of inflated expectations in the big data hype cycle, and we are even beyond the trough of disillusionment. We have truly entered the plateau of productivity from data in our organizations, though the analytics-driven culture is still going through growing pains.

So, what does data make possible? I document our progress toward deriving big value from big data in a series of articles that I am writing (with more to come) for the DataMakesPossible.com site. These articles include:

But don’t stop there! There are many more fabulous articles, insights, case studies, and impactful stories at DataMakesPossible.com — visit the site often, as there are new posts every week.

Let’s improve our world together through the insights and discoveries that large, comprehensive data collections can provide! That’s what data scientists love to do.

Follow Kirk Borne on Twitter @KirkDBorne

Recent top-selling books in AI and Machine Learning

Here are some Artificial Intelligence and Machine Learning books that are top sellers on Amazon.

“The Book of Why: The New Science of Cause and Effect”

“Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems”

“Deep Learning (Adaptive Computation and Machine Learning)”

“Applied Artificial Intelligence: A Handbook For Business Leaders”

“Machine Learning For Absolute Beginners: A Plain English Introduction”

“Life 3.0: Being Human in the Age of Artificial Intelligence”

“An Introduction to Statistical Learning: with Applications in R” (7th printing; 2017 edition)

“Grokking Algorithms: An illustrated guide for programmers and other curious people”

“Prediction Machines: The Simple Economics of Artificial Intelligence”

“Deep Learning with Python”

“Python Machine Learning” (2nd edition)

“Advances in Financial Machine Learning”

“The Future of Leadership: Rise of Automation, Robotics and Artificial Intelligence”

“Deep Learning With Python: Step By Step Guide With Keras and Pytorch”

“Deep Reinforcement Learning Hands-On”

“Pattern Recognition and Machine Learning”


Disclosure statement: As an Amazon Associate I earn from qualifying purchases made here. These earnings offset the costs of hosting this website and domain name registration.

Machine Learning Making Big Moves in Marketing

Machine Learning is (or should be) a core component of any marketing program now, especially in digital marketing campaigns. The following insightful quote by Dan Olley (EVP of Product Development and CTO at Elsevier) sums up the urgency and criticality of the situation: “If CIOs invested in machine learning three years ago, they would have wasted their money. But if they wait another three years, they will never catch up.” This statement also applies to CMOs.

To illustrate and to motivate these emerging and growing developments in marketing, we list here some of the top Machine Learning trends that we see:

  1. Hyper-personalization (SegOne context-driven marketing)
  2. Real-time sentiment analysis and response (social customer care)
  3. Behavioral analytics (predictive and prescriptive)
  4. Conversational chatbots (using NLG: Natural Language Generation)
  5. Agile analytics (DataOps)
  6. Influencer marketing (amplification of your message to specific audiences)
  7. Journey Sciences (using graph and linked data modeling)
  8. Context-based customer engagement through IoT (knowing the knowable via ubiquitous sensors)

You can read more details about each of these developments in my MapR blog.

And check out the many excellent resources and consulting services (in Big Data Analytics, Data Science, Machine Learning, and Machine Intelligence) at Booz Allen Hamilton, to help all of your data-driven campaigns make big moves and move forward more effectively.

Four Ways to Harness Big Data in the Energy Sector

The big change in every industry – energy included – is the sensoring of the world. We have put sensors on just about everything, and that’s definitely true in the energy sector, whether it’s in electricity, oil and gas, supply chain and manufacturing, or even customer interaction.

We call that big data, which people sometimes take to mean as big volume, but I like to think of as big value. Because with all this information, we can understand how our systems work – and put them to better use to create greater value for our business.

Prescriptive models

There is greater insight than ever before into how energy is distributed, both in the course of the day and across specific locations. In NASA, we tracked potential ‘killer asteroids’ – asteroids that have the potential to do serious damage to our earth.

In the energy industry that increased insight will help when averting its own disasters, or ‘killer asteroids’. Both predictive and prescriptive models are particularly useful here, for predicting future outcomes and for prescribing different (better, optimal) outcomes: if enough data are collected from factors such as the environment, devices used, and contextual information such as weather patterns or energy usage, models now know how to set the user parameters to prevent disaster.

On an energy grid, if a bad outcome is upcoming, conditions can be set to prevent machine failure, whether through reducing the operating temperature or speed, or increasing the frequency of on/off cycles.

Digital twins

An exciting proposition for the industrial sector is the digital twin, a full Computer-Aided Design (CAD) model of any device that can be run alongside its original to track behaviour and identify the cause of any failures. In the energy industry, this could be a digital replica of an offshore wind turbine, for example, monitoring power usage, production and efficiencies, and replaying any anomalies to identify their cause.

Crowdsourcing

When I was at NASA, we worked with scientists worldwide to create an online web portal called Zooniverse that presents large data collections to the public, enabling everyone to contribute to scientific discovery in datasets that are too large for the science teams to explore by themselves.

With the mountains of data we had it might take years, even centuries, to look through. But if you put it online and create an interesting enough challenge around that data, people across the world are going to volunteer their time in the name of scientific research.

We had some incredible results from this crowdsourcing mission. A Dutch schoolteacher called Hanny van Arkel discovered a strange green object while she was looking through images, which turned out to be an entirely new type of astronomical object (and is now named after her).

If I was to hypothesize how you could similarly crowdsource data in the energy sector, perhaps you could have people look at data prior to blackouts. There could be a pattern to the outage, or they could identify places where the power stays on during these blackouts – such as schools or grocery stores – which could be pinpointed on an app so people could use them when there are cuts.

Move beyond reporting

Energy industry executives should be interested in big data, because it allows them to look at all aspects of their business. I have conversations with publicly traded companies who only report on an annual or quarterly basis. I understand this is a regulatory requirement, but it seems to me to be only acting in hindsight.

You can’t drive a vehicle by looking in the rear-view mirror. You have to understand what’s coming, and the only way you can do that is by taking in all the information that is available to you: from the child playing on the street beside the road, to the truck two cars ahead.

Big data can deal with hindsight – events that have happened in the past; oversight – events happening now; and foresight – events happening in the future. And prescriptive modelling gives insight into all of these events. At a time where the global energy industry is undergoing huge change, don’t we need that forward-looking view of the rich information embedded within our big data reservoirs?

Summary

Therefore, the energy industry can benefit from these four approaches (prescriptive modelling, digital twinning, crowdsourcing, and moving beyond reporting) as ways to manage the digital disruption and the flood of new data coming from sensors everywhere. Industry leaders used to encourage organizations to adopt certain products or methodologies to help their data analytics move at the speed of business. I believe now that this is not a viable approach. What you really need are solutions that can help your business move at the speed of your data!

———————–

NOTE: This post was originally published at https://www.linkedin.com/pulse/four-ways-harness-big-data-energy-sector-kirk-borne/

Facilitate Proactive Cybersecurity Operations with Big Data Analytics and Machine Intelligence

Digital processes are disrupting and transforming organizations everywhere. Big Data, Machine Learning, AI, and their applications companion, Machine Intelligence, are the soul of this digital revolution. These applications are probably no more essential anywhere else than in cybersecurity operations.

The quantity and complexity of digital network information has skyrocketed in recent years due to the explosion of internet-connected devices, the rise of operational technologies (OT), and the growth of an interconnected global economy (including IOT: the Internet of Things). With exponentially multiplying mountains of human- and machine-generated network data, the ability to extract meaningful signals about potentially nefarious activities, and ultimately deter those activities, has become increasingly complex.

In other domains, such as marketing and e-commerce, businesses have been able to effectively apply data mining to create customer “journeys” in order to predict and recommend content or products to the end-user. However, within cybersecurity operations, the ability to map the journey of an analyst or an adversary is inherently complex due to the dynamic nature of computer networks, the sophistication of adversaries, and the pervasiveness of technical and human factors that expose network vulnerabilities. Despite these challenges, there is hope for making meaningful progress. E-commerce marketing and cyber operations share one significant factor—the primary actor is a human being, whose interests, intents, motivations, and goals often manifest through their actions, behaviors, and other digital breadcrumbs.

For modern cybersecurity operations to be effective, it is necessary for organizations to monitor digital breadcrumbs from diverse data streams to identify strong activity signals. But, it doesn’t stop at monitoring sensors for signals. Our cybersecurity applications must proceed proactively from sensors (data collectors) to signals (big data) to sentinels (pattern detection and recognition, including the identification of early warnings and creation of alerts, through the algorithms of AI and machine learning) to sense-making (insights and action, through machine intelligence).

You can learn more about extracting meaningful signals from mountains of data and instrumenting advanced analytics for improved cyber defenses in the new report from Booz Allen Hamilton. Download and read your copy of the free report “Modernizing Cyber Security Operations with Machine Intelligence” here: https://www.oreilly.com/ideas/modernizing-cybersecurity-approaches

For additional insights into Booz Allen Hamilton’s Machine Intelligence capabilities, and how they can help your organization, download your copy of the “Machine Intelligence Primer” here: http://www.boozallen.com/machineintelligence

Discovering and understanding patterns in highly dimensional data

Dimensionality reduction is a critical component of any solution dealing with massive data collections. Being able to sift through a mountain of data efficiently in order to find the key descriptive, predictive, and explanatory features of the collection is a fundamental required capability for coping with the Big Data avalanche. Identifying the most interesting dimensions of data is especially valuable when visualizing high-dimensional (high-variety) big data.

There is a “good news, bad news” angle here. First, the bad news: the human capacity for seeing multiple dimensions is very limited: 3 or 4 dimensions are manageable; 5 or 6 dimensions are possible; but more dimensions are difficult-to-impossible to assimilate. Now for the good news: the human cognitive ability to detect patterns, anomalies, changes, or other “features” in a large complex “scene” surpasses most computer algorithms for speed and effectiveness. In this case, a “scene” refers to any small-n projection of a larger-N parameter space of variables.

In data visualization, a systematic ordered parameter sweep through an ensemble of small-n projections (scenes) is often referred to as a “grand tour”, which allows a human viewer of the visualization sequence to see quickly any patterns or trends or anomalies in the large-N parameter space. Even such “grand tours” can miss salient (explanatory) features of the data, especially when the ratio N/n is large.

Consequently, a data analytics approach that combines the best of both worlds (machine algorithms and human perception) will enable efficient and effective exploration of large high-dimensional data. One such approach is to apply Computer Vision algorithms, which are designed to emulate human perception and cognitive abilities. Another approach is to generate “interestingness metrics” that signal to the data end-user the most interesting and informative features (or combinations of features) in high-dimensional data. A specific example of the latter is latent (hidden) variable discovery.

Latent variables are not explicitly observed but are inferred from the observed features, specifically because they are the variables that deliver the all-important (but sometimes hidden) descriptive, predictive, and explanatory power of the data set. Latent variables can also be concepts that are implicitly represented by the data (e.g., the “sentiment” of the author of a social media posting).  

Because some latent variables are “observable” in the sense that they can be generated through a “yet to be discovered” mathematical combination of several of the measured variables, these are therefore an obvious example of dimension reduction for visual exploration of large high-dimensional data.

Latent (Hidden) Variable Models are used in statistics to infer variables that are not observed but are inferred from the variables that are observed. Latent variables are widely used in social science, psychology, economics, life sciences and machine learning. In machine learning, many problems involve collection of high-dimensional multivariate observations and then hypothesizing a model that explains them. In such models, the role of the latent variables is to represent properties that have not been directly observed.

After inferring the existence of latent variables, the next challenge is to understand them. This can be achieved by exploring their relationship with the observed variables (e.g., using Bayesian methods) . Several correlation measures and dimensionality reduction methods such as PCA can be used to measure those relationships. Since we don’t know in advance what relationships exist between the latent variables and the observed variables, more generalized nonparametric measures like the Maximal Information Coefficient (MIC) can be used.

MIC has become popular recently, to some extent because it provides a straightforward R-squared type of estimate to measure dependency among variables in a high-dimensional data set.  Since we don’t know in advance what a latent variable actually represents, it is not possible to predict the type of relationship that it might possess with the observed variables. Consequently, a nonparametric approach makes sense in the case of large high-dimensional data, for which the interrelationships among the many variables is a mystery. Exploring variables that possess the largest values of MIC can help us to understand the type of relationships that the latent variables have with the existing variables, thereby achieving both dimension reduction and a parameter space in which to conduct visual exploration of high-dimensional data.

The techniques described here can help data end-users to discover and understand data patterns that may lead to interesting insights within their massive data collections.

Follow Kirk Borne on Twitter @KirkDBorne