Category Archives: Machine Learning

Four Ways to Harness Big Data in the Energy Sector

The big change in every industry – energy included – is the sensoring of the world. We have put sensors on just about everything, and that’s definitely true in the energy sector, whether it’s in electricity, oil and gas, supply chain and manufacturing, or even customer interaction.

We call that big data, which people sometimes take to mean as big volume, but I like to think of as big value. Because with all this information, we can understand how our systems work – and put them to better use to create greater value for our business.

Prescriptive models

There is greater insight than ever before into how energy is distributed, both in the course of the day and across specific locations. In NASA, we tracked potential ‘killer asteroids’ – asteroids that have the potential to do serious damage to our earth.

In the energy industry that increased insight will help when averting its own disasters, or ‘killer asteroids’. Both predictive and prescriptive models are particularly useful here, for predicting future outcomes and for prescribing different (better, optimal) outcomes: if enough data are collected from factors such as the environment, devices used, and contextual information such as weather patterns or energy usage, models now know how to set the user parameters to prevent disaster.

On an energy grid, if a bad outcome is upcoming, conditions can be set to prevent machine failure, whether through reducing the operating temperature or speed, or increasing the frequency of on/off cycles.

Digital twins

An exciting proposition for the industrial sector is the digital twin, a full Computer-Aided Design (CAD) model of any device that can be run alongside its original to track behaviour and identify the cause of any failures. In the energy industry, this could be a digital replica of an offshore wind turbine, for example, monitoring power usage, production and efficiencies, and replaying any anomalies to identify their cause.

Crowdsourcing

When I was at NASA, we worked with scientists worldwide to create an online web portal called Zooniverse that presents large data collections to the public, enabling everyone to contribute to scientific discovery in datasets that are too large for the science teams to explore by themselves.

With the mountains of data we had it might take years, even centuries, to look through. But if you put it online and create an interesting enough challenge around that data, people across the world are going to volunteer their time in the name of scientific research.

We had some incredible results from this crowdsourcing mission. A Dutch schoolteacher called Hanny van Arkel discovered a strange green object while she was looking through images, which turned out to be an entirely new type of astronomical object (and is now named after her).

If I was to hypothesize how you could similarly crowdsource data in the energy sector, perhaps you could have people look at data prior to blackouts. There could be a pattern to the outage, or they could identify places where the power stays on during these blackouts – such as schools or grocery stores – which could be pinpointed on an app so people could use them when there are cuts.

Move beyond reporting

Energy industry executives should be interested in big data, because it allows them to look at all aspects of their business. I have conversations with publicly traded companies who only report on an annual or quarterly basis. I understand this is a regulatory requirement, but it seems to me to be only acting in hindsight.

You can’t drive a vehicle by looking in the rear-view mirror. You have to understand what’s coming, and the only way you can do that is by taking in all the information that is available to you: from the child playing on the street beside the road, to the truck two cars ahead.

Big data can deal with hindsight – events that have happened in the past; oversight – events happening now; and foresight – events happening in the future. And prescriptive modelling gives insight into all of these events. At a time where the global energy industry is undergoing huge change, don’t we need that forward-looking view of the rich information embedded within our big data reservoirs?

Summary

Therefore, the energy industry can benefit from these four approaches (prescriptive modelling, digital twinning, crowdsourcing, and moving beyond reporting) as ways to manage the digital disruption and the flood of new data coming from sensors everywhere. Industry leaders used to encourage organizations to adopt certain products or methodologies to help their data analytics move at the speed of business. I believe now that this is not a viable approach. What you really need are solutions that can help your business move at the speed of your data!

———————–

NOTE: This post was originally published at https://www.linkedin.com/pulse/four-ways-harness-big-data-energy-sector-kirk-borne/

Facilitate Proactive Cybersecurity Operations with Big Data Analytics and Machine Intelligence

Digital processes are disrupting and transforming organizations everywhere. Big Data, Machine Learning, AI, and their applications companion, Machine Intelligence, are the soul of this digital revolution. These applications are probably no more essential anywhere else than in cybersecurity operations.

The quantity and complexity of digital network information has skyrocketed in recent years due to the explosion of internet-connected devices, the rise of operational technologies (OT), and the growth of an interconnected global economy (including IOT: the Internet of Things). With exponentially multiplying mountains of human- and machine-generated network data, the ability to extract meaningful signals about potentially nefarious activities, and ultimately deter those activities, has become increasingly complex.

In other domains, such as marketing and e-commerce, businesses have been able to effectively apply data mining to create customer “journeys” in order to predict and recommend content or products to the end-user. However, within cybersecurity operations, the ability to map the journey of an analyst or an adversary is inherently complex due to the dynamic nature of computer networks, the sophistication of adversaries, and the pervasiveness of technical and human factors that expose network vulnerabilities. Despite these challenges, there is hope for making meaningful progress. E-commerce marketing and cyber operations share one significant factor—the primary actor is a human being, whose interests, intents, motivations, and goals often manifest through their actions, behaviors, and other digital breadcrumbs.

For modern cybersecurity operations to be effective, it is necessary for organizations to monitor digital breadcrumbs from diverse data streams to identify strong activity signals. But, it doesn’t stop at monitoring sensors for signals. Our cybersecurity applications must proceed proactively from sensors (data collectors) to signals (big data) to sentinels (pattern detection and recognition, including the identification of early warnings and creation of alerts, through the algorithms of AI and machine learning) to sense-making (insights and action, through machine intelligence).

You can learn more about extracting meaningful signals from mountains of data and instrumenting advanced analytics for improved cyber defenses in the new report from Booz Allen Hamilton. Download and read your copy of the free report “Modernizing Cyber Security Operations with Machine Intelligence” here: https://www.oreilly.com/ideas/modernizing-cybersecurity-approaches

For additional insights into Booz Allen Hamilton’s Machine Intelligence capabilities, and how they can help your organization, download your copy of the “Machine Intelligence Primer” here: http://www.boozallen.com/machineintelligence

Discovering and understanding patterns in highly dimensional data

Dimensionality reduction is a critical component of any solution dealing with massive data collections. Being able to sift through a mountain of data efficiently in order to find the key descriptive, predictive, and explanatory features of the collection is a fundamental required capability for coping with the Big Data avalanche. Identifying the most interesting dimensions of data is especially valuable when visualizing high-dimensional (high-variety) big data.

There is a “good news, bad news” angle here. First, the bad news: the human capacity for seeing multiple dimensions is very limited: 3 or 4 dimensions are manageable; 5 or 6 dimensions are possible; but more dimensions are difficult-to-impossible to assimilate. Now for the good news: the human cognitive ability to detect patterns, anomalies, changes, or other “features” in a large complex “scene” surpasses most computer algorithms for speed and effectiveness. In this case, a “scene” refers to any small-n projection of a larger-N parameter space of variables.

In data visualization, a systematic ordered parameter sweep through an ensemble of small-n projections (scenes) is often referred to as a “grand tour”, which allows a human viewer of the visualization sequence to see quickly any patterns or trends or anomalies in the large-N parameter space. Even such “grand tours” can miss salient (explanatory) features of the data, especially when the ratio N/n is large.

Consequently, a data analytics approach that combines the best of both worlds (machine algorithms and human perception) will enable efficient and effective exploration of large high-dimensional data. One such approach is to apply Computer Vision algorithms, which are designed to emulate human perception and cognitive abilities. Another approach is to generate “interestingness metrics” that signal to the data end-user the most interesting and informative features (or combinations of features) in high-dimensional data. A specific example of the latter is latent (hidden) variable discovery.

Latent variables are not explicitly observed but are inferred from the observed features, specifically because they are the variables that deliver the all-important (but sometimes hidden) descriptive, predictive, and explanatory power of the data set. Latent variables can also be concepts that are implicitly represented by the data (e.g., the “sentiment” of the author of a social media posting).  

Because some latent variables are “observable” in the sense that they can be generated through a “yet to be discovered” mathematical combination of several of the measured variables, these are therefore an obvious example of dimension reduction for visual exploration of large high-dimensional data.

Latent (Hidden) Variable Models are used in statistics to infer variables that are not observed but are inferred from the variables that are observed. Latent variables are widely used in social science, psychology, economics, life sciences and machine learning. In machine learning, many problems involve collection of high-dimensional multivariate observations and then hypothesizing a model that explains them. In such models, the role of the latent variables is to represent properties that have not been directly observed.

After inferring the existence of latent variables, the next challenge is to understand them. This can be achieved by exploring their relationship with the observed variables (e.g., using Bayesian methods) . Several correlation measures and dimensionality reduction methods such as PCA can be used to measure those relationships. Since we don’t know in advance what relationships exist between the latent variables and the observed variables, more generalized nonparametric measures like the Maximal Information Coefficient (MIC) can be used.

MIC has become popular recently, to some extent because it provides a straightforward R-squared type of estimate to measure dependency among variables in a high-dimensional data set.  Since we don’t know in advance what a latent variable actually represents, it is not possible to predict the type of relationship that it might possess with the observed variables. Consequently, a nonparametric approach makes sense in the case of large high-dimensional data, for which the interrelationships among the many variables is a mystery. Exploring variables that possess the largest values of MIC can help us to understand the type of relationships that the latent variables have with the existing variables, thereby achieving both dimension reduction and a parameter space in which to conduct visual exploration of high-dimensional data.

The techniques described here can help data end-users to discover and understand data patterns that may lead to interesting insights within their massive data collections.

Follow Kirk Borne on Twitter @KirkDBorne