Three Types of Actionable Business Analytics Not Called Predictive or Prescriptive

Decades (at least) of business analytics writings have focused on the power, perspicacity, value, and validity in deploying predictive and prescriptive analytics for business forecasting and optimization, respectively. These are primarily forward-looking actionable (proactive) applications. 

There are other dimensions of analytics that tend to focus on hindsight for business reporting and causal analysis – these are descriptive and diagnostic analytics, respectively, which are primarily reactive applications, mostly explanatory and investigatory, not necessarily actionable.

In the world of data there are other types of nuanced applications of business analytics that are also actionable – perhaps these are not too different from predictive and prescriptive, but their significance, value, and implementation can be explained and justified differently. Before we dive into these additional types of analytics applications, let us first consider a little pedagogical exercise with two simple evidence-based inferences.

(a) In essentially 100% of cases where an automobile is involved in an accident, the automobile had four wheels prior to the accident.

(b) In 100% of divorce cases, the divorcing couple was married prior to the divorce.

What is the point of those obvious statistical inferences? The point is that the 100% association between the event and the preceding condition has no special predictive or prescriptive power. Hence, prior knowledge of these 100% associations does not offer any actionable value. In statistical terms, the joint probability of event Y and condition X co-occurring, designated P(X,Y), is essentially the probability P(Y) of event Y occurring. The probability of the condition X occurring, P(X), is irrelevant since the existence of the precondition X is implicitly present by default.

Okay, those examples represent two remarkably uninteresting cases. Even when similar sorts of inferences occur in a business context, they have essentially zero value. How do predictive and prescriptive analytics fit into this statistical framework?

Using the same statistical terminology, the conditional probability P(Y|X) (the probability of Y occurring, given the presence of precondition X) is an expression of predictive analytics. By exploring and analyzing the business data, analysts and data scientists can search for and uncover such predictive relationships. This is predictive power discovery. Another way of saying this is: given observed data X, we can predict some outcome Y. Or more simply: given X, find Y.

Similarly (actually, conversely), we can use the conditional probability P(X|Y) (which is the probability that the precondition X exists, given the existence of outcome Y) as an expression of prescriptive analytics. How does that work in practice? By exploring and analyzing business data, analysts and data scientists can search for and uncover the conditions (causal factors) that have led to different outcomes. So, if the business wants to optimize some outcome Y, then data analysts will be tasked with finding the conditions X that must be implemented to achieve that desired outcome. This is prescriptive power discovery. Another way of saying this is: given some desired optimal outcome Y, what conditions X should we put in place? Or more simply: given Y, find X. Note how this simple mathematical expression of prescriptive analytics is exactly the opposite of our previous expression of predictive analytics (given X, find Y).
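Here is a minimal sketch of those probabilities in Python, computed from simple counts. The function name and all of the counts are made up for illustration; they do not come from any real business dataset. The second call mimics the divorce example: every case of Y also has X, so P(X|Y) = 1 and the joint probability P(X,Y) collapses to P(Y).

```python
def conditional_probabilities(n_xy, n_x_only, n_y_only, n_neither):
    """Estimate joint, marginal, and conditional probabilities from counts.

    n_xy      : cases with condition X and outcome Y
    n_x_only  : cases with X but not Y
    n_y_only  : cases with Y but not X
    n_neither : cases with neither X nor Y
    """
    n = n_xy + n_x_only + n_y_only + n_neither
    n_x = n_xy + n_x_only   # all cases where X is present
    n_y = n_xy + n_y_only   # all cases where Y occurred
    return {
        "P(X,Y)": n_xy / n,
        "P(X)": n_x / n,
        "P(Y)": n_y / n,
        "P(Y|X)": n_xy / n_x,   # predictive view: given X, find Y
        "P(X|Y)": n_xy / n_y,   # prescriptive view: given Y, find X
    }

# Hypothetical campaign data: X = customer saw the campaign, Y = customer purchased
campaign = conditional_probabilities(n_xy=30, n_x_only=20, n_y_only=10, n_neither=40)

# Hypothetical "uninteresting" case: X = married, Y = divorced (Y never occurs without X)
divorce = conditional_probabilities(n_xy=5, n_x_only=95, n_y_only=0, n_neither=100)

for name, result in (("campaign", campaign), ("divorce", divorce)):
    print(name, result)
```

With these made-up counts, the campaign case gives P(Y|X) = 30/50 and P(X|Y) = 30/40, two different numbers that answer two different business questions. In the divorce case, P(X|Y) is 1 and P(X,Y) equals P(Y), which is why that association carries no actionable information.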

Here are a few business examples of this type of prescriptive analytics: Which marketing campaign is most efficient and effective (has best ROI) in optimizing sales? Which environmental factors during manufacturing, packaging, or shipping lead to reduced product returns? Which pricing strategies lead to the best business revenue? What equipment maintenance schedule minimizes failures, downtime (mean time to recovery), and overall maintenance costs?

Now that we have described predictive and prescriptive analytics in detail, what is left? What are the three types of actionable (and valuable) business analytics applications that are not called predictive or prescriptive? They are sentinel, precursor, and cognitive analytics. Let’s define what these are.

  1. Sentinel Analytics – in common usage, the sentinel is the person on the guard station who is charged with watching for significant incoming or emergent activity. In practice, all activity is being observed and a decision is made as to whether any particular activity requires some sort of triage: sounding an alarm, or sending an alert to decision-makers, or doing nothing.
    • In the enterprise, sentinel analytics is most timely and beneficial when applied to real-time, dynamic data streams and time-critical decisions. For example, sensors (including internet of things devices and APIs on data networks) can be deployed with logic (analytics, statistical, and/or machine learning algorithms) to monitor and “watch” business systems and processes for emerging patterns, trends, behaviors, unusual operating modes, and anomalies that might be indicators of activities that require business attention, decisions, and/or action. 
  2. Precursor Analytics – in common usage, precursors are the early-warning indicators (harbingers, forerunners) of something else more serious or catastrophic that is about to come. We occasionally hear about earthquake precursors (increased levels of radon in groundwater), tidal wave precursors (a deep ocean earthquake), and cyber-attack precursors (phishing incidents). Precursor analytics is related to sentinel analytics. The latter (sentinel) is associated primarily with “watching” the data for interesting patterns that might require action, while precursor analytics is associated primarily with training the business systems to quickly identify those specific “learned” patterns and events that are known to be associated with high-risk events, thus requiring timely attention, intervention, and remediation. 
    • In these applications, the data science involvement includes both the “learning” of the most significant patterns to alert on and the improvement of their models (logic) to minimize false positives and false negatives. The analytics triage is critical, to avoid alarm fatigue (sending too many unimportant alerts) and to avoid underreporting of important actionable events. One could say that sentinel analytics is more like unsupervised machine learning, while precursor analytics is more like supervised machine learning. That is not a totally clean separation and distinction, but it might help to clarify their different applications of data science. 
    • The counterexample to the supervised learning explanation of precursor analytics is a “black swan” event – a rare high-impact event that is difficult to predict under normal circumstances – such as the global pandemic, which led to the failure of many predictive models in business. Broken models are definitely disruptive to analytics applications and business operations. Paradoxically, the precursor was actually predictive in a disruptive anti-predictive sort of way, which brings us right back to P(Y|X), or maybe it should be stated as P(“not Y”|X) where X is the black swan event (i.e., the predicted outcome Y from existing models will not occur in this case). As such, the global pandemic serves as a warning (a harbinger of disruption) and consequently as a “training example” to businesses for any future black swans. 
  3. Cognitive Analytics – this analytics mindset approach focuses on “surprise” discovery in data, using machine learning and AI to emulate and automate the cognitive abilities of humans. The goal is to discover novel, interesting, unexpected, and potentially valuable signals in the flood of streaming enterprise data. These may not be high-risk discoveries, but they could be high-reward discoveries. How does that resemble human cognitive abilities? Curiosity! Being curious about seeing something “funny” that you didn’t expect, thereby putting a “marker” in the data stream: “Look here! Pay attention! Ask questions about this!” 
    • Cognitive analytics is basically the opposite of descriptive analytics. In descriptive analytics, the task is to find answers to predetermined business questions (how much, how many, how often, who, where, when), whereas cognitive analytics is tasked with finding the business questions that should be asked. Descriptive: find the right answers in the data. Cognitive: find the right questions in the data. Cognitive analytics can then be viewed as a precursor to diagnostic analytics, which is the investigative stage of analytics that answers the questions raised by cognitive analytics (“Why did this happen?”, “Why are we seeing this pattern in our data?”, “What is the business impact of this trend, anomaly, behavior?”, “What is our next-best action as a result of this?”, “That’s funny! What is that?”).
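To make the alert triage trade-off described under precursor analytics a bit more concrete, here is a small Python sketch. It assumes that some model has already assigned a risk score to each event; the scores, the labels, and the function name are all hypothetical. The point is only to show how moving the alert threshold trades false positives (alarm fatigue) against false negatives (missed events).

```python
# Hypothetical scored events: (risk score from some model, was it truly high-risk?)
scored_events = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.40, True), (0.30, False), (0.20, False),
    (0.10, False), (0.05, False),
]

def triage_counts(events, threshold):
    """Count alert outcomes when an alert is sent for every score >= threshold."""
    tp = sum(1 for score, risky in events if score >= threshold and risky)
    fp = sum(1 for score, risky in events if score >= threshold and not risky)
    fn = sum(1 for score, risky in events if score < threshold and risky)
    tn = sum(1 for score, risky in events if score < threshold and not risky)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

# Compare a permissive, a moderate, and a strict alerting threshold
for threshold in (0.25, 0.50, 0.85):
    print(threshold, triage_counts(scored_events, threshold))
```

On this toy data, the permissive threshold catches every high-risk event but sends three unnecessary alerts, while the strict threshold sends no false alarms but misses two real events. Choosing where to sit between those extremes is a business decision as much as a data science decision.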

None of these descriptions of the 3 “new” analytics applications are meant to declare that these are completely distinct and different from the “big 4” analytics applications that we have known for many years (Descriptive, Diagnostic, Predictive, Prescriptive). But the differences between the “big 4” and the “new 3” are in the nuanced business applications of these analytics in the enterprise and in the types of inferences that the data scientists are asked to derive from the business data. 

Deploying these analytics in the cloud further expands their accessibility, democratization, enterprise-wide acceptance, broad advocacy, and ultimate business value. Blending automated analytics products (coming from the sentinel, precursor, and cognitive applications) with human-in-the-loop inquisitiveness, curiosity, creativity, out-of-the-box thinking, idea generation, and persistence can transform any organization into a data analytics powerhouse through an analytic culture revolution. This is more imperative than ever, as a global survey of analytics executives has revealed:

  • “Companies have been working to become more data-driven for many years, with mixed results.”
  • “Right now, the biggest challenge for organizations working on their data strategy might not have to do with technology at all.”
  • “Corporate chief data, information, and analytics executives reported that cultural change is the most critical business imperative.”
  • “Just 26.5% of organizations report having established a data-driven organization.”
  • “91.9% of executives cite cultural obstacles as the greatest barrier to becoming data driven.”
  • Reference: https://hbr.org/2022/02/why-becoming-a-data-driven-organization-is-so-hard

Where do organizations get help to overcome these challenges? Microsoft delivers what its clients need to help them grow their top line with cloud-based analytics. Microsoft’s cloud-based analytics products and services propel business insights, innovation, and value from enterprise data, with all of the dimensions of analytics applications brought into the game. Specifically, cloud analytics (accessing and inferencing on multiple diverse business datasets across business units) for a wide variety of enterprise applications can sharpen the workforce’s focus on value and growth, including: forward-looking insights through predictive, sentinel, and precursor analytics; novel recommendations; rich customer engagement; analytic product innovation; resilience through prescriptive analytics; surprise discovery in data, asking the right questions, and exploring the most insightful lines of inquiry through cognitive analytics; and more.

Microsoft Azure Cloud extends ease-of-access analytics to all, delivers increased speed to deployment, and provides leading security, compliance, and governance – with price performance for any organization. Whether organizations are seeking scalability in their enterprise data systems, advanced analytics capabilities (including the “big 4” and the “new 3”), real-time analytics (essential value-drivers from streaming data, including IoT, network logs, online customer interactions, supply chain, etc.), or the best in machine learning model-building and deployment services, Microsoft Azure Cloud has you covered. To learn more about it, go to https://azure.microsoft.com/en-us/solutions/cloud-scale-analytics and bring actionable business analytics to higher levels of proficiency and productivity across your organization.

Three Emerging Analytics Products Derived from Value-driven Data Innovation and Insights Discovery in the Enterprise

I recently saw an informal online survey that asked users which types of data (tabular, text, images, or “other”) are being used in their organization’s analytics applications. This was not a scientific or statistically robust survey, so the results are not necessarily reliable, but they are interesting and provocative. The results showed that (among those surveyed) approximately 90% of enterprise analytics applications are being built on tabular data. The ease with which such structured data can be stored, understood, indexed, searched, accessed, and incorporated into business models could explain this high percentage. A similarly high percentage of tabular data usage among data scientists was mentioned here.

If my explanation above is the correct interpretation of the high percentage, and if the statement refers to successfully deployed applications (i.e., analytics products, in contrast to non-deployed training experiments, demos, and internal validations of the applications), then maybe we would not be surprised if a new survey (not yet conducted) were to reveal that a similar percentage of value-producing enterprise data innovation and analytics/ML/AI applications (hereafter, “analytics products”) are based on on-premises (on-prem) data sources. Why? … because the same productivity benefits mentioned above for tabular data sources (fast and easy data access) would also be applicable in these cases (on-prem data sources). And no one could deny that these benefits would be substantial. What could be faster and easier than on-prem enterprise data sources?

Accompanying the massive growth in sensor data (from ubiquitous IoT devices, including location-based and time-based streaming data), there have emerged some special analytics products that are growing in significance, especially in the context of innovation and insights discovery from on-prem enterprise data sources. These enterprise analytics products are related to traditional predictive and prescriptive analytics, but these emergent products may specifically require low-latency (on-prem) data delivery to support enterprise requirements for timely, low-latency analytics product delivery. These three emergent analytics products are:

(a) Sentinel Analytics – focused on monitoring (“keeping an eye on”) multiple enterprise systems and business processes, as part of an observability strategy for time-critical business insights discovery and value creation from enterprise data sources. For example, sensors can monitor and “watch” systems and processes for emergent trends, patterns, anomalies, behaviors, and early warning signs that require interventions. Monitoring of data sources can include online web usage actions, streaming IT system patterns, system-generated log files, customer behaviors, environmental (ESG) factors, energy usage, supply chain, logistics, social and news trends, and social media sentiment. Observability represents the business strategy behind the monitoring activities. The strategy addresses the “what, when, where, why, and how” questions from business leaders concerning the placement of “sensors” that are used to collect the essential data that power the sentinel analytics product, in order to generate timely insights and thereby enable better data-informed “just in time” business decisions.

(b) Precursor Analytics – the use of AI and machine learning to identify, evaluate, and generate critical early-warning alerts in enterprise systems and business processes, using high-variety data sources to minimize false alarms (i.e., using high-dimensional data feature space to disambiguate events that seem to be similar, but are not). Precursor analytics is related to sentinel analytics. The latter is associated primarily with “watching” the data for interesting patterns, while precursor analytics is associated primarily with training the business systems to quickly identify those specific patterns and events that could be associated with high-risk events, thus requiring timely attention, intervention, and remediation. One could say that sentinel analytics is more like unsupervised machine learning, while precursor analytics is more like supervised machine learning. That is not a totally clear separation and distinction, but it might help to clarify their different applications of data science. Data scientists work with business users to define and learn the rules by which precursor analytics models produce high-accuracy early warnings. For example, an exploration of historical data may reveal that an increase in customer satisfaction (or dissatisfaction) with one particular product is correlated with some other satisfaction (or dissatisfaction) metric downstream at a later date. Consequently, based on this learning, deploying a precursor analytics product to detect the initial trigger event early can thus enable a timely response to the situation, which can produce a positive business outcome and prevent an otherwise certain negative outcome.

(c) Cognitive Analytics – focused on “surprise” discovery in diverse data streams across numerous enterprise systems and business processes, using machine learning and data science to emulate and automate the curiosity and cognitive abilities of humans – enabling the discovery of novel, interesting, unexpected, and potentially business-relevant signals across all enterprise data streams. These may not be high risk. They might actually be high-reward discoveries. For example, in one company, an employee noticed that it was the customer’s birthday during their interaction and offered a small gift to the customer at that moment—a gift that was pre-authorized by upper management because they understood that their employees are customer-facing and they anticipated that their employees would need to have the authority to take such customer-pleasing actions “in the moment”. The outcome was very positive indeed, as this customer reported the delightful experience on their social media account, thereby spreading positive sentiment about the business to a wide audience. Instead of relying on employees to catch all surprises in the data streams, the enterprise analytics applications can be trained to automatically watch for, identify, and act on these surprises. In the customer birthday example, the cognitive analytics product can be set up for automated detection and response, which can occur without the employee in the loop at all, such as in a customer’s online shopping experience or in a chat with the customer call center bot.
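The precursor example above (one satisfaction metric leading another metric at a later date) can be explored with a simple lagged correlation. The Python sketch below is only an illustration under stated assumptions: the two weekly series are invented, the downstream series is deliberately constructed to echo the leading series two weeks later, and the function names are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def best_lag(leading, trailing, max_lag):
    """Return (lag, correlation) where leading[t] best matches trailing[t + lag]."""
    results = []
    for lag in range(max_lag + 1):
        a = leading[: len(leading) - lag]   # drop the last `lag` points
        b = trailing[lag:]                  # drop the first `lag` points
        results.append((lag, pearson(a, b)))
    return max(results, key=lambda item: abs(item[1]))

# Hypothetical weekly satisfaction scores for one product (the candidate precursor)
product_a = [5, 5, 6, 9, 4, 5, 7, 3, 5, 6, 8, 5]
# Hypothetical downstream metric, built here to echo product_a two weeks later
downstream = [5, 5] + product_a[:-2]

lag, corr = best_lag(product_a, downstream, max_lag=4)
print(f"strongest relationship at lag {lag} weeks (correlation {corr:.2f})")
```

In real data the relationship would be noisier, and a strong lagged correlation is still only a correlation. It is a candidate precursor to be validated with business users before anyone builds an alert on it.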

These three analytics products are derived from business value-driven data innovation and insights discovery in the enterprise. Investigating and deploying these is a worthy strategic move for any organization that is swimming in a sea (or lake or ocean) of on-prem enterprise data sources.

In closing, let us look at some non-enterprise examples of these three types of analytics:

  • Sentinel – the sentinel on the guard station at a military post is charged with watching for incoming activity. They are assigned this duty just in case something occurs during the night or when everyone else is busy with other operational things. That “something” might be an enemy approaching or a wild bear in the forest. In either case, keeping an eye on the situation is critical for the success of the operation. Another example of a sentinel is a marked increase in the volatility of stock market prices, indicating that there may be a lot of FUD (fear, uncertainty, and doubt) in the market that could lead to wild swings or downturns. In fact, anytime that any streaming data monitoring metric shows higher than usual volatility, this may be an indicator that the monitored thing requires some attention, an investigation, and possibly an intervention.
  • Precursor – prior to large earthquakes, it has been reported that increased levels of radon are detected in soil, in groundwater, and even in the air in people’s home basements. This precursor is presumed to be caused by the radon being released from cavities within the Earth’s crust as the crust is being strained prior to the sudden slippage (the earthquake). Earthquakes themselves can be precursors to serious events – specifically, a large earthquake detected at the bottom of the ocean can produce a massive tidal wave that can travel across the ocean and have drastic consequences on distant shores. In some cases, the precursor can occur sufficiently in advance of the tidal wave’s predicted arrival at inhabited shores, thereby enabling early warnings to be broadcast. In both of these cases, the precursor (radon release or ocean-based earthquake) is not the biggest problem, though it may be seen as a sentinel of an on-going event, but the precursor is an early warning sign of a potentially bigger catastrophe that’s coming (a major land-based earthquake or a tidal wave hitting major population centers along coastlines, respectively).
  • Cognitive – a cognitive person walking into an intense group meeting (perhaps a family or board meeting) can probably tell the mood of the room fairly quickly. The signals are there, though mostly contextual, thus probably missed by a cognitively impaired person. A cognitive person is curious about odd things that they see and hear: things or circumstances or behaviors that seem out of context, unusual, and surprising. The thing itself (or the data about the thing) may not be surprising (though it could be), but the context (the “metadata”, which is “other data about the primary data”) provides a signal that something needs attention here. Perhaps the simplest expression of being cognitive in this data-drenched world comes from a quote attributed to famous science writer Isaac Asimov: “The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ (I found it!) but ‘That’s funny…’.”
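The volatility idea in the sentinel example can be sketched in a few lines of Python. This is a toy illustration, not a production monitor: the metric values are invented, the first window is simply assumed to represent normal operation, and the window size and ratio are arbitrary parameters that would need tuning on real data.

```python
from statistics import pstdev

def volatility_alerts(values, window, ratio):
    """Flag points where recent volatility exceeds `ratio` times the baseline.

    The baseline is the standard deviation of the first `window` values,
    which are assumed here to represent normal behavior.
    """
    baseline = pstdev(values[:window])
    alerts = []
    for end in range(window + 1, len(values) + 1):
        recent = values[end - window:end]   # trailing window ending at index end - 1
        if pstdev(recent) > ratio * baseline:
            alerts.append(end - 1)          # index of the newest point in the window
    return alerts

# Hypothetical streaming metric: calm at first, then wild swings
metric = [100, 101, 99, 100, 101, 99, 100, 101, 110, 90, 112, 88]

print(volatility_alerts(metric, window=4, ratio=3))
```

On this made-up stream, the sentinel stays quiet through the calm stretch and starts flagging as soon as the swings begin. Whether each flag deserves an alert, an investigation, or nothing at all is the triage decision.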

The cognitive enterprise versus the cognitively impaired enterprise – which of these would your organization prefer to be? Get moving now with sentinel, precursor, and cognitive analytics through data innovation and insights discovery with your on-prem enterprise data sources.

Read more about analytics innovation from on-prem enterprise data sources in this 3-part blog series:

  1. Solving the Data Daze – Analytics at the Speed of Business Questions
  2. The Data Space-Time Continuum for Analytics Innovation and Business Growth
  3. Delivering Low-Latency Analytics Products for Business Success

Editorial Review of “Building Industrial Digital Twins”

I was asked by the publisher to provide an editorial review of the book “Building Industrial Digital Twins: Design, develop, and deploy digital twin solutions for real-world industries using Azure Digital Twins”, by Shyam Varan Nath and Pieter van Schalkwyk. For this, I received a complimentary copy of the book and no other compensation.

Let us begin…

This book is a very timely contribution to the world of industrial digital transformation. The digital twin is more than a data collector. It is an insight engine, providing not only data for descriptive and diagnostic analytics applications, but also providing essential data for predictive and prescriptive analytics applications. This is all fueled and facilitated by data flows across processes, products, and people’s activities, used in synergy with computational models and simulations of the system being digitally twinned. In order to help an organization get started with a DT (digital twin), this book outlines the process of building the MVT (Minimum Viable Twin). All phases of the MVT process are discussed: strategy, designs, pilot, implementation, test, validation, operations, and monitoring. 

This book knows and forcefully proves what is the enabler and value producer of digital anything (especially and most emphatically the DT) — it is all about the data and the simulation — that’s business modeling at its finest, incorporating the best of technology (physical assets, sensors, and cloud), techniques (analytics, algorithms, and modeling), and talent (culture, people, and strategic plans).

There were many themes and topics (both broad and specific) that fascinated me and kept me engaged in discovering serendipitous knowledge nuggets throughout this book. Here are a few: 

1) Azure DT, whose cloud-based PaaS (Platform-as-a-Service) provides a viable, scalable, and accessible launchpad for DTaaS in any organization.

2) Streaming sensor data from the IoT (Internet of Things) and IIoT (Industrial IoT) become the source for an IoC (Internet of Context), ultimately delivering Insights-aaS, Context-aaS, and Forecasting-aaS.

3) The consistent emphasis on and elaboration of key DT value propositions, requirements, and KPI tracking.

4) The DT Canvas (chapter 4)!

5) Helpful discussions of phased DT deployments, prototypes, pilots, feedback, and validation.

6) Specific Industry 4.0 examples, with constant reminders that it’s all about the data plus analytics!

7) Forward-looking DTs in the industrial enterprise.

Beyond being a technical how-to manual (though it is definitely that), this book delivers so much more! It is truly a business digital transformation manual.

My top learning and pondering moments at Splunk .conf22

I recently attended the Splunk .conf22 conference. While the event was live in-person in Las Vegas, I attended virtually from my home office. Consequently, I missed the incredible in-person experience of the brilliant speakers on the main stage, the technodazzle of hundreds of exhibitors’ offerings in the exhibit arena, and the smooth hip hop sounds from the special guest entertainer – guess who?

What I missed in-person was more than compensated for by the incredible online presentations by Splunk leaders, developers, and customers. If you have ever attended a major expo at one of the major Vegas hotels, you know that there is a lot of walking between different sessions — literally, miles of walking per day. That’s good for you, but it often means that you don’t attend all of the sessions that you would like because of the requisite rushing from venue to venue. None of that was necessary on the Splunk .conf22 virtual conference platform. I was able to see a lot, learn a lot, be impressed a lot, and ponder a lot about all of the wonderful features, functionalities, and future plans for the Splunk platform.

One of the first major attractions for me to attend this event is found in the primary descriptor of the Splunk Platform — it is appropriately called the Splunk Observability Cloud, which includes an impressive suite of Observability and Monitoring products and services. I have written and spoken frequently and passionately about Observability in the past couple of years. For example, I wrote this in 2021:

“Observability emerged as one of the hottest and (for me) most exciting developments of the year. Do not confuse observability with monitoring (specifically, with IT monitoring). The key difference is this: monitoring is what you do, and observability is why you do it. Observability is a business strategy: what you monitor, why you monitor it, what you intend to learn from it, how it will be used, and how it will contribute to business objectives and mission success. But the power, value, and imperative of observability does not stop there. Observability meets AI – it is part of the complete AIOps package: ‘keeping an eye on the AI.’ Observability delivers actionable insights, context-enriched data sets, early warning alert generation, root cause visibility, active performance monitoring, predictive and prescriptive incident management, real-time operational deviation detection (6-Sigma never had it so good!), tight coupling of cyber-physical systems, digital twinning of almost anything in the enterprise, and more. And the goodness doesn’t stop there.”

Continue reading my thoughts on Observability at http://rocketdatascience.org/?p=1589

The dominant references everywhere to Observability were just the start of awesome brain food offered at Splunk’s .conf22 event. Here is a list of my top moments, learnings, and musings from this year’s Splunk .conf:

  1. Observability for Unified Security with AI (Artificial Intelligence) and Machine Learning on the Splunk platform empowers enterprises to operationalize data for use-case-specific functionality across shared datasets. (Reference)
  2. The latest updates to the Splunk platform address the complexities of multi-cloud and hybrid environments, enabling cybersecurity and network big data functions (e.g., log analytics and anomaly detection) across distributed data sources and diverse enterprise IT infrastructure resources. (Reference)
  3. Splunk Enterprise 9.0 is here, now! Explore and test-drive it (with a free trial) here.
  4. The new Splunk Enterprise 9.0 release enables DevSecOps users to gain more insights from Observability data with Federated Search, with the ability to correlate ops with security alerts, and with Edge Management, all in one platform. (Reference)
  5. Security information and event management (SIEM) on the Splunk platform is enhanced with end-to-end visibility and platform extensibility, with machine learning and automation (AIOps), with risk-based alerting, and with Federated Search (i.e., Observability on-demand). (Reference)
  6. Customer success story: As a customer-obsessed bank with ultra-rapid growth, Nubank turned to Splunk to optimize data flows, analytics applications, customer support functions, and insights-obsessed IT monitoring. (Reference)
  7. The key characteristics of the Splunk Observability Cloud are Resilience, Security, Scalability, and EXTENSIBILITY. The latter specifically refers to the ease with which developers can extend Splunk’s capabilities to other apps, applying their AIOps and DevSecOps best practices and principles! Developers can start here.
  8. The Splunk Observability Cloud has many functions for data-intensive IT, Security, and Network operations, including Anomaly Detection Service, Federated Search, Synthetic Monitoring, Incident Intelligence, and much more. Synthetic monitoring is essentially digital twinning of your network and IT environment, providing insights through simulated risks, attacks, and anomalies via predictive and prescriptive modeling. (Reference)
  9. Splunk Observability Cloud’s Federated Search capability activates search and analytics regardless of where your data lives — on-site, in the cloud, or from a third party. (Reference)
  10. The new release of the Splunk Data Manager provides a simple, modern, automated experience of data ingest for Splunk Cloud admins, which reduces the time it takes to configure data collection (from hours/days to minutes). (Reference)
  11. Splunk works on data, data, data, but the focus is always on customer, customer, customer — because delivering best outcomes for customers is job #1. Explore Splunk’s amazing Partner ecosystem (Partnerverse) and the impressive catalog of partners’ solutions here.
  12. Splunk .conf22 Invites Organizations to Unlock Innovation With Data.

In summary, here is my list of key words and topics that illustrate the diverse capabilities and value-packed features of the Splunk Observability Cloud Platform that I learned about at the .conf22 event:

– Anomaly Detection Assistant
– Risk-based Alerting (powered by AI and Machine Learning scoring algorithms)
– Federated Search (Observability on-demand)
– End-to-End Visibility
– Platform Extensibility
– Massive(!) Scalability of the Splunk Observability Cloud (to billions of transactions per day)
– Insights-obsessed Monitoring (“We don’t need more information. We need more insights.”)
– APIs in Action (to Turn Data into Doing™)
– Splunk Incident Intelligence
– Synthetic Monitoring (Digital Twin of Network/IT infrastructure)
– Splunk Data Manager
– The Splunk Partner Universe (Partnerverse)

My closing thought — Cybersecurity is basically Data Analytics: detection, prediction, prescription, and optimizing for unpredictability. This is what Splunk lives for!

Follow me on LinkedIn here and on Twitter at @KirkDBorne.

Disclaimer: I was compensated as an independent freelance media influencer for my participation at the conference and for this article. The opinions expressed here are entirely my own and do not represent those of Splunk or of any Splunk partners. Any misrepresentations of the products and services mentioned in my statements are entirely my own responsibility. Nothing here should be construed as an offer to sell or as financial advice of any kind. My comments are entirely of a technical nature, focused on the technical capabilities of the items mentioned in the article.

Data Insights for Everyone — The Semantic Layer to the Rescue

What is a semantic layer? That’s a good question, but let’s first explain semantics. The way that I explained it to my data science students years ago was like this. In the early days of web search engines, those engines were primarily keyword search engines. If you knew the right keywords to search and if the content providers also used the same keywords on their website, then you could type the words into your favorite search engine and find the content you needed. So, I asked my students what results they would expect from such a search engine if I typed the following words into the search box: “How many cows are there in Texas?” My students were smart. They realized that the search results would probably not provide an answer to my question, but the results would simply list websites that included my words on the page or in the metadata tags: “Texas”, “Cows”, “How”, etc. Then, I explained to my students that a semantic-enabled search engine (with a semantic meta-layer, including ontologies and similar semantic tools) would be able to interpret my question’s meaning and then map that meaning to websites that can answer the question.

This was a good opening for my students to the wonderful world of semantics. I brought them deeper into the world by pointing out how much more effective and efficient the data professionals’ life would be if our data repositories had a similar semantic meta-layer. We would be able to go far beyond searching for correctly spelled column headings in databases or specific keywords in data documentation, to find the data we needed (assuming we even knew the correct labels, metatags, and keywords used by the dataset creators). We could search for data with common business terminology, regardless of the specific choice or spelling of the data descriptors in the dataset. Even more than that, we could easily start discovering and integrating, on-the-fly, data from totally different datasets that used different descriptors. For example, if I am searching for customer sales numbers, different datasets may label that “sales”, or “revenue”, or “customer_sales”, or “Cust_sales”, or any number of other such unique identifiers. What a nightmare that would be! But what a dream the semantic layer becomes!
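To make the idea concrete, here is a minimal sketch of how such a lookup might work. It assumes a hand-built synonym table; the table, the function names, and the three sample datasets are all hypothetical and exist only for illustration. A real semantic layer would do far more than this.

```python
# Hypothetical synonym table: maps one business term to the column labels
# that different datasets might use for it. All names are illustrative.
BUSINESS_TERMS = {
    "customer sales": {"sales", "revenue", "customer_sales", "cust_sales"},
}


def normalize(label: str) -> str:
    """Lower-case a label and trim whitespace so matching ignores case."""
    return label.strip().lower()


def find_columns(term: str, datasets: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per dataset, the columns whose labels map to the business term."""
    synonyms = BUSINESS_TERMS.get(normalize(term), set())
    matches = {}
    for name, columns in datasets.items():
        hits = [c for c in columns if normalize(c) in synonyms]
        if hits:
            matches[name] = hits
    return matches


# Made-up datasets that label the same quantity in different ways.
datasets = {
    "crm_export": ["customer_id", "Cust_sales"],
    "finance_ledger": ["account", "revenue"],
    "web_logs": ["session_id", "page"],
}

result = find_columns("Customer Sales", datasets)
print(result)
```

The search uses the business term only; the differently spelled column labels are resolved by the mapping, so the user never needs to know them.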

When I was teaching those students so many years ago, the semantic layer itself was just a dream. Now it is a reality. We can now achieve the benefits, efficiencies, and data superhero powers that we previously could only imagine. But wait! There’s more.

Perhaps the greatest achievement of the semantic layer is to provide different data professionals with easy access to the data needed for their specific roles and tasks. The semantic layer is the representation of data that helps different business end-users discover and access the right data efficiently, effectively, and effortlessly using common business terms. The data scientists need to find the right data as inputs for their models — they also need a place to write-back the outputs of their models to the data repository for other users to access. The BI (business intelligence) analysts need to find the right data for their visualization packages, business questions, and decision support tools — they also need the outputs from the data scientists’ models, such as forecasts, alerts, classifications, and more. The semantic layer achieves this by mapping heterogeneously labeled data into familiar business terms, providing a unified, consolidated view of data across the enterprise.

The semantic layer delivers data insights discovery and usability across the whole enterprise, with each business user empowered to use the terminology and tools that are specific to their role. How data are stored, labeled, and meta-tagged in the data cloud is no longer a bottleneck to discovery and access. The decision-makers and data science modelers can fluidly share inputs and outputs with one another, to inform their role-specific tasks and improve their effectiveness. The semantic layer takes the user-specific results out of being a “one-off” solution on that user’s laptop to becoming an enterprise analytics accelerant, enabling business answer discovery at the speed of business questions.

Insights discovery for everyone is achieved. The semantic layer becomes the arbiter (multi-lingual data translator) for insights discovery between and among all business users of data, within the tools that they are already using. The data science team may be focused on feature importance metrics, feature engineering, predictive modeling, model explainability, and model monitoring. The BI team may be focused on KPIs, forecasts, trends, and decision-support insights. The data science team needs to know and to use that data which the BI team considers to be most important. The BI team needs to know and to use which trends, patterns, segments, and anomalies are being found in those data by the data science team. Sharing and integrating such important data streams has never been such a dream.

The semantic layer bridges the gaps between the data cloud, the decision-makers, and the data science modelers. The key results from the data science modelers can be written back to the semantic layer, to be sent directly to consumers of those results in the executive suite and on the BI team. Data scientists can focus on their tools; the BI users and executives can focus on their tools; and the data engineers can focus on their tools. The enterprise data science, analytics, and BI functions have never been so enterprisey. (Is “enterprisey” a word? I don’t know, but I’m sure you get my semantic meaning.)

That’s empowering. That’s data democratization. That’s insights democratization. That’s data fluency/literacy-building across the enterprise. That’s enterprise-wide agile curiosity, question-asking, hypothesizing, testing/experimenting, and continuous learning. That’s data insights for everyone.

Are you ready to learn more about how you can bring these advantages to your organization? Be sure to watch the AtScale webinar “How to Bridge Data Science and Business Intelligence”, where I join a panel of speakers in a multi-industry discussion on how a semantic layer can help organizations make smarter data-driven decisions at scale. I will be speaking about “Model Monitoring in the Enterprise — Filling the Gaps”, specifically focused on “Filling the Communication Gaps Between BI and Data Science Teams With a Semantic Data Layer.”

Register to attend and view the webinar at https://bit.ly/3ySVIiu.

Data Science Training Opportunities

A few years ago, I generated a list of places to receive data science training. That list has become a bit stale. So, I have updated the list, adding some new opportunities, keeping many of the previous ones, and removing the obsolete ones.

Also, here is a thorough, informative, and interesting article that outlines the critical skills needed in order to be a good data scientist: https://www.toptal.com/data-science#hiring-guide

Here are 30 training opportunities that I encourage you to explore:

  1. The Booz Allen Field Guide to Data Science
  2. NYC Data Science Academy
  3. NVIDIA Deep Learning Institute
  4. Metis Data Science Training
  5. Leada’s online analytics labs
  6. Data Science Training by General Assembly
  7. Learn Data Science Online by DataCamp
  8. (600+) Colleges and Universities with Data Science Degrees
  9. Data Science Master’s Degree Programs
  10. Data Analytics, Machine Learning, & Statistics Courses at edX
  11. Data Science Certifications (by AnalyticsVidhya)
  12. Learn Everything About Analytics (by AnalyticsVidhya)
  13. Big Bang Data Science Solutions
  14. CommonLounge
  15. IntelliPaat Online Training
  16. DataQuest
  17. NCSU Institute for Advanced Analytics
  18. District Data Labs
  19. Data School
  20. Galvanize
  21. Coursera
  22. Udacity Nanodegree Program to Become a Data Scientist
  23. Udemy – Data & Analytics
  24. Insight Data Science Fellows Program
  25. The Open Source Data Science Masters
  26. Jigsaw Academy Post Graduate Program in Data Science & Machine Learning
  27. O’Reilly Media Learning Paths
  28. Data Engineering and Data Science Training by Go Data Driven
  29. 18 Resources to Learn Data Science Online (by Simplilearn)
  30. Top Online Data Science Courses to Learn Data Science

Follow Kirk Borne on Twitter @KirkDBorne

Field Guide to Data Science
Learn the what, why, and how of Data Science and Machine Learning here.

Analytics By Design, For The Analytics Win

We hear a lot of hype that says organizations should be “Data-first”, or “AI-first”, or “Data-driven”, or “Technology-driven”. A better prescription for business success is for our organization to be analytics-driven and thus analytics-first, while being data-informed and technology-empowered. Analytics are the products, the outcomes, and the ROI of our Big Data, Data Science, AI, and Machine Learning investments!

AI strategies and data strategies should therefore focus on outcomes first. Such a focus explicitly induces the corporate messaging, strategy, and culture to be better aligned with what matters the most: business outcomes!

The analytics-first strategy can be referred to as Analytics By Design, which is derived from similar principles in education: Understanding By Design. Analytics are the outcomes of data activities (data science, machine learning, AI) within the organization. So we should keep our eye on the prize — maintaining our focus on the business outcomes (the analytics), which are data-fueled, technology-enabled, and metrics-verified. That’s the essence of Analytics by Design.

The longer, complete version of this article, “How Analytics by Design Tackles The Yin and Yang of Metrics and Data”, is available at the Western Digital DataMakesPossible.com blog site. In that article, you can read about:

  • The two complementary roles of data — “the yin and the yang” — in which data are collected at the front end (from business activities, customer interactions, marketing reports, and more), while data are also collected at the back end as metrics to verify performance and compliance with stated goals and objectives.
  • The four principles of Analytics By Design.
  • The five take-away messages for organizations that have lots of data and that want to win with Analytics By Design.

For data scientists, the message is “Come for the data. Stay for the science!”

Read the full story here: “How Analytics by Design Tackles The Yin and Yang of Metrics and Data”

Bias-Busting with Diversity in Data

Diversity in data is one of the three defining characteristics of big data (high data variety), along with high data volume and high data velocity. We discussed the power and value of high-variety data in a previous article: “The Five Important D’s of Big Data Variety”. We won’t repeat those lessons here, but we focus specifically on the bias-busting power of high-variety data, which was actually the last of the five D’s mentioned in the earlier article: Decreased model bias.

Here, we broaden our meaning of “bias” to go beyond model bias. Model bias has the technical statistical meaning of “underfitting”: there is more information and structure in the data than our model has captured. In the current context, we apply a broader definition of bias: lacking a neutral viewpoint, or having a viewpoint that is partial. We will call this natural bias, since the examples can be considered as “naturally occurring” without obvious intent. This article does not elaborate on personal bias (which might be intentional), though the cause for that kind of prejudice is essentially the same: not considering and taking into account the full knowledge and understanding of the person or entity that is the subject of the bias.
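As a small illustration of the statistical meaning of model bias (underfitting), the sketch below fits a deliberately oversimplified model, a single constant, to a handful of made-up points that follow a rising trend. The data values and variable names are assumptions chosen only for this example.

```python
# Made-up data with a clear upward trend (roughly y = 2x).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

# An underfit model: predict the same constant (the mean of y) for every x.
constant_prediction = sum(ys) / len(ys)

# Residuals = what the model failed to capture at each point.
residuals = [y - constant_prediction for y in ys]

# If the model had captured the structure, the residuals would look like
# noise. Here they rise steadily with x, so structure is left over.
residuals_rise_with_x = residuals == sorted(residuals)
print(residuals, residuals_rise_with_x)
```

The residuals run from negative to positive in step with x, which is the signature of information left in the data that the model did not capture.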

We wrote a longer complete version of this article here: “Busting Bias with More Data Variety” at the Western Digital DataMakesPossible.com blog site.

In that full version of this article, we go on to describe several examples of natural bias and then to present a recommended bias-busting remedy for those of us working in the realm of data science. We refer to that remedy as the CCDI data & analytics strategy: Collect, Curate, Differentiate, and Innovate.

Here is one of the four examples of natural bias that you will find in the longer, complete version of the article:

  • An example of natural bias comes from a famous cartoon. The cartoon shows three or more blind men (or blindfolded men) feeling an elephant. They each feel a different aspect of the elephant: the tail, a tusk, an ear, the body, a leg — and consequently they each offer a different interpretation of what they believe this thing is (which they cannot see). They say it might be a rope (the tail), or a spear (the tusk), or a large fan (the ear), or a wall (the body), or a tree trunk (the leg). Only after the blindfolds are removed (or an explanation is given) do they finally “see” the full truth of this large complex reality. It has many different features, facets, and characteristics. Focusing on only one of those features and insisting that this partial view describes the whole thing would be foolish. We have similar complex systems in our organizations, whether it is the human body (in healthcare), or our population of customers (in marketing), or the Earth (in climate science), or different components in a complex system (like a manufacturing facility), or our students (in a classroom), or whatever. Unless we break down the silos and start sharing our data (insights) about all the dimensions, viewpoints, and perspectives of our complex system, we will consequently be drawn into biased conclusions and actions, and thus miss the key insights that enable us to understand the wonderful complexity and diversity of the thing in its entirety. Integrating the many data sources enables us to arrive at the “single correct view” of the thing: the 360 view!
Collecting high-variety data from diverse sources, connecting the dots, and building the 360 view of our domain is not only the data silo-busting thing to do. It is also the bias-busting thing to do. High-variety data makes that possible, and there is no shortage of biases for high-variety data to bust, including cognitive bias, confirmation bias, salience bias, and sampling bias, just to name a few! …

Read the full story here… “Busting Bias with More Data Variety”

Variety is the Secret Sauce for Big Discoveries in Big Data

When I was out for a walk recently, I heard a loud low-flying aircraft passing overhead. This was not unusual since we live in the flight path of planes landing at a major international airport about 10 miles from our home. In this case, I thought to myself that the sound seemed more directly overhead and lower than normal as well as being suggestive of a larger than average jet aircraft.

I realized that in my one simple thought, I had made three different inferences from a single stream of data. The data stream was the audible sound of the aircraft. The three inferences were about the altitude (lower than normal), the size (larger than average), and the flight path (more overhead). When I looked up, my tri-inference hypothesis was confirmed. The plane was a very large, low-flying jet for a major overnight shipping company. The slightly unusual flight path may have been associated with the fact that these planes are probably instructed to land on a different runway at the airport than the usual commercial passenger flights – consequently, the altitude and location were slightly different from those of the slightly smaller commercial passenger aircraft that pass overhead every day.

This situation caused me to reflect on how often we jump to conclusions, infer a hypothesis, and (maybe without as much proof as in this case) assume that our conclusion is true.

For the modern digital organization, the proof of any inference (that drives decisions) should be in the data! Rich and diverse data collections enable more accurate and trustworthy conclusions.

I frequently refer to the era of big data as “the end of demographics”. By that, I mean that we now have many more features, attributes, data sources, and insights into each entity in our domain: people, processes, and products. These multiple data sources enable a “360 view” of the entity, thus empowering a more personalized (even hyper-personalized) understanding of and response to the needs of that unique entity. In “big data language”, we are talking about one of the 3 V’s of big data: big data Variety!

High variety is one of the foundational key features of big data — we now measure many more features, characteristics, and dimensions of insight into nearly everything due to the plethora of data sources, sensors, and signals that we measure, monitor, and mine. Consequently, we no longer need to rely on a limited number of features and attributes when making decisions, taking actions, and generating inferences. We can make better, tailored, more personalized decisions and actions. Every entity is unique! That marks the end of demographics.

Here is another example: suppose that a person goes to their doctor to report problems with painful headaches. That is a single symptom (headache pain): a single data source, a single signal, a single sensor. However, one could imagine a large number of possible inferences from that one single signal. The headaches could be caused by insufficient sleep (e.g., from sleep apnea), high blood pressure, pregnancy, or a brain tumor. Obviously, each one of these diagnoses carries a seriously different course of action and treatment.

In “data science language”, what we are describing are different segments (clusters) in the hyperspace of symptoms and causes in which the many causes (clusters) are projected on top of one another (overlap one another) in the symptom space. The way that a data scientist resolves that degeneracy (another data science word) is to introduce more parameters (higher variety data) in order to “look at” those overlapping clusters from different angles and perspectives, thus resolving the different diagnosis clusters. High variety data enables the discovery of multiple clusters, and eventually identifies the correct cluster (correct diagnosis, in this case).
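The following sketch shows this degeneracy with a tiny, entirely made-up set of patient records. Two diagnosis groups sit on top of one another when we look only at headache severity, and they pull apart once a second measurement (systolic blood pressure) is added. The numbers, feature names, and the helper function are hypothetical and only illustrate the principle.

```python
from statistics import mean

# Illustrative, made-up records: (headache_severity, systolic_bp, diagnosis).
patients = [
    (7, 118, "insufficient sleep"),
    (8, 122, "insufficient sleep"),
    (7, 120, "insufficient sleep"),
    (8, 165, "high blood pressure"),
    (7, 170, "high blood pressure"),
    (8, 168, "high blood pressure"),
]


def centroid_gap(feature_index: int) -> float:
    """Gap between the two diagnosis groups' mean values on one feature."""
    groups = {}
    for row in patients:
        groups.setdefault(row[2], []).append(row[feature_index])
    first, second = (mean(values) for values in groups.values())
    return abs(first - second)


# On the symptom alone, the two clusters are nearly indistinguishable.
gap_symptom = centroid_gap(0)

# Adding a second data source separates them clearly.
gap_blood_pressure = centroid_gap(1)

print(f"gap on headache severity: {gap_symptom:.2f}")
print(f"gap on systolic blood pressure: {gap_blood_pressure:.2f}")
```

Viewed along the symptom axis only, the groups differ by a fraction of one severity point, while along the added axis they differ by tens of units. That is the sense in which higher-variety data lets us look at overlapping clusters from a different angle.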

Higher variety data means that we are adding data from other sensors, other signals, other sources, and of different types. Going back to our low-flying airplane example, this has the following application: I not only heard the aircraft (sound = audio data), but I also looked at it (sight = visual data) and I observed its flight path (dynamic change over time = time series data). The proof of my inference about the airplane was in the data! Additional data sources provided the variety of data signals that were needed in order to derive a correct conclusion.

Similarly, when you go to the doctor with that headache, the doctor will start asking about other symptoms (e.g., lack of appetite; or other pains) and may order other medical tests (blood pressure checks, or other lab results). Those additional data sources and sensors provide the variety of data signals that are needed in order to derive the correct diagnosis.

These examples (low-flying aircraft, and headache pain) are representative analogies of a large number of different use cases in every organization, every business, and every process. The more data you have, the better you are able to detect and discover interesting and important phenomena and events. However, the more variety of data you have, the better you are able to correctly diagnose, interpret, understand, gain insights from, and take appropriate action in response to those phenomena and events.

High-variety data is the fuel that powers these insights, because variety is definitely the secret sauce for bigger and better discovery from big data collections.

Follow Kirk on Twitter at @KirkDBorne