
Just-in-Time Supply Chain Management with Data Analytics

A common phrase in SCM (Supply Chain Management) is Just-In-Time (JIT) inventory. JIT refers to a management strategy in which raw materials, products, or services are delivered to the right place, at the right time, as demand requires. This has always been an excellent business goal, but the power to excel at JIT inventory management is now improving dramatically with the increased use of data analytics across the supply chain.

In the article “Operational Analytics and Droning About Big Data”, we discussed two examples of JIT: (1) a just-in-time supply replenishment system for human bases on the Moon, and (2) the proposal by Amazon to use drones to deliver products to your front door “just in time”! The Internet of Things will almost certainly generate similar use cases and benefits.

Descriptive analytics (hindsight) tells you what has already happened in your supply chain. If there was a deficiency or problem somewhere, then you can react to that event. But, that is “old school” supply chain management. Modern analytics is predictive (foresight), allowing you to predict where the need will occur (in advance) so that you can proactively deliver products and services at the point of need, just in time.
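
To make the hindsight-vs-foresight distinction concrete, here is a minimal Python sketch under assumed inputs (the weekly demand history, smoothing factor, and stock level are invented for illustration): a descriptive summary of past demand sits alongside a simple exponential-smoothing forecast that triggers replenishment before the shortfall occurs.

```python
# A minimal sketch of descriptive vs. predictive analytics for JIT replenishment.
# The demand history, smoothing factor, and stock level are illustrative assumptions.

weekly_demand = [120, 135, 128, 150, 160, 155, 170]  # hypothetical units sold per week
alpha = 0.5                                          # exponential-smoothing factor

# Descriptive (hindsight): summarize what already happened.
print("Average demand so far:", sum(weekly_demand) / len(weekly_demand))

# Predictive (foresight): project next week's demand with simple exponential smoothing.
forecast = weekly_demand[0]
for observed in weekly_demand[1:]:
    forecast = alpha * observed + (1 - alpha) * forecast
print("Forecast for next week:", round(forecast, 1))

# Act on the forecast: schedule a delivery only if projected demand exceeds stock on hand.
stock_on_hand = 140  # hypothetical
if forecast > stock_on_hand:
    print("Replenish", round(forecast - stock_on_hand), "units before next week")
```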

The next advance in analytics is prescriptive (insight), which uses optimization techniques (from operations research) in combination with insights and knowledge of your business (systems, processes, and resources) to optimize your delivery systems for the best possible outcome (greater sales, fewer losses, reduced inventory, etc.). Just-in-time supply chain management then becomes more than just achievable: it becomes an enabler of increased efficiency and productivity.
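
As a rough illustration of the prescriptive step, the sketch below uses a tiny linear program (via scipy.optimize.linprog) to decide how to split a just-in-time shipment across two warehouses at minimum cost; the costs, capacities, and demand are invented for the example, not taken from the article.

```python
# A minimal prescriptive-analytics sketch using linear programming (operations research).
# Warehouse costs, capacities, and store demand are illustrative assumptions.
from scipy.optimize import linprog

cost = [4.0, 6.0]        # shipping cost per unit from warehouse A and B (assumed)
demand = 150             # units the store needs, just in time (assumed)
capacity = [100, 120]    # units available at each warehouse (assumed)

# Minimize total cost subject to: shipments meet demand and stay within capacity.
result = linprog(
    c=cost,
    A_ub=[[-1.0, -1.0]],  # -(xA + xB) <= -demand, i.e. xA + xB >= demand
    b_ub=[-demand],
    bounds=[(0, capacity[0]), (0, capacity[1])],
    method="highs",
)
print("Ship from A:", result.x[0], "units; from B:", result.x[1], "units")
print("Minimum total cost:", result.fun)
```

With these assumed numbers, the solver fills the cheaper warehouse to capacity (100 units) and sources the remaining 50 units from the second warehouse.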

The folks at Soft10, Inc. have enumerated many more use cases in the manufacturing and retail industries (and elsewhere) where just-in-time analytics is important, along with what you can do about it. Check out their fast Automatic Modeling predictive analytics products at http://soft10ware.com/.

(Read more about these ideas at: https://www.linkedin.com/pulse/supply-chain-data-analytics-jit-legit-kirk-borne)

Follow Kirk Borne on Twitter @KirkDBorne

 

The Goldilocks Principle in Predictive Modeling and Data Science

In the field of statistics, there has been a lot written about statistical fallacies, logical fallacies, and fallacious reasoning. The following big list of fallacies is one that I like to use in my own undergraduate data science courses, particularly in my Data Ethics class where I teach my students about “lying with statistics”:

http://en.wikipedia.org/wiki/List_of_fallacies

Many of these fallacies are relevant to data science modeling, including this one: Circular Reasoning, where the reasoner “begins with what he or she is trying to end up with; sometimes called assuming the conclusion.”

A broken clock is truly an example of circular reasoning (as the dial is circular, and the clock represents a particular measurement in a repeating circular perspective): “Even a broken clock is right twice a day.”

(source: http://tvtropes.org/pmwiki/pmwiki.php/Main/StoppedClock)

In the following article, I use the broken clock analogy for circular reasoning in describing the importance of verification and validation in predictive analytics models: “Are your predictive models like broken clocks? Here’s how to fix them.”  The article also discusses the importance of training vs. test data sets, the bias-variance tradeoff in data science modeling, underfitting vs. overfitting, and the Goldilocks Principle applied to data science.
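
Here is a minimal sketch of those ideas, using synthetic data (all values are assumed for illustration): polynomial models of increasing complexity are fit on a training set and scored on a held-out test set, so underfitting, overfitting, and the “just right” middle ground show up directly in the test error.

```python
# A minimal Goldilocks-Principle sketch: a too-simple model underfits, a too-complex
# model overfits, and a middle-ground model generalizes best.
# The synthetic data and polynomial degrees are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, size=200)

# Hold out a test set so that validation is not circular (the "broken clock" trap).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):  # too simple, "just right", too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```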

(continue reading here) 

Follow Kirk Borne on Twitter @KirkDBorne

Numbers are Powerful, Especially in Combination

The phrase “Big Data” refers to a set of serious analytical challenges that arise when the data increase in quantity, real-time speed, and complexity.  The three V’s of big data (Volume, Velocity, and Variety) are now well known and well worn. Their familiarity and frequent association with “big data hype” may numb us to the important data challenges that they are meant to represent. These three characterizations have their counterparts in tools and technologies.  For example, Hadoop (Apache’s open source implementation of the MapReduce programming model) is the technology du jour for management and analysis of high-volume data.  The Hadoop Distributed File System (HDFS) is the file system for big data storage and access in Hadoop clusters.  Apache Spark is a computing framework (commonly deployed on top of HDFS) for fast processing of high-velocity data.
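
For readers who have not used these tools, here is a minimal PySpark sketch of the kind of high-volume, distributed aggregation they are designed for; the HDFS path and the event_type column are hypothetical placeholders.

```python
# A minimal PySpark sketch; the HDFS path and the "event_type" column are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("high-volume-demo").getOrCreate()

# Read a large dataset stored in HDFS and compute a simple distributed aggregate.
events = spark.read.csv("hdfs:///data/events.csv", header=True, inferSchema=True)
events.groupBy("event_type").count().show()

spark.stop()
```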

But, what about high-variety data?  The storage and management challenges of such data are already addressed (see above), but the real challenge is in performing effective and efficient statistical modeling, data mining, and discovery across high-dimensional (complex) data sets.  Software tools like Soft10 Inc.’s “automatic statistician” Dr. Mo are designed to address that specific challenge.

When considering complex (high-variety) data, it is important to note that even relatively small-volume data sets can pose huge challenges to modeling, mining, and analysis algorithms and tools. For example, consider a gigabyte data table with a billion entries. If those entries correspond to 500 million rows and 2 columns, then some relatively simple “textbook” techniques can be applied: e.g., correlation analysis, regression analysis, Naïve Bayes, etc. However, if those entries correspond to one million rows and 1000 columns, then the complexity of the data analysis explodes exponentially.
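
A quick way to see why the wide table is so much harder than the long one is to count the column combinations an analysis would need to examine; the short sketch below does that arithmetic for a few assumed column counts.

```python
# Counting the column pairs and triples an analysis would have to examine:
# it is the number of columns (variety), not the number of rows (volume),
# that drives the explosion.  The column counts below are illustrative.
from math import comb

for n_columns in (2, 10, 100, 1000):
    pairs = comb(n_columns, 2)
    triples = comb(n_columns, 3)
    print(f"{n_columns:5d} columns -> {pairs:>11,} pairs, {triples:>15,} triples")
```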

It is not hard to find data sets that are at least this complex, if not much worse.  For example, the human genome consists of 3 billion base pairs (of just four bases: A, C, G, T) – the number of possible sequences of length 3 billion that can be formed from just four items is 4 to the power of 3 billion (limited of course by various genetic constraints). Another example will be the astronomical database to be obtained in the 10-year survey of the sky by the Large Synoptic Survey Telescope (lsst.org) – the final source table will consist of approximately 20 trillion rows and over 200 columns of scientific information per source.  Analyses of all possible combinations of these scientific parameters (to discover new correlations, patterns, associations, clusters, etc.) would be prohibitive.

The combinatorial theorem in mathematics tells us that there are (2^N – 1) possible combinations of N things. For example, a statistical analysis of a data table with just 3 columns (A, B, C) would require 7 distinct analyses (statistical models) of the behavior of the data: A, B, C, A with B, B with C, A with C, and all three taken at once.  A data table with 5 columns would require 31 distinct analyses; and a table with 25 columns would require over 33 million distinct analyses. My calculator tells me that the number of distinct combinations of 200 variables is greater than 10^60.  This extraordinarily rapid growth rate is called the “combinatorial explosion”.  While no software package could ever perform that many variations of high-dimensional data analysis, it is common to focus on joint combinations of fewer parameters.  Even pairs, triples, and similar small-number combinations can have significant correlation and covariance, thus yielding important discoveries.
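
The counts quoted above are easy to reproduce; the sketch below computes 2^N – 1 for the column counts mentioned and enumerates the seven analyses for the three-column case explicitly.

```python
# Reproduce the combinatorial-explosion counts: 2**N - 1 non-empty combinations of N columns.
from itertools import combinations

for n in (3, 5, 25, 200):
    count = 2**n - 1
    label = f"{count:,}" if n <= 25 else f"about {count:.2e}"
    print(f"N = {n:3d}: {label} distinct analyses")

# For 3 columns (A, B, C), list the 7 analyses explicitly.
columns = ["A", "B", "C"]
analyses = [combo for k in range(1, len(columns) + 1)
            for combo in combinations(columns, k)]
print(len(analyses), "analyses:", analyses)
```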

Therefore, in order to meet the challenge of big data complexity (high variety), fast modeling technology is needed.  Such tools provide big benefits to both statisticians and non-statisticians.  These benefits multiply when the technology can automatically build and test a large number of models (different combinations of parameters) in parallel.  Furthermore, the power of the technology is amplified when it ranks the output models and parameter selections in order of significance and correlation strength.  Soft10’s “automatic statistician” Dr. Mo does these things and more. Dr. Mo models complex high-dimensional data both row-wise and column-wise. Dr. Mo produces high-accuracy predictions.  Dr. Mo’s proprietary multi-model technology is a powerful tool for predictive modeling and analytics across many application domains, including medicine and health, finance, customer analytics, target marketing, nonprofits, membership services, and more. Check out Dr. Mo at http://soft10ware.com/ and read more here: http://soft10ware.com/big-data-complexity-requires-fast-modeling-technology/
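
To give a flavor of the “build many models over parameter combinations and rank them” approach, the sketch below fits a simple model to every small column subset of a synthetic data set and ranks the candidates by cross-validated score. This is a generic scikit-learn illustration, not Soft10’s proprietary Dr. Mo technology, and the data are assumed.

```python
# A generic sketch of building many candidate models over column combinations and
# ranking them -- an illustration only, not Soft10's proprietary implementation.
# The synthetic data are assumptions.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))                      # 6 candidate predictor columns
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=500)

results = []
for k in (1, 2, 3):                                # small column combinations only
    for cols in combinations(range(X.shape[1]), k):
        score = cross_val_score(LinearRegression(), X[:, list(cols)], y, cv=5).mean()
        results.append((score, cols))

# Rank candidate models by cross-validated R^2, best first.
for score, cols in sorted(results, reverse=True)[:5]:
    print(f"columns {cols}: mean R^2 = {score:.3f}")
```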

Kirk Borne is a member of the Soft10, Inc. Board of Advisors.

Follow Kirk Borne on Twitter @KirkDBorne

Apervi’s Conflux Gives a Big Boost to a Confluence of Big Data Workflows

Data-driven workflows are the life and existence of big data professionals everywhere: data scientists, data analysts, and data engineers. We perform all types of data functions in these workflow processes: archive, discover, access, visualize, mine, manipulate, fuse, integrate, transform, feed models, learn models, validate models, deploy models, etc. It is a dizzying day’s work. We start manually in our workflow development, identifying what needs to happen at each stage of the process, what data are needed, when they are needed, where the data need to be staged, what the inputs and outputs are, and more.  If we are really good, we can improve our efficiency in performing these workflows manually, but not substantially. A better path to success is to employ a workflow platform that is scalable (to larger data), extensible (to more tasks), more efficient (shorter time-to-solution), more effective (better solutions), adaptable (to different user skill levels and to different business requirements), comprehensive (providing a wide scope of functionality), and automated (to break the time barrier of manual workflow activities).

(continue reading here http://www.bigdatanews.com/group/bdn-daily-press-releases/forum/topics/apervi-s-conflux-gives-a-big-boost-to-a-confluence-of-big-data-wo)

Apervi Conflux

 

Follow Kirk Borne on Twitter @KirkDBorne

Visual Cues in Big Data for Analytics and Discovery

One of the most fun outcomes that you can achieve with your data is to discover new and interesting things.  Sometimes, the most interesting thing is the detection of a novel, unexpected, surprising object, event, or behavior – i.e., the outlier, the thing that falls outside the bounds of your original expectations, the thing that signals something new about your data domain (a new class of behavior, an anomaly in the data processing pipeline, or an error in the data collection activity).  The more quickly you can find the interesting features and characteristics within your data collection, the more likely you are to improve decision-making and responsiveness in your data-driven workflows.

Tapping into the natural human cognitive ability to see patterns quickly and to detect anomalies readily is powerful medicine for big data analytics headaches.  That’s where data visualization shines most brightly in the big data firmament!
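
As a tiny illustration of the idea (with synthetic data and a conventional 3-sigma threshold, both assumed), the sketch below flags the measurements that fall far outside expectations; plotted on a simple scatter or histogram, the same points would jump out to the eye immediately.

```python
# A minimal "visual cue" sketch: flag points far outside expectations with a simple
# z-score rule.  The synthetic data and the 3-sigma threshold are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
values = rng.normal(loc=50.0, scale=5.0, size=1000)
values[123] = 95.0                      # inject one anomalous measurement

z_scores = (values - values.mean()) / values.std()
outliers = np.where(np.abs(z_scores) > 3.0)[0]
print("Indices flagged as outliers:", outliers)
```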

(continue reading here … http://www.bigdatanews.com/group/bdn-daily-press-releases/forum/topics/press-release-visual-cues-in-big-data-for-analytics-and-discovery)

 

Follow Kirk Borne on Twitter @KirkDBorne