Tag Archives: Data Science

Fraud Analytics: Fast Automatic Modeling for Customer Loyalty Programs

It doesn’t take a rocket scientist to understand the deep and dark connection between big money and big fraud. One need only look at black markets for drugs and other controlled and/or precious commodities. But what about cases where the commodity is soft, intangible, and practically virtual? I am talking about loyalty and rewards programs.

A study by Colloquy (in 2011) estimated that the loyalty and rewards programs in the U.S. alone had an estimated outstanding value of $48 billion US dollars. This is “outstanding” value because it doesn’t carry tangible benefit until the rewards or loyalty points are cashed in, redeemed, or otherwise exchanged for something that you can “take to the bank”. In anybody’s book, $48 billion is really big value — i.e., big money rewards for loyal customers, and a big target for criminals seeking to defraud the rightful beneficiaries of these rewards.

The risk vs. reward equation in loyalty programs now has huge numbers on both sides of that equation. There’s great value for customers. There’s great return on investment for businesses seeking loyal customers. And that’s great bait to lure criminals into the game.

In the modern digital marketplace, it is now possible to manipulate payment systems on a larger scale, thereby defrauding the business of thousands of dollars in rewards points. The scale of the fraud could match the scale of the entire loyalty program for some firms, which would therefore bankrupt their supply of rewards for their loyal and faithful customers. This is a really big problem waiting to happen unless something is done about it.

The something that can be done about it is to take advantage of the fast predictive modeling capabilities for fraud detection that are enabled by access to more data (big data), better technology (analytics tools), and more insightful predictive and prescriptive algorithms (data science).

Fraud analytics is no silver bullet. It won’t rid the world of fraudsters and other criminals. But at least fast automatic modeling will give firms better defenses, more timely alerts, and faster response capabilities. This is essential because, in the digital era, it is not only business that is moving at the speed of light, but so also are the business disruptors.

Some simple use cases for fraud analytics within the context of customer loyalty reward programs can be found in the article “Where There’s Big Money, There’s Big Fraud (Analytics)“.

Payment fraud reaches across a vast array of industries: insurance (of all kinds), underwriting, social programs, purchasing and procurement, and now loyalty and rewards programs. Be prepared. Check out the analytics solutions from the fast automatic modeling folks at http://soft10ware.com/.

Follow Kirk Borne on Twitter @KirkDBorne

 

Blogging My Way Through Data Science, Big Data, and Analytics

I frequently write blog posts on other sites.  You can find those articles here (updated March 21, 2016):

I also write “one-off” blog posts, such as these examples:

Follow Kirk Borne on Twitter @KirkDBorne

What Motivates a Data Scientist?

I recently had the pleasure of being interviewed by Manu Jeevan for his Big Data Made Simple blog.  He asked me several questions:

  • How did you get into data science?
  • What exactly is enterprise data science?
  • How does Booz Allen Hamilton use data science?
  • What skills should business executives have to effectively to communicate with data scientists?
  • How is big data changing the world? (Please give us interesting examples)
  • What are your go-to tools for doing data science?
  • In your TedX talk Big Data, Small World you gave special attention to association discovery, is there a specific reason for that?
  • The Data Scientist has been called the sexiest job of the 21st century. Do you agree?
  • What advice would you give to people aspiring for a long career in data science?

All of these questions were ultimately aimed at understanding the key underlying question: “What motivates you to work in data science?” The question about enterprise data science really comes the closest to identifying what really motivates me — that is, I am exceedingly fortunate every day to be given the opportunity to work with a fantastic team of data scientists at Booz Allen Hamilton, with the mandate to explore data as a corporate asset and to exploit data science as a core capability in order to achieve more profound discoveries, to make better (data-driven) decisions, and to propel new innovations across numerous domains, industries, agencies, and organizations. My Data Science Declaration also sums up these motivating factors for me.

You can see the full scope of my answers to the above questions here: http://bigdata-madesimple.com/interview-with-leading-data-science-expert-kirk-borne/.

Follow Kirk Borne on Twitter @KirkDBorne

Just-in-Time Supply Chain Management with Data Analytics

A common phrase in SCM (Supply Chain Management) is Just-In-Time (JIT) inventory. JIT refers to a management strategy in which raw materials, products, or services are delivered to the right place, at the right time, as demand requires. This has always been an excellent business goal, but the power to excel at JIT inventory management is now improving dramatically with the increased use of data analytics across the supply chain.

In the article “Operational Analytics and Droning About Big Data“, we discussed two examples of JIT: (1) a just-in-time supply replenishment system for human bases on the Moon, and (2) the proposal by Amazon to use drones to deliver products to your front door “just in time”! The Internet of Things will almost certainly generate similar use cases and benefits.

Descriptive analytics (hindsight) tells you what has already happened in your supply chain. If there was a deficiency or problem somewhere, then you can react to that event. But, that is “old school” supply chain management. Modern analytics is predictive (foresight), allowing you to predict where the need will occur (in advance) so that you can proactively deliver products and services at the point of need, just in time.

The next advance in analytics is prescriptive (insight), which uses optimization techniques (from operations research) in combination with insights and knowledge of your business (systems, processes, and resources) in order to optimize your delivery systems, for the best possible outcome (greater sales, fewer losses, reduced inventory, etc.). Just-in-time supply chain management then becomes something more than a reality — it now becomes an enabler of increased efficiency and productivity.

Many more examples of use cases in the manufacturing and retail industries (and elsewhere) where just-in-time analytics is important (and what you can do about it) have been enumerated by the fast Automatic Modeling folks from Soft10, Inc. Check out their fast predictive analytics products at http://soft10ware.com/.

(Read more about these ideas at: https://www.linkedin.com/pulse/supply-chain-data-analytics-jit-legit-kirk-borne)

Follow Kirk Borne on Twitter @KirkDBorne

 

Definitive Guide to Data Literacy For All – A Reading List

One of the most important roles that we should be embracing right now is training the next-generation workforce in the art and science of data. Data Literacy is a fundamental literacy that should be imparted at the earliest levels of learning, and it should continue through all years of education. Education research has shown the value of using data in the classroom to teach any subject — so, I am not advocating the teaching of hard-core data science to children, but I definitely promote the use of data mining and data science applications in the teaching of other subjects (perhaps, in all subjects!). See my “Using Data in the Classroom Reading List” here on this subject.

I encourage you to read a position paper that I wrote (along with a few astronomy colleagues) for the US National Academies of Science in 2009 that addressed the data science literacy requirements in astronomy. Though focused on the needs in astronomy workforce development for the coming decade, the paper also contains more general discussion of “data literacy for the masses” that is applicable to any and all disciplines, domains, and organizations: “Data Science For The Masses.”

Two new “…For Dummies” books can help in those situations, to bring data literacy to a much larger audience (of students, business leaders, government agencies, educators, etc.). Those new books are: “Data Science For Dummies” by Lillian Pierson, and “Data Mining for Dummies” by Meta Brown.  And here is one more that I believe is an excellent data literacy companion: The Data Journalism Handbook.

Update (April 2016) – The following site has a wealth of information on the use of “Data in Education”: http://www.ands.org.au/working-with-data/publishing-and-reusing-data/data-in-education

Data Mining For Dummies

 

 

(Read more here: http://www.datasciencecentral.com/profiles/blogs/dummies-for-data-science-a-reading-list)

Follow Kirk Borne on Twitter @KirkDBorne

A Day in the Life of Confounding Factors and Explanatory Variables

Would we trust an insurance provider who sets motorbike insurance rates based on the sales of sour cream? Or would we schedule our space launches according to the number of doctoral degrees awarded in Sociology?

Probably all of us would agree that this kind of decision-making is unjustified. A specific decision like this appears to be only superficially supported by the evidence of correlations between those various factors, but is there more to the story? Does it go any deeper? What if there exists a hidden causal factor that induces the apparently spurious correlation?

For example, suppose the increase in space launches and the increase in doctoral degrees in Sociology were both related to an increase in government investments in research studies on the sociological impacts of establishing a permanent human colony on the Moon. This case reveals a hidden causal connection in an otherwise strange correlation. The explanatory variable (which is a hidden confounding factor) is the research investment, and the response variables are the space launches and doctoral degrees.

What about other cases? What about the evidence that sour cream sales correlate with motorbike accidents? In such cases, shouldn’t we all be pleased to see organizations making evidence-based data-driven objective decisions (especially in this brave new world of exploding data volumes and ubiquitous analytics)? No, I don’t think so!!

So, what kind of world is this?

Welcome to the world of explanatory variables and confounding factors!

Statistical literacy is needed now more than ever (to paraphrase H. G. Wells). This includes awareness of and adherence to common principles of statistical reasoning. For example…

(continue reading here http://www.statisticsviews.com/details/feature/7914611/A-Day-in-the-Life-of-Explanatory-Variables-and-Confounding-Factors.html)

Follow Kirk Borne on Twitter @KirkDBorne

4 Reasons why an Accurate Analytics Model may not be Good Enough

Here are four reasons why the result of your analytics modeling might be correct (according to some accuracy metric), but it might not be the right answer:

  1. Your model may be underfit.

  2. Your model may be overfit.

  3. Your model may be biased.

  4. Your model may be suffering from the false positive paradox.

In data science, we are trained to keep searching even after we find a model that appears to be accurate. Data Scientists should continue searching for a better solution, for at least the four reasons listed above. Please note that I am not advocating “paralysis of analysis”, where never-ending searches for new and better solutions are just an excuse (or a behavior pattern) that prevents one from making a final decision. Good leaders know when an answer is “good enough”. We discussed this in a previous article: “Machine Unlearning – The Value of Imperfect Models”…

(For more discussion of the four cases listed above, continue reading herehttps://www.mapr.com/blog/4-reasons-look-further-accurate-answer-your-analytics-question)

Follow Kirk Borne on Twitter @KirkDBorne

The Definitive Q&A for Aspiring Data Scientists

I was recently asked five questions by Alex Woodie of Datanami for the article, “So You Want To Be A Data Scientist” that he was preparing. He used a few snippets from my full set of answers. The longer version of my answers provided additional advice. For aspiring data scientists of all ages, I provide here the full, unabridged version of my answers, which may help you even more to achieve your goal. (Note: I paraphrase Alex’s original questions in quotes below.)

1. “What is the number one piece of advice you give to aspiring data scientists?”

My number one piece of advice always is to follow your passions first. Know what you are good at and what you care about, and pursue that. So, you might be good at math, or programming, or data manipulation, or problem solving, or communications (data journalism), or whatever. You can do that flavor of data science within the context of any domain: scientific research, government, media communications, marketing, business, healthcare, finance, cybersecurity, law enforcement, manufacturing, transportation, or whatever. As a successful data scientist, your day can begin and end with you counting your blessings that you are living your dream by solving real-world problems with data. I saw a quote recently that summarizes this: “If you think your scarce data science skills could be better used elsewhere, be bold and make the move.” (Reference).

2. “What are the most important skills for an aspiring data scientist to acquire?”

There are many skills under the umbrella of data science, and we should not expect any one single person to be a master of them all. The best solution to the data science talent shortage is a team of data scientists. So I suggest…

(continue reading herehttps://www.mapr.com/blog/definitive-qa-aspiring-data-scientists)

Follow Kirk Borne on Twitter @KirkDBorne

Clear and Obvious Analytics for Clear and Present Dangers

Not every industry has found their clear and obvious applications of big data analytics. But the clear and present dangers of risk and fraud in financial transactions demand fast predictive modeling. Precisely because we live in the ubiquitous digital era, where most business (and non-business) transactions are rarely (if ever) in analog form and those transactions no longer move at the pace of humans (but at the speed of light), consequently the volume of digital signals as well as lurking dangers is enormous.

Digital signals (from sensors everywhere in our operational business systems) carry transactional information (what happened to what?), as well as metadata (descriptors) and analytics information (data-encoded knowledge and insights).  These analytics can be behavioral (providing insights into the interests, intentions, and preferences of the actors in a given transaction) as well as functional (providing insights into the actions or events associated with the transaction).

Behavioral analytics is developing into a major component of digital marketing, as firms seek to sell, cross-sell, and up-sell their products to the right customer at the right time.  Behavioral analytics is also critical in risk mitigation of all sorts: financial, cybersecurity, health (individual and population), supply chain, machine performance, and so on.

Here are 10 examples of where fast predictive analytics can play a vital role in most industries (with a focus on financial):

  1. Predict credit risk and fraud in real-time!
  2. Use Social Media for deeper understanding (likes and dislikes) of your customers.
  3. Personalize customer interactions in real-time, across multiple channels.
  4. Stop improper insurance payments before claims are paid!
  5. Spot insurance rate evasion tactics during the quote process – before you issue a policy!
  6. Predict High Health Risk versus Low Health Risk to better manage healthcare decision-making.
  7. Generate better predictive models of health, car, and home insurance eligibility fraud, underwriting fraud, and improper payments.
  8. Spot adversarial and anomalous behavior in cyber networks – stop the data breach or illegal funds transfer before it happens!
  9. Eliminate your Supply Chain hiccups – move the right products to the right locations in the right quantities – and at the right time!
  10. Make better business decisions regarding merchandising, demand forecasting, and pricing – don’t leave money on the table, or products in your warehouse.

Let us look a little more closely at the financial services industry…

One of the common conditions in traditional financial services (including home, health, and auto insurance) has been the “pay and chase” — i.e., you make the payment to the claimant, and then (after making the payment) you find out that the claim is fraudulent, thus beginning the chase to get your money back.

The new world of predictive modeling and advanced analytics allows for a new mantra in the financial and insurance industries: “Do Not Pay!” — i.e., you do not pay the claim until you have analyzed its likelihood for claim fraud, extraordinary financial risk, or payment anomalies (e.g., duplicate payments).

Predictive analytics modeling delivers a better financial risk posture for your organization than the “pay and chase”. With access to greater and more diverse data sources, it is now possible to develop better models of your customers’ credit risk regardless of the industry. This is certainly true in the financial services industry where there is so much data available: credit scores, credit history, court records, tax records, health records, insurance claims, and more. There is no excuse for not examining as much “public data” as you can in conjunction with other data sources that are available to you internally within your organization. Moderate outlays of your organization’s funds that are incurred in acquiring access to diverse external data sources should be offset by the savings accrued by “not sending your funds out the door” erroneously (either to intentional fraudsters or in unintentional duplicate claims).

An analytics-driven predictive model can predict fraud more efficiently (with fast automatic statistical software packages) and more effectively (with higher precision and higher recall: fewer false positives and false negatives) than traditional business processes. A good predictive analytics model should: (a) detect claims that “smell funny”, (b) prevent the “pay and chase” mode of operations, and (c) stop claims fraud abruptly by empowering a “do not pay” mode of operations. Predictive analytics modeling should aim to satisfy the following business requirements:

  • Detect and prevent both opportunistic and professional fraud throughout the claims process.
  • Detect underwriting fraud, to prevent premium leakage at the point of sale and renewal.
  • Spot rate evasion tactics during the quote process – before you issue a policy.

Many more examples of use cases in the financial services industry (and elsewhere) where fast predictive analytics is important (and what you can do about it) have been expertly enumerated by the fast statistical modeling folks from Soft10, Inc. Check out their fast analytics products (including the Instant Online Overbilling Claims Detector) at http://soft10ware.com/.

Follow Kirk Borne on Twitter @KirkDBorne

A Growth Hacker’s Journey Through the Recent History of Data Science

In 1998, I was attending a conference when an astronomer that I knew from across the country sought me out and asked if his group could send the data from their large astronomy experiment to NASA’s ADF (Astrophysics Data Facility, where I was working at the time). Their data set was two Terabytes in total. That seemed big (like the birth of “Astronomy Big Data”) to me, especially for 1998, but I didn’t know how big until I went back to work a few days later. When I mentioned this opportunity to the NASA facility senior managers, they looked at me like I was unaware of something really obvious and important. They were right! They “reminded” me that the data facility was the home for 15,000 NASA space science mission data sets, and the aggregate sum total data volume for all of those data sets combined(!) was less than one Terabyte! They couldn’t possibly accept one single experiment’s data that single-handedly eclipsed the total volume of all of the other 15,000 experiments’ data sets combined.

Well, this was embarrassing! What could we do? I was told that ADF could accept the data if I would write a research grant proposal and win some funds to pay for all of the new I.T. resources that would be required. “What kind of research proposal would pay for such a thing?” I asked myself. This led me to investigate a field of research that I had only briefly heard in conversation once or twice previously — Data Mining (= Machine Learning applied to large data sets). The more I read about this topic (now called Data Science), the more I became convinced that this is what I wanted to do for the rest of my research career. I was hooked. I was at the right place at the right time…

(continue reading herehttps://www.mapr.com/blog/growth-hackers-journey-right-place-right-time)

Follow Kirk Borne on Twitter @KirkDBorne