Internet of Things - Rocket-Powered Data Science

We often think of analytics on large scales, particularly in the context of large data sets (“Big Data”). However, there is a growing analytics sector that is focused on the smallest scale. That is the scale of digital sensors — driving us into the new era of sensor analytics.

Small scale (i.e., micro scale) is nothing new in the digital realm. After all, the digital world came into existence as a direct consequence of microelectronics and microcircuits. We used to say in the early years of astronomy big data (which is my background) that the same transistor-based logic microcircuitry that comprises our data storage devices (which are storing massive streams of data) is essentially the same transistor-based logic microcircuitry inside our sensors (which are collecting that data). The latter includes, particularly, the sensors inside digital cameras, consisting of megapixels and even gigapixels. Consequently, there should be no surprise that the two digital data functions (sensing and storing) are intimately connected and that we are therefore drowning in oceans of data.

But, in our rush to crown data “big”, we sometimes may have forgotten that micro-scale component to the story. But not any longer. There is growing movement in the microchip world in new and interesting directions.

I am not only talking about evolutions of the CPU (central processing unit) that we have seen for years: the GPU (graphics processing unit) and the FPGA (field programmable gate array). We are now witnessing the design, development, and deployment of more interesting application-specific integrated circuits (ASICs), one of which is the TPU (tensor processing unit) which is specifically designed for AI (artificial intelligence) applications. The TPU (as its name suggests) can perform tensor calculations on the chip, in the same way that earlier generation integrated circuits were designed to perform scalar operations (in the CPU) and to perform vector and/or parallel streaming operations (in the GPU).

Speeding the calculations is precisely the goal of these new chips. One that I heard discussed in the context of cybersecurity applications is the BPU (Behavior Processing Unit), designed to detect BOI (behaviors of interest). Whereas the TPU might be detecting persons of interest (POI) or objects of interest in an image, the BOI is looking at patterns in the time series (sequence data) that are indicative of interesting (and/or anomalous) behavioral patterns.

The BOI detector (the BPU) would definitely represent an amplifier to cybersecurity operations, in which the massive volumes of data streaming through our networks and routers are so huge that we never actually capture (and store) all of that data. So we need to detect the anomalous pattern in real-time before a damaging cyber incident occurs!

You can continue reading the long version of this article and learn more about the growing class of new analytics ASIC processors in my article “Sensor Analytics at Micro Scale on the xPU” at the Western Digital DataMakesPossible.com blog site.

Learn more about Machine Learning for Edge Devices at Western Digital here: https://blog.westerndigital.com/machine-learning-edge-devices/

Finally, see what’s cooking in Western Digital’s new Machine Learning Accelerator here: https://blog.westerndigital.com/machine-learning-accelerator-embedded-world-2019/

Brief Guide to xPU for AI Accelerators

Source for graphic: https://www.sigarch.org/a-brief-guide-of-xpu-for-ai-accelerators/

(The following article was first published in July of 2013 at analyticbridge.com. At least 3 of the links in the original article are now obsolete and/or broken. I re-post the article here with the correct links. A lot of things in the Big Data, Data Science, and IoT universe have changed dramatically since that first publication, but I did not edit the article accordingly, in order to preserve the original flavor and context. The central message is still worth repeating today.)

The on-going Big Data media hype stirs up a lot of passionate voices. There are naysayers (“it is nothing new“), doomsayers (“it will disrupt everything”), and soothsayers (e.g., Predictive Analytics experts). The naysayers are most bothersome, in my humble opinion. (Note: I am not talking about skeptics, whom we definitely and desperately need during any period of maximized hype!)

We frequently encounter statements of the “naysayer” variety that tell us that even the ancient Romans had big data. Okay, I understand that such statements logically follow from one of the standard definitions of big data: data sets that are larger, more complex, and generated more rapidly than your current resources (computational, data management, analytic, and/or human) can handle — whose characteristics correspond to the 3 V’s of Big Data. This definition of Big Data could be used to describe my first discoveries in a dictionary or my first encounters with an encyclopedia. But those “data sets” are hardly “Big Data” — they are universally accessible, easily searchable, and completely “manageable” by their handlers. Therefore, they are SMALL DATA, and thus it is a myth to label them as “Big Data”. By contrast, we cannot ignore the overwhelming fact that in today’s real Big Data tsunami, each one of us generates insurmountable collections of data on our own. In addition, the correlations, associations, and links between each person’s digital footprint and all other persons’ digital footprints correspond to an exponential (actually, combinatorial) explosion in additional data products.

Nevertheless, despite all of these clear signs that today’s big data environment is something radically new, that doesn’t stop the naysayers. With the above standard definition of big data in their quiver, the naysayers are fond of shooting arrows through all of the discussions that would otherwise suggest that big data are changing society, business, science, media, government, retail, medicine, cyber-anything, etc. I believe that this naysayer type of conversation is unproductive, unhelpful, and unscientific. The volume, complexity, and speed of data today are vastly different from anything that we have ever previously experienced, and those facts will be even more emphatic next year, and even more so the following year, and so on. In every sector of life, business, and government, the data sets are becoming increasingly off-scale and exponentially unmanageable. The 2011 McKinsey report “Big Data: The Next Frontier for Innovation, Competition, and Productivity.” made this abundantly clear. When the Internet of Things and machine-to-machine applications really become established, then the big data V’s of today will seem like child’s play.

In an attempt to illustrate the enormity of scale of today’s (and tomorrow’s) big data, I have discussed the exponential explosion of data in my TedX talk “Big Data, small world“ (e.g., you can fast-forward to my comments on this topic starting approximately at the 9:00 minute mark in the video). You can also read more about this topic in the article “Big Data Growth – Compound Interest on Steroids“, where I have elaborated on the compound growth rate of big data — the numbers will blow your mind, and they should blow away the naysayers’ arguments. Read all about it at http://rocketdatascience.org/?p=204.

Follow Kirk Borne on Twitter @KirkDBorne

Rocket-Powered Data Science

Data Reflections by Dr. Kirk Borne @KirkDBorne

Tag Archives: Internet of Things

Sensor Analytics on Big Data at Micro Scale

Why Today’s Big Data is Not Yesterday’s Big Data — Exponential and Combinatorial Growth