CSCMP's Supply Chain Quarterly
June 18, 2019

Three things you should know about big data and analytics

The terms "big data" and "analytics" are widely used but often are not clearly defined. The explanations presented here will help managers understand what these concepts really mean and how they can use them to improve their supply chains.

Wherever you look lately, the concepts of analytics and "big data" are being hyped by the media, consultants, and software vendors. Some of that hype is justified, as some companies are indeed using analytics and big data to transform their businesses. But most supply chain managers aren't getting a clear definition of what "analytics" and "big data" really mean. They are either hearing those terms being used by vendors that are trying to sell a specific solution, or they are seeing the terms used, without being clearly defined, in the popular business press. And even if the definitions themselves are clear, it may not be clear to the supply chain manager what value those concepts offer.

Without a full understanding of what the field of analytics is about, supply chain managers may be missing out on many opportunities—both for their companies and for themselves. For one thing, more and more companies are finding that they can gain a competitive advantage by better using data and analytics to make decisions. For another, more companies are demanding that their managers understand data and analytics. Before supply chain managers can grasp how analytics and big data can be applied in their own world, though, they'll first need to understand not only what those concepts actually mean but also what difference they can make. With that in mind, here are three things that supply chain managers should know.

1. Big data is not just for the IT department.
Although widely used, the term "big data" is poorly defined. This lack of a clear definition may be the reason some supply chain managers make the mistake of thinking that big data has little relevance to their work and is only the concern of the information technology (IT) department. If they limit themselves to viewing big data as just an IT storage issue, supply chain managers may overlook the economic value of the data they are collecting. As it is, many are not taking full advantage of the data they already have at hand, nor are they thinking creatively about how they could use it.

There are actually three definitions of big data currently in use. Each one is valuable, in different ways, to supply chain managers.

The first, and most "IT-centric," definition refers to an amount of data that is too large or complex for current mainstream data storage and retrieval systems. In other words, you can't simply store "big data" in a standard database and run a query against it. Common examples of this view of big data include the massive amounts of data generated by sensors (which fill up databases quickly) or free-form text or video data (which doesn't fit into a relational database very well). This is an IT-centric view because it focuses on the technology of storing and retrieving data and on the need to use new types of servers and software, such as Hadoop. Hadoop is an open-source framework for storing and processing data that goes beyond standard relational-database technology, allowing users to store and retrieve unstructured data across hundreds or thousands of distributed machines. Facebook, for example, uses Hadoop so it can store massive amounts of unstructured data.

Supply chain managers don't need to know the technical details of Hadoop and other such technologies. But they do need to realize that they can and should capture and analyze data if there are business reasons for doing so. For example, if you have thousands of sensors on your manufacturing, warehousing, or trucking equipment, there now are ways for you to analyze the data those sensors collect. You could, for example, use that information to better predict when machines will fail or to improve the fuel efficiency of trucks. Or if your customer service team records customers' voicemails and e-mails, you can analyze this data to help your company provide better service.

The IT community tends to think this is where the definition of big data ends. Actually, from a manager's point of view, the following two definitions are just as interesting.

The second definition of big data comes from Viktor Mayer-Schönberger and Kenneth Cukier's book Big Data: A Revolution That Will Transform How We Live, Work, and Think. In this book, they define big data as the "universe" of data for a given subject. They are not saying whether the amount of data is too great for current mainstream technologies, but they are saying that it is now possible to have all the data. The authors argue that there are two important implications of having (or the possibility of getting) this universe of data. First, once you have all the data, it is possible to look for correlations and gain insights you would not have seen before. For example, if you have data on everything a truck was doing before it was involved in an accident, you can better determine what led to the accident and use this information to prevent future accidents. Since accidents are hopefully rare events, normal data sampling would not have given you the information needed to find the right correlations.

The second important implication is that there may be a lot of value in having the universe of data. For example, right now there is a lively debate going on in the farming industry. Large seed manufacturers say they can help drive up crop yields if farmers send them detailed data on their soil quality on something like a square-foot-by-square-foot basis. (While this may seem like a huge amount of data to collect, it is actually easy to do with today's high-tech tractors.) Obviously, an increase in yields is a good thing, but farmers recognize that if the seed companies had detailed data about every farm, they could use that data for other purposes, such as trading on agricultural commodity futures. Understandably, the farmers would like to make sure they own this valuable data.

There are two important lessons here for supply chain managers. First, you may have access to a "universe" of data that has economic value outside of its original purpose. Second, you should be careful about giving away your data to other organizations that could reap its economic value.

The third definition of big data is derived from how the term is used in the popular press. The press tends to label some interesting or creative use of data as "big data." When you read the article, though, you quickly realize that the data set being discussed is not "the universe of data on a particular subject." In fact, the data set typically is not even very large. Rather, in most cases, the data set is being used creatively.

It would be wrong to dismiss this definition as simply a case of the press misusing a buzzword. Instead, this definition points out something meaningful: It is important to think creatively—about using the data you have access to, about combining the data you have in unique ways, and about looking for new, readily available, external data sets (such as weather, housing starts, or demographics) that would help you make better decisions.

In short, supply chain managers can benefit from knowing all three definitions of big data. It is important to understand that your colleagues in IT are finding ways to work with large and unstructured data sets; what was technically impossible several years ago may be possible now. It is also important to understand big data in terms of the "universe of data," as you may be able to find causes of rare events that were impossible to determine before, or the data you have or give away may have economic value. Finally, now that you have this wealth of data—purchases from suppliers, shipments to customers, and the performance of your assets, to name just a few possibilities—you need to start thinking creatively about how to use it.

2. There is more than one type of analytics.
Sometimes the term "analytics" is used interchangeably with the term "big data." But they actually are distinct concepts. At the highest level, analytics is "the ability to collect, analyze, and act on data." [1] As we saw, big data says something interesting about the size of the data, the universe of data, or the creative use of data. Based on our high-level definition, it is clear that analytics can be used with old, ordinary, and small data sets as well as with new, creative, and big data sets.

Our high-level definition of analytics, however, does not give us enough details to see what is unique or new about the field of analytics. Haven't managers always collected, analyzed, and acted on data? To add to the confusion, too often an organization or a vendor will use the term analytics to refer to just one type of analysis—typically the use of a business-intelligence or reporting system. Or, if you read about e-commerce companies, analytics will refer to tracking and analyzing user clicks on a website.

But the field of analytics is much bigger than just a reporting system or analyzing Web clicks. (Otherwise, it would not have captured the attention of the business community, and companies across a wide range of industries would not be reporting on its benefits.) Serious thinkers and the academic community have identified three different types of analytics.

First, there is descriptive analytics, which presents your data in a way that helps you make sense of what is happening in your supply chain. This is where a business intelligence system will sit. It gathers data from your entire supply chain and organization and presents it to you as dashboards, scorecards, and ad hoc queries. Descriptive analytics also includes visualization of data and geographic mapping, which helps you tell a story with the data in a way you could not do with a tabular report.

The second type is predictive analytics, which covers all the techniques that allow you to take the data you have available (internally and externally) and make better predictions. These range from creating better forecasts to predicting when a machine will break down, and from estimating your chances of having to go to the spot market for transportation capacity to predicting which products your customers will likely buy.
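
To make the idea of a forecast concrete, here is a minimal Python sketch of simple exponential smoothing, one common forecasting technique. The article does not name this method; it is chosen here only for illustration, and the demand figures and the smoothing weight `alpha` are hypothetical.

```python
def exponential_smoothing_forecast(history, alpha=0.5):
    """Return a one-step-ahead forecast using simple exponential smoothing.

    alpha (between 0 and 1) controls how strongly recent observations
    outweigh older ones; 0.5 is an arbitrary illustrative choice.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    level = history[0]  # start from the first observation
    for observed in history[1:]:
        # Blend the newest observation with the running smoothed level.
        level = alpha * observed + (1 - alpha) * level
    return level


# Hypothetical weekly unit demand for one product.
weekly_demand = [100, 120, 110, 130]
forecast = exponential_smoothing_forecast(weekly_demand, alpha=0.5)
print(f"Forecast for next week: {forecast:.1f} units")
```

A higher `alpha` makes the forecast react faster to recent demand swings; a lower one smooths them out.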

Finally, prescriptive analytics refers to using your data and your predictions to make recommendations on what action to take. Prescriptive analytics is most often associated with optimization technology. In supply chains, optimization technology is commonly used to help managers decide such matters as how many facilities they should have and where they should be located, how to best route trucks, and how to schedule warehouse or factory operations.
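
As a rough illustration of a prescriptive question (which facilities should we open?), the Python sketch below tries every combination of candidate sites and keeps the one with the lowest demand-weighted distance to customers. All site names, coordinates, and demand figures are hypothetical, and commercial optimization tools use far more sophisticated methods than this brute-force search.

```python
from itertools import combinations
import math


def choose_facilities(candidates, customers, num_to_open):
    """Return (sites, cost) for the lowest-cost set of sites to open.

    candidates: dict mapping site name -> (x, y) location
    customers:  list of (x, y, demand) tuples
    Cost is each customer's demand times the straight-line distance
    to the nearest open site, summed over all customers.
    """
    best_sites, best_cost = None, float("inf")
    for sites in combinations(candidates, num_to_open):
        cost = 0.0
        for x, y, demand in customers:
            nearest = min(math.dist((x, y), candidates[s]) for s in sites)
            cost += demand * nearest
        if cost < best_cost:
            best_sites, best_cost = sites, cost
    return best_sites, best_cost


# Hypothetical candidate warehouse sites and customer locations.
candidate_sites = {"A": (0, 0), "B": (10, 0), "C": (5, 5)}
customer_list = [(1, 0, 10), (9, 0, 10), (5, 4, 5)]

best_sites, best_cost = choose_facilities(candidate_sites, customer_list, 2)
print("Open sites:", best_sites, "weighted distance:", round(best_cost, 2))
```

Exhaustive search works only for a handful of candidate sites; the number of combinations grows very quickly, which is why real network-design studies rely on dedicated optimization software.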

When these three types of analytics are presented, they often are ranked by complexity and strategic importance. Descriptive analytics usually sits at the bottom because it is considered the easiest to implement and the source of the least strategic value. Predictive analytics is next, as it is a little more difficult to implement but brings more benefits. And prescriptive analytics usually sits on top, because it is the most complicated to implement and provides the greatest value.

Not everyone agrees with this ranking. A descriptive analytics project can be very complex to implement—it can be difficult to clearly describe a large, complex, global business. Moreover, giving an entire management team a clear picture of the organization may lead to significant, strategic change. In contrast, it may be very easy to implement prescriptive analytics for truck routing, say, and while the savings may be nice, it won't lead to a significant strategy shift. Instead of ranking the types of analytics, you should think of each as having its own place and realize that each company or organization will value different types of analytics at different times and places.

As a supply chain manager, it is important for you to understand the three areas of analytics so you can make sure you are properly addressing each one. Rather than have a strategy for just one type of analytics, it's necessary to have a strategy for each. For example, do you have good descriptive analytics in place to understand your supply chain? Are you using predictive analytics to forecast demand and machine failures? Are you using prescriptive analytics to determine where to make products, where to locate facilities, and how to schedule resources?

Knowing these three definitions will also help you assess proposals for analytics projects. The people presenting these projects may not fully define what type of analytics they are promoting. The definitions explained here will give you a framework for determining exactly what the project will do and how it fits into your overall analytics strategy.

3. Machine learning has many potential supply chain applications.
Supply chain managers need to possess a basic understanding not only of analytics but also of machine learning. Machine learning refers to the collection of algorithms that have been developed over the last several decades in a variety of fields, such as statistics, data mining, and artificial intelligence. These algorithms represent the "brains" behind a lot of what is new in predictive analytics.

In the simplest sense, machine-learning algorithms take a set of input data and create a model from it that either predicts future outcomes or uncovers patterns in the data. Machine learning may seem less intimidating once you recognize that regression analysis (the statistical process for estimating relationships among variables) is classified as one type of machine-learning algorithm.
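
The "input data in, model out" idea can be sketched with the simplest case, a straight-line regression fitted by ordinary least squares. The Python example below is illustrative only; the shipment distances and transit times are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: shipment distance (miles) and transit time (days).
distance_miles = [100, 200, 300, 400]
transit_days = [2, 3, 4, 5]

# "Training" produces the model: here, just two numbers.
slope, intercept = fit_line(distance_miles, transit_days)

# The model can then predict an outcome for input it has not seen.
predicted_days = slope * 500 + intercept
print(f"Predicted transit time for 500 miles: {predicted_days:.1f} days")
```

More elaborate machine-learning algorithms follow the same pattern: historical data goes in, a model comes out, and the model is then applied to new cases.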

It may seem strange that supply chain managers should need to know about machine learning; it sounds like something that belongs in computer science or robotics. But just as supply chain managers today should know about regression analysis, they should also know about machine learning. Knowing how machine-learning algorithms work and what they can accomplish will help you better understand what is now possible with predictive analytics. That is, it will give you new ideas you can apply in your organization to get more value from your data.

Supply chain managers should be familiar with several of the more popular machine-learning algorithms. For example, with some data sets, algorithms such as "k-nearest neighbor," "decision trees," or "random forests" can out-predict traditional regression analysis by better teasing out patterns in the data or by accepting text values as inputs. Other algorithms are well suited for predicting whether an event will happen or not: for example, will the order be late, will the carrier accept the load, or will the machine break? This is typically done using logistic regression (a statistical technique used to predict the probability of possible outcomes). There are also algorithms for understanding text documents, such as determining whether an e-mail sent to customer service is negative or positive. This is often done with a naïve Bayes classifier (a probabilistic algorithm used to classify inputs). Still other algorithms can help you determine which items are likely to be ordered or shipped together, which is very helpful for deciding which items should be stored together. These are known as "association rules" or "market-basket analysis." And finally, there are algorithms for detecting clusters in your data.
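
To show how little machinery one of these algorithms needs, here is a bare-bones Python sketch of k-nearest neighbor used to predict whether an order will be late. The two input features and every data point are hypothetical, and a real project would use a tested library and scale the features first.

```python
from collections import Counter
import math


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k closest training rows.

    training: list of (features, label) pairs, where features is a tuple
    of numbers. Distance is plain straight-line (Euclidean) distance.
    """
    nearest = sorted(training, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical past orders: (number of stops, distance in hundreds of miles).
past_orders = [
    ((1.0, 1.0), "on time"),
    ((1.5, 2.0), "on time"),
    ((2.0, 1.0), "on time"),
    ((6.0, 5.0), "late"),
    ((7.0, 6.0), "late"),
    ((6.5, 7.0), "late"),
]

new_order = (6.0, 6.0)
prediction = knn_predict(past_orders, new_order, k=3)
print("Prediction for the new order:", prediction)
```

The algorithm makes no assumption about the shape of the relationship; it simply asks what happened to the most similar orders in the past.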

Once you start to see the power of the various machine-learning algorithms, you will realize that you can combine them in interesting ways to solve complex supply chain problems. Some companies' transportation departments have built sophisticated models to first predict when a carrier will accept a load and then use price optimization to set the best price for that load. Some companies use the algorithms to better determine which products their sales teams should recommend to their customers (similar in some ways to the recommendation engines that Amazon and Netflix use). Others use the algorithms to predict when inventory might go obsolete and to adjust prices accordingly to move the stock.

Supply chain managers should be aware of the different ways these algorithms are being used to solve business problems. In the field of analytics, good ideas can start out in one organization and then be picked up and adapted by others. For example, association rules have long been used in the grocery industry to see what items consumers would buy together. One retailer's e-commerce managers realized that they could use the same algorithms to determine which items they should place in the same warehouses because they tend to ship together. As this example suggests, the more you know about the different applications, the more likely you will be able to apply them to your business.
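
The warehouse example can be sketched in a few lines of Python. The code below counts how often pairs of items appear on the same order and reports two standard association-rule measures: support (the share of all orders that contain both items) and confidence (the share of orders containing the first item that also contain the second). The order data and the 50 percent support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(orders, min_support=0.5):
    """Return (item_a, item_b, support, confidence) for frequent item pairs."""
    n = len(orders)
    item_counts = Counter()
    pair_counts = Counter()
    for order in orders:
        items = sorted(set(order))  # ignore duplicates within one order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Emit the rule in both directions: a -> b and b -> a.
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical e-commerce orders.
order_history = [
    ["tent", "stakes", "lantern"],
    ["tent", "stakes"],
    ["lantern", "batteries"],
    ["tent", "stakes", "batteries"],
]

rules = pair_rules(order_history, min_support=0.5)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Pairs with high support and confidence are candidates for storing in the same warehouse, since they tend to ship together.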

Time to enhance your skills
The field of analytics is very exciting right now. It is getting a lot of attention and is having a big impact on all types of businesses and organizations. Although it is more often associated with high-tech companies like Google or Amazon, it is quickly moving into more traditional supply chain environments. Companies that don't embrace it may be left behind or at a big disadvantage.

This article should provide you with both a framework for thinking more clearly about analytics and a starting point for conducting more research on your own. There is a wealth of books and online resources about analytics and big data as well as commercial tools and open-source software available to help you get started.

As a supply chain manager, you will benefit from challenging yourself to come up with new ways to use the data in your possession. Be a leader in data and analytics; it will help propel your company—and your career—forward.

1. This definition is adapted from the article "Competing on Analytics," by Thomas H. Davenport, which appeared in the January 2006 issue of Harvard Business Review.

Michael Watson is a partner with the firm Opex Analytics and an adjunct professor at Northwestern University.
