
It has been said that data is the new oil. This is true in more ways than one: like oil, data must be refined and processed before it becomes valuable, and oil in the wrong places can become a burden rather than a boon. Experimenting with real data can help identify opportunities and demonstrate the business case. Then data can help optimize the organization and even deliver a better product.

Business decisions are driven by information from the business and its context. A lot of data probably already goes into that decision pipeline: sales data, operational information and, for those who sell ice cream, the weather forecast. Excel is still one of the most widely used tools for analyzing, processing and acting on that information. However, in a world where we can measure every axle rotation in our machines if we want to, track every meter our delivery trucks travel, monitor every tiny change in the soil moisture of a field and follow a customer's journey through our web store with pixel precision, Excel runs into its limits. Welcome to the world of data analytics.

Data doesn't speak for itself; proper analysis is needed to turn it into insight. Data science is the term that covers many of the areas related to wringing useful information out of data, including capturing, managing, interpreting and learning from it. Data analytics makes it possible to use this information in business processes. Big data refers to the tools and techniques needed to process quantities of data that are otherwise unmanageable because of their size, nature or structure.

Top-down or bottom-up

What types of data should we collect? At what frequency? Where should we store it and for how long? What tools should we use? All this depends on one basic question: what insights are we interested in? If we want to do "something with data," answering this question is the starting point. There are two ways to approach it: top-down or bottom-up.

The top-down approach starts from the insights we want to gain: predicting how long a machine part will last before it fails, for example, or determining the most wasteful leg of a delivery route. Based on this, we identify the platforms, tools, procedures and sensors that will give us these insights.

In the bottom-up approach, we start with the data and ask what we can learn from it - nine times out of ten, we already have an idea. This is a more exploratory way of working, which belongs to the domain of data science. After we identify the business cases, we often move to a more top-down approach.

A good way to start is to set up a small-scale project with the data that is already available. This can quickly provide insight into what is feasible and what the results might be. We also quickly learn that we usually don't have enough data to make the business case and that we need to drill down a little further.
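
As a sketch of what such a first pass might look like, the snippet below runs a few sanity checks on an existing export. The file name (sales_export.csv) and the units_sold column are hypothetical; any data that's already lying around will do.

```python
# A minimal first look at data that's already available (hypothetical CSV).
import pandas as pd

df = pd.read_csv("sales_export.csv", parse_dates=["date"])  # hypothetical export

print(df.describe())                    # ranges, spread and obvious outliers
print(df.isna().mean().sort_values())   # how complete is each column?
print(df.corr(numeric_only=True)["units_sold"])  # what correlates with sales?
```

Even this much often reveals whether the available data can carry a business case at all.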

Time and (a little) money

Since data science and data analytics are fundamentally different from regular analytics - business or technical - we'll need specialized expertise, and we probably won't find it in our core business. By enlisting the help of outside consultants, we can build up the practice until it becomes part of the primary process.

As with any new project, it takes time to get started with data analytics. Most of that time goes into identifying the business case; it usually takes a few days to get a good understanding of the possibilities and potential outcomes.

Depending on the application and the data needed, we may require additional sensors or network infrastructure, which brings additional hardware costs. One of the great promises of big data, however, is that we can collect data cheaply, making up in sheer volume for what individual measurements lack in accuracy. To estimate the throughput of a machine, for example, inexpensive vibration sensors accurate to within a second may do, and long-term trends from several basic temperature sensors scattered across a factory site can be more interesting than an occasional, highly accurate, single-point measurement.
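
As a minimal sketch of that idea, the simulation below (with assumed sensor counts and noise levels) shows how averaging many cheap, noisy temperature readings over time recovers a slow drift that no single reading would reveal.

```python
# Sketch: recovering a trend from many cheap, noisy sensors (simulated data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
hours = pd.date_range("2024-01-01", periods=24 * 90, freq="h")  # 90 days, hourly

# A slow drift of +2 °C over the period, invisible in any single noisy reading.
true_trend = np.linspace(20.0, 22.0, len(hours))

# Ten cheap sensors, each off by ±1.5 °C (1 sigma) on every reading.
readings = pd.DataFrame(
    {f"sensor_{i}": true_trend + rng.normal(0.0, 1.5, len(hours)) for i in range(10)},
    index=hours,
)

# Averaging across sensors and over a 7-day window suppresses the noise
# enough for the 2 °C drift to stand out clearly.
weekly = readings.mean(axis=1).rolling("7D").mean()
print(weekly.iloc[0], "->", weekly.iloc[-1])
```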

The backend infrastructure doesn't have to be a huge expense either, thanks to pay-as-you-go cloud providers like Amazon (AWS), Microsoft (Azure) and Google (Google Cloud Platform). We can store and process our data with no upfront costs and often don't have to spend more than a handful of dollars to get a trial going. While relinquishing control may feel uncomfortable, these providers are generally far better equipped to protect data and comply with regulations than an in-house setup.
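
To give an idea of how little ceremony that involves, here is a hedged sketch of pushing a batch of readings to object storage on AWS with boto3. The bucket and key names are made up for illustration; Azure Blob Storage or Google Cloud Storage would look much the same.

```python
# Sketch: batching sensor readings into low-cost object storage (AWS S3).
import json
import boto3

s3 = boto3.client("s3")  # credentials come from the environment or an IAM role

readings = [{"sensor": "temp_07", "ts": "2024-01-01T12:00:00Z", "value": 21.4}]

s3.put_object(
    Bucket="example-sensor-data",         # hypothetical bucket name
    Key="raw/2024/01/01/batch-001.json",  # date-partitioned keys keep later queries cheap
    Body=json.dumps(readings).encode("utf-8"),
)
```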

Points of interest

There are several pitfalls to watch out for when collecting data. First, although relatively inexpensive, transferring and storing information still costs money. With really large amounts of data, this can add up to a significant bill.
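
A quick back-of-the-envelope calculation shows how fast it adds up. All figures below - sensor count, sampling rate, record size and storage price - are illustrative assumptions, not provider quotes.

```python
# Back-of-the-envelope: what does "cheap" sensor data cost to keep?
sensors = 1_000
readings_per_second = 10
bytes_per_reading = 100            # timestamp, sensor id, value, overhead

seconds_per_year = 60 * 60 * 24 * 365
terabytes_per_year = (
    sensors * readings_per_second * bytes_per_reading * seconds_per_year / 1e12
)

price_per_tb_month = 23.0          # assumed object-storage rate, USD
yearly_bill = terabytes_per_year * price_per_tb_month * 12  # keep one year's haul for a year

print(f"{terabytes_per_year:.1f} TB/year, roughly ${yearly_bill:,.0f} to store it")
# ~31.5 TB/year and several thousand dollars - before transfer, requests and backups.
```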

Second, some data may be more sensitive than it first appears. For example, the activity patterns of a heating system may be interesting for predictive maintenance, but they also reveal when someone is home - or not. It may also seem like a good idea for an electric motor manufacturer to measure drive speeds in a customer's plant to prevent failures, but this information is also very valuable to that customer's competitors. Some data may even be subject to strict regulation, such as the GDPR, which caused a stir when it was introduced in Europe in 2018.

Collecting data just for the sake of collecting it - dumping everything into a "data lake" - is unwise. While such a lake may have some value for exploring new applications, it comes at a high cost. In recent years, the practice has given way to a more structured approach.

The starting point has become: don't store what you don't need. Part of developing a data analytics application is identifying what data is actually needed and for how long. Do we really need to measure every tenth of a second, or is once a minute or even once an hour enough? Do we need to track accelerometer movements in three dimensions, or does an aggregate measure already provide enough information? Do we need to keep data points older than a week, or can we compact them into a weekly average or discard them altogether? And if we do need to collect sensitive or personal information, let's make sure that all parties involved agree on its use and take additional technical measures to prevent unauthorized access.
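
As a minimal sketch of those retention choices (intervals and cutoffs are illustrative assumptions), the snippet below thins a 10 Hz signal to per-minute averages and compacts everything older than a week into weekly means.

```python
# Sketch: downsampling and retention with pandas (simulated signals).
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# One hour at 10 Hz: do we really need every tenth of a second?
fast = pd.Series(
    rng.normal(0.0, 1.0, 36_000),
    index=pd.date_range("2024-01-01", periods=36_000, freq="100ms"),
)
per_minute = fast.resample("1min").mean()  # 36,000 points -> 60 points

# A month of per-minute data: keep the last week, compact the rest.
month = pd.Series(
    rng.normal(0.0, 1.0, 30 * 24 * 60),
    index=pd.date_range("2024-01-01", periods=30 * 24 * 60, freq="1min"),
)
cutoff = month.index[-1] - pd.Timedelta(days=7)
recent = month[month.index > cutoff]                          # full detail
archive = month[month.index <= cutoff].resample("7D").mean()  # weekly averages
```

Which aggregates survive - means, maxima, counts - is exactly the kind of decision the business case should dictate.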

Solve problems faster

When we take these considerations into account, data can provide valuable and actionable insights. It can significantly speed up troubleshooting in an engineering facility. Minor problems can be left to less specialized service technicians. Some problems can even be prevented altogether by detecting in advance that something isn't right. And all that thanks to data.
