From Data to Decisions
The amount of data in the world is absolutely mind-blowing. According to the Data Age 2025 report by Seagate, the global datasphere will grow from about 45 ZB in 2019 to 175 ZB in 2025, nearly quadrupling in size. One zettabyte (ZB) is equivalent to one billion terabytes, a number so large that most of us cannot truly comprehend how many bits of information it represents.
Many organisations are accumulating and hoarding data at a rapid pace. Yet in many of them, this data is worth absolutely nothing!
Let me explain. Organisations are not static. To survive in a competitive world, they need to make decisions and take actions. It is decisions, not data, that matter. On their own, petabytes of data contribute nothing to decision making. To inform decisions, data needs to be processed and transformed into something valuable.
Let’s consider an analogy. Just as gold ore is mined, processed to extract gold, refined to remove impurities and finally melted down to make jewellery, data needs to be transformed to extract its value. Accumulating a big heap of gold ore without further processing is not what mining companies have in mind. They want to process the ore to extract something valuable.
Are you, like many organisations, accumulating a big heap of data without doing anything with it? Then your data is likely to be a liability: it has to be stored, kept confidential and managed in compliance with regulations. It can be stolen, which could lead to serious reputational damage and lawsuits. It can be maliciously altered, with serious consequences for your business. It could be accidentally deleted, leading to nightmarish scenarios.
Rather, data needs to be transformed to make it an asset in your organisation. When data is transformed to the point where it allows you to make decisions or take actions, you become a data-driven organisation that exploits data as an asset.
In this article, I would like to share some insight into this transformation process, discussing it from the point of view of analytics and machine learning (and not from a philosophical or epistemological point of view).
Data transformation creates value
The image below illustrates this transformation process, from data through information, knowledge and insight to decisions, and can be used as a reference. As you refine your data, moving from left (data) to right (decision), it becomes more valuable. Let’s dive into some details.
Data
Recording the world
Data consists of values of a quantitative or qualitative nature. It represents measurements we make of the world through observation, often using sensors.
Data could be streamed into an organisation in real time, or arrive through batch or once-off ingestion processes. It is often ingested into a data lake or stored in databases or data warehouses.
When you take a picture or video, send a text message, fill out an online form or take down a temperature reading, you have created new data. In itself, the data is just a recording of the world. A picture is just a collection of observed pixel values representing objects in the world. A text message is just an ordered collection of characters. At the lowest level, data can be seen as unstructured in nature. Doing something useful with the data requires further structuring and interpretation.
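To make the last point concrete, here is a minimal Python sketch that gives structure to a few raw temperature readings stored as plain text. The comma-separated line format, the `TemperatureReading` class and the `parse_line` helper are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical raw input: to the computer, each line is just a string of characters.
RAW_LINES = [
    "2024-01-15 06:00,12.5",
    "2024-01-15 14:00,24.0",
    "2024-01-16 06:00,11.0",
]


@dataclass
class TemperatureReading:
    """A structured observation: a typed timestamp plus a value in degrees Celsius."""
    timestamp: datetime
    celsius: float


def parse_line(line: str) -> TemperatureReading:
    """Interpret one 'YYYY-MM-DD HH:MM,value' line (an assumed format)."""
    when, value = line.split(",")
    return TemperatureReading(
        timestamp=datetime.strptime(when, "%Y-%m-%d %H:%M"),
        celsius=float(value),
    )


readings = [parse_line(line) for line in RAW_LINES]

# Once structured, the data can be queried, e.g. for the warmest reading.
warmest = max(readings, key=lambda r: r.celsius)
print(warmest.timestamp.isoformat(), warmest.celsius)
```

The values themselves do not change. What changes is that the program now knows which part of each line is a time and which part is a temperature, so it can compare and aggregate them.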
Knowledge
Why did it happen?
Often, there are repeating patterns in our information. Temperature fluctuates depending on time of day and seasonal changes. Sales figures may fluctuate depending on approaching holidays, or show distinct patterns between weekdays and weekends.
First, we need to find these patterns in our information. Then we have to understand why these patterns occur. In the process, we gain knowledge about the world that allows us to explain why certain phenomena occur. The company share price fell because a breaking news story expressed negative sentiment. Demand for the company’s product is increasing because of a successful marketing campaign. Knowledge allows us to express these patterns, often using quantifiable descriptions.
Note that data, information and knowledge are primarily concerned with historical events. Data is a recording of the world as it was at a past point in time. Information describes the relationships that existed in this historical data. Knowledge describes the repeating patterns we have observed, which are likewise historical in nature.
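As a small illustration of finding such a pattern, the Python sketch below splits a week of daily sales figures into weekdays and weekends and compares their averages. The numbers are invented and `average_by_day_type` is a hypothetical helper. A real analysis would use far more data and would check whether the difference is meaningful.

```python
from datetime import date
from statistics import mean

# Hypothetical daily sales for one week (Monday 4 March to Sunday 10 March 2024).
daily_sales = {
    date(2024, 3, 4): 120,
    date(2024, 3, 5): 115,
    date(2024, 3, 6): 130,
    date(2024, 3, 7): 125,
    date(2024, 3, 8): 140,
    date(2024, 3, 9): 210,
    date(2024, 3, 10): 190,
}


def average_by_day_type(sales):
    """Average sales separately for weekdays and weekends."""
    groups = {"weekday": [], "weekend": []}
    for day, amount in sales.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        key = "weekend" if day.weekday() >= 5 else "weekday"
        groups[key].append(amount)
    return {key: mean(amounts) for key, amounts in groups.items()}


averages = average_by_day_type(daily_sales)
print(averages)
print(f"Weekend sales are {averages['weekend'] / averages['weekday']:.2f}x weekday sales")
```

The resulting statement ("weekend sales are roughly 1.6 times weekday sales") is a quantifiable description of a pattern. Explaining *why* it happens still requires further understanding.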
When we are making decisions or taking actions in the world, we are aiming to affect the future. Our historical knowledge needs to be applied to our future actions.
Insight
What will happen?
Insight is concerned with understanding the underlying principles that give rise to the patterns (knowledge) we observe. If we understand these principles, we are in a position to make predictions about what the future may hold.
If we have observed in the past that positive or negative press influences the share price in a corresponding way, we can predict that similar coverage will have similar results in the future. If we have past observations of fluctuations in temperature, humidity, wind speed and air pressure and of their effect on rainfall, then, when we observe similar patterns now, we can make reasonably accurate predictions of the rainfall to expect over the next week.
Insight, then, is the ability to apply knowledge to the current situation in order to make predictions about the future.
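Suppose the relationship is linear, which is the simplest possible assumption. Applying knowledge to make a prediction can then be sketched as fitting a straight line to past observations and evaluating that line for today's situation. The sentiment scores, price changes and helper names below are hypothetical. Ordinary least squares is only one of many possible modelling choices.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical history: a daily news-sentiment score in [-1, 1] and the
# share-price change (%) that followed it.
sentiment = [-0.8, -0.4, 0.0, 0.3, 0.7]
price_change = [-1.5, -0.9, 0.1, 0.5, 1.6]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-products and sum of squared deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_line(sentiment, price_change)

# Insight: apply the fitted (historical) pattern to today's hypothetical sentiment score.
todays_sentiment = 0.5
predicted = intercept + slope * todays_sentiment
print(f"slope={slope:.2f}, predicted change={predicted:.2f}%")
```

The fitted line summarises the historical knowledge. Evaluating it for a new observation is the step that turns that knowledge into a prediction about the future.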
Decisions / Actions
What should I do?
Insight allows us to make predictions about the future, but it does not directly tell us what to do. If we can safely predict that it will rain tomorrow, we still have to decide to wear a raincoat or carry an umbrella.
Of course, the decisions organisations need to make are far more complex, and there are often many different courses of action that could be taken. To make the best possible decision, an organisation must be able to predict the effect each possible decision will have within its environment, then optimise by choosing the decision with the highest expected return and acting on it.
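When outcomes are uncertain, one common way to formalise "choosing the decision with the highest return" is expected-value maximisation. Each outcome's payoff is weighted by its predicted probability, and the action with the largest total is chosen. The sketch below applies this to the umbrella example. The rain probability and the payoff numbers are hypothetical.

```python
# Probability of rain tomorrow, as a predictive model might report it (assumed value).
P_RAIN = 0.7

# Hypothetical payoffs (higher is better) for each action under each outcome.
PAYOFFS = {
    "carry umbrella": {"rain": 5, "dry": -1},
    "leave umbrella": {"rain": -10, "dry": 2},
}


def expected_return(action: str, p_rain: float) -> float:
    """Probability-weighted payoff of an action."""
    payoff = PAYOFFS[action]
    return p_rain * payoff["rain"] + (1 - p_rain) * payoff["dry"]


def best_action(p_rain: float) -> str:
    """Choose the action with the highest expected return."""
    return max(PAYOFFS, key=lambda action: expected_return(action, p_rain))


for action in PAYOFFS:
    print(action, round(expected_return(action, P_RAIN), 2))
print("decision:", best_action(P_RAIN))
```

With these payoffs, the two actions' expected returns differ by 18p - 3, where p is the probability of rain. Carrying the umbrella therefore wins whenever p exceeds 1/6. The prediction (p) and the decision rule stay separate, which mirrors the step from insight to action.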
An Example: Self-driving vehicles
Our discussion so far may seem very abstract. Let’s make it more concrete with an example. Consider self-driving vehicles, which need to make decisions continually to drive safely on our roads.
Data. There is a continuous stream of data from a number of sensors, including cameras, lidar, GPS, speed sensors, etc.
Information. Algorithms may correlate data from different sensors and detect the presence of entities of interest, such as other vehicles, pedestrians and road infrastructure. Data has been transformed into information.
Knowledge. Analysis of historical information streams reveals certain repeating patterns. For example, the presence of a stop sign is strongly correlated with nearby vehicles slowing down and stopping. By detecting thousands of such patterns, the system builds a knowledge base that collectively describes the rules of the road.
Insight. A self-driving vehicle may detect a stop sign in the distance, as well as a vehicle ahead of it. Using its accumulated knowledge, it may now predict that the vehicle ahead is likely to slow down and come to a stop.
Decisions/Actions. Based on the prediction that the car ahead will slow down and stop, the self-driving vehicle decides that it should itself reduce speed to maintain a safe following distance. It then acts on this decision by applying force to the brake system.
This is, of course, a greatly oversimplified description (in reality, self-driving vehicles rely on a large number of complex neural networks), but it should give you a good idea of the data transformation process.
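The stop-sign walk-through can be condensed into a toy Python pipeline in which each function corresponds to one stage. Everything here is hypothetical and drastically simplified. `camera_labels` stands in for the output of an object detector. The 0.95 stop probability stands in for a pattern mined from historical drives. The brake value is an arbitrary command level. Real self-driving stacks are not built this way.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Entities a hypothetical upstream detector can report.
KNOWN_ENTITIES = {"stop_sign", "pedestrian", "traffic_light"}

# Knowledge: a pattern assumed to have been learned from historical drives
# (hypothetical figure): how often a lead vehicle stops when a stop sign is ahead.
P_LEAD_STOPS_GIVEN_STOP_SIGN = 0.95


@dataclass
class SensorFrame:
    """Data: one heavily simplified snapshot of sensor output."""
    camera_labels: List[str] = field(default_factory=list)
    lead_vehicle_distance_m: Optional[float] = None  # from lidar; None if no vehicle ahead


def extract_information(frame: SensorFrame) -> Set[str]:
    """Information: turn raw readings into a set of entities of interest."""
    entities = set(frame.camera_labels) & KNOWN_ENTITIES
    if frame.lead_vehicle_distance_m is not None:
        entities.add("lead_vehicle")
    return entities


def predict_lead_vehicle_stops(entities: Set[str]) -> bool:
    """Insight: apply the stored pattern to the current situation."""
    if {"stop_sign", "lead_vehicle"} <= entities:
        return P_LEAD_STOPS_GIVEN_STOP_SIGN > 0.5
    return False


def decide(frame: SensorFrame) -> str:
    """Decision: slow down if the vehicle ahead is expected to stop."""
    if predict_lead_vehicle_stops(extract_information(frame)):
        return "reduce_speed"
    return "maintain_speed"


def act(decision: str) -> float:
    """Action: map the decision to a hypothetical brake command in [0, 1]."""
    return 0.4 if decision == "reduce_speed" else 0.0


frame = SensorFrame(camera_labels=["stop_sign"], lead_vehicle_distance_m=25.0)
decision = decide(frame)
print(decision, act(decision))  # prints: reduce_speed 0.4
```

Keeping each stage in its own function makes the flow from data to action explicit. It also lets any single stage, such as the prediction step, be replaced by a learned model without touching the others.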
Advanced Analytics and Machine Learning
How do we transform data into information, knowledge, insight and, finally, decisions? This is the domain of advanced analytics and machine learning.
Descriptive and diagnostic analytics are applied to data to transform it into information and knowledge. These range from simple statistical summaries of data to unsupervised machine learning techniques that cluster information to extract repeating patterns. Predictive and prescriptive analytics are used at the higher end of the transformation process to gain insight and make decisions, respectively. Here, machine learning algorithms based on supervised or reinforcement learning are often very effective.
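Clustering is one example of the unsupervised techniques mentioned above. The minimal sketch below runs Lloyd's algorithm (the standard k-means procedure) on one-dimensional data with k = 2. The daily sales figures and the starting centroids are hypothetical. In practice you would more likely use a library implementation such as scikit-learn's `KMeans`.

```python
from statistics import mean
from typing import List, Tuple


def kmeans_1d(values: List[float], centroids: List[float],
              max_iterations: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's algorithm for 1-D data, starting from caller-supplied centroids."""
    labels: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: label each value with the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = []
        for k, centroid in enumerate(centroids):
            members = [v for v, label in zip(values, labels) if label == k]
            updated.append(mean(members) if members else centroid)
        if updated == centroids:  # converged: the assignments will no longer change
            break
        centroids = updated
    return centroids, labels


# Hypothetical daily sales, Monday to Sunday.
sales = [120, 115, 130, 125, 140, 210, 190]
centroids, labels = kmeans_1d(sales, centroids=[100.0, 200.0])
print(centroids, labels)
```

The algorithm is never told which days are weekends. Even so, it separates the five lower values from the two higher ones, so a weekday/weekend pattern emerges from the data itself.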
To turn data into decisions, you need effective data pipelines in place. The need for advanced analytics and machine learning is clear: without them, you may not be making optimal decisions.
To conclude, data can be an asset in your organisation, but only if you transform it into meaningful decisions. Use your data wisely!