#1 - From Roger the shepherd to Data Governance
This article explains the WHAT and the WHY behind Data Governance, with the help of Roger the shepherd.
Hello world,
you've probably heard of data, but have you ever heard of Data Governance?
Data, a natural-born-human tool
Data, the Latin plural of datum ("something given"), is a tool created by human beings to extract added value from a given phenomenon. To be honest, data existed long before the internet and computers.
Centuries ago, Roger, a shepherd from France, used to count (Roger’s data processing) his sheep (Roger’s data) on a day-to-day basis to be aware of the impact of a predator. This habit led him to hire Grichka, a hunter from elsewhere, who was able to minimize the loss of animals over time.
In those ancient times, data was siloed locally (mostly in Roger’s brain), which greatly limited the risk of data leakage.
Then the internet appeared, allowing human beings to create more data and share it easily, interfacing with machines and other connected peers.
In these more recent times, data became less siloed, passing between a few computers. The main drawback of this technology was, as usual, human error and lack of knowledge. This is why a few security holes (a cause of data leaks) became visible to anyone passing by: Edward Snowden noticed one on the website of the Los Alamos National Laboratory (LANL) when he was a kid [1].
Big Data, or the moment when Roger got a smartphone
Then came our times, the era of smartphones, the cloud and the Internet of Things (IoT). At this point, each human being and IoT sensor (your iSomething, your iFridge…) started to generate a lot (like, A LOT) of data anywhere, anytime. And all this data went, almost de-siloed, into the cloud ☁️, a network of computers located somewhere and offering huge storage capacities.
In 2021, Roger (yes, he’s an everlasting super shepherd) has a smartphone along with an internet plan. This means he creates data in the cloud. He does so when posting a photo of his favorite meal on a social network, when using a GPS application to drive to his best friend's house or when paying by credit card for that great Irouléguy AOP 2009 (a French wine from the Basque Country).
Most of these phenomena occur to make something useful appear on the screen of Roger’s smartphone / laptop / tablet. This is made possible because data is generated everywhere, collected somewhere, and processed elsewhere for further publication (this is the data life cycle [2]).
From Roger to you and me, data has always been used:
“to highlight (lenses) and act (levers) on a given phenomenon” [3]
These huge amounts of data gathered via the internet from smartphones, laptops and IoT sensors are called Big Data.
But, as with any new technology, Big Data came along with a few drawbacks: data leaks and unwanted data usages.
Big Data without governance = chaos
As an analogy, data can be compared to a photo of something, taken at a given place and a given time.
However, when Roger takes a photo of Shaun, his sheep, too quickly, the picture comes out blurred; the same goes for data whose meaning has not been clearly defined.
Also, Roger cannot guess where his parents’ old paper photo was taken when the location hasn’t been noted on the back. In the same way, data can be enigmatic if its source is not known.
Just as one of the first digital photos now looks ugly on a high-definition screen (ask Roger to try opening a photo he took 20 years ago with his first cellphone camera on his new smartphone!), the quality of a given piece of data can decrease over time and needs to be assessed according to the context it is used in.
The problem is, decisions are taken every day based on blurred, unsourced and unqualified data.
Whereas Roger didn’t need to follow norms and best practices to count his sheep securely centuries ago, Data Governance emerged more recently to address the following issues in the era of Big Data, Cloud and Artificial Intelligence (AI): data leaks, unwanted data usages, data meaning, data sourcing and data quality.
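For readers who like to see things in code, here is a minimal sketch of how the three photo problems above (blurred meaning, unknown source, unassessed quality) could be checked on a piece of data. Everything in it is an assumption made for illustration: the `DataAsset` record, its field names, the `governance_gaps` helper and the one-year freshness threshold do not come from any standard or tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DataAsset:
    """Hypothetical metadata record describing one piece of data."""
    name: str
    definition: Optional[str] = None           # meaning: what the data represents
    source: Optional[str] = None               # sourcing: where the data comes from
    quality_checked_on: Optional[date] = None  # quality: date of the last assessment


def governance_gaps(asset: DataAsset, today: date, max_age_days: int = 365) -> List[str]:
    """List what is missing before the asset can be trusted for a decision."""
    gaps = []
    if not asset.definition:
        gaps.append("blurred: no definition")
    if not asset.source:
        gaps.append("unsourced: no source")
    if asset.quality_checked_on is None:
        gaps.append("unqualified: quality never assessed")
    elif (today - asset.quality_checked_on).days > max_age_days:
        # The threshold is an arbitrary illustrative choice.
        gaps.append("unqualified: quality assessment is stale")
    return gaps


# Roger's sheep count has a meaning and a source, but nobody assessed its quality.
sheep_count = DataAsset(
    name="sheep_count",
    definition="Number of sheep counted at dusk",
    source="Roger",
)
print(governance_gaps(sheep_count, today=date(2021, 6, 1)))
```

A real organization would hold this kind of metadata in a data catalog rather than in a script, but the questions asked of each piece of data stay the same.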
Data Governance, from a boat to trust
The word "govern" comes from the Latin gubernare (and the Greek kubernân), referring to steering a ship [4].
According to Gartner, Data Governance is:
“The specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption and control of data and analytics.” [5]
Also, Data Governance can be described as:
“An opportunity to create a trust relationship with the Client.” [6]
The variety of definitions available for Data Governance shows how hard it can be to contextualize Data Governance in your relationship with data.
Fortunately, a more consensual definition is given at the end of this article.
WHY govern Data?
Don’t you know? Like Roger did for his sheep, you already govern the food in your fridge, your files in folders or your clothes in a wardrobe!
In this last example, your wardrobe governance lets you quickly access the right clothes (sportswear, pyjamas, …) for the right usage (remote working, remote working, …).
Now imagine you could access those classy clothes you’ve always dreamed of, without buying them for life, at the exact time you need them (e.g. your return to the office).
To make this happen, you would first need some Persons (either Natural Persons or Legal Persons) to share their clothes with you. Then, as a potential consumer, you would need to know that these clothes exist, along with their availability and description (brand, size, …). And finally, you would need to know the quality of these clothes, as rated by previous consumers of these same clothes.
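The clothes-sharing scenario above maps quite naturally onto a small catalog. The sketch below is only an illustration under assumed names (`CatalogEntry`, `find_available` and the sample entries are all hypothetical): owners publish items, consumers search for what exists and is available, and past ratings hint at quality.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalog entry for one shared item (clothes here, data in real life)."""
    owner: str                 # the Person who shares the item
    description: str           # brand, size, ...
    available: bool = True     # can it be borrowed right now?
    ratings: List[int] = field(default_factory=list)  # scores left by previous consumers

    def average_rating(self) -> Optional[float]:
        """Average score, or None when nobody has rated the item yet."""
        return mean(self.ratings) if self.ratings else None


def find_available(catalog: Dict[str, CatalogEntry], keyword: str) -> List[str]:
    """Return the names of available items whose description mentions the keyword."""
    return [
        name
        for name, entry in catalog.items()
        if entry.available and keyword.lower() in entry.description.lower()
    ]


catalog = {
    "navy_suit": CatalogEntry(owner="Roger", description="Navy suit, size M", ratings=[5, 4]),
    "grey_suit": CatalogEntry(owner="Grichka", description="Grey suit, size L", available=False),
}

for name in find_available(catalog, "suit"):
    print(name, catalog[name].average_rating())
```

Replace the suits with datasets and the same three questions remain: does it exist, can I use it, and can I trust it?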
In a few words, Data Governance can be defined as follows:
In the era of Big Data, Cloud infrastructures and Artificial Intelligence (AI) capabilities, Data Governance is the set of norms (what you have to do), best practices (what you had better do) and accountabilities (who does what) that help organizations create more value from data, while complying with security policies and regulations regarding personal data collection and processing (like GDPR or CCPA).
Additionally, governing data should minimize data leaks, prevent misuse of data (by clarifying its meaning), avoid some legal risks (due to a lack of compliance around data processing and usage) and spare some work overload (the time wasted searching for the meaning, source and quality of that data appearing on that crucial dashboard at your Monday meeting with your boss).
Maybe next time?
Good news! Algorithms are mostly not yet capable of this kind of data contextualization (except GPT-3, but that’s another story). We, as human beings, still have key skills to contribute!
But to achieve this goal, organizations need to make people more willing to share and consume data, and this is where the trust relationship comes in.
Don’t miss the next article to understand who these people are, i.e. the beneficiaries of Data Governance, along with their expectations.
#StayCurious
Gartner (IT Glossary)
Julien Levy, Associate Professor (HEC Paris, 2019)