Data Observability has suddenly become a hot topic following the recent $135m raise by Data Observability platform company Monte Carlo and IBM’s acquisition of Databand.ai.
Envitia’s CEO, Nabil Lodey, looks at this growing sector and its relevance to private and public sector organisations.
What is Data Observability?
With an exponential growth in data and cloud technology designed to process huge volumes and types of data, there is now the question of trust in the data itself. This is especially so for the quality of data that is ingested into analytics and AI & ML applications.
Using the old “garbage in, garbage out” analogy, the billions spent on data and data infrastructure risk a poor ROI if the data itself cannot be relied upon to improve decision-making.
Data Observability enables organisations to trust their data. It measures and monitors the health of data, provides an insight into the quality of data, creates alerts when anomalies are identified, and then resolves data issues. Or at least, it resolves the priority issues that relate to an organisation’s decision-making capability.
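To make the measure, monitor and alert loop concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any vendor's product: the function name `check_health`, the `Alert` class, the three checks chosen (completeness, freshness, volume) and all thresholds are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, stdev


@dataclass
class Alert:
    check: str   # which health check raised the alert
    detail: str  # human-readable description of the problem


def check_health(rows, required_fields, last_updated, now, daily_row_counts,
                 max_null_rate=0.05, max_age=timedelta(hours=24),
                 z_threshold=3.0):
    """Run three illustrative health checks and return any alerts.

    All thresholds are hypothetical defaults; a real deployment would
    tune them to what the organisation considers fit for purpose.
    """
    alerts = []

    # Completeness: share of rows where a required field is missing.
    for field in required_fields:
        missing = sum(1 for row in rows if row.get(field) is None)
        null_rate = missing / len(rows) if rows else 1.0
        if null_rate > max_null_rate:
            alerts.append(Alert("completeness",
                                f"{field}: {null_rate:.0%} of values missing"))

    # Freshness: has the dataset been updated recently enough?
    age = now - last_updated
    if age > max_age:
        alerts.append(Alert("freshness", f"last updated {age} ago"))

    # Volume: is the latest daily row count an outlier against history?
    if len(daily_row_counts) >= 3:
        history, latest = daily_row_counts[:-1], daily_row_counts[-1]
        spread = stdev(history)
        if spread > 0 and abs(latest - mean(history)) / spread > z_threshold:
            alerts.append(Alert("volume",
                                f"row count {latest} deviates from recent history"))

    return alerts


if __name__ == "__main__":
    sample_rows = [{"id": 1, "postcode": "AB1"}, {"id": 2, "postcode": None}]
    for alert in check_health(
        rows=sample_rows,
        required_fields=["id", "postcode"],
        last_updated=datetime(2022, 1, 1),
        now=datetime(2022, 1, 3),
        daily_row_counts=[100, 102, 98, 101, 10],
    ):
        print(alert.check, "-", alert.detail)
```

This sketch only covers detection. Resolving the issues it raises is the harder part, and it depends on how each organisation prioritises its data.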
Is this another buzzword?
One phrase I read that I really like is “Good decisions made on bad data are just bad decisions you don’t know about... yet”. That’s so true.
Data Observability isn’t the catchiest title, but it does summarise a set of activities required to focus on data quality that aren’t captured elsewhere in the market. Having a separate sector focuses an organisation’s attention on how it needs to fulfil its obligation towards accurate data. Unfortunately, many organisations don’t measure the cost of missing data and bad data until things go terribly wrong. These are costly mistakes that could have been avoided.
What does Data Observability involve?
There are different types of observability across data compute infrastructure, data quality and data pipelines, and each has the ability to monitor, assess, react and then take recommended actions to resolve issues. It all depends on how an organisation uses data and how it defines “fit for purpose”, both now and in the future. Data observability will evolve as data pipelines get more complex.
Does this sector lend itself to a product or services?
The data ecosystem is still unique to every organisation, but there is a role for products as part of a wider data stack. Any claim that one software vendor can do everything often turns out to be false and loses credibility. If an organisation knows exactly what to look for, then a product that has a very specific purpose, and can be easily integrated, will enhance its internal data capability.
For monitoring, there are plenty of great products that can look at pipeline health and analytics to raise alerts. But for something like complex data quality, particularly the need to resolve issues, I would see this as more of a services-based offering at the moment. As more services can be automated, we will see products evolving in this sector.
For example, at Envitia, we have a data catalogue (Envitia’s Data Discovery Platform) but we offer this to our customers as “tech-led services”. Each of our customers is very different, particularly in the public sector, so we need a team of data experts as a services wrapper around the core product, especially when it comes to data modelling and data quality. They each have unique security requirements, so it’s important we can customise this capability for them.
What exactly does the Envitia Data Discovery Platform provide for your customers?
Our DDP is a metadata catalogue that has advanced data catalogue capabilities from a static product to a more dynamic one. The DDP harvests metadata, allows a user to search for data across their enterprise, and provides an Amazon-like suggestion service that finds related datasets based on what they, or similar users, have looked at before. So if a user looks for something, they end up with a grouping of related datasets. What we find exciting is the relationships between data, which help users find insights they wouldn’t otherwise know exist, particularly around location.
The platform already knows the source of the data for lineage monitoring, and we can control security and access depending on user permissions. We can also assess and rectify data quality and trustworthiness to provide a single source of truth that is consistently monitored and updated as an ongoing process. Trusted data can then be used, enriched and reused by multiple applications, getting the most value for money. One feature that is important for large organisations is that we focus on metadata, so the data itself can remain in their legacy systems, reducing the risk and cost of moving large quantities of data. This also enables future M2M applications, which are all based on metadata.
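As a rough illustration of how a "users who viewed this also viewed" suggestion can work, here is a minimal co-view sketch in Python. It is not Envitia's implementation, which the article does not describe. The function names `build_co_views` and `suggest_related`, and the dataset names in the example, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_co_views(view_history):
    """Count how often each pair of datasets was viewed by the same user.

    view_history maps a user name to the list of datasets they viewed.
    """
    co_views = {}
    for datasets in view_history.values():
        # Each unordered pair viewed by one user counts once for that user.
        for a, b in combinations(sorted(set(datasets)), 2):
            co_views.setdefault(a, Counter())[b] += 1
            co_views.setdefault(b, Counter())[a] += 1
    return co_views


def suggest_related(dataset, co_views, limit=3):
    """Return the datasets most often viewed alongside the given one."""
    neighbours = co_views.get(dataset, Counter())
    # Sort by descending co-view count, then by name for a stable order.
    ranked = sorted(neighbours.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


if __name__ == "__main__":
    history = {
        "alice": ["flood_zones", "postcodes", "roads"],
        "bob": ["flood_zones", "postcodes"],
        "carol": ["roads", "rail"],
    }
    co_views = build_co_views(history)
    print(suggest_related("flood_zones", co_views))
```

A production catalogue would typically also weigh metadata similarity, such as shared locations or subjects, and would filter suggestions by the user's access permissions.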
What’s next for Data Observability?
Data quality and data modelling are still manual processes, so automating and standardising data governance activity will pay dividends across many sectors. Organisations are looking for secure data that’s well managed, maintains its usefulness and can be accessed easily, with as many automated processes as possible.
Like anything else in data, the key to value for money is to get ahead of the game: moving from reactive measures with a time lag, to real-time resolution of issues, to predictive and preventative measures.
After that it’s about doing it efficiently and optimising costs across the data stack and data pipelines so an organisation can focus on the data that is critical for its operation.
By Nabil Lodey
CEO of Envitia