Data quality is a broad term referring to the measurement and maintenance of the condition of data as it travels across multiple locations in your systems. You can assess data quality in terms of accuracy, consistency, relevance, and reliability.
You know that data quality is considered one of the pillars of a data management framework. The question is, why? Why is it so important for organizations to ensure the consistency of incoming data from various source applications?
NASA launched the space shuttle Challenger on January 28, 1986. It exploded 73 seconds into its flight, claiming the lives of all seven astronauts aboard. The widely accepted cause was an O-ring, a circular gasket, that failed under low temperature and high pressure at the time of launch.
Engineers had raised concerns about the O-ring almost nine months before the launch. The effect of low temperature on the gasket was documented, and at one point the manufacturer had declined to certify the shuttle as ready for launch.
This information was stored in different, disconnected databases. While one database identified the O-ring as critical, another marked it as redundant.
Inconsistent data, reliance on incomplete records, and lack of shared access led someone to close the O-ring case without the problem being fixed, and NASA suffered a grievous disaster.
We have discussed a data quality disaster at nearly its worst. Now let us see how this translates to a corporate scenario, where the stakes are quite different.
According to a report by Gartner, organizations lose $10 million to $14 million every year due to poor data quality. A report by Integrate states that 40% of all leads contain inaccurate data. These numbers are not pretty, but they are true.
Let us say you want to create a financial report accounting for the cost of corporate travel. You import data from multiple source applications. Someone booked a trip via an agency, and you rightly take the bill into account. The same expenditure also appears in the credit card feed, and you include that in the report as well. You have fallen victim to a classic case of duplicate data and end up with an inflated, inaccurate report.
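The duplicate-expense trap above can be sketched in a few lines of Python. The record fields (date, amount, vendor) and the matching rule are illustrative assumptions, not a prescribed schema; real deduplication logic would be more forgiving about near-matches:

```python
# Illustrative records: the same trip shows up in both the agency billing
# export and the credit card feed (vendor casing differs between systems).
agency_bills = [
    {"date": "2024-03-01", "amount": 542.10, "vendor": "Acme Travel"},
]
card_feed = [
    {"date": "2024-03-01", "amount": 542.10, "vendor": "ACME TRAVEL"},
    {"date": "2024-03-05", "amount": 120.00, "vendor": "Airport Taxi"},
]

def expense_key(record):
    """Normalize a record into a comparable key: same day, same amount,
    case-insensitive vendor name."""
    return (record["date"], round(record["amount"], 2), record["vendor"].lower())

# Keep the first occurrence of each key and drop later duplicates.
seen = set()
unique = []
for record in agency_bills + card_feed:
    key = expense_key(record)
    if key not in seen:
        seen.add(key)
        unique.append(record)

total = sum(r["amount"] for r in unique)
print(round(total, 2))  # 662.1 -- not the inflated 1204.2
```

The design point is that deduplication needs a normalized comparison key; comparing raw records field-by-field would miss the duplicate because the vendor strings differ in case.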
Duplicate data is only one facet of poor data quality; there are several others. Let us look at a few of them.
The systematic storage of customer information, including but not limited to contact data, is crucial for a business to reach and engage its potential customer base. The challenge lies in the fact that every department of the business touches customer information.
The data is collected and updated at many touchpoints, such as order placement, cross-selling, and social media campaigns. It is hard to maintain the accuracy and consistency of the data throughout a customer's lifecycle.
Traditional relational databases often struggle to cope with the volume, velocity, and variety of data collected from diverse sources. Small mistakes creep in and create compound problems as the erroneous data travels through ERP, CRM, and other transactional systems.
Even if you consider only alphanumeric data and completely disregard unstructured data, format remains a pain point for data analysts. A small detail like the date of a transaction can be documented in multiple formats, which creates hurdles when finding and accessing data.
If you operate in a multinational setup, you are likely all too familiar with this particular challenge. Accounts can go south quickly if you mix up currencies and different units of measurement.
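One common safeguard is to keep every amount tagged with its currency so that values are never summed blindly. A minimal sketch, with made-up placeholder exchange rates chosen purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class Money:
    """An amount that always carries its currency with it."""
    amount: float
    currency: str

# Placeholder rates to a common reporting currency (USD) -- illustrative
# values only, not real market rates.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.08, "BRL": 0.20}

def to_usd(m: Money) -> float:
    """Convert an amount to the reporting currency before aggregation."""
    return m.amount * RATES_TO_USD[m.currency]

expenses = [Money(100.0, "USD"), Money(50.0, "EUR")]
total_usd = sum(to_usd(m) for m in expenses)
print(round(total_usd, 2))  # 154.0
```

Because the currency travels with the number, adding 100.0 and 50.0 directly is impossible without an explicit conversion step, which is precisely the class of mistake the prose warns about.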
In the case of language, special characters or accent marks can create a lot of trouble if the system is not configured to handle them. For instance, a computer might not recognize that Sao Paulo and São Paulo represent the same place.
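One common mitigation for the accent-mark problem is to compare strings only after Unicode normalization, stripping the combining marks for fuzzy matching. A small sketch using Python's standard library:

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop combining marks, so that
    'São Paulo' and 'Sao Paulo' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("São Paulo") == strip_accents("Sao Paulo"))  # True
```

This is deliberately lossy: it is suitable for matching and deduplication, but the accented original should still be stored and displayed to the customer.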
Manually entered data will contain typos, incomplete records, and format inconsistencies. These errors are hard to escape regardless of how careful your employees are. The way out, of course, is to minimize manual entry.
Data management aims to collect, store, maintain, and use data securely, efficiently, and cost-effectively. By now, you are familiar with the hurdles poor data quality can create on the path to successful data management.
Gartner and D&B have done a great job of assigning a monetary value to data. According to their reports, the cost of fixing inaccurate data can exceed the cost of preventing those inaccuracies by a factor of ten.
Laying down rules and policies for data quality is an ongoing process. They have to be updated as the nature of the data changes and as the regulations governing fair usage of that data evolve. It is nearly impossible to manage metadata, governance frameworks, or data as a whole without formulating robust data quality rules.
At Incept, we have been implementing data management frameworks for a decade. We have seen, encountered, and dealt with a long list of complex data quality issues, and we would love to share our experience and expertise in the field.