Tackling Data Quality

As companies try to realize the value locked away in the vast amounts of data they generate and store every day, an old problem is fast becoming a newly vexing issue: data quality.

Specifically, how do you know what data to use when, say, looking for a total sales figure for a business customer that has five different divisions and multiple corporate brands?

Some business units in your organization may have referred to GM as Chevrolet, GMC, or Buick for years because those are the only divisions they do business with, while others, in accounting for instance, simply record GM as GM because that is what works for them, and so on.
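To make the problem concrete, here is a minimal sketch of that kind of roll-up in Python. The record layouts, figures, and the alias-to-canonical mapping are hypothetical, stand-ins for what a master data management layer would actually maintain.

```python
# Minimal sketch: rolling up sales for one corporate customer whose divisions
# are recorded under different names. All figures and record layouts are
# hypothetical, for illustration only.
from collections import defaultdict

# Each business unit records the customer under the name it actually deals with.
sales_records = [
    {"customer": "Chevrolet", "amount": 120_000},
    {"customer": "GMC",       "amount": 75_000},
    {"customer": "Buick",     "amount": 40_000},
    {"customer": "GM",        "amount": 200_000},  # accounting books it as GM
]

# A master data layer maintains a mapping from each alias or division name
# to a single canonical customer.
canonical_customer = {
    "Chevrolet": "General Motors",
    "GMC": "General Motors",
    "Buick": "General Motors",
    "GM": "General Motors",
}

totals = defaultdict(float)
for record in sales_records:
    # Without the mapping, the same customer would show up as four separate totals.
    key = canonical_customer.get(record["customer"], record["customer"])
    totals[key] += record["amount"]

print(dict(totals))  # {'General Motors': 435000.0}
```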

This can be costly, said Bob Hagenau, co-founder and VP of Product Management and Corporate Development for Purisma, a master data management vendor.

“A lot of times (companies) can’t track entitlements (for example) and they end up giving free support out that can cost them millions and millions of dollars,” he said. “… you’ve got all these silos of customer information that can’t be integrated and don’t allow you to have a complete understanding of that customer.”

While these disparities are nothing new, the push today to mine data for new opportunities, regulatory compliance, and trends and patterns in customer behavior (such as how much they buy from you—or you from them—in a given year) is bringing this thorny issue to the fore.

“When you talk about it at a high level it sounds like it ought to be easy,” said Philip Russom, senior manager of Research and Services at TDWI: The Data Warehousing Institute. “But the truth of the matter is different applications take a very different view of the customer and require different pieces of information about the customer.”

With 90% of the data stored in corporations today concerning their customers, their products, or their financials, this is where the problem is most pervasive and the pain greatest, said Russom, who authored Taking Data Quality to the Enterprise through Data Governance, a March report on the issue.

But it isn’t that the data itself is necessarily bad or inaccurate; it is the definition of the data, the metadata, across the enterprise that causes the most consternation, said Majid Abai, president and CEO of Seena Technologies, an enterprise information management and architecture consulting firm.

When metadata doesn’t agree, the underlying information is hard to use from one application or division to the next. “There are several levels of the problem,” said Abai. “Number one is the definition of data, metadata, does not match from one business unit to another.”
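Here is a minimal sketch of what such a mismatch can look like, assuming two hypothetical business units that expose a field with the same name but different currencies and tax treatments behind it; the field names, rates, and values are invented for illustration.

```python
# Minimal sketch of a metadata mismatch: the same field name, two definitions.
# Field names, units, rates, and values are hypothetical.

# Unit A: "order_value" means gross amount, including tax, in US dollars.
unit_a_order = {"customer": "GM", "order_value": 108_000}   # $100,000 + 8% tax

# Unit B: "order_value" means net amount, excluding tax, in thousands of euros.
unit_b_order = {"customer": "GM", "order_value": 95}        # EUR 95,000 net

# A naive consolidation treats the two fields as the same thing ...
naive_total = unit_a_order["order_value"] + unit_b_order["order_value"]
print(naive_total)  # 108095 -- a figure with no meaningful definition at all

# ... whereas an enterprise-wide metadata definition forces both units to agree
# on currency, tax treatment, and scale before the values are combined.
EUR_TO_USD = 1.25            # assumed conversion rate, for illustration
TAX_RATE = 0.08              # assumed tax rate used by Unit A

net_usd_a = unit_a_order["order_value"] / (1 + TAX_RATE)
net_usd_b = unit_b_order["order_value"] * 1_000 * EUR_TO_USD
print(round(net_usd_a + net_usd_b))  # 218750 -- net order value in US dollars
```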

The second is IT’s obsession over the years with applications rather than data; garbage in, garbage out, as the expression goes, is as much an issue as ever. And the third isn’t poor data quality at all, but the copying of data from one application to another instead of drawing on a shared pool so that every application is parsing the same numbers.
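A minimal sketch of that third issue, contrasting applications that keep their own copies of a record with applications that read from a shared pool; the customer record and its fields are hypothetical.

```python
# Minimal sketch: copied data drifts apart, while applications reading from a
# shared pool always see the same numbers. Record contents are hypothetical.
import copy

shared_pool = {"GM": {"credit_limit": 500_000}}

# Pattern 1: each application copies the data at integration time.
billing_copy = copy.deepcopy(shared_pool["GM"])
support_copy = copy.deepcopy(shared_pool["GM"])

# A later update reaches the source of record and only one of the copies ...
shared_pool["GM"]["credit_limit"] = 750_000
billing_copy["credit_limit"] = 750_000

# ... so the applications now disagree about the same customer.
print(billing_copy["credit_limit"], support_copy["credit_limit"])  # 750000 500000

# Pattern 2: each application reads from the shared pool on demand, so every
# application parses the same numbers by construction.
def credit_limit(customer: str) -> int:
    return shared_pool[customer]["credit_limit"]

print(credit_limit("GM"), credit_limit("GM"))  # 750000 750000
```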