O’Reilly news

Organizations Struggle With Data Consistency, Governance, and Bias, Finds Recent O'Reilly Research

February 12, 2020

Upcoming O’Reilly Strata Data and AI Conference to discuss the problem of data quality and how organizations can improve usage and analysis

BOSTON, February 12, 2020 – O’Reilly, the premier source for insight-driven learning on technology and business, today announced the results of research into the state of data quality in 2020. The “O’Reilly State of Data Quality in 2020” report reveals concerns around data quality and uncertainty about how best to address those concerns in the enterprise.

Key findings include:

  • There are too many data sources – and little consistency: when asked to share the primary data quality issues they face, more than 60% of respondents cited “too many data sources and inconsistent data.” This was followed by “disorganized data stores and lack of metadata” (50%) and “poor data quality controls at data entry” (47%).
  • Organizations are dealing with several data quality problems at the same time: 71% of respondents reported at least three data quality issues, and 56% reported at least four.
  • Data governance best practices are not being adhered to: 80% of respondents say their organizations do not publish information about data provenance or data lineage, which – along with robust metadata – are essential tools for correctly diagnosing and resolving data quality issues.
  • Few resources are currently available: 44% of respondents said that they had “too few resources available to address data quality issues.”
  • Use of Machine Learning (ML) and Artificial Intelligence (AI) to address data quality issues is growing: almost half (48%) of respondents say they are now using data analysis, ML, or AI tools to address data quality issues. This should help ease the shortage of resources, as ML and AI can help simplify and automate the tasks involved in discovering, profiling, and indexing data.

“These findings show the need for both better education and better data management and cataloging tools – those that generate metadata and capture and manage data provenance and lineage,” said Rachel Roumeliotis, Vice President, Content Strategy for O’Reilly, and conference co-chair. “While the research indicates a growing understanding from the C-level of the importance of data quality, there still needs to be a push to educate organizations about data quality, data governance, and general data literacy.”

Conducted in late 2019, the “O’Reilly State of Data Quality in 2020” report surveyed more than 1,900 professionals in the data industry. To download the full report, please visit: https://www.oreilly.com/radar/the-state-of-data-quality-in-2020/.

About O’Reilly

For 40 years, O’Reilly has provided technology and business training, knowledge, and insight to help companies succeed. Our unique network of experts and innovators shares knowledge and expertise through the company’s SaaS-based training and learning solution, O’Reilly online learning. O’Reilly delivers highly topical and comprehensive technology and business learning solutions to millions of users across enterprise, consumer, and university channels. For more information, visit www.oreilly.com.
