Business Automation and Software Blog

A Methodical Approach to Data Quality

Posted by Robert Baran on Thu, Jun 09, 2011 @ 04:24 PM

Accurate data is the driving force behind any successful computer software investment. It is crucial for businesses to take a strategic approach to data quality, knowing how to clean their data and how to keep it clean. A good place to start is to set specific goals that support ongoing functional operations, such as:

  • Improving data management processes, such as reducing the time it takes to process quarterly updates and reports
  • Ensuring compliance with regulatory standards and requirements
  • Cleansing and combining source systems into one master file

Next, when you are ready to develop a methodical strategy, there are several factors you should consider.

1. Context
What type of data is being cleansed, and what purpose does it serve? Defining your data up front will help you determine how it is used and how to keep it clean. Examples include customer information, financial data, and supply chain data. Once the context is determined, it can be matched against the appropriate type of cleansing methods to ensure consistency and accuracy.
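
As a rough illustration of matching context to a cleansing method, here is a minimal Python sketch that routes customer and financial records to different cleansing rules. The field names and rules are assumptions made for illustration, not part of the original strategy.

  import re

  # Illustrative cleansing rules; the field names and rules are assumptions.
  def clean_customer(record):
      """Normalize customer contact fields."""
      record["email"] = record.get("email", "").strip().lower()
      record["phone"] = re.sub(r"\D", "", record.get("phone", ""))  # keep digits only
      return record

  def clean_financial(record):
      """Round monetary amounts and standardize currency codes."""
      record["amount"] = round(float(record.get("amount", 0)), 2)
      record["currency"] = record.get("currency", "USD").upper()
      return record

  # Match each data context to the appropriate cleansing method.
  CLEANSERS = {"customer": clean_customer, "financial": clean_financial}

  def cleanse(record, context):
      return CLEANSERS[context](record)

  print(cleanse({"email": " Jane@Example.COM ", "phone": "(555) 123-4567"}, "customer"))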

2. Storage
Considering storage as a data quality factor ensures that the physical storage medium is included in the overall data quality strategy. For example, if the data resides in an enterprise application, the type of application (CRM, ERP, and so on) will dictate the connectivity options to the data. Connectivity options between the data and data quality function generally fall into three categories:

  1. Data Extraction
    This occurs when the data is copied out of the host system, cleansed (typically in a batch operation), and then reloaded into the host (see the sketch after this list).
  2. Embedded Procedures
    This is the opposite of extraction: data quality functions are embedded into the host system, and custom-coded stored procedures invoke them. This approach is used when the strategy calls for the utmost customization and the tightest integration with the operational environment.
  3. Integrated Functionality
    This lies between data extraction and embedded procedures. Through specialized, vendor-supplied links, data quality functions are integrated into enterprise information systems. A link allows for quick, standard integration with seamless operation and can run in either transactional or batch mode.
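
To make the first option concrete, below is a minimal Python sketch of the extract-cleanse-reload pattern, assuming the host data sits in a SQLite table; the database, table, and column names are invented for illustration.

  import sqlite3

  # Minimal sketch of the "Data Extraction" pattern: copy rows out of the host
  # system, cleanse them in a batch, and reload them. The database, table, and
  # column names are illustrative assumptions.
  def batch_cleanse(db_path="host_system.db"):
      conn = sqlite3.connect(db_path)
      cur = conn.cursor()

      # 1. Extract: copy the data from the host system.
      rows = cur.execute("SELECT id, customer_name, email FROM customers").fetchall()

      # 2. Cleanse: trim whitespace and normalize case in a batch operation.
      cleansed = [
          (name.strip().title(), email.strip().lower(), row_id)
          for row_id, name, email in rows
      ]

      # 3. Reload: write the cleansed values back into the host system.
      cur.executemany(
          "UPDATE customers SET customer_name = ?, email = ? WHERE id = ?",
          cleansed,
      )
      conn.commit()
      conn.close()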

3. Dataflow
This is the movement of your data: how it enters and moves through your organization. Mapping your dataflow identifies staging areas, each of which depicts a ‘freeze frame’ of the moving data target, indicating where the data is manipulated and whether its usage changes context.

This is important because it depicts access options to the data and catalogs its locations in the networked environment. Dataflow answers the question: “Within operational constraints, what are the opportunities to cleanse the data?” In general, opportunities fall into the following categories (a simple catalog sketch follows the list):

  1. Transactional Updates
  2. Operational Feeds
  3. Purchased Data
  4. Legacy Migration
  5. Regular Maintenance
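
As one way to catalog these entry points, the following Python sketch records, for each category, an assumed source system and the cleansing opportunity it presents; the source systems and modes listed are illustrative assumptions.

  # Illustrative catalog of dataflow entry points and the cleansing opportunity
  # each presents. The source systems and modes are assumptions for illustration.
  DATAFLOW_CATALOG = [
      {"entry": "Transactional Updates", "source": "order-entry screens",
       "opportunity": "real-time validation at capture"},
      {"entry": "Operational Feeds", "source": "nightly ERP export",
       "opportunity": "batch cleansing during load"},
      {"entry": "Purchased Data", "source": "third-party lists",
       "opportunity": "cleanse before merging into the master file"},
      {"entry": "Legacy Migration", "source": "retired CRM database",
       "opportunity": "one-time cleanse during conversion"},
      {"entry": "Regular Maintenance", "source": "periodic audits",
       "opportunity": "scheduled re-profiling and correction"},
  ]

  for stage in DATAFLOW_CATALOG:
      print(f"{stage['entry']}: {stage['source']} -> {stage['opportunity']}")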

4. Workflow
Workflow is the sequence of physical tasks necessary to accomplish a given operation. In this category, data quality operations usually fall into the following areas:

  1. Front-Office Transaction – Real-Time Cleansing
  2. Back-Office Transaction – Staged Cleansing
  3. Back-Office – Batch Cleansing
  4. Cross-Office Enterprise – Application Cleansing
  5. Continuous Monitoring and Reporting

If a data touch point is not protected with validation functions, defective data is captured, created, or propagated, depending on the nature of the touch point. An important action in the workflow factor is listing the various touch points to identify the locations where defective data can leak into your information stream.
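
As a rough sketch of protecting a touch point with a validation function, the Python example below rejects a record at capture time when required fields are missing or malformed; the field names and rules are assumptions for illustration.

  import re

  # Illustrative validation guard for a front-office touch point.
  # The field names and rules are assumptions, not from the original post.
  REQUIRED_FIELDS = ("customer_name", "email", "postal_code")
  EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

  def validate_at_capture(record):
      """Return a list of defects; an empty list means the record may enter the stream."""
      defects = [f"missing {f}" for f in REQUIRED_FIELDS if not record.get(f, "").strip()]
      if record.get("email") and not EMAIL_PATTERN.match(record["email"]):
          defects.append("malformed email")
      return defects

  record = {"customer_name": "Acme Corp", "email": "sales@acme", "postal_code": "60601"}
  problems = validate_at_capture(record)
  if problems:
      print("Rejected at touch point:", problems)  # defective data never enters the stream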

5. Stewardship
Data itself has no value except to communicate information to people. The people who manage the data processes are the data stewards. When evaluating the data stewardship factor for a new project, the following tasks need to be performed:

  1. Answer questions such as:
    1. Who are the stakeholders of the data?
    2. Who are the predominant user groups, and can a representative of each be identified?
    3. Who is responsible for the creation, capture, maintenance, reporting, distribution, and deletion of the data?
  2. Carefully organize requirements-collection sessions with the stakeholders. Tell these representatives any plans that can be shared, assure them that nothing is yet final, and gather input.
  3. Once a near-final set of requirements and a preliminary project plan are ready, reacquaint the stakeholders with the plan, but be open to changes.
  4. Plan to provide training on any new processes, data model changes, and updated data definitions.
  5. Consider the impact of new processes or changed data sets on the organizational structure. 

Usually a data quality project is focused on an existing system, and current personnel reporting structures can absorb the new processes or model changes. 

6. Continuous Monitoring
It is important to create processes for regularly validating your data. If you are unsure how often you should profile your data, consider the following:

  1. How often is your data used? (Hourly, daily, weekly, monthly, etc.)
  2. How important is the operation using the data? (Mission critical, life dependent, routine operations, end-of-month reporting, etc.)
  3. How much does it cost to monitor? The better the monitoring technology, the lower the labor costs.
  4. What is the operational impact of monitoring the data? There are two aspects to consider: the impact of assessing operational data during live operations, and the impact of the process on personnel.

Ultimately, you should take the following steps (a monitoring sketch in Python follows the list):

  1. Identify measurements and metrics to collect
  2. Identify when and where to monitor
  3. Implement the monitoring process
  4. Run a baseline assessment
  5. Post monitoring reports
  6. Schedule regular data steward team meetings to review monitoring trends
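
Here is a minimal Python sketch of these steps, assuming completeness and validity of an email field as the chosen metrics; the sample data, field names, and baseline thresholds are illustrative assumptions.

  import re

  EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

  # Illustrative metrics: completeness and validity of an email field.
  # The field name, sample data, and baseline values are assumptions.
  def profile(records):
      total = len(records)
      non_empty = sum(1 for r in records if r.get("email", "").strip())
      valid = sum(1 for r in records if EMAIL_PATTERN.match(r.get("email", "")))
      return {"completeness": non_empty / total if total else 0.0,
              "validity": valid / total if total else 0.0}

  baseline = {"completeness": 0.98, "validity": 0.95}  # from the baseline assessment

  current = profile([
      {"email": "jane@example.com"},
      {"email": ""},
      {"email": "not-an-email"},
  ])

  # Post a simple monitoring report comparing current metrics to the baseline.
  for metric, value in current.items():
      flag = "OK" if value >= baseline[metric] else "BELOW BASELINE"
      print(f"{metric}: {value:.0%} ({flag})")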

As a whole, it is crucial that you focus your time and resources not only on computer hardware, networking, and enterprise software solutions, but also on the data that will support those investments and drive your business forward. Your data is only helpful if it is accurate, and following the six factors above will help you stay on the right track.

Adapted from Data Quality Strategy: A Step-by-Step Approach