Taming BIG Data: Taking Back Control, Part 2 – Creation

CIOs and others responsible for corporate technology initiatives are challenged to gain control of the ever-expanding amount of data available today. The “Taming Big Data” series of articles focuses on solutions that build a sustainable model to keep up with such changes. In this article, we will look at formalizing your enterprise information management (EIM) program.

An EIM program allows a company to provide accurate, consistent information to all of its resources (employees, computer databases, etc.), allowing them to perform their jobs more effectively. A key objective of the EIM program is to transform the vast amount of information collected every day into a strategic advantage. To this end, CIOs often seek a tactical solution that delivers benefits early and can then be folded into the overall enterprise information strategy.

One approach to starting this journey is to look at information life cycle management (ILM): the process of managing an organization's specific data assets from creation to disposition. The five areas of ILM are data usage, creation, retention, availability and maintenance. To make it all manageable, you can focus on each area as a separate phase of a project.

This article will focus on the importance of and best practices surrounding the second phase, creation.

Companies are in a better position to understand which data is most important once the organization truly understands how information is used across the company (see part 1, Taming Big Data: Taking Back Control). Additionally, the process of bringing functional and technical resources together will have matured a bit, and lessons learned from the first phase can be implemented during the second.

This second phase is about leveraging the knowledge of business information priorities and their associated technical costs, and applying that knowledge to how data is created. From an EIM perspective, the discussion focuses on two areas: master data management (including other important reference data) and data quality.

Source feeds

There is an adage that states: garbage in, garbage out. Information management experts use this expression to explain that regardless of how well you build your system, if the information created or received is bad, the information sent out or reported will also be bad. Data is created either by receiving a source feed or internally via data entry.

Data is received from “upstream” systems and fed to “downstream” systems. By exploring the data lineage of data elements from report or extract back to source, anyone in the company can understand which data is created externally by suppliers, vendor partners and/or other agencies that provide standard reference data (e.g., the post office for address information).
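The idea of walking a data element back from a report or extract to its originating system can be sketched with a simple upstream-lookup table. All system and field names below are hypothetical examples; a real lineage repository would be far richer.

```python
# Minimal sketch of data lineage: each entry maps a data element or system
# to the upstream source it came from (None marks an origin point).
# All names here are hypothetical.
LINEAGE = {
    "monthly_sales_report.region": "crm_extract",
    "crm_extract": "crm_system",
    "crm_system": None,                        # internal data entry
    "customer_address": "postal_reference_feed",
    "postal_reference_feed": None,             # external reference data
}

def trace_to_source(element: str) -> list[str]:
    """Walk upstream links until the originating source is reached."""
    path = [element]
    while LINEAGE.get(path[-1]) is not None:
        path.append(LINEAGE[path[-1]])
    return path
```

With such a map, anyone can answer "where does this field come from?" by calling `trace_to_source("monthly_sales_report.region")` and reading the chain back to its origin.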

An integrated workgroup can use this information to determine or revise the standard protocols for handling data received from each source, prioritized by importance to the company. Workgroup members can also capture the latency of the data; that is, the time between when the data is produced and when it is received, and the time between when the data is received or created and when it becomes available to customers.
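The two latencies described above fall directly out of three timestamps per delivery. A minimal sketch, using hypothetical timestamps for a single feed:

```python
from datetime import datetime

# Hypothetical timestamps for one data delivery.
produced_at  = datetime(2024, 3, 1, 6, 0)    # source system produced the data
received_at  = datetime(2024, 3, 1, 9, 30)   # company received the feed
available_at = datetime(2024, 3, 1, 14, 0)   # data made available to consumers

# Latency 1: produced -> received (how fresh the feed is on arrival).
delivery_latency = received_at - produced_at

# Latency 2: received -> available (internal processing time).
availability_latency = available_at - received_at
```

Tracked over time per source, these two numbers give the workgroup the baseline it needs to set realistic service level agreements.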

Gaining this insight allows for better expectation management and lets the company create acceptable service level agreements. It also indirectly affects the perceived quality of the information assets.

Internal via data entry

Many companies create their own data using standard systems (e.g., ERP, CRM or other home-grown systems). Organizations can most often improve their data quality by focusing on internally generated data first, since they have more control to effect change there. Oftentimes, training and communication for data entry resources play a significant role in enhancing the quality of data created. (Some companies with mature procedures become most effective by aligning the performance of data entry resources with their compensation model.)

It is critical that the business, technology and EIM program management resources work together to implement a solution that validates the data. A common theme across all of these phases is collaboration: The business workgroup members understand the company’s core competencies and can best assess the needs of the organization; the technical workgroup members support the business by providing tools the organization can use to perform its work effectively; and the EIM program management structures the project, defines the goals and outcomes, facilitates the workgroup interactions and formally submits recommendations to the governance review board.

Collaboration is most effective when individuals are clear with their workgroup roles and the goals of the initiative. During this second phase, creation, the company focuses on studying and maturing the business processes and technical maintenance procedures used to source important information assets.

The business workgroup members are responsible for:

  1. Confirming and providing the company’s business process flows related to any receipt or creation of data (e.g., customer setup, the new product entry process for inventory, etc.);
  2. Identifying specific requirements for data entry and quality assurance resources (e.g., drop-down selections on a screen vs. free text); and
  3. Defining the ownership model and revising the business data dictionary.
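The second responsibility above, constraining what data entry resources can enter, can be sketched as a simple validation routine. The field names and allowed values here are hypothetical, standing in for whichever fields a company deems critical.

```python
# Minimal sketch of entry-time validation: a constrained (drop-down style)
# field versus a free-text field. Field names and codes are hypothetical.
ALLOWED_COUNTRY_CODES = {"US", "CA", "MX"}   # drop-down selection list

def validate_customer(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    # Constrained field: value must come from the approved list.
    if record.get("country") not in ALLOWED_COUNTRY_CODES:
        errors.append("country must be one of the approved codes")
    # Free-text field: only minimal checks are possible.
    if not record.get("name", "").strip():
        errors.append("name (free text) must not be blank")
    return errors
```

The contrast illustrates why the business workgroup favors drop-down selections where possible: a constrained field can be validated completely at entry time, while a free-text field can only be checked superficially.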

The first step for business workgroup members is to consolidate, capture, and clearly understand the business process flows related to data creation. They need to be in a position to quickly answer questions related to processes, such as product or customer setup, and to understand which data is most critical for reporting (e.g., master data and other reference data).

Externally, they need to understand the agreements with source data partners. These workgroup members should assess the existing and ideal arrangements with each critical source of information. Internally, they need to have a firm grasp of the requirements for creating data and obtaining feedback on specific data via discussions with users of the exports and/or reports.

Most importantly, they need to understand who, within the business, is responsible for the most important data assets. This is the person who makes decisions on clarifying definitions and supporting policies and procedures, and who has significant input into the periodic performance reviews of the associated maintenance resources. Identification of these data owners is often difficult and drives the company to develop a clearer understanding of how information assets are managed.