The eDiscovery Implications of Structured Data

by David White, partner, Seyfarth Shaw LLP & CGOC faculty member

E-discovery has been on nearly every organization’s radar for a number of years now. The past several years have seen a sharp rise in the number of high-profile sanctions cases resulting from e-discovery missteps.

Headline avoidance: preserving and collecting data

The resulting headlines have made many companies keenly aware of the need to effectively preserve and collect the various types of unstructured data, such as user-created documents and email messages. However, little attention has been paid to date to the e-discovery issues that attach to large, enterprise-wide structured database systems.

In addition to email, a typical enterprise also has hundreds or even thousands of structured business applications in which financial, product, customer, employee, patient, and other material information may be stored, managed and manipulated. These systems are often the lifeblood of many companies and can stand as the system of record for all business or employee activities. It is therefore essential that e-discovery compliance programs incorporate processes for the preservation and collection of information from these systems as well, and not just focus on email and file servers.

These systems, however, can present a significant and unique challenge when it comes to retaining, collecting, preserving and properly dispositioning the information they hold to meet varying e-discovery, compliance and business needs. For example, while documents and email are generally self-contained and static, the information in structured data systems is dynamic and relational.

The databases themselves have a much longer lifecycle and are continually changing and evolving. This makes e-discovery on structured data orders of magnitude more difficult. Without an information management process that directly addresses these issues, IT organizations are forced to over-preserve and retain massive amounts of information that no longer has any business value, while at the same time meeting ever-shrinking budget constraints. A robust compliance process should address these challenges and help reduce overall spend across both legal and IT.

Addressing structured challenges and compliance

The specific challenges to be addressed by a successful e-discovery and compliance posture for structured data fall into five primary areas:

1. Identification and retention

Identifying the information or objects within the system that are subject to retention requirements is central to ensuring that the company neither over-preserves nor under-preserves information.

In addition to the records they contain, databases comprise a variety of elements, including reports, printouts, queries, application layer source code, pick lists and more. The relevance of these elements to an e-discovery request depends on the system, the business, the industry and the type of legal action. Yet instructions from the legal department often simply say “preserve all payroll records,” which offers little actionable guidance.

In certain class action lawsuits, for example, the underlying data records are often not as important as the algorithms applied to them and how reports present the data to decision makers. In class actions involving the classification of employees for payroll purposes, which may turn on the amount of discretion these employees exercise in their daily job activities, the pick lists (the options presented to the user by the system during data entry) can be far more relevant than the option the user ultimately chooses.

In a patent suit, the raw source code for the application might be at issue. Even where the underlying data records are the only relevant information needing to be retained, decisions need to be made as to whether this information is sourced from the data tables in the system or from existing reports that may be ported to a spool server on a periodic basis.

2. Preservation and collection

After determining the relevant elements of the database, companies must also address the challenges of preservation and collection. High-volume transactional information often has very short retention periods because saving such massive amounts would be cost-prohibitive and because the information has no business value once it has been aggregated into other systems. This means that any decision to preserve this transactional data needs to be made very early on.
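
To make the timing problem concrete, here is a minimal Python sketch of the arithmetic. The 90-day retention period, the dates and the function names are all hypothetical, chosen only for illustration; real retention schedules vary by system and are rarely this simple.

```python
from datetime import date, timedelta

def earliest_recoverable(hold_date, retention_days):
    """Oldest transaction date still on the system when the hold takes effect."""
    return hold_date - timedelta(days=retention_days)

def days_lost(relevant_from, hold_date, retention_days):
    """Days of the relevant period already purged before the hold was issued."""
    gap = (earliest_recoverable(hold_date, retention_days) - relevant_from).days
    return max(gap, 0)

# Hypothetical scenario: transactions from 1 January 2024 onward are relevant,
# and the system purges anything older than 90 days.
RELEVANT_FROM = date(2024, 1, 1)
RETENTION_DAYS = 90

early_hold = days_lost(RELEVANT_FROM, date(2024, 3, 1), RETENTION_DAYS)
late_hold = days_lost(RELEVANT_FROM, date(2024, 6, 1), RETENTION_DAYS)
```

Under these assumed numbers, a hold issued on 1 March loses nothing, while a hold issued on 1 June arrives after 62 days of relevant transactions have already been purged.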

How to preserve this data can also be challenging. Snapshots and exports, for example, can be meaningless when looked at out of context, so capturing this data must be done in a way that preserves the integrity of the relationships within the system. Yet, the organic way in which these systems evolve over time means they often lack descriptive documentation. Their size also can make it difficult to find employees with sufficient knowledge of all the systems’ intricacies. This often makes for a bit of trial and error during the extraction process.
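
One way to think about preserving those relationships is to work out which other tables an export depends on before extracting anything. The following is a rough Python sketch using SQLite; the three-table payroll schema and the function name are hypothetical, and the approach assumes the relationships are actually declared as foreign keys in the database.

```python
import sqlite3

# Hypothetical schema: a paycheck row is only meaningful alongside the
# employee it belongs to and the job classification that employee holds.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_class (class_code TEXT PRIMARY KEY, description TEXT);
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    class_code TEXT REFERENCES job_class(class_code)
);
CREATE TABLE paycheck (
    paycheck_id INTEGER PRIMARY KEY,
    employee_id INTEGER REFERENCES employee(employee_id),
    gross_cents INTEGER
);
""")

def tables_needed_for_export(conn, root_table):
    """Return root_table plus every table it references, directly or indirectly.

    Table names are assumed to come from a trusted source, since they are
    interpolated into the PRAGMA statement.
    """
    needed, pending = [], [root_table]
    while pending:
        table = pending.pop()
        if table in needed:
            continue
        needed.append(table)
        # Each row describes one declared foreign key; index 2 is the
        # name of the table being referenced.
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            pending.append(fk[2])
    return needed

export_set = tables_needed_for_export(conn, "paycheck")
```

Here an export of paycheck would also pull in employee and job_class. In practice, many enterprise systems enforce these relationships only in application code and never declare them in the database, which is exactly where the lack of documentation and the trial and error described above come in.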

Another very common collection challenge arises when defining queries. For example, queries that seem simple and clearly defined to legal, such as “all employees in the state of California,” can be surprisingly complex to a database administrator. How do you define a California employee? Is it by the address where their paycheck is sent or the zip code of the facility they are assigned to? What about an employee who is assigned to a facility in California but lives in Arizona and is always on the road covering a sales territory in Nevada?
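
The ambiguity is easy to see once the request is written as actual queries. The Python sketch below uses an in-memory SQLite database with a hypothetical schema and three made-up employees; the table names, column names and the three candidate definitions are assumptions for illustration, not a description of any particular payroll system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facility (facility_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name TEXT,
    mailing_state TEXT,       -- where the paycheck is sent
    facility_id INTEGER REFERENCES facility(facility_id),
    territory_state TEXT      -- where the work is actually performed
);
INSERT INTO facility VALUES (1, 'CA'), (2, 'AZ');
INSERT INTO employee VALUES
    (101, 'Avery', 'CA', 1, 'CA'),
    (102, 'Blake', 'AZ', 1, 'NV'),   -- CA facility, lives in AZ, sells in NV
    (103, 'Casey', 'CA', 2, 'AZ');
""")

# Three plausible readings of "all employees in the state of California".
DEFINITIONS = {
    "paycheck address": """
        SELECT employee_id FROM employee
        WHERE mailing_state = 'CA' ORDER BY employee_id""",
    "assigned facility": """
        SELECT e.employee_id FROM employee e
        JOIN facility f ON f.facility_id = e.facility_id
        WHERE f.state = 'CA' ORDER BY e.employee_id""",
    "sales territory": """
        SELECT employee_id FROM employee
        WHERE territory_state = 'CA' ORDER BY employee_id""",
}

def california_employees(definition):
    """Return the employee IDs matched by one reading of the request."""
    return [row[0] for row in conn.execute(DEFINITIONS[definition])]

results = {name: california_employees(name) for name in DEFINITIONS}
```

With this sample data, the paycheck-address reading returns employees 101 and 103, the assigned-facility reading returns 101 and 102, and the sales-territory reading returns only 101. The same one-line instruction from legal yields three different collections, so the definition needs to be agreed on before the query is run.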