Does Your Data Warehouse Need a Fix?

Dimensional Structure: The data modeler examines the schema for its adherence to good dimensional modeling techniques and for its potential response to customer requirements.

The following is a sampling of things considered:

  • Is the schema a de-normalized dimensional model or a normalized entity relationship diagram?
  • Do the schema and its metadata support a commitment to conformed dimensions and conformed facts?
  • Does each dimension attribute appear once, and only once?
  • Do the dimensions contain a good mix of hierarchical and nonhierarchical attributes that the business considers important?
  • Are the textual dimension attributes verbose and of high quality?
  • Are all dimension keys, with the exception of bridge tables, single-part surrogate keys?
  • How are slowly changing dimensions handled?
  • How are many-to-many dimensions handled?
  • What snow-flaked dimensions exist, and are they appropriate?
  • Are most facts additive?
  • What semi-additive and non-additive or ‘text’ facts exist, and how are they manipulated in the queries?
  • Are the existing transaction and snapshot schemas appropriate for responding to customer queries?
  • Are the aggregates suitable?
  • Technical Architecture and Infrastructure

    A checkup of your technical architecture concentrates on how you accomplish what you set out to do, while an examination of your infrastructure looks at where you are doing it.

    This part of the checkup may require the services of several technical specialists.

    Back Room Architecture: Assess your back room architecture by looking at how you extract, clean, conform and deliver data from the source systems to the presentation servers.

    Note the capabilities needed and scrutinize whether your standards, development methodology and products provide the desired results within the prescribed timeframes.

    As part of the examination of how you clean and conform the data, make sure to focus on data accuracy. A data warehouse that contains incorrect, confusing, or partially populated data will be abandoned.

    Front Room Architecture: To the end users, the query reports are the data warehouse. In talking with the users, you will have observed who needs the reports, how they are distributed, and what types of analysis is required.

    The report templates should contain the appropriate specifics to meet customer needs, the users must be sufficiently trained, the query tool ought to be easy to use, and performance must be fast. Match the needed capabilities with reality and see how close you come.

    Infrastructure: Specify what hardware, system-wide operating capabilities and storage capacities you need to grow your data warehouse and detail what currently exists. Determine the gap between wants and actuality, and identify who is responsible for narrowing the gap.

    Analyze Findings and Plan the Future

    Organize the checkup findings within the four areas of analysis. Separate the things done well from things done poorly. Determine what you should be doing more of within the items you are doing well.

    For the problem items, indicate their impact on the well-being of the data warehouse. The impact is measured in terms of what will happen if you do not fix the problem.

    You can quantify the impact by using high/medium/low rankings, 1-5 marks, or more colorful measures such as “fatal flaw,” “fix within a year,” “would be nice, but not crucial” and “do no harm.”

    For fatal flaws and items requiring remediation before long, note what other areas will be affected and what the prerequisites are to affect a fix.

    Next, determine the complexity or feasibility of remediation. Assign a “low” feasibility rank or a grade of 1 (or even zero) for a complete rewrite, and assign a “high” feasibility rank or a grade of 10 where the remediation is an extension of a natural process, such as adding another report template.

    Taken together, the impact and feasibility rankings should point the way to your data warehouse therapy regimen.

    Look carefully at your high-impact items. These are the items that can disable your data warehouse. If you have fatal flaws, you will have to do some heavy thinking about your next steps. If the checkup basically validates your overall data warehouse strategy, then turn your attention to doing more great things and tuning your data warehouse so it can fly.

    In the end, remember, all data warehouse decisions are driven by user requirements. The user community evolves over time. The decisions they make and their information needs metamorphose in response to their customers, suppliers, competitors, and market events.

    Data is knowledge. Check up on your users and find out how well the data warehouse is evolving along with their needs and you will have provided them with a used and useful resource.

    Esther Soloveichik is with Intrasphere Technologies, a premier technology consulting firm with a core focus on Life Sciences. Intrasphere provides end-to-end technology services and has successfully implemented large-scale projects for some of the world’s leading global companies, including Pfizer, Schering-Plough, Novartis and Eli Lilly, among others.