What is a Data Warehouse?

  • By Amit Mehendale

Data Warehouse is certainly not …

  • a product, an environment
  • a system, an architecture
  • an end to itself but is at the heart of the Business Intelligence (BI) infrastructure of the organization

The accepted definition of a data warehouse is a database that contains the following four characteristics:

1. Subject oriented

2. Nonvolatile

3. Integrated

4. Time variant

Subject oriented means that the data is organized around subjects (such as Sales) rather than operational applications (such as order processing). Operational databases are organized around business application; they are application oriented. Recall the five queries that the directors have identified as examples of the types of questions they would like to ask of their data. We concluded that they are concerned with sales of products over time. The subject area in our case study is clearly “sales.”

Nonvolatile means that the data, once placed in the warehouse, is not usually subject to change. Anyone who is using the database has confidence that a query will always produce the same result no matter how often it is run. Operational databases are extremely volatile in that they are constantly changing. A query is unlikely to produce the same result twice if it is accessing tables which are frequently updated.

Integrated means the data is consistent. For instance, dates are always stored in the same format. Integration is a problem for most organizations, particularly where there are many different types of technology in use. Even more subtle differences occur within different applications within the same technology. This occurs where, for instance, one application designer decides to hold customer addresses as five columns of 35 characters each, whereas another might use a Varchar(100) format. Before data is allowed to enter the data warehouse, it must be integrated. So integration is a process through which the data passes after it leaves the application database and before it enters the warehouse database.

Time variant means that historical data is recorded. Almost all queries executed against a data warehouse have some element of time associated within them. We have already established that most operational systems do not retain historical information. It is almost impossible to predict what will happen in the future without observing what happened in the past. A data warehouse helps to address this fundamental issue by adding a historical dimension to the data taken from the operational databases.

Advantages of Data-Warehouse

  • High query performance

    • But not necessarily most current information
  • Doesn’t interfere with local processing at sources

    • Complex queries at warehouse
    • OLTP at information sources
  • Information copied at warehouse

    • Can modify, annotate, summarize,  restructure, etc.
    • Can store historical information
    • Security

Pains beware

  • Data Integration
  • Data Quality
  • Data Availability
  • End user education
  • Proper sizing – HW and Database environment
  • Lack of off-the-shelf product (mature Packaged Analytics)
  • Post implementation challenges

    • Ensure usage
    • Identify new areas of Intelligence
    • Measure business benefits – productivity enhancements, savings, increase in revenue etc.

Factors for DW Project Budgeting

  • The Size of the database
  • The complexity and cleanliness of the data
  • The number of source databases and their characteristics
  • The number of users
  • The choice of tools
  • The network requirements
  • Training
  • The gap between the required and the available skills

Most Commented Posts

Leave a Reply