Thursday, 3 July 2014

Data Warehouse Concepts


Facts Tables:
ü  A fact table typically has two types of columns: foreign keys to dimension tables and measures those that contain numeric facts. A fact table can contain fact's data on detail or aggregated level.
ü  A fact table stores quantitative information for analysis and is often de-normalized.
ü  Fact table is typically numeric data and it is often data that can be easily manipulated, particularly by summing together many thousands of rows.

Types of fact:
1.      Additive:
Additive facts are facts that can be summed up through all of the dimensions in the fact table. A sales fact is a good example for additive fact.
2.      Semi-Additive:
Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
E.g.  Daily balances fact can be summed up through the customers dimension but not through the time dimension.
3.      Non-Additive:
Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
E.g. Facts which have percentages, ratios calculated.

Dimensions:
ü  A dimension is a structure, often composed of one or more hierarchies, that categorizes data. Dimensional attributes help to describe the dimensional value. They are normally descriptive, textual values. Several distinct dimensions, combined with facts, enable you to answer business questions. Commonly used dimensions are customers, products, and time.
ü  Dimension data is typically collected at the lowest level of detail and then aggregated into higher level totals that are more useful for analysis. These natural rollups or aggregations within a dimension table are called hierarchies.

Surrogate key:
ü  Surrogate keys are nothing but integers which do not have any meaning in terms of business and used as primary key in dimension table.
ü  Surrogate keys join the dimension tables to the fact table. Surrogate keys serve as an important means of identifying each instance or entity inside of a dimension table.

Fact less fact:
A fact less fact table is fact table that does not contain fact. They contain only dimensional keys and it captures events that happen only at information level but not included in the calculations level. Just information about an event that happen over a period.
A fact less fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. They are often used to record events or coverage information. Common examples of fact less fact tables include:
ü  Identifying product promotion events
ü  Tracking student attendance or registration events
ü  Tracking insurance-related accident events
ü  Identifying building, facility, and equipment schedules for a hospital or university
ü  Fact less fact tables are used for tracking a process or collecting stats.
E.g. student, time, and class dimensions used to create student attendance fact less fact table. 

Degenerated dimension:
ü  A degenerate dimension is when the dimension attribute is stored as part of fact table, and not in a separate dimension table.
ü  These are essentially dimension keys for which there are no other attributes. In a data warehouse, these are often used as the result of a drill through query to analyze the source of an aggregated number in a report.
ü  You can use these values to trace back to transactions in the OLTP system.

Conformed dimension:
ü  A Dimension that is used in multiple locations is called a conformed dimension.
ü  A conformed dimension may be used with multiple fact tables in a single database, or across multiple data marts or data warehouses.

Schema types:
1.  Star Schema:
ü  In the star schema design, a single object sits in the middle and is radically connected to other surrounding objects (dimension lookup tables) like a star.
ü  Each dimension is represented as a single table.
ü  The primary key in each dimension table is related to a foreign key in the fact table.

2.  Snowflake schema:
ü  The snowflake schema is an extension of the star schema, where each point of the star explodes into more points.
ü  In a star schema, each dimension is represented by a single dimensional table, whereas in a snowflake schema, that dimensional table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.

No comments:

Post a Comment