Wednesday, 12 February 2014

Dimension Hierarchies in Datawarehousing



Role of Dimensional Hierarchies in Data Analysis (OLAP):

The Hierarchy is a Level based structure where we can drill down the data from Highest level to Lowest Level also we can roll up the data.

Hierarchies plays a vital role in data analysis process (OLAP) in datawarehousing.Hierarchies allows the business users to analyze the data with respect to multiple dimensions. In data warehousing and on-line analytical processing they provide for examining data at different levels of detail. Several types of hierarchies have been presented with issues concerning dependencies and summarizability of data along the levels. Design mechanism for implementation of dimensional hierarchies in datawarehouse logical scheme has been proposed. A hierarchy is a set of levels having many-to-one relationships between each other, and the set of levels collectively makes up a dimension or a tree. A tree shows a hierarchical relationship.
In a relational database, the different levels of a hierarchy can be stored in a single table (as in a star schema) or in separate tables (as in a snowflake schema).

Eg:

  • In Product Dimension, a Product Name --> to Product Sub Category. Product Sub Category --> to categories, and categories --> to Segments.
  • in the time dimension : Fiscal Year > Fiscal Quarter > Fiscal Month > Fiscal Week > Day
  • in the region dimension : cities --> State -->countries -->Regions


To further simplify lets assume the user would like to see the Customer vise revenue, Product vice revenue and Time vise revenue and so on. We can create multiple reports with customer, Product and Time etc. Now the user would like to pick a specific dimension and do a complete analysis at different levels. If we consider Time, The user can analyze the revenues at Year level, Half Yearly level, Quarterly level, Month level, week level and Day level. Again the user may want to select a particular year and do his analysis. To get this requirement we may need to create multiple reports and even after creating multiple reports it may not serve the exact purpose. This is where we look for the best option to makes the things easy. Dimension Hierarchy is the exact solution for this scenario.

By using a Hierarchy a business user can analyze the data with different dimensions; different levels and He can choose a particular element or value to focus on. They are different types of hierarchies existing in datawarehousing. Though all the types of hierarchies are not supported by all the reporting tools most of them are supported with multiple hierarchy types. The most common hierarchies are

1)    Parent Child Hierarchies
2)    Skipped Level Hierarchies
3)    Ragged Hierarchies
4)    Level Based Hierarchy

We will try to understand about each and every hierarchy with some examples here.

Level –based Hierarchies:
Each level represents a position in the hierarchy. For example, a time dimension might have a hierarchy that represents data at the month, quarter, and year levels.
Each level above the base (or most detailed) level contains aggregate values for the levels below it. The members at different levels have a one-to-many parent-child relation. For example, Q1-05 and Q2-05 are the children of 2005, thus 2005 is the parent of Q1-05 and Q2-05.



About Parent-Child Hierarchies
The most common real-life occurrence of a parent-child hierarchy is an organizational reporting hierarchy chart, where the following all apply:
  • Each individual in the organization is an employee.
  • Each employee, apart from the top-level managers, reports to a single manager.
  • The reporting hierarchy has many levels.
These conditions illustrate the basic features that define a parent-child hierarchy, namely:
  • A parent-child hierarchy is based on a single logical table (for example, the "Employees" table)
  • Each row in the table contains two identifying keys, one to identify the member itself, the other to identify the "parent" of the member (for example, Emp_ID and Manager_ID)

The following table shows how this parent-child hierarchy could be represented by the rows and key values in an Employees table.
Emp_ID
Manager_ID
Andrew
null
Barbara
Andrew
Carlos
Andrew
Dawn
Barbara
Emre
Barbara



About Levels and Distances in Parent-Child Hierarchies
Unlike the situation with level-based hierarchies, all the dimension members of a parent-child hierarchy occur in a single logical column. In a parent-child hierarchy, the parent of a member is in another row in the same logical column, pointed to by the parent key. This is unlike a level-based hierarchy, where the parent of a member is in a different logical column in the same row. In other words, navigation in a parent-child hierarchy follows data values, while navigation in a level-based hierarchy follows the metadata structure.

About Parent-Child Relationship Tables
The parent-child relationship table must include four columns, as follows:
  • A column that identifies the member
  • A column that identifies an ancestor of the member

Note:
The ancestor may be the parent of the member, or a higher-level ancestor.

  • A "distance" column that specifies the number of parent-child hierarchical levels from the member to the ancestor
  • A "leaf" column that indicates if the member is a leaf member (1=Yes, 0=No)
The column names can be user defined. The data types of the columns must satisfy the following conditions:
  • The member and ancestor identifier columns have the same data type as the associated columns in the logical table that contains the hierarchy members.
  • The "distance" and "leaf" columns are INTEGER columns.
Note the following about the rows in a parent-child relationship table:
  • Each member must have a row pointing at itself, with distance zero.
  • Each member must have a row pointing at each of its ancestors. For a root member, this is a termination row with null for the parent and distance values.

Member_Key
Ancestor_Key
Distance
Isleaf
Andrew
Andrew
0
0
Barbara
Barbara
0
0
Carlos
Carlos
0
0
Dawn
Dawn
0
0
Emre
Emre
0
0
Andrew
null
null
0
Barbara
Andrew
1
0
Carlos
Andrew
1
1
Dawn
Barbara
1
1
Dawn
Andrew
2
1
Emre
Barbara
1
1
Emre
Andrew
2
1

Unbalanced (or ragged) hierarchy. An unbalanced (or ragged) hierarchy is a hierarchy where the leaves (members with no children) do not necessarily have the same depth. For example, a site can choose to have data for the current month at the day level, previous months data at the month level, and the previous 5 years data at the quarter level.
User applications can use the ISLEAF function to determine whether to allow drilldown from any particular member. A missing member is implemented in the data source with a null value for the member value. All computations treat the null value as a unique child within its parent. Level-based measures and aggregate-by calculations group all missing nodes together.
Note that unbalanced hierarchies are not necessarily the same as parent-child hierarchies. Parent-child hierarchies are unbalanced by nature, but level-based hierarchies can be unbalanced also.



Skipped-level hierarchy. A skip-level hierarchy is a hierarchy where there are members that do not have a value for a particular ancestor level. For example, in a Country-State-City-District hierarchy, the city 'Washington, D.C.' does not belong to a State. In this case, you can drill down from the Country level (USA) to the City level (Washington, D.C.) and below.
In a query, skipped levels are not displayed, and do not affect computations. When sorted hierarchically, members appear under their nearest ancestors.
A missing member at a particular level is implemented in the data source with a null value for the member value. All computations treat the null value as a unique child within its parent. Level-based measures and aggregate-by calculations group all skip-level nodes together.


Hierarchy with Unbalanced and Skip-Level Characteristics



3 comments: