Role of Dimensional
Hierarchies in Data Analysis (OLAP):
The Hierarchy is a Level based
structure where we can drill down the data from Highest level to Lowest Level
also we can roll up the data.
Hierarchies plays a vital role in
data analysis process (OLAP) in datawarehousing.Hierarchies allows the business
users to analyze the data with respect to multiple dimensions. In data
warehousing and on-line analytical processing they provide for examining data
at different levels of detail. Several types of hierarchies have been presented
with issues concerning dependencies and summarizability of data along the
levels. Design mechanism for implementation of dimensional hierarchies in
datawarehouse logical scheme has been proposed. A hierarchy is
a set of levels having many-to-one relationships between
each other, and the set of levels collectively makes up a dimension
or a tree. A tree shows a hierarchical relationship.
In a relational database, the
different levels of a hierarchy can be stored in a single table (as in a star schema) or in separate tables (as in a snowflake schema).
Eg:
- In Product Dimension, a Product Name --> to Product Sub Category. Product Sub Category --> to categories, and categories --> to Segments.
- in the time dimension : Fiscal Year > Fiscal Quarter > Fiscal Month > Fiscal Week > Day
- in the region dimension : cities --> State -->countries -->Regions
To further simplify lets assume the
user would like to see the Customer vise revenue, Product vice revenue and Time
vise revenue and so on. We can create multiple reports with customer, Product
and Time etc. Now the user would like to pick a specific dimension and do a
complete analysis at different levels. If we consider Time, The user can
analyze the revenues at Year level, Half Yearly level, Quarterly level, Month
level, week level and Day level. Again the user may want to select a particular
year and do his analysis. To get this requirement we may need to create
multiple reports and even after creating multiple reports it may not serve the
exact purpose. This is where we look for the best option to makes the things
easy. Dimension Hierarchy is the exact solution for this scenario.
By using a Hierarchy a business user
can analyze the data with different dimensions; different levels and He can
choose a particular element or value to focus on. They are different types of
hierarchies existing in datawarehousing. Though all the types of hierarchies
are not supported by all the reporting tools most of them are supported with
multiple hierarchy types. The most common hierarchies are
1)
Parent Child Hierarchies
2)
Skipped Level Hierarchies
3)
Ragged Hierarchies
4)
Level Based Hierarchy
We will try to understand about each
and every hierarchy with some examples here.
Level –based
Hierarchies:
Each level represents a position in
the hierarchy. For example, a time dimension might have a hierarchy that
represents data at the month, quarter, and year levels.
Each level above the base (or most detailed) level contains aggregate values for the levels below it. The members at different levels have a one-to-many parent-child relation. For example, Q1-05 and Q2-05 are the children of 2005, thus 2005 is the parent of Q1-05 and Q2-05.
Each level above the base (or most detailed) level contains aggregate values for the levels below it. The members at different levels have a one-to-many parent-child relation. For example, Q1-05 and Q2-05 are the children of 2005, thus 2005 is the parent of Q1-05 and Q2-05.
About
Parent-Child Hierarchies
The most common
real-life occurrence of a parent-child hierarchy is an organizational reporting
hierarchy chart, where the following all apply:
- Each individual in the organization is an employee.
- Each employee, apart from the top-level managers, reports to a single manager.
- The reporting hierarchy has many levels.
These
conditions illustrate the basic features that define a parent-child hierarchy,
namely:
- A parent-child hierarchy is based on a single logical table (for example, the "Employees" table)
- Each row in the table contains two identifying keys, one to identify the member itself, the other to identify the "parent" of the member (for example, Emp_ID and Manager_ID)
The following
table shows how this parent-child hierarchy could be represented by the rows
and key values in an Employees table.
Emp_ID
|
Manager_ID
|
Andrew
|
null
|
Barbara
|
Andrew
|
Carlos
|
Andrew
|
Dawn
|
Barbara
|
Emre
|
Barbara
|
About Levels
and Distances in Parent-Child Hierarchies
Unlike the
situation with level-based hierarchies, all the dimension members of a
parent-child hierarchy occur in a single logical column. In a parent-child
hierarchy, the parent of a member is in another row in the same logical column,
pointed to by the parent key. This is unlike a level-based hierarchy, where the
parent of a member is in a different logical column in the same row. In other
words, navigation in a parent-child hierarchy follows data values, while
navigation in a level-based hierarchy follows the metadata structure.
The
parent-child relationship table must include four columns, as follows:
- A column that identifies the member
- A column that identifies an ancestor of the member
Note:
The ancestor may be the parent of
the member, or a higher-level ancestor.
|
- A "distance" column that specifies the number of parent-child hierarchical levels from the member to the ancestor
- A "leaf" column that indicates if the member is a leaf member (1=Yes, 0=No)
The column
names can be user defined. The data types of the columns must satisfy the
following conditions:
- The member and ancestor identifier columns have the same data type as the associated columns in the logical table that contains the hierarchy members.
- The "distance" and "leaf" columns are INTEGER columns.
Note the
following about the rows in a parent-child relationship table:
- Each member must have a row pointing at itself, with distance zero.
- Each member must have a row pointing at each of its ancestors. For a root member, this is a termination row with null for the parent and distance values.
Member_Key
|
Ancestor_Key
|
Distance
|
Isleaf
|
Andrew
|
Andrew
|
0
|
0
|
Barbara
|
Barbara
|
0
|
0
|
Carlos
|
Carlos
|
0
|
0
|
Dawn
|
Dawn
|
0
|
0
|
Emre
|
Emre
|
0
|
0
|
Andrew
|
null
|
null
|
0
|
Barbara
|
Andrew
|
1
|
0
|
Carlos
|
Andrew
|
1
|
1
|
Dawn
|
Barbara
|
1
|
1
|
Dawn
|
Andrew
|
2
|
1
|
Emre
|
Barbara
|
1
|
1
|
Emre
|
Andrew
|
2
|
1
|
Unbalanced (or
ragged) hierarchy. An unbalanced (or ragged) hierarchy is a hierarchy where the leaves
(members with no children) do not necessarily have the same depth. For example,
a site can choose to have data for the current month at the day level, previous
months data at the month level, and the previous 5 years data at the quarter
level.
User
applications can use the ISLEAF function to determine whether to allow
drilldown from any particular member. A missing member is implemented in the
data source with a null value for the member value. All computations treat the
null value as a unique child within its parent. Level-based measures and
aggregate-by calculations group all missing nodes together.
Note that
unbalanced hierarchies are not necessarily the same as parent-child
hierarchies. Parent-child hierarchies are unbalanced by nature, but level-based
hierarchies can be unbalanced also.
Skipped-level
hierarchy. A skip-level
hierarchy is a hierarchy where there are members that do not have a value for a
particular ancestor level. For example, in a Country-State-City-District
hierarchy, the city 'Washington, D.C.' does not belong to a State. In this
case, you can drill down from the Country level (USA) to the City level
(Washington, D.C.) and below.
In a query,
skipped levels are not displayed, and do not affect computations. When sorted
hierarchically, members appear under their nearest ancestors.
A missing
member at a particular level is implemented in the data source with a null
value for the member value. All computations treat the null value as a unique
child within its parent. Level-based measures and aggregate-by calculations
group all skip-level nodes together.
Hierarchy with
Unbalanced and Skip-Level Characteristics
nice post thank you
ReplyDeleteReally helpful. Thanks
ReplyDeleteGreat Article. As I read the blog I felt a tug on the heartstrings. it exhibits how much effort has been put into this.
ReplyDeleteIEEE Projects for CSE in Big Data
Spring Framework Corporate TRaining
Final Year Project Centers in Chennai
JavaScript Training in Chennai