Wednesday, December 19, 2007

Reports: getting total of all children values in a tree

... previous

Well, actually, this note is a continuation of a model series on reports.
So the structure of an article is the same: the problem description -> non-model solution -> model clause solution.

But also this note is very similar to summation of tree values, I've written a little bit earlier.

Input data:

So let's get started.
Our input for this post would be a table t from here and an additional table t_hrchy with a hierarchical structure of departments:
SQL> drop table t_hrchy;

Table dropped

SQL>
SQL> create table t_hrchy as (select 'DEP2' dep, 'DEP1' dep_par from dual union all
2 select 'DEP3' dep, 'DEP2' dep_par from dual union all
3 select 'DEP4' dep, 'DEP3' dep_par from dual)
4 /

Table created

SQL> select * from t_hrchy;

DEP DEP_PAR
---- -------
DEP2 DEP1
DEP3 DEP2
DEP4 DEP3

SQL>
So this table tells us that DEP1 is a parent department for DEP2, DEP2 is a parent department for DEP3 and DEP3 is a parent department for DEP4.

Problem:

Now our task would be to create a report where in one column would be sales of a particular department output and in another column we need to get the sales of a current department and for all the departments that are under the current one in the organization structure.
The data should be consolidated only inside the same year and inside the same city.

Non-Model solution:

When the matter concerns the hierarchical structure the first thing that pops up in our head is hierarchical queries with CONNECT BY clause.

Elaborating this a little bit we can use the following query proposed by Rob van Wijk here or here:
SQL> select to_char(year,'yyyy') year,
2 city,
3 t.dep,
4 th.dep_par,
5 sales,
6 sum(connect_by_root sales) tot_sales
7 from t, t_hrchy th
8 where t.dep = th.dep(+)
9 connect by t.dep = prior dep_par
10 and prior year = year
11 and prior city = city
12 group by city, year, t.dep, dep_par, sales
13 order by 1,2,3;

YEAR CITY DEP DEP_PAR SALES TOT_SALES
---- ------ ---- ------- ---------- ----------
2006 Moscow DEP1 562 2254
2006 Moscow DEP2 DEP1 457 1692
2006 Moscow DEP3 DEP2 890 1235
2006 Moscow DEP4 DEP3 345 345
2006 Omsk DEP1 23 264
2006 Omsk DEP2 DEP1 154 241
2006 Omsk DEP3 DEP2 87 87
2007 Moscow DEP1 762 2926
2007 Moscow DEP2 DEP1 657 2164
2007 Moscow DEP3 DEP2 1020 1507
2007 Moscow DEP4 DEP3 487 487
2007 Omsk DEP1 34 403
2007 Omsk DEP2 DEP1 213 369
2007 Omsk DEP3 DEP2 156 156

14 rows selected

SQL>
Lets see what this query actually does.
First we use an outer join of two tables - so for each row we get a parent department name.

Then we build a hierarchy by using
connect by t.dep = prior dep_par
Mention that hierarchy is built in the backward direction starting from the children and going up to their parents.

Also we add in the connect by clause the following:
and prior year = year
and prior city = city
So we build hierarchy separartely for different years and cities.

By using the operator
connect_by_root sales
in every row we find a value for sales of the root elements for this or that thread in a hierarchy. As we didn't specify any START WITH condition - the hierarchy is built starting from all the departments present in the data.
For example, let's look what would happen inside the partition for 2007 year Moscow city:
SQL> select to_char(year,'yyyy') year,
2 city,
3 t.dep,
4 th.dep_par,
5 sales,
6 connect_by_root sales root_sales,
7 level,
8 sys_connect_by_path(t.dep,'/') hrchy_path
9 from t, t_hrchy th
10 where t.dep = th.dep(+)
11 and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')
12 and t.city='Moscow'
13 connect by t.dep = prior dep_par
14 and prior year = year
15 and prior city = city
16 /

YEAR CITY DEP DEP_PAR SALES ROOT_SALES LEVEL HRCHY_PATH
---- ------ ---- ------- ---------- ---------- ---------- ----------------------------
2007 Moscow DEP1 762 762 1 /DEP1
2007 Moscow DEP2 DEP1 657 657 1 /DEP2
2007 Moscow DEP1 762 657 2 /DEP2/DEP1
2007 Moscow DEP3 DEP2 1020 1020 1 /DEP3
2007 Moscow DEP2 DEP1 657 1020 2 /DEP3/DEP2
2007 Moscow DEP1 762 1020 3 /DEP3/DEP2/DEP1
2007 Moscow DEP4 DEP3 487 487 1 /DEP4
2007 Moscow DEP3 DEP2 1020 487 2 /DEP4/DEP3
2007 Moscow DEP2 DEP1 657 487 3 /DEP4/DEP3/DEP2
2007 Moscow DEP1 762 487 4 /DEP4/DEP3/DEP2/DEP1

10 rows selected

SQL>
As you see DEP1 is met four times:
- Firstly it stands for itself (level=1).
- Then we begin to build hierarchy starting from DEP2 and DEP1 is met as a "child" (remember we build a hierarchy in the opposite direction) for DEP2 (level=2)
- The next time we meet it as a "grandchild" for DEP3 (level=3)
- And finally as a "great-grandchild" for DEP4 (level=4).

In each row we have a value of sales for the root department. So if we group data by department and put a sum() aggregate function on connect_by_root(sales) column we get the sales of DEP1 and all the sales of all departments that are under DEP1 in one row.
The same stuff will happen to all the other departments:
SQL> select to_char(year,'yyyy') year,
2 city,
3 t.dep,
4 th.dep_par,
5 sales,
6 sum(connect_by_root sales) tot_salesh
7 from t, t_hrchy th
8 where t.dep = th.dep(+)
9 and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')
10 and t.city='Moscow'
11 connect by t.dep = prior dep_par
12 and prior year = year
13 and prior city = city
14 group by year,city,t.dep,th.dep_par,sales
15 order by 1,2,3
16 /

YEAR CITY DEP DEP_PAR SALES TOT_SALESH
---- ------ ---- ------- ---------- ----------
2007 Moscow DEP1 762 2926
2007 Moscow DEP2 DEP1 657 2164
2007 Moscow DEP3 DEP2 1020 1507
2007 Moscow DEP4 DEP3 487 487

SQL>


Model Solution

Now we are proceeding and will solve the same problem using Model clause without any connect by's.

To get less data to be outputed at every step - let's restrict our data to 2007 year and Moscow city only as we did in Non-Model solution:
SQL> select * from t, t_hrchy th
2 where t.dep = th.dep(+)
3 and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')
4 and t.city='Moscow'
5 order by 3,1,2
6 /

CITY DEP YEAR SALES DEP DEP_PAR
------ ---- ----------- ---------- ---- -------
Moscow DEP1 31.12.2007 762
Moscow DEP2 31.12.2007 657 DEP2 DEP1
Moscow DEP3 31.12.2007 1020 DEP3 DEP2
Moscow DEP4 31.12.2007 487 DEP4 DEP3

SQL>
Let's think of what woud be placed in PARTITION BY part of our model. Remeber that partitions should be absolutely independent one of another.
Yes - we would put YEAR and CITY there. As a task instructs us to treat sales of different years and cities separately.
In dimension we would place DEP and DEP_PART columns.
And in measures - we'll put SALES, of course. And create one more measure called TOT_SALES with initial value of zero.
SQL> select * from t, t_hrchy th
2 where t.dep = th.dep(+)
3 and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')
4 and t.city='Moscow'
5 model
6 partition by (year,city)
7 dimension by (t.dep dep,dep_par)
8 measures(sales, 0 tot_sales)
9 ()
10 order by 1,2,3
11 /

YEAR CITY DEP DEP_PAR SALES TOT_SALES
----------- ------ ---- ------- ---------- ----------
31.12.2007 Moscow DEP1 762 0
31.12.2007 Moscow DEP2 DEP1 657 0
31.12.2007 Moscow DEP3 DEP2 1020 0
31.12.2007 Moscow DEP4 DEP3 487 0

SQL>

Now we are going to write a rule for calculating the tot_sales.
For any cell in our model the tot_sales should be equal to a sum of sales of the current department and all the tot_sales of all departments under the current.
We can write it as:
tot_sales[any,any] = sales[CV(),CV()]+nvl(sum(tot_sales)[any,CV(dep)],0)

BTW it would be correct in case when the tot_sales of the departments under the current one already included all the tot_sales of the departments under them.

So what is important here is the ORDER in what the rules are executed.
If we don't specify anything about the order of execution to our model it uses the default SEQUENTIAL ORDER and can easily throw out an error for us:
SQL> select * from t, t_hrchy th
2 where t.dep = th.dep(+)
3 and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')
4 and t.city='Moscow'
5 model
6 partition by (year,city)
7 dimension by (t.dep dep,dep_par)
8 measures(sales, 0 tot_sales)
9 rules
10 (tot_sales[any,any] = sales[CV(),CV()]+nvl(sum(tot_sales)[any,CV(dep)],0))
11 order by 1,2,3
12 /

...

ORA-32637: Self cyclic rule in sequential order MODEL

SQL>

But putting AUTOMATIC ORDER causes all rules to be evaluated in an order based on their logical dependencies.
Let's check:
SQL> select * from t, t_hrchy th
2 where t.dep = th.dep(+)
3 and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')
4 and t.city='Moscow'
5 model
6 partition by (year,city)
7 dimension by (t.dep dep,dep_par)
8 measures(sales, 0 tot_sales)
9 rules automatic order
10 (tot_sales[any,any] = sales[CV(),CV()]+nvl(sum(tot_sales)[any,CV(dep)],0))
11 order by 1,2,3
12 /

YEAR CITY DEP DEP_PAR SALES TOT_SALES
----------- ------ ---- ------- ---------- ----------
31.12.2007 Moscow DEP1 762 2926
31.12.2007 Moscow DEP2 DEP1 657 2164
31.12.2007 Moscow DEP3 DEP2 1020 1507
31.12.2007 Moscow DEP4 DEP3 487 487

SQL>

VoilĂ ! This is exactly what we were looking for.

Let's find a detailed description of what's happening when we don't use AUTOMATIC ORDER and what's changing when we begin to use it.
First of all, the equivalent of our single rule would be actually a list of four rules for every row (unique set of dimension values) in our data:
tot_sales['DEP1', null ]=sales['DEP1', null ]+nvl(sum(tot_sales)[any,'DEP1'],0),
tot_sales['DEP2','DEP1']=sales['DEP2','DEP1']+nvl(sum(tot_sales)[any,'DEP2'],0),
tot_sales['DEP3','DEP2']=sales['DEP3','DEP2']+nvl(sum(tot_sales)[any,'DEP3'],0),
tot_sales['DEP4','DEP3']=sales['DEP4','DEP3']+nvl(sum(tot_sales)[any,'DEP4'],0)

But using SEQUENTIAL ORDER doesn't guarantee us any order of these rules execution. Because as it stated in the documetation: SEQUENTIAL ORDER: This causes rules to be evaluated in the order they are written. This is the default.
But we have only one rule in our model. So the detailed rules can be run in any order:
for example first we find tot_sales['DEP1', null ] and then tot_sales['DEP2','DEP1'], or vice versa. And that is the problem, because this two possibilities will give us different results in the output.

What actually ORA-32637 stays for is:
Cause: A self-cyclic rule was detected in the sequential order MODEL. Sequential order MODELs cannot have self cyclic rules to guarantee that the results do not depend on the order of evaluation of the cells that are updated or upserted.
Action: Use ordered rule evaluation for this rule.


So if use order by DEP we shouldn't have such an error, let's try:
SQL> select * from t, t_hrchy th
2 where t.dep = th.dep(+)
3 and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')
4 and t.city='Moscow'
5 model
6 partition by (year,city)
7 dimension by (t.dep dep,dep_par)
8 measures(sales, 0 tot_sales)
9 rules
10 (tot_sales[any,any] order by dep = sales[CV(),CV()]+nvl(sum(tot_sales)[any,CV(dep)],0))
11 order by 1,2,3
12 /

YEAR CITY DEP DEP_PAR SALES TOT_SALES
----------- ------ ---- ------- ---------- ----------
31.12.2007 Moscow DEP1 762 762
31.12.2007 Moscow DEP2 DEP1 657 657
31.12.2007 Moscow DEP3 DEP2 1020 1020
31.12.2007 Moscow DEP4 DEP3 487 487

SQL>
Yes, the query returns no error - because we explicitly defined the order of rules execution. But as you see we have wrong results. Because actually in our case we should order by dep DESC! You can try it yourself.
While we don't know what are the dependences in the hierarchy data beforehand - let's give Oracle to decide - what are the dependencies between cells and simply use AUTOMATIC ORDER:
SQL> select * from t, t_hrchy th
2 where t.dep = th.dep(+)
3 model
4 partition by (to_char(year,'yyyy') year,city)
5 dimension by (t.dep dep,dep_par)
6 measures(sales, 0 tot_sales)
7 rules automatic order
8 (tot_sales[any,any] = sales[CV(),CV()]+nvl(sum(tot_sales)[any,CV(dep)],0))
9 order by 1,2,3
10 /

YEAR CITY DEP DEP_PAR SALES TOT_SALES
---- ------ ---- ------- ---------- ----------
2006 Moscow DEP1 562 2254
2006 Moscow DEP2 DEP1 457 1692
2006 Moscow DEP3 DEP2 890 1235
2006 Moscow DEP4 DEP3 345 345
2006 Omsk DEP1 23 264
2006 Omsk DEP2 DEP1 154 241
2006 Omsk DEP3 DEP2 87 87
2007 Moscow DEP1 762 2926
2007 Moscow DEP2 DEP1 657 2164
2007 Moscow DEP3 DEP2 1020 1507
2007 Moscow DEP4 DEP3 487 487
2007 Omsk DEP1 34 403
2007 Omsk DEP2 DEP1 213 369
2007 Omsk DEP3 DEP2 156 156

14 rows selected

SQL>


to be continued...

3 comments:

Anonymous said...

Voder,

This method works, but it takes som time to process, I wasn't able to get your connect by example to work above. Can you give me some tips on how to sum multiple columns in this type of query with out a seperate select for each column? In my mind it should be able to do it in one pass...

select rpad('*', 2*level, '*') || g3e_fid || ' (' || g3e_fno || ')',
active_customer_count_a,
--Active Customer Count ABC
(select sum(active_customer_count_a)
from pgn_lbs_temp t2
start with t2.g3e_fid = t1.g3e_fid
connect by prior t2.g3e_fid = t2.g3e_sourcefid) sum_active_customer_count_a,
(select sum(active_customer_count_b)
from pgn_lbs_temp t3
start with t3.g3e_fid = t1.g3e_fid
connect by prior t3.g3e_fid = t3.g3e_sourcefid) sum_active_customer_count_b,
(select sum(active_customer_count_c)
from pgn_lbs_temp t4
start with t4.g3e_fid = t1.g3e_fid
connect by prior t4.g3e_fid = t4.g3e_sourcefid) sum_active_customer_count_c,
--Summer1 ABC
(select sum(load_hist_summer1_a)
from pgn_lbs_temp t5
start with t5.g3e_fid = t1.g3e_fid
connect by prior t5.g3e_fid = t5.g3e_sourcefid) load_hist_summer1_a,
(select sum(load_hist_summer1_b)
from pgn_lbs_temp t6
start with t6.g3e_fid = t1.g3e_fid
connect by prior t6.g3e_fid = t6.g3e_sourcefid) load_hist_summer1_b,
(select sum(load_hist_summer1_c)
from pgn_lbs_temp t7
start with t7.g3e_fid = t1.g3e_fid
connect by prior t7.g3e_fid = t7.g3e_sourcefid) load_hist_summer1_c,
--Winter1 ABC
(select sum(load_hist_winter1_a)
from pgn_lbs_temp t8
start with t8.g3e_fid = t1.g3e_fid
connect by prior t8.g3e_fid = t8.g3e_sourcefid) load_hist_winter1_a,
(select sum(load_hist_winter1_b)
from pgn_lbs_temp t9
start with t9.g3e_fid = t1.g3e_fid
connect by prior t9.g3e_fid = t9.g3e_sourcefid) load_hist_winter1_b,
(select sum(load_hist_winter1_c)
from pgn_lbs_temp t10
start with t10.g3e_fid = t1.g3e_fid
connect by prior t10.g3e_fid = t10.g3e_sourcefid) load_hist_winter1_c
from pgn_lbs_temp t1
start with t1.g3e_fid = 32352823
connect by prior g3e_fid = g3e_sourcefid;

Volder said...

I think the following example will help you to understand how to achieve the needed behavior:


SQL> with t as (select 1 id, null parent_id, 100 value1, 20 value2 from dual union all
2 select 2 id, 1 parent_id, 50 value1, 10 value2 from dual union all
3 select 3 id, 2 parent_id, 5 value1, 30 value2 from dual)
4 --
5 select id_new id,
6 max(decode(id, id_new, value1)) value1,
7 max(decode(id, id_new, value2)) value2,
8 sum(value1),
9 sum(value2)
10 from (select t.*, connect_by_root(id) id_new, level
11 from t
12 connect by prior id = parent_id)
13 group by id_new
14 /

ID VALUE1 VALUE2 SUM(VALUE1) SUM(VALUE2)
---------- ---------- ---------- ----------- -----------
1 100 20 155 60
2 50 10 55 40
3 5 30 5 30

SQL>

Anonymous said...

Volder,
Thank you for your example that helped me out immensly.

Here is my question, it appears that when I use the with statement that queries two views the explain plain cost goes to 168 million. When I extract the data into a temporary table the explain plan cost goes to 4. Any ideas on why I can't do it with my views in the with statement?

WITH temp AS (
SELECT * FROM PGN_LBS_TEMP

/*
SELECT
tr.g3e_tno,
tr.g3e_fid,
tr.g3e_fno,
tr.g3e_sourcefid,
nvl(l.active_customer_count_a, 0) active_customer_count_a,
nvl(l.active_customer_count_b, 0) active_customer_count_b,
nvl(l.active_customer_count_c, 0) active_customer_count_c,
nvl(l.load_hist_summer1_a, 0) load_hist_summer1_a,
nvl(l.load_hist_summer1_b, 0) load_hist_summer1_b,
nvl(l.load_hist_summer1_c, 0) load_hist_summer1_c,
nvl(l.load_hist_winter1_a, 0) load_hist_winter1_a,
nvl(l.load_hist_winter1_b, 0) load_hist_winter1_b,
nvl(l.load_hist_winter1_c, 0) load_hist_winter1_c,
nvl(l.load_hist_summer2_a, 0) load_hist_summer2_a,
nvl(l.load_hist_summer2_b, 0) load_hist_summer2_b,
nvl(l.load_hist_summer2_c, 0) load_hist_summer2_c,
nvl(l.load_hist_winter2_a, 0) load_hist_winter2_a,
nvl(l.load_hist_winter2_b, 0) load_hist_winter2_b,
nvl(l.load_hist_winter2_c, 0) load_hist_winter2_c
FROM gisadmin.traceresult tr
LEFT OUTER JOIN gisadmin.lbs_custload_history_3_phase l ON (tr.g3e_fid = l.g3e_fid AND tr.g3e_fno IN (314, 244, 307, 237))
WHERE tr.g3e_tno = (SELECT g3e_id FROM traceid WHERE g3e_name = 'X134 Trace')
*/

)
SELECT
g3e_fid_new,
sum(active_customer_count_a),
sum(active_customer_count_b),
sum(active_customer_count_c),
sum(load_hist_summer1_a),
sum(load_hist_summer1_b),
sum(load_hist_summer1_c)
from (
select temp.*, connect_by_root(g3e_fid) g3e_fid_new, level
from temp
start with temp.g3e_fid = 32352823
connect by prior g3e_fid = g3e_sourcefid
)
where g3e_fid_new = 32352823
group by g3e_fid_new
;