Sunday, December 9, 2007

Reports: totals & subtotals

Starting from the current post I decided to write several notes on queries with Model clause. Mostly they would be dedicated to the topics of producing diverse reports.
In each case I'll give an opportunity how it can be done without using the model clause. And then - how the model clause could be used. So let's begin.
Input data:
SQL> drop table t;

Table dropped

SQL>
SQL> create table t as (select 'Omsk' city, 'DEP1' dep, to_date('31.12.2006','dd.mm.yyyy') year, 23 sales from dual union all
2 select 'Omsk' city, 'DEP2' dep, to_date('31.12.2006','dd.mm.yyyy') year, 154 sales from dual union all
3 select 'Omsk' city, 'DEP3' dep, to_date('31.12.2006','dd.mm.yyyy') year, 87 sales from dual union all
4 select 'Moscow' city, 'DEP1' dep, to_date('31.12.2006','dd.mm.yyyy') year, 562 sales from dual union all
5 select 'Moscow' city, 'DEP2' dep, to_date('31.12.2006','dd.mm.yyyy') year, 457 sales from dual union all
6 select 'Moscow' city, 'DEP3' dep, to_date('31.12.2006','dd.mm.yyyy') year, 890 sales from dual union all
7 select 'Moscow' city, 'DEP4' dep, to_date('31.12.2006','dd.mm.yyyy') year, 345 sales from dual union all
8 select 'Omsk' city, 'DEP1' dep, to_date('31.12.2007','dd.mm.yyyy') year, 34 sales from dual union all
9 select 'Omsk' city, 'DEP2' dep, to_date('31.12.2007','dd.mm.yyyy') year, 213 sales from dual union all
10 select 'Omsk' city, 'DEP3' dep, to_date('31.12.2007','dd.mm.yyyy') year, 156 sales from dual union all
11 select 'Moscow' city, 'DEP1' dep, to_date('31.12.2007','dd.mm.yyyy') year, 762 sales from dual union all
12 select 'Moscow' city, 'DEP2' dep, to_date('31.12.2007','dd.mm.yyyy') year, 657 sales from dual union all
13 select 'Moscow' city, 'DEP3' dep, to_date('31.12.2007','dd.mm.yyyy') year, 1020 sales from dual union all
14 select 'Moscow' city, 'DEP4' dep, to_date('31.12.2007','dd.mm.yyyy') year, 487 sales from dual)
15 /

Table created

SQL> commit;

Commit complete

SQL>
SQL>select * from t
2 order by 3,1,2
3 /

CITY DEP YEAR SALES
------ ---- ----------- ----------
Moscow DEP1 31.12.2006 562
Moscow DEP2 31.12.2006 457
Moscow DEP3 31.12.2006 890
Moscow DEP4 31.12.2006 345
Omsk DEP1 31.12.2006 23
Omsk DEP2 31.12.2006 154
Omsk DEP3 31.12.2006 87
Moscow DEP1 31.12.2007 762
Moscow DEP2 31.12.2007 657
Moscow DEP3 31.12.2007 1020
Moscow DEP4 31.12.2007 487
Omsk DEP1 31.12.2007 34
Omsk DEP2 31.12.2007 213
Omsk DEP3 31.12.2007 156

14 rows selected

SQL>
This is a table of sales for a company in Russia, which has it's departments in two cities: Omsk (Dep 1,2,3) and in Moscow (Dep 1,2,3,4). The data about sales is given for two years: 2006 and 2007.

Now we want to get sales for the year 2007. But we need not only detailed sales (as in a table), but also subtotals for each city. And a total for the whole company.

Non-Model Solution:
Usually such reports are created by using extensions to Group By: Cube, Rollup & Grouping Sets expression.

So our query would look like:
SQL> select city, dep, sum(sales) sales
2 from t
3 where trunc(year, 'y') = to_date('01.01.2007', 'dd.mm.yyyy')
4 group by rollup(city, dep)
5 order by 1, 2
6 /

CITY DEP SALES
------ ---- ----------
Moscow DEP1 762
Moscow DEP2 657
Moscow DEP3 1020
Moscow DEP4 487
Moscow 2926
Omsk DEP1 34
Omsk DEP2 213
Omsk DEP3 156
Omsk 403
3329

10 rows selected

SQL>

Model solution:
To create the same report using model clause, we have to execute the following query:
SQL> select city,dep,sales
2 from t
3 where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')
4 model
5 dimension by (0 total,city, dep)
6 measures(sales)
7 rules upsert all
8 (sales[1,any,null]=sum(sales)[0,CV(),any],
9 sales[1,null,null]=sum(sales)[0,any,any])
10 order by 1,2
11 /

CITY DEP SALES
------ ---- ----------
Moscow DEP1 762
Moscow DEP2 657
Moscow DEP3 1020
Moscow DEP4 487
Moscow 2926
Omsk DEP1 34
Omsk DEP2 213
Omsk DEP3 156
Omsk 403
3329

10 rows selected

SQL>
Let's elaborate how it works.

Using the condition in the WHERE clause - we get only data for the year 2007 in the result.
Then we create dimensions: City and Dep - are the dimensions from the table. But we add one more dimension called Total and by default assign value 0 to all the cells for that dimension.
As a measure we take the field from our table Sales.
So at this stage we get as a result:
SQL> select *
2 from t
3 where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')
4 model
5 dimension by (0 total,city, dep)
6 measures(sales)
7 ()
8 order by 2,3
9 /

TOTAL CITY DEP SALES
---------- ------ ---- ----------
0 Moscow DEP1 762
0 Moscow DEP2 657
0 Moscow DEP3 1020
0 Moscow DEP4 487
0 Omsk DEP1 34
0 Omsk DEP2 213
0 Omsk DEP3 156

7 rows selected

SQL>
Now we need to create extra lines for the subtotals and total.
We use rules for this.
You should remember that there are two ways to refer to a cell in a rule of a model clause: positional or symbolic.

Example of positional: sales['Omsk','DEP3'] or sales['Omsk','DEP4'].
When a cell which is positionally referenced exists - it's value is updated. When it doesn't exist - the model clause will create a cell with such dimensions.

Example of symbolic reference:
sales[city='Omsk',dep='DEP3'], sales[city is any,dep='DEP3'] ,sales[any,dep='DEP3'].
Actually the last one [any] is a positional reference - but I put it in the symbolic group because there's a remark in the documentation:
Note that ANY is treated as a symbolic reference even if it is specified positionally, because it really means that (dimension IS NOT NULL OR dimension IS NULL).
A typical behaviour for symbolically referenced cells: when it exists - the value gets updated. When it doesn't exist - no cells are created.

Don't forget about the cells which are referenced symbolically and positionally at the same time. By default they behave like all the references are symbolic.

Well, actually we could use 3 types of behaviour: UPDATE, UPSERT and UPSERT ALL.
For explicit determining we should place the name of behaviour directly after the RULES keyword. In this case it would be adjusted to all the rules.
If we want to specify a particular rule - we can put the name of behaviour right before the rule (we'll see how it works in the next post).

If we don't put any - the default behaviour UPSERT is used. How it works - I mentioned earlier.

When we use UPDATE - all cells are got updated, and if they don't exist - no cells are created, whichever reference (positional or symbolic) is used.

UPSERT ALL is a scpecial case - it can create cells even for symbolic reference.
But not in all cases, for example - it wouldn't create any cells if we put a rule:
sales[city='Voronezh','DEP5']=100 (because there's no city called 'Voronezh' in our table), but at the same time it would create a cell (which doesn't exist) for the rule sales[city='Omsk','DEP5']=100.
Why?
Because UPSERT ALL works in the following way (from the doc):

Step 1 Find Cells
Find the existing cells that satisfy all the symbolic predicates of the cell reference.

Step 2 Find Distinct Dimension Values
Using just the dimensions that have symbolic references, find the distinct dimension value combinations of these cells.

Step 3 Perform a Cross Product
Perform a cross product of these value combinations with the dimension values specified through positional references.

Step 4 Upsert New Cells

The results of Step 3 are then used to upsert new cells into the array

Just look:
SQL> select *
2 from t
3 where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')
4 model
5 return updated rows
6 dimension by (0 total,city, dep)
7 measures(sales)
8 rules upsert all
9 (sales[0,city='Omsk','DEP5']=100,
10 sales[0,city='Voronezh','DEP5']=200)
11 order by 2,3
12 /

TOTAL CITY DEP SALES
---------- ------ ---- ----------
0 Omsk DEP5 100

SQL>
By putting RETURN UPDATED ROWS we get as a result only updated or inserted cells.
So as you can see despite we had a rule with Voronezh - no cells were inserted.

Let's get back to our subtotals and total.
We had two rules:
sales[1,any,null]=sum(sales)[0,CV(),any],
sales[1,null,null]=sum(sales)[0,any,any]

But as you remember [any] is treated as a symbolic reference - and won't create any cells if we don't put UPSERT ALL semantics.
SQL> select city,dep,sales
2 from t
3 where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')
4 model return updated rows
5 dimension by (0 total,city, dep)
6 measures(sales)
7 rules
8 (sales[1,any,null]=sum(sales)[0,CV(),any])
9 order by 1,2
10 /

CITY DEP SALES
------ ---- ----------

SQL>
If we put UPSERT ALL semantics, our rules would do the following:

First rule: sales[1,any,null] will look for all dimension values that are referenced symbolically - city dimension: Omsk and Moscow.
Then create a cross product with positionally referenced dimensions: we'll get two cells - sales[1,'Omsk',null] and sales[1,'Moscow',null]
And these cells would be inserted.
BTW the dimension total=1 would mean that these are totals or subtotals.

Next rule sales[1,null,null] - will create a company total cell.
On the right side of the rules we have grouping formulas which will sum up all the sales values needed. By using CV() we reference to the dimension value of the cell on the left side of the rule.

PS
We could get the same result without Total dimension:
SQL> select city,dep,sales
2 from t
3 where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')
4 model
5 dimension by (city, dep)
6 measures(sales)
7 rules upsert all
8 (sales[any,null]=sum(sales)[CV(),any],
9 sales[null,null]=sum(sales)[any,null])
10 order by 1,2
11 /

CITY DEP SALES
------ ---- ----------
Moscow DEP1 762
Moscow DEP2 657
Moscow DEP3 1020
Moscow DEP4 487
Moscow 2926
Omsk DEP1 34
Omsk DEP2 213
Omsk DEP3 156
Omsk 403
3329

10 rows selected

SQL>

Here the rules are a little bit different.
The company Total is calculated from the cities subtotals, that has already been calculated, while in the main part (with Total dimension) we calculated all totals and subtotals on detailed sales data.

to be continued ...

0 comments: