Sunday, December 16, 2007

Reports: forecasting

... previous

We carry on with our reports (the needed input data).
Now what we need is to make a little forecast. We are requested to find out what the approximate sales for the next year would be.

The growth rate is determined as a ratio of current year sales (2007) to the previous year sales (2006), but it should be calculated separately for different cities.
So to find the growth rate for each row - we need to find total sales for the particular city for the current year and divide it on the total sales for the same city for the previous year.

Non-Model solution
Without using model - we can find the growth rate in a scalar subquery:
select sum(decode(to_char(year, 'yyyy'), '2007', sales)) /
sum(decode(to_char(year, 'yyyy'), '2006', sales))
from t
where city = t1.city
where t1.city should be a reference to the main outer query.
If we put this scalar subquery directly into the query we already had, we get the following result:
SQL> select city,
2 dep,
3 sum(sales) sales_2007,
4 round(sum(sales) *
5 (select sum(decode(to_char(year, 'yyyy'), '2007', sales)) /
6 sum(decode(to_char(year, 'yyyy'), '2006', sales))
7 from t
8 where city = t1.city),
9 2) sales_2008_forecast
10 from t t1
11 where trunc(year, 'y') = to_date('01.01.2007', 'dd.mm.yyyy')
12 group by rollup(city, dep)
13 /

CITY DEP SALES_2007 SALES_2008_FORECAST
------ ---- ---------- -------------------
Omsk DEP1 34 51,9
Omsk DEP2 213 325,15
Omsk DEP3 156 238,14
Omsk 403 615,19
Moscow DEP1 762 989,18
Moscow DEP2 657 852,88
Moscow DEP3 1020 1324,1
Moscow DEP4 487 632,19
Moscow 2926 3798,35
3329

10 rows selected

SQL>
As you see - we don't get a total for year 2008. This happens because city value in this total row is equal to NULL. And while we calculating the growth rate for particular cities - we get no rate at all for this row. So as a result we get NULL.

The workaround here would be first to get forecasting for detailed sales and then group with ROLLUP the sales for 2007 year and also for the predictable 2008 year.

So it would look like the following:
SQL> select city,
2 dep,
3 sum(sales) sales_2007,
4 sum(sales2) sales_2008_forecast,
5 round(100 * (sum(sales2) / sum(sales) - 1), 2) || '%' growth_rate
6 from (select t0.*,
7 round(sales *
8 (select sum(decode(to_char(year, 'yyyy'), '2007', sales)) /
9 sum(decode(to_char(year, 'yyyy'), '2006', sales))
10 from t
11 where city = t0.city),
12 2) sales2
13 from t t0) t1
14 where trunc(year, 'y') = to_date('01.01.2007', 'dd.mm.yyyy')
15 group by rollup(city, dep)
16 /

CITY DEP SALES_2007 SALES_2008_FORECAST GROWTH_RATE
------ ---- ---------- ------------------- ---------------
Omsk DEP1 34 51,9 52.65%
Omsk DEP2 213 325,15 52.65%
Omsk DEP3 156 238,14 52.65%
Omsk 403 615,19 52.65%
Moscow DEP1 762 989,18 29.81%
Moscow DEP2 657 852,88 29.81%
Moscow DEP3 1020 1324,1 29.81%
Moscow DEP4 487 632,19 29.81%
Moscow 2926 3798,35 29.81%
3329 4413,54 32.58%

10 rows selected

SQL>
Now all the totals are where they should be. And we added an extra column with growth rate. So we can see, that sales in Omsk grew up ~53% and in Moscow increased ~30%. The average growth rate from 2006 until 2007 years was ~33%.

Model solution #1
With the model solution instead of scalar subquery - we can use reference model clause.
We put the following query into the reference model:
SQL> select city, to_char(year,'yyyy') y, sum(sales) sum_sales from t group by city,year;

CITY Y SUM_SALES
------ ---- ----------
Moscow 2006 2254
Omsk 2006 264
Moscow 2007 2926
Omsk 2007 403

SQL>
And then we'll use the values from that resultset.
SQL> select city,dep,sales,sales_fc,round(100*(growth-1),2)||'%' growth_rate
2 from t where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')
3 model
4 reference r on (select city, to_char(year,'yyyy') y, sum(sales) sum_sales from t group by city,year)
5 dimension by (y, city)
6 measures(sum_sales)
7 main m
8 dimension by (0 total,city, dep)
9 measures(sales, 0 sales_FC, 0 growth)
10 rules upsert all
11 (sales[1,any,null]=sum(sales)[0,CV(),any],
12 sales[1,null,null]=sum(sales)[0,any,any],
13 growth[any,any,any]=r.sum_sales['2007',CV(city)]/r.sum_sales['2006',CV(city)],
14 sales_FC[any,any,any]=round(sales[CV(),CV(),CV()]*growth[CV(),CV(),CV()],2))
15 order by 2,3;

CITY DEP SALES SALES_FC GROWTH_RATE
------ ---- ---------- ---------- -----------------------------------------
Omsk DEP1 34 51,9 52.65%
Moscow DEP1 762 989,18 29.81%
Omsk DEP2 213 325,15 52.65%
Moscow DEP2 657 852,88 29.81%
Omsk DEP3 156 238,14 52.65%
Moscow DEP3 1020 1324,1 29.81%
Moscow DEP4 487 632,19 29.81%
Omsk 403 615,19 52.65%
Moscow 2926 3798,35 29.81%
3329 %

10 rows selected

SQL>

So we put our query into the reference clause, so that we can use it's cell values in the main part.
We created a new measure growth and in the rules part wrote:
growth[any,any,any]=r.sum_sales['2007',CV(city)]/r.sum_sales['2006',CV(city)]
So for each cell we find a growth rate for the current row (current city) by dividing summary sales of the current year for this particular city to the same of the previous year.
The next our step (rule) is finding the predicted sales for the next year:
sales_FC[any,any,any]=round(sales[CV(),CV(),CV()]*growth[CV(),CV(),CV()],2)
And we are facing the same problem we had in the Non-Model part. We can't define any city in the total row - so we can't find a growth rate for this row.
As you already know - we can easily write a rule to find the total sales for the next year as a sum of other cells:
SQL> select city,dep,sales,sales_fc,round(100*(growth-1),2)||'%' growth_rate
2 from t where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')
3 model
4 reference r on (select city, to_char(year,'yyyy') y, sum(sales) sum_sales from t group by city,year)
5 dimension by (y, city)
6 measures(sum_sales)
7 main m
8 dimension by (0 total,city, dep)
9 measures(sales, 0 sales_FC, 0 growth)
10 rules upsert all
11 (sales[1,any,null]=sum(sales)[0,CV(),any],
12 sales[1,null,null]=sum(sales)[0,any,any],
13 sales_FC[any,city is not null,any]=round(sales[CV(),CV(),CV()]*r.sum_sales['2007',CV(city)]/r.sum_sales['2006',CV(city)],2),
14 sales_FC[1,null,null]=sum(sales_FC)[1,any,null],
15 growth[any,any,any]=sales_FC[CV(),CV(),CV()]/sales[CV(),CV(),CV()])
16 order by 1,2;

CITY DEP SALES SALES_FC GROWTH_RATE
------ ---- ---------- ---------- ------------------
Moscow DEP1 762 989,18 29.81%
Moscow DEP2 657 852,88 29.81%
Moscow DEP3 1020 1324,1 29.81%
Moscow DEP4 487 632,19 29.81%
Moscow 2926 3798,35 29.81%
Omsk DEP1 34 51,9 52.65%
Omsk DEP2 213 325,15 52.65%
Omsk DEP3 156 238,14 52.65%
Omsk 403 615,19 52.65%
3329 4413,54 32.58%

10 rows selected

SQL>
You see that we changed the order of the rules and the rules themselves (that is because we want to find the growth rate for the overall total row, and hence we would find growth rates AFTER we get overall total sales value).
First we find the forecasting sales for the next year for all the rows excluding the very total:
sales_FC[any,city is not null,any]=
round(sales[CV(),CV(),CV()]*r.sum_sales['2007',CV(city)]/r.sum_sales['2006',CV(city)],2)
Then we find this total by summing up all the city subtotals:
sales_FC[1,null,null]=sum(sales_FC)[1,any,null]
And finally we get the growth rate by dividing sales_FC (forecast) on the actual sales (current year sales):
growth[any,any,any]=sales_FC[CV(),CV(),CV()]/sales[CV(),CV(),CV()]

After all this is the same result we had in the Non-Model solution.

Model solution #2
Well, in both solutions (Non-Model & Model #1) we accessed the table twice. First to get the main result and second one - to get the growth rate.

Now we are going to access table only once by using extra dimension in Model and refusing from the reference model clause.
SQL> select city,dep,sales,sales_fc,round(100*(growth-1),2)||'%' growth_rate from t
2 model
3 return updated rows
4 dimension by (0 total,city, dep, to_char(year,'yyyy') year)
5 measures(sales, sales sales_FC,0 growth)
6 rules upsert all
7 (sales[1,any,null,'2007']=sum(sales)[0,CV(),any,CV()],
8 sales[1,null,null,'2007']=sum(sales)[0,any,any,CV()],
9 sales_FC[any,any,any,'2007']=round(sales[CV(),CV(),CV(),CV()]*
10 sales[1,CV(),null,CV()]/sum(sales)[0,CV(),any,'2006'],2),
11 sales_FC[1,null,null,'2007']=sum(sales_FC)[1,any,null,'2007'],
12 growth[any,any,any,'2007']=sales_FC[CV(),CV(),CV(),CV()]/sales[CV(),CV(),CV(),CV()])
13 order by 1,2
14 /

CITY DEP SALES SALES_FC GROWTH_RATE
------ ---- ---------- ---------- -----------------------------------------
Moscow DEP1 762 989,18 29.81%
Moscow DEP2 657 852,88 29.81%
Moscow DEP3 1020 1324,1 29.81%
Moscow DEP4 487 632,19 29.81%
Moscow 2926 3798,35 29.81%
Omsk DEP1 34 51,9 52.65%
Omsk DEP2 213 325,15 52.65%
Omsk DEP3 156 238,14 52.65%
Omsk 403 615,19 52.65%
3329 4413,54 32.58%

10 rows selected

SQL>
What you should mention first is that we removed
where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')
So now all the data (for all years: 2006 and 2007) is pumped into model clause.

And you probably didn't miss that we put RETURN UPDATED ROWS, so only rows where cells were updated will be returned in the resultset.
That's how we can avoid additional outer WHERE clause for our MODEL: we update only rows that are corresppondent to 2007 year, but not the 2006.

Another thing changed is we added a dimension
to_char(year,'yyyy') year
So now we can reference to sales of a particular year by putting it in the format like [...,'YYYY'].

And the final amendment - would be adding a value for a new dimension in every cell reference ([<total_dim>,<city_dim>,<dep_dim>,'2007']). If we put it on the left side of a formula we can put just CV() on the right side in the place for <year_dim>.

to be continued ...

0 comments: