<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-3060148920065298017</id><updated>2011-11-28T04:34:13.031+04:00</updated><category term='Model'/><category term='Miscellaneous'/><category term='Reports'/><category term='Analytic Functions'/><category term='Regular Expressions'/><category term='Hierarchy'/><title type='text'>Volder's Oracle Notes</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>24</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-2245561001824008341</id><published>2008-04-30T17:05:00.008+04:00</published><updated>2008-05-02T21:45:35.245+04:00</updated><title type='text'>Wrong encoding with HTTPURITYPE()</title><content type='html'>Several days ago my colleague asked me - do you have any idea why the following query returns Currency Name in the wrong encoding:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; SELECT EXTRACTVALUE(VALUE(P), '/Valute/NumCode') NUMCODE,&lt;br /&gt;  2         EXTRACTVALUE(VALUE(P), '/Valute/CharCode') CHARCODE,&lt;br /&gt;  3         EXTRACTVALUE(VALUE(P), '/Valute/Nominal') NOMINAL,&lt;br /&gt;  4         EXTRACTVALUE(VALUE(P), '/Valute/Name') NAME,&lt;br /&gt;  5         EXTRACTVALUE(VALUE(P), '/Valute/Value') VALUE&lt;br /&gt;  6    FROM TABLE(XMLSEQUENCE(HTTPURITYPE('http://www.cbr.ru/scripts/XML_daily.asp?date_req=22.03.2008')&lt;br /&gt;  7                           .GETXML().EXTRACT('//ValCurs/Valute'))) P&lt;br /&gt;  8   WHERE EXTRACTVALUE(VALUE(P), '/Valute/CharCode') = 'USD'&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;NUMCODE              CHARCODE             NOMINAL              NAME                 VALUE&lt;br /&gt;-------------------- -------------------- -------------------- -------------------- --------------------&lt;br /&gt;840                  USD                  1                    Aieea? NOA           23,7773&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;It is a link to our central bank site where rates of different currencies are listed in rouble equivalent.&lt;br /&gt;&lt;br /&gt;I have not come across such stuff before - so after reading manual on this topic, I found out that the encoding is not sent explicitly from this cbr http url in the content-type:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; set serveroutput on&lt;br /&gt;SQL&gt; &lt;br /&gt;SQL&gt; DECLARE&lt;br /&gt;  2    httpuri HTTPURIType;&lt;br /&gt;  3  BEGIN&lt;br /&gt;  4    httpuri := HTTPURIType('http://www.rambler.ru/news/top/rss.html');&lt;br /&gt;  5    DBMS_OUTPUT.put_line('rambler: ' || httpuri.getContentType());&lt;br /&gt;  6    httpuri := HTTPURIType('http://www.cbr.ru/scripts/XML_daily.asp?date_req=22.03.2008');&lt;br /&gt;  7    DBMS_OUTPUT.put_line('cbr: ' || httpuri.getContentType());&lt;br /&gt;  8  END;&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;rambler: text/xml; charset=windows-1251&lt;br /&gt;cbr: text/xml&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;as we can find from the &lt;a href="http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/t_dburi.htm#i1007928"&gt;documentation&lt;/a&gt;:&lt;br /&gt;&lt;span style="font-style:italic;"&gt;The HTTPURITYPE is a subtype of the UriType that provides support for the HTTP protocol. This uses the &lt;span style="font-weight:bold;"&gt;UTL_HTTP&lt;/span&gt; package underneath to access the HTTP URLs.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;And in the chapter about UTL_HTTP we can find that:&lt;br /&gt;&lt;span style="font-style:italic;"&gt;Following the HTTP protocol standard specification, if the media type of a request or a response is text, but the character set information is missing in the Content-Type header, the character set of the request or response body should default to &lt;span style="font-weight:bold;"&gt;ISO-8859-1&lt;/span&gt;. A response created for a request inherits the default body character set of the request instead of the body character set of the current session. The default body character set is ISO-8859-1 in a database user session.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So this is what happens in our case - default encoding is ISO-8859-1 and names of currencies are wrongly displayed.&lt;br /&gt;&lt;br /&gt;The easiest way is to specify the default charset with &lt;a href="http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/u_http.htm#i1027267"&gt;SET_BODY_CHARSET&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; set colwidth 20&lt;br /&gt;SQL&gt; select utl_http.get_body_charset() from dual;&lt;br /&gt;&lt;br /&gt;UTL_HTTP.GET_BODY_CH&lt;br /&gt;--------------------&lt;br /&gt;ISO-8859-1&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;br /&gt;SQL&gt; select * from v$nls_valid_values&lt;br /&gt;  2  where value like '%1251%'&lt;br /&gt;  3  /&lt;br /&gt;&lt;br /&gt;PARAMETER            VALUE                ISDEPRECATED&lt;br /&gt;-------------------- -------------------- ------------&lt;br /&gt;CHARACTERSET         CL8MSWIN1251         FALSE&lt;br /&gt;&lt;br /&gt;SQL&gt; exec utl_http.set_body_charset('CL8MSWIN1251');&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed&lt;br /&gt;&lt;br /&gt;SQL&gt; select utl_http.get_body_charset() from dual;&lt;br /&gt;&lt;br /&gt;UTL_HTTP.GET_BODY_CH&lt;br /&gt;--------------------&lt;br /&gt;CL8MSWIN1251&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;br /&gt;SQL&gt; SELECT EXTRACTVALUE(VALUE(P), '/Valute/NumCode') NUMCODE,&lt;br /&gt;  2         EXTRACTVALUE(VALUE(P), '/Valute/CharCode') CHARCODE,&lt;br /&gt;  3         EXTRACTVALUE(VALUE(P), '/Valute/Nominal') NOMINAL,&lt;br /&gt;  4         EXTRACTVALUE(VALUE(P), '/Valute/Name') NAME,&lt;br /&gt;  5         EXTRACTVALUE(VALUE(P), '/Valute/Value') VALUE&lt;br /&gt;  6    FROM TABLE(XMLSEQUENCE(HTTPURITYPE('http://www.cbr.ru/scripts/XML_daily.asp?date_req=22.03.2008')&lt;br /&gt;  7                           .GETXML().EXTRACT('//ValCurs/Valute'))) P&lt;br /&gt;  8   WHERE EXTRACTVALUE(VALUE(P), '/Valute/CharCode') = 'USD'&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;NUMCODE              CHARCODE             NOMINAL              NAME                 VALUE&lt;br /&gt;-------------------- -------------------- -------------------- -------------------- --------------------&lt;br /&gt;840                  USD                  1                    Доллар США           23,7773&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;So now the currency name is properly displayed. But this method affects only current session. And this default charset will be left for future requests also.&lt;br /&gt;&lt;br /&gt;Another way - is a little bit difficult.&lt;br /&gt;It is based on the fact, that in case you pump blob from the http link - the result encoding is not changed:&lt;br /&gt;&lt;a href="http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28369/xdb15dbu.htm#sthref1517"&gt;URIType Methods&lt;/a&gt;:&lt;br /&gt;&lt;table border="3" cellspacing="5" bordercolor="black"&gt;&lt;tr&gt;&lt;td&gt;getBLOB()&lt;/td&gt;&lt;td&gt;Returns the target of the URI as a BLOB value. &lt;span style="font-weight:bold;"&gt;No character conversion is performed&lt;/span&gt;, and the character encoding is that of the URI target. This method can also be used to fetch binary data.&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;br /&gt;So the idea is to transform this blob value into XML with the needed encoding.&lt;br /&gt;I found a link from the OTN forum, which can be very useful: &lt;a href="http://forums.oracle.com/forums/thread.jspa?messageID=1864895&amp;#1864895"&gt;http://forums.oracle.com/forums/thread.jspa?messageID=1864895&amp;#1864895&lt;/a&gt;&lt;br /&gt;We can easily create a function, that will take as an input the blob value and the needed encoding for conversion and would return as a result XMLType value:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; create or replace function BLOB2XML(l_blob blob, l_charset varchar2)&lt;br /&gt;  2    return xmltype is&lt;br /&gt;  3    l_clob         clob;&lt;br /&gt;  4    l_src_offset   number;&lt;br /&gt;  5    l_dest_offset  number;&lt;br /&gt;  6    l_blob_csid    number := nvl(nls_charset_id(l_charset), dbms_lob.default_csid);&lt;br /&gt;  7    v_lang_context number := dbms_lob.default_lang_ctx;&lt;br /&gt;  8    l_warning      number;&lt;br /&gt;  9    l_amount       number;&lt;br /&gt; 10  begin&lt;br /&gt; 11    dbms_lob.createtemporary(l_clob, true);&lt;br /&gt; 12    l_src_offset  := 1;&lt;br /&gt; 13    l_dest_offset := 1;&lt;br /&gt; 14    l_amount      := dbms_lob.getlength(l_blob);&lt;br /&gt; 15    dbms_lob.converttoclob(l_clob,&lt;br /&gt; 16                           l_blob,&lt;br /&gt; 17                           l_amount,&lt;br /&gt; 18                           l_dest_offset,&lt;br /&gt; 19                           l_src_offset,&lt;br /&gt; 20                           l_blob_csid,&lt;br /&gt; 21                           v_lang_context,&lt;br /&gt; 22                           l_warning);&lt;br /&gt; 23    return xmltype(l_clob);&lt;br /&gt; 24  end;&lt;br /&gt; 25  /&lt;br /&gt;&lt;br /&gt;Function created&lt;br /&gt;&lt;br /&gt;SQL&gt; show err blob2xml;&lt;br /&gt;No errors for FUNCTION SYSTEM.BLOB2XML&lt;br /&gt;&lt;br /&gt;SQL&gt; set colwidth 20&lt;br /&gt;SQL&gt; select utl_http.get_body_charset() from dual;&lt;br /&gt;&lt;br /&gt;UTL_HTTP.GET_BODY_CH&lt;br /&gt;--------------------&lt;br /&gt;ISO-8859-1&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;br /&gt;SQL&gt; SELECT EXTRACTVALUE(VALUE(P), '/Valute/NumCode') NUMCODE,&lt;br /&gt;  2         EXTRACTVALUE(VALUE(P), '/Valute/CharCode') CHARCODE,&lt;br /&gt;  3         EXTRACTVALUE(VALUE(P), '/Valute/Nominal') NOMINAL,&lt;br /&gt;  4         EXTRACTVALUE(VALUE(P), '/Valute/Name') NAME,&lt;br /&gt;  5         EXTRACTVALUE(VALUE(P), '/Valute/Value') VALUE&lt;br /&gt;  6    FROM TABLE(XMLSEQUENCE(blob2XML(HTTPURITYPE('http://www.cbr.ru/scripts/XML_daily.asp?date_req=22.03.2008')&lt;br /&gt;  7                           .GETblob(),'CL8MSWIN1251').EXTRACT('//ValCurs/Valute'))) P&lt;br /&gt;  8   WHERE EXTRACTVALUE(VALUE(P), '/Valute/CharCode') = 'USD'&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;NUMCODE              CHARCODE             NOMINAL              NAME                 VALUE&lt;br /&gt;-------------------- -------------------- -------------------- -------------------- --------------------&lt;br /&gt;840                  USD                  1                    Доллар США           23,7773&lt;br /&gt;&lt;br /&gt;SQL&gt; select utl_http.get_body_charset() from dual;&lt;br /&gt;&lt;br /&gt;UTL_HTTP.GET_BODY_CH&lt;br /&gt;--------------------&lt;br /&gt;ISO-8859-1&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;As we can see the default charset for UTL_HTTP package has not being changed.&lt;br /&gt;Just to be sure that the problem was in encoding - let's try our query with ISO-8859-1 charset:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select * from v$nls_valid_values&lt;br /&gt;  2  where upper(value) like '%ISO%8859%1'&lt;br /&gt;  3  /&lt;br /&gt;&lt;br /&gt;PARAMETER            VALUE                ISDEPRECATED&lt;br /&gt;-------------------- -------------------- ------------&lt;br /&gt;CHARACTERSET         WE8ISO8859P1         FALSE&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;br /&gt;SQL&gt; SELECT EXTRACTVALUE(VALUE(P), '/Valute/NumCode') NUMCODE,&lt;br /&gt;  2         EXTRACTVALUE(VALUE(P), '/Valute/CharCode') CHARCODE,&lt;br /&gt;  3         EXTRACTVALUE(VALUE(P), '/Valute/Nominal') NOMINAL,&lt;br /&gt;  4         EXTRACTVALUE(VALUE(P), '/Valute/Name') NAME,&lt;br /&gt;  5         EXTRACTVALUE(VALUE(P), '/Valute/Value') VALUE&lt;br /&gt;  6    FROM TABLE(XMLSEQUENCE(blob2XML(HTTPURITYPE('http://www.cbr.ru/scripts/XML_daily.asp?date_req=22.03.2008')&lt;br /&gt;  7                           .GETblob(),'WE8ISO8859P1').EXTRACT('//ValCurs/Valute'))) P&lt;br /&gt;  8   WHERE EXTRACTVALUE(VALUE(P), '/Valute/CharCode') = 'USD'&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;NUMCODE              CHARCODE             NOMINAL              NAME                 VALUE&lt;br /&gt;-------------------- -------------------- -------------------- -------------------- --------------------&lt;br /&gt;840                  USD                  1                    Aieea? NOA           23,7773&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;Yes, this is exactly what we got at our initial query.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-2245561001824008341?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/2245561001824008341/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=2245561001824008341' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/2245561001824008341'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/2245561001824008341'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2008/04/wrong-encoding-with-httpuritype.html' title='Wrong encoding with HTTPURITYPE()'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-7562702227927089803</id><published>2008-03-22T00:07:00.012+03:00</published><updated>2008-03-22T02:39:21.207+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Regular Expressions'/><title type='text'>Fragments of string between special patterns</title><content type='html'>Although my solution wasn't rated highly on the &lt;a href="http://forums.oracle.com/forums/message.jspa?messageID=2410149"&gt;forum&lt;/a&gt;, I decided to put it here :)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Task:&lt;/span&gt;&lt;br /&gt;There is a string value given, comprising a number of elements that are put into particular patterns on both sides.&lt;br /&gt;We need to get all such patterns and inside value out of the string.&lt;br /&gt;E.g. the beginning pattern is '&lt;&lt;&lt;' and the ending pattern is '&gt;&gt;&gt;'.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Solution:&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'xyz&amp;lt;&amp;lt;&amp;lt;testdata&amp;gt;&amp;gt;&amp;gt;123' col1  from dual union all&lt;br /&gt;  2             select 'zzz&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;ssf' from dual union all&lt;br /&gt;  3             select 'sss&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;sffsd' from dual union all&lt;br /&gt;  4             select 'ggg&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;gt;&amp;gt;34345&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;lks' from dual)&lt;br /&gt;  5             --&lt;br /&gt;  6             select t.*,&lt;br /&gt;  7                    reverse(regexp_replace(reverse(regexp_replace(col1, '(&amp;lt;&amp;lt;&amp;lt;.*?&amp;gt;&amp;gt;&amp;gt;)|.', '\1')),&lt;br /&gt;  8                                           '(&amp;gt;&amp;gt;&amp;gt;.*?&amp;lt;&amp;lt;&amp;lt;)|.', '\1')) col2&lt;br /&gt;  9               from t;&lt;br /&gt;&lt;br /&gt;COL1                            COL2&lt;br /&gt;------------------------------- ----------------------------------&lt;br /&gt;xyz&amp;lt;&amp;lt;&amp;lt;testdata&amp;gt;&amp;gt;&amp;gt;123            &amp;lt;&amp;lt;&amp;lt;testdata&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;zzz&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;ssf        &amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;sss&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;sffsd        &amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;ggg&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;gt;&amp;gt;34345&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;lks &amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;gt;&amp;gt;&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Explanation:&lt;/span&gt;&lt;br /&gt;The first thing what we are doing - is getting all patterns that start from '&lt;&lt;&lt;' and end with '&gt;&gt;&gt;'. By using non-greedy search we have as a result not only one long value, but separated patterns (it is well seen in the fourth row):&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'xyz&amp;lt;&amp;lt;&amp;lt;testdata&amp;gt;&amp;gt;&amp;gt;123' col1  from dual union all&lt;br /&gt;  2             select 'zzz&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;ssf' from dual union all&lt;br /&gt;  3             select 'sss&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;sffsd' from dual union all&lt;br /&gt;  4             select 'ggg&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;gt;&amp;gt;34345&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;lks' from dual)&lt;br /&gt;  5             --&lt;br /&gt;  6             select t.*,&lt;br /&gt;  7                    regexp_replace(col1, '(&amp;lt;&amp;lt;&amp;lt;.*?&amp;gt;&amp;gt;&amp;gt;)|.', '\1') col2&lt;br /&gt;  8               from t;&lt;br /&gt;&lt;br /&gt;COL1                            COL2&lt;br /&gt;------------------------------- ----------------------------------&lt;br /&gt;xyz&amp;lt;&amp;lt;&amp;lt;testdata&amp;gt;&amp;gt;&amp;gt;123            &amp;lt;&amp;lt;&amp;lt;testdata&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;zzz&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;ssf        &amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;sss&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;sffsd        &amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;ggg&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;gt;&amp;gt;34345&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;lks &amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;gt;&amp;gt;&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;This technique of getting only desired pattern from the string is described &lt;a href="http://volder-notes.blogspot.com/2007/11/regexp-getting-all-occurences-of-mask.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;But the result we got is not 100% what were going to receive.&lt;br /&gt;As you can see in the second line there is a resulting value '&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;', which is not correct. Because there is an opening pattern '&lt;&lt;&lt;' is met inside the value. So it looks as only '&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;' should be returned.&lt;br /&gt;For that purpose we need to read the string starting from the end and moving in a backward direction to the very beginning of the value.&lt;br /&gt;In such cases using REVERSE() function could be a solution, although it is not documented. So first we reverse the string and secondly we pass it in the usual way from beginning to the end.&lt;br /&gt;And by the way as we have it reversed all our patterns should also be reversed: the first one '&gt;&gt;&gt;' would be met and then closing '&lt;&lt;&lt;'. In the end we just put the second REVERSE() to get the initial direction of the string:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'xyz&amp;lt;&amp;lt;&amp;lt;testdata&amp;gt;&amp;gt;&amp;gt;123' col1  from dual union all&lt;br /&gt;  2             select 'zzz&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;ssf' from dual union all&lt;br /&gt;  3             select 'sss&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;sffsd' from dual union all&lt;br /&gt;  4             select 'ggg&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;gt;&amp;gt;34345&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;lks' from dual)&lt;br /&gt;  5             --&lt;br /&gt;  6             select t.*,&lt;br /&gt;  7                    regexp_replace(col1, '(&amp;lt;&amp;lt;&amp;lt;.*?&amp;gt;&amp;gt;&amp;gt;)|.', '\1') col2,&lt;br /&gt;  8                    reverse(regexp_replace(col1, '(&amp;lt;&amp;lt;&amp;lt;.*?&amp;gt;&amp;gt;&amp;gt;)|.', '\1')) col3,&lt;br /&gt;  9                    regexp_replace(reverse(regexp_replace(col1, '(&amp;lt;&amp;lt;&amp;lt;.*?&amp;gt;&amp;gt;&amp;gt;)|.', '\1')),'(&amp;gt;&amp;gt;&amp;gt;.*?&amp;lt;&amp;lt;&amp;lt;)|.', '\1') col4,&lt;br /&gt; 10                    reverse(regexp_replace(reverse(regexp_replace(col1, '(&amp;lt;&amp;lt;&amp;lt;.*?&amp;gt;&amp;gt;&amp;gt;)|.', '\1')),'(&amp;gt;&amp;gt;&amp;gt;.*?&amp;lt;&amp;lt;&amp;lt;)|.', '\1')) col5&lt;br /&gt; 11               from t;&lt;br /&gt;&lt;br /&gt;COL1                            COL2                     COL3                     COL4                   COL5&lt;br /&gt;------------------------------- ------------------------ ------------------------ ---------------------- ---------------------&lt;br /&gt;xyz&amp;lt;&amp;lt;&amp;lt;testdata&amp;gt;&amp;gt;&amp;gt;123            &amp;lt;&amp;lt;&amp;lt;testdata&amp;gt;&amp;gt;&amp;gt;           &amp;gt;&amp;gt;&amp;gt;atadtset&amp;lt;&amp;lt;&amp;lt;           &amp;gt;&amp;gt;&amp;gt;atadtset&amp;lt;&amp;lt;&amp;lt;         &amp;lt;&amp;lt;&amp;lt;testdata&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;zzz&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;ssf        &amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;       &amp;gt;&amp;gt;&amp;gt;atad&amp;lt;&amp;lt;&amp;lt;&amp;gt;tset&amp;lt;&amp;lt;&amp;lt;       &amp;gt;&amp;gt;&amp;gt;atad&amp;lt;&amp;lt;&amp;lt;             &amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;sss&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;sffsd        &amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;         &amp;gt;&amp;gt;&amp;gt;atad&amp;lt;&amp;gt;tset&amp;lt;&amp;lt;&amp;lt;         &amp;gt;&amp;gt;&amp;gt;atad&amp;lt;&amp;gt;tset&amp;lt;&amp;lt;&amp;lt;       &amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;ggg&amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;gt;&amp;gt;34345&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;lks &amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;gt;&amp;gt;&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;     &amp;gt;&amp;gt;&amp;gt;atad&amp;lt;&amp;lt;&amp;lt;&amp;gt;&amp;gt;&amp;gt;tset&amp;lt;&amp;lt;&amp;lt;     &amp;gt;&amp;gt;&amp;gt;atad&amp;lt;&amp;lt;&amp;lt;&amp;gt;&amp;gt;&amp;gt;tset&amp;lt;&amp;lt;&amp;lt;   &amp;lt;&amp;lt;&amp;lt;test&amp;gt;&amp;gt;&amp;gt;&amp;lt;&amp;lt;&amp;lt;data&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;As you can see the redundant part '&amp;lt;&amp;lt;&amp;lt;test&amp;gt;' have dissapeared from the line #2.&lt;br /&gt;&lt;br /&gt;PS &lt;br /&gt;The opening pattern and closing pattern could be any, e.g. '&lt;&lt;~' as opening and '&gt;&lt;&lt;-' as closing, in this case the query would look like:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'xyz&amp;lt;&amp;lt;~testdata&amp;gt;&amp;lt;&amp;lt;-123' col1  from dual union all&lt;br /&gt;  2             select 'zzz&amp;lt;&amp;lt;~test&amp;gt;&amp;lt;&amp;lt;-data&amp;gt;&amp;lt;&amp;lt;-ssf' from dual union all&lt;br /&gt;  3             select 'sss&amp;lt;&amp;lt;~test&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;lt;&amp;lt;-data&amp;gt;&amp;gt;~sffsd' from dual union all&lt;br /&gt;  4             select 'ggg&amp;lt;&amp;lt;~test&amp;gt;&amp;lt;&amp;lt;-34345&amp;lt;&amp;lt;~data&amp;lt;&amp;lt;~data2&amp;gt;&amp;lt;&amp;lt;-lks' from dual)&lt;br /&gt;  5             --&lt;br /&gt;  6             select t.*,&lt;br /&gt;  7                    reverse(regexp_replace(reverse(regexp_replace(col1, '(&amp;lt;&amp;lt;~.*?&amp;gt;&amp;lt;&amp;lt;-)|.', '\1')),'(-&amp;lt;&amp;lt;&amp;gt;.*?~&amp;lt;&amp;lt;)|.', '\1')) col2&lt;br /&gt;  8               from t;&lt;br /&gt;&lt;br /&gt;COL1                                      COL2&lt;br /&gt;----------------------------------------- -----------------------------------&lt;br /&gt;xyz&amp;lt;&amp;lt;~testdata&amp;gt;&amp;lt;&amp;lt;-123                     &amp;lt;&amp;lt;~testdata&amp;gt;&amp;lt;&amp;lt;-&lt;br /&gt;zzz&amp;lt;&amp;lt;~test&amp;gt;&amp;lt;&amp;lt;-data&amp;gt;&amp;lt;&amp;lt;-ssf                 &amp;lt;&amp;lt;~test&amp;gt;&amp;lt;&amp;lt;-&lt;br /&gt;sss&amp;lt;&amp;lt;~test&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;lt;&amp;lt;-data&amp;gt;&amp;gt;~sffsd             &amp;lt;&amp;lt;~test&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;lt;&amp;lt;-&lt;br /&gt;ggg&amp;lt;&amp;lt;~test&amp;gt;&amp;lt;&amp;lt;-34345&amp;lt;&amp;lt;~data&amp;lt;&amp;lt;~data2&amp;gt;&amp;lt;&amp;lt;-lks &amp;lt;&amp;lt;~test&amp;gt;&amp;lt;&amp;lt;-&amp;lt;&amp;lt;~data2&amp;gt;&amp;lt;&amp;lt;-&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-7562702227927089803?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/7562702227927089803/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=7562702227927089803' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/7562702227927089803'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/7562702227927089803'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2008/03/fragments-of-string-between-special.html' title='Fragments of string between special patterns'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-3277818998383827364</id><published>2008-02-23T18:53:00.013+03:00</published><updated>2008-03-02T00:30:23.215+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Regular Expressions'/><title type='text'>Patterns evaluation order in regular expressions</title><content type='html'>After a lull in my blogging activity I decided to write a note on Oracle regular expressions, which can be useful for those who wants to use Oracle regexp functions more efficiently.&lt;br /&gt;&lt;br /&gt;We will talk about masks with several alternatives.&lt;br /&gt;Lets look at the following example:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select '1H1' str from dual)&lt;br /&gt;  2  select regexp_replace(str, '1|1H', 'A') mask1,&lt;br /&gt;  3         regexp_replace(str, '1H|1', 'A') mask2&lt;br /&gt;  4    from t&lt;br /&gt;  5  /&lt;br /&gt;&lt;br /&gt;MASK1 MASK2&lt;br /&gt;----- -----&lt;br /&gt;AHA   AA&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;"|" (pipe) is OR operator in regular expressions. It is used to list several alternatives to be matched.&lt;br /&gt;But the most important thing is that they are passed one by one in order of apperance inside search mask.&lt;br /&gt;As you can see in the example we have two columns (mask1, mask2). The difference is that we changed places of "1" and "1H", and the results are absolutely different.&lt;br /&gt;&lt;br /&gt;So how it is working:&lt;br /&gt;&lt;span style="font-style:italic;"&gt;MASK1 ('1|1H'):&lt;/span&gt;&lt;br /&gt;In the initial string '1H1' we start to search for the first occurence.&lt;br /&gt;'1' matches the first pattern ('1') and hence replaced with 'A'.&lt;br /&gt;Then proceeding with the rest of the line ('H1').&lt;br /&gt;The next symbol 'H' is not matching '1' so we go to the next pattern ('1H'), but it doesn't match it also.&lt;br /&gt;So we leave it as it is and go to the rest '1', which matches the first pattern of a mask ('1') and consequently replaced with 'A'.&lt;br /&gt;&lt;br /&gt;So if we combine all the changes done - we finally get 'AHA'.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;MASK2('1H|1'):&lt;/span&gt;&lt;br /&gt;Now when we changed the order of the patterns inside the mask - the result would be different.&lt;br /&gt;So we start with the first symbol '1' again.&lt;br /&gt;It matches the beginning of our first pattern ('1H').&lt;br /&gt;And inspite that this symbol is enough to cover the second pattern, we add one more letter to watch whether it satisfies the first pattern or not.&lt;br /&gt;So we add the next symbol 'H' and get '1H' which matches the first pattern, and hence replaced with 'A'.&lt;br /&gt;Then we proceed with the rest of line ('1').&lt;br /&gt;So it doesn't match the first pattern '1H', we check with the second pattern and we find a match, so change '1' to 'A'.&lt;br /&gt;In the final result we have 'AA'.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;The main point what should be learnt here is that we don't proceed with the next pattern until we know for sure - that the current pattern wouldn't match.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;How can it be used in practice.&lt;br /&gt;In one of my previous &lt;a href="http://volder-notes.blogspot.com/2007/12/regexp-4-number-of-occurence-advanced.html"&gt;posts&lt;/a&gt; I already used this technique, but here are couple of other examples recently posted on OTN forum.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://forums.oracle.com/forums/message.jspa?messageID=2358180#2358180"&gt;Example #1.&lt;/a&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Task: &lt;/span&gt;&lt;br /&gt;The column comprises the list of names.&lt;br /&gt;We need to get the following result: in case there is only one name in a column - we need to return this name (not touched) with trimmed preceding and trailing spaces.&lt;br /&gt;In case when there are several words in a name - we need to return only first letters (initials).&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Solution:&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as  (select 'Mark Thomsan' str        from dual union all&lt;br /&gt;  2              select 'Allen'                   from dual union all&lt;br /&gt;  3              select 'John    Trovolta Robert' from dual union all&lt;br /&gt;  4              select ' John'                   from dual union all&lt;br /&gt;  5              select 'Frederick   '            from dual union all&lt;br /&gt;  6              select ' Erick Cartman '         from dual union all&lt;br /&gt;  7              select '  Michael   '            from dual)&lt;br /&gt;  8     --&lt;br /&gt;  9  select str,regexp_replace(str,'^ *([^ ]*) *$|(^| )([^ ])|.','\1\3') str_new from t&lt;br /&gt; 10  /&lt;br /&gt;&lt;br /&gt;STR                     STR_NEW&lt;br /&gt;----------------------- ----------------&lt;br /&gt;Mark Thomsan            MT&lt;br /&gt;Allen                   Allen&lt;br /&gt;John    Trovolta Robert JTR&lt;br /&gt; John                   John&lt;br /&gt;Frederick               Frederick&lt;br /&gt; Erick Cartman          EC&lt;br /&gt;  Michael               Michael&lt;br /&gt;&lt;br /&gt;7 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Explanation:&lt;/span&gt;&lt;br /&gt;We have a mask comprising 3 patterns (splitted with '|'):&lt;br /&gt;1) '^ *([^ ]*) *$'&lt;br /&gt;2) '(^| )([^ ])'&lt;br /&gt;3) '.'&lt;br /&gt;&lt;br /&gt;So the first one matches the whole string ('^' as the beginning and '$' as the end), that contains no or only one word ('([^ ]*)'), which is preceded or trailed by any number of spaces (' *').&lt;br /&gt;If our column value is like this - then we return only the word as a result ('\1').&lt;br /&gt;If there is more than one word in a column value - this mask is of no use.&lt;br /&gt;Hence we proceed with the second pattern.&lt;br /&gt;This matches the first letters of each word. We specify that the letters are the first only - by placing '(^| )' before any non-space symbol '([^ ])'.&lt;br /&gt;So all the first letters would be returned in the result '\3' (this will happen only for string which contains &gt; 1 word, otherwise the whole string would be covered by first pattern, and we will never reach the second pattern).&lt;br /&gt;The last pattern '.' is symply any other character - not mentioned in the previous two patterns.&lt;br /&gt;So it is kind of clean up technique to put '|.' in the end of regexp_replace mask.&lt;br /&gt;&lt;br /&gt;The result is what we need: Allen, John, Frederick and Michael were returned as they were in the input data. The others (contain more than 1 word) are replaced by initials only.&lt;br /&gt;&lt;a href="http://forums.oracle.com/forums/message.jspa?messageID=2362250"&gt;&lt;br /&gt;Example #2.&lt;/a&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Task:&lt;/span&gt; We need to eliminate all the spaces, which are not between two words.&lt;br /&gt;If there are more than one space between words they should be trimmed to only one.&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Solution:&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select ' 6213,   2345, Application    Developer' str from dual union all&lt;br /&gt;  2             select '123, Avenue, app. 324, first    door  second floor' from dual)&lt;br /&gt;  3             --&lt;br /&gt;  4  select regexp_replace(str,'([[:alpha:]] ) *([[:alpha:]])| |(.)','\1\2\3') new_str from t&lt;br /&gt;  5  /&lt;br /&gt;&lt;br /&gt;NEW_STR&lt;br /&gt;-------------------------------------------------&lt;br /&gt;6213,2345,Application Developer&lt;br /&gt;123,Avenue,app.324,first door second floor&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Exaplnation:&lt;/span&gt;&lt;br /&gt;We have a mask comprising 3 patterns again:&lt;br /&gt;1) '([[:alpha:]] ) *([[:alpha:]])'&lt;br /&gt;2) ' '&lt;br /&gt;3) '(.)'&lt;br /&gt;&lt;br /&gt;The first pattern searches for two letters with at least one space between and returnes these letters with only one space between them ('\1\2').&lt;br /&gt;The second pattern contains only one space and matches all the other spaces, that were not covered by the first pattern.&lt;br /&gt;As we don't have any backreference for this pattern - all such spaces would be eliminated from the initial value.&lt;br /&gt;The third pattern matches any character '(.)'. We remember that all the spaces were covered by first or second pattern, so no spaces would match this '(.)'.&lt;br /&gt;As we have '\3' for this pattern - all such symbols would be returned in the result.&lt;br /&gt;That's how we left one space between letters only, and erased all the other spaces from the sentence.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-3277818998383827364?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/3277818998383827364/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=3277818998383827364' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/3277818998383827364'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/3277818998383827364'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2008/02/mask-evaluation-order-in-regular.html' title='Patterns evaluation order in regular expressions'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-8285861753031534900</id><published>2008-02-10T22:43:00.000+03:00</published><updated>2008-02-11T15:33:08.354+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Miscellaneous'/><title type='text'>Magic MIT whiteboard</title><content type='html'>I came across a post from &lt;a href="http://girlgeek-oracle.blogspot.com/2008/01/magical-mit-drawingmotion-simulator.html"&gt;Claudia Zeiler&lt;/a&gt;, where she mentioned about a really interesting stuff. It is not regarding Oracle, but I liked it, so decided to post in my blog also.&lt;br /&gt;I remember the times at my school, when we used wooden boards and were writing with chalks.&lt;br /&gt;Then came soft-tip pens with paper and white boards or stand with throw-over lists of paper.&lt;br /&gt;And look what can be used now :) BTW the reproduction of physical objects natural behaviour is pretty good.&lt;br /&gt;So no doubts, that our children would be cleverer us, using such a stuff while studying.&lt;br /&gt;&lt;br /&gt;&lt;object width="425" height="355"&gt;&lt;param name="movie" value="http://www.youtube.com/v/NZNTgglPbUA&amp;rel=1"&gt;&lt;/param&gt;&lt;param name="wmode" value="transparent"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/NZNTgglPbUA&amp;rel=1" type="application/x-shockwave-flash" wmode="transparent" width="425" height="355"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-8285861753031534900?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/8285861753031534900/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=8285861753031534900' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/8285861753031534900'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/8285861753031534900'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2008/02/magic-mit-whiteboard.html' title='Magic MIT whiteboard'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-4751937023778536089</id><published>2008-02-04T21:54:00.000+03:00</published><updated>2008-02-07T18:21:56.809+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Model'/><title type='text'>Prime factors of a number</title><content type='html'>&lt;img style="float:left; margin-right:50px" alt="Image:Animation Sieve of Eratosth-2.gif" src="http://upload.wikimedia.org/wikipedia/commons/c/c8/Animation_Sieve_of_Eratosth-2.gif" width="370" height="305" border="0" /&gt;&lt;br /&gt;All we know that every natural number (except 1) can be written as a product of prime numbers, and such a decomposition would be unique for each number. This is what the &lt;a href="http://en.wikipedia.org/wiki/Fundamental_theorem_of_arithmetic"&gt;Fundamental theorem of arithmetic&lt;/a&gt; is about.&lt;br /&gt;&lt;br /&gt;Let's get a query in SQL, which should return (output) a prime factorization of a given natural number (input).&lt;br /&gt;&lt;br /&gt;Some time ago I wrote on a OTN forum a &lt;a href="http://forums.oracle.com/forums/thread.jspa?messageID=1820246#1820246"&gt;query&lt;/a&gt;, using model clause, for finding all prime numbers less than a given one. It uses &lt;a href="http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes"&gt;Sieve of Eratosthenes&lt;/a&gt; method.&lt;br /&gt;&lt;br /&gt;So using this stuff I added one more model part that would iteratively find out all prime factors of a number (one prime factor at an iteration).&lt;br /&gt;&lt;br /&gt;And here is my query:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; var n number&lt;br /&gt;SQL&gt; exec :n:=1260&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed&lt;br /&gt;n&lt;br /&gt;---------&lt;br /&gt;1260&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;br /&gt;SQL&gt; with t as (select level l from dual connect by level &lt;= :n),&lt;br /&gt;  2     --&lt;br /&gt;  3      t1 as (select l prim_num from&lt;br /&gt;  4        (select * from t&lt;br /&gt;  5          model&lt;br /&gt;  6           dimension by (l dim)&lt;br /&gt;  7           measures (l,2 temp)&lt;br /&gt;  8            rules iterate (1e8) until (power(temp[1],2)&gt;:n)&lt;br /&gt;  9              (l[DIM&gt;TEMP[1]]=decode(mod(l[CV()],temp[1]),0,null,l[CV()]),&lt;br /&gt; 10               temp[1]=min(l)[dim&gt;temp[1]])&lt;br /&gt; 11        )&lt;br /&gt; 12    where l is not null)&lt;br /&gt; 13    --&lt;br /&gt; 14    select '('||prim_num||'^'||pow||')' prim_factors from (&lt;br /&gt; 15    select * from t1&lt;br /&gt; 16     model&lt;br /&gt; 17      dimension by (rownum rn)&lt;br /&gt; 18      measures(prim_num, :n val, 0 pow)&lt;br /&gt; 19      rules iterate(1000) until (val[1]=1)&lt;br /&gt; 20      (pow[any] order by rn = decode(mod(val[1],prim_num[CV()]),0,pow[CV()]+1,pow[CV()]),&lt;br /&gt; 21       val[rn&gt;1] order by rn = decode(mod(val[CV()-1],prim_num[CV()]),0,val[CV()-1]/prim_num[CV()],val[CV()-1]),&lt;br /&gt; 22       val[1]=min(val)[any]))&lt;br /&gt; 23       where rn&gt;1 and pow&gt;0&lt;br /&gt; 24  /&lt;br /&gt;&lt;br /&gt;PRIM_FACTORS&lt;br /&gt;--------------------------------------------------------------------------------&lt;br /&gt;(2^2)&lt;br /&gt;(3^2)&lt;br /&gt;(5^1)&lt;br /&gt;(7^1)&lt;br /&gt;n&lt;br /&gt;---------&lt;br /&gt;1260&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;It was not very effective, e.g. to find prime factors of numbers near 100000 it took about 2 or more minutes to execute on my laptop.&lt;br /&gt;&lt;br /&gt;So I decided to post it as challenge on russian SQL.RU forum (&lt;a href="http://sql.ru/forum/actualthread.aspx?tid=521131"&gt;russian version&lt;/a&gt; and &lt;a href="http://translate.google.com/translate?u=http%3A%2F%2Fsql.ru%2Fforum%2Factualthread.aspx%3Ftid%3D521131&amp;langpair=ru%7Cen&amp;hl=en&amp;ie=UTF8"&gt;english mirror&lt;/a&gt;) and there were pretty good solutions.&lt;br /&gt;&lt;br /&gt;But I liked the one from &lt;a href="http://sql.ru/forum/actualthread.aspx?bid=3&amp;tid=521131&amp;pg=2#5237579"&gt;andreymx&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; var p_value number&lt;br /&gt;SQL&gt; exec :p_value:=123432435&lt;br /&gt;&lt;br /&gt;PL/SQL procedure successfully completed&lt;br /&gt;&lt;br /&gt;Executed in 0,03 seconds&lt;br /&gt;p_value&lt;br /&gt;---------&lt;br /&gt;123432435&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;br /&gt;SQL&gt; set timing on&lt;br /&gt;SQL&gt; WITH s1 AS (SELECT LEVEL lv FROM dual CONNECT BY ROWNUM &lt;= SQRT(:p_value)),&lt;br /&gt;  2       s2 AS (SELECT lv id1, :p_value/lv id2 FROM s1 WHERE MOD(:p_value, lv)=0),&lt;br /&gt;  3       s3 AS (SELECT id1 ID FROM s2 UNION SELECT id2 FROM s2),&lt;br /&gt;  4       s4 AS (SELECT ID, (SELECT MIN(ID) FROM s3 s3_1 WHERE s3_1.ID / s3.ID = TRUNC(s3_1.ID / s3.ID) AND s3_1.ID &gt; s3.ID) idd FROM s3)&lt;br /&gt;  5  SELECT :p_value||'='||MAX(LTRIM(RTRIM(SYS_CONNECT_BY_PATH(idd/ID, '*'), '*'), '*')) FROM s4&lt;br /&gt;  6  CONNECT BY ID=PRIOR IDd&lt;br /&gt;  7  START WITH ID=1&lt;br /&gt;  8  /&lt;br /&gt;&lt;br /&gt;:P_VALUE||'='||MAX(LTRIM(RTRIM&lt;br /&gt;--------------------------------------------------------------------------------&lt;br /&gt;123432435=3*3*5*7*71*5519&lt;br /&gt;&lt;br /&gt;Executed in 0,22 seconds&lt;br /&gt;p_value&lt;br /&gt;---------&lt;br /&gt;123432435&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;As you can see it executes a query for huge numbers less than a second :)) &lt;br /&gt;Of course, it depends, but still...&lt;br /&gt;&lt;br /&gt;Pair of words about it:&lt;br /&gt;First, he finds all the numbers less than square root of a number (s1).&lt;br /&gt;&lt;br /&gt;Then finds all the factors of that number among them and the opposite factor (dividing the number by each factor) (s2).&lt;br /&gt;Thats how he saves a lot of resources.&lt;br /&gt;&lt;br /&gt;Then Andrey builds a full list of factors for the initial number (s3).&lt;br /&gt;&lt;br /&gt;Then he finds for each factor the next bigger factor which is divided by the current one without rest (s4). This trick was interesting.&lt;br /&gt;&lt;br /&gt;At the last step he builds a hierarchy from the 1 factor up to the end (the initial number). All the prime factors are calculated as "child" factor/"parent" factor (IDD/ID in the query).&lt;br /&gt;&lt;br /&gt;nice, isn't it? :))&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-4751937023778536089?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/4751937023778536089/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=4751937023778536089' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/4751937023778536089'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/4751937023778536089'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2008/02/prime-factors-of-number.html' title='Prime factors of a number'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-7767462539777564192</id><published>2008-01-29T11:59:00.000+03:00</published><updated>2008-01-30T18:34:23.212+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Model'/><category scheme='http://www.blogger.com/atom/ns#' term='Reports'/><title type='text'>Reports: matrix report with percentage totals</title><content type='html'>&lt;a href="http://volder-notes.blogspot.com/2008/01/reports-matrix-report.html"&gt;...previous&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;On my previous note on reports I was asked for a query, that would return percentage ratios for vertical and horizontal subtotals (at least as I have understood ).&lt;br /&gt;&lt;br /&gt;So in this case our report draft could be represented in the following way:&lt;br /&gt;&lt;blockquote&gt;            |   Departments names block     |  Total&lt;br /&gt;------------+----------+---------+----------+---------------&lt;br /&gt;            |          |         |          |&lt;br /&gt;  Cities   -+-       Detailed data         -+- Ratio (%) of totals&lt;br /&gt;  Names     |           for each            |   for every city&lt;br /&gt;  block    -+-       city &amp; department     -+- to the very total &lt;br /&gt;            |          |         |          |&lt;br /&gt;------------+----------+---------+----------+----------------&lt;br /&gt; Total      | Ratio (%) of totals for every | Overall total &lt;br /&gt;            | department to the very total  |    (100%)&lt;/blockquote&gt;&lt;br /&gt;As I have not very much spare time I decided to put here only both solutions (non-model and model) without any explanation, and if you have any doubts/questions - you can put it as a comment, then I will elaborate on this.&lt;br /&gt;&lt;br /&gt;Input data you can still find &lt;a href="http://volder-notes.blogspot.com/2007/12/reports-totals-subtotals.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Non-Model solution:&lt;/strong&gt;&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select nvl(city, 'TOTAL') as city_,&lt;br /&gt;  2         nvl2(city,to_char(sum(decode(dep, 'DEP1', sal, 0))),round(sum(decode(dep, 'DEP1', sal, 0))*100/sum(decode(flag,3,sal)),1)||'%') dep1,&lt;br /&gt;  3         nvl2(city,to_char(sum(decode(dep, 'DEP2', sal, 0))),round(sum(decode(dep, 'DEP2', sal, 0))*100/sum(decode(flag,3,sal)),1)||'%') dep2,&lt;br /&gt;  4         nvl2(city,to_char(sum(decode(dep, 'DEP3', sal, 0))),round(sum(decode(dep, 'DEP3', sal, 0))*100/sum(decode(flag,3,sal)),1)||'%') dep3,&lt;br /&gt;  5         nvl2(city,to_char(sum(decode(dep, 'DEP4', sal, 0))),round(sum(decode(dep, 'DEP4', sal, 0))*100/sum(decode(flag,3,sal)),1)||'%') dep4,&lt;br /&gt;  6         round(2*100*ratio_to_report(sum(decode(flag, 1, sal, 3, sal, 0))) over (),1)||'%' total&lt;br /&gt;  7       from (&lt;br /&gt;  8       select city, dep, sum(sales) sal, grouping_id(city, dep) flag&lt;br /&gt;  9               from t&lt;br /&gt; 10              where trunc(year, 'y') = date '2007-01-01'&lt;br /&gt; 11              group by cube(city, dep))&lt;br /&gt; 12      group by city&lt;br /&gt; 13      order by nvl2(city, 0, 1), total desc&lt;br /&gt; 14  /&lt;br /&gt;&lt;br /&gt;CITY_  DEP1        DEP2        DEP3        DEP4       TOTAL&lt;br /&gt;------ ----------- ----------- ----------- ---------- ----------&lt;br /&gt;Moscow 762         657         1020        487        87.9%&lt;br /&gt;Omsk   34          213         156         0          12.1%&lt;br /&gt;TOTAL  23.9%       26.1%       35.3%       14.6%      100%&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Model solution:&lt;/strong&gt;&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select city,dep1,dep2,dep3,dep4,total  from t&lt;br /&gt;  2     where trunc(year,'y') = date '2007-01-01'&lt;br /&gt;  3      model&lt;br /&gt;  4       return updated rows&lt;br /&gt;  5       dimension by (city, dep)&lt;br /&gt;  6       measures(to_char(sales) dep1, to_char(sales) dep2, to_char(sales) dep3, to_char(sales) dep4, to_char(sales) total)&lt;br /&gt;  7       rules&lt;br /&gt;  8        upsert all&lt;br /&gt;  9         (dep1[any,null]=nvl(dep1[CV(),'DEP1'],0),&lt;br /&gt; 10          dep2[any,null]=nvl(dep1[CV(),'DEP2'],0),&lt;br /&gt; 11          dep3[any,null]=nvl(dep1[CV(),'DEP3'],0),&lt;br /&gt; 12          dep4[any,null]=nvl(dep1[CV(),'DEP4'],0),&lt;br /&gt; 13          total['TOTAL',null]=sum(dep1)[any,null]+sum(dep2)[any,null]+sum(dep3)[any,null]+sum(dep4)[any,null],&lt;br /&gt; 14          dep1['TOTAL',null]=round(sum(dep1)[any,null]*100/total['TOTAL',null],1)||'%',&lt;br /&gt; 15          dep2['TOTAL',null]=round(sum(dep2)[any,null]*100/total['TOTAL',null],1)||'%',&lt;br /&gt; 16          dep3['TOTAL',null]=round(sum(dep3)[any,null]*100/total['TOTAL',null],1)||'%',&lt;br /&gt; 17          dep4['TOTAL',null]=round(sum(dep4)[any,null]*100/total['TOTAL',null],1)||'%',&lt;br /&gt; 18          total[city&lt;&gt;'TOTAL',null]=round((dep1[CV(),CV()]+dep2[CV(),CV()]+dep3[CV(),CV()]+dep4[CV(),CV()])*100/total['TOTAL',null],1)||'%',&lt;br /&gt; 19          total['TOTAL',null]='100%'&lt;br /&gt; 20          )&lt;br /&gt; 21  /&lt;br /&gt;&lt;br /&gt;CITY   DEP1        DEP2        DEP3       DEP4       TOTAL&lt;br /&gt;------ ----------- ----------- ---------- ---------- ----------&lt;br /&gt;Moscow 762         657         1020       487        87.9%&lt;br /&gt;Omsk   34          213         156        0          12.1%&lt;br /&gt;TOTAL  23.9%       26.1%       35.3%      14.6%      100%&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;PS&lt;br /&gt;Added after the comment.&lt;br /&gt;&lt;br /&gt;"A good problem description worth half a solution" :)&lt;br /&gt;So again I'm not sure whether this is needed or not, but if you want to calculate only percentages in matrix report even for detailed data you can use the following:&lt;br /&gt;&lt;br /&gt;So the scheme of our report would be:&lt;br /&gt;&lt;blockquote&gt;            |   Departments names block     |  Total&lt;br /&gt;------------+----------+---------+----------+---------------&lt;br /&gt;            |          |         |          |&lt;br /&gt;  Cities   -+-    Ratio (%) of detailed    -+- Ratio (%) of totals&lt;br /&gt;  Names     |      data for each city &amp;     |   for every city&lt;br /&gt;  block    -+-     department to total     -+- to the very total &lt;br /&gt;            |          |         |          |&lt;br /&gt;------------+----------+---------+----------+----------------&lt;br /&gt; Total      | Ratio (%) of totals for every | Overall total &lt;br /&gt;            | department to the very total  |    (100%)&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;strong&gt;Non-model solution:&lt;/strong&gt;&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select nvl(city, 'TOTAL') as city_,&lt;br /&gt;  2         round(sum(decode(dep, 'DEP1', sal, 0)), 2) || '%' dep1,&lt;br /&gt;  3         round(sum(decode(dep, 'DEP2', sal, 0)), 2) || '%' dep2,&lt;br /&gt;  4         round(sum(decode(dep, 'DEP3', sal, 0)), 2) || '%' dep3,&lt;br /&gt;  5         round(sum(decode(dep, 'DEP4', sal, 0)), 2) || '%' dep4,&lt;br /&gt;  6         round(sum(decode(flag, 1, sal, 3, sal, 0)), 2) || '%' total&lt;br /&gt;  7    from (select city, dep, sum(rtr) sal, grouping_id(city, dep) flag&lt;br /&gt;  8            from (select t.*, 100 * ratio_to_report(sales) over() rtr&lt;br /&gt;  9                    from t&lt;br /&gt; 10                   where trunc(year, 'y') = date '2007-01-01')&lt;br /&gt; 11           group by cube(city, dep))&lt;br /&gt; 12   group by city&lt;br /&gt; 13   order by nvl2(city, 0, 1), total desc&lt;br /&gt; 14  /&lt;br /&gt;&lt;br /&gt;CITY_  DEP1          DEP2         DEP3          DEP4          TOTAL&lt;br /&gt;------ ------------- ------------ ------------- ------------- -----------&lt;br /&gt;Moscow 22.89%        19.74%       30.64%        14.63%        87.89%&lt;br /&gt;Omsk   1.02%         6.4%         4.69%         0%            12.11%&lt;br /&gt;TOTAL  23.91%        26.13%       35.33%        14.63%        100%&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;&lt;strong&gt;Model solution:&lt;/strong&gt;&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select city,&lt;br /&gt;  2         round(dep1,2)||'%' dep1,&lt;br /&gt;  3         round(dep2,2)||'%' de2,&lt;br /&gt;  4         round(dep3,2)||'%' de3,&lt;br /&gt;  5         round(dep4,2)||'%' de3,&lt;br /&gt;  6         round(total,2)||'%' total from (&lt;br /&gt;  7   select t.*, 100*ratio_to_report(sales) over () rtr from t&lt;br /&gt;  8      where trunc(year,'y') = date '2007-01-01')&lt;br /&gt;  9       model&lt;br /&gt; 10        return updated rows&lt;br /&gt; 11        dimension by (city, dep)&lt;br /&gt; 12        measures(rtr dep1, rtr dep2, rtr dep3, rtr dep4, rtr total)&lt;br /&gt; 13        rules&lt;br /&gt; 14         upsert all&lt;br /&gt; 15          (dep1[any,null]=nvl(dep1[CV(),'DEP1'],0),&lt;br /&gt; 16           dep2[any,null]=nvl(dep1[CV(),'DEP2'],0),&lt;br /&gt; 17           dep3[any,null]=nvl(dep1[CV(),'DEP3'],0),&lt;br /&gt; 18           dep4[any,null]=nvl(dep1[CV(),'DEP4'],0),&lt;br /&gt; 19           dep1['TOTAL',null]=sum(dep1)[any,null],&lt;br /&gt; 20           dep2['TOTAL',null]=sum(dep2)[any,null],&lt;br /&gt; 21           dep3['TOTAL',null]=sum(dep3)[any,null],&lt;br /&gt; 22           dep4['TOTAL',null]=sum(dep4)[any,null],&lt;br /&gt; 23           total[any,null]=dep1[CV(),CV()]+dep2[CV(),CV()]+dep3[CV(),CV()]+dep4[CV(),CV()]&lt;br /&gt; 24           )&lt;br /&gt; 25  /&lt;br /&gt;&lt;br /&gt;CITY   DEP1        DE2          DE3         DE3          TOTAL&lt;br /&gt;------ ----------- ------------ ----------- ------------ -----------&lt;br /&gt;Moscow 22.89%      19.74%       30.64%      14.63%       87.89%&lt;br /&gt;Omsk   1.02%       6.4%         4.69%       0%           12.11%&lt;br /&gt;TOTAL  23.91%      26.13%       35.33%      14.63%       100%&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;br /&gt;&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-7767462539777564192?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/7767462539777564192/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=7767462539777564192' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/7767462539777564192'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/7767462539777564192'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2008/01/reports-matrix-report-with-percentage.html' title='Reports: matrix report with percentage totals'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-8480871080545296103</id><published>2008-01-27T00:10:00.000+03:00</published><updated>2008-01-27T01:09:16.124+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Model'/><title type='text'>Puzzle</title><content type='html'>On the russian SQL forum there was a question for &lt;a href="http://sql.ru/forum/actualthread.aspx?tid=518572"&gt;friday brain warm-up&lt;/a&gt;. The main idea is that the OP knows the answer beforehands, but he wants forum members to put some efforts on it to come up with easier solution.&lt;br /&gt;So here is the problem. I've taken the description from that &lt;a href="http://dheera.net/sci/sequence.php"&gt;site&lt;/a&gt;.&lt;br /&gt;You have a list of numbers:&lt;br /&gt;&lt;blockquote&gt;1&lt;br /&gt;11&lt;br /&gt;21&lt;br /&gt;1211&lt;br /&gt;111221&lt;br /&gt;312211&lt;br /&gt;13112221&lt;br /&gt;1113213211&lt;br /&gt;... &lt;/blockquote&gt;&lt;br /&gt;You need to understand the logic of building such a sequence and then post a SQL solution for that.&lt;br /&gt;&lt;br /&gt;Ok. If you want to try your skills - now it is the best time to stop and proceed by yourown. For solution look forward.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Solution:&lt;/span&gt; It's actually quite simple :) After starting the sequence with 1, each term in the sequence consists of groups of two numbers based on the previous term - the first being the quantity and the second specifying which digit.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Example:&lt;/span&gt; the first term is 1, which has "one 1" in it, therefore 11.&lt;br /&gt;11 has "two 1's" in it, therefore 21.&lt;br /&gt;21 has "one 2 and one 1" in it and therefore 1211.&lt;br /&gt;&lt;br /&gt;Now let's proceed with the second part - SQL solution.&lt;br /&gt;Although a query with similar idea was posted before I did, mine was:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select s from (select * from dual&lt;br /&gt;  2      model&lt;br /&gt;  3       dimension by (0 d)&lt;br /&gt;  4       measures(cast('1' as varchar2(1000)) s, cast(null as varchar2(1000)) s_new, 10 n, 1 flag)&lt;br /&gt;  5        rules iterate (10000000) until(flag[iteration_number+1]=n[0])&lt;br /&gt;  6         (s_new[iteration_number+1]=decode(flag[CV()-1],0,s_new[CV()-1],null)||&lt;br /&gt;  7                                    length(regexp_substr(s[CV()-1],'^(.)\1*'))||substr(s[CV()-1],1,1),&lt;br /&gt;  8          s[iteration_number+1]= regexp_replace(s[CV()-1],'^(.)\1*'),&lt;br /&gt;  9          flag[iteration_number+1]=nvl2(s[CV()],0,max(flag)[d&lt;CV()]+1),&lt;br /&gt; 10          s[iteration_number+1]=nvl(s[CV()], s_new[CV()])&lt;br /&gt; 11          ))&lt;br /&gt; 12          where flag&gt;0&lt;br /&gt; 13  /&lt;br /&gt;&lt;br /&gt;S&lt;br /&gt;--------------------------------------------------------------------------------&lt;br /&gt;1&lt;br /&gt;11&lt;br /&gt;21&lt;br /&gt;1211&lt;br /&gt;111221&lt;br /&gt;312211&lt;br /&gt;13112221&lt;br /&gt;1113213211&lt;br /&gt;31131211131221&lt;br /&gt;13211311123113112211&lt;br /&gt;&lt;br /&gt;10 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-8480871080545296103?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/8480871080545296103/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=8480871080545296103' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/8480871080545296103'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/8480871080545296103'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2008/01/puzzle.html' title='Puzzle'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-2417308317421562488</id><published>2008-01-19T23:37:00.000+03:00</published><updated>2008-01-29T12:18:00.109+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Model'/><category scheme='http://www.blogger.com/atom/ns#' term='Reports'/><title type='text'>Reports: matrix report</title><content type='html'>&lt;a href="http://volder-notes.blogspot.com/2007/12/reports-getting-total-of-all-children.html"&gt;...previous&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;After a while I decided to write one more note on reports. Now it would be about matrix reports.&lt;br /&gt;&lt;br /&gt;The input data you can take &lt;a href="http://volder-notes.blogspot.com/2007/12/reports-totals-subtotals.html"&gt;here&lt;/a&gt;.&lt;br /&gt;As you can find out it is information on sales for two years (2006, 2007) of four departments (DEP1, DEP2, DEP3, DEP4) in two cities (Omsk, Moscow).&lt;br /&gt;Imagine we need to get sales only for the year 2007, but present it as a matrix report.&lt;br /&gt;Horizontally information on departments should be placed, vertically it should be information on cities. In the last row and column of a table we need to show totals on departments and cities consequently. At the bottom in the most right column we need to place overall total sales.&lt;br /&gt;So the general draft of our report would look like:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;            |   Departments names block    |  Total&lt;br /&gt;------------+----------+---------+---------+-------------&lt;br /&gt;            |          |         |         |&lt;br /&gt;  Cities   -+-       Detailed data        -+- Totals&lt;br /&gt;  Names     |           for each           |   for&lt;br /&gt;  block    -+-       city &amp; department    -+- every city&lt;br /&gt;            |          |         |         |&lt;br /&gt;------------+----------+---------+---------+-------------&lt;br /&gt; Total      |  Totals for every department | Overall total&lt;/blockquote&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Non-Model solution:&lt;/span&gt;&lt;br /&gt;To build such a report we need to get subtotals both on departments and cities. We also need to get the overall total of all sales for the year 2007. For such requirments we need to use &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/aggreg.htm#sthref1621"&gt;CUBE Extension to GROUP BY&lt;/a&gt;.&lt;br /&gt;So the whole query would be:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select nvl(city, 'TOTAL') as city_,&lt;br /&gt;  2         sum(decode(dep, 'DEP1', sal, 0)) dep1,&lt;br /&gt;  3         sum(decode(dep, 'DEP2', sal, 0)) dep2,&lt;br /&gt;  4         sum(decode(dep, 'DEP3', sal, 0)) dep3,&lt;br /&gt;  5         sum(decode(dep, 'DEP4', sal, 0)) dep4,&lt;br /&gt;  6         sum(decode(flag, 1, sal, 3, sal, 0)) total&lt;br /&gt;  7    from (select city, dep, sum(sales) sal, grouping_id(city, dep) flag&lt;br /&gt;  8            from t&lt;br /&gt;  9           where trunc(year, 'y') = date '2007-01-01'&lt;br /&gt; 10           group by cube(city, dep))&lt;br /&gt; 11   group by city&lt;br /&gt; 12   order by nvl2(city, 0, 1), total desc&lt;br /&gt; 13  /&lt;br /&gt;&lt;br /&gt;CITY_        DEP1       DEP2       DEP3       DEP4      TOTAL&lt;br /&gt;------ ---------- ---------- ---------- ---------- ----------&lt;br /&gt;Moscow        762        657       1020        487       2926&lt;br /&gt;Omsk           34        213        156          0        403&lt;br /&gt;TOTAL         796        870       1176        487       3329&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;This is the report we were looking for. So we can easily see that Dep3 sales in Omsk in the year 2007 were 156, while total sales of Dep3 were 1176, and total sales in Omsk were 403. Overall total sales were 3329.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Model solution:&lt;/span&gt;&lt;br /&gt;Using model clause we need to have a tricky RETURN UPDATED ROWS clause. So that we would not group data in the outer query.&lt;br /&gt;Also we will use UPSERT ALL command for all rules. What is this and how it can be used you can read in &lt;a href="http://volder-notes.blogspot.com/2007/12/reports-totals-subtotals.html"&gt;one of my previous notes&lt;/a&gt;. So the query would be:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select city,dep1,dep2,dep3,dep4,total from t&lt;br /&gt;  2   where trunc(year,'y') = date '2007-01-01'&lt;br /&gt;  3    model&lt;br /&gt;  4     return updated rows&lt;br /&gt;  5     dimension by (city, dep)&lt;br /&gt;  6     measures(sales dep1, sales dep2, sales dep3, sales dep4, sales total)&lt;br /&gt;  7     rules&lt;br /&gt;  8      upsert all&lt;br /&gt;  9       (dep1[any,null]=nvl(dep1[CV(),'DEP1'],0),&lt;br /&gt; 10        dep2[any,null]=nvl(dep1[CV(),'DEP2'],0),&lt;br /&gt; 11        dep3[any,null]=nvl(dep1[CV(),'DEP3'],0),&lt;br /&gt; 12        dep4[any,null]=nvl(dep1[CV(),'DEP4'],0),&lt;br /&gt; 13        dep1['TOTAL',null]=sum(dep1)[any,null],&lt;br /&gt; 14        dep2['TOTAL',null]=sum(dep2)[any,null],&lt;br /&gt; 15        dep3['TOTAL',null]=sum(dep3)[any,null],&lt;br /&gt; 16        dep4['TOTAL',null]=sum(dep4)[any,null],&lt;br /&gt; 17        total[any,null]=dep1[CV(),CV()]+dep2[CV(),CV()]+dep3[CV(),CV()]+dep4[CV(),CV()]&lt;br /&gt; 18        )&lt;br /&gt; 19  /&lt;br /&gt;&lt;br /&gt;CITY         DEP1       DEP2       DEP3       DEP4      TOTAL&lt;br /&gt;------ ---------- ---------- ---------- ---------- ----------&lt;br /&gt;Moscow        762        657       1020        487       2926&lt;br /&gt;Omsk           34        213        156          0        403&lt;br /&gt;TOTAL         796        870       1176        487       3329&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;Notice that we have put as many measures for departments as we have in our data. So we are doing a column to row transformation, but as it was in the non-model solution it is static transformation. So we need to know how many columns we need to get beforehand.&lt;br /&gt;&lt;br /&gt;As you know the model clause is working by executing a bunch of rules (one by one in a sequential order in our case).&lt;br /&gt;&lt;br /&gt;We are placing the following groups of rules:&lt;br /&gt;First we create new rows with department equal NULL, where we place sales of each department in each city.&lt;br /&gt;Then we place rules for creating a TOTAL row, where we find the total sales for each department.&lt;br /&gt;Finaly we write a rule (column TOTAL) - to calculate totals for every city and to find the overall total also.&lt;br /&gt;So in the end we get the same results as with non-model solution.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://volder-notes.blogspot.com/2008/01/reports-matrix-report-with-percentage.html"&gt;to be continued...&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-2417308317421562488?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/2417308317421562488/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=2417308317421562488' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/2417308317421562488'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/2417308317421562488'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2008/01/reports-matrix-report.html' title='Reports: matrix report'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-6255157643839493455</id><published>2008-01-11T14:15:00.000+03:00</published><updated>2008-01-13T01:08:23.501+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Model'/><title type='text'>Combinatorial problem</title><content type='html'>The resource is from &lt;a href="http://sql.ru/forum/actualthread.aspx?bid=3&amp;tid=512998&amp;pg=1"&gt;Russian Forum&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Input:&lt;/strong&gt;&lt;br /&gt;We've got an alphabet consisting of N unique symbols.&lt;br /&gt;E.g. alphabet='AB'.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Problem:&lt;/strong&gt;&lt;br /&gt;We need to find all possible variations with length M (so there would be power(N,M) number of combinations).&lt;br /&gt;For our query let it be 4 as in the original source.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;br /&gt;ALthough there were several other solutions, e.g. using hierarchical queries, I post here my solution with model clause.&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'AB' str from dual),&lt;br /&gt;  2  t1 as (select str, level-1 lvl from t connect by level&lt;=power(length(str),4))&lt;br /&gt;  3  --&lt;br /&gt;  4   select lpad(num,4,substr(str,1,1)) path from t1&lt;br /&gt;  5    model&lt;br /&gt;  6     partition by (lvl part)&lt;br /&gt;  7     dimension by (0 dim)&lt;br /&gt;  8     measures (lvl, cast(null as varchar2(4)) num, str)&lt;br /&gt;  9      rules iterate (999) until (lvl[0] = 0)&lt;br /&gt; 10       (num[0] = substr(str[0],mod(lvl[0],length(str[0]))+1,1)||num[0],&lt;br /&gt; 11        lvl[0] = trunc(lvl[0]/length(str[0]))&lt;br /&gt; 12       )&lt;br /&gt; 13   order by part&lt;br /&gt; 14  /&lt;br /&gt;&lt;br /&gt;PATH&lt;br /&gt;----&lt;br /&gt;AAAA&lt;br /&gt;AAAB&lt;br /&gt;AABA&lt;br /&gt;AABB&lt;br /&gt;ABAA&lt;br /&gt;ABAB&lt;br /&gt;ABBA&lt;br /&gt;ABBB&lt;br /&gt;BAAA&lt;br /&gt;BAAB&lt;br /&gt;BABA&lt;br /&gt;BABB&lt;br /&gt;BBAA&lt;br /&gt;BBAB&lt;br /&gt;BBBA&lt;br /&gt;BBBB&lt;br /&gt;&lt;br /&gt;16 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;As &lt;a href="http://prodlife.wordpress.com/"&gt;Chen Shapira&lt;/a&gt; asked me for some comments on the logic.&lt;br /&gt;I can elucidate a little bit.&lt;br /&gt;The first thing we need to understand that when we input an alphabet of N unique symbols and want to create all unique combinations of such symbols with length M - finally we get power(N,M) combinations.&lt;br /&gt;It is kind of combinatorial stuff which is learnt at school, so the description you can find in Wiki &lt;a href="http://en.wikipedia.org/wiki/Combinatorics#Permutations_with_repetitions"&gt;Permutations with repetitions&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So the first what we do in our query - is generating the needed number of values we will have in the final result ('with clause' query connect by level&lt;=...).&lt;br /&gt;To distinguish these values we assign ordinal numbers starting from 0 up to the max number of value. Why we start from zero you understand later.&lt;br /&gt;&lt;br /&gt;Now what we are going to do with these numbers? We are going to transform them from denary numeral system to the numeral system of the needed base which is length(str) in our case.&lt;br /&gt;&lt;br /&gt;So probably if my english is not so good to understand I'll explain it with examples.&lt;br /&gt;So here we're transforming to the binary system:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select '01' str from dual),&lt;br /&gt;  2  t1 as (select str, level-1 lvl from t connect by level&lt;=power(length(str),4))&lt;br /&gt;  3  --&lt;br /&gt;  4   select initial_value, num, lpad(num,4,substr(str,1,1)) path from t1&lt;br /&gt;  5    model&lt;br /&gt;  6     partition by (lvl part)&lt;br /&gt;  7     dimension by (0 dim)&lt;br /&gt;  8     measures (lvl initial_value, lvl, cast(null as varchar2(4)) num, str)&lt;br /&gt;  9      rules iterate (999) until (lvl[0] = 0)&lt;br /&gt; 10       (num[0] = substr(str[0],mod(lvl[0],length(str[0]))+1,1)||num[0],&lt;br /&gt; 11        lvl[0] = trunc(lvl[0]/length(str[0]))&lt;br /&gt; 12       )&lt;br /&gt; 13   order by part&lt;br /&gt; 14  /&lt;br /&gt;&lt;br /&gt;INITIAL_VALUE NUM  PATH&lt;br /&gt;------------- ---- ----&lt;br /&gt;            0 0    0000&lt;br /&gt;            1 1    0001&lt;br /&gt;            2 10   0010&lt;br /&gt;            3 11   0011&lt;br /&gt;            4 100  0100&lt;br /&gt;            5 101  0101&lt;br /&gt;            6 110  0110&lt;br /&gt;            7 111  0111&lt;br /&gt;            8 1000 1000&lt;br /&gt;            9 1001 1001&lt;br /&gt;           10 1010 1010&lt;br /&gt;           11 1011 1011&lt;br /&gt;           12 1100 1100&lt;br /&gt;           13 1101 1101&lt;br /&gt;           14 1110 1110&lt;br /&gt;           15 1111 1111&lt;br /&gt;&lt;br /&gt;16 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;The next example is transformation to ternary numeral system:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select '012' str from dual),&lt;br /&gt;  2  t1 as (select str, level-1 lvl from t connect by level&lt;=power(length(str),4))&lt;br /&gt;  3  --&lt;br /&gt;  4   select initial_value, num, lpad(num,4,substr(str,1,1)) path from t1&lt;br /&gt;  5    model&lt;br /&gt;  6     partition by (lvl part)&lt;br /&gt;  7     dimension by (0 dim)&lt;br /&gt;  8     measures (lvl initial_value, lvl, cast(null as varchar2(4)) num, str)&lt;br /&gt;  9      rules iterate (999) until (lvl[0] = 0)&lt;br /&gt; 10       (num[0] = substr(str[0],mod(lvl[0],length(str[0]))+1,1)||num[0],&lt;br /&gt; 11        lvl[0] = trunc(lvl[0]/length(str[0]))&lt;br /&gt; 12       )&lt;br /&gt; 13   order by part&lt;br /&gt; 14  /&lt;br /&gt;&lt;br /&gt;INITIAL_VALUE NUM  PATH&lt;br /&gt;------------- ---- ----&lt;br /&gt;            0 0    0000&lt;br /&gt;            1 1    0001&lt;br /&gt;            2 2    0002&lt;br /&gt;            3 10   0010&lt;br /&gt;            4 11   0011&lt;br /&gt;            5 12   0012&lt;br /&gt;            6 20   0020&lt;br /&gt;            7 21   0021&lt;br /&gt;            8 22   0022&lt;br /&gt;            9 100  0100&lt;br /&gt;           10 101  0101&lt;br /&gt;           11 102  0102&lt;br /&gt;           12 110  0110&lt;br /&gt;           13 111  0111&lt;br /&gt;           14 112  0112&lt;br /&gt;           15 120  0120&lt;br /&gt;           16 121  0121&lt;br /&gt;           17 122  0122&lt;br /&gt;           18 200  0200&lt;br /&gt;           19 201  0201&lt;br /&gt;           20 202  0202&lt;br /&gt;           21 210  0210&lt;br /&gt;           22 211  0211&lt;br /&gt;           23 212  0212&lt;br /&gt;           24 220  0220&lt;br /&gt;           25 221  0221&lt;br /&gt;           26 222  0222&lt;br /&gt;           27 1000 1000&lt;br /&gt;           28 1001 1001&lt;br /&gt;           29 1002 1002&lt;br /&gt;           30 1010 1010&lt;br /&gt;           31 1011 1011&lt;br /&gt;           32 1012 1012&lt;br /&gt;           33 1020 1020&lt;br /&gt;           34 1021 1021&lt;br /&gt;           35 1022 1022&lt;br /&gt;           36 1100 1100&lt;br /&gt;           37 1101 1101&lt;br /&gt;           38 1102 1102&lt;br /&gt;           39 1110 1110&lt;br /&gt;           40 1111 1111&lt;br /&gt;           41 1112 1112&lt;br /&gt;           42 1120 1120&lt;br /&gt;           43 1121 1121&lt;br /&gt;           44 1122 1122&lt;br /&gt;           45 1200 1200&lt;br /&gt;           46 1201 1201&lt;br /&gt;           47 1202 1202&lt;br /&gt;           48 1210 1210&lt;br /&gt;           49 1211 1211&lt;br /&gt;           50 1212 1212&lt;br /&gt;           51 1220 1220&lt;br /&gt;           52 1221 1221&lt;br /&gt;           53 1222 1222&lt;br /&gt;           54 2000 2000&lt;br /&gt;           55 2001 2001&lt;br /&gt;           56 2002 2002&lt;br /&gt;           57 2010 2010&lt;br /&gt;           58 2011 2011&lt;br /&gt;           59 2012 2012&lt;br /&gt;           60 2020 2020&lt;br /&gt;           61 2021 2021&lt;br /&gt;           62 2022 2022&lt;br /&gt;           63 2100 2100&lt;br /&gt;           64 2101 2101&lt;br /&gt;           65 2102 2102&lt;br /&gt;           66 2110 2110&lt;br /&gt;           67 2111 2111&lt;br /&gt;           68 2112 2112&lt;br /&gt;           69 2120 2120&lt;br /&gt;           70 2121 2121&lt;br /&gt;           71 2122 2122&lt;br /&gt;           72 2200 2200&lt;br /&gt;           73 2201 2201&lt;br /&gt;           74 2202 2202&lt;br /&gt;           75 2210 2210&lt;br /&gt;           76 2211 2211&lt;br /&gt;           77 2212 2212&lt;br /&gt;           78 2220 2220&lt;br /&gt;           79 2221 2221&lt;br /&gt;           80 2222 2222&lt;br /&gt;&lt;br /&gt;81 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;So at every iteration in our model clause we cut out the value at the appropriate position in our number. It is general algorithm of &lt;a href="http://en.wikipedia.org/wiki/Numeral_system#Change_of_radix"&gt;Radix Change&lt;/a&gt;.&lt;br /&gt;As we generate the amount of numbers that would be unique in the corresponding numeral system we get unique values of the needed length (M) by padding the first letter/digit from the input string.&lt;br /&gt;And instead of '01', '012' etc we can use any other string consisting of unique symbols and we get the needed result.&lt;br /&gt;&lt;br /&gt;By the way, if we input '0123456789' string we get transformation to the denary system. So the output would be the same to the inputed numbers:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select '0123456789' str from dual),&lt;br /&gt;  2  t1 as (select str, level-1 lvl from t connect by level&lt;=20/*power(length(str),4)*/)&lt;br /&gt;  3  --&lt;br /&gt;  4   select initial_value, num, lpad(num,4,substr(str,1,1)) path from t1&lt;br /&gt;  5    model&lt;br /&gt;  6     partition by (lvl part)&lt;br /&gt;  7     dimension by (0 dim)&lt;br /&gt;  8     measures (lvl initial_value, lvl, cast(null as varchar2(4)) num, str)&lt;br /&gt;  9      rules iterate (999) until (lvl[0] = 0)&lt;br /&gt; 10       (num[0] = substr(str[0],mod(lvl[0],length(str[0]))+1,1)||num[0],&lt;br /&gt; 11        lvl[0] = trunc(lvl[0]/length(str[0]))&lt;br /&gt; 12       )&lt;br /&gt; 13   order by part&lt;br /&gt; 14  /&lt;br /&gt;&lt;br /&gt;INITIAL_VALUE NUM  PATH&lt;br /&gt;------------- ---- ----&lt;br /&gt;            0 0    0000&lt;br /&gt;            1 1    0001&lt;br /&gt;            2 2    0002&lt;br /&gt;            3 3    0003&lt;br /&gt;            4 4    0004&lt;br /&gt;            5 5    0005&lt;br /&gt;            6 6    0006&lt;br /&gt;            7 7    0007&lt;br /&gt;            8 8    0008&lt;br /&gt;            9 9    0009&lt;br /&gt;           10 10   0010&lt;br /&gt;           11 11   0011&lt;br /&gt;           12 12   0012&lt;br /&gt;           13 13   0013&lt;br /&gt;           14 14   0014&lt;br /&gt;           15 15   0015&lt;br /&gt;           16 16   0016&lt;br /&gt;           17 17   0017&lt;br /&gt;           18 18   0018&lt;br /&gt;           19 19   0019&lt;br /&gt;&lt;br /&gt;20 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;Hope, now it is more clear. If still not - don't hesitate to ask.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-6255157643839493455?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/6255157643839493455/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=6255157643839493455' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/6255157643839493455'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/6255157643839493455'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2008/01/combinatorial-problem.html' title='Combinatorial problem'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-2657080401151833105</id><published>2008-01-11T00:46:00.000+03:00</published><updated>2008-01-11T09:09:42.135+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Analytic Functions'/><title type='text'>Grouping overlapping number intervals, date periods etc...</title><content type='html'>Today I spent some time on a quite simple query. I'm speaking about this &lt;a href="http://forums.oracle.com/forums/thread.jspa?threadID=604240&amp;tstart=0"&gt;thread on OTN&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It is a simple task - to get overlapping intervals grouped. Sometimes this problem is formulated as to find a continious date period out of several separate periods.&lt;br /&gt;Simple solution here would be to use analytic functions an so called start_of_group type of query:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with mytable as (select 1 id, 1 begin_data, 10 end_data from dual union all&lt;br /&gt;  2                   select 1 id, 5 begin_data, 7 end_data from dual union all&lt;br /&gt;  3                   select 1 id, 4 begin_data, 8 end_data from dual union all&lt;br /&gt;  4                   select 1 id, 11 begin_data, 15 end_data from dual union all&lt;br /&gt;  5                   select 1 id, 13 begin_data, 18 end_data from dual union all&lt;br /&gt;  6                   select 2 id, 1 begin_data, 18 end_data from dual union all&lt;br /&gt;  7                   select 2 id, 13 begin_data, 23 end_data from dual union all&lt;br /&gt;  8                   select 2 id, 31 begin_data, 34 end_data from dual)&lt;br /&gt;  9  select id, min(begin_data) beg_d, max(end_data) end_d&lt;br /&gt; 10    from (select t1.*,&lt;br /&gt; 11                 sum(start_of_group) over(partition by id order by begin_data, end_data) gr&lt;br /&gt; 12            from (select t.*,&lt;br /&gt; 13                         case&lt;br /&gt; 14                           when begin_data &gt;&lt;br /&gt; 15                                nvl(max(end_data)&lt;br /&gt; 16                                    over(partition by id order by begin_data,end_data&lt;br /&gt; 17                                         rows between unbounded preceding and 1 preceding),&lt;br /&gt; 18                                    begin_data-1)&lt;br /&gt; 19                           then 1&lt;br /&gt; 20                           else 0&lt;br /&gt; 21                         end start_of_group&lt;br /&gt; 22                    from mytable t) t1)&lt;br /&gt; 23   group by id, gr&lt;br /&gt; 24   order by 1, 2&lt;br /&gt; 25  /&lt;br /&gt;&lt;br /&gt;        ID      BEG_D      END_D&lt;br /&gt;---------- ---------- ----------&lt;br /&gt;         1          1         10&lt;br /&gt;         1         11         18&lt;br /&gt;         2          1         23&lt;br /&gt;         2         31         34&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;BTW if the table contains duplicate rows - I'd better add some unique key (e.g. add the most inner inline view with row_number() over() in there), and then use this row_number value to get rows sorted in the same order at all levels of query.&lt;br /&gt;&lt;br /&gt;I'm speaking about the following case:&lt;br /&gt;&lt;blockquote&gt;        ID BEGIN_DATA   END_DATA&lt;br /&gt;---------- ---------- ----------&lt;br /&gt;         1          1         10&lt;br /&gt;         1          2          7&lt;br /&gt;         1         11         15&lt;br /&gt;         1         11         15&lt;br /&gt;         1         14         17&lt;/blockquote&gt;&lt;br /&gt;Although, I couldn't simulate such a situation, but it is not clear for me, if there are two separate (same) sortings in the query, would oracle sort rows equally, when there are duplicate rows.&lt;br /&gt;&lt;br /&gt;I mean can't it happen, that while getting the value of start_of_group the Oracle gives a value of 1 to one row of the duplicate rows, and while summing this start_of_group exchange the places of these rows:&lt;br /&gt;&lt;blockquote&gt;        ID BEGIN_DATA   END_DATA START_OF_GROUP&lt;br /&gt;---------- ---------- ---------- --------------&lt;br /&gt;         1          1         10              1&lt;br /&gt;         1          2          7              0&lt;br /&gt;         1         11         15              0  &lt;--+&lt;br /&gt;                                                    |&lt;br /&gt;         1         11         15              1  &lt;--+&lt;br /&gt;         1         14         17              0 &lt;/blockquote&gt;&lt;br /&gt;In that case, the overall result would be wrong.&lt;br /&gt;If you know how Oracle will behave in such a case, please, leave a comment.&lt;br /&gt;&lt;br /&gt;PS&lt;br /&gt;A small test case shows that you shouldn't rely on it:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with test as (select 1 id, 1 val from dual union all&lt;br /&gt;  2                select 1 id, 10 val from dual union all&lt;br /&gt;  3                select 1 id, 5 val from dual union all&lt;br /&gt;  4                select 1 id, 7 val from dual)&lt;br /&gt;  5                --&lt;br /&gt;  6  select t2.*, row_number() over(order by id) rn3&lt;br /&gt;  7    from (select t1.*, row_number() over(order by id, val) rn2&lt;br /&gt;  8            from (select test.*, row_number() over(order by id) rn1 from test) t1&lt;br /&gt;  9           order by val) t2&lt;br /&gt; 10  /&lt;br /&gt;&lt;br /&gt;        ID        VAL        RN1        RN2        RN3&lt;br /&gt;---------- ---------- ---------- ---------- ----------&lt;br /&gt;         1          1          1          1          1&lt;br /&gt;         1         10          4          4          2&lt;br /&gt;         1          7          2          3          3&lt;br /&gt;         1          5          3          2          4&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;row_number() over(order by id) is absoultely the same in the most inner query and the most outer. But produce different results, because another sortings happened between these two layers. So even if there are no sortings between them happen, I'd shurely won't rely on it, and rewrite the query in the way I mentioned before:&lt;br /&gt;&lt;blockquote&gt;  select id, min(begin_data) beg_d, max(end_data) end_d&lt;br /&gt;    from (select t1.*,&lt;br /&gt;                 sum(start_of_group) over(partition by id order by &lt;span style="COLOR: rgb(204,0,0)"&gt;rn&lt;/span&gt;) gr&lt;br /&gt;            from (select t.*,&lt;br /&gt;                         case&lt;br /&gt;                           when begin_data &gt;&lt;br /&gt;                                nvl(max(end_data)&lt;br /&gt;                                    over(partition by id order by &lt;span style="COLOR: rgb(204,0,0)"&gt;rn&lt;/span&gt;&lt;br /&gt;                                         rows between unbounded preceding and 1 preceding),&lt;br /&gt;                                    begin_data - 1) &lt;br /&gt;                           then 1&lt;br /&gt;                           else 0&lt;br /&gt;                         end start_of_group&lt;br /&gt;                    from (select tt.*,&lt;br /&gt;                                 &lt;span style="COLOR: rgb(204,0,0)"&gt;row_number() over(partition by id order by begin_data, end_data) rn&lt;/span&gt;&lt;br /&gt;                            from mytable tt) t) t1)&lt;br /&gt;   group by id, gr&lt;br /&gt;   order by 1, 2&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-2657080401151833105?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/2657080401151833105/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=2657080401151833105' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/2657080401151833105'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/2657080401151833105'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2008/01/grouping-overlapping-intervals.html' title='Grouping overlapping number intervals, date periods etc...'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-7720321650255032963</id><published>2008-01-09T19:02:00.000+03:00</published><updated>2008-01-09T19:10:46.290+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Model'/><title type='text'>Current month calendar (model clause)</title><content type='html'>I know there were plenty calendar creation scripts through SQL, but I couldn't find any using the Model clause.&lt;br /&gt;&lt;br /&gt;So here what I wrote:&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;SQL&gt; create or replace view calendar_view as&lt;br /&gt;  2  select case when dim=0 then to_char(dw1,' DY')&lt;br /&gt;  3              when trunc(sysdate,'mm')&lt;&gt;trunc(dw1,'mm') then null&lt;br /&gt;  4              when trunc(sysdate)=dw1 then '['||to_char(dw1,'dd')||']'&lt;br /&gt;  5              else to_char(dw1,' dd')&lt;br /&gt;  6         end dw1,&lt;br /&gt;  7         case when dim=0 then to_char(dw2,' DY')&lt;br /&gt;  8              when trunc(sysdate,'mm')&lt;&gt;trunc(dw2,'mm') then null&lt;br /&gt;  9              when trunc(sysdate)=dw2 then '['||to_char(dw2,'dd')||']'&lt;br /&gt; 10              else to_char(dw2,' dd')&lt;br /&gt; 11         end dw2,&lt;br /&gt; 12         case when dim=0 then to_char(dw3,' DY')&lt;br /&gt; 13              when trunc(sysdate,'mm')&lt;&gt;trunc(dw3,'mm') then null&lt;br /&gt; 14              when trunc(sysdate)=dw3 then '['||to_char(dw3,'dd')||']'&lt;br /&gt; 15              else to_char(dw3,' dd')&lt;br /&gt; 16         end dw3,&lt;br /&gt; 17         case when dim=0 then to_char(dw4,' DY')&lt;br /&gt; 18              when trunc(sysdate,'mm')&lt;&gt;trunc(dw4,'mm') then null&lt;br /&gt; 19              when trunc(sysdate)=dw4 then '['||to_char(dw4,'dd')||']'&lt;br /&gt; 20              else to_char(dw4,' dd')&lt;br /&gt; 21         end dw4,&lt;br /&gt; 22         case when dim=0 then to_char(dw5,' DY')&lt;br /&gt; 23              when trunc(sysdate,'mm')&lt;&gt;trunc(dw5,'mm') then null&lt;br /&gt; 24              when trunc(sysdate)=dw5 then '['||to_char(dw5,'dd')||']'&lt;br /&gt; 25              else to_char(dw5,' dd')&lt;br /&gt; 26         end dw5,&lt;br /&gt; 27         case when dim=0 then to_char(dw6,' DY')&lt;br /&gt; 28              when trunc(sysdate,'mm')&lt;&gt;trunc(dw6,'mm') then null&lt;br /&gt; 29              when trunc(sysdate)=dw6 then '['||to_char(dw6,'dd')||']'&lt;br /&gt; 30              else to_char(dw6,' dd')&lt;br /&gt; 31         end dw6,&lt;br /&gt; 32         case when dim=0 then to_char(dw7,' DY')&lt;br /&gt; 33              when trunc(sysdate,'mm')&lt;&gt;trunc(dw7,'mm') then null&lt;br /&gt; 34              when trunc(sysdate)=dw7 then '['||to_char(dw7,'dd')||']'&lt;br /&gt; 35              else to_char(dw7,' dd')&lt;br /&gt; 36         end dw7&lt;br /&gt; 37               from dual&lt;br /&gt; 38   model&lt;br /&gt; 39    dimension by (0 dim)&lt;br /&gt; 40    measures(cast(null as date) dw1,&lt;br /&gt; 41             cast(null as date) dw2,&lt;br /&gt; 42             cast(null as date) dw3,&lt;br /&gt; 43             cast(null as date) dw4,&lt;br /&gt; 44             cast(null as date) dw5,&lt;br /&gt; 45             cast(null as date) dw6,&lt;br /&gt; 46             cast(null as date) dw7)&lt;br /&gt; 47    rules iterate(7) until (dw7[iteration_number]&gt;last_day(sysdate))&lt;br /&gt; 48    (dw1[iteration_number]=trunc(sysdate,'mm')-to_char(trunc(sysdate,'mm'),'d')+1+7*(iteration_number-1),&lt;br /&gt; 49     dw2[iteration_number]=dw1[CV()]+1,&lt;br /&gt; 50     dw3[iteration_number]=dw1[CV()]+2,&lt;br /&gt; 51     dw4[iteration_number]=dw1[CV()]+3,&lt;br /&gt; 52     dw5[iteration_number]=dw1[CV()]+4,&lt;br /&gt; 53     dw6[iteration_number]=dw1[CV()]+5,&lt;br /&gt; 54     dw7[iteration_number]=dw1[CV()]+6&lt;br /&gt; 55     )&lt;br /&gt; 56  /&lt;br /&gt;&lt;br /&gt;View created&lt;br /&gt;&lt;br /&gt;SQL&gt; alter session set nls_territory=CIS nls_date_language=RUSSIAN;&lt;br /&gt;&lt;br /&gt;Session altered&lt;br /&gt;&lt;br /&gt;SQL&gt; select * from calendar_view;&lt;br /&gt;&lt;br /&gt;DW1  DW2  DW3  DW4  DW5  DW6  DW7&lt;br /&gt;---- ---- ---- ---- ---- ---- ----&lt;br /&gt; ПН   ВТ   СР   ЧТ   ПТ   СБ   ВС&lt;br /&gt;      01   02   03   04   05   06&lt;br /&gt; 07   08  [09]  10   11   12   13&lt;br /&gt; 14   15   16   17   18   19   20&lt;br /&gt; 21   22   23   24   25   26   27&lt;br /&gt; 28   29   30   31            &lt;br /&gt;&lt;br /&gt;6 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; alter session set nls_territory=AMERICA nls_date_language=AMERICAN;&lt;br /&gt;&lt;br /&gt;Session altered&lt;br /&gt;&lt;br /&gt;SQL&gt; select * from calendar_view;&lt;br /&gt;&lt;br /&gt;DW1  DW2  DW3  DW4  DW5  DW6  DW7&lt;br /&gt;---- ---- ---- ---- ---- ---- ----&lt;br /&gt; SUN  MON  TUE  WED  THU  FRI  SAT&lt;br /&gt;           01   02   03   04   05&lt;br /&gt; 06   07   08  [09]  10   11   12&lt;br /&gt; 13   14   15   16   17   18   19&lt;br /&gt; 20   21   22   23   24   25   26&lt;br /&gt; 27   28   29   30   31       &lt;br /&gt;&lt;br /&gt;6 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; alter session set nls_territory=GERMANY nls_date_language=GERMAN;&lt;br /&gt;&lt;br /&gt;Session altered&lt;br /&gt;&lt;br /&gt;SQL&gt; select * from calendar_view;&lt;br /&gt;&lt;br /&gt;DW1  DW2  DW3  DW4  DW5  DW6  DW7&lt;br /&gt;---- ---- ---- ---- ---- ---- ----&lt;br /&gt; MO   DI   MI   DO   FR   SA   SO&lt;br /&gt;      01   02   03   04   05   06&lt;br /&gt; 07   08  [09]  10   11   12   13&lt;br /&gt; 14   15   16   17   18   19   20&lt;br /&gt; 21   22   23   24   25   26   27&lt;br /&gt; 28   29   30   31            &lt;br /&gt;&lt;br /&gt;6 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;The code is NLS-independant as you can see.&lt;br /&gt;It is pretty easy to understand to my mind, so I'm going to explain it in details as I did with queries in my previous posts, but if you have any questions - you are welcome.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-7720321650255032963?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/7720321650255032963/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=7720321650255032963' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/7720321650255032963'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/7720321650255032963'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2008/01/current-month-calendar-model-clause.html' title='Current month calendar (model clause)'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-5998685520894140714</id><published>2008-01-08T13:03:00.000+03:00</published><updated>2008-01-08T13:24:11.154+03:00</updated><title type='text'>Happy New Year</title><content type='html'>I know, that I'm a little bit late, but I'd like to say 'Happy New Year' to everyone who reads my blog. As a proverb says "better late than never" ;-)&lt;br /&gt;I was on vacation and was hurring on the last day not to be late for the flight departure.&lt;br /&gt;Now looking at holidays in the retrospective view  I liked the most the following (taken from &lt;a href="http://sql.ru/forum/actualthread.aspx?bid=3&amp;tid=509927&amp;pg=-1&amp;hl=%f0%e8%f1%ee%e2%e0%ed%e8%e5+%f1%ed%e5%e6%e8%ed%ee%ea#5113037"&gt;Russian forum&lt;/a&gt;) out of all funny queries on New Year's theme:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; WITH&lt;br /&gt;  2  q AS (&lt;br /&gt;  3    SELECT&lt;br /&gt;  4      25-(trunc(ROWNUM/5)+1)*3 + MOD(-ROWNUM,5)*2 num&lt;br /&gt;  5    FROM dual&lt;br /&gt;  6    CONNECT BY LEVEL&lt;=20&lt;br /&gt;  7  )&lt;br /&gt;  8  SELECT&lt;br /&gt;  9    substr(TRANSLATE(dbms_random.VALUE || dbms_random.VALUE, '0123456789.', '*        *'), 1, num)||&lt;br /&gt; 10      lpad('/',(MAX(num) over ()-num+1),'/')||lpad('\',(MAX(num) over ()-num+1),'\')||&lt;br /&gt; 11          substr(TRANSLATE(dbms_random.VALUE || dbms_random.VALUE, '0123456789.', '*        *'), 1, num) new_years_tree&lt;br /&gt; 12  FROM q&lt;br /&gt; 13  /&lt;br /&gt;&lt;br /&gt;NEW_YEARS_TREE&lt;br /&gt;--------------------------------------------------------------------------------&lt;br /&gt;   *  *        * *  /\    *   * *  **    *&lt;br /&gt; **    * *   *    ///\\\       *  *&lt;br /&gt;*  *   * *      /////\\\\\   **       *&lt;br /&gt;  **   *     *///////\\\\\\\   *         *&lt;br /&gt;  **       * *     //\\        *     **&lt;br /&gt;  *       *   ***////\\\\       *&lt;br /&gt;      * *      //////\\\\\\&lt;br /&gt; *   *       ////////\\\\\\\\*    *  *&lt;br /&gt; ** *   * *//////////\\\\\\\\\\    *    *&lt;br /&gt;*   *    * *   */////\\\\\      *&lt;br /&gt;       * *    ///////\\\\\\\*    *   * *&lt;br /&gt;         *  /////////\\\\\\\\\  * *&lt;br /&gt;      *   ///////////\\\\\\\\\\\   *    *&lt;br /&gt;   *   */////////////\\\\\\\\\\\\\     *&lt;br /&gt;  *     *  * ////////\\\\\\\\* *   *&lt;br /&gt; *  **   * //////////\\\\\\\\\\  *     *&lt;br /&gt;       * ////////////\\\\\\\\\\\\  *&lt;br /&gt;* *    //////////////\\\\\\\\\\\\\\ *&lt;br /&gt;    *////////////////\\\\\\\\\\\\\\\\  * *&lt;br /&gt;****     *///////////\\\\\\\\\\\ * *     *&lt;br /&gt;&lt;br /&gt;20 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-5998685520894140714?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/5998685520894140714/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=5998685520894140714' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/5998685520894140714'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/5998685520894140714'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2008/01/happy-new-year.html' title='Happy New Year'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-3517086010764352555</id><published>2007-12-19T00:53:00.001+03:00</published><updated>2008-01-20T00:38:15.923+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Model'/><category scheme='http://www.blogger.com/atom/ns#' term='Reports'/><category scheme='http://www.blogger.com/atom/ns#' term='Hierarchy'/><title type='text'>Reports: getting total of all children values in a tree</title><content type='html'>&lt;a href="http://volder-notes.blogspot.com/2007/12/reports-forecasting.html"&gt;... previous&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Well, actually, this note is a continuation of a model series on reports.&lt;br /&gt;So the structure of an article is the same: the problem description -&gt; non-model solution -&gt; model clause solution.&lt;br /&gt;&lt;br /&gt;But also this note is very similar to &lt;a href="http://volder-notes.blogspot.com/2007/12/summation-of-tree-values.html"&gt;summation of tree values&lt;/a&gt;, I've written a little bit earlier.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Input data:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So let's get started.&lt;br /&gt;Our input for this post would be a table &lt;span style="font-weight:bold;"&gt;t&lt;/span&gt; from &lt;a href="http://volder-notes.blogspot.com/2007/12/reports-totals-subtotals.html"&gt;here&lt;/a&gt; and an additional table &lt;span style="font-weight:bold;"&gt;t_hrchy&lt;/span&gt; with a hierarchical structure of departments:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; drop table t_hrchy;&lt;br /&gt;&lt;br /&gt;Table dropped&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;br /&gt;SQL&gt; create table t_hrchy as (select 'DEP2' dep, 'DEP1' dep_par from dual union all&lt;br /&gt;  2                           select 'DEP3' dep, 'DEP2' dep_par from dual union all&lt;br /&gt;  3                           select 'DEP4' dep, 'DEP3' dep_par from dual)&lt;br /&gt;  4  /&lt;br /&gt;&lt;br /&gt;Table created&lt;br /&gt;&lt;br /&gt;SQL&gt; select * from t_hrchy;&lt;br /&gt;&lt;br /&gt;DEP  DEP_PAR&lt;br /&gt;---- -------&lt;br /&gt;DEP2 DEP1&lt;br /&gt;DEP3 DEP2&lt;br /&gt;DEP4 DEP3&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;So this table tells us that DEP1 is a parent department for DEP2, DEP2 is a parent department for DEP3 and DEP3 is a parent department for DEP4.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Problem:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now our task would be to create a report where in one column would be sales of a particular department output and in another column we need to get the sales of a current department and for all the departments that are under the current one in the organization structure.&lt;br /&gt;The data should be consolidated only inside the same year and inside the same city.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Non-Model solution:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When the matter concerns the hierarchical structure the first thing that pops up in our head is &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/queries003.htm#sthref3132"&gt;hierarchical queries with CONNECT BY&lt;/a&gt; clause.&lt;br /&gt;&lt;br /&gt;Elaborating this a little bit we can use the following query proposed by &lt;a href="http://rwijk.blogspot.com/"&gt;Rob van Wijk&lt;/a&gt; &lt;a href="http://forums.oracle.com/forums/thread.jspa?messageID=2232173#2232173"&gt;here&lt;/a&gt; or &lt;a href="http://forums.oracle.com/forums/thread.jspa?messageID=1968105#1968105"&gt;here&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select to_char(year,'yyyy') year,&lt;br /&gt;  2         city,&lt;br /&gt;  3         t.dep,&lt;br /&gt;  4         th.dep_par,&lt;br /&gt;  5         sales,&lt;br /&gt;  6         sum(connect_by_root sales) tot_sales&lt;br /&gt;  7    from t, t_hrchy th&lt;br /&gt;  8   where t.dep = th.dep(+)&lt;br /&gt;  9  connect by t.dep = prior dep_par&lt;br /&gt; 10         and prior year = year&lt;br /&gt; 11         and prior city = city&lt;br /&gt; 12   group by city, year, t.dep, dep_par, sales&lt;br /&gt; 13   order by 1,2,3;&lt;br /&gt;&lt;br /&gt;YEAR CITY   DEP  DEP_PAR      SALES  TOT_SALES&lt;br /&gt;---- ------ ---- ------- ---------- ----------&lt;br /&gt;2006 Moscow DEP1                562       2254&lt;br /&gt;2006 Moscow DEP2 DEP1           457       1692&lt;br /&gt;2006 Moscow DEP3 DEP2           890       1235&lt;br /&gt;2006 Moscow DEP4 DEP3           345        345&lt;br /&gt;2006 Omsk   DEP1                 23        264&lt;br /&gt;2006 Omsk   DEP2 DEP1           154        241&lt;br /&gt;2006 Omsk   DEP3 DEP2            87         87&lt;br /&gt;2007 Moscow DEP1                762       2926&lt;br /&gt;2007 Moscow DEP2 DEP1           657       2164&lt;br /&gt;2007 Moscow DEP3 DEP2          1020       1507&lt;br /&gt;2007 Moscow DEP4 DEP3           487        487&lt;br /&gt;2007 Omsk   DEP1                 34        403&lt;br /&gt;2007 Omsk   DEP2 DEP1           213        369&lt;br /&gt;2007 Omsk   DEP3 DEP2           156        156&lt;br /&gt;&lt;br /&gt;14 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;Lets see what this query actually does.&lt;br /&gt;First we use an outer join of two tables - so for each row we get a parent department name.&lt;br /&gt;&lt;br /&gt;Then we build a hierarchy by using &lt;blockquote&gt;connect by t.dep = prior dep_par&lt;/blockquote&gt;Mention that hierarchy is built in the backward direction starting from the children and going up to their parents.&lt;br /&gt;&lt;br /&gt;Also we add in the connect by clause the following:&lt;blockquote&gt;       and prior year = year&lt;br /&gt;       and prior city = city&lt;/blockquote&gt;So we build hierarchy separartely for different years and cities.&lt;br /&gt;&lt;br /&gt;By using the operator&lt;blockquote&gt;connect_by_root sales&lt;/blockquote&gt;in every row we find a value for sales of the root elements for this or that thread in a hierarchy. As we didn't specify any START WITH condition - the hierarchy is built starting from all the departments present in the data.&lt;br /&gt;For example, let's look what would happen inside the partition for 2007 year Moscow city:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select to_char(year,'yyyy') year,&lt;br /&gt;  2         city,&lt;br /&gt;  3         t.dep,&lt;br /&gt;  4         th.dep_par,&lt;br /&gt;  5         sales,&lt;br /&gt;  6         connect_by_root sales root_sales,&lt;br /&gt;  7         level,&lt;br /&gt;  8         sys_connect_by_path(t.dep,'/') hrchy_path&lt;br /&gt;  9    from t, t_hrchy th&lt;br /&gt; 10   where t.dep = th.dep(+)&lt;br /&gt; 11     and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt; 12     and t.city='Moscow'&lt;br /&gt; 13  connect by t.dep = prior dep_par&lt;br /&gt; 14         and prior year = year&lt;br /&gt; 15         and prior city = city&lt;br /&gt; 16  /&lt;br /&gt;&lt;br /&gt;YEAR CITY   DEP  DEP_PAR      SALES ROOT_SALES      LEVEL HRCHY_PATH&lt;br /&gt;---- ------ ---- ------- ---------- ---------- ---------- ----------------------------&lt;br /&gt;2007 Moscow DEP1                762        762          1 /DEP1&lt;br /&gt;2007 Moscow DEP2 DEP1           657        657          1 /DEP2&lt;br /&gt;2007 Moscow DEP1                762        657          2 /DEP2/DEP1&lt;br /&gt;2007 Moscow DEP3 DEP2          1020       1020          1 /DEP3&lt;br /&gt;2007 Moscow DEP2 DEP1           657       1020          2 /DEP3/DEP2&lt;br /&gt;2007 Moscow DEP1                762       1020          3 /DEP3/DEP2/DEP1&lt;br /&gt;2007 Moscow DEP4 DEP3           487        487          1 /DEP4&lt;br /&gt;2007 Moscow DEP3 DEP2          1020        487          2 /DEP4/DEP3&lt;br /&gt;2007 Moscow DEP2 DEP1           657        487          3 /DEP4/DEP3/DEP2&lt;br /&gt;2007 Moscow DEP1                762        487          4 /DEP4/DEP3/DEP2/DEP1&lt;br /&gt;&lt;br /&gt;10 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;As you see DEP1 is met four times:&lt;br /&gt; - Firstly it stands for itself (level=1).&lt;br /&gt; - Then we begin to build hierarchy starting from DEP2 and DEP1 is met as a "child" (remember we build a hierarchy in the opposite direction) for DEP2 (level=2)&lt;br /&gt; - The next time we meet it as a "grandchild" for DEP3 (level=3)&lt;br /&gt; - And finally as a "great-grandchild" for DEP4 (level=4). &lt;br /&gt;&lt;br /&gt;In each row we have a value of sales for the root department. So if we group data by department and put a sum() aggregate function on connect_by_root(sales) column we get the sales of DEP1 and all the sales of all departments that are under DEP1 in one row.&lt;br /&gt;The same stuff will happen to all the other departments:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select to_char(year,'yyyy') year,&lt;br /&gt;  2         city,&lt;br /&gt;  3         t.dep,&lt;br /&gt;  4         th.dep_par,&lt;br /&gt;  5         sales,&lt;br /&gt;  6         sum(connect_by_root sales) tot_salesh&lt;br /&gt;  7    from t, t_hrchy th&lt;br /&gt;  8   where t.dep = th.dep(+)&lt;br /&gt;  9     and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt; 10     and t.city='Moscow'&lt;br /&gt; 11  connect by t.dep = prior dep_par&lt;br /&gt; 12         and prior year = year&lt;br /&gt; 13         and prior city = city&lt;br /&gt; 14  group by year,city,t.dep,th.dep_par,sales&lt;br /&gt; 15  order by 1,2,3&lt;br /&gt; 16  /&lt;br /&gt;&lt;br /&gt;YEAR CITY   DEP  DEP_PAR      SALES TOT_SALESH&lt;br /&gt;---- ------ ---- ------- ---------- ----------&lt;br /&gt;2007 Moscow DEP1                762       2926&lt;br /&gt;2007 Moscow DEP2 DEP1           657       2164&lt;br /&gt;2007 Moscow DEP3 DEP2          1020       1507&lt;br /&gt;2007 Moscow DEP4 DEP3           487        487&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Model Solution&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now we are proceeding and will solve the same problem using Model clause without any connect by's.&lt;br /&gt;&lt;br /&gt;To get less data to be outputed at every step - let's restrict our data to 2007 year and Moscow city only as we did in Non-Model solution:&lt;br /&gt;&lt;blockquote&gt;SQL&gt;  select * from t, t_hrchy th&lt;br /&gt;  2   where t.dep = th.dep(+)&lt;br /&gt;  3     and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  4     and t.city='Moscow'&lt;br /&gt;  5     order by 3,1,2&lt;br /&gt;  6  /&lt;br /&gt;&lt;br /&gt;CITY   DEP  YEAR             SALES DEP  DEP_PAR&lt;br /&gt;------ ---- ----------- ---------- ---- -------&lt;br /&gt;Moscow DEP1 31.12.2007         762      &lt;br /&gt;Moscow DEP2 31.12.2007         657 DEP2 DEP1&lt;br /&gt;Moscow DEP3 31.12.2007        1020 DEP3 DEP2&lt;br /&gt;Moscow DEP4 31.12.2007         487 DEP4 DEP3&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;Let's think of what woud be placed in PARTITION BY part of our model. Remeber that partitions should be absolutely independent one of another.&lt;br /&gt;Yes - we would put YEAR and CITY there. As a task instructs us to treat sales of different years and cities &lt;span style="font-style:italic;"&gt;separately&lt;/span&gt;.&lt;br /&gt;In dimension we would place DEP and DEP_PART columns.&lt;br /&gt;And in measures - we'll put SALES, of course. And create one more measure called TOT_SALES with initial value of zero.&lt;br /&gt;&lt;blockquote&gt;SQL&gt;  select * from t, t_hrchy th&lt;br /&gt;  2   where t.dep = th.dep(+)&lt;br /&gt;  3     and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  4     and t.city='Moscow'&lt;br /&gt;  5    model&lt;br /&gt;  6     partition by (year,city)&lt;br /&gt;  7     dimension by (t.dep dep,dep_par)&lt;br /&gt;  8     measures(sales, 0 tot_sales)&lt;br /&gt;  9     ()&lt;br /&gt; 10   order by 1,2,3&lt;br /&gt; 11  /&lt;br /&gt;&lt;br /&gt;YEAR        CITY   DEP  DEP_PAR      SALES  TOT_SALES&lt;br /&gt;----------- ------ ---- ------- ---------- ----------&lt;br /&gt;31.12.2007  Moscow DEP1                762          0&lt;br /&gt;31.12.2007  Moscow DEP2 DEP1           657          0&lt;br /&gt;31.12.2007  Moscow DEP3 DEP2          1020          0&lt;br /&gt;31.12.2007  Moscow DEP4 DEP3           487          0&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;Now we are going to write a rule for calculating the tot_sales.&lt;br /&gt;For any cell in our model the &lt;span style="font-weight:bold;"&gt;tot_sales&lt;/span&gt; should be equal to a sum of &lt;span style="font-weight:bold;"&gt;sales&lt;/span&gt; of the current department and all the &lt;span style="font-weight:bold;"&gt;tot_sales&lt;/span&gt; of all departments under the current.&lt;br /&gt;We can write it as:&lt;br /&gt;&lt;blockquote&gt;tot_sales[any,any] = sales[CV(),CV()]+nvl(sum(tot_sales)[any,CV(dep)],0)&lt;/blockquote&gt;&lt;br /&gt;BTW it would be correct in case when the tot_sales of the departments under the current one already included all the tot_sales of the departments under them.&lt;br /&gt;&lt;br /&gt;So what is important here is the &lt;span style="font-weight:bold;"&gt;ORDER &lt;/span&gt; in what the rules are executed.&lt;br /&gt;If we don't specify anything about the order of execution to our model it uses the default SEQUENTIAL ORDER and can easily throw out an error for us:&lt;br /&gt;&lt;blockquote&gt;SQL&gt;  select * from t, t_hrchy th&lt;br /&gt;  2   where t.dep = th.dep(+)&lt;br /&gt;  3     and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  4     and t.city='Moscow'&lt;br /&gt;  5    model&lt;br /&gt;  6     partition by (year,city)&lt;br /&gt;  7     dimension by (t.dep dep,dep_par)&lt;br /&gt;  8     measures(sales, 0 tot_sales)&lt;br /&gt;  9      rules&lt;br /&gt; 10      (tot_sales[any,any] = sales[CV(),CV()]+nvl(sum(tot_sales)[any,CV(dep)],0))&lt;br /&gt; 11   order by 1,2,3&lt;br /&gt; 12  /&lt;br /&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;ORA-32637: Self cyclic rule in sequential order MODEL&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;But putting &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/sqlmodel.htm#BEIBEHCE"&gt;AUTOMATIC ORDER&lt;/a&gt; causes all rules to be evaluated in an order based on their logical dependencies.&lt;br /&gt;Let's check:&lt;br /&gt;&lt;blockquote&gt;SQL&gt;  select * from t, t_hrchy th&lt;br /&gt;  2   where t.dep = th.dep(+)&lt;br /&gt;  3     and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  4     and t.city='Moscow'&lt;br /&gt;  5    model&lt;br /&gt;  6     partition by (year,city)&lt;br /&gt;  7     dimension by (t.dep dep,dep_par)&lt;br /&gt;  8     measures(sales, 0 tot_sales)&lt;br /&gt;  9      rules automatic order&lt;br /&gt; 10      (tot_sales[any,any] = sales[CV(),CV()]+nvl(sum(tot_sales)[any,CV(dep)],0))&lt;br /&gt; 11   order by 1,2,3&lt;br /&gt; 12  /&lt;br /&gt;&lt;br /&gt;YEAR        CITY   DEP  DEP_PAR      SALES  TOT_SALES&lt;br /&gt;----------- ------ ---- ------- ---------- ----------&lt;br /&gt;31.12.2007  Moscow DEP1                762       2926&lt;br /&gt;31.12.2007  Moscow DEP2 DEP1           657       2164&lt;br /&gt;31.12.2007  Moscow DEP3 DEP2          1020       1507&lt;br /&gt;31.12.2007  Moscow DEP4 DEP3           487        487&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;Voilà! This is exactly what we were looking for.&lt;br /&gt;&lt;br /&gt;Let's find a detailed description of what's happening when we don't use AUTOMATIC ORDER and what's changing when we begin to use it.&lt;br /&gt;First of all, the equivalent of our single rule would be actually a list of four rules for every row (unique set of dimension values) in our data:&lt;br /&gt;&lt;blockquote&gt; tot_sales['DEP1', null ]=sales['DEP1', null ]+nvl(sum(tot_sales)[any,'DEP1'],0),&lt;br /&gt; tot_sales['DEP2','DEP1']=sales['DEP2','DEP1']+nvl(sum(tot_sales)[any,'DEP2'],0),&lt;br /&gt; tot_sales['DEP3','DEP2']=sales['DEP3','DEP2']+nvl(sum(tot_sales)[any,'DEP3'],0),&lt;br /&gt; tot_sales['DEP4','DEP3']=sales['DEP4','DEP3']+nvl(sum(tot_sales)[any,'DEP4'],0)&lt;/blockquote&gt;&lt;br /&gt;But using SEQUENTIAL ORDER doesn't guarantee us any order of these rules execution. Because as it stated in the documetation: &lt;span style="font-style:italic;"&gt;SEQUENTIAL ORDER: This causes rules to be evaluated in the order they are written. This is the default.&lt;/span&gt;&lt;br /&gt; But we have only one rule in our model. So the detailed rules can be run in any order:&lt;br /&gt;for example first we find tot_sales['DEP1', null ] and then tot_sales['DEP2','DEP1'], or vice versa. And that is the problem, because this two possibilities will give us different results in the output.&lt;br /&gt;&lt;br /&gt;What actually ORA-32637 stays for is:&lt;br /&gt;&lt;span style="font-style:italic;"&gt;    &lt;span style="font-weight:bold;"&gt;Cause&lt;/span&gt;: A self-cyclic rule was detected in the sequential order MODEL. Sequential order MODELs cannot have self cyclic rules to guarantee that the results do not depend on the order of evaluation of the cells that are updated or upserted.&lt;br /&gt;    &lt;span style="font-weight:bold;"&gt;Action&lt;/span&gt;: Use ordered rule evaluation for this rule.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So if use &lt;span style="font-weight:bold;"&gt;order by DEP&lt;/span&gt; we shouldn't have such an error, let's try:&lt;br /&gt;&lt;blockquote&gt;SQL&gt;  select * from t, t_hrchy th&lt;br /&gt;  2   where t.dep = th.dep(+)&lt;br /&gt;  3     and trunc(t.year,'y')=to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  4     and t.city='Moscow'&lt;br /&gt;  5    model&lt;br /&gt;  6     partition by (year,city)&lt;br /&gt;  7     dimension by (t.dep dep,dep_par)&lt;br /&gt;  8     measures(sales, 0 tot_sales)&lt;br /&gt;  9      rules&lt;br /&gt; 10       (tot_sales[any,any] order by dep  = sales[CV(),CV()]+nvl(sum(tot_sales)[any,CV(dep)],0))&lt;br /&gt; 11   order by 1,2,3&lt;br /&gt; 12  /&lt;br /&gt;&lt;br /&gt;YEAR        CITY   DEP  DEP_PAR      SALES  TOT_SALES&lt;br /&gt;----------- ------ ---- ------- ---------- ----------&lt;br /&gt;31.12.2007  Moscow DEP1                762        762&lt;br /&gt;31.12.2007  Moscow DEP2 DEP1           657        657&lt;br /&gt;31.12.2007  Moscow DEP3 DEP2          1020       1020&lt;br /&gt;31.12.2007  Moscow DEP4 DEP3           487        487&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;Yes, the query returns no error - because we explicitly defined the order of rules execution. But as you see we have wrong results. Because actually in our case we should &lt;span style="font-weight:bold;"&gt;order by dep DESC&lt;/span&gt;! You can try it yourself.&lt;br /&gt;While we don't know what are the dependences in the hierarchy data beforehand - let's give Oracle to decide - what are the dependencies between cells and simply use AUTOMATIC ORDER:&lt;br /&gt;&lt;blockquote&gt;SQL&gt;  select * from t, t_hrchy th&lt;br /&gt;  2   where t.dep = th.dep(+)&lt;br /&gt;  3    model&lt;br /&gt;  4     partition by (to_char(year,'yyyy') year,city)&lt;br /&gt;  5     dimension by (t.dep dep,dep_par)&lt;br /&gt;  6     measures(sales, 0 tot_sales)&lt;br /&gt;  7      rules automatic order&lt;br /&gt;  8       (tot_sales[any,any] = sales[CV(),CV()]+nvl(sum(tot_sales)[any,CV(dep)],0))&lt;br /&gt;  9   order by 1,2,3&lt;br /&gt; 10  /&lt;br /&gt;&lt;br /&gt;YEAR CITY   DEP  DEP_PAR      SALES  TOT_SALES&lt;br /&gt;---- ------ ---- ------- ---------- ----------&lt;br /&gt;2006 Moscow DEP1                562       2254&lt;br /&gt;2006 Moscow DEP2 DEP1           457       1692&lt;br /&gt;2006 Moscow DEP3 DEP2           890       1235&lt;br /&gt;2006 Moscow DEP4 DEP3           345        345&lt;br /&gt;2006 Omsk   DEP1                 23        264&lt;br /&gt;2006 Omsk   DEP2 DEP1           154        241&lt;br /&gt;2006 Omsk   DEP3 DEP2            87         87&lt;br /&gt;2007 Moscow DEP1                762       2926&lt;br /&gt;2007 Moscow DEP2 DEP1           657       2164&lt;br /&gt;2007 Moscow DEP3 DEP2          1020       1507&lt;br /&gt;2007 Moscow DEP4 DEP3           487        487&lt;br /&gt;2007 Omsk   DEP1                 34        403&lt;br /&gt;2007 Omsk   DEP2 DEP1           213        369&lt;br /&gt;2007 Omsk   DEP3 DEP2           156        156&lt;br /&gt;&lt;br /&gt;14 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://volder-notes.blogspot.com/2008/01/reports-matrix-report.html"&gt;to be continued...&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-3517086010764352555?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/3517086010764352555/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=3517086010764352555' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/3517086010764352555'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/3517086010764352555'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2007/12/reports-getting-total-of-all-children.html' title='Reports: getting total of all children values in a tree'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-5565568852466584993</id><published>2007-12-16T18:19:00.000+03:00</published><updated>2008-01-19T22:55:39.439+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Model'/><category scheme='http://www.blogger.com/atom/ns#' term='Reports'/><title type='text'>Reports: forecasting</title><content type='html'>&lt;a href="http://volder-notes.blogspot.com/2007/12/reports-ratios-percentages.html"&gt;... previous&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;We carry on with our reports (the needed &lt;a href="http://volder-notes.blogspot.com/2007/12/reports-totals-subtotals.html"&gt;input data&lt;/a&gt;).&lt;br /&gt;Now what we need is to make a little forecast. We are requested to find out what the approximate sales for the next year would be.&lt;br /&gt;&lt;br /&gt;The &lt;span style="font-weight:bold;"&gt;growth rate&lt;/span&gt; is determined as a ratio of &lt;span style="font-style:italic;"&gt;current year sales (2007)&lt;/span&gt; to the &lt;span style="font-style:italic;"&gt;previous year sales (2006)&lt;/span&gt;, but it should be &lt;span style="font-style:italic;"&gt;calculated &lt;span style="font-weight:bold;"&gt;separately &lt;/span&gt;for different cities&lt;/span&gt;.&lt;br /&gt;So to find the growth rate for each row - we need to find total sales for the particular city for the current year and divide it on the total sales for the same city for the previous year.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Non-Model solution&lt;/span&gt;&lt;br /&gt;Without using model - we can find the growth rate in a scalar subquery:&lt;br /&gt;&lt;blockquote&gt;select sum(decode(to_char(year, 'yyyy'), '2007', sales)) /&lt;br /&gt;       sum(decode(to_char(year, 'yyyy'), '2006', sales))&lt;br /&gt;  from t&lt;br /&gt; where city = t1.city&lt;/blockquote&gt;where &lt;span style="font-style:italic;"&gt;t1.city&lt;/span&gt; should be a reference to the main outer query.&lt;br /&gt;If we put this scalar subquery directly into the query we already had, we get the following result:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select city,&lt;br /&gt;  2         dep,&lt;br /&gt;  3         sum(sales) sales_2007,&lt;br /&gt;  4         round(sum(sales) *&lt;br /&gt;  5               (select sum(decode(to_char(year, 'yyyy'), '2007', sales)) /&lt;br /&gt;  6                       sum(decode(to_char(year, 'yyyy'), '2006', sales))&lt;br /&gt;  7                  from t&lt;br /&gt;  8                 where city = t1.city),&lt;br /&gt;  9               2) sales_2008_forecast&lt;br /&gt; 10    from t t1&lt;br /&gt; 11   where trunc(year, 'y') = to_date('01.01.2007', 'dd.mm.yyyy')&lt;br /&gt; 12   group by rollup(city, dep)&lt;br /&gt; 13  /&lt;br /&gt;&lt;br /&gt;CITY   DEP  SALES_2007 SALES_2008_FORECAST&lt;br /&gt;------ ---- ---------- -------------------&lt;br /&gt;Omsk   DEP1         34                51,9&lt;br /&gt;Omsk   DEP2        213              325,15&lt;br /&gt;Omsk   DEP3        156              238,14&lt;br /&gt;Omsk               403              615,19&lt;br /&gt;Moscow DEP1        762              989,18&lt;br /&gt;Moscow DEP2        657              852,88&lt;br /&gt;Moscow DEP3       1020              1324,1&lt;br /&gt;Moscow DEP4        487              632,19&lt;br /&gt;Moscow            2926             3798,35&lt;br /&gt;                  3329 &lt;br /&gt;&lt;br /&gt;10 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;As you see - we don't get a total for year 2008. This happens because &lt;em&gt;city &lt;/em&gt;value in this total row is equal to NULL. And while we calculating the growth rate for particular cities - we get no rate at all for this row. So as a result we get NULL.&lt;br /&gt;&lt;br /&gt;The workaround here would be first to get forecasting for detailed sales and then group with ROLLUP the sales for 2007 year and also for the predictable 2008 year.&lt;br /&gt;&lt;br /&gt;So it would look like the following:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select city,&lt;br /&gt;  2         dep,&lt;br /&gt;  3         sum(sales) sales_2007,&lt;br /&gt;  4         sum(sales2) sales_2008_forecast,&lt;br /&gt;  5         round(100 * (sum(sales2) / sum(sales) - 1), 2) || '%' growth_rate&lt;br /&gt;  6    from (select t0.*,&lt;br /&gt;  7                 round(sales *&lt;br /&gt;  8                       (select sum(decode(to_char(year, 'yyyy'), '2007', sales)) /&lt;br /&gt;  9                               sum(decode(to_char(year, 'yyyy'), '2006', sales))&lt;br /&gt; 10                          from t&lt;br /&gt; 11                         where city = t0.city),&lt;br /&gt; 12                       2) sales2&lt;br /&gt; 13            from t t0) t1&lt;br /&gt; 14   where trunc(year, 'y') = to_date('01.01.2007', 'dd.mm.yyyy')&lt;br /&gt; 15   group by rollup(city, dep)&lt;br /&gt; 16  /&lt;br /&gt;&lt;br /&gt;CITY   DEP  SALES_2007 SALES_2008_FORECAST GROWTH_RATE&lt;br /&gt;------ ---- ---------- ------------------- ---------------&lt;br /&gt;Omsk   DEP1         34                51,9 52.65%&lt;br /&gt;Omsk   DEP2        213              325,15 52.65%&lt;br /&gt;Omsk   DEP3        156              238,14 52.65%&lt;br /&gt;Omsk               403              615,19 52.65%&lt;br /&gt;Moscow DEP1        762              989,18 29.81%&lt;br /&gt;Moscow DEP2        657              852,88 29.81%&lt;br /&gt;Moscow DEP3       1020              1324,1 29.81%&lt;br /&gt;Moscow DEP4        487              632,19 29.81%&lt;br /&gt;Moscow            2926             3798,35 29.81%&lt;br /&gt;                  3329             4413,54 32.58%&lt;br /&gt;&lt;br /&gt;10 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;/blockquote&gt;Now all the totals are where they should be. And we added an extra column with growth rate. So we can see, that sales in Omsk grew up ~53% and in Moscow increased ~30%. The average growth rate from 2006 until 2007 years was ~33%.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Model solution #1&lt;/span&gt;&lt;br /&gt;With the model solution instead of scalar subquery - we can use reference model clause.&lt;br /&gt;We put the following query into the reference model:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select city, to_char(year,'yyyy') y, sum(sales) sum_sales from t group by city,year;&lt;br /&gt;&lt;br /&gt;CITY   Y     SUM_SALES&lt;br /&gt;------ ---- ----------&lt;br /&gt;Moscow 2006       2254&lt;br /&gt;Omsk   2006        264&lt;br /&gt;Moscow 2007       2926&lt;br /&gt;Omsk   2007        403&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;And then we'll use the values from that resultset.&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select city,dep,sales,sales_fc,round(100*(growth-1),2)||'%' growth_rate&lt;br /&gt;  2    from t where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  3   model&lt;br /&gt;  4    reference r on (select city, to_char(year,'yyyy') y, sum(sales) sum_sales from t group by city,year)&lt;br /&gt;  5     dimension by (y, city)&lt;br /&gt;  6     measures(sum_sales)&lt;br /&gt;  7    main m&lt;br /&gt;  8    dimension by (0 total,city, dep)&lt;br /&gt;  9    measures(sales, 0 sales_FC, 0 growth)&lt;br /&gt; 10    rules upsert all&lt;br /&gt; 11     (sales[1,any,null]=sum(sales)[0,CV(),any],&lt;br /&gt; 12      sales[1,null,null]=sum(sales)[0,any,any],&lt;br /&gt; 13      growth[any,any,any]=r.sum_sales['2007',CV(city)]/r.sum_sales['2006',CV(city)],&lt;br /&gt; 14      sales_FC[any,any,any]=round(sales[CV(),CV(),CV()]*growth[CV(),CV(),CV()],2))&lt;br /&gt; 15      order by 2,3;&lt;br /&gt;&lt;br /&gt;CITY   DEP       SALES   SALES_FC GROWTH_RATE&lt;br /&gt;------ ---- ---------- ---------- -----------------------------------------&lt;br /&gt;Omsk   DEP1         34       51,9 52.65%&lt;br /&gt;Moscow DEP1        762     989,18 29.81%&lt;br /&gt;Omsk   DEP2        213     325,15 52.65%&lt;br /&gt;Moscow DEP2        657     852,88 29.81%&lt;br /&gt;Omsk   DEP3        156     238,14 52.65%&lt;br /&gt;Moscow DEP3       1020     1324,1 29.81%&lt;br /&gt;Moscow DEP4        487     632,19 29.81%&lt;br /&gt;Omsk               403     615,19 52.65%&lt;br /&gt;Moscow            2926    3798,35 29.81%&lt;br /&gt;                  3329            %&lt;br /&gt;&lt;br /&gt;10 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;So we put our query into the reference clause, so that we can use it's cell values in the main part.&lt;br /&gt;We created a new measure &lt;span style="font-style:italic;"&gt;growth&lt;/span&gt; and in the rules part wrote:&lt;br /&gt;&lt;blockquote&gt;growth[any,any,any]=r.sum_sales['2007',CV(city)]/r.sum_sales['2006',CV(city)]&lt;/blockquote&gt;So for each cell we find a growth rate for the current row (current city) by dividing summary sales of the current year for this particular city to the same of the previous year.&lt;br /&gt;The next our step (rule) is finding the predicted sales for the next year:&lt;br /&gt;&lt;blockquote&gt;sales_FC[any,any,any]=round(sales[CV(),CV(),CV()]*growth[CV(),CV(),CV()],2)&lt;/blockquote&gt;And we are facing the same problem we had in the Non-Model part. We can't define any city in the total row - so we can't find a growth rate for this row.&lt;br /&gt;As you already know - we can easily write a rule to find the total sales for the next year as a sum of other cells:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select city,dep,sales,sales_fc,round(100*(growth-1),2)||'%' growth_rate&lt;br /&gt;  2    from t where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  3   model&lt;br /&gt;  4    reference r on (select city, to_char(year,'yyyy') y, sum(sales) sum_sales from t group by city,year)&lt;br /&gt;  5     dimension by (y, city)&lt;br /&gt;  6     measures(sum_sales)&lt;br /&gt;  7    main m&lt;br /&gt;  8    dimension by (0 total,city, dep)&lt;br /&gt;  9    measures(sales, 0 sales_FC, 0 growth)&lt;br /&gt; 10    rules upsert all&lt;br /&gt; 11     (sales[1,any,null]=sum(sales)[0,CV(),any],&lt;br /&gt; 12      sales[1,null,null]=sum(sales)[0,any,any],&lt;br /&gt; 13      sales_FC[any,city is not null,any]=round(sales[CV(),CV(),CV()]*r.sum_sales['2007',CV(city)]/r.sum_sales['2006',CV(city)],2),&lt;br /&gt; 14      sales_FC[1,null,null]=sum(sales_FC)[1,any,null],&lt;br /&gt; 15      growth[any,any,any]=sales_FC[CV(),CV(),CV()]/sales[CV(),CV(),CV()])&lt;br /&gt; 16      order by 1,2;&lt;br /&gt;&lt;br /&gt;CITY   DEP       SALES   SALES_FC GROWTH_RATE&lt;br /&gt;------ ---- ---------- ---------- ------------------&lt;br /&gt;Moscow DEP1        762     989,18 29.81%&lt;br /&gt;Moscow DEP2        657     852,88 29.81%&lt;br /&gt;Moscow DEP3       1020     1324,1 29.81%&lt;br /&gt;Moscow DEP4        487     632,19 29.81%&lt;br /&gt;Moscow            2926    3798,35 29.81%&lt;br /&gt;Omsk   DEP1         34       51,9 52.65%&lt;br /&gt;Omsk   DEP2        213     325,15 52.65%&lt;br /&gt;Omsk   DEP3        156     238,14 52.65%&lt;br /&gt;Omsk               403     615,19 52.65%&lt;br /&gt;                  3329    4413,54 32.58%&lt;br /&gt;&lt;br /&gt;10 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;You see that we changed the order of the rules and the rules themselves (that is because we want to find the growth rate for the overall total row, and hence we would find growth rates AFTER we get overall total sales value).&lt;br /&gt;First we find the forecasting sales for the next year for all the rows excluding the very total:&lt;br /&gt;&lt;blockquote&gt;sales_FC[any,city is not null,any]=&lt;br /&gt;      round(sales[CV(),CV(),CV()]*r.sum_sales['2007',CV(city)]/r.sum_sales['2006',CV(city)],2)&lt;/blockquote&gt;Then we find this total by summing up all the city subtotals:&lt;br /&gt;&lt;blockquote&gt;sales_FC[1,null,null]=sum(sales_FC)[1,any,null]&lt;/blockquote&gt;And finally we get the growth rate by dividing sales_FC (forecast) on the actual sales (current year sales):&lt;br /&gt;&lt;blockquote&gt;growth[any,any,any]=sales_FC[CV(),CV(),CV()]/sales[CV(),CV(),CV()]&lt;/blockquote&gt;&lt;br /&gt;After all this is the same result we had in the Non-Model solution.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Model solution #2&lt;/span&gt;&lt;br /&gt;Well, in both solutions (Non-Model &amp; Model #1) we accessed the table twice. First to get the main result and second one - to get the growth rate.&lt;br /&gt;&lt;br /&gt;Now we are going to access table only once by using extra dimension in Model and refusing from the reference model clause.&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select city,dep,sales,sales_fc,round(100*(growth-1),2)||'%' growth_rate from t&lt;br /&gt;  2   model&lt;br /&gt;  3    return updated rows&lt;br /&gt;  4    dimension by (0 total,city, dep, to_char(year,'yyyy') year)&lt;br /&gt;  5    measures(sales, sales sales_FC,0 growth)&lt;br /&gt;  6    rules upsert all&lt;br /&gt;  7     (sales[1,any,null,'2007']=sum(sales)[0,CV(),any,CV()],&lt;br /&gt;  8      sales[1,null,null,'2007']=sum(sales)[0,any,any,CV()],&lt;br /&gt;  9      sales_FC[any,any,any,'2007']=round(sales[CV(),CV(),CV(),CV()]*&lt;br /&gt; 10                 sales[1,CV(),null,CV()]/sum(sales)[0,CV(),any,'2006'],2),&lt;br /&gt; 11      sales_FC[1,null,null,'2007']=sum(sales_FC)[1,any,null,'2007'],&lt;br /&gt; 12      growth[any,any,any,'2007']=sales_FC[CV(),CV(),CV(),CV()]/sales[CV(),CV(),CV(),CV()])&lt;br /&gt; 13      order by 1,2&lt;br /&gt; 14  /&lt;br /&gt;&lt;br /&gt;CITY   DEP       SALES   SALES_FC GROWTH_RATE&lt;br /&gt;------ ---- ---------- ---------- -----------------------------------------&lt;br /&gt;Moscow DEP1        762     989,18 29.81%&lt;br /&gt;Moscow DEP2        657     852,88 29.81%&lt;br /&gt;Moscow DEP3       1020     1324,1 29.81%&lt;br /&gt;Moscow DEP4        487     632,19 29.81%&lt;br /&gt;Moscow            2926    3798,35 29.81%&lt;br /&gt;Omsk   DEP1         34       51,9 52.65%&lt;br /&gt;Omsk   DEP2        213     325,15 52.65%&lt;br /&gt;Omsk   DEP3        156     238,14 52.65%&lt;br /&gt;Omsk               403     615,19 52.65%&lt;br /&gt;                  3329    4413,54 32.58%&lt;br /&gt;&lt;br /&gt;10 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;What you should mention first is that we removed&lt;blockquote&gt;where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')&lt;/blockquote&gt;So now all the data (for all years: 2006 and 2007) is pumped into model clause.&lt;br /&gt;&lt;br /&gt;And you probably didn't miss that we put RETURN UPDATED ROWS, so only rows where cells were updated will be returned in the resultset.&lt;br /&gt;That's how we can avoid additional outer WHERE clause for our MODEL: we update only rows that are corresppondent to 2007 year, but not the 2006.&lt;br /&gt;&lt;br /&gt;Another thing changed is we added a dimension &lt;blockquote&gt;to_char(year,'yyyy') year&lt;/blockquote&gt;So now we can reference to sales of a particular year by putting it in the format like [...,'YYYY'].&lt;br /&gt;&lt;br /&gt;And the final amendment - would be adding a value for a new dimension in every cell reference ([&amp;lttotal_dim&amp;gt,&amp;ltcity_dim&amp;gt,&amp;ltdep_dim&amp;gt,'2007']). If we put it on the left side of a formula we can put just CV() on the right side in the place for &amp;ltyear_dim&amp;gt.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://volder-notes.blogspot.com/2007/12/reports-getting-total-of-all-children.html"&gt;to be continued ...&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-5565568852466584993?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/5565568852466584993/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=5565568852466584993' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/5565568852466584993'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/5565568852466584993'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2007/12/reports-forecasting.html' title='Reports: forecasting'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-4756273862079174648</id><published>2007-12-11T23:12:00.000+03:00</published><updated>2008-01-19T22:55:39.439+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Model'/><category scheme='http://www.blogger.com/atom/ns#' term='Reports'/><title type='text'>Reports: ratios &amp; percentages</title><content type='html'>&lt;a href="http://volder-notes.blogspot.com/2007/12/reports-totals-subtotals.html"&gt;... previous&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Let's continue with our spreadsheet report (input data you can find &lt;a href="http://volder-notes.blogspot.com/2007/12/reports-totals-subtotals.html"&gt;here&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Imagine that we want to add a column that would represent sales percentage of each department in the total of it's city sales. So we need to know how much every department contributes to it's region sales.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Non-model solution:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Developing the group by solution with rollup - we could use &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions124.htm#sthref1881"&gt;RATIO_TO_REPORT&lt;/a&gt; function for that purpose.&lt;br /&gt;&lt;br /&gt;Our query would look like:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select city, dep, sum(sales),&lt;br /&gt;  2         decode(grouping_id(city, dep),&lt;br /&gt;  3                0,&lt;br /&gt;  4                round(2 * 100 * ratio_to_report(sum(sales))&lt;br /&gt;  5                      over(partition by city),&lt;br /&gt;  6                      2) || '%') perc&lt;br /&gt;  7    from t&lt;br /&gt;  8   where trunc(year, 'y') = to_date('01.01.2007', 'dd.mm.yyyy')&lt;br /&gt;  9   group by rollup(city, dep)&lt;br /&gt; 10  /&lt;br /&gt;&lt;br /&gt;CITY   DEP  SUM(SALES) PERC&lt;br /&gt;------ ---- ---------- ---------&lt;br /&gt;Moscow DEP1        762 26.04%&lt;br /&gt;Moscow DEP2        657 22.45%&lt;br /&gt;Moscow DEP3       1020 34.86%&lt;br /&gt;Moscow DEP4        487 16.64%&lt;br /&gt;Moscow            2926 &lt;br /&gt;Omsk   DEP1         34 8.44%&lt;br /&gt;Omsk   DEP2        213 52.85%&lt;br /&gt;Omsk   DEP3        156 38.71%&lt;br /&gt;Omsk               403 &lt;br /&gt;                  3329 &lt;br /&gt;&lt;br /&gt;10 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;Let's take a closer look.&lt;br /&gt;First of all we use &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions063.htm#sthref1438"&gt;GROUPING_ID&lt;/a&gt; function - to understand when the data is detailed (comparing it to 0). When it is detailed - we return the ratio of the value in the current row to the total sales of the region. Otherwise we return NULL.&lt;br /&gt;But we need to remember that analytic functions are executed after the grouping has been done. So when our ratio_to_report would be applied - we will have doubled sales for each city: on the one hand - the sales of each department, and on the other hand, the sum of all departments sales (as a city subtotal) will also influence our ratio_to_report.&lt;br /&gt;That's why we multiply the function by 2, so we get correct results.&lt;br /&gt;With multiplying by 100 we get percentage values.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Model solution&lt;/span&gt;&lt;br /&gt;For getting the same result with Model clause - first of all we need to add a new measure:&lt;blockquote&gt;cast(null as varchar2(7)) PERC&lt;/blockquote&gt;So it would be our percentage column in the result set:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select * from t where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  2   model&lt;br /&gt;  3    dimension by (0 total,city, dep)&lt;br /&gt;  4    measures(sales, cast(null as varchar2(7)) perc)&lt;br /&gt;  5    rules upsert all&lt;br /&gt;  6     (sales[1,any,null]=sum(sales)[0,CV(),any],&lt;br /&gt;  7      sales[1,null,null]=sum(sales)[0,any,any])&lt;br /&gt;  8      order by 2,3&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;     TOTAL CITY   DEP       SALES PERC&lt;br /&gt;---------- ------ ---- ---------- -------&lt;br /&gt;         0 Moscow DEP1        762 &lt;br /&gt;         0 Moscow DEP2        657 &lt;br /&gt;         0 Moscow DEP3       1020 &lt;br /&gt;         0 Moscow DEP4        487 &lt;br /&gt;         1 Moscow            2926 &lt;br /&gt;         0 Omsk   DEP1         34 &lt;br /&gt;         0 Omsk   DEP2        213 &lt;br /&gt;         0 Omsk   DEP3        156 &lt;br /&gt;         1 Omsk               403 &lt;br /&gt;         1                   3329 &lt;br /&gt;&lt;br /&gt;10 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;Now we need to create a rule for our column.&lt;br /&gt;So first of all we need to calculate values for all the cells with detail data. On the language of our model it would be perc[0,any,any]. By using a "Total" dimension with value=0 we will get only cells with detailed data. City and Department could be any. That is what we put on the left side of our formula.&lt;br /&gt;&lt;br /&gt;On the right side we need to find for each detailed row a ratio of the current row sales - to it's region sales total.&lt;br /&gt;So we would write sales[0,CV(),CV()]/sales[1,CV(),null]&lt;br /&gt;In the denominator we just put a reference to the cell with a region total. While it has been already calculated - we don't need to sum up all the sales of a particular city again.&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select * from t where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  2   model&lt;br /&gt;  3    dimension by (0 total,city, dep)&lt;br /&gt;  4    measures(sales, cast(null as varchar2(7)) perc)&lt;br /&gt;  5    rules upsert all&lt;br /&gt;  6     (sales[1,any,null]=sum(sales)[0,CV(),any],&lt;br /&gt;  7      sales[1,null,null]=sum(sales)[0,any,any],&lt;br /&gt;  8      perc[0,any,any]=round(sales[0,CV(),CV()]/sales[1,CV(),null]*100,2)||'%')&lt;br /&gt;  9      order by 2,3&lt;br /&gt; 10  /&lt;br /&gt;&lt;br /&gt;     TOTAL CITY   DEP       SALES PERC&lt;br /&gt;---------- ------ ---- ---------- -------&lt;br /&gt;         0 Moscow DEP1        762 26.04%&lt;br /&gt;         0 Moscow DEP2        657 22.45%&lt;br /&gt;         0 Moscow DEP3       1020 34.86%&lt;br /&gt;         0 Moscow DEP4        487 16.64%&lt;br /&gt;         0 Moscow                 %&lt;br /&gt;         1 Moscow            2926 &lt;br /&gt;         0 Omsk   DEP1         34 8.44%&lt;br /&gt;         0 Omsk   DEP2        213 52.85%&lt;br /&gt;         0 Omsk   DEP3        156 38.71%&lt;br /&gt;         0 Omsk                   %&lt;br /&gt;         1 Omsk               403 &lt;br /&gt;         1                   3329 &lt;br /&gt;         0                        %&lt;br /&gt;&lt;br /&gt;13 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;Ooops. As you can see there are three redundant rows with only a '%' in our new column.&lt;br /&gt;Why did they appear? Because we placed UPSERT ALL.&lt;br /&gt;If we don't want a particular rule to be under a general semantic, which is mentioned after the RULES word, we can put a needed behaviour name directly before the rule. In our case it is UPDATE (no new rows generation -&gt; only update, for more information look &lt;a href="http://volder-notes.blogspot.com/2007/12/reports-totals-subtotals.html"&gt;Reports: totals &amp; subtotals&lt;/a&gt;):&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select * from t where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  2   model&lt;br /&gt;  3    dimension by (0 total,city, dep)&lt;br /&gt;  4    measures(sales, cast(null as varchar2(7)) perc)&lt;br /&gt;  5    rules upsert all&lt;br /&gt;  6     (sales[1,any,null]=sum(sales)[0,CV(),any],&lt;br /&gt;  7      sales[1,null,null]=sum(sales)[0,any,any],&lt;br /&gt;  8      update perc[0,any,any]=round(sales[0,CV(),CV()]/sales[1,CV(),null]*100,2)||'%')&lt;br /&gt;  9      order by 2,3&lt;br /&gt; 10  /&lt;br /&gt;&lt;br /&gt;     TOTAL CITY   DEP       SALES PERC&lt;br /&gt;---------- ------ ---- ---------- -------&lt;br /&gt;         0 Moscow DEP1        762 26.04%&lt;br /&gt;         0 Moscow DEP2        657 22.45%&lt;br /&gt;         0 Moscow DEP3       1020 34.86%&lt;br /&gt;         0 Moscow DEP4        487 16.64%&lt;br /&gt;         1 Moscow            2926 &lt;br /&gt;         0 Omsk   DEP1         34 8.44%&lt;br /&gt;         0 Omsk   DEP2        213 52.85%&lt;br /&gt;         0 Omsk   DEP3        156 38.71%&lt;br /&gt;         1 Omsk               403 &lt;br /&gt;         1                   3329 &lt;br /&gt;&lt;br /&gt;10 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;This is actually what we were looking for!&lt;br /&gt;&lt;br /&gt;&lt;a href="http://volder-notes.blogspot.com/2007/12/reports-forecasting.html"&gt;to be continued ...&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-4756273862079174648?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/4756273862079174648/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=4756273862079174648' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/4756273862079174648'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/4756273862079174648'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2007/12/reports-ratios-percentages.html' title='Reports: ratios &amp; percentages'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-1739918413774437916</id><published>2007-12-09T18:20:00.000+03:00</published><updated>2008-01-19T22:55:39.440+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Model'/><category scheme='http://www.blogger.com/atom/ns#' term='Reports'/><title type='text'>Reports: totals &amp; subtotals</title><content type='html'>Starting from the current post I decided to write several notes on queries with &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/sqlmodel.htm#sthref1856"&gt;Model clause&lt;/a&gt;. Mostly they would be dedicated to the topics of producing diverse reports.&lt;br /&gt;In each case I'll give an opportunity how it can be done without using the model clause. And then - how the model clause could be used. So let's begin.&lt;br /&gt;Input data:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; drop table t;&lt;br /&gt;&lt;br /&gt;Table dropped&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;br /&gt;SQL&gt; create table t as (select 'Omsk' city, 'DEP1' dep, to_date('31.12.2006','dd.mm.yyyy') year, 23 sales from dual union all&lt;br /&gt;  2                     select 'Omsk' city, 'DEP2' dep, to_date('31.12.2006','dd.mm.yyyy') year, 154 sales from dual union all&lt;br /&gt;  3                     select 'Omsk' city, 'DEP3' dep, to_date('31.12.2006','dd.mm.yyyy') year, 87 sales from dual union all&lt;br /&gt;  4                     select 'Moscow' city, 'DEP1' dep, to_date('31.12.2006','dd.mm.yyyy') year, 562 sales from dual union all&lt;br /&gt;  5                     select 'Moscow' city, 'DEP2' dep, to_date('31.12.2006','dd.mm.yyyy') year, 457 sales from dual union all&lt;br /&gt;  6                     select 'Moscow' city, 'DEP3' dep, to_date('31.12.2006','dd.mm.yyyy') year, 890 sales from dual union all&lt;br /&gt;  7                     select 'Moscow' city, 'DEP4' dep, to_date('31.12.2006','dd.mm.yyyy') year, 345 sales from dual union all&lt;br /&gt;  8                     select 'Omsk' city, 'DEP1' dep, to_date('31.12.2007','dd.mm.yyyy') year, 34 sales from dual union all&lt;br /&gt;  9                     select 'Omsk' city, 'DEP2' dep, to_date('31.12.2007','dd.mm.yyyy') year, 213 sales from dual union all&lt;br /&gt; 10                     select 'Omsk' city, 'DEP3' dep, to_date('31.12.2007','dd.mm.yyyy') year, 156 sales from dual union all&lt;br /&gt; 11                     select 'Moscow' city, 'DEP1' dep, to_date('31.12.2007','dd.mm.yyyy') year, 762 sales from dual union all&lt;br /&gt; 12                     select 'Moscow' city, 'DEP2' dep, to_date('31.12.2007','dd.mm.yyyy') year, 657 sales from dual union all&lt;br /&gt; 13                     select 'Moscow' city, 'DEP3' dep, to_date('31.12.2007','dd.mm.yyyy') year, 1020 sales from dual union all&lt;br /&gt; 14                     select 'Moscow' city, 'DEP4' dep, to_date('31.12.2007','dd.mm.yyyy') year, 487 sales from dual)&lt;br /&gt; 15  /&lt;br /&gt;&lt;br /&gt;Table created&lt;br /&gt;&lt;br /&gt;SQL&gt; commit;&lt;br /&gt;&lt;br /&gt;Commit complete&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;/blockquote&gt;&lt;blockquote&gt;SQL&gt;select * from t&lt;br /&gt;  2   order by 3,1,2&lt;br /&gt;  3  /&lt;br /&gt;&lt;br /&gt;CITY   DEP  YEAR             SALES&lt;br /&gt;------ ---- ----------- ----------&lt;br /&gt;Moscow DEP1 31.12.2006         562&lt;br /&gt;Moscow DEP2 31.12.2006         457&lt;br /&gt;Moscow DEP3 31.12.2006         890&lt;br /&gt;Moscow DEP4 31.12.2006         345&lt;br /&gt;Omsk   DEP1 31.12.2006          23&lt;br /&gt;Omsk   DEP2 31.12.2006         154&lt;br /&gt;Omsk   DEP3 31.12.2006          87&lt;br /&gt;Moscow DEP1 31.12.2007         762&lt;br /&gt;Moscow DEP2 31.12.2007         657&lt;br /&gt;Moscow DEP3 31.12.2007        1020&lt;br /&gt;Moscow DEP4 31.12.2007         487&lt;br /&gt;Omsk   DEP1 31.12.2007          34&lt;br /&gt;Omsk   DEP2 31.12.2007         213&lt;br /&gt;Omsk   DEP3 31.12.2007         156&lt;br /&gt;&lt;br /&gt;14 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;This is a table of sales for a company in Russia, which has it's departments in two cities: Omsk (Dep 1,2,3) and in Moscow (Dep 1,2,3,4). The data about sales is given for two years: 2006 and 2007.&lt;br /&gt;&lt;br /&gt;Now we want to get sales for the year 2007. But we need not only detailed sales (as in a table), but also subtotals for each city. And a total for the whole company.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Non-Model Solution:&lt;/span&gt;&lt;br /&gt;Usually such reports are created by using extensions to Group By: &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/aggreg.htm#sthref1621"&gt;Cube&lt;/a&gt;, &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/aggreg.htm#sthref1612"&gt;Rollup&lt;/a&gt; &amp; &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/aggreg.htm#sthref1648"&gt;Grouping Sets expression&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So our query would look like:&lt;br /&gt;&lt;blockquote&gt;SQL&gt;  select city, dep, sum(sales) sales&lt;br /&gt;  2     from t&lt;br /&gt;  3    where trunc(year, 'y') = to_date('01.01.2007', 'dd.mm.yyyy')&lt;br /&gt;  4    group by rollup(city, dep)&lt;br /&gt;  5    order by 1, 2&lt;br /&gt;  6  /&lt;br /&gt;&lt;br /&gt;CITY   DEP       SALES&lt;br /&gt;------ ---- ----------&lt;br /&gt;Moscow DEP1        762&lt;br /&gt;Moscow DEP2        657&lt;br /&gt;Moscow DEP3       1020&lt;br /&gt;Moscow DEP4        487&lt;br /&gt;Moscow            2926&lt;br /&gt;Omsk   DEP1         34&lt;br /&gt;Omsk   DEP2        213&lt;br /&gt;Omsk   DEP3        156&lt;br /&gt;Omsk               403&lt;br /&gt;                  3329&lt;br /&gt;&lt;br /&gt;10 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Model solution:&lt;/span&gt;&lt;br /&gt;To create the same report using model clause, we have to execute the following query:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select city,dep,sales&lt;br /&gt;  2    from t&lt;br /&gt;  3   where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  4     model&lt;br /&gt;  5      dimension by (0 total,city, dep)&lt;br /&gt;  6      measures(sales)&lt;br /&gt;  7      rules upsert all&lt;br /&gt;  8       (sales[1,any,null]=sum(sales)[0,CV(),any],&lt;br /&gt;  9        sales[1,null,null]=sum(sales)[0,any,any])&lt;br /&gt; 10        order by 1,2&lt;br /&gt; 11  /&lt;br /&gt;&lt;br /&gt;CITY   DEP       SALES&lt;br /&gt;------ ---- ----------&lt;br /&gt;Moscow DEP1        762&lt;br /&gt;Moscow DEP2        657&lt;br /&gt;Moscow DEP3       1020&lt;br /&gt;Moscow DEP4        487&lt;br /&gt;Moscow            2926&lt;br /&gt;Omsk   DEP1         34&lt;br /&gt;Omsk   DEP2        213&lt;br /&gt;Omsk   DEP3        156&lt;br /&gt;Omsk               403&lt;br /&gt;                  3329&lt;br /&gt;&lt;br /&gt;10 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;Let's elaborate how it works.&lt;br /&gt;&lt;br /&gt;Using the condition in the WHERE clause - we get only data for the year 2007 in the result.&lt;br /&gt;Then we create dimensions: &lt;span style="font-style:italic;"&gt;City&lt;/span&gt; and &lt;span style="font-style:italic;"&gt;Dep&lt;/span&gt; - are the dimensions from the table. But we add one more dimension called &lt;span style="font-style:italic;"&gt;Total&lt;/span&gt; and by default assign value 0 to all the cells for that dimension.&lt;br /&gt;As a measure we take the field from our table &lt;span style="font-style:italic;"&gt;Sales&lt;/span&gt;.&lt;br /&gt;So at this stage we get as a result:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select *&lt;br /&gt;  2    from t&lt;br /&gt;  3   where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  4     model&lt;br /&gt;  5      dimension by (0 total,city, dep)&lt;br /&gt;  6      measures(sales)&lt;br /&gt;  7      ()&lt;br /&gt;  8     order by 2,3&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;     TOTAL CITY   DEP       SALES&lt;br /&gt;---------- ------ ---- ----------&lt;br /&gt;         0 Moscow DEP1        762&lt;br /&gt;         0 Moscow DEP2        657&lt;br /&gt;         0 Moscow DEP3       1020&lt;br /&gt;         0 Moscow DEP4        487&lt;br /&gt;         0 Omsk   DEP1         34&lt;br /&gt;         0 Omsk   DEP2        213&lt;br /&gt;         0 Omsk   DEP3        156&lt;br /&gt;&lt;br /&gt;7 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;Now we need to create extra lines for the subtotals and total.&lt;br /&gt;We use rules for this.&lt;br /&gt;You should remember that there are two ways to refer to a cell in a rule of a model clause: positional or symbolic.&lt;br /&gt;&lt;br /&gt;Example of positional: sales['Omsk','DEP3'] or sales['Omsk','DEP4'].&lt;br /&gt;When a cell which is positionally referenced exists - it's value is updated. When it doesn't exist - the model clause will create a cell with such dimensions.&lt;br /&gt;&lt;br /&gt;Example of symbolic reference:&lt;br /&gt;sales[city='Omsk',dep='DEP3'], sales[city is any,dep='DEP3'] ,sales[any,dep='DEP3'].&lt;br /&gt;Actually the last one [any] is a positional reference - but I put it in the symbolic group because there's a remark in the documentation:&lt;br /&gt;&lt;span style="font-style:italic;"&gt;Note that ANY is treated as a symbolic reference even if it is specified positionally, because it really means that (dimension IS NOT NULL OR dimension IS NULL).&lt;/span&gt;&lt;br /&gt;A typical behaviour for symbolically referenced cells: when it exists - the value gets updated. When it doesn't exist - no cells are created.&lt;br /&gt;&lt;br /&gt;Don't forget about the cells which are referenced symbolically and positionally at the same time. By default they behave like all the references are symbolic.&lt;br /&gt;&lt;br /&gt;Well, actually we could use 3 types of behaviour: UPDATE, UPSERT and UPSERT ALL.&lt;br /&gt;For explicit determining we should place the name of behaviour directly after the RULES keyword. In this case it would be adjusted to all the rules.&lt;br /&gt;If we want to specify a particular rule - we can put the name of behaviour right before the rule (we'll see how it works in the next post).&lt;br /&gt;&lt;br /&gt;If we don't put any - the default behaviour UPSERT is used. How it works - I mentioned earlier.&lt;br /&gt;&lt;br /&gt;When we use UPDATE - all cells are got updated, and if they don't exist - no cells are created, whichever reference (positional or symbolic) is used.&lt;br /&gt;&lt;br /&gt;UPSERT ALL is a scpecial case - it can create cells even for symbolic reference.&lt;br /&gt;But not in all cases, for example - it wouldn't create any cells if we put a rule:&lt;br /&gt;sales[city='Voronezh','DEP5']=100 (because there's no city called 'Voronezh' in our table), but at the same time it would create a cell (which doesn't exist) for the rule sales[city='Omsk','DEP5']=100.&lt;br /&gt;Why?&lt;br /&gt;Because &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/sqlmodel.htm#BEIJAFFG"&gt;UPSERT ALL&lt;/a&gt; works in the following way (from the doc):&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;Step 1 Find Cells&lt;/span&gt;&lt;br /&gt;Find the existing cells that satisfy all the symbolic predicates of the cell reference.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;Step 2 Find Distinct Dimension Values&lt;/span&gt;&lt;br /&gt;Using just the dimensions that have symbolic references, find the distinct dimension value combinations of these cells.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;Step 3 Perform a Cross Product&lt;/span&gt;&lt;br /&gt;Perform a cross product of these value combinations with the dimension values specified through positional references.&lt;br /&gt;&lt;span style="font-style:italic;"&gt;&lt;br /&gt;Step 4 Upsert New Cells&lt;/span&gt;&lt;br /&gt;The results of Step 3 are then used to upsert new cells into the array &lt;br /&gt;&lt;br /&gt;Just look:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select *&lt;br /&gt;  2    from t&lt;br /&gt;  3   where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  4     model&lt;br /&gt;  5      return updated rows&lt;br /&gt;  6      dimension by (0 total,city, dep)&lt;br /&gt;  7      measures(sales)&lt;br /&gt;  8       rules upsert all&lt;br /&gt;  9      (sales[0,city='Omsk','DEP5']=100,&lt;br /&gt; 10       sales[0,city='Voronezh','DEP5']=200)&lt;br /&gt; 11     order by 2,3&lt;br /&gt; 12  /&lt;br /&gt;&lt;br /&gt;     TOTAL CITY   DEP       SALES&lt;br /&gt;---------- ------ ---- ----------&lt;br /&gt;         0 Omsk   DEP5        100&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;By putting RETURN UPDATED ROWS we get as a result only updated or inserted cells.&lt;br /&gt;So as you can see despite we had a rule with Voronezh - no cells were inserted.&lt;br /&gt;&lt;br /&gt;Let's get back to our subtotals and total.&lt;br /&gt;We had two rules:&lt;br /&gt;&lt;blockquote&gt;sales[1,any,null]=sum(sales)[0,CV(),any],&lt;br /&gt;sales[1,null,null]=sum(sales)[0,any,any]&lt;/blockquote&gt;&lt;br /&gt;But as you remember [any] is treated as a symbolic reference - and won't create any cells if we don't put UPSERT ALL semantics.&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select city,dep,sales&lt;br /&gt;  2    from t&lt;br /&gt;  3   where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  4     model return updated rows&lt;br /&gt;  5      dimension by (0 total,city, dep)&lt;br /&gt;  6      measures(sales)&lt;br /&gt;  7      rules&lt;br /&gt;  8       (sales[1,any,null]=sum(sales)[0,CV(),any])&lt;br /&gt;  9        order by 1,2&lt;br /&gt; 10  /&lt;br /&gt;&lt;br /&gt;CITY   DEP       SALES&lt;br /&gt;------ ---- ----------&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;If we put UPSERT ALL semantics, our rules would do the following:&lt;br /&gt;&lt;br /&gt;First rule: sales[1,any,null] will look for all dimension values that are referenced symbolically - city dimension: Omsk and Moscow.&lt;br /&gt;Then create a cross product with positionally referenced dimensions: we'll get two cells - sales[1,'Omsk',null] and sales[1,'Moscow',null]&lt;br /&gt;And these cells would be inserted.&lt;br /&gt;BTW the dimension total=1 would mean that these are totals or subtotals.&lt;br /&gt;&lt;br /&gt;Next rule sales[1,null,null] - will create a company total cell.&lt;br /&gt;On the right side of the rules we have grouping formulas which will sum up all the sales values needed. By using CV() we reference to the dimension value of the cell on the left side of the rule.&lt;br /&gt;&lt;br /&gt;PS&lt;br /&gt;We could get the same result without Total dimension:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select city,dep,sales&lt;br /&gt;  2    from t&lt;br /&gt;  3   where trunc(year,'y') = to_date('01.01.2007','dd.mm.yyyy')&lt;br /&gt;  4     model&lt;br /&gt;  5      dimension by (city, dep)&lt;br /&gt;  6      measures(sales)&lt;br /&gt;  7      rules upsert all&lt;br /&gt;  8       (sales[any,null]=sum(sales)[CV(),any],&lt;br /&gt;  9        sales[null,null]=sum(sales)[any,null])&lt;br /&gt; 10        order by 1,2&lt;br /&gt; 11  /&lt;br /&gt;&lt;br /&gt;CITY   DEP       SALES&lt;br /&gt;------ ---- ----------&lt;br /&gt;Moscow DEP1        762&lt;br /&gt;Moscow DEP2        657&lt;br /&gt;Moscow DEP3       1020&lt;br /&gt;Moscow DEP4        487&lt;br /&gt;Moscow            2926&lt;br /&gt;Omsk   DEP1         34&lt;br /&gt;Omsk   DEP2        213&lt;br /&gt;Omsk   DEP3        156&lt;br /&gt;Omsk               403&lt;br /&gt;                  3329&lt;br /&gt;&lt;br /&gt;10 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;Here the rules are a little bit different.&lt;br /&gt;The company Total is calculated from the cities subtotals, that has already been calculated, while in the main part (with Total dimension) we calculated all totals and subtotals on detailed sales data.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://volder-notes.blogspot.com/2007/12/reports-ratios-percentages.html"&gt;to be continued ...&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-1739918413774437916?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/1739918413774437916/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=1739918413774437916' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/1739918413774437916'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/1739918413774437916'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2007/12/reports-totals-subtotals.html' title='Reports: totals &amp; subtotals'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-4750390607226563668</id><published>2007-12-06T13:40:00.000+03:00</published><updated>2007-12-19T01:21:28.632+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Hierarchy'/><title type='text'>Summation of tree values</title><content type='html'>&lt;strong&gt;Input data:&lt;/strong&gt;&lt;br /&gt;&lt;blockquote&gt;SQL&gt; drop table t;&lt;br /&gt;&lt;br /&gt;Table dropped&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;br /&gt;SQL&gt; create table t as (select 1 id,  20 val from dual union all&lt;br /&gt;  2                     select 2 id,  30 val from dual union all&lt;br /&gt;  3                     select 3 id,  15 val from dual union all&lt;br /&gt;  4                     select 4 id,  100 val from dual)&lt;br /&gt;  5  /&lt;br /&gt;&lt;br /&gt;Table created&lt;br /&gt;&lt;br /&gt;SQL&gt; commit;&lt;br /&gt;&lt;br /&gt;Commit complete&lt;br /&gt;&lt;br /&gt;SQL&gt; select * from t;&lt;br /&gt;&lt;br /&gt;        ID        VAL&lt;br /&gt;---------- ----------&lt;br /&gt;         1         20&lt;br /&gt;         2         30&lt;br /&gt;         3         15&lt;br /&gt;         4        100&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;strong&gt;Problem:&lt;/strong&gt;&lt;br /&gt;You want to build a hierarchy on that data and get the sum of the values of all preceding ancestors including current value.&lt;br /&gt;&lt;br /&gt;E.g. your hierarchy is:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select t1.*, level, sys_connect_by_path(id,'/') path&lt;br /&gt;  2    from t t1&lt;br /&gt;  3  connect by prior id = id - 1&lt;br /&gt;  4   start with id = 1&lt;br /&gt;  5  /&lt;br /&gt;&lt;br /&gt;        ID        VAL      LEVEL PATH&lt;br /&gt;---------- ---------- ---------- --------------&lt;br /&gt;         1         20          1 /1&lt;br /&gt;         2         30          2 /1/2&lt;br /&gt;         3         15          3 /1/2/3&lt;br /&gt;         4        100          4 /1/2/3/4&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;Solution here could probably be a scalar subquery, which will build tree in a backward direction starting from the current id and will compute sum of all the childs (ancestors in the outer query) values:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select t1.*,&lt;br /&gt;  2         level,&lt;br /&gt;  3         sys_connect_by_path(id,'/') path,&lt;br /&gt;  4         (select sum(val)&lt;br /&gt;  5            from t t2&lt;br /&gt;  6          connect by prior id = id + 1&lt;br /&gt;  7           start with t2.id = t1.id) summ&lt;br /&gt;  8    from t t1&lt;br /&gt;  9  connect by prior id = id - 1&lt;br /&gt; 10   start with id = 1&lt;br /&gt; 11  /&lt;br /&gt;&lt;br /&gt;        ID        VAL      LEVEL PATH                SUMM&lt;br /&gt;---------- ---------- ---------- ------------- ----------&lt;br /&gt;         1         20          1 /1                    20&lt;br /&gt;         2         30          2 /1/2                  50&lt;br /&gt;         3         15          3 /1/2/3                65&lt;br /&gt;         4        100          4 /1/2/3/4             165&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;So while we have 'prior id = id - 1' rule in the outer query we put 'prior id = id + 1' in a scalar subquery to move backward.&lt;br /&gt;&lt;br /&gt;But if the query was built in the following way:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select t1.*,&lt;br /&gt;  2         level,&lt;br /&gt;  3         sys_connect_by_path(id, '/') path&lt;br /&gt;  4    from t t1&lt;br /&gt;  5  connect by prior id &lt; id&lt;br /&gt;  6   start with id = 1&lt;br /&gt;  7  /&lt;br /&gt;&lt;br /&gt;        ID        VAL      LEVEL PATH&lt;br /&gt;---------- ---------- ---------- -----------------&lt;br /&gt;         1         20          1 /1&lt;br /&gt;         2         30          2 /1/2&lt;br /&gt;         3         15          3 /1/2/3&lt;br /&gt;         4        100          4 /1/2/3/4&lt;br /&gt;         4        100          3 /1/2/4&lt;br /&gt;         3         15          2 /1/3&lt;br /&gt;         4        100          3 /1/3/4&lt;br /&gt;         4        100          2 /1/4&lt;br /&gt;&lt;br /&gt;8 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;What should we put in the connect by part of a scalar subquery?&lt;br /&gt;If we reverse condition to 'prior id &gt; id' in the subquery - we get wrong result. And it is not needed to be clarified why.&lt;br /&gt;&lt;blockquote&gt;SQL&gt; select t1.*,&lt;br /&gt;  2         level,&lt;br /&gt;  3         sys_connect_by_path(id,'/') path,&lt;br /&gt;  4         (select sum(val)&lt;br /&gt;  5            from t t2&lt;br /&gt;  6          connect by prior id &gt; id&lt;br /&gt;  7           start with t2.id = t1.id) summ&lt;br /&gt;  8    from t t1&lt;br /&gt;  9  connect by prior id &lt; id&lt;br /&gt; 10   start with id = 1&lt;br /&gt; 11  /&lt;br /&gt;&lt;br /&gt;        ID        VAL      LEVEL PATH               SUMM&lt;br /&gt;---------- ---------- ---------- ------------ ----------&lt;br /&gt;         1         20          1 /1                   20&lt;br /&gt;         2         30          2 /1/2                 50&lt;br /&gt;         3         15          3 /1/2/3               85&lt;br /&gt;         4        100          4 /1/2/3/4            255&lt;br /&gt;         4        100          3 /1/2/4              255&lt;br /&gt;         3         15          2 /1/3                 85&lt;br /&gt;         4        100          3 /1/3/4              255&lt;br /&gt;         4        100          2 /1/4                255&lt;br /&gt;&lt;br /&gt;8 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;We need to get a backward motion in a particular order. And what can show us the correct order - right, &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions164.htm#sthref2194"&gt;SYS_CONNECT_BY_PATH&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The solution could be:&lt;br /&gt;&lt;blockquote&gt;SQL&gt;  select t1.*,&lt;br /&gt;  2          (select sum(val)&lt;br /&gt;  3             from t t2&lt;br /&gt;  4           connect by level &lt;= t1.lvl&lt;br /&gt;  5                  and id = regexp_substr(t1.path,'[^/]+',1,t1.lvl - level + 1)&lt;br /&gt;  6            start with t2.id = t1.id) summ&lt;br /&gt;  7     from (select t.*, level lvl, sys_connect_by_path(id, '/') path&lt;br /&gt;  8             from t&lt;br /&gt;  9           connect by prior id &lt; id&lt;br /&gt; 10            start with id = 1) t1&lt;br /&gt; 11  /&lt;br /&gt;&lt;br /&gt;        ID        VAL        LVL PATH               SUMM&lt;br /&gt;---------- ---------- ---------- ------------ ----------&lt;br /&gt;         1         20          1 /1                   20&lt;br /&gt;         2         30          2 /1/2                 50&lt;br /&gt;         3         15          3 /1/2/3               65&lt;br /&gt;         4        100          4 /1/2/3/4            165&lt;br /&gt;         4        100          3 /1/2/4              150&lt;br /&gt;         3         15          2 /1/3                 35&lt;br /&gt;         4        100          3 /1/3/4              135&lt;br /&gt;         4        100          2 /1/4                120&lt;br /&gt;&lt;br /&gt;8 rows selected&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;So at every level - we cut out and determine what the ID should be.&lt;br /&gt;&lt;br /&gt;BTW Put these conditions exactly in that order:&lt;br /&gt;&lt;blockquote&gt;...level &lt;= t1.lvl and id = regexp_substr(t1.path,'[^/]+',1,t1.lvl - level + 1)...&lt;/blockquote&gt;if you put it &lt;blockquote&gt;...id = regexp_substr(t1.path,'[^/]+',1,t1.lvl - level + 1)and level &lt;= t1.lvl...&lt;/blockquote&gt; you'll get:&lt;br /&gt;&lt;blockquote&gt;ORA-01428: argument '0' is out of range&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;PS Later I wrote another note on the same kinda a problem, which you can discover here: &lt;a href="http://volder-notes.blogspot.com/2007/12/reports-getting-total-of-all-children.html"&gt;Reports: getting total of all children values in a tree&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-4750390607226563668?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/4750390607226563668/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=4750390607226563668' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/4750390607226563668'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/4750390607226563668'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2007/12/summation-of-tree-values.html' title='Summation of tree values'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-7951552847884430663</id><published>2007-12-02T22:55:00.000+03:00</published><updated>2007-12-05T19:19:59.764+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Regular Expressions'/><title type='text'>Regexp 4: number of occurence (advanced)</title><content type='html'>&lt;a href="http://volder-notes.blogspot.com/2007/11/regexp-3-number-of-pattern-occurence.html"&gt;... previous&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In this post I'll talk about a technique - that I also learned from &lt;a href="http://forums.oracle.com/forums/profile.jspa?userID=57657"&gt;cd&lt;/a&gt;.&lt;br /&gt;In the previous post - we got familiar with a technique to find the number of occurrencies of a substring in a string.&lt;br /&gt;&lt;br /&gt;But there are some drawbacks, e.g.:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select '(111)111-11-11 words-non-stop(222)222-22-22' str from dual)&lt;br /&gt;  2   --&lt;br /&gt;  3   select str,&lt;br /&gt;  4          nvl(length(regexp_replace(str,&lt;br /&gt;  5                                    '\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}|.',&lt;br /&gt;  6                                    '\1')),&lt;br /&gt;  7              0) occurrencies&lt;br /&gt;  8     from t&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;STR                                         OCCURRENCIES&lt;br /&gt;------------------------------------------- ------------&lt;br /&gt;(111)111-11-11 words-non-stop(222)222-22-22            2&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;/blockquote&gt;&lt;br /&gt;As we can see here - we find both examples of phone numbers as fitting to our requirments. Actually, yes - there are two occurrencies of our pattern. But usually when it is combined with another alphanumeric value - we don't want to take it into account. So the result should be only 1 occurrence.&lt;br /&gt;&lt;br /&gt;Well, let's assign the task more concrete: our phone numbers should be preceded or trailed by one of the following: space, comma or semicolon. And it also can be the first or the last structure in a string.&lt;br /&gt;&lt;br /&gt;So we may think, that if we add such symbols before and after our pattern - it can help us:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select '(111)111-11-11 words-non-stop(222)222-22-22' str from dual)&lt;br /&gt;  2   --&lt;br /&gt;  3   select str,&lt;br /&gt;  4          nvl(length(regexp_replace(str,&lt;br /&gt;  5                                    '(^|[ ,;])\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}([ ,;]|$)|.',&lt;br /&gt;  6                                    '\2')),&lt;br /&gt;  7              0) occurrencies&lt;br /&gt;  8     from t&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;STR                                         OCCURRENCIES&lt;br /&gt;------------------------------------------- ------------&lt;br /&gt;(111)111-11-11 words-non-stop(222)222-22-22            1&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;/blockquote&gt;&lt;br /&gt;Yep, in that case it was helpful.&lt;br /&gt;&lt;br /&gt;But if we take an example, when two phone numbers are coming one after another - separated by one space - it is not working:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select '(111)111-11-11 (222)222-22-22' str from dual)&lt;br /&gt;  2   --&lt;br /&gt;  3   select str,&lt;br /&gt;  4          nvl(length(regexp_replace(str,&lt;br /&gt;  5                                    '(^|[ ,;])\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}([ ,;]|$)|.',&lt;br /&gt;  6                                    '\2')),&lt;br /&gt;  7              0) occurrencies&lt;br /&gt;  8     from t&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;STR                           OCCURRENCIES&lt;br /&gt;----------------------------- ------------&lt;br /&gt;(111)111-11-11 (222)222-22-22            1&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;/blockquote&gt;&lt;br /&gt;Why? because it works in the following way:&lt;br /&gt;1. Finds the first pattern '(111)111-11-11 '.&lt;br /&gt;As you see - there's a space at the end.&lt;br /&gt;2. Replaces it with backreference - which in this case would be '1'.&lt;br /&gt;3. Then goes to the rest of the string which is '(222)222-22-22' and it is not the beginning of the string.&lt;br /&gt;So such a structure doesn't suit to our pattern - because it requires - space, comma or colon at the beginning. And we got no space - cause it was taken by the previous structure '(111)111-11-11 '.&lt;br /&gt;4. The function replaces this structure out.&lt;br /&gt;5. We get 1 occurrence as a result, instead of 2.&lt;br /&gt;&lt;br /&gt;What we can do here:&lt;br /&gt;The technique is very interesting:&lt;br /&gt;Firstly we place all the patterns - which don't fulfil our requirments. We separate them with the pipe (|), which means OR on the language of regular expressions.&lt;br /&gt;Then we place our desired pattern. And at the end we place '|.' as usual :)&lt;br /&gt;&lt;br /&gt;So the generalised pattern would look in the following way:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;regexp_replace(your_string,'[symbol1](our_pattern)+|(our_pattern)[symbol2]+|(our_pattern)|.','\3')&lt;/blockquote&gt;&lt;br /&gt;where &lt;em&gt;symbol1 &lt;/em&gt;- symbol or list of symbols, that can't precede our pattern,&lt;br /&gt;&lt;em&gt;symbol2 &lt;/em&gt;- symbol or list of symbols, that can't trail our pattern.&lt;br /&gt;&lt;br /&gt;And when we determine the backreference - we would place a reference to the third our pattern.&lt;br /&gt;So when the regexp engine will meet the '[symbol1](our_pattern)+' or '(our_pattern)[symbol2]+', the regexp will remove such structures (actually, it will replace it with our backreference, which points to the 3rd structure - so it would be null). And when it will meet the needed pattern (third one) - it would leave it for us.&lt;br /&gt;The |. at the end - will remove everything - that doesn't suits the previous three structures.&lt;br /&gt;&lt;br /&gt;If we get back to our example, we get:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select '(111)111-11-11 (222)222-22-22' str from dual)&lt;br /&gt;  2   --&lt;br /&gt;  3   select str,&lt;br /&gt;  4          nvl(length(regexp_replace(str,&lt;br /&gt;  5                                    '[^ ,;](\(\d{3}\)\d{3}-\d{2}-\d{2})+|\(\d{3}\)\d{3}-\d{2}-\d{2}[^ ,;]+|\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}|.',&lt;br /&gt;  6                                    '\2')),&lt;br /&gt;  7              0) occurrencies&lt;br /&gt;  8     from t&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;STR                           OCCURRENCIES&lt;br /&gt;----------------------------- ------------&lt;br /&gt;(111)111-11-11 (222)222-22-22            2&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;/blockquote&gt;&lt;br /&gt;Ok. Let's investigate why we put additional '+' two times.&lt;br /&gt;The first one was when we put it in the first OR structure.&lt;br /&gt;Let's remove it and look at the following example:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select '(111)111-11-11 (222)222-22-22 3(333)333-33-33(444)444-44-44' str from dual)&lt;br /&gt;  2  --&lt;br /&gt;  3  select str,&lt;br /&gt;  4         nvl(length(regexp_replace(str,&lt;br /&gt;  5                                   '[^ ,;](\(\d{3}\)\d{3}-\d{2}-\d{2})|\(\d{3}\)\d{3}-\d{2}-\d{2}[^ ,;]+|\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}|.',&lt;br /&gt;  6                                   '\2')),&lt;br /&gt;  7             0) occurrencies_INCORRECT,&lt;br /&gt;  8         nvl(length(regexp_replace(str,&lt;br /&gt;  9                                   '[^ ,;](\(\d{3}\)\d{3}-\d{2}-\d{2})&lt;span style="color:#ff0000;"&gt;+&lt;/span&gt;|\(\d{3}\)\d{3}-\d{2}-\d{2}[^ ,;]+|\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}|.',&lt;br /&gt; 10                                   '\2')),&lt;br /&gt; 11             0) occurrencies_CORRECT&lt;br /&gt; 12    from t&lt;br /&gt; 13  /&lt;br /&gt;&lt;br /&gt;STR                                                         OCCURRENCIES_INCORRECT OCCURRENCIES_CORRECT&lt;br /&gt;----------------------------------------------------------- ---------------------- --------------------&lt;br /&gt;(111)111-11-11 (222)222-22-22 3(333)333-33-33(444)444-44-44                      3                    2&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;/blockquote&gt;&lt;br /&gt;Actually, when we removed '+' (occurrencies_incorrect field) we got 3 as a result insted of 2 - why?&lt;br /&gt;Well, the problem as how you guessed in the '3(333)333-33-33(444)444-44-44' part.&lt;br /&gt;If we don't put '+' in the '[^ ,;](\(\d{3}\)\d{3}-\d{2}-\d{2})' - it would meet 3(333)333-33-33 and stop. After this structure will be removed - the regexp is going forward.&lt;br /&gt;It finds (444)444-44-44.&lt;br /&gt;Well it doesn't suit to the '[^ ,;](\(\d{3}\)\d{3}-\d{2}-\d{2})' - because there's no symbol [^ ,;] before it (we removed 3(333)333-33-33).&lt;br /&gt;Then it checks whether it suits to '\(\d{3}\)\d{3}-\d{2}-\d{2}[^ ,;]+'. Again negative - cause there is no '[^ ,;]+' at the end. Finally it suits to our third structure '\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}' - and that's why it hits the final result set.&lt;br /&gt;If we put '+' after the first pattern - it would mean one or more occurrences of our pattern. So the whole 3(333)333-33-33(444)444-44-44 will fall under the first structure - and gets removed.&lt;br /&gt;&lt;br /&gt;Let's continue with the second '+'.&lt;br /&gt;Imagine the following situation:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select '(111)111-11-11 (222)222-22-22 (333)333-33-33A(444)444-44-44' str from dual)&lt;br /&gt;  2  --&lt;br /&gt;  3  select str,&lt;br /&gt;  4         nvl(length(regexp_replace(str,&lt;br /&gt;  5                                   '[^ ,;](\(\d{3}\)\d{3}-\d{2}-\d{2})+|\(\d{3}\)\d{3}-\d{2}-\d{2}[^ ,;]|\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}|.',&lt;br /&gt;  6                                   '\2')),&lt;br /&gt;  7             0) occurrencies_INCORRECT,&lt;br /&gt;  8         nvl(length(regexp_replace(str,&lt;br /&gt;  9                                   '[^ ,;](\(\d{3}\)\d{3}-\d{2}-\d{2})+|\(\d{3}\)\d{3}-\d{2}-\d{2}[^ ,;]&lt;span style="color:#ff0000;"&gt;+&lt;/span&gt;|\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}|.',&lt;br /&gt; 10                                   '\2')),&lt;br /&gt; 11             0) occurrencies_CORRECT&lt;br /&gt; 12    from t&lt;br /&gt; 13  /&lt;br /&gt;&lt;br /&gt;STR                                                         OCCURRENCIES_INCORRECT OCCURRENCIES_CORRECT&lt;br /&gt;----------------------------------------------------------- ---------------------- --------------------&lt;br /&gt;(111)111-11-11 (222)222-22-22 (333)333-33-33A(444)444-44-44                      3                    2&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;/blockquote&gt;&lt;br /&gt;So the problem again in the last two phone numbers. Now we put an extra 'A' between them. What causes the problem in that case:&lt;br /&gt;If we don't place '+' after the '[^ ,;]' in the second structure,&lt;br /&gt;the pattern '\(\d{3}\)\d{3}-\d{2}-\d{2}[^ ,;]' will meet (333)333-33-33A, with extra 'A' at the end - and removes it all.&lt;br /&gt;So when it continues with the rest of the line it recognizes (444)444-44-44 as a valid phone number.&lt;br /&gt;But when we add a '+' after '[^ ,;]' the pattern '\(\d{3}\)\d{3}-\d{2}-\d{2}[^ ,;]+' will cover the whole (333)333-33-33A(444)444-44-44, and remove it.&lt;br /&gt;&lt;br /&gt;PS&lt;br /&gt;Our mask in the regexp can be simplified a little.&lt;br /&gt;In general it would look like:&lt;br /&gt;&lt;blockquote&gt;regexp_replace(your_string,'[symbol1](our_pattern)+|(our_pattern)[symbol3]|.','\2')&lt;/blockquote&gt;&lt;br /&gt;Here everything means the same as in the previous examples - except &lt;em&gt;symbol3&lt;/em&gt;. Now it means any symbol that &lt;span style="font-weight:bold;"&gt;can&lt;/span&gt; trail our pattern.&lt;br /&gt;And in that case we should place the second structure as a backreference (instead of third in the previous examples).&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select '(111)111-11-11 (222)222-22-22 3(333)333-33-33 (444)444-44-44' str from dual)&lt;br /&gt;  2    --&lt;br /&gt;  3    select str,&lt;br /&gt;  4           nvl(length(regexp_replace(str,&lt;br /&gt;  5                                     '[^ ,;](\(\d{3}\)\d{3}-\d{2}-\d{2})+|\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}([ ,;]|$)|.',&lt;br /&gt;  6                                     '\2')),&lt;br /&gt;  7               0) occurrencies&lt;br /&gt;  8      from t&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;STR                                                          OCCURRENCIES&lt;br /&gt;------------------------------------------------------------ ------------&lt;br /&gt;(111)111-11-11 (222)222-22-22 3(333)333-33-33 (444)444-44-44            3&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Hope you can find out how it works by yourself now :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-7951552847884430663?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/7951552847884430663/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=7951552847884430663' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/7951552847884430663'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/7951552847884430663'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2007/12/regexp-4-number-of-occurence-advanced.html' title='Regexp 4: number of occurence (advanced)'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-223141976587740166</id><published>2007-11-29T15:56:00.000+03:00</published><updated>2007-12-05T19:20:18.860+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Regular Expressions'/><title type='text'>Regexp 3: number of pattern occurence</title><content type='html'>&lt;a href="http://volder-notes.blogspot.com/2007/11/regexp-getting-all-substrings-with.html"&gt;... previous&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In that post I'll talk about a problem, which is very popular, according to the number of posts in forums with similar questions.&lt;br /&gt;&lt;br /&gt;Usually it is formulated as: "How to count the number of occurences of a substring with special pattern inside the string".&lt;br /&gt;Actually, everything we were talking about in two previous posts - can be very useful in solving such a task.&lt;br /&gt;&lt;br /&gt;Let our input data be the same as in the previous post:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33' str from dual)&lt;br /&gt;  2  --&lt;br /&gt;  3  select t.* from t&lt;br /&gt;  4  /&lt;br /&gt;&lt;br /&gt;STR&lt;br /&gt;-------------------------------------------------------------------&lt;br /&gt;three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;We have a string with phone numbers in it.&lt;br /&gt;We want to find how many numbers in the format (XXX)XXX-XX-XX are there.&lt;br /&gt;&lt;br /&gt;So what we can achieve at the moment? We can pull out all the substrings with that particular pattern in one column:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33' str from dual)&lt;br /&gt;  2  --&lt;br /&gt;  3  select t.*,&lt;br /&gt;  4         regexp_replace(str, '\(\d{3}\)(\d{3}-\d{2}-\d{2}( |$))|.', '\1') phone_numbers&lt;br /&gt;  5    from t&lt;br /&gt;  6  /&lt;br /&gt;&lt;br /&gt;STR                                                                 PHONE_NUMBERS&lt;br /&gt;------------------------------------------------------------------- -------------------------&lt;br /&gt;three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33 111-11-11 333-33-33&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;What would be the technique to find the number of substrings? The answer would be "very easy".&lt;br /&gt;First instead of finding the substring according to the mask - we'll find only the first symbols of such substrings.&lt;br /&gt;Then using the function &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions076.htm#sthref1537"&gt;LENGTH&lt;/a&gt; - we'll find the length of such a string - and it would be equal to the number of occurences - cause every symbol - would mean one occurenece.&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33' str from dual)&lt;br /&gt;  2  --&lt;br /&gt;  3  select regexp_replace(str, '\(\d{3}\)(\d{3}-\d{2}-\d{2}( |$))|.', '\1') phone_numbers,&lt;br /&gt;  4         regexp_replace(str, '\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}|.', '\1') first_letters,&lt;br /&gt;  5         nvl(length(regexp_replace(str, '\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}|.', '\1')), 0) occurrencies&lt;br /&gt;  6    from t&lt;br /&gt;  7  /&lt;br /&gt;&lt;br /&gt;PHONE_NUMBERS           FIRST_LETTERS OCCURRENCIES&lt;br /&gt;----------------------- ------------- ------------&lt;br /&gt;111-11-11 333-33-33     13                       2&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;What we have changed:&lt;br /&gt;instead of \d{3} standing for the first three digits of our phone numbers, we've written \d\d{2}. That is also three digits - but with the first one separated.&lt;br /&gt;Next step would be to place not the whole pattern in parenethes, but only the first digit: (\d)\d{2}&lt;br /&gt;That's why when we get the whole pattern replaced with the '\1' backreference - we get only the first digits: one from each occurenece of the pattern.&lt;br /&gt;After that we put LENGTH function and for the case when there no occurences - we put NVL(...,0).&lt;br /&gt;In our string we have 2 occurences of a pattern like XXX-XX-XX with preceding (XXX).&lt;br /&gt;And that's right.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://volder-notes.blogspot.com/2007/12/regexp-4-number-of-occurence-advanced.html"&gt;to be continued...&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-223141976587740166?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/223141976587740166/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=223141976587740166' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/223141976587740166'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/223141976587740166'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2007/11/regexp-3-number-of-pattern-occurence.html' title='Regexp 3: number of pattern occurence'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-351963209313148869</id><published>2007-11-28T18:12:00.000+03:00</published><updated>2007-12-05T19:20:35.488+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Regular Expressions'/><title type='text'>Regexp 2: getting all the substrings with the particular preceding or trailing patterns</title><content type='html'>&lt;a href="http://volder-notes.blogspot.com/2007/11/regexp-getting-all-occurences-of-mask.html"&gt;... previous&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Well, let's continue.&lt;br /&gt;&lt;br /&gt;In the previous post we learnt how to retrieve all the substrings of special pattern.&lt;br /&gt;In this post we'll improve our skills.&lt;br /&gt;&lt;br /&gt;Imagine that you have such data (again phone numbers):&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33' str from dual)&lt;br /&gt;  2             --&lt;br /&gt;  3             select * from t&lt;br /&gt;  4  /&lt;br /&gt;&lt;br /&gt;STR&lt;br /&gt;-------------------------------------------------------------------&lt;br /&gt;three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;and you want to retrieve phone numbers of the following format: XXX-XX-XX, where X stands for any digit.&lt;br /&gt;But you want only those numbers which are preceded by (XXX) pattern, where X - again any number.&lt;br /&gt;So the number 222-22-22, despite the fact that it has XXX-XX-XX mask - doesn't meet our requirments. Because it has [222] standing before, but not (222).&lt;br /&gt;&lt;br /&gt;If we used regexp_susbtr - we would first retrieve phone number in (XXX)XXX-XX-XX format and then cut the (XXX) pattern possibly with another regexp_substr or regexp_replace, e.g.:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33' str from dual)&lt;br /&gt;  2             --&lt;br /&gt;  3             select t.*,&lt;br /&gt;  4                    regexp_substr(str, '\(\d{3}\)\d{3}-\d{2}-\d{2}') step1,&lt;br /&gt;  5                    regexp_replace(regexp_substr(str,&lt;br /&gt;  6                                                 '\(\d{3}\)\d{3}-\d{2}-\d{2}'),&lt;br /&gt;  7                                   '\(\d{3}\)') step2&lt;br /&gt;  8               from t&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;STR                                                                 STEP1                 STEP2&lt;br /&gt;------------------------------------------------------------------- --------------------- ---------------&lt;br /&gt;three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33 (111)111-11-11        111-11-11&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;And again regexp_substr is only for one occurenece at a time.&lt;br /&gt;&lt;br /&gt;Using the trick - we could use the following construction:&lt;br /&gt;&lt;blockquote&gt;regexp_replace(your_string,'preceding_pattern(substring_mask)trailing_pattern|.','\1')&lt;/blockquote&gt;&lt;br /&gt;this will retrieve all the substrings according to the mask 'substring_mask', but with the condition that it is preceded and trailed by particular patterns.&lt;br /&gt;&lt;br /&gt;So in our situation we could use:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33' str from dual)&lt;br /&gt;  2             --&lt;br /&gt;  3             select t.*,&lt;br /&gt;  4                    regexp_replace(str, '\(\d{3}\)(\d{3}-\d{2}-\d{2}( |$))|.','\1') phone_numbers&lt;br /&gt;  5               from t&lt;br /&gt;  6  /&lt;br /&gt;&lt;br /&gt;STR                                                                 PHONE_NUMBERS&lt;br /&gt;------------------------------------------------------------------- ------------------------&lt;br /&gt;three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33 111-11-11 333-33-33&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;That's it.&lt;br /&gt;As you can see - there's no 222-22-22 phone number in the result set.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://volder-notes.blogspot.com/2007/11/regexp-3-number-of-pattern-occurence.html"&gt;to be continued...&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-351963209313148869?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/351963209313148869/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=351963209313148869' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/351963209313148869'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/351963209313148869'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2007/11/regexp-getting-all-substrings-with.html' title='Regexp 2: getting all the substrings with the particular preceding or trailing patterns'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-3115917300483538743</id><published>2007-11-28T10:43:00.000+03:00</published><updated>2007-12-05T09:08:17.993+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Regular Expressions'/><title type='text'>Regexp 1: getting all the needed substrings</title><content type='html'>I decided to write some notes on regular expressions. And especially about one trick, that sometimes can be very useful.&lt;br /&gt;It's not my invention. First time I found it on the following &lt;a href="http://forums.oracle.com/forums/thread.jspa?threadID=430647"&gt;OTN thread&lt;/a&gt; written by &lt;a href="http://forums.oracle.com/forums/profile.jspa?userID=57657"&gt;cd&lt;/a&gt;. It's a pity I don't know his full name. All I know is that he's from Austria and that he's an expert on regular expressions :)&lt;br /&gt;&lt;br /&gt;The trick is very simple: if you want to get all the occurences of substring of the particular mask from a string, just put:&lt;br /&gt;&lt;blockquote&gt;regexp_replace(your_string,'(regexp_mask)|.','\1')&lt;/blockquote&gt;&lt;br /&gt;Those small pipe and dot (|.) at the end are making a huge deal.&lt;br /&gt;&lt;br /&gt;Let's get to the example. Imagine you have a table and one field is of varchar type, where there's information stored about phone numbers. But it is not pure phone numbers, there's some additional text around them, e.g.:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'That''s phone number 1: (916)809-23-34 and that''s the second one: (918)234-12-09' str from dual union all&lt;br /&gt;  2             select 'Call me later on (926)507-15-34 or on (913)432-23-21'                                  from dual union all&lt;br /&gt;  3             select 'two numbers: (917)888-34-34 and (903)234-43-11'                                        from dual)&lt;br /&gt;  4             --&lt;br /&gt;  5             select * from t&lt;br /&gt;  6  /&lt;br /&gt;&lt;br /&gt;STR&lt;br /&gt;-------------------------------------------------------------------------------&lt;br /&gt;That's phone number 1: (916)809-23-34 and that's the second one: (918)234-12-09&lt;br /&gt;Call me later on (926)507-15-34 or on (913)432-23-21&lt;br /&gt;two numbers: (917)888-34-34 and (903)234-43-11&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;What you want is to drag out only phone numbers that have a mask (XXX)XXX-XX-XX, where X stands for any digit.&lt;br /&gt;&lt;br /&gt;What could you do if you didn't know about the trick I mentioned?&lt;br /&gt;you could use regexp_susbtr:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'That''s phone number 1: (916)809-23-34 and that''s the second one: [234]123-43-23 and the third one: (918)234-12-09' str from dual union all&lt;br /&gt;  2             select 'Call me later on (926)507-15-34 or on (913)432-23-21 or on [234]123-43-23'             from dual union all&lt;br /&gt;  3             select 'two numbers: (917)888-34-34 and (903)234-43-11'                                        from dual)&lt;br /&gt;  4             --&lt;br /&gt;  5            select t.*,&lt;br /&gt;  6                   regexp_substr(str, '\(\d{3}\)\d{3}(-\d{2}){2}') phone_num1,&lt;br /&gt;  7                   regexp_substr(str, '\(\d{3}\)\d{3}(-\d{2}){2}', 1, 2) phone_num2&lt;br /&gt;  8              from t&lt;br /&gt;  9  /&lt;br /&gt;&lt;br /&gt;STR                                                                              PHONE_NUM1        PHONE_NUM2&lt;br /&gt;-------------------------------------------------------------------------------- ----------------- ------------------&lt;br /&gt;That's phone number 1: (916)809-23-34 and that's the second one: [234]123-43-23  (916)809-23-34    (918)234-12-09&lt;br /&gt;Call me later on (926)507-15-34 or on (913)432-23-21 or on [234]123-43-23        (926)507-15-34    (913)432-23-21&lt;br /&gt;two numbers: (917)888-34-34 and (903)234-43-11                                   (917)888-34-34    (903)234-43-11&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;But as you can see - you should write a separate regexp_substr column for each occurence of substring, because regexp_substr can pull out only one occurence at a time. And sometimes you don't know beforehand how many occurences of substring are there.&lt;br /&gt;&lt;br /&gt;BTW I used \d for digit - if you're on version less than 10.2.x.x you can use [:digit:] or [0-9] instead.&lt;br /&gt;&lt;br /&gt;And now let's see what we can do using a tricky regexp_replace:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'That''s phone number 1: (916)809-23-34 and that''s the second one: (918)234-12-09' str from dual union all&lt;br /&gt;  2             select 'Call me later on (926)507-15-34 or on (913)432-23-21'                                  from dual union all&lt;br /&gt;  3             select 'two numbers: (917)888-34-34 and (903)234-43-11'                                        from dual)&lt;br /&gt;  4  --&lt;br /&gt;  5             select t.*,&lt;br /&gt;  6                    regexp_replace(str,&lt;br /&gt;  7                                   '(\(\d{3}\)\d{3}(-\d{2}){2}( |$))|.',&lt;br /&gt;  8                                   '\1') full_phone_numbers&lt;br /&gt;  9               from t&lt;br /&gt; 10  /&lt;br /&gt;&lt;br /&gt;STR                                                                             FULL_PHONE_NUMBERS&lt;br /&gt;------------------------------------------------------------------------------- --------------------------------------&lt;br /&gt;That's phone number 1: (916)809-23-34 and that's the second one: (918)234-12-09 (916)809-23-34 (918)234-12-09&lt;br /&gt;Call me later on (926)507-15-34 or on (913)432-23-21                            (926)507-15-34 (913)432-23-21&lt;br /&gt;two numbers: (917)888-34-34 and (903)234-43-11                                  (917)888-34-34 (903)234-43-11&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;You see - everything that doesn't correspond to the mask is removed, and all occurences of the needed phone numbers are placed in one column.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://volder-notes.blogspot.com/2007/11/regexp-getting-all-substrings-with.html"&gt;to be continued...&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-3115917300483538743?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/3115917300483538743/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=3115917300483538743' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/3115917300483538743'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/3115917300483538743'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2007/11/regexp-getting-all-occurences-of-mask.html' title='Regexp 1: getting all the needed substrings'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-1018555846027269013</id><published>2007-10-25T14:13:00.000+04:00</published><updated>2007-11-28T16:32:40.842+03:00</updated><title type='text'>Using collections with 10g features</title><content type='html'>After a small lull I decided to post some thoughts on collections. Especially those features that were introduced starting from 10g Oracle version.&lt;br /&gt;&lt;br /&gt;Actually, the legs are growing from &lt;a href="http://forums.oracle.com/forums/thread.jspa?threadID=578383&amp;amp;tstart=0"&gt;that thread&lt;/a&gt; on OTN forum. A lot of solutions were given, and for sure I would use one of them on the OP's place, but I just decided to show how It can be done using collections.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Problem description:&lt;/strong&gt;&lt;br /&gt;&lt;strong&gt;&lt;/strong&gt;&lt;br /&gt;&lt;strong&gt;&lt;em&gt;Input data (from the forum):&lt;/em&gt;&lt;/strong&gt;&lt;br /&gt;"ok I have 4 columns returned in my query which can either have a value of 'FF' or null".&lt;br /&gt;&lt;br /&gt;So we have a table:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;SQL&gt; create table t as &lt;br /&gt;  2            (select 'FF' as col1, null as col2, null as col3, null as col4 from dual union all&lt;br /&gt;  3             select 'FF', 'FF', 'FF', null from dual union all&lt;br /&gt;  4             select 'FF', 'FF', null, 'FF' from dual union all&lt;br /&gt;  5             select 'FF', null, 'FF', null from dual)&lt;br /&gt;  6  /&lt;br /&gt;&lt;br /&gt;Table created.&lt;br /&gt;&lt;br /&gt;SQL&gt; select * from t;&lt;br /&gt;&lt;br /&gt;COL1 COL2 COL3 COL4&lt;br /&gt;---- ---- ---- ----&lt;br /&gt;FF             &lt;br /&gt;FF   FF   FF   &lt;br /&gt;FF   FF        FF&lt;br /&gt;FF        FF   &lt;br /&gt;SQL&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;strong&gt;&lt;em&gt;Goal&lt;/em&gt;&lt;/strong&gt;:&lt;br /&gt;"I want to create a fifth column with a calculation which basically states..." how many fields have 'FF' value in each row. Or it can be formulated as: "how many columns have non-null values" (it's not the same, but when the possible values domain consists of only 'FF' or NULL - it is quite similar).&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;&lt;em&gt;Expected output:&lt;/em&gt;&lt;/strong&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;COL1       COL2       COL3       COL4             COL5&lt;br /&gt;---------- ---------- ---------- ---------- ----------&lt;br /&gt;FF         NULL       NULL       NULL                1&lt;br /&gt;FF         FF         FF         NULL                3&lt;br /&gt;FF         FF         NULL       FF                  3&lt;br /&gt;FF         NULL       FF         NULL                2&lt;br /&gt; &lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;strong&gt;&lt;em&gt;Solution:&lt;/em&gt;&lt;/strong&gt;&lt;br /&gt;First of all let's create a collection consisting of varchar elements:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;SQL&gt; create type test as table of varchar2(100);&lt;br /&gt;  2  /&lt;br /&gt;&lt;br /&gt;Type created.&lt;br /&gt;SQL&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Then we execute a query which would return us a collection in the fifth field, where all the values from the previous four columns would be stored:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;SQL&gt; column col1 format A10;&lt;br /&gt;SQL&gt; column col2 format A10;&lt;br /&gt;SQL&gt; column col3 format A10;&lt;br /&gt;SQL&gt; column col4 format A10;&lt;br /&gt;SQL&gt; column coll format A30;&lt;br /&gt;SQL&gt; set null 'NULL';&lt;br /&gt;&lt;br /&gt;SQL&gt; select col1, col2, col3, col4,&lt;br /&gt;  2         cast(multiset(&lt;br /&gt;  3               select decode(level, 1, col1, 2, col2, 3, col3, 4, col4) from dual&lt;br /&gt;  4              connect by level &lt;= 4) as test) coll&lt;br /&gt;  5    from t&lt;br /&gt;  6  /&lt;br /&gt;&lt;br /&gt;COL1       COL2       COL3       COL4       COLL&lt;br /&gt;---------- ---------- ---------- ---------- ------------------------------&lt;br /&gt;FF         NULL       NULL       NULL       TEST('FF', NULL, NULL, NULL)&lt;br /&gt;FF         FF         FF         NULL       TEST('FF', 'FF', 'FF', NULL)&lt;br /&gt;FF         FF         NULL       FF         TEST('FF', 'FF', NULL, 'FF')&lt;br /&gt;FF         NULL       FF         NULL       TEST('FF', NULL, 'FF', NULL)&lt;br /&gt;SQL&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Now we look for a new operator &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/operators006.htm#sthref896"&gt;MULTISET INTERSECT&lt;/a&gt;.&lt;br /&gt;So the main idea of our solution is to get rid of needless elements in the initial collection and then count how many elements are left there.&lt;br /&gt;&lt;br /&gt;So if we intersect our collection with a collection storing only 'FF' elements - we'll get rid of redundant NULL elements.&lt;br /&gt;&lt;br /&gt;Let's do it:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;SQL&gt; select col1,col2,col3,col4,&lt;br /&gt;  2         cast(multiset(&lt;br /&gt;  3              select decode(level,1,col1,2,col2,3,col3,4,col4) from dual &lt;br /&gt;  4             connect by level&lt;=4) as test)&lt;br /&gt;  5         multiset intersect all&lt;br /&gt;  6         cast(multiset (select 'FF' from dual  connect by level &lt;= 4) as test) coll from t&lt;br /&gt;  7  /&lt;br /&gt;&lt;br /&gt;COL1       COL2       COL3       COL4       COLL&lt;br /&gt;---------- ---------- ---------- ---------- ------------------------------&lt;br /&gt;FF         NULL       NULL       NULL       TEST('FF')&lt;br /&gt;FF         FF         FF         NULL       TEST('FF', 'FF', 'FF')&lt;br /&gt;FF         FF         NULL       FF         TEST('FF', 'FF', 'FF')&lt;br /&gt;FF         NULL       FF         NULL       TEST('FF', 'FF')&lt;br /&gt;SQL&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;I use ALL key word explicitly (although it is default value) to distinguish that we don't want to loose values wich are duplicated ('FF' in our case).&lt;br /&gt;&lt;br /&gt;Another important point here is making the second collection consist of not less elements than the first does (4 elements in our case - I use simple &lt;em&gt;connect by level&lt;=4&lt;/em&gt;), because if you leave only one 'FF' element in the second collection - you'll get also only one element in the resulting collection:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;SQL&gt; select col1, col2, col3, col4,&lt;br /&gt;  2           cast(multiset(select decode(level,1,col1,2,col2,3,col3,4,col4) from dual &lt;br /&gt;  3                connect by level&lt;=4) as test)&lt;br /&gt;  4           multiset intersect all&lt;br /&gt;  5           cast(multiset (select 'FF' from dual) as test) coll from t&lt;br /&gt;  6  /&lt;br /&gt;&lt;br /&gt;COL1       COL2       COL3       COL4       COLL&lt;br /&gt;---------- ---------- ---------- ---------- ------------------------------&lt;br /&gt;FF         NULL       NULL       NULL       TEST('FF')&lt;br /&gt;FF         FF         FF         NULL       TEST('FF')&lt;br /&gt;FF         FF         NULL       FF         TEST('FF')&lt;br /&gt;FF         NULL       FF         NULL       TEST('FF')&lt;br /&gt;SQL&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;It is described in the documentation:&lt;br /&gt;&lt;em&gt;For example, if a particular value occurs &lt;strong&gt;m&lt;/strong&gt; times in nested_table1 and &lt;strong&gt;n&lt;/strong&gt; times in nested_table2, then the result would contain the element &lt;strong&gt;min(m,n)&lt;/strong&gt; times.&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Ok, now the last our step - is counting how many elements are there in the resulting collection. It can easily be done with a &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions015.htm#sthref1079"&gt;CARDINALITY&lt;/a&gt; function.&lt;br /&gt;&lt;br /&gt;So the final query would be:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;SQL&gt; select col1,col2,col3,col4,&lt;br /&gt;  2         cardinality(cast(multiset(&lt;br /&gt;  3              select decode(level,1,col1,2,col2,3,col3,4,col4) from dual &lt;br /&gt;  4             connect by level&lt;=4) as test)&lt;br /&gt;  5         multiset intersect all&lt;br /&gt;  6         cast(multiset (select 'FF' from dual  connect by level &lt;= 4) as test)) coll from t&lt;br /&gt;  7  /&lt;br /&gt;&lt;br /&gt;COL1       COL2       COL3       COL4             COLL&lt;br /&gt;---------- ---------- ---------- ---------- ----------&lt;br /&gt;FF         NULL       NULL       NULL                1&lt;br /&gt;FF         FF         FF         NULL                3&lt;br /&gt;FF         FF         NULL       FF                  3&lt;br /&gt;FF         NULL       FF         NULL                2&lt;br /&gt;SQL&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Alternative solution would be using as a second collection - a collection with null elements, then intersect them both, and counting the column value as &lt;em&gt;4  -  number_of_null_column&lt;/em&gt;s, where 4 is total number of fields in our row.&lt;br /&gt;&lt;br /&gt;But we can use here another new multiset operator &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/operators006.htm#sthref891"&gt;MULTISET EXCEPT&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;SQL&gt; select col1, col2, col3, col4,&lt;br /&gt;  2         cardinality(cast(multiset(select decode(level, 1, col1, 2, col2, 3, col3, 4, col4) from dual&lt;br /&gt;  3              connect by level &lt;= 4) as test)&lt;br /&gt;  4         multiset except all&lt;br /&gt;  5         cast(multiset(select null from dual connect by level&lt;=4) as test)) coll from t&lt;br /&gt;  6  /&lt;br /&gt;&lt;br /&gt;COL1       COL2       COL3       COL4             COLL&lt;br /&gt;---------- ---------- ---------- ---------- ----------&lt;br /&gt;FF         NULL       NULL       NULL                1&lt;br /&gt;FF         FF         FF         NULL                3&lt;br /&gt;FF         FF         NULL       FF                  3&lt;br /&gt;FF         NULL       FF         NULL                2&lt;br /&gt;SQL&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;It is returning all elements from the first collection which are not met in the second one.&lt;br /&gt;&lt;br /&gt;PS&lt;br /&gt;strange for me was the following problem:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;SQL&gt; select col1, col2, col3, col4,&lt;br /&gt;  2         cardinality(cast(multiset(select col1 from dual union all&lt;br /&gt;  3                                   select col2 from dual union all&lt;br /&gt;  4                                   select col3 from dual union all&lt;br /&gt;  5                                   select col4 from dual) as test)&lt;br /&gt;  6         multiset intersect all&lt;br /&gt;  7         cast(multiset (select 'FF' from dual connect by level &lt;= 4) as test)) coll from t&lt;br /&gt;  8  /&lt;br /&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;ORA-03001: unimplemented feature&lt;br /&gt;SQL&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;So we can't pass a query with UNION ALL to the &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions016.htm#sthref1088"&gt;CAST(MULTISET(...) as ...)&lt;/a&gt; function. But I didn't find such a kind of limitation in the doc.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-1018555846027269013?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/1018555846027269013/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=1018555846027269013' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/1018555846027269013'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/1018555846027269013'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2007/10/using-collections-with-10g-features.html' title='Using collections with 10g features'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-8358568076302902308</id><published>2007-10-12T16:07:00.000+04:00</published><updated>2007-10-12T16:27:27.307+04:00</updated><title type='text'>Learning XHTML &amp; CSS</title><content type='html'>For everyone who'd like to get familiar with XHTML &amp;amp; CSS stuff I highly recommend the book &lt;br/&gt;&lt;a href="http://www.headfirstlabs.com/books/hfhtml/"&gt;&lt;img src="http://www.headfirstlabs.com/Images/hfhtml_cover.gif" alt="Head First HTML with CSS &amp;amp; XHTML"/&gt;&lt;br/&gt;&lt;strong&gt;Head First HTML with CSS &amp;amp; XHTML.&lt;/strong&gt;&lt;/a&gt;&lt;br /&gt;A week ago I didn't even know what CSS is, but today I've changed the appearence of my blog :))&lt;br /&gt;The one thing was irritating that the width of my blog was fixed, and when it was read from the monitors with high resolution it looked like a narrow column, but now the width of the post bodies is flexible depending on the width of the browser.&lt;br /&gt;I like it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-8358568076302902308?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/8358568076302902308/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=8358568076302902308' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/8358568076302902308'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/8358568076302902308'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2007/10/learning-xhtml-css.html' title='Learning XHTML &amp; CSS'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-3060148920065298017.post-7229087560914242182</id><published>2007-10-03T17:12:00.000+04:00</published><updated>2007-12-01T01:59:07.660+03:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Regular Expressions'/><title type='text'>Removing duplicate elements from the string</title><content type='html'>Hi there! :)&lt;br /&gt;Well, this is actually my first post in this newly created blog.&lt;br /&gt;The idea of keeping a blog was born a day ago, when there was a question on the Oracle forum (&lt;a href="http://forums.oracle.com/forums/message.jspa?messageID=2112727#2112727"&gt;this thread&lt;/a&gt;). I’ve met the &lt;a href="http://forums.oracle.com/forums/thread.jspa?threadID=567585"&gt;same one&lt;/a&gt; only a few days ago – but couldn’t find it, cause there were some problems with the search engine I usually face when I need to find something :))&lt;br /&gt;So I decided – why don’t I keep my own blog – where I can post such solutions that can be interesting for others. And here we are!&lt;br /&gt;&lt;br /&gt;Ok, let's go back to the problem, mentioned in those links.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Problem description:&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;&lt;em style="FONT-WEIGHT: bold"&gt;Input data:&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;There is a string containing elements. They can be separated with any symbol, e.g. a comma:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'elem1, elem2, elem3, elem1, elem3, elem2, elem2' str&lt;br /&gt;                 from dual)&lt;br /&gt;          --&lt;br /&gt;         select * from t&lt;br /&gt;    /&lt;br /&gt;&lt;br /&gt;STR&lt;br /&gt;-----------------------------------------------&lt;br /&gt;elem1, elem2, elem3, elem1, elem3, elem2, elem2&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;...or any other clear way, for example, it was formulated as "3 letter codes". So in this case there are no element delimiters, but we know that each element is of three same consequtive letters, e.g.:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'AAABBBCCCBBBDDDAAAEEEBBB' str from dual)&lt;br /&gt;            --&lt;br /&gt;          select * from t&lt;br /&gt;    /&lt;br /&gt;&lt;br /&gt;STR&lt;br /&gt;------------------------&lt;br /&gt;AAABBBCCCBBBDDDAAAEEEBBB&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;span style="FONT-WEIGHT: bold; FONT-STYLE: italic"&gt;Goal:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The objective is to remove duplicate elements out of the string, to leave only one specimen of each element.&lt;br /&gt;&lt;br /&gt;&lt;em style="FONT-WEIGHT: bold"&gt;Expected output:&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;So in the first case the result should be:&lt;br /&gt;elem1, elem2, elem3&lt;br /&gt;&lt;br /&gt;In the second:&lt;br /&gt;AAABBBCCCDDDEEE&lt;br /&gt;&lt;br /&gt;&lt;em style="FONT-WEIGHT: bold"&gt;Solution:&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;In both cases the first thing what we need - is to pick out elements from the string.&lt;br /&gt;Regular expressions are very friendly for us in that job (espesially, when there are no delimiters).&lt;br /&gt;So elements like AAA, BBB, CCC etc. can be written as '([[:alpha:]])\1{2}' - any letter trailed by two same letters.&lt;br /&gt;&lt;br /&gt;Now we want to understand - what would be the mask for such an element followed by another elements and again the same first element, e.g. AAABBBCCCAAA, or it can be followed directly by the same element: AAAAAA.&lt;br /&gt;In language of regular expressions it would look like '(([[:alpha:]])\2{2}).*\1'&lt;br /&gt;&lt;br /&gt;That's it!&lt;br /&gt;Now if we put regexp_replace(str, '(([[:alpha:]])\2{2})(.*)\1','\1\3') we would throw out one last element which is met firstly and has duplicate values in the string. It would be last, because we used greedy operator '*'.&lt;br /&gt;&lt;br /&gt;So for example:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; with t as (select 'AAABBBCCCBBBEEEDDDAAAEEEBBBEEE' str from dual)&lt;br /&gt;    --&lt;br /&gt;       select str,&lt;br /&gt;              regexp_replace(str, '(([[:alpha:]])\2{2})(.*)\1','\1\3') new_str&lt;br /&gt;         from t&lt;br /&gt;    /&lt;br /&gt;&lt;br /&gt;STR                            NEW_STR&lt;br /&gt;------------------------------ -------------------------------&lt;br /&gt;AAABBBCCCBBBEEEDDDAAAEEEBBBEEE AAABBBCCCBBBEEEDDDEEEBBB&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;What is the logic of this operation:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;We look for the first element which has duplicated values. In our case it is AAA.&lt;/li&gt;&lt;li&gt;Then regular expression operator finds the last AAA met in the string and removes it.&lt;/li&gt;&lt;li&gt;Then, it goes to the rest of the string and again finds the first element, which has duplicated values in the rest of the string, now it is EEE.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;And finally removes the last EEE element.&lt;/li&gt;&lt;li&gt;In our example it is the end of the operations, but if the string is longer it would proceed the previously mentioned operations again and again.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;It would be more comprehensible if we mark the first met element with &lt;span style="COLOR: rgb(0,153,0)"&gt;green&lt;/span&gt;, and the last, which would be removed, with &lt;span style="COLOR: rgb(204,0,0)"&gt;red&lt;/span&gt;:&lt;br /&gt;'&lt;span style="COLOR: rgb(0,153,0)"&gt;AAA&lt;/span&gt;BBBCCCBBBEEEDDD&lt;span style="COLOR: rgb(204,0,0)"&gt;AAA&lt;/span&gt;&lt;span style="COLOR: rgb(0,153,0)"&gt;EEE&lt;/span&gt;BBB&lt;span style="COLOR: rgb(204,0,0)"&gt;EEE&lt;/span&gt;'.&lt;br /&gt;As you can see, the first EEE element is not really the first EEE element in the string. It is the first one in the rest of the string after we removed AAA element.&lt;br /&gt;&lt;br /&gt;If we iteratively apply this regular expression to our string - finally we remove all duplicated elements. But about it later. Now let's improve our expression a little bit.&lt;br /&gt;First we'll change greedy operator '*' to non-greedy '*?'. So that we will do our job in fewer iterations.&lt;br /&gt;&lt;br /&gt;Let's look at the following example:&lt;br /&gt;we have string 'AAABBBAAAAAACCCAAA'.&lt;br /&gt;with greedy regexp_replace(str, '(([[:alpha:]])\2{2})(.*)\1','\1\3') we'll have iterations:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="COLOR: rgb(0,153,0)"&gt;AAA&lt;/span&gt;BBBAAAAAACCC&lt;span style="COLOR: rgb(204,0,0)"&gt;AAA&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="COLOR: rgb(0,153,0)"&gt;AAA&lt;/span&gt;BBBAAA&lt;span style="COLOR: rgb(204,0,0)"&gt;AAA&lt;/span&gt;CCC&lt;/li&gt;&lt;li&gt;&lt;span style="COLOR: rgb(0,153,0)"&gt;AAA&lt;/span&gt;BBB&lt;span style="COLOR: rgb(204,0,0)"&gt;AAA&lt;/span&gt;CCC&lt;/li&gt;&lt;/ol&gt;result: 'AAABBBCCC'. So it took us 3 times to iterate.&lt;br /&gt;And with the non-greedy regexp_replace(str, '(([[:alpha:]])\2{2})(.*?)\1','\1\3') we'll have iterations:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="COLOR: rgb(0,153,0)"&gt;AAA&lt;/span&gt;BBB&lt;span style="COLOR: rgb(204,0,0)"&gt;AAA&lt;/span&gt;&lt;span style="COLOR: rgb(0,153,0)"&gt;AAA&lt;/span&gt;CCC&lt;span style="COLOR: rgb(204,0,0)"&gt;AAA&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="COLOR: rgb(0,153,0)"&gt;AAA&lt;/span&gt;BBB&lt;span style="COLOR: rgb(204,0,0)"&gt;AAA&lt;/span&gt;CCC&lt;/li&gt;&lt;/ol&gt;The same result and achieved in two iterations.&lt;br /&gt;&lt;br /&gt;Let's make one more improvement: add '+' after the backreference '\1'. In the language of regular expressions - it means one or more occurencies of the element which is preceding this '+'.&lt;br /&gt;Let's imagine we have a string: 'AAABBBAAAAAAAAAAAA'.&lt;br /&gt;With regexp_replace(str, '(([[:alpha:]])\2{2})(.*?)\1','\1\3') we'll make three iterations:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="COLOR: rgb(0,153,0)"&gt;AAA&lt;/span&gt;BBB&lt;span style="COLOR: rgb(204,0,0)"&gt;AAA&lt;/span&gt;&lt;span style="COLOR: rgb(0,153,0)"&gt;AAA&lt;/span&gt;&lt;span style="COLOR: rgb(204,0,0)"&gt;AAA&lt;/span&gt;AAA&lt;/li&gt;&lt;li&gt;&lt;span style="COLOR: rgb(0,153,0)"&gt;AAA&lt;/span&gt;BBB&lt;span style="COLOR: rgb(204,0,0)"&gt;AAA&lt;/span&gt;&lt;span style="COLOR: rgb(0,0,0)"&gt;AAA&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="COLOR: rgb(0,153,0)"&gt;AAA&lt;/span&gt;BBB&lt;span style="COLOR: rgb(204,0,0)"&gt;AAA&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;span style="COLOR: rgb(204,0,0)"&gt;&lt;span style="COLOR: rgb(0,0,0)"&gt;If we place '+' and use &lt;/span&gt;&lt;/span&gt;regexp_replace(str, '(([[:alpha:]])\2{2})(.*?)\1+','\1\3') we'll have to make only one iteration:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="COLOR: rgb(0,153,0)"&gt;AAA&lt;/span&gt;BBB&lt;span style="COLOR: rgb(204,0,0)"&gt;AAA&lt;/span&gt;&lt;span style="COLOR: rgb(204,0,0)"&gt;AAAAAAAAA&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;Finally, it is a time for implementing an iterative mechanism of applying the same function to the string. Starting from Oracle version 10g there was a nice Model clause introduced, which can be used to proceed operations iteratively.&lt;br /&gt;&lt;br /&gt;So, the final query would look like:&lt;br /&gt;&lt;blockquote&gt;SQL&gt; WITH t AS (SELECT&lt;br /&gt;    'AAABBBCCCBBBDDDAAAEEEBBB' str FROM dual)&lt;br /&gt;    --&lt;br /&gt;     select str, str_new from t&lt;br /&gt;      model&lt;br /&gt;       dimension by (0 dim)&lt;br /&gt;       measures(str, str str_new)&lt;br /&gt;        rules iterate(100) until (str_new[0] = previous(str_new[0]))&lt;br /&gt;         (str_new[0]=regexp_replace(str_new[0],'(([[:alpha:]])\2{2})(.*?)\1+','\1\3'));&lt;br /&gt;&lt;br /&gt;STR                      STR_NEW&lt;br /&gt;------------------------ ------------------------&lt;br /&gt;AAABBBCCCBBBDDDAAAEEEBBB AAABBBCCCDDDEEE&lt;br /&gt;&lt;br /&gt;SQL&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Two words about Model clause here:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;With &lt;span style="FONT-STYLE: italic"&gt;regexp_replace&lt;/span&gt; you are already familiar. It is applyed to the string during each iteration;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="FONT-STYLE: italic"&gt;iterate(100)&lt;/span&gt; means maximum of iterations could be proceeded is 100 (it can be increased if you want);&lt;/li&gt;&lt;li&gt;&lt;span style="FONT-STYLE: italic"&gt;until (str[0] = previous(str[0]))&lt;/span&gt; means stop iterations when the string is not changed during the previous iteration.&lt;/li&gt;&lt;/ul&gt;Well, that's it.&lt;br /&gt;&lt;br /&gt;Now let's take the case when there is a delimited string, e.g. with commas. In that case we need to change our regular expression a little bit. Aketi Jyuuzou made this in one of the mentioned links. I just clarify it for readers. So for a string like 'elem1, elem2, elem3, elem1, elem3, elem2, elem2' we'll need the following:&lt;br /&gt;regexp_replace(str,'(^|,)([^,]+,)(.*?,)?\2+','\1\2\3')&lt;br /&gt;I just added non-greedy '*?' and '+' in the end, but you have already read about the impact of these. So you understand what are they needed for.&lt;br /&gt;&lt;br /&gt;Backreference &lt;span style="FONT-STYLE: italic"&gt;\1&lt;/span&gt; stands for &lt;span style="FONT-STYLE: italic"&gt;(^|,)&lt;/span&gt; - this is the symbol before the first met duplicated element. It is either begining of the string (^), either a comma trailing the previous element (,).&lt;br /&gt;Backreference \2 stands for the &lt;span style="FONT-STYLE: italic"&gt;([^,]+,) &lt;/span&gt;- this is the element itself, which means one or more non-comma symbols followed by a comma.&lt;br /&gt;Backreference \3 stands for &lt;span style="FONT-STYLE: italic"&gt;(.*?,)?&lt;/span&gt; which is the minimum (non-greedy) number of symbols before one or more duplicated value (\2+).&lt;br /&gt;&lt;br /&gt;So the final query would look like:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;SQL&gt; WITH t AS (SELECT 'elem1,elem2,elem3,elem1,elem3,elem2,elem2' str FROM dual)&lt;br /&gt;    --&lt;br /&gt;     select str, rtrim(str_new,',') new_str from t&lt;br /&gt;      model&lt;br /&gt;       dimension by (0 dim)&lt;br /&gt;       measures(str, str||',' str_new)&lt;br /&gt;        rules iterate(100) until (str_new[0] = previous(str_new[0]))&lt;br /&gt;         (str_new[0]=regexp_replace(str_new[0],'(^|,)([^,]+,)(.*?,)?\2+','\1\2\3'));&lt;br /&gt;&lt;br /&gt;STR                                       NEW_STR&lt;br /&gt;----------------------------------------- -------------------------&lt;br /&gt;elem1,elem2,elem3,elem1,elem3,elem2,elem2 elem1,elem2,elem3&lt;br /&gt;&lt;br /&gt;SQL&gt; &lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Hope it was useful!&lt;br /&gt;&lt;br /&gt;PS&lt;br /&gt;It is my first post, so I'll be grateful if you post your comments and notices about it. Was it too comlicatedely stated or maybe too detailed. Well,waiting for your replies :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/3060148920065298017-7229087560914242182?l=volder-notes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://volder-notes.blogspot.com/feeds/7229087560914242182/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=3060148920065298017&amp;postID=7229087560914242182' title='16 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/7229087560914242182'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/3060148920065298017/posts/default/7229087560914242182'/><link rel='alternate' type='text/html' href='http://volder-notes.blogspot.com/2007/10/removing-duplicate-elements-from-string.html' title='Removing duplicate elements from the string'/><author><name>Volder</name><uri>http://www.blogger.com/profile/15557750923919600413</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://i049.radikal.ru/0712/26/a24baea7e400.jpg'/></author><thr:total>16</thr:total></entry></feed>
