SQL> SELECT EXTRACTVALUE(VALUE(P), '/Valute/NumCode') NUMCODE,It is a link to our central bank site where rates of different currencies are listed in rouble equivalent.
2 EXTRACTVALUE(VALUE(P), '/Valute/CharCode') CHARCODE,
3 EXTRACTVALUE(VALUE(P), '/Valute/Nominal') NOMINAL,
4 EXTRACTVALUE(VALUE(P), '/Valute/Name') NAME,
5 EXTRACTVALUE(VALUE(P), '/Valute/Value') VALUE
6 FROM TABLE(XMLSEQUENCE(HTTPURITYPE('http://www.cbr.ru/scripts/XML_daily.asp?date_req=22.03.2008')
7 .GETXML().EXTRACT('//ValCurs/Valute'))) P
8 WHERE EXTRACTVALUE(VALUE(P), '/Valute/CharCode') = 'USD'
9 /
NUMCODE CHARCODE NOMINAL NAME VALUE
-------------------- -------------------- -------------------- -------------------- --------------------
840 USD 1 Aieea? NOA 23,7773
SQL>
I have not come across such stuff before - so after reading manual on this topic, I found out that the encoding is not sent explicitly from this cbr http url in the content-type:
SQL> set serveroutput onas we can find from the documentation:
SQL>
SQL> DECLARE
2 httpuri HTTPURIType;
3 BEGIN
4 httpuri := HTTPURIType('http://www.rambler.ru/news/top/rss.html');
5 DBMS_OUTPUT.put_line('rambler: ' || httpuri.getContentType());
6 httpuri := HTTPURIType('http://www.cbr.ru/scripts/XML_daily.asp?date_req=22.03.2008');
7 DBMS_OUTPUT.put_line('cbr: ' || httpuri.getContentType());
8 END;
9 /
rambler: text/xml; charset=windows-1251
cbr: text/xml
PL/SQL procedure successfully completed
SQL>
The HTTPURITYPE is a subtype of the UriType that provides support for the HTTP protocol. This uses the UTL_HTTP package underneath to access the HTTP URLs.
And in the chapter about UTL_HTTP we can find that:
Following the HTTP protocol standard specification, if the media type of a request or a response is text, but the character set information is missing in the Content-Type header, the character set of the request or response body should default to ISO-8859-1. A response created for a request inherits the default body character set of the request instead of the body character set of the current session. The default body character set is ISO-8859-1 in a database user session.
So this is what happens in our case - default encoding is ISO-8859-1 and names of currencies are wrongly displayed.
The easiest way is to specify the default charset with SET_BODY_CHARSET:
SQL> set colwidth 20So now the currency name is properly displayed. But this method affects only current session. And this default charset will be left for future requests also.
SQL> select utl_http.get_body_charset() from dual;
UTL_HTTP.GET_BODY_CH
--------------------
ISO-8859-1
SQL>
SQL> select * from v$nls_valid_values
2 where value like '%1251%'
3 /
PARAMETER VALUE ISDEPRECATED
-------------------- -------------------- ------------
CHARACTERSET CL8MSWIN1251 FALSE
SQL> exec utl_http.set_body_charset('CL8MSWIN1251');
PL/SQL procedure successfully completed
SQL> select utl_http.get_body_charset() from dual;
UTL_HTTP.GET_BODY_CH
--------------------
CL8MSWIN1251
SQL>
SQL> SELECT EXTRACTVALUE(VALUE(P), '/Valute/NumCode') NUMCODE,
2 EXTRACTVALUE(VALUE(P), '/Valute/CharCode') CHARCODE,
3 EXTRACTVALUE(VALUE(P), '/Valute/Nominal') NOMINAL,
4 EXTRACTVALUE(VALUE(P), '/Valute/Name') NAME,
5 EXTRACTVALUE(VALUE(P), '/Valute/Value') VALUE
6 FROM TABLE(XMLSEQUENCE(HTTPURITYPE('http://www.cbr.ru/scripts/XML_daily.asp?date_req=22.03.2008')
7 .GETXML().EXTRACT('//ValCurs/Valute'))) P
8 WHERE EXTRACTVALUE(VALUE(P), '/Valute/CharCode') = 'USD'
9 /
NUMCODE CHARCODE NOMINAL NAME VALUE
-------------------- -------------------- -------------------- -------------------- --------------------
840 USD 1 Доллар США 23,7773
SQL>
Another way - is a little bit difficult.
It is based on the fact, that in case you pump blob from the http link - the result encoding is not changed:
URIType Methods:
getBLOB() | Returns the target of the URI as a BLOB value. No character conversion is performed, and the character encoding is that of the URI target. This method can also be used to fetch binary data. |
So the idea is to transform this blob value into XML with the needed encoding.
I found a link from the OTN forum, which can be very useful: http://forums.oracle.com/forums/thread.jspa?messageID=1864895
We can easily create a function, that will take as an input the blob value and the needed encoding for conversion and would return as a result XMLType value:
SQL> create or replace function BLOB2XML(l_blob blob, l_charset varchar2)As we can see the default charset for UTL_HTTP package has not being changed.
2 return xmltype is
3 l_clob clob;
4 l_src_offset number;
5 l_dest_offset number;
6 l_blob_csid number := nvl(nls_charset_id(l_charset), dbms_lob.default_csid);
7 v_lang_context number := dbms_lob.default_lang_ctx;
8 l_warning number;
9 l_amount number;
10 begin
11 dbms_lob.createtemporary(l_clob, true);
12 l_src_offset := 1;
13 l_dest_offset := 1;
14 l_amount := dbms_lob.getlength(l_blob);
15 dbms_lob.converttoclob(l_clob,
16 l_blob,
17 l_amount,
18 l_dest_offset,
19 l_src_offset,
20 l_blob_csid,
21 v_lang_context,
22 l_warning);
23 return xmltype(l_clob);
24 end;
25 /
Function created
SQL> show err blob2xml;
No errors for FUNCTION SYSTEM.BLOB2XML
SQL> set colwidth 20
SQL> select utl_http.get_body_charset() from dual;
UTL_HTTP.GET_BODY_CH
--------------------
ISO-8859-1
SQL>
SQL> SELECT EXTRACTVALUE(VALUE(P), '/Valute/NumCode') NUMCODE,
2 EXTRACTVALUE(VALUE(P), '/Valute/CharCode') CHARCODE,
3 EXTRACTVALUE(VALUE(P), '/Valute/Nominal') NOMINAL,
4 EXTRACTVALUE(VALUE(P), '/Valute/Name') NAME,
5 EXTRACTVALUE(VALUE(P), '/Valute/Value') VALUE
6 FROM TABLE(XMLSEQUENCE(blob2XML(HTTPURITYPE('http://www.cbr.ru/scripts/XML_daily.asp?date_req=22.03.2008')
7 .GETblob(),'CL8MSWIN1251').EXTRACT('//ValCurs/Valute'))) P
8 WHERE EXTRACTVALUE(VALUE(P), '/Valute/CharCode') = 'USD'
9 /
NUMCODE CHARCODE NOMINAL NAME VALUE
-------------------- -------------------- -------------------- -------------------- --------------------
840 USD 1 Доллар США 23,7773
SQL> select utl_http.get_body_charset() from dual;
UTL_HTTP.GET_BODY_CH
--------------------
ISO-8859-1
SQL>
Just to be sure that the problem was in encoding - let's try our query with ISO-8859-1 charset:
SQL> select * from v$nls_valid_valuesYes, this is exactly what we got at our initial query.
2 where upper(value) like '%ISO%8859%1'
3 /
PARAMETER VALUE ISDEPRECATED
-------------------- -------------------- ------------
CHARACTERSET WE8ISO8859P1 FALSE
SQL>
SQL> SELECT EXTRACTVALUE(VALUE(P), '/Valute/NumCode') NUMCODE,
2 EXTRACTVALUE(VALUE(P), '/Valute/CharCode') CHARCODE,
3 EXTRACTVALUE(VALUE(P), '/Valute/Nominal') NOMINAL,
4 EXTRACTVALUE(VALUE(P), '/Valute/Name') NAME,
5 EXTRACTVALUE(VALUE(P), '/Valute/Value') VALUE
6 FROM TABLE(XMLSEQUENCE(blob2XML(HTTPURITYPE('http://www.cbr.ru/scripts/XML_daily.asp?date_req=22.03.2008')
7 .GETblob(),'WE8ISO8859P1').EXTRACT('//ValCurs/Valute'))) P
8 WHERE EXTRACTVALUE(VALUE(P), '/Valute/CharCode') = 'USD'
9 /
NUMCODE CHARCODE NOMINAL NAME VALUE
-------------------- -------------------- -------------------- -------------------- --------------------
840 USD 1 Aieea? NOA 23,7773
SQL>