Task:
There is a string value given, comprising a number of elements that are put into particular patterns on both sides.
We need to get all such patterns and inside value out of the string.
E.g. the beginning pattern is '<<<' and the ending pattern is '>>>'.
Solution:
SQL> with t as (select 'xyz<<<testdata>>>123' col1 from dual union all
2 select 'zzz<<<test><<<data>>>ssf' from dual union all
3 select 'sss<<<test><data>>>sffsd' from dual union all
4 select 'ggg<<<test>>>34345<<<data>>>lks' from dual)
5 --
6 select t.*,
7 reverse(regexp_replace(reverse(regexp_replace(col1, '(<<<.*?>>>)|.', '\1')),
8 '(>>>.*?<<<)|.', '\1')) col2
9 from t;
COL1 COL2
------------------------------- ----------------------------------
xyz<<<testdata>>>123 <<<testdata>>>
zzz<<<test><<<data>>>ssf <<<data>>>
sss<<<test><data>>>sffsd <<<test><data>>>
ggg<<<test>>>34345<<<data>>>lks <<<test>>><<<data>>>
SQL>
Explanation:
The first thing what we are doing - is getting all patterns that start from '<<<' and end with '>>>'. By using non-greedy search we have as a result not only one long value, but separated patterns (it is well seen in the fourth row):
SQL> with t as (select 'xyz<<<testdata>>>123' col1 from dual union all
2 select 'zzz<<<test><<<data>>>ssf' from dual union all
3 select 'sss<<<test><data>>>sffsd' from dual union all
4 select 'ggg<<<test>>>34345<<<data>>>lks' from dual)
5 --
6 select t.*,
7 regexp_replace(col1, '(<<<.*?>>>)|.', '\1') col2
8 from t;
COL1 COL2
------------------------------- ----------------------------------
xyz<<<testdata>>>123 <<<testdata>>>
zzz<<<test><<<data>>>ssf <<<test><<<data>>>
sss<<<test><data>>>sffsd <<<test><data>>>
ggg<<<test>>>34345<<<data>>>lks <<<test>>><<<data>>>
SQL>
This technique of getting only desired pattern from the string is described here.
But the result we got is not 100% what were going to receive.
As you can see in the second line there is a resulting value '<<<test><<<data>>>', which is not correct. Because there is an opening pattern '<<<' is met inside the value. So it looks as only '<<<data>>>' should be returned.
For that purpose we need to read the string starting from the end and moving in a backward direction to the very beginning of the value.
In such cases using REVERSE() function could be a solution, although it is not documented. So first we reverse the string and secondly we pass it in the usual way from beginning to the end.
And by the way as we have it reversed all our patterns should also be reversed: the first one '>>>' would be met and then closing '<<<'. In the end we just put the second REVERSE() to get the initial direction of the string:
SQL> with t as (select 'xyz<<<testdata>>>123' col1 from dual union all
2 select 'zzz<<<test><<<data>>>ssf' from dual union all
3 select 'sss<<<test><data>>>sffsd' from dual union all
4 select 'ggg<<<test>>>34345<<<data>>>lks' from dual)
5 --
6 select t.*,
7 regexp_replace(col1, '(<<<.*?>>>)|.', '\1') col2,
8 reverse(regexp_replace(col1, '(<<<.*?>>>)|.', '\1')) col3,
9 regexp_replace(reverse(regexp_replace(col1, '(<<<.*?>>>)|.', '\1')),'(>>>.*?<<<)|.', '\1') col4,
10 reverse(regexp_replace(reverse(regexp_replace(col1, '(<<<.*?>>>)|.', '\1')),'(>>>.*?<<<)|.', '\1')) col5
11 from t;
COL1 COL2 COL3 COL4 COL5
------------------------------- ------------------------ ------------------------ ---------------------- ---------------------
xyz<<<testdata>>>123 <<<testdata>>> >>>atadtset<<< >>>atadtset<<< <<<testdata>>>
zzz<<<test><<<data>>>ssf <<<test><<<data>>> >>>atad<<<>tset<<< >>>atad<<< <<<data>>>
sss<<<test><data>>>sffsd <<<test><data>>> >>>atad<>tset<<< >>>atad<>tset<<< <<<test><data>>>
ggg<<<test>>>34345<<<data>>>lks <<<test>>><<<data>>> >>>atad<<<>>>tset<<< >>>atad<<<>>>tset<<< <<<test>>><<<data>>>
SQL>
As you can see the redundant part '<<<test>' have dissapeared from the line #2.
PS
The opening pattern and closing pattern could be any, e.g. '<<~' as opening and '><<-' as closing, in this case the query would look like:
SQL> with t as (select 'xyz<<~testdata><<-123' col1 from dual union all
2 select 'zzz<<~test><<-data><<-ssf' from dual union all
3 select 'sss<<~test>>>><<-data>>~sffsd' from dual union all
4 select 'ggg<<~test><<-34345<<~data<<~data2><<-lks' from dual)
5 --
6 select t.*,
7 reverse(regexp_replace(reverse(regexp_replace(col1, '(<<~.*?><<-)|.', '\1')),'(-<<>.*?~<<)|.', '\1')) col2
8 from t;
COL1 COL2
----------------------------------------- -----------------------------------
xyz<<~testdata><<-123 <<~testdata><<-
zzz<<~test><<-data><<-ssf <<~test><<-
sss<<~test>>>><<-data>>~sffsd <<~test>>>><<-
ggg<<~test><<-34345<<~data<<~data2><<-lks <<~test><<-<<~data2><<-
SQL>
2 comments:
The use of the REVERSE function is very questionable w/o understanding the harm it can make. The results are unpredictable.
For instance:
SQL> COLUMN col1 FORMAT A30
SQL> COLUMN col2 FORMAT A30
SQL> WITH
2 t AS
3 (
4 SELECT 'xyz<<~é,ô,ÿ><<-123' col1 FROM dual
5 UNION ALL
6 SELECT 'xyz<<~whatever><<-123' col1 FROM dual
7 )
8 SELECT t.*
9 , REVERSE(REGEXP_REPLACE(REVERSE(REGEXP_REPLACE(col1, '(<<~.*?><<-)|.', '\1')),'(-<<>.*?~<<)|.', '\1')) col2
10 FROM t
11 /
COL1 COL2
------------------------------ ------------------------------
xyz<<~é,ô,ÿ><<-123
xyz<<~whatever><<-123 <<~whatever><<-
[SQL formatting would be clumsy here, just copy it somewhere and use a monospaced font]
That's probably why the solution [which has ideas borrowed elsewhere :-)] was not rated.
Sure, Vladimir.
I had to mention it explicitly, that REVERSE() would not work properly in case of multibyte characters. My fault.
PS
about "borrowing ideas" - I wouldn't be so categorical unless I'm 100% sure ;o)
If I take someones ideas - I usually post a link to the resource.
Post a Comment