In that post I'll talk about a problem, which is very popular, according to the number of posts in forums with similar questions.
Usually it is formulated as: "How to count the number of occurences of a substring with special pattern inside the string".
Actually, everything we were talking about in two previous posts - can be very useful in solving such a task.
Let our input data be the same as in the previous post:
SQL> with t as (select 'three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33' str from dual)We have a string with phone numbers in it.
2 --
3 select t.* from t
4 /
STR
-------------------------------------------------------------------
three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33
SQL>
We want to find how many numbers in the format (XXX)XXX-XX-XX are there.
So what we can achieve at the moment? We can pull out all the substrings with that particular pattern in one column:
SQL> with t as (select 'three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33' str from dual)What would be the technique to find the number of substrings? The answer would be "very easy".
2 --
3 select t.*,
4 regexp_replace(str, '\(\d{3}\)(\d{3}-\d{2}-\d{2}( |$))|.', '\1') phone_numbers
5 from t
6 /
STR PHONE_NUMBERS
------------------------------------------------------------------- -------------------------
three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33 111-11-11 333-33-33
SQL>
First instead of finding the substring according to the mask - we'll find only the first symbols of such substrings.
Then using the function LENGTH - we'll find the length of such a string - and it would be equal to the number of occurences - cause every symbol - would mean one occurenece.
SQL> with t as (select 'three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33' str from dual)What we have changed:
2 --
3 select regexp_replace(str, '\(\d{3}\)(\d{3}-\d{2}-\d{2}( |$))|.', '\1') phone_numbers,
4 regexp_replace(str, '\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}|.', '\1') first_letters,
5 nvl(length(regexp_replace(str, '\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}|.', '\1')), 0) occurrencies
6 from t
7 /
PHONE_NUMBERS FIRST_LETTERS OCCURRENCIES
----------------------- ------------- ------------
111-11-11 333-33-33 13 2
SQL>
instead of \d{3} standing for the first three digits of our phone numbers, we've written \d\d{2}. That is also three digits - but with the first one separated.
Next step would be to place not the whole pattern in parenethes, but only the first digit: (\d)\d{2}
That's why when we get the whole pattern replaced with the '\1' backreference - we get only the first digits: one from each occurenece of the pattern.
After that we put LENGTH function and for the case when there no occurences - we put NVL(...,0).
In our string we have 2 occurences of a pattern like XXX-XX-XX with preceding (XXX).
And that's right.
to be continued...