In that post I'll talk about a problem, which is very popular, according to the number of posts in forums with similar questions.

Usually it is formulated as: "How to count the number of occurences of a substring with special pattern inside the string".

Actually, everything we were talking about in two previous posts - can be very useful in solving such a task.

Let our input data be the same as in the previous post:

SQL> with t as (select 'three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33' str from dual)We have a string with phone numbers in it.

2 --

3 select t.* from t

4 /

STR

-------------------------------------------------------------------

three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33

SQL>

We want to find how many numbers in the format (XXX)XXX-XX-XX are there.

So what we can achieve at the moment? We can pull out all the substrings with that particular pattern in one column:

SQL> with t as (select 'three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33' str from dual)What would be the technique to find the number of substrings? The answer would be "very easy".

2 --

3 select t.*,

4 regexp_replace(str, '\(\d{3}\)(\d{3}-\d{2}-\d{2}( |$))|.', '\1') phone_numbers

5 from t

6 /

STR PHONE_NUMBERS

------------------------------------------------------------------- -------------------------

three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33 111-11-11 333-33-33

SQL>

First instead of finding the substring according to the mask - we'll find only the first symbols of such substrings.

Then using the function LENGTH - we'll find the length of such a string - and it would be equal to the number of occurences - cause every symbol - would mean one occurenece.

SQL> with t as (select 'three numbers: (111)111-11-11 and [222]222-22-22 and (333)333-33-33' str from dual)What we have changed:

2 --

3 select regexp_replace(str, '\(\d{3}\)(\d{3}-\d{2}-\d{2}( |$))|.', '\1') phone_numbers,

4 regexp_replace(str, '\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}|.', '\1') first_letters,

5 nvl(length(regexp_replace(str, '\(\d{3}\)(\d)\d{2}-\d{2}-\d{2}|.', '\1')), 0) occurrencies

6 from t

7 /

PHONE_NUMBERS FIRST_LETTERS OCCURRENCIES

----------------------- ------------- ------------

111-11-11 333-33-33 13 2

SQL>

instead of \d{3} standing for the first three digits of our phone numbers, we've written \d\d{2}. That is also three digits - but with the first one separated.

Next step would be to place not the whole pattern in parenethes, but only the first digit: (\d)\d{2}

That's why when we get the whole pattern replaced with the '\1' backreference - we get only the first digits: one from each occurenece of the pattern.

After that we put LENGTH function and for the case when there no occurences - we put NVL(...,0).

In our string we have 2 occurences of a pattern like XXX-XX-XX with preceding (XXX).

And that's right.

to be continued...

## 1 comments:

Thank you. I was looking for this.

I had something simpler, I just needed to find the number of commas in a string.

I did len(dbo.RegExReplace(@string,'[^,]',''))

Post a Comment