Oracle: is it possible to trim a string and count the number of occurances, and insert to a new table?












0















My source table looks like this:



id|value|count


Value is a String of values separated by semicolons(;). For example it may look like this



A;B;C;D;


Some may not have values at a certain position, like this



A;;;D;


First, I've selectively moved records to a new table(targettable) based on positions with values using regexp. I achieved this by using [^;]+; for having some value between the semicolons, and [^;]*; for those positions I don't care about. For example, if I wanted the 1st and 4th place to have values, I could incorporate regexp with insert into like this



insert into
targettable tt (id, value, count)
SELECT some_seq.nextval,value, count
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')


so now my new table has a list of records that have values at the 1st and 4th position. It may look like this



1|A;B;C;D;|2
2|B;;;E;|1
3|A;D;;D|3


Next there are 2 things I want to do. 1. get rid of values other than 1st and 4th. 2.combine identical values and add up their count. For example, record 1 and 3 are the same, so I want to trim so they become A;D;, and then add their count, so 2+3=5. Now my new table looks like this



1|A;D;|5
2|B;E;|1


As long as I can somehow get to the final table from source table, I don't care about the steps. The intermediate table is not required, but it may help me achieve the final result. I'm not sure if I can go any further with Orcale though. If not, I'll have to move and process the records with Java. Bear in mind I have millions of records, so I would consider the Oracle method if it is possible.










share|improve this question

























  • Should your final entry show B;E; rather than A;E;? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.

    – Alex Poole
    Jan 2 at 10:00











  • @AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.

    – Andy
    Jan 2 at 10:03
















0















My source table looks like this:



id|value|count


Value is a String of values separated by semicolons(;). For example it may look like this



A;B;C;D;


Some may not have values at a certain position, like this



A;;;D;


First, I've selectively moved records to a new table(targettable) based on positions with values using regexp. I achieved this by using [^;]+; for having some value between the semicolons, and [^;]*; for those positions I don't care about. For example, if I wanted the 1st and 4th place to have values, I could incorporate regexp with insert into like this



insert into
targettable tt (id, value, count)
SELECT some_seq.nextval,value, count
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')


so now my new table has a list of records that have values at the 1st and 4th position. It may look like this



1|A;B;C;D;|2
2|B;;;E;|1
3|A;D;;D|3


Next there are 2 things I want to do. 1. get rid of values other than 1st and 4th. 2.combine identical values and add up their count. For example, record 1 and 3 are the same, so I want to trim so they become A;D;, and then add their count, so 2+3=5. Now my new table looks like this



1|A;D;|5
2|B;E;|1


As long as I can somehow get to the final table from source table, I don't care about the steps. The intermediate table is not required, but it may help me achieve the final result. I'm not sure if I can go any further with Orcale though. If not, I'll have to move and process the records with Java. Bear in mind I have millions of records, so I would consider the Oracle method if it is possible.










share|improve this question

























  • Should your final entry show B;E; rather than A;E;? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.

    – Alex Poole
    Jan 2 at 10:00











  • @AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.

    – Andy
    Jan 2 at 10:03














0












0








0








My source table looks like this:



id|value|count


Value is a String of values separated by semicolons(;). For example it may look like this



A;B;C;D;


Some may not have values at a certain position, like this



A;;;D;


First, I've selectively moved records to a new table(targettable) based on positions with values using regexp. I achieved this by using [^;]+; for having some value between the semicolons, and [^;]*; for those positions I don't care about. For example, if I wanted the 1st and 4th place to have values, I could incorporate regexp with insert into like this



insert into
targettable tt (id, value, count)
SELECT some_seq.nextval,value, count
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')


so now my new table has a list of records that have values at the 1st and 4th position. It may look like this



1|A;B;C;D;|2
2|B;;;E;|1
3|A;D;;D|3


Next there are 2 things I want to do. 1. get rid of values other than 1st and 4th. 2.combine identical values and add up their count. For example, record 1 and 3 are the same, so I want to trim so they become A;D;, and then add their count, so 2+3=5. Now my new table looks like this



1|A;D;|5
2|B;E;|1


As long as I can somehow get to the final table from source table, I don't care about the steps. The intermediate table is not required, but it may help me achieve the final result. I'm not sure if I can go any further with Orcale though. If not, I'll have to move and process the records with Java. Bear in mind I have millions of records, so I would consider the Oracle method if it is possible.










share|improve this question
















My source table looks like this:



id|value|count


Value is a String of values separated by semicolons(;). For example it may look like this



A;B;C;D;


Some may not have values at a certain position, like this



A;;;D;


First, I've selectively moved records to a new table(targettable) based on positions with values using regexp. I achieved this by using [^;]+; for having some value between the semicolons, and [^;]*; for those positions I don't care about. For example, if I wanted the 1st and 4th place to have values, I could incorporate regexp with insert into like this



insert into
targettable tt (id, value, count)
SELECT some_seq.nextval,value, count
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')


so now my new table has a list of records that have values at the 1st and 4th position. It may look like this



1|A;B;C;D;|2
2|B;;;E;|1
3|A;D;;D|3


Next there are 2 things I want to do. 1. get rid of values other than 1st and 4th. 2.combine identical values and add up their count. For example, record 1 and 3 are the same, so I want to trim so they become A;D;, and then add their count, so 2+3=5. Now my new table looks like this



1|A;D;|5
2|B;E;|1


As long as I can somehow get to the final table from source table, I don't care about the steps. The intermediate table is not required, but it may help me achieve the final result. I'm not sure if I can go any further with Orcale though. If not, I'll have to move and process the records with Java. Bear in mind I have millions of records, so I would consider the Oracle method if it is possible.







oracle






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 2 at 10:01







Andy

















asked Jan 2 at 9:40









AndyAndy

847




847













  • Should your final entry show B;E; rather than A;E;? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.

    – Alex Poole
    Jan 2 at 10:00











  • @AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.

    – Andy
    Jan 2 at 10:03



















  • Should your final entry show B;E; rather than A;E;? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.

    – Alex Poole
    Jan 2 at 10:00











  • @AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.

    – Andy
    Jan 2 at 10:03

















Should your final entry show B;E; rather than A;E;? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.

– Alex Poole
Jan 2 at 10:00





Should your final entry show B;E; rather than A;E;? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.

– Alex Poole
Jan 2 at 10:00













@AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.

– Andy
Jan 2 at 10:03





@AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.

– Andy
Jan 2 at 10:03












2 Answers
2






active

oldest

votes


















1














You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr() function, while checking that those are not null:



select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;

VALUE COUNT
------------------ ----------
A;D; 2
B;E; 1
A;D; 3


and then aggregate those results:



select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value;

VALUE COUNT
------------------ ----------
A;D; 5
B;E; 1


Then for your insert you can use that query, either with an auto-increment ID (12c+), or setting an ID from a sequence via a trigger, or possibly wrapped in another level of subquery to get the value explicitly:



insert into target (id, value, count)
select some_seq.nextval, value, count
from (
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value
);


If you're creating a new sequence to do that, so they start from 1, you can use rownum or row_number() instead.





Incidentally, using a keyword or a function name like count as a column name is confusing (sum(count) !?); those might not be your real names though.






share|improve this answer
























  • I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!

    – Andy
    Jan 3 at 2:03













  • I should probably rename count. You are right, it is confusing now that you mentioned it.

    – Andy
    Jan 3 at 2:16






  • 1





    It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with instr/substr instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.

    – Alex Poole
    Jan 3 at 9:09











  • It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.

    – Andy
    Jan 4 at 3:15



















0














I would use regexp_replace to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :



SELECT
regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),
SUM(count)
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
GROUP BY
regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')





share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54004070%2foracle-is-it-possible-to-trim-a-string-and-count-the-number-of-occurances-and%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr() function, while checking that those are not null:



    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
    || ';' as value, -- if you want trailing semicolon
    count
    from source
    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;

    VALUE COUNT
    ------------------ ----------
    A;D; 2
    B;E; 1
    A;D; 3


    and then aggregate those results:



    select value, sum(count) as count
    from (
    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
    || ';' as value, -- if you want trailing semicolon
    count
    from source
    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
    )
    group by value;

    VALUE COUNT
    ------------------ ----------
    A;D; 5
    B;E; 1


    Then for your insert you can use that query, either with an auto-increment ID (12c+), or setting an ID from a sequence via a trigger, or possibly wrapped in another level of subquery to get the value explicitly:



    insert into target (id, value, count)
    select some_seq.nextval, value, count
    from (
    select value, sum(count) as count
    from (
    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
    || ';' as value, -- if you want trailing semicolon
    count
    from source
    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
    )
    group by value
    );


    If you're creating a new sequence to do that, so they start from 1, you can use rownum or row_number() instead.





    Incidentally, using a keyword or a function name like count as a column name is confusing (sum(count) !?); those might not be your real names though.






    share|improve this answer
























    • I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!

      – Andy
      Jan 3 at 2:03













    • I should probably rename count. You are right, it is confusing now that you mentioned it.

      – Andy
      Jan 3 at 2:16






    • 1





      It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with instr/substr instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.

      – Alex Poole
      Jan 3 at 9:09











    • It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.

      – Andy
      Jan 4 at 3:15
















    1














    You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr() function, while checking that those are not null:



    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
    || ';' as value, -- if you want trailing semicolon
    count
    from source
    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;

    VALUE COUNT
    ------------------ ----------
    A;D; 2
    B;E; 1
    A;D; 3


    and then aggregate those results:



    select value, sum(count) as count
    from (
    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
    || ';' as value, -- if you want trailing semicolon
    count
    from source
    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
    )
    group by value;

    VALUE COUNT
    ------------------ ----------
    A;D; 5
    B;E; 1


    Then for your insert you can use that query, either with an auto-increment ID (12c+), or setting an ID from a sequence via a trigger, or possibly wrapped in another level of subquery to get the value explicitly:



    insert into target (id, value, count)
    select some_seq.nextval, value, count
    from (
    select value, sum(count) as count
    from (
    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
    || ';' as value, -- if you want trailing semicolon
    count
    from source
    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
    )
    group by value
    );


    If you're creating a new sequence to do that, so they start from 1, you can use rownum or row_number() instead.





    Incidentally, using a keyword or a function name like count as a column name is confusing (sum(count) !?); those might not be your real names though.






    share|improve this answer
























    • I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!

      – Andy
      Jan 3 at 2:03













    • I should probably rename count. You are right, it is confusing now that you mentioned it.

      – Andy
      Jan 3 at 2:16






    • 1





      It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with instr/substr instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.

      – Alex Poole
      Jan 3 at 9:09











    • It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.

      – Andy
      Jan 4 at 3:15














    1












    1








    1







    You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr() function, while checking that those are not null:



    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
    || ';' as value, -- if you want trailing semicolon
    count
    from source
    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;

    VALUE COUNT
    ------------------ ----------
    A;D; 2
    B;E; 1
    A;D; 3


    and then aggregate those results:



    select value, sum(count) as count
    from (
    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
    || ';' as value, -- if you want trailing semicolon
    count
    from source
    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
    )
    group by value;

    VALUE COUNT
    ------------------ ----------
    A;D; 5
    B;E; 1


    Then for your insert you can use that query, either with an auto-increment ID (12c+), or setting an ID from a sequence via a trigger, or possibly wrapped in another level of subquery to get the value explicitly:



    insert into target (id, value, count)
    select some_seq.nextval, value, count
    from (
    select value, sum(count) as count
    from (
    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
    || ';' as value, -- if you want trailing semicolon
    count
    from source
    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
    )
    group by value
    );


    If you're creating a new sequence to do that, so they start from 1, you can use rownum or row_number() instead.





    Incidentally, using a keyword or a function name like count as a column name is confusing (sum(count) !?); those might not be your real names though.






    share|improve this answer













    You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr() function, while checking that those are not null:



    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
    || ';' as value, -- if you want trailing semicolon
    count
    from source
    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;

    VALUE COUNT
    ------------------ ----------
    A;D; 2
    B;E; 1
    A;D; 3


    and then aggregate those results:



    select value, sum(count) as count
    from (
    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
    || ';' as value, -- if you want trailing semicolon
    count
    from source
    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
    )
    group by value;

    VALUE COUNT
    ------------------ ----------
    A;D; 5
    B;E; 1


    Then for your insert you can use that query, either with an auto-increment ID (12c+), or setting an ID from a sequence via a trigger, or possibly wrapped in another level of subquery to get the value explicitly:



    insert into target (id, value, count)
    select some_seq.nextval, value, count
    from (
    select value, sum(count) as count
    from (
    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
    || ';' as value, -- if you want trailing semicolon
    count
    from source
    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
    )
    group by value
    );


    If you're creating a new sequence to do that, so they start from 1, you can use rownum or row_number() instead.





    Incidentally, using a keyword or a function name like count as a column name is confusing (sum(count) !?); those might not be your real names though.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Jan 2 at 10:14









    Alex PooleAlex Poole

    134k6108181




    134k6108181













    • I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!

      – Andy
      Jan 3 at 2:03













    • I should probably rename count. You are right, it is confusing now that you mentioned it.

      – Andy
      Jan 3 at 2:16






    • 1





      It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with instr/substr instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.

      – Alex Poole
      Jan 3 at 9:09











    • It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.

      – Andy
      Jan 4 at 3:15



















    • I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!

      – Andy
      Jan 3 at 2:03













    • I should probably rename count. You are right, it is confusing now that you mentioned it.

      – Andy
      Jan 3 at 2:16






    • 1





      It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with instr/substr instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.

      – Alex Poole
      Jan 3 at 9:09











    • It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.

      – Andy
      Jan 4 at 3:15

















    I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!

    – Andy
    Jan 3 at 2:03







    I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!

    – Andy
    Jan 3 at 2:03















    I should probably rename count. You are right, it is confusing now that you mentioned it.

    – Andy
    Jan 3 at 2:16





    I should probably rename count. You are right, it is confusing now that you mentioned it.

    – Andy
    Jan 3 at 2:16




    1




    1





    It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with instr/substr instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.

    – Alex Poole
    Jan 3 at 9:09





    It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with instr/substr instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.

    – Alex Poole
    Jan 3 at 9:09













    It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.

    – Andy
    Jan 4 at 3:15





    It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.

    – Andy
    Jan 4 at 3:15













    0














    I would use regexp_replace to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :



    SELECT
    regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),
    SUM(count)
    FROM source table
    WHERE
    regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
    GROUP BY
    regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')





    share|improve this answer




























      0














      I would use regexp_replace to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :



      SELECT
      regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),
      SUM(count)
      FROM source table
      WHERE
      regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
      GROUP BY
      regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')





      share|improve this answer


























        0












        0








        0







        I would use regexp_replace to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :



        SELECT
        regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),
        SUM(count)
        FROM source table
        WHERE
        regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
        GROUP BY
        regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')





        share|improve this answer













        I would use regexp_replace to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :



        SELECT
        regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),
        SUM(count)
        FROM source table
        WHERE
        regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
        GROUP BY
        regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Jan 2 at 10:26









        GMBGMB

        18.6k31028




        18.6k31028






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54004070%2foracle-is-it-possible-to-trim-a-string-and-count-the-number-of-occurances-and%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            MongoDB - Not Authorized To Execute Command

            in spring boot 2.1 many test slices are not allowed anymore due to multiple @BootstrapWith

            Npm cannot find a required file even through it is in the searched directory