Oracle: is it possible to trim a string and count the number of occurances, and insert to a new table?

My source table looks like this:

id|value|count

Value is a String of values separated by semicolons(;). For example it may look like this

A;B;C;D;

Some may not have values at a certain position, like this

A;;;D;

First, I've selectively moved records to a new table(targettable) based on positions with values using regexp. I achieved this by using [^;]+; for having some value between the semicolons, and [^;]*; for those positions I don't care about. For example, if I wanted the 1st and 4th place to have values, I could incorporate regexp with insert into like this

insert into

    targettable tt (id, value, count)

    SELECT some_seq.nextval,value, count

    FROM source table

    WHERE

    regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')

so now my new table has a list of records that have values at the 1st and 4th position. It may look like this

1|A;B;C;D;|2

2|B;;;E;|1

3|A;D;;D|3

Next there are 2 things I want to do. 1. get rid of values other than 1st and 4th. 2.combine identical values and add up their count. For example, record 1 and 3 are the same, so I want to trim so they become A;D;, and then add their count, so 2+3=5. Now my new table looks like this

1|A;D;|5

2|B;E;|1

As long as I can somehow get to the final table from source table, I don't care about the steps. The intermediate table is not required, but it may help me achieve the final result. I'm not sure if I can go any further with Orcale though. If not, I'll have to move and process the records with Java. Bear in mind I have millions of records, so I would consider the Oracle method if it is possible.

edited Jan 2 at 10:01

asked Jan 2 at 9:40

Andy

847

Should your final entry show B;E; rather than A;E;? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.

– Alex Poole
Jan 2 at 10:00

@AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.

– Andy
Jan 2 at 10:03

add a comment |

My source table looks like this:

id|value|count

Value is a String of values separated by semicolons(;). For example it may look like this

A;B;C;D;

Some may not have values at a certain position, like this

A;;;D;

insert into

    targettable tt (id, value, count)

    SELECT some_seq.nextval,value, count

    FROM source table

    WHERE

    regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')

so now my new table has a list of records that have values at the 1st and 4th position. It may look like this

1|A;B;C;D;|2

2|B;;;E;|1

3|A;D;;D|3

1|A;D;|5

2|B;E;|1

edited Jan 2 at 10:01

asked Jan 2 at 9:40

Andy

847

Should your final entry show B;E; rather than A;E;? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.

– Alex Poole
Jan 2 at 10:00

@AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.

– Andy
Jan 2 at 10:03

add a comment |

My source table looks like this:

id|value|count

Value is a String of values separated by semicolons(;). For example it may look like this

A;B;C;D;

Some may not have values at a certain position, like this

A;;;D;

insert into

    targettable tt (id, value, count)

    SELECT some_seq.nextval,value, count

    FROM source table

    WHERE

    regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')

so now my new table has a list of records that have values at the 1st and 4th position. It may look like this

1|A;B;C;D;|2

2|B;;;E;|1

3|A;D;;D|3

1|A;D;|5

2|B;E;|1

edited Jan 2 at 10:01

asked Jan 2 at 9:40

Andy

847

My source table looks like this:

id|value|count

Value is a String of values separated by semicolons(;). For example it may look like this

A;B;C;D;

Some may not have values at a certain position, like this

A;;;D;

insert into

    targettable tt (id, value, count)

    SELECT some_seq.nextval,value, count

    FROM source table

    WHERE

    regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')

so now my new table has a list of records that have values at the 1st and 4th position. It may look like this

1|A;B;C;D;|2

2|B;;;E;|1

3|A;D;;D|3

1|A;D;|5

2|B;E;|1

oracle

edited Jan 2 at 10:01

asked Jan 2 at 9:40

Andy

847

edited Jan 2 at 10:01

asked Jan 2 at 9:40

Andy

847

edited Jan 2 at 10:01

asked Jan 2 at 9:40

Andy

847

asked Jan 2 at 9:40

Andy

847

asked Jan 2 at 9:40

Andy

847

Should your final entry show B;E; rather than A;E;? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.

– Alex Poole
Jan 2 at 10:00

@AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.

– Andy
Jan 2 at 10:03

add a comment |

Should your final entry show B;E; rather than A;E;? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.

– Alex Poole
Jan 2 at 10:00

@AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.

– Andy
Jan 2 at 10:03

Should your final entry show B;E; rather than A;E;? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.

– Alex Poole
Jan 2 at 10:00

@AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.

– Andy
Jan 2 at 10:03

add a comment |

2 Answers
2

active

oldest

votes

You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr() function, while checking that those are not null:

select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

  || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

  || ';' as value, -- if you want trailing semicolon

  count

from source

where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;



VALUE                   COUNT

------------------ ----------

A;D;                        2

B;E;                        1

A;D;                        3

and then aggregate those results:

select value, sum(count) as count

from (

  select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

    || ';' as value, -- if you want trailing semicolon

    count

  from source

  where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

  and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null

)

group by value;



VALUE                   COUNT

------------------ ----------

A;D;                        5

B;E;                        1

Then for your insert you can use that query, either with an auto-increment ID (12c+), or setting an ID from a sequence via a trigger, or possibly wrapped in another level of subquery to get the value explicitly:

insert into target (id, value, count)

select some_seq.nextval, value, count

from (

  select value, sum(count) as count

  from (

    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

      || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

      || ';' as value, -- if you want trailing semicolon

      count

    from source

    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null

  )

  group by value

);

If you're creating a new sequence to do that, so they start from 1, you can use rownum or row_number() instead.

Incidentally, using a keyword or a function name like count as a column name is confusing (sum(count) !?); those might not be your real names though.

answered Jan 2 at 10:14

Alex Poole

134k6108181

I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!

– Andy
Jan 3 at 2:03

I should probably rename count. You are right, it is confusing now that you mentioned it.

– Andy
Jan 3 at 2:16

1

It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with instr/substr instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.

– Alex Poole
Jan 3 at 9:09

It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.

– Andy
Jan 4 at 3:15

add a comment |

I would use regexp_replace to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :

SELECT

    regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),

    SUM(count)

FROM source table

WHERE

    regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')

GROUP BY

    regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')

answered Jan 2 at 10:26

GMB

18.6k31028

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54004070%2foracle-is-it-possible-to-trim-a-string-and-count-the-number-of-occurances-and%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr() function, while checking that those are not null:

select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

  || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

  || ';' as value, -- if you want trailing semicolon

  count

from source

where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;



VALUE                   COUNT

------------------ ----------

A;D;                        2

B;E;                        1

A;D;                        3

and then aggregate those results:

select value, sum(count) as count

from (

  select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

    || ';' as value, -- if you want trailing semicolon

    count

  from source

  where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

  and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null

)

group by value;



VALUE                   COUNT

------------------ ----------

A;D;                        5

B;E;                        1

insert into target (id, value, count)

select some_seq.nextval, value, count

from (

  select value, sum(count) as count

  from (

    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

      || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

      || ';' as value, -- if you want trailing semicolon

      count

    from source

    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null

  )

  group by value

);

If you're creating a new sequence to do that, so they start from 1, you can use rownum or row_number() instead.

Incidentally, using a keyword or a function name like count as a column name is confusing (sum(count) !?); those might not be your real names though.

answered Jan 2 at 10:14

Alex Poole

134k6108181

I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!

– Andy
Jan 3 at 2:03

I should probably rename count. You are right, it is confusing now that you mentioned it.

– Andy
Jan 3 at 2:16

1

It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with instr/substr instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.

– Alex Poole
Jan 3 at 9:09

It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.

– Andy
Jan 4 at 3:15

add a comment |

You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr() function, while checking that those are not null:

select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

  || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

  || ';' as value, -- if you want trailing semicolon

  count

from source

where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;



VALUE                   COUNT

------------------ ----------

A;D;                        2

B;E;                        1

A;D;                        3

and then aggregate those results:

select value, sum(count) as count

from (

  select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

    || ';' as value, -- if you want trailing semicolon

    count

  from source

  where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

  and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null

)

group by value;



VALUE                   COUNT

------------------ ----------

A;D;                        5

B;E;                        1

insert into target (id, value, count)

select some_seq.nextval, value, count

from (

  select value, sum(count) as count

  from (

    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

      || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

      || ';' as value, -- if you want trailing semicolon

      count

    from source

    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null

  )

  group by value

);

If you're creating a new sequence to do that, so they start from 1, you can use rownum or row_number() instead.

Incidentally, using a keyword or a function name like count as a column name is confusing (sum(count) !?); those might not be your real names though.

answered Jan 2 at 10:14

Alex Poole

134k6108181

I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!

– Andy
Jan 3 at 2:03

I should probably rename count. You are right, it is confusing now that you mentioned it.

– Andy
Jan 3 at 2:16

1

It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with instr/substr instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.

– Alex Poole
Jan 3 at 9:09

It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.

– Andy
Jan 4 at 3:15

add a comment |

You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr() function, while checking that those are not null:

select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

  || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

  || ';' as value, -- if you want trailing semicolon

  count

from source

where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;



VALUE                   COUNT

------------------ ----------

A;D;                        2

B;E;                        1

A;D;                        3

and then aggregate those results:

select value, sum(count) as count

from (

  select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

    || ';' as value, -- if you want trailing semicolon

    count

  from source

  where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

  and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null

)

group by value;



VALUE                   COUNT

------------------ ----------

A;D;                        5

B;E;                        1

insert into target (id, value, count)

select some_seq.nextval, value, count

from (

  select value, sum(count) as count

  from (

    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

      || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

      || ';' as value, -- if you want trailing semicolon

      count

    from source

    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null

  )

  group by value

);

If you're creating a new sequence to do that, so they start from 1, you can use rownum or row_number() instead.

Incidentally, using a keyword or a function name like count as a column name is confusing (sum(count) !?); those might not be your real names though.

answered Jan 2 at 10:14

Alex Poole

134k6108181

You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr() function, while checking that those are not null:

select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

  || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

  || ';' as value, -- if you want trailing semicolon

  count

from source

where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;



VALUE                   COUNT

------------------ ----------

A;D;                        2

B;E;                        1

A;D;                        3

and then aggregate those results:

select value, sum(count) as count

from (

  select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

    || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

    || ';' as value, -- if you want trailing semicolon

    count

  from source

  where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

  and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null

)

group by value;



VALUE                   COUNT

------------------ ----------

A;D;                        5

B;E;                        1

insert into target (id, value, count)

select some_seq.nextval, value, count

from (

  select value, sum(count) as count

  from (

    select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position

      || ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position

      || ';' as value, -- if you want trailing semicolon

      count

    from source

    where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null

    and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null

  )

  group by value

);

If you're creating a new sequence to do that, so they start from 1, you can use rownum or row_number() instead.

Incidentally, using a keyword or a function name like count as a column name is confusing (sum(count) !?); those might not be your real names though.

answered Jan 2 at 10:14

Alex Poole

134k6108181

answered Jan 2 at 10:14

Alex Poole

134k6108181

answered Jan 2 at 10:14

Alex Poole

134k6108181

answered Jan 2 at 10:14

Alex Poole

134k6108181

I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!

– Andy
Jan 3 at 2:03

I should probably rename count. You are right, it is confusing now that you mentioned it.

– Andy
Jan 3 at 2:16

1

It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with instr/substr instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.

– Alex Poole
Jan 3 at 9:09

It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.

– Andy
Jan 4 at 3:15

add a comment |

I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!

– Andy
Jan 3 at 2:03

I should probably rename count. You are right, it is confusing now that you mentioned it.

– Andy
Jan 3 at 2:16

1

It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with instr/substr instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.

– Alex Poole
Jan 3 at 9:09

It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.

– Andy
Jan 4 at 3:15

I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!

– Andy
Jan 3 at 2:03

I should probably rename count. You are right, it is confusing now that you mentioned it.

– Andy
Jan 3 at 2:16

It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with instr/substr instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.

– Alex Poole
Jan 3 at 9:09

It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.

– Andy
Jan 4 at 3:15

add a comment |

I would use regexp_replace to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :

SELECT

    regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),

    SUM(count)

FROM source table

WHERE

    regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')

GROUP BY

    regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')

answered Jan 2 at 10:26

GMB

18.6k31028

add a comment |

I would use regexp_replace to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :

SELECT

    regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),

    SUM(count)

FROM source table

WHERE

    regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')

GROUP BY

    regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')

answered Jan 2 at 10:26

GMB

18.6k31028

add a comment |

I would use regexp_replace to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :

SELECT

    regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),

    SUM(count)

FROM source table

WHERE

    regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')

GROUP BY

    regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')

answered Jan 2 at 10:26

GMB

18.6k31028

I would use regexp_replace to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :

SELECT

    regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),

    SUM(count)

FROM source table

WHERE

    regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')

GROUP BY

    regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')

answered Jan 2 at 10:26

GMB

18.6k31028

answered Jan 2 at 10:26

GMB

18.6k31028

answered Jan 2 at 10:26

GMB

18.6k31028

answered Jan 2 at 10:26

GMB

18.6k31028

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu