Oracle: is it possible to trim a string and count the number of occurances, and insert to a new table?
My source table looks like this:
id|value|count
Value is a String of values separated by semicolons(;). For example it may look like this
A;B;C;D;
Some may not have values at a certain position, like this
A;;;D;
First, I've selectively moved records to a new table(targettable) based on positions with values using regexp. I achieved this by using [^;]+;
for having some value between the semicolons, and [^;]*;
for those positions I don't care about. For example, if I wanted the 1st and 4th place to have values, I could incorporate regexp with insert into like this
insert into
targettable tt (id, value, count)
SELECT some_seq.nextval,value, count
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
so now my new table has a list of records that have values at the 1st and 4th position. It may look like this
1|A;B;C;D;|2
2|B;;;E;|1
3|A;D;;D|3
Next there are 2 things I want to do. 1. get rid of values other than 1st and 4th. 2.combine identical values and add up their count. For example, record 1 and 3 are the same, so I want to trim so they become A;D;, and then add their count, so 2+3=5. Now my new table looks like this
1|A;D;|5
2|B;E;|1
As long as I can somehow get to the final table from source table, I don't care about the steps. The intermediate table is not required, but it may help me achieve the final result. I'm not sure if I can go any further with Orcale though. If not, I'll have to move and process the records with Java. Bear in mind I have millions of records, so I would consider the Oracle method if it is possible.
oracle
add a comment |
My source table looks like this:
id|value|count
Value is a String of values separated by semicolons(;). For example it may look like this
A;B;C;D;
Some may not have values at a certain position, like this
A;;;D;
First, I've selectively moved records to a new table(targettable) based on positions with values using regexp. I achieved this by using [^;]+;
for having some value between the semicolons, and [^;]*;
for those positions I don't care about. For example, if I wanted the 1st and 4th place to have values, I could incorporate regexp with insert into like this
insert into
targettable tt (id, value, count)
SELECT some_seq.nextval,value, count
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
so now my new table has a list of records that have values at the 1st and 4th position. It may look like this
1|A;B;C;D;|2
2|B;;;E;|1
3|A;D;;D|3
Next there are 2 things I want to do. 1. get rid of values other than 1st and 4th. 2.combine identical values and add up their count. For example, record 1 and 3 are the same, so I want to trim so they become A;D;, and then add their count, so 2+3=5. Now my new table looks like this
1|A;D;|5
2|B;E;|1
As long as I can somehow get to the final table from source table, I don't care about the steps. The intermediate table is not required, but it may help me achieve the final result. I'm not sure if I can go any further with Orcale though. If not, I'll have to move and process the records with Java. Bear in mind I have millions of records, so I would consider the Oracle method if it is possible.
oracle
Should your final entry showB;E;
rather thanA;E;
? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.
– Alex Poole
Jan 2 at 10:00
@AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.
– Andy
Jan 2 at 10:03
add a comment |
My source table looks like this:
id|value|count
Value is a String of values separated by semicolons(;). For example it may look like this
A;B;C;D;
Some may not have values at a certain position, like this
A;;;D;
First, I've selectively moved records to a new table(targettable) based on positions with values using regexp. I achieved this by using [^;]+;
for having some value between the semicolons, and [^;]*;
for those positions I don't care about. For example, if I wanted the 1st and 4th place to have values, I could incorporate regexp with insert into like this
insert into
targettable tt (id, value, count)
SELECT some_seq.nextval,value, count
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
so now my new table has a list of records that have values at the 1st and 4th position. It may look like this
1|A;B;C;D;|2
2|B;;;E;|1
3|A;D;;D|3
Next there are 2 things I want to do. 1. get rid of values other than 1st and 4th. 2.combine identical values and add up their count. For example, record 1 and 3 are the same, so I want to trim so they become A;D;, and then add their count, so 2+3=5. Now my new table looks like this
1|A;D;|5
2|B;E;|1
As long as I can somehow get to the final table from source table, I don't care about the steps. The intermediate table is not required, but it may help me achieve the final result. I'm not sure if I can go any further with Orcale though. If not, I'll have to move and process the records with Java. Bear in mind I have millions of records, so I would consider the Oracle method if it is possible.
oracle
My source table looks like this:
id|value|count
Value is a String of values separated by semicolons(;). For example it may look like this
A;B;C;D;
Some may not have values at a certain position, like this
A;;;D;
First, I've selectively moved records to a new table(targettable) based on positions with values using regexp. I achieved this by using [^;]+;
for having some value between the semicolons, and [^;]*;
for those positions I don't care about. For example, if I wanted the 1st and 4th place to have values, I could incorporate regexp with insert into like this
insert into
targettable tt (id, value, count)
SELECT some_seq.nextval,value, count
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
so now my new table has a list of records that have values at the 1st and 4th position. It may look like this
1|A;B;C;D;|2
2|B;;;E;|1
3|A;D;;D|3
Next there are 2 things I want to do. 1. get rid of values other than 1st and 4th. 2.combine identical values and add up their count. For example, record 1 and 3 are the same, so I want to trim so they become A;D;, and then add their count, so 2+3=5. Now my new table looks like this
1|A;D;|5
2|B;E;|1
As long as I can somehow get to the final table from source table, I don't care about the steps. The intermediate table is not required, but it may help me achieve the final result. I'm not sure if I can go any further with Orcale though. If not, I'll have to move and process the records with Java. Bear in mind I have millions of records, so I would consider the Oracle method if it is possible.
oracle
oracle
edited Jan 2 at 10:01
Andy
asked Jan 2 at 9:40
AndyAndy
847
847
Should your final entry showB;E;
rather thanA;E;
? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.
– Alex Poole
Jan 2 at 10:00
@AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.
– Andy
Jan 2 at 10:03
add a comment |
Should your final entry showB;E;
rather thanA;E;
? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.
– Alex Poole
Jan 2 at 10:00
@AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.
– Andy
Jan 2 at 10:03
Should your final entry show
B;E;
rather than A;E;
? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.– Alex Poole
Jan 2 at 10:00
Should your final entry show
B;E;
rather than A;E;
? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.– Alex Poole
Jan 2 at 10:00
@AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.
– Andy
Jan 2 at 10:03
@AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.
– Andy
Jan 2 at 10:03
add a comment |
2 Answers
2
active
oldest
votes
You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr()
function, while checking that those are not null:
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;
VALUE COUNT
------------------ ----------
A;D; 2
B;E; 1
A;D; 3
and then aggregate those results:
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value;
VALUE COUNT
------------------ ----------
A;D; 5
B;E; 1
Then for your insert you can use that query, either with an auto-increment ID (12c+), or setting an ID from a sequence via a trigger, or possibly wrapped in another level of subquery to get the value explicitly:
insert into target (id, value, count)
select some_seq.nextval, value, count
from (
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value
);
If you're creating a new sequence to do that, so they start from 1, you can use rownum
or row_number()
instead.
Incidentally, using a keyword or a function name like count
as a column name is confusing (sum(count)
!?); those might not be your real names though.
I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!
– Andy
Jan 3 at 2:03
I should probably rename count. You are right, it is confusing now that you mentioned it.
– Andy
Jan 3 at 2:16
1
It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster withinstr
/substr
instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.
– Alex Poole
Jan 3 at 9:09
It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.
– Andy
Jan 4 at 3:15
add a comment |
I would use regexp_replace
to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :
SELECT
regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),
SUM(count)
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
GROUP BY
regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54004070%2foracle-is-it-possible-to-trim-a-string-and-count-the-number-of-occurances-and%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr()
function, while checking that those are not null:
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;
VALUE COUNT
------------------ ----------
A;D; 2
B;E; 1
A;D; 3
and then aggregate those results:
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value;
VALUE COUNT
------------------ ----------
A;D; 5
B;E; 1
Then for your insert you can use that query, either with an auto-increment ID (12c+), or setting an ID from a sequence via a trigger, or possibly wrapped in another level of subquery to get the value explicitly:
insert into target (id, value, count)
select some_seq.nextval, value, count
from (
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value
);
If you're creating a new sequence to do that, so they start from 1, you can use rownum
or row_number()
instead.
Incidentally, using a keyword or a function name like count
as a column name is confusing (sum(count)
!?); those might not be your real names though.
I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!
– Andy
Jan 3 at 2:03
I should probably rename count. You are right, it is confusing now that you mentioned it.
– Andy
Jan 3 at 2:16
1
It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster withinstr
/substr
instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.
– Alex Poole
Jan 3 at 9:09
It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.
– Andy
Jan 4 at 3:15
add a comment |
You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr()
function, while checking that those are not null:
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;
VALUE COUNT
------------------ ----------
A;D; 2
B;E; 1
A;D; 3
and then aggregate those results:
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value;
VALUE COUNT
------------------ ----------
A;D; 5
B;E; 1
Then for your insert you can use that query, either with an auto-increment ID (12c+), or setting an ID from a sequence via a trigger, or possibly wrapped in another level of subquery to get the value explicitly:
insert into target (id, value, count)
select some_seq.nextval, value, count
from (
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value
);
If you're creating a new sequence to do that, so they start from 1, you can use rownum
or row_number()
instead.
Incidentally, using a keyword or a function name like count
as a column name is confusing (sum(count)
!?); those might not be your real names though.
I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!
– Andy
Jan 3 at 2:03
I should probably rename count. You are right, it is confusing now that you mentioned it.
– Andy
Jan 3 at 2:16
1
It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster withinstr
/substr
instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.
– Alex Poole
Jan 3 at 9:09
It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.
– Andy
Jan 4 at 3:15
add a comment |
You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr()
function, while checking that those are not null:
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;
VALUE COUNT
------------------ ----------
A;D; 2
B;E; 1
A;D; 3
and then aggregate those results:
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value;
VALUE COUNT
------------------ ----------
A;D; 5
B;E; 1
Then for your insert you can use that query, either with an auto-increment ID (12c+), or setting an ID from a sequence via a trigger, or possibly wrapped in another level of subquery to get the value explicitly:
insert into target (id, value, count)
select some_seq.nextval, value, count
from (
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value
);
If you're creating a new sequence to do that, so they start from 1, you can use rownum
or row_number()
instead.
Incidentally, using a keyword or a function name like count
as a column name is confusing (sum(count)
!?); those might not be your real names though.
You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr()
function, while checking that those are not null:
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;
VALUE COUNT
------------------ ----------
A;D; 2
B;E; 1
A;D; 3
and then aggregate those results:
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value;
VALUE COUNT
------------------ ----------
A;D; 5
B;E; 1
Then for your insert you can use that query, either with an auto-increment ID (12c+), or setting an ID from a sequence via a trigger, or possibly wrapped in another level of subquery to get the value explicitly:
insert into target (id, value, count)
select some_seq.nextval, value, count
from (
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value
);
If you're creating a new sequence to do that, so they start from 1, you can use rownum
or row_number()
instead.
Incidentally, using a keyword or a function name like count
as a column name is confusing (sum(count)
!?); those might not be your real names though.
answered Jan 2 at 10:14
Alex PooleAlex Poole
134k6108181
134k6108181
I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!
– Andy
Jan 3 at 2:03
I should probably rename count. You are right, it is confusing now that you mentioned it.
– Andy
Jan 3 at 2:16
1
It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster withinstr
/substr
instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.
– Alex Poole
Jan 3 at 9:09
It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.
– Andy
Jan 4 at 3:15
add a comment |
I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!
– Andy
Jan 3 at 2:03
I should probably rename count. You are right, it is confusing now that you mentioned it.
– Andy
Jan 3 at 2:16
1
It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster withinstr
/substr
instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.
– Alex Poole
Jan 3 at 9:09
It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.
– Andy
Jan 4 at 3:15
I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!
– Andy
Jan 3 at 2:03
I really like this method. Now I just need to find a way to dynamically create the sql with java, but I think I'll manage that. Also, if the source table had 5 million rows, how much time do you think the sql will take to execute? Do you think it's pretty efficient? I only have a small test database so I can't verify this atm, but I think it will still be faster than anything I write. Also, your step by step explanation is fantastic. Thank you!
– Andy
Jan 3 at 2:03
I should probably rename count. You are right, it is confusing now that you mentioned it.
– Andy
Jan 3 at 2:16
I should probably rename count. You are right, it is confusing now that you mentioned it.
– Andy
Jan 3 at 2:16
1
1
It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with
instr
/substr
instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.– Alex Poole
Jan 3 at 9:09
It won't be very efficient, but you're hampered by your data model - storing multiple values in a single column like that isn't what you're supposed to do. (You should be able to do something a bit faster with
instr
/substr
instead as regular expressions are slower.) If this is something you'll need to do often and/or with different positions then you'd be better off redesigning your table; or if you can't, create a materialized view of the deconstructed values, or maybe add function-based indexes for each extracted element.– Alex Poole
Jan 3 at 9:09
It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.
– Andy
Jan 4 at 3:15
It's pretty much a one-time thing so as long as it doesn't take days, I'm okay with that. I'll test it out next week. Thank you again.
– Andy
Jan 4 at 3:15
add a comment |
I would use regexp_replace
to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :
SELECT
regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),
SUM(count)
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
GROUP BY
regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')
add a comment |
I would use regexp_replace
to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :
SELECT
regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),
SUM(count)
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
GROUP BY
regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')
add a comment |
I would use regexp_replace
to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :
SELECT
regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),
SUM(count)
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
GROUP BY
regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')
I would use regexp_replace
to remove the 2nd and 3rd parts of the string, combined with an aggregate query to get the total count, like :
SELECT
regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', ''),
SUM(count)
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
GROUP BY
regexp_replace(value, '^[^;]+;([^;]*;[^;]*;)[^;]+;', '')
answered Jan 2 at 10:26


GMBGMB
18.6k31028
18.6k31028
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54004070%2foracle-is-it-possible-to-trim-a-string-and-count-the-number-of-occurances-and%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Should your final entry show
B;E;
rather thanA;E;
? And how do you decide which ID value to keep when you consolidate duplicates - or is that arbitrary? More sample data, including the count values and rows that will be excluded, might be useful.– Alex Poole
Jan 2 at 10:00
@AlexPoole yes B;E; sorry my bad. I corrected it. ID is not important, it can be any generated numbers.
– Andy
Jan 2 at 10:03