Partitioning data for a timestamp query
I have paritioned data on s3 I would like to access via spectrum. The current format file structure is similar to: s3://bucket/dir/year=2018/month=11/day=19/hour=12/file.parquet
I partitioned the data using glue, by parsing a field I use for timestamps, ts
. Most queries I will do will be on the ts
field, as they are timestamp range queries that are more granular than daily(may span multiple days, or less than one day, but time is often involved.
How would I go about creating hourly(preferred, daily would work if needed) partitions on my data so when I query the ts
(or another timestamp) field, it will access the partitions correctly. If needed I can recreate my data with different paritions. Most examples/docs just bucket data daily, and use the date field in the query.
I would be happy to provide more information if needed.
Thank you!
Example query would be something like:
SELECT * FROM spectrum.data
WHERE ts between '2018-11-19 17:30:00' AND '2018-11-20 04:45:00'
amazon-redshift amazon-redshift-spectrum
add a comment |
I have paritioned data on s3 I would like to access via spectrum. The current format file structure is similar to: s3://bucket/dir/year=2018/month=11/day=19/hour=12/file.parquet
I partitioned the data using glue, by parsing a field I use for timestamps, ts
. Most queries I will do will be on the ts
field, as they are timestamp range queries that are more granular than daily(may span multiple days, or less than one day, but time is often involved.
How would I go about creating hourly(preferred, daily would work if needed) partitions on my data so when I query the ts
(or another timestamp) field, it will access the partitions correctly. If needed I can recreate my data with different paritions. Most examples/docs just bucket data daily, and use the date field in the query.
I would be happy to provide more information if needed.
Thank you!
Example query would be something like:
SELECT * FROM spectrum.data
WHERE ts between '2018-11-19 17:30:00' AND '2018-11-20 04:45:00'
amazon-redshift amazon-redshift-spectrum
please share your example query?
– Jon Scott
Nov 20 '18 at 18:23
Updated the original post, let me know if anything is unclear.
– Eumcoz
Nov 20 '18 at 19:11
add a comment |
I have paritioned data on s3 I would like to access via spectrum. The current format file structure is similar to: s3://bucket/dir/year=2018/month=11/day=19/hour=12/file.parquet
I partitioned the data using glue, by parsing a field I use for timestamps, ts
. Most queries I will do will be on the ts
field, as they are timestamp range queries that are more granular than daily(may span multiple days, or less than one day, but time is often involved.
How would I go about creating hourly(preferred, daily would work if needed) partitions on my data so when I query the ts
(or another timestamp) field, it will access the partitions correctly. If needed I can recreate my data with different paritions. Most examples/docs just bucket data daily, and use the date field in the query.
I would be happy to provide more information if needed.
Thank you!
Example query would be something like:
SELECT * FROM spectrum.data
WHERE ts between '2018-11-19 17:30:00' AND '2018-11-20 04:45:00'
amazon-redshift amazon-redshift-spectrum
I have paritioned data on s3 I would like to access via spectrum. The current format file structure is similar to: s3://bucket/dir/year=2018/month=11/day=19/hour=12/file.parquet
I partitioned the data using glue, by parsing a field I use for timestamps, ts
. Most queries I will do will be on the ts
field, as they are timestamp range queries that are more granular than daily(may span multiple days, or less than one day, but time is often involved.
How would I go about creating hourly(preferred, daily would work if needed) partitions on my data so when I query the ts
(or another timestamp) field, it will access the partitions correctly. If needed I can recreate my data with different paritions. Most examples/docs just bucket data daily, and use the date field in the query.
I would be happy to provide more information if needed.
Thank you!
Example query would be something like:
SELECT * FROM spectrum.data
WHERE ts between '2018-11-19 17:30:00' AND '2018-11-20 04:45:00'
amazon-redshift amazon-redshift-spectrum
amazon-redshift amazon-redshift-spectrum
edited Nov 20 '18 at 19:11
Eumcoz
asked Nov 19 '18 at 17:44
EumcozEumcoz
1,5351229
1,5351229
please share your example query?
– Jon Scott
Nov 20 '18 at 18:23
Updated the original post, let me know if anything is unclear.
– Eumcoz
Nov 20 '18 at 19:11
add a comment |
please share your example query?
– Jon Scott
Nov 20 '18 at 18:23
Updated the original post, let me know if anything is unclear.
– Eumcoz
Nov 20 '18 at 19:11
please share your example query?
– Jon Scott
Nov 20 '18 at 18:23
please share your example query?
– Jon Scott
Nov 20 '18 at 18:23
Updated the original post, let me know if anything is unclear.
– Eumcoz
Nov 20 '18 at 19:11
Updated the original post, let me know if anything is unclear.
– Eumcoz
Nov 20 '18 at 19:11
add a comment |
1 Answer
1
active
oldest
votes
Spectrum is not so intuitive. You probably will need to convert timestamp to year, month, day ...
And than do something like WHERE (year > x AND year < y) AND (month > x1 AND month < x2) AND ...
Looks ugly.
You can consider doing something else :
- s3://bucket/dir/date=2018-11-19/time=17:30:00/file.parquet
In that case your query will be more simple
WHERE ( date < '2018-11-19' AND date > '2018-11-17') AND ( time < '17:30:00' AND time > '17:20:00')
OR using BETWEEN
https://docs.aws.amazon.com/redshift/latest/dg/r_range_condition.html
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53380050%2fpartitioning-data-for-a-timestamp-query%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Spectrum is not so intuitive. You probably will need to convert timestamp to year, month, day ...
And than do something like WHERE (year > x AND year < y) AND (month > x1 AND month < x2) AND ...
Looks ugly.
You can consider doing something else :
- s3://bucket/dir/date=2018-11-19/time=17:30:00/file.parquet
In that case your query will be more simple
WHERE ( date < '2018-11-19' AND date > '2018-11-17') AND ( time < '17:30:00' AND time > '17:20:00')
OR using BETWEEN
https://docs.aws.amazon.com/redshift/latest/dg/r_range_condition.html
add a comment |
Spectrum is not so intuitive. You probably will need to convert timestamp to year, month, day ...
And than do something like WHERE (year > x AND year < y) AND (month > x1 AND month < x2) AND ...
Looks ugly.
You can consider doing something else :
- s3://bucket/dir/date=2018-11-19/time=17:30:00/file.parquet
In that case your query will be more simple
WHERE ( date < '2018-11-19' AND date > '2018-11-17') AND ( time < '17:30:00' AND time > '17:20:00')
OR using BETWEEN
https://docs.aws.amazon.com/redshift/latest/dg/r_range_condition.html
add a comment |
Spectrum is not so intuitive. You probably will need to convert timestamp to year, month, day ...
And than do something like WHERE (year > x AND year < y) AND (month > x1 AND month < x2) AND ...
Looks ugly.
You can consider doing something else :
- s3://bucket/dir/date=2018-11-19/time=17:30:00/file.parquet
In that case your query will be more simple
WHERE ( date < '2018-11-19' AND date > '2018-11-17') AND ( time < '17:30:00' AND time > '17:20:00')
OR using BETWEEN
https://docs.aws.amazon.com/redshift/latest/dg/r_range_condition.html
Spectrum is not so intuitive. You probably will need to convert timestamp to year, month, day ...
And than do something like WHERE (year > x AND year < y) AND (month > x1 AND month < x2) AND ...
Looks ugly.
You can consider doing something else :
- s3://bucket/dir/date=2018-11-19/time=17:30:00/file.parquet
In that case your query will be more simple
WHERE ( date < '2018-11-19' AND date > '2018-11-17') AND ( time < '17:30:00' AND time > '17:20:00')
OR using BETWEEN
https://docs.aws.amazon.com/redshift/latest/dg/r_range_condition.html
answered Dec 11 '18 at 8:44


yuriy polonskiyyuriy polonskiy
112
112
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53380050%2fpartitioning-data-for-a-timestamp-query%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
please share your example query?
– Jon Scott
Nov 20 '18 at 18:23
Updated the original post, let me know if anything is unclear.
– Eumcoz
Nov 20 '18 at 19:11