Is there a way to group timestamp data by 30 day intervals starting from the min(date) and add them as...
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I am trying to use the min() value of a timestamp as a starting point and then group data by 30 day intervals in order to get a count of occurrences for each unique value within the timestamp date range as columns
i have two tables that i am joining together to get a count. Table 1 (page_creation) has 2 columns labeled link and dt_crtd. Table 2(page visits) has 2 other columns labeled url and date. the tables are being joined by joining table1.link = table2.pagevisits.
After the join i get a table similar to this:
+-------------------+------------------------+
| url | date |
+-------------------+------------------------+
| www.google.com | 2018-01-01 00:00:00' |
| www.google.com | 2018-01-02 00:00:00' |
| www.google.com | 2018-02-01 00:00:00' |
| www.google.com | 2018-02-05 00:00:00' |
| www.google.com | 2018-03-04 00:00:00' |
| www.facebook.com | 2014-01-05 00:00:00' |
| www.facebook.com | 2014-01-07 00:00:00' |
| www.facebook.com | 2014-04-02 00:00:00' |
| www.facebook.com | 2014-04-10 00:00:00' |
| www.facebook.com | 2014-04-11 00:00:00' |
| www.facebook.com | 2014-05-01 00:00:00' |
| www.twitter.com | 2016-02-01 00:00:00' |
| www.twitter.com | 2016-03-04 00:00:00' |
+---------------------+----------------------+
what i am trying to get is results that pull this :
+-------------------+------------------------+------------+------------+-------------+
| url | MIN_Date | Interval 1 | Interval 2| Interval 3 |
+-------------------+------------------------+-------------+-----------+-------------+
| www.google.com | 2018-01-01 00:00:00' | 2 | 2 | 1
| www.facebook.com | 2014-01-05 00:00:00' | 2 | 0 | 1
| www.twitter.com | 2016-02-01 00:00:00' | 1 | 1 | 0
+---------------------+----------------------+-------------+-----------+-------------+
So the 30 day intervals begin from the min(date) as shown in Interval 1 and are counted every 30 days.
Ive looked at other questions such as :
Group rows by 7 days interval starting from a certain date
MySQL query to select min datetime grouped by 30 day intervals
However it did not seem to answer my specific problem.
Ive also looked into pivot syntax but noticed it is only supported for certain DBMS.
Any help would be greatly appreciated.
Thank you.
sql google-bigquery google-query-language
add a comment |
I am trying to use the min() value of a timestamp as a starting point and then group data by 30 day intervals in order to get a count of occurrences for each unique value within the timestamp date range as columns
i have two tables that i am joining together to get a count. Table 1 (page_creation) has 2 columns labeled link and dt_crtd. Table 2(page visits) has 2 other columns labeled url and date. the tables are being joined by joining table1.link = table2.pagevisits.
After the join i get a table similar to this:
+-------------------+------------------------+
| url | date |
+-------------------+------------------------+
| www.google.com | 2018-01-01 00:00:00' |
| www.google.com | 2018-01-02 00:00:00' |
| www.google.com | 2018-02-01 00:00:00' |
| www.google.com | 2018-02-05 00:00:00' |
| www.google.com | 2018-03-04 00:00:00' |
| www.facebook.com | 2014-01-05 00:00:00' |
| www.facebook.com | 2014-01-07 00:00:00' |
| www.facebook.com | 2014-04-02 00:00:00' |
| www.facebook.com | 2014-04-10 00:00:00' |
| www.facebook.com | 2014-04-11 00:00:00' |
| www.facebook.com | 2014-05-01 00:00:00' |
| www.twitter.com | 2016-02-01 00:00:00' |
| www.twitter.com | 2016-03-04 00:00:00' |
+---------------------+----------------------+
what i am trying to get is results that pull this :
+-------------------+------------------------+------------+------------+-------------+
| url | MIN_Date | Interval 1 | Interval 2| Interval 3 |
+-------------------+------------------------+-------------+-----------+-------------+
| www.google.com | 2018-01-01 00:00:00' | 2 | 2 | 1
| www.facebook.com | 2014-01-05 00:00:00' | 2 | 0 | 1
| www.twitter.com | 2016-02-01 00:00:00' | 1 | 1 | 0
+---------------------+----------------------+-------------+-----------+-------------+
So the 30 day intervals begin from the min(date) as shown in Interval 1 and are counted every 30 days.
Ive looked at other questions such as :
Group rows by 7 days interval starting from a certain date
MySQL query to select min datetime grouped by 30 day intervals
However it did not seem to answer my specific problem.
Ive also looked into pivot syntax but noticed it is only supported for certain DBMS.
Any help would be greatly appreciated.
Thank you.
sql google-bigquery google-query-language
are you looking to get a variable number of columns, depending on how many 30 day intervals are found?
– Felipe Hoffa
Jan 3 at 7:03
add a comment |
I am trying to use the min() value of a timestamp as a starting point and then group data by 30 day intervals in order to get a count of occurrences for each unique value within the timestamp date range as columns
i have two tables that i am joining together to get a count. Table 1 (page_creation) has 2 columns labeled link and dt_crtd. Table 2(page visits) has 2 other columns labeled url and date. the tables are being joined by joining table1.link = table2.pagevisits.
After the join i get a table similar to this:
+-------------------+------------------------+
| url | date |
+-------------------+------------------------+
| www.google.com | 2018-01-01 00:00:00' |
| www.google.com | 2018-01-02 00:00:00' |
| www.google.com | 2018-02-01 00:00:00' |
| www.google.com | 2018-02-05 00:00:00' |
| www.google.com | 2018-03-04 00:00:00' |
| www.facebook.com | 2014-01-05 00:00:00' |
| www.facebook.com | 2014-01-07 00:00:00' |
| www.facebook.com | 2014-04-02 00:00:00' |
| www.facebook.com | 2014-04-10 00:00:00' |
| www.facebook.com | 2014-04-11 00:00:00' |
| www.facebook.com | 2014-05-01 00:00:00' |
| www.twitter.com | 2016-02-01 00:00:00' |
| www.twitter.com | 2016-03-04 00:00:00' |
+---------------------+----------------------+
what i am trying to get is results that pull this :
+-------------------+------------------------+------------+------------+-------------+
| url | MIN_Date | Interval 1 | Interval 2| Interval 3 |
+-------------------+------------------------+-------------+-----------+-------------+
| www.google.com | 2018-01-01 00:00:00' | 2 | 2 | 1
| www.facebook.com | 2014-01-05 00:00:00' | 2 | 0 | 1
| www.twitter.com | 2016-02-01 00:00:00' | 1 | 1 | 0
+---------------------+----------------------+-------------+-----------+-------------+
So the 30 day intervals begin from the min(date) as shown in Interval 1 and are counted every 30 days.
Ive looked at other questions such as :
Group rows by 7 days interval starting from a certain date
MySQL query to select min datetime grouped by 30 day intervals
However it did not seem to answer my specific problem.
Ive also looked into pivot syntax but noticed it is only supported for certain DBMS.
Any help would be greatly appreciated.
Thank you.
sql google-bigquery google-query-language
I am trying to use the min() value of a timestamp as a starting point and then group data by 30 day intervals in order to get a count of occurrences for each unique value within the timestamp date range as columns
i have two tables that i am joining together to get a count. Table 1 (page_creation) has 2 columns labeled link and dt_crtd. Table 2(page visits) has 2 other columns labeled url and date. the tables are being joined by joining table1.link = table2.pagevisits.
After the join i get a table similar to this:
+-------------------+------------------------+
| url | date |
+-------------------+------------------------+
| www.google.com | 2018-01-01 00:00:00' |
| www.google.com | 2018-01-02 00:00:00' |
| www.google.com | 2018-02-01 00:00:00' |
| www.google.com | 2018-02-05 00:00:00' |
| www.google.com | 2018-03-04 00:00:00' |
| www.facebook.com | 2014-01-05 00:00:00' |
| www.facebook.com | 2014-01-07 00:00:00' |
| www.facebook.com | 2014-04-02 00:00:00' |
| www.facebook.com | 2014-04-10 00:00:00' |
| www.facebook.com | 2014-04-11 00:00:00' |
| www.facebook.com | 2014-05-01 00:00:00' |
| www.twitter.com | 2016-02-01 00:00:00' |
| www.twitter.com | 2016-03-04 00:00:00' |
+---------------------+----------------------+
what i am trying to get is results that pull this :
+-------------------+------------------------+------------+------------+-------------+
| url | MIN_Date | Interval 1 | Interval 2| Interval 3 |
+-------------------+------------------------+-------------+-----------+-------------+
| www.google.com | 2018-01-01 00:00:00' | 2 | 2 | 1
| www.facebook.com | 2014-01-05 00:00:00' | 2 | 0 | 1
| www.twitter.com | 2016-02-01 00:00:00' | 1 | 1 | 0
+---------------------+----------------------+-------------+-----------+-------------+
So the 30 day intervals begin from the min(date) as shown in Interval 1 and are counted every 30 days.
Ive looked at other questions such as :
Group rows by 7 days interval starting from a certain date
MySQL query to select min datetime grouped by 30 day intervals
However it did not seem to answer my specific problem.
Ive also looked into pivot syntax but noticed it is only supported for certain DBMS.
Any help would be greatly appreciated.
Thank you.
sql google-bigquery google-query-language
sql google-bigquery google-query-language
asked Jan 3 at 5:13
CCPCCP
439
439
are you looking to get a variable number of columns, depending on how many 30 day intervals are found?
– Felipe Hoffa
Jan 3 at 7:03
add a comment |
are you looking to get a variable number of columns, depending on how many 30 day intervals are found?
– Felipe Hoffa
Jan 3 at 7:03
are you looking to get a variable number of columns, depending on how many 30 day intervals are found?
– Felipe Hoffa
Jan 3 at 7:03
are you looking to get a variable number of columns, depending on how many 30 day intervals are found?
– Felipe Hoffa
Jan 3 at 7:03
add a comment |
3 Answers
3
active
oldest
votes
If you are using BigQuery, I would recommend:
countif()
to count a boolean value
timestamp_add()
to add intervals to timestamps
The exact boundaries are a bit vague, but I would go for:
select pc.url,
countif(pv.date >= pc.dt_crtd and
pv.date < timestamp_add(pc.dt_crtd, interval 30 day
) as Interval_00_29,
countif(pv.date >= timestamp_add(pc.dt_crtd, interval 30 day) and
pv.date < timestamp_add(pc.dt_crtd, interval 60 day
) as Interval_30_59,
countif(pv.date >= timestamp_add(pc.dt_crtd, interval 60 day) and
pv.date < timestamp_add(pc.dt_crtd, interval 90 day
) as Interval_60_89
from page_creation pc join
page_visits pv
on pc.link = pv.url
group by pc.url
add a comment |
If I understood your question clearly, you want to calculate page visits between 30 , 60 , 90 days intervals after page creation. If it's the requirement, try below SQL code :-
select a11.url
,Sum(case when a12.date between a11.dt_crtd and a11.dt_crtd+30 then 1 else 0) Interval_1
,Sum(case when a12.date between a11.dt_crtd+31 and a11.dt_crtd+60 then 1 else 0) Interval_2
,Sum(case when a12.date between a11.dt_crtd+61 and a11.dt_crtd+90 then 1 else 0) Interval_3
from page_creation a11
join page_visits a12
on a11.link = a12.url
group by a11.url
i will try that when i go into the office later. Does it matter that dt_crtd is a timestamp? In other words will adding +30 work without needing to convert the timestamp into Char()? Also to answer your question, yes you understood my question clearly! Thank you so much.
– CCP
Jan 3 at 12:28
add a comment |
The way I am reading your scenario and especially based on example of After the join i get a table similar to ...
is that you have two tables that you need to UNION
- not to JOIN
So, based on that reading below example is for BigQuery Standard SQL (project.dataset.page_creation
and project.dataset.page_visits
are here just to mimic your Table 1 and Table2)
#standardSQL
WITH `project.dataset.page_creation` AS (
SELECT 'www.google.com' link, TIMESTAMP '2018-01-01 00:00:00' dt_crtd UNION ALL
SELECT 'www.facebook.com', '2014-01-05 00:00:00' UNION ALL
SELECT 'www.twitter.com', '2016-02-01 00:00:00'
), `project.dataset.page_visits` AS (
SELECT 'www.google.com' url, TIMESTAMP '2018-01-02 00:00:00' dt UNION ALL
SELECT 'www.google.com', '2018-02-01 00:00:00' UNION ALL
SELECT 'www.google.com', '2018-02-05 00:00:00' UNION ALL
SELECT 'www.google.com', '2018-03-04 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-01-07 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-02 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-10 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-11 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-05-01 00:00:00' UNION ALL
SELECT 'www.twitter.com', '2016-03-04 00:00:00'
), `After the join` AS (
SELECT url, dt FROM `project.dataset.page_visits` UNION DISTINCT
SELECT link, dt_crtd FROM `project.dataset.page_creation`
)
SELECT
url, min_date,
COUNTIF(dt BETWEEN min_date AND TIMESTAMP_ADD(min_date, INTERVAL 29 DAY)) Interval_1,
COUNTIF(dt BETWEEN TIMESTAMP_ADD(min_date, INTERVAL 30 DAY) AND TIMESTAMP_ADD(min_date, INTERVAL 59 DAY)) Interval_2,
COUNTIF(dt BETWEEN TIMESTAMP_ADD(min_date, INTERVAL 60 DAY) AND TIMESTAMP_ADD(min_date, INTERVAL 89 DAY)) Interval_3
FROM (
SELECT url, dt, MIN(dt) OVER(PARTITION BY url ORDER BY dt) min_date
FROM `After the join`
)
GROUP BY url, min_date
with result as
Row url min_date Interval_1 Interval_2 Interval_3
1 www.facebook.com 2014-01-05 00:00:00 UTC 2 0 1
2 www.google.com 2018-01-01 00:00:00 UTC 2 2 1
3 www.twitter.com 2016-02-01 00:00:00 UTC 1 1 0
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54016684%2fis-there-a-way-to-group-timestamp-data-by-30-day-intervals-starting-from-the-min%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
If you are using BigQuery, I would recommend:
countif()
to count a boolean value
timestamp_add()
to add intervals to timestamps
The exact boundaries are a bit vague, but I would go for:
select pc.url,
countif(pv.date >= pc.dt_crtd and
pv.date < timestamp_add(pc.dt_crtd, interval 30 day
) as Interval_00_29,
countif(pv.date >= timestamp_add(pc.dt_crtd, interval 30 day) and
pv.date < timestamp_add(pc.dt_crtd, interval 60 day
) as Interval_30_59,
countif(pv.date >= timestamp_add(pc.dt_crtd, interval 60 day) and
pv.date < timestamp_add(pc.dt_crtd, interval 90 day
) as Interval_60_89
from page_creation pc join
page_visits pv
on pc.link = pv.url
group by pc.url
add a comment |
If you are using BigQuery, I would recommend:
countif()
to count a boolean value
timestamp_add()
to add intervals to timestamps
The exact boundaries are a bit vague, but I would go for:
select pc.url,
countif(pv.date >= pc.dt_crtd and
pv.date < timestamp_add(pc.dt_crtd, interval 30 day
) as Interval_00_29,
countif(pv.date >= timestamp_add(pc.dt_crtd, interval 30 day) and
pv.date < timestamp_add(pc.dt_crtd, interval 60 day
) as Interval_30_59,
countif(pv.date >= timestamp_add(pc.dt_crtd, interval 60 day) and
pv.date < timestamp_add(pc.dt_crtd, interval 90 day
) as Interval_60_89
from page_creation pc join
page_visits pv
on pc.link = pv.url
group by pc.url
add a comment |
If you are using BigQuery, I would recommend:
countif()
to count a boolean value
timestamp_add()
to add intervals to timestamps
The exact boundaries are a bit vague, but I would go for:
select pc.url,
countif(pv.date >= pc.dt_crtd and
pv.date < timestamp_add(pc.dt_crtd, interval 30 day
) as Interval_00_29,
countif(pv.date >= timestamp_add(pc.dt_crtd, interval 30 day) and
pv.date < timestamp_add(pc.dt_crtd, interval 60 day
) as Interval_30_59,
countif(pv.date >= timestamp_add(pc.dt_crtd, interval 60 day) and
pv.date < timestamp_add(pc.dt_crtd, interval 90 day
) as Interval_60_89
from page_creation pc join
page_visits pv
on pc.link = pv.url
group by pc.url
If you are using BigQuery, I would recommend:
countif()
to count a boolean value
timestamp_add()
to add intervals to timestamps
The exact boundaries are a bit vague, but I would go for:
select pc.url,
countif(pv.date >= pc.dt_crtd and
pv.date < timestamp_add(pc.dt_crtd, interval 30 day
) as Interval_00_29,
countif(pv.date >= timestamp_add(pc.dt_crtd, interval 30 day) and
pv.date < timestamp_add(pc.dt_crtd, interval 60 day
) as Interval_30_59,
countif(pv.date >= timestamp_add(pc.dt_crtd, interval 60 day) and
pv.date < timestamp_add(pc.dt_crtd, interval 90 day
) as Interval_60_89
from page_creation pc join
page_visits pv
on pc.link = pv.url
group by pc.url
answered Jan 3 at 12:30
Gordon LinoffGordon Linoff
794k37318421
794k37318421
add a comment |
add a comment |
If I understood your question clearly, you want to calculate page visits between 30 , 60 , 90 days intervals after page creation. If it's the requirement, try below SQL code :-
select a11.url
,Sum(case when a12.date between a11.dt_crtd and a11.dt_crtd+30 then 1 else 0) Interval_1
,Sum(case when a12.date between a11.dt_crtd+31 and a11.dt_crtd+60 then 1 else 0) Interval_2
,Sum(case when a12.date between a11.dt_crtd+61 and a11.dt_crtd+90 then 1 else 0) Interval_3
from page_creation a11
join page_visits a12
on a11.link = a12.url
group by a11.url
i will try that when i go into the office later. Does it matter that dt_crtd is a timestamp? In other words will adding +30 work without needing to convert the timestamp into Char()? Also to answer your question, yes you understood my question clearly! Thank you so much.
– CCP
Jan 3 at 12:28
add a comment |
If I understood your question clearly, you want to calculate page visits between 30 , 60 , 90 days intervals after page creation. If it's the requirement, try below SQL code :-
select a11.url
,Sum(case when a12.date between a11.dt_crtd and a11.dt_crtd+30 then 1 else 0) Interval_1
,Sum(case when a12.date between a11.dt_crtd+31 and a11.dt_crtd+60 then 1 else 0) Interval_2
,Sum(case when a12.date between a11.dt_crtd+61 and a11.dt_crtd+90 then 1 else 0) Interval_3
from page_creation a11
join page_visits a12
on a11.link = a12.url
group by a11.url
i will try that when i go into the office later. Does it matter that dt_crtd is a timestamp? In other words will adding +30 work without needing to convert the timestamp into Char()? Also to answer your question, yes you understood my question clearly! Thank you so much.
– CCP
Jan 3 at 12:28
add a comment |
If I understood your question clearly, you want to calculate page visits between 30 , 60 , 90 days intervals after page creation. If it's the requirement, try below SQL code :-
select a11.url
,Sum(case when a12.date between a11.dt_crtd and a11.dt_crtd+30 then 1 else 0) Interval_1
,Sum(case when a12.date between a11.dt_crtd+31 and a11.dt_crtd+60 then 1 else 0) Interval_2
,Sum(case when a12.date between a11.dt_crtd+61 and a11.dt_crtd+90 then 1 else 0) Interval_3
from page_creation a11
join page_visits a12
on a11.link = a12.url
group by a11.url
If I understood your question clearly, you want to calculate page visits between 30 , 60 , 90 days intervals after page creation. If it's the requirement, try below SQL code :-
select a11.url
,Sum(case when a12.date between a11.dt_crtd and a11.dt_crtd+30 then 1 else 0) Interval_1
,Sum(case when a12.date between a11.dt_crtd+31 and a11.dt_crtd+60 then 1 else 0) Interval_2
,Sum(case when a12.date between a11.dt_crtd+61 and a11.dt_crtd+90 then 1 else 0) Interval_3
from page_creation a11
join page_visits a12
on a11.link = a12.url
group by a11.url
answered Jan 3 at 7:54


GauravSGauravS
865510
865510
i will try that when i go into the office later. Does it matter that dt_crtd is a timestamp? In other words will adding +30 work without needing to convert the timestamp into Char()? Also to answer your question, yes you understood my question clearly! Thank you so much.
– CCP
Jan 3 at 12:28
add a comment |
i will try that when i go into the office later. Does it matter that dt_crtd is a timestamp? In other words will adding +30 work without needing to convert the timestamp into Char()? Also to answer your question, yes you understood my question clearly! Thank you so much.
– CCP
Jan 3 at 12:28
i will try that when i go into the office later. Does it matter that dt_crtd is a timestamp? In other words will adding +30 work without needing to convert the timestamp into Char()? Also to answer your question, yes you understood my question clearly! Thank you so much.
– CCP
Jan 3 at 12:28
i will try that when i go into the office later. Does it matter that dt_crtd is a timestamp? In other words will adding +30 work without needing to convert the timestamp into Char()? Also to answer your question, yes you understood my question clearly! Thank you so much.
– CCP
Jan 3 at 12:28
add a comment |
The way I am reading your scenario and especially based on example of After the join i get a table similar to ...
is that you have two tables that you need to UNION
- not to JOIN
So, based on that reading below example is for BigQuery Standard SQL (project.dataset.page_creation
and project.dataset.page_visits
are here just to mimic your Table 1 and Table2)
#standardSQL
WITH `project.dataset.page_creation` AS (
SELECT 'www.google.com' link, TIMESTAMP '2018-01-01 00:00:00' dt_crtd UNION ALL
SELECT 'www.facebook.com', '2014-01-05 00:00:00' UNION ALL
SELECT 'www.twitter.com', '2016-02-01 00:00:00'
), `project.dataset.page_visits` AS (
SELECT 'www.google.com' url, TIMESTAMP '2018-01-02 00:00:00' dt UNION ALL
SELECT 'www.google.com', '2018-02-01 00:00:00' UNION ALL
SELECT 'www.google.com', '2018-02-05 00:00:00' UNION ALL
SELECT 'www.google.com', '2018-03-04 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-01-07 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-02 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-10 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-11 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-05-01 00:00:00' UNION ALL
SELECT 'www.twitter.com', '2016-03-04 00:00:00'
), `After the join` AS (
SELECT url, dt FROM `project.dataset.page_visits` UNION DISTINCT
SELECT link, dt_crtd FROM `project.dataset.page_creation`
)
SELECT
url, min_date,
COUNTIF(dt BETWEEN min_date AND TIMESTAMP_ADD(min_date, INTERVAL 29 DAY)) Interval_1,
COUNTIF(dt BETWEEN TIMESTAMP_ADD(min_date, INTERVAL 30 DAY) AND TIMESTAMP_ADD(min_date, INTERVAL 59 DAY)) Interval_2,
COUNTIF(dt BETWEEN TIMESTAMP_ADD(min_date, INTERVAL 60 DAY) AND TIMESTAMP_ADD(min_date, INTERVAL 89 DAY)) Interval_3
FROM (
SELECT url, dt, MIN(dt) OVER(PARTITION BY url ORDER BY dt) min_date
FROM `After the join`
)
GROUP BY url, min_date
with result as
Row url min_date Interval_1 Interval_2 Interval_3
1 www.facebook.com 2014-01-05 00:00:00 UTC 2 0 1
2 www.google.com 2018-01-01 00:00:00 UTC 2 2 1
3 www.twitter.com 2016-02-01 00:00:00 UTC 1 1 0
add a comment |
The way I am reading your scenario and especially based on example of After the join i get a table similar to ...
is that you have two tables that you need to UNION
- not to JOIN
So, based on that reading below example is for BigQuery Standard SQL (project.dataset.page_creation
and project.dataset.page_visits
are here just to mimic your Table 1 and Table2)
#standardSQL
WITH `project.dataset.page_creation` AS (
SELECT 'www.google.com' link, TIMESTAMP '2018-01-01 00:00:00' dt_crtd UNION ALL
SELECT 'www.facebook.com', '2014-01-05 00:00:00' UNION ALL
SELECT 'www.twitter.com', '2016-02-01 00:00:00'
), `project.dataset.page_visits` AS (
SELECT 'www.google.com' url, TIMESTAMP '2018-01-02 00:00:00' dt UNION ALL
SELECT 'www.google.com', '2018-02-01 00:00:00' UNION ALL
SELECT 'www.google.com', '2018-02-05 00:00:00' UNION ALL
SELECT 'www.google.com', '2018-03-04 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-01-07 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-02 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-10 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-11 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-05-01 00:00:00' UNION ALL
SELECT 'www.twitter.com', '2016-03-04 00:00:00'
), `After the join` AS (
SELECT url, dt FROM `project.dataset.page_visits` UNION DISTINCT
SELECT link, dt_crtd FROM `project.dataset.page_creation`
)
SELECT
url, min_date,
COUNTIF(dt BETWEEN min_date AND TIMESTAMP_ADD(min_date, INTERVAL 29 DAY)) Interval_1,
COUNTIF(dt BETWEEN TIMESTAMP_ADD(min_date, INTERVAL 30 DAY) AND TIMESTAMP_ADD(min_date, INTERVAL 59 DAY)) Interval_2,
COUNTIF(dt BETWEEN TIMESTAMP_ADD(min_date, INTERVAL 60 DAY) AND TIMESTAMP_ADD(min_date, INTERVAL 89 DAY)) Interval_3
FROM (
SELECT url, dt, MIN(dt) OVER(PARTITION BY url ORDER BY dt) min_date
FROM `After the join`
)
GROUP BY url, min_date
with result as
Row url min_date Interval_1 Interval_2 Interval_3
1 www.facebook.com 2014-01-05 00:00:00 UTC 2 0 1
2 www.google.com 2018-01-01 00:00:00 UTC 2 2 1
3 www.twitter.com 2016-02-01 00:00:00 UTC 1 1 0
add a comment |
The way I am reading your scenario and especially based on example of After the join i get a table similar to ...
is that you have two tables that you need to UNION
- not to JOIN
So, based on that reading below example is for BigQuery Standard SQL (project.dataset.page_creation
and project.dataset.page_visits
are here just to mimic your Table 1 and Table2)
#standardSQL
WITH `project.dataset.page_creation` AS (
SELECT 'www.google.com' link, TIMESTAMP '2018-01-01 00:00:00' dt_crtd UNION ALL
SELECT 'www.facebook.com', '2014-01-05 00:00:00' UNION ALL
SELECT 'www.twitter.com', '2016-02-01 00:00:00'
), `project.dataset.page_visits` AS (
SELECT 'www.google.com' url, TIMESTAMP '2018-01-02 00:00:00' dt UNION ALL
SELECT 'www.google.com', '2018-02-01 00:00:00' UNION ALL
SELECT 'www.google.com', '2018-02-05 00:00:00' UNION ALL
SELECT 'www.google.com', '2018-03-04 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-01-07 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-02 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-10 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-11 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-05-01 00:00:00' UNION ALL
SELECT 'www.twitter.com', '2016-03-04 00:00:00'
), `After the join` AS (
SELECT url, dt FROM `project.dataset.page_visits` UNION DISTINCT
SELECT link, dt_crtd FROM `project.dataset.page_creation`
)
SELECT
url, min_date,
COUNTIF(dt BETWEEN min_date AND TIMESTAMP_ADD(min_date, INTERVAL 29 DAY)) Interval_1,
COUNTIF(dt BETWEEN TIMESTAMP_ADD(min_date, INTERVAL 30 DAY) AND TIMESTAMP_ADD(min_date, INTERVAL 59 DAY)) Interval_2,
COUNTIF(dt BETWEEN TIMESTAMP_ADD(min_date, INTERVAL 60 DAY) AND TIMESTAMP_ADD(min_date, INTERVAL 89 DAY)) Interval_3
FROM (
SELECT url, dt, MIN(dt) OVER(PARTITION BY url ORDER BY dt) min_date
FROM `After the join`
)
GROUP BY url, min_date
with result as
Row url min_date Interval_1 Interval_2 Interval_3
1 www.facebook.com 2014-01-05 00:00:00 UTC 2 0 1
2 www.google.com 2018-01-01 00:00:00 UTC 2 2 1
3 www.twitter.com 2016-02-01 00:00:00 UTC 1 1 0
The way I am reading your scenario and especially based on example of After the join i get a table similar to ...
is that you have two tables that you need to UNION
- not to JOIN
So, based on that reading below example is for BigQuery Standard SQL (project.dataset.page_creation
and project.dataset.page_visits
are here just to mimic your Table 1 and Table2)
#standardSQL
WITH `project.dataset.page_creation` AS (
SELECT 'www.google.com' link, TIMESTAMP '2018-01-01 00:00:00' dt_crtd UNION ALL
SELECT 'www.facebook.com', '2014-01-05 00:00:00' UNION ALL
SELECT 'www.twitter.com', '2016-02-01 00:00:00'
), `project.dataset.page_visits` AS (
SELECT 'www.google.com' url, TIMESTAMP '2018-01-02 00:00:00' dt UNION ALL
SELECT 'www.google.com', '2018-02-01 00:00:00' UNION ALL
SELECT 'www.google.com', '2018-02-05 00:00:00' UNION ALL
SELECT 'www.google.com', '2018-03-04 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-01-07 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-02 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-10 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-04-11 00:00:00' UNION ALL
SELECT 'www.facebook.com', '2014-05-01 00:00:00' UNION ALL
SELECT 'www.twitter.com', '2016-03-04 00:00:00'
), `After the join` AS (
SELECT url, dt FROM `project.dataset.page_visits` UNION DISTINCT
SELECT link, dt_crtd FROM `project.dataset.page_creation`
)
SELECT
url, min_date,
COUNTIF(dt BETWEEN min_date AND TIMESTAMP_ADD(min_date, INTERVAL 29 DAY)) Interval_1,
COUNTIF(dt BETWEEN TIMESTAMP_ADD(min_date, INTERVAL 30 DAY) AND TIMESTAMP_ADD(min_date, INTERVAL 59 DAY)) Interval_2,
COUNTIF(dt BETWEEN TIMESTAMP_ADD(min_date, INTERVAL 60 DAY) AND TIMESTAMP_ADD(min_date, INTERVAL 89 DAY)) Interval_3
FROM (
SELECT url, dt, MIN(dt) OVER(PARTITION BY url ORDER BY dt) min_date
FROM `After the join`
)
GROUP BY url, min_date
with result as
Row url min_date Interval_1 Interval_2 Interval_3
1 www.facebook.com 2014-01-05 00:00:00 UTC 2 0 1
2 www.google.com 2018-01-01 00:00:00 UTC 2 2 1
3 www.twitter.com 2016-02-01 00:00:00 UTC 1 1 0
answered Jan 3 at 22:18


Mikhail BerlyantMikhail Berlyant
62.9k43874
62.9k43874
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54016684%2fis-there-a-way-to-group-timestamp-data-by-30-day-intervals-starting-from-the-min%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
are you looking to get a variable number of columns, depending on how many 30 day intervals are found?
– Felipe Hoffa
Jan 3 at 7:03