In R, how do I split timestamp interval data into regular slots?

I'm working with data describing events that have a start and an end time. For instance, it could be in a form like this:



[example table of events with id, start_date and end_date columns; reconstructed as df1 below]



I'd like to convert this data into a form where I can count occurrences of events in regular intervals, so I can answer questions like how many events were taking place from 13:15:00 to 13:29:59, from 13:30:00 to 13:45:00, and so on. In the example above, the second record would be counted in the first slot based on its start time alone, but it wouldn't show up as 'ongoing' in the following intervals unless the end time is also taken into account.



To work with 15-minute intervals I came up with a clumsy solution that uses tidyr::uncount to 'expand' the dataset into the 24*4 = 96 15-minute intervals of the 24-hour period and then filters the rows where the event overlaps the interval.



library(tidyverse)
library(lubridate)
library(magrittr)

df1 <- tibble::tibble(
  id = c(1, 2),
  start_date = c(ymd_hms("2018-12-10 14:45:51", tz = "Australia/Brisbane"),
                 ymd_hms("2018-12-10 13:29:37", tz = "Australia/Brisbane")),
  end_date   = c(ymd_hms("2018-12-10 14:59:04", tz = "Australia/Brisbane"),
                 ymd_hms("2018-12-10 14:02:37", tz = "Australia/Brisbane")))

# duplicate each event 96 times, once per 15-minute interval of the day
df2 <- df1 %>%
  mutate(episode = 96) %>%
  tidyr::uncount(episode, .id = "sequence")

df2$int_start <- rep(
  seq(ymd_hms("2018-12-10 00:00:00", tz = "Australia/Brisbane"),
      ymd_hms("2018-12-10 23:59:59", tz = "Australia/Brisbane"),
      by = "15 mins"),
  2)

df2$int_end <- df2$int_start + 899  # 15 minutes minus 1 second

# keep only the (event, interval) pairs that overlap
df2 %<>%
  filter(int_end > start_date & int_start < end_date)


That gives me data in the desired format:



[resulting df2: one row per (event, 15-minute interval) overlap, with int_start and int_end columns]



But I have a feeling there must be a smarter way to do this. Expansion, for instance, won't be a great solution for a very large dataset and/or a small time interval (I think). It will probably also be a pain to extend it to work across multiple days (I think).



In Stata one could use the stsplit command to accomplish something similar. I tried tinkering with survSplit from the survival package but ended up with lots of records as well:



library(survival)

df1$status <- 1
df1$start_date <- as.numeric(df1$start_date)
df1$end_date <- as.numeric(df1$end_date)

df3 <- survSplit(Surv(end_date, status) ~ ., df1,
                 cut = seq(from = as.numeric(as.POSIXct("2018-12-10 00:00:00")),
                           to = as.numeric(as.POSIXct("2018-12-10 00:00:00")) + 24*60*60,
                           by = 900),
                 start = "start_int",
                 id = "new_id",
                 episode = "episode")

df3$start_int  <- as.POSIXct(df3$start_int,  origin = "1970-01-01", tz = "Australia/Brisbane")
df3$start_date <- as.POSIXct(df3$start_date, origin = "1970-01-01", tz = "Australia/Brisbane")
df3$end_date   <- as.POSIXct(df3$end_date,   origin = "1970-01-01", tz = "Australia/Brisbane")


Any pointers to a better way to achieve this task?

– radek, asked Jan 2 at 3:11 (edited Jan 2 at 3:55)

  • in the second paragraph, do you mean: 14:30:00 and 14:45:00 (not 13:XX)?

    – Khaynes
    Jan 2 at 3:21













  • @Khaynes I clarified a bit - hope that helps

    – radek
    Jan 2 at 3:56











  • you might want to check out the IRanges package, data.table::foverlaps, or non-equi joins using data.table (see the sketch after these comments)

    – chinsoon12
    Jan 2 at 5:50











  • and also stackoverflow.com/questions/10862056/…

    – chinsoon12
    Jan 2 at 6:10











  • Are you looking for how many events were taking place across the whole interval (ie started before the interval began and ended after the interval ended), or were taking place at some point in the interval (ie started before the interval ended and ended after the interval began)? Or are you looking for instantaneous counts of "how many are active at a given instant" and using intervals as samples?

    – Jon Spring
    Jan 2 at 7:20
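
To illustrate the non-equi join idea from chinsoon12's comment, here is a minimal sketch (not part of the original post). It assumes data.table is installed and reuses the df1 tibble and lubridate calls from the question; the object names slots, events and overlaps are made up for the example:

library(data.table)

# 15-minute grid covering the day (96 slots)
slots <- data.table(
  int_start = seq(ymd_hms("2018-12-10 00:00:00", tz = "Australia/Brisbane"),
                  ymd_hms("2018-12-10 23:45:00", tz = "Australia/Brisbane"),
                  by = "15 mins"))
slots[, int_end := int_start + 15 * 60]

events <- as.data.table(df1)

# non-equi join: one row per (event, slot) pair where the event overlaps the slot
overlaps <- events[slots,
                   .(id, int_start = i.int_start, int_end = i.int_end),
                   on = .(start_date < int_end, end_date > int_start),
                   nomatch = 0L]

# number of events active at some point during each slot
overlaps[, .N, by = .(int_start, int_end)]

This avoids materialising 96 rows per event before filtering, which speaks to the scaling concern raised in the question.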
















1 Answer














You might also approach this by thinking of each start_date as adding one active event and each end_date as reducing the active events by one. This approach lets you identify the number of active events at any given instant, and it scales well. (I've used something similar to count millions of events and it's basically instantaneous.)



df2 <- df1 %>%
  gather(type, time, start_date:end_date) %>%
  mutate(event_chg = if_else(type == "start_date", 1, -1)) %>%
  arrange(time) %>%
  mutate(active_events = cumsum(event_chg))

df2
# A tibble: 4 x 5
#      id type       time                event_chg active_events
#   <dbl> <chr>      <dttm>                  <dbl>         <dbl>
# 1     2 start_date 2018-12-10 13:29:37         1             1
# 2     2 end_date   2018-12-10 14:02:37        -1             0
# 3     1 start_date 2018-12-10 14:45:51         1             1
# 4     1 end_date   2018-12-10 14:59:04        -1             0

ggplot(df2, aes(time, active_events)) + geom_step()


[figure: step plot of active_events over time]



If you want to also assess the active count at regular intervals, you could integrate those intervals into your output data frame like this:



df2b <- df1 %>%
  gather(type, time, start_date:end_date) %>%
  mutate(event_chg = if_else(type == "start_date", 1, -1)) %>%
  # NEW SECTION HERE
  bind_rows(data_frame(type = "marker",
                       time = seq.POSIXt(ymd_h(2018121013, tz = "Australia/Brisbane"),
                                         ymd_h(2018121016, tz = "Australia/Brisbane"),
                                         by = 15*60), # 15 minutes of seconds = 15*60
                       event_chg = 0)) %>%
  # END OF NEW SECTION
  arrange(time) %>%
  mutate(active_events = cumsum(event_chg))


Then it's possible to plot those counts directly, or to filter the output data frame to see them. In this case, event id 1 occurred entirely between two of the 15-minute markers, so it never registers in those instantaneous counts.



ggplot(df2b, aes(time, active_events, label = active_events)) +
  geom_step() +
  geom_point(data = df2b %>% filter(type == "marker")) +
  geom_text(data = df2b %>% filter(type == "marker"), vjust = -0.5)


[figure: step plot with labelled points showing the active count at each 15-minute marker]
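
For a tabular view of the same counts (a small follow-up sketch, not from the original answer, reusing the df2b built above):

# active-event count at the start of each 15-minute slot
df2b %>%
  filter(type == "marker") %>%
  select(time, active_events)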






– Jon Spring, answered Jan 2 at 6:51 (edited Jan 2 at 7:25)


























  • Thank you Jon. Superb solution. As a small side note: df2b %>% filter(type == "marker") %>% ggplot() + geom_col(aes(time, active_events)) gives you a quick overview of 'rounded' counts.

    – radek
    Jan 3 at 4:32










