Using dplyr within a user-defined function to summarise data then plot it
I am trying to write a user-defined function that takes several arguments, summarises data with dplyr, and then plots the result with ggplot2.
Here is some sample data, followed by the summary and plot I want to produce:
library(dplyr)
library(ggplot2)
df <- data.frame(
  Year = c("2006", "2006", "2006", "2007", "2007", "2007", "2008", "2009", "2010", "2010", "2009", "2009"),
  JudicialOrientation = c("Defense", "Plaintiff", "Plaintiff", "Neutral", "Defense", "Plaintiff", "Defense", "Plaintiff", "Neutral", "Neutral", "Plaintiff", "Defense"),
  Loss = c(100000, 100, 2500, 100000, 25000, 0, 7500, 5200, 900, 100, 0, 50)
)
# Mean loss per year and judicial orientation
df1 <- df %>%
  group_by(Year, JudicialOrientation) %>%
  summarise(MeanLoss = mean(Loss))
ggplot(df1, aes(x = JudicialOrientation, y = MeanLoss, color = Year, group = Year)) +
  geom_line() +
  geom_point()
I am now trying to replicate this into a user function so that I can pass different variables to get similar results.
Here is my attempt so far:
ConsistencyPlot <- function(df, var1, timevar, lossvar){
  df1 <- df %>%
    group_by_(df[timevar], df[var1]) %>%
    summarise_(MeanLoss = mean(df[lossvar]))
  ggplot(df1, aes(x = var1, y = MeanLoss, color = timevar, group = timevar)) +
    geom_line() +
    geom_point()
}
ConsistencyPlot(df, "JudicialOrientation", "Year", 'Loss')
I am replicating the same logic, passing in df as my data frame, var1 as JudicialOrientation, timevar as Year, and lossvar as my vector of Loss values that I want averaged through summarise. However, I cannot get the same results, so I feel like I am missing something about how these functions are used within a closure.
r ggplot2 dplyr aggregate
edited Nov 21 '18 at 17:59 by Tjebo
asked Nov 21 '18 at 14:30 by Coldchain9
1 Answer
First of all, inside dplyr functions you don't need to index the data frame, as in df[, timevar], to refer to a variable. Use just the variable name. Note also that df[timevar] returns a one-column data frame, not a column name, so it is not something group_by_() can group on. The underscore verbs (group_by_(), summarise_()) have been deprecated since dplyr 0.7 in any case.
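To see what the original call was handing to group_by_(), here is a small base-R check. The two-row data frame is a stand-in invented for illustration, not the data from the question.

```r
# Stand-in data for illustration only (not the data from the question).
df <- data.frame(Year = c("2006", "2007"), Loss = c(100, 250),
                 stringsAsFactors = FALSE)
timevar <- "Year"

single <- df[timevar]    # list-style indexing: a one-column data frame
double <- df[[timevar]]  # extracts the column itself as a vector

class(single)  # "data.frame"
class(double)  # "character"
```

Neither result is a column name, which is why the grouping in the question did not behave like group_by(Year, JudicialOrientation).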
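As for the function itself, the problem is one of evaluation: dplyr and ggplot2 capture their arguments as expressions rather than evaluating them straight away.
The structure below works: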
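ConsistencyPlot <- function(df, var1, timevar, lossvar){
  # Capture each argument as a quosure (expression plus its environment)
  var1 <- enquo(var1)
  timevar <- enquo(timevar)
  lossvar <- enquo(lossvar)
  # Unquote with !! so dplyr sees the column names the caller supplied
  df1 <- df %>%
    group_by(!!timevar, !!var1) %>%
    summarise(MeanLoss = mean(!!lossvar))
  # aes() supports !! from ggplot2 3.0.0 onward
  ggplot(df1, aes(x = !!var1, y = MeanLoss, color = !!timevar, group = !!timevar)) +
    geom_line() +
    geom_point()
}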
Note that each parameter is captured with enquo() and then unquoted with !! inside the dplyr and ggplot2 calls. This lets you pass the column names as bare names, without quotation marks:
ConsistencyPlot(df, JudicialOrientation, Year, Loss)
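With rlang 0.4.0 or later, the enquo() plus !! pair can be abbreviated with the embrace operator {{ }}. The sketch below is an alternative spelling of the same idea, not part of the original answer. The name consistency_plot_embrace and the small data frame are made up for illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(ggplot2)

# Hypothetical variant of ConsistencyPlot using {{ }} instead of enquo()/!!
consistency_plot_embrace <- function(df, var1, timevar, lossvar) {
  df1 <- df %>%
    group_by({{ timevar }}, {{ var1 }}) %>%
    summarise(MeanLoss = mean({{ lossvar }}), .groups = "drop")
  ggplot(df1, aes(x = {{ var1 }}, y = MeanLoss,
                  color = {{ timevar }}, group = {{ timevar }})) +
    geom_line() +
    geom_point()
}

# Toy data for illustration only
toy <- data.frame(
  Year = c("2006", "2006", "2006", "2007", "2007", "2007"),
  JudicialOrientation = c("Defense", "Plaintiff", "Plaintiff",
                          "Defense", "Defense", "Plaintiff"),
  Loss = c(100, 100, 300, 50, 150, 0),
  stringsAsFactors = FALSE
)

# Column names are passed bare; call print(p) to draw the plot
p <- consistency_plot_embrace(toy, JudicialOrientation, Year, Loss)
```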
I hope you find it useful.
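The call in the question passed the column names as strings. If that calling style is what you want, one option is the .data pronoun together with across(all_of()). This is only a sketch, assuming dplyr 1.0.0 or later and ggplot2 3.0.0 or later; consistency_plot_str and the toy data frame are hypothetical names used only here.

```r
library(dplyr)
library(ggplot2)

# Hypothetical variant that takes column names as character strings
consistency_plot_str <- function(df, var1, timevar, lossvar) {
  df1 <- df %>%
    # all_of() selects the columns named by the strings
    group_by(across(all_of(c(timevar, var1)))) %>%
    # .data[[name]] looks a column up by its string name
    summarise(MeanLoss = mean(.data[[lossvar]]), .groups = "drop")
  ggplot(df1, aes(x = .data[[var1]], y = MeanLoss,
                  color = .data[[timevar]], group = .data[[timevar]])) +
    geom_line() +
    geom_point()
}

# Toy data for illustration only
toy_str <- data.frame(
  Year = c("2006", "2006", "2006", "2007", "2007", "2007"),
  JudicialOrientation = c("Defense", "Plaintiff", "Plaintiff",
                          "Defense", "Defense", "Plaintiff"),
  Loss = c(100, 100, 300, 50, 150, 0),
  stringsAsFactors = FALSE
)

# Same call shape as in the question; call print(p_str) to draw the plot
p_str <- consistency_plot_str(toy_str, "JudicialOrientation", "Year", "Loss")
```

This keeps the quoted-argument interface from the question, at the cost of not accepting bare column names.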
I realized I was referencing column names wrong right after I posted this question. Can you explain to me what !! is doing? This is exactly what I wanted. Thank you very much.
– Coldchain9
Nov 21 '18 at 15:19
It is an unquoting operator. See ?"!!".
– Anonymous coward
Nov 21 '18 at 15:20
It's exactly what @Anonymouscoward said. For a deeper explanation, take a look here. Happy to help.
– Bruno Pinheiro
Nov 21 '18 at 15:29
I guess I am just trying to figure out why the quoting-unquoting approach works, versus just sending the arguments in unquoted. It can't recognize the variables if I send them in directly, so why does it work when they are enquoted and then immediately unquoted with !!? I read ?enquo and, if I understand correctly, the quosure maintains the original environment but !! just removes the quotes for evaluation purposes?
– Coldchain9
Nov 21 '18 at 16:12
@Coldchain9: see this for further explanation stackoverflow.com/questions/51738267/…
– Tung
Nov 21 '18 at 16:36
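On the question raised in the comments about why capture-then-unquote works, a minimal rlang sketch may help. It is not from the original thread; the function name capture and the two-row data frame are invented for illustration.

```r
library(rlang)

# Stand-in data for illustration only.
df <- data.frame(Loss = c(100, 300))

# enquo() captures the argument's expression and its environment
# without evaluating it.
capture <- function(x) enquo(x)
q <- capture(Loss)  # no error, although no object called Loss exists here

captured_name <- as_label(q)            # text of the captured expression
loss_values <- eval_tidy(q, data = df)  # evaluate with df's columns in scope

# !! inserts the captured expression into a larger quoted expression.
mean_quo <- quo(mean(!!q))
mean_loss <- eval_tidy(mean_quo, data = df)
```

Without the capture step, mean(lossvar) inside a dplyr verb would look for a column named lossvar, fall back to the function argument, and then fail when R tries to evaluate Loss as an ordinary object. Capturing delays that evaluation until the data frame's columns are in scope.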
answered Nov 21 '18 at 15:12 by Bruno Pinheiro
edited Nov 21 '18 at 15:32