Using apply() function to iterate over different data types doesn't work
I want to write a function that dynamically uses different correlation methods depending on the scale of measure of the feature (continuous, dichotomous, ordinal). The label is always continuous. My idea was to use the apply() function, so iterate over every feature (aka column), check it's scale of measure (numeric, factor with two levels, factor with more than two levels) and then use the appropriate correlation function. Unfortunately my code seems to convert every feature into a character vector and as consequence the condition in the if statement is always false for every column. I don't know why my code is doing this. How can I prevent my code from converting my features to character vectors?
set.seed(42)
foo <- sample(c("x", "y"), 200, replace = T, prob = c(0.7, 0.3))
bar <- sample(c(1,2,3,4,5),200,replace = T,prob=c(0.5,0.05,0.1,0.1,0.25))
y <- sample(c(1,2,3,4,5),200,replace = T,prob=c(0.25,0.1,0.1,0.05,0.5))
data <- data.frame(foo,bar,y)
features <- data[, !names(data) %in% 'y']
dyn.corr <- function(x,y){
# print out structure of every column
print(str(x))
# if feature is numeric and has more than two outcomes use corr.test
if(is.numeric(x) & length(unique(x))>2){
result <- corr.test(x,y)[['r']]
} else {
result <- "else"
}
}
result <- apply(features,2,dyn.corr,y)
r if-statement apply correlation
add a comment |
I want to write a function that dynamically uses different correlation methods depending on the scale of measure of the feature (continuous, dichotomous, ordinal). The label is always continuous. My idea was to use the apply() function, so iterate over every feature (aka column), check it's scale of measure (numeric, factor with two levels, factor with more than two levels) and then use the appropriate correlation function. Unfortunately my code seems to convert every feature into a character vector and as consequence the condition in the if statement is always false for every column. I don't know why my code is doing this. How can I prevent my code from converting my features to character vectors?
set.seed(42)
foo <- sample(c("x", "y"), 200, replace = T, prob = c(0.7, 0.3))
bar <- sample(c(1,2,3,4,5),200,replace = T,prob=c(0.5,0.05,0.1,0.1,0.25))
y <- sample(c(1,2,3,4,5),200,replace = T,prob=c(0.25,0.1,0.1,0.05,0.5))
data <- data.frame(foo,bar,y)
features <- data[, !names(data) %in% 'y']
dyn.corr <- function(x,y){
# print out structure of every column
print(str(x))
# if feature is numeric and has more than two outcomes use corr.test
if(is.numeric(x) & length(unique(x))>2){
result <- corr.test(x,y)[['r']]
} else {
result <- "else"
}
}
result <- apply(features,2,dyn.corr,y)
r if-statement apply correlation
2
In general, please don't includerm(list = ls())
in your questions. I copy/pasted your code into R and almost ran it before noticing that line. I would have been very sad to lose some of the things I had been working on.
– Gregor
Nov 20 '18 at 16:06
add a comment |
I want to write a function that dynamically uses different correlation methods depending on the scale of measure of the feature (continuous, dichotomous, ordinal). The label is always continuous. My idea was to use the apply() function, so iterate over every feature (aka column), check it's scale of measure (numeric, factor with two levels, factor with more than two levels) and then use the appropriate correlation function. Unfortunately my code seems to convert every feature into a character vector and as consequence the condition in the if statement is always false for every column. I don't know why my code is doing this. How can I prevent my code from converting my features to character vectors?
set.seed(42)
foo <- sample(c("x", "y"), 200, replace = T, prob = c(0.7, 0.3))
bar <- sample(c(1,2,3,4,5),200,replace = T,prob=c(0.5,0.05,0.1,0.1,0.25))
y <- sample(c(1,2,3,4,5),200,replace = T,prob=c(0.25,0.1,0.1,0.05,0.5))
data <- data.frame(foo,bar,y)
features <- data[, !names(data) %in% 'y']
dyn.corr <- function(x,y){
# print out structure of every column
print(str(x))
# if feature is numeric and has more than two outcomes use corr.test
if(is.numeric(x) & length(unique(x))>2){
result <- corr.test(x,y)[['r']]
} else {
result <- "else"
}
}
result <- apply(features,2,dyn.corr,y)
r if-statement apply correlation
I want to write a function that dynamically uses different correlation methods depending on the scale of measure of the feature (continuous, dichotomous, ordinal). The label is always continuous. My idea was to use the apply() function, so iterate over every feature (aka column), check it's scale of measure (numeric, factor with two levels, factor with more than two levels) and then use the appropriate correlation function. Unfortunately my code seems to convert every feature into a character vector and as consequence the condition in the if statement is always false for every column. I don't know why my code is doing this. How can I prevent my code from converting my features to character vectors?
set.seed(42)
foo <- sample(c("x", "y"), 200, replace = T, prob = c(0.7, 0.3))
bar <- sample(c(1,2,3,4,5),200,replace = T,prob=c(0.5,0.05,0.1,0.1,0.25))
y <- sample(c(1,2,3,4,5),200,replace = T,prob=c(0.25,0.1,0.1,0.05,0.5))
data <- data.frame(foo,bar,y)
features <- data[, !names(data) %in% 'y']
dyn.corr <- function(x,y){
# print out structure of every column
print(str(x))
# if feature is numeric and has more than two outcomes use corr.test
if(is.numeric(x) & length(unique(x))>2){
result <- corr.test(x,y)[['r']]
} else {
result <- "else"
}
}
result <- apply(features,2,dyn.corr,y)
r if-statement apply correlation
r if-statement apply correlation
edited Nov 20 '18 at 16:41
Johannes Wiesner
asked Nov 20 '18 at 16:02
Johannes WiesnerJohannes Wiesner
167
167
2
In general, please don't includerm(list = ls())
in your questions. I copy/pasted your code into R and almost ran it before noticing that line. I would have been very sad to lose some of the things I had been working on.
– Gregor
Nov 20 '18 at 16:06
add a comment |
2
In general, please don't includerm(list = ls())
in your questions. I copy/pasted your code into R and almost ran it before noticing that line. I would have been very sad to lose some of the things I had been working on.
– Gregor
Nov 20 '18 at 16:06
2
2
In general, please don't include
rm(list = ls())
in your questions. I copy/pasted your code into R and almost ran it before noticing that line. I would have been very sad to lose some of the things I had been working on.– Gregor
Nov 20 '18 at 16:06
In general, please don't include
rm(list = ls())
in your questions. I copy/pasted your code into R and almost ran it before noticing that line. I would have been very sad to lose some of the things I had been working on.– Gregor
Nov 20 '18 at 16:06
add a comment |
1 Answer
1
active
oldest
votes
apply
is built for matrices. When you apply
to a data frame, the first thing that happens is coercing your data frame to a matrix. A matrix can only have one data type, so all columns of your data are converted to the most general type among them when this happens.
Use sapply
or lapply
to work with columns of a data frame.
This should work fine (I tried to test, but I don't know what package to load to get the corr.test
function.)
result <- sapply(features, dyn.corr, income)
Thanks a lot, that solved my problem!
– Johannes Wiesner
Nov 20 '18 at 16:42
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53396939%2fusing-apply-function-to-iterate-over-different-data-types-doesnt-work%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
apply
is built for matrices. When you apply
to a data frame, the first thing that happens is coercing your data frame to a matrix. A matrix can only have one data type, so all columns of your data are converted to the most general type among them when this happens.
Use sapply
or lapply
to work with columns of a data frame.
This should work fine (I tried to test, but I don't know what package to load to get the corr.test
function.)
result <- sapply(features, dyn.corr, income)
Thanks a lot, that solved my problem!
– Johannes Wiesner
Nov 20 '18 at 16:42
add a comment |
apply
is built for matrices. When you apply
to a data frame, the first thing that happens is coercing your data frame to a matrix. A matrix can only have one data type, so all columns of your data are converted to the most general type among them when this happens.
Use sapply
or lapply
to work with columns of a data frame.
This should work fine (I tried to test, but I don't know what package to load to get the corr.test
function.)
result <- sapply(features, dyn.corr, income)
Thanks a lot, that solved my problem!
– Johannes Wiesner
Nov 20 '18 at 16:42
add a comment |
apply
is built for matrices. When you apply
to a data frame, the first thing that happens is coercing your data frame to a matrix. A matrix can only have one data type, so all columns of your data are converted to the most general type among them when this happens.
Use sapply
or lapply
to work with columns of a data frame.
This should work fine (I tried to test, but I don't know what package to load to get the corr.test
function.)
result <- sapply(features, dyn.corr, income)
apply
is built for matrices. When you apply
to a data frame, the first thing that happens is coercing your data frame to a matrix. A matrix can only have one data type, so all columns of your data are converted to the most general type among them when this happens.
Use sapply
or lapply
to work with columns of a data frame.
This should work fine (I tried to test, but I don't know what package to load to get the corr.test
function.)
result <- sapply(features, dyn.corr, income)
answered Nov 20 '18 at 16:06


GregorGregor
63.6k990170
63.6k990170
Thanks a lot, that solved my problem!
– Johannes Wiesner
Nov 20 '18 at 16:42
add a comment |
Thanks a lot, that solved my problem!
– Johannes Wiesner
Nov 20 '18 at 16:42
Thanks a lot, that solved my problem!
– Johannes Wiesner
Nov 20 '18 at 16:42
Thanks a lot, that solved my problem!
– Johannes Wiesner
Nov 20 '18 at 16:42
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53396939%2fusing-apply-function-to-iterate-over-different-data-types-doesnt-work%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
In general, please don't include
rm(list = ls())
in your questions. I copy/pasted your code into R and almost ran it before noticing that line. I would have been very sad to lose some of the things I had been working on.– Gregor
Nov 20 '18 at 16:06