For loop and if statements in R

I have a data frame orange_train with 231 variables and 50,000 observations. I want to check each variable for NAs or zeros. If the number of NAs (for factors) or zeros (for numeric and integer columns) is greater than 75% of the 50,000 rows, I want to eliminate that variable. My code is below, but it's not working as expected:

counting_na <- function(x) {sum(is.na(x))}
counting_zero <- function(x){length(which(x==0))}

for(i in 1:ncol(orange_train)){
  if (class(orange_train$Var[i])=='numeric' && sum(is.na(orange_train$Var[i]))< 32500) 
    {print(orange_train$Var[i])}
  else (class(orange_train$Var[i])=='integer' && counting_zero(orange_train$Var[i]) < 32500)
  {print(orange_train$Var[i])}

Could someone please help me with the code? I have been struggling for a long time now and am very new to R.

My columns have headers Var1 to Var231, and the data types are numeric, factor and integer. I hope this helps.

edited Nov 22 '18 at 0:58

asked Nov 21 '18 at 21:46

Sindhu Viswanathan

counting_zero <- function(x) sum(x==0)

– jogo
Nov 21 '18 at 21:49

It would be helpful if you gave a sample of what your data looks like using dput(). Also, you're looping over the columns in orange_train, but you're indexing over the rows in one variable. Perhaps you mean orange_train[[i]], instead of orange_train$Var[i]?

– mickey
Nov 21 '18 at 22:03

Welcome to SO! Please read How to Ask and give a Minimal, Complete, and Verifiable example in your question. Copy the output of dput(head(orange_train, 10)) into your question.

– jogo
Nov 21 '18 at 22:04

My columns have headers Var1 to Var231, and the data types are numeric, factor and integer. I hope this helps.

– Sindhu Viswanathan
Nov 22 '18 at 0:29

@SindhuViswanathan, it does, but then you are still indexing them improperly. You could use orange_train[paste0('Var', i)] instead.

– mickey
Nov 22 '18 at 2:37
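
The indexing point raised in the comments can be sketched as a corrected loop. This is only a minimal illustration: the three-column orange_train below is a hypothetical stand-in for the real data, and to_drop and orange_train_reduced are made-up names. It selects each column with orange_train[[i]] and flags it when the NA count (factors) or zero count (numeric and integer columns) exceeds 75% of the rows.

```r
# Hypothetical stand-in for orange_train (the real one has 231 columns).
orange_train <- data.frame(
  Var1 = c(0, 0, 0, 5),               # numeric, 3 of 4 values are zero
  Var2 = c(0L, 0L, 0L, 0L),           # integer, 4 of 4 values are zero
  Var3 = factor(c("a", NA, NA, NA))   # factor, 3 of 4 values are NA
)

threshold <- 0.75 * nrow(orange_train)     # 75% of the rows
to_drop <- logical(ncol(orange_train))     # FALSE for every column to start

for (i in seq_len(ncol(orange_train))) {
  # [[i]] returns the i-th column as a vector. orange_train$Var is NULL,
  # because no column is named exactly "Var" and the partial match is ambiguous.
  x <- orange_train[[i]]
  if (is.factor(x)) {
    to_drop[i] <- sum(is.na(x)) > threshold
  } else if (is.numeric(x)) {              # TRUE for both numeric and integer
    to_drop[i] <- sum(x == 0, na.rm = TRUE) > threshold
  }
}

orange_train_reduced <- orange_train[, !to_drop, drop = FALSE]
names(orange_train_reduced)
```

In this toy data only Var2 exceeds the threshold, since a count of exactly 75% is not "greater than" 75%.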

1 Answer

Example data

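set.seed(10)

# stringsAsFactors = TRUE keeps column a a factor on R >= 4.0.0, where the default is FALSE
df <- data.frame(a = sample(c(NA, LETTERS[1]), 100, TRUE, prob = c(.75, .25)),
                 b = sample(0:1, 100, TRUE, prob = c(.75, .25)),
                 stringsAsFactors = TRUE)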

Calculate the percentages for each column (percent NA for factor, percent 0 for numeric)

percents <- 
  sapply(df, function(x){
    if(is.factor(x)) mean(is.na(x)) 
    else if(is.numeric(x)) mean(x == 0) 
    else NA})

percents
#    a    b 
# 0.84 0.75

Remove the ones greater than 75%

df[percents > 0.75] <- NULL

names(df)
#[1] "b"

You can see that column a was removed because it was a factor with 84% NAs, while column b was kept because its share of zeros (75%) is not greater than the threshold.
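
As a further sketch, the same idea can be wrapped in a reusable function. The names prop_empty and drop_sparse_columns and the toy data are hypothetical. This version also makes one assumption the question does not state: for numeric columns it counts NA together with zero, so that mean() does not return NA when a numeric column has missing values.

```r
# Hypothetical helper: proportion of "empty" values in one column.
# Factors/characters: share of NA. Numbers: share of zero-or-NA (an assumption).
prop_empty <- function(x) {
  if (is.factor(x) || is.character(x)) {
    mean(is.na(x))
  } else if (is.numeric(x)) {
    mean(is.na(x) | x == 0)
  } else {
    0  # other column types are never flagged
  }
}

# Keep only the columns whose proportion of empty values is at most `cutoff`.
drop_sparse_columns <- function(data, cutoff = 0.75) {
  props <- vapply(data, prop_empty, numeric(1))
  data[, props <= cutoff, drop = FALSE]
}

toy <- data.frame(
  f = factor(c(NA, NA, NA, NA, "A")),  # 80% NA
  n = c(0, 0, NA, 1, 2),               # 60% zero or NA
  i = c(0L, 0L, 0L, 0L, 7L)            # 80% zero
)
names(drop_sparse_columns(toy))
```

vapply() is used instead of sapply() here so that the result is always a numeric vector with one value per column, and drop = FALSE keeps the result a data frame even if a single column survives.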

edited Nov 21 '18 at 22:21

answered Nov 21 '18 at 22:14

IceCreamToucan

This worked like a charm! Thank you @IceCreamToucan!!! I appreciate your timely help!

– Sindhu Viswanathan
Nov 22 '18 at 4:21
