Delete rows from a dataframe from multi variables in a database

I have the following data.frame:

dage ded dht dwt marital inc smoke time number

31   5  65 110       1   1     0    0      0

38   5  70 148       1   4     0    0      0

32   1  99 999       1   2     1    1      1

28   4  99 999       1  98     3    4      2

35   4  99 999       1   7     0    0      0

33   4  98 998       1  99     0    0      0

I want to remove any row that has the number 99 or 999 (or both).

data.frame structure:

df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 

5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L, 

148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L, 

1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L, 

3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L, 

0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")

edited Jan 2 at 5:41

Khaynes

717721

asked Jan 2 at 0:51

Christos Varvarrigos

274

It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.

– iod
Jan 2 at 1:08

add a comment |

I have the following data.frame:

dage ded dht dwt marital inc smoke time number

31   5  65 110       1   1     0    0      0

38   5  70 148       1   4     0    0      0

32   1  99 999       1   2     1    1      1

28   4  99 999       1  98     3    4      2

35   4  99 999       1   7     0    0      0

33   4  98 998       1  99     0    0      0

I want to remove any row that has the number 99 or 999 (or both).

data.frame structure:

df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 

5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L, 

148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L, 

1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L, 

3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L, 

0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")

edited Jan 2 at 5:41

Khaynes

717721

asked Jan 2 at 0:51

Christos Varvarrigos

274

It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.

– iod
Jan 2 at 1:08

add a comment |

I have the following data.frame:

dage ded dht dwt marital inc smoke time number

31   5  65 110       1   1     0    0      0

38   5  70 148       1   4     0    0      0

32   1  99 999       1   2     1    1      1

28   4  99 999       1  98     3    4      2

35   4  99 999       1   7     0    0      0

33   4  98 998       1  99     0    0      0

I want to remove any row that has the number 99 or 999 (or both).

data.frame structure:

df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 

5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L, 

148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L, 

1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L, 

3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L, 

0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")

edited Jan 2 at 5:41

Khaynes

717721

asked Jan 2 at 0:51

Christos Varvarrigos

274

I have the following data.frame:

dage ded dht dwt marital inc smoke time number

31   5  65 110       1   1     0    0      0

38   5  70 148       1   4     0    0      0

32   1  99 999       1   2     1    1      1

28   4  99 999       1  98     3    4      2

35   4  99 999       1   7     0    0      0

33   4  98 998       1  99     0    0      0

I want to remove any row that has the number 99 or 999 (or both).

data.frame structure:

df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 

5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L, 

148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L, 

1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L, 

3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L, 

0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")

r dataframe

edited Jan 2 at 5:41

Khaynes

717721

asked Jan 2 at 0:51

Christos Varvarrigos

274

edited Jan 2 at 5:41

Khaynes

717721

asked Jan 2 at 0:51

Christos Varvarrigos

274

edited Jan 2 at 5:41

Khaynes

717721

edited Jan 2 at 5:41

Khaynes

717721

edited Jan 2 at 5:41

Khaynes

717721

asked Jan 2 at 0:51

Christos Varvarrigos

274

asked Jan 2 at 0:51

Christos Varvarrigos

274

asked Jan 2 at 0:51

Christos Varvarrigos

274

It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.

– iod
Jan 2 at 1:08

add a comment |

It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.

– iod
Jan 2 at 1:08

It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.

– iod
Jan 2 at 1:08

add a comment |

5 Answers
5

active

oldest

votes

Using rowSums

df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]

  ded dht dwt

1   5  65 110

2   5  70 148

6   4  98 998

answered Jan 2 at 1:12

Wen-Ben

117k83469

add a comment |

You can replace 99 and 999 with NA first.

dat[dat == 99 | dat == 999] <- NA

And then use na.omit or complete.cases.

na.omit(dat)

#   dage ded dht dwt marital inc smoke time number

# 1   31   5  65 110       1   1     0    0      0

# 2   38   5  70 148       1   4     0    0      0



dat[complete.cases(dat), ]

#   dage ded dht dwt marital inc smoke time number

# 1   31   5  65 110       1   1     0    0      0

# 2   38   5  70 148       1   4     0    0      0

DATA

dat <- read.table(text = "dage ded dht dwt marital inc smoke time number

31   5  65 110       1   1     0    0      0

38   5  70 148       1   4     0    0      0

32   1  99 999       1   2     1    1      1

28   4  99 999       1  98     3    4      2

35   4  99 999       1   7     0    0      0

33   4  98 998       1  99     0    0      0",

                  header = TRUE)

answered Jan 2 at 2:30

www

28k112343

add a comment |

If your dataframe is called df1:

require(dplyr)

filter_all(df1, all_vars(.!=99 & .!=999))

Result:

  dage ded dht dwt marital inc smoke time number

1   31   5  65 110       1   1     0    0      0

2   38   5  70 148       1   4     0    0      0

answered Jan 2 at 1:06

iod

4,1512723

add a comment |

Here's a solution using any() and apply() that doesn't require any supplemental packages:

#fake data

d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))

#subset rows that don't contain a 99 or 999

d[!apply(d, 1, function(x) any(x %in% c(99,999))),]

Yields:

answered Jan 2 at 1:08

Chase

50.4k12118153

add a comment |

Create data.frame as shown in original question:

df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 

5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L, 

148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L, 

1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L, 

3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L, 

0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")

data.table solution:

library(data.table)

dt <- as.data.table(df)

dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]

base R solution:

 df[!apply(df, 1, function(x) any(x %in% c(99,999))),]

dplyr solution:

require(dplyr)

filter_all(df, all_vars(.!=99 & .!=999))

Benchmarks:

microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0], 

base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),], 

dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)

# Unit: microseconds

  #expr      min       lq      mean    median        uq        max neval

  #dt    588.000  645.801  701.4309  675.6005  723.2515   5203.801 10000

  #base  264.601  296.901  324.2588  314.4005  335.7020   3435.600 10000

  #dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000

edited Jan 2 at 5:30

answered Jan 2 at 2:33

Khaynes

717721

All the 3 solutions are already included. Am I missing something ?

– Ronak Shah
Jan 2 at 4:36

1

@RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).

– Khaynes
Jan 2 at 5:11

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54000128%2fdelete-rows-from-a-dataframe-from-multi-variables-in-a-database%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

5 Answers
5

active

oldest

votes

5 Answers
5

active

oldest

votes

Using rowSums

df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]

  ded dht dwt

1   5  65 110

2   5  70 148

6   4  98 998

answered Jan 2 at 1:12

Wen-Ben

117k83469

add a comment |

Using rowSums

df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]

  ded dht dwt

1   5  65 110

2   5  70 148

6   4  98 998

answered Jan 2 at 1:12

Wen-Ben

117k83469

add a comment |

Using rowSums

df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]

  ded dht dwt

1   5  65 110

2   5  70 148

6   4  98 998

answered Jan 2 at 1:12

Wen-Ben

117k83469

Using rowSums

df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]

  ded dht dwt

1   5  65 110

2   5  70 148

6   4  98 998

answered Jan 2 at 1:12

Wen-Ben

117k83469

answered Jan 2 at 1:12

Wen-Ben

117k83469

answered Jan 2 at 1:12

Wen-Ben

117k83469

answered Jan 2 at 1:12

Wen-Ben

117k83469

add a comment |

You can replace 99 and 999 with NA first.

dat[dat == 99 | dat == 999] <- NA

And then use na.omit or complete.cases.

na.omit(dat)

#   dage ded dht dwt marital inc smoke time number

# 1   31   5  65 110       1   1     0    0      0

# 2   38   5  70 148       1   4     0    0      0



dat[complete.cases(dat), ]

#   dage ded dht dwt marital inc smoke time number

# 1   31   5  65 110       1   1     0    0      0

# 2   38   5  70 148       1   4     0    0      0

DATA

dat <- read.table(text = "dage ded dht dwt marital inc smoke time number

31   5  65 110       1   1     0    0      0

38   5  70 148       1   4     0    0      0

32   1  99 999       1   2     1    1      1

28   4  99 999       1  98     3    4      2

35   4  99 999       1   7     0    0      0

33   4  98 998       1  99     0    0      0",

                  header = TRUE)

answered Jan 2 at 2:30

www

28k112343

add a comment |

You can replace 99 and 999 with NA first.

dat[dat == 99 | dat == 999] <- NA

And then use na.omit or complete.cases.

na.omit(dat)

#   dage ded dht dwt marital inc smoke time number

# 1   31   5  65 110       1   1     0    0      0

# 2   38   5  70 148       1   4     0    0      0



dat[complete.cases(dat), ]

#   dage ded dht dwt marital inc smoke time number

# 1   31   5  65 110       1   1     0    0      0

# 2   38   5  70 148       1   4     0    0      0

DATA

dat <- read.table(text = "dage ded dht dwt marital inc smoke time number

31   5  65 110       1   1     0    0      0

38   5  70 148       1   4     0    0      0

32   1  99 999       1   2     1    1      1

28   4  99 999       1  98     3    4      2

35   4  99 999       1   7     0    0      0

33   4  98 998       1  99     0    0      0",

                  header = TRUE)

answered Jan 2 at 2:30

www

28k112343

add a comment |

You can replace 99 and 999 with NA first.

dat[dat == 99 | dat == 999] <- NA

And then use na.omit or complete.cases.

na.omit(dat)

#   dage ded dht dwt marital inc smoke time number

# 1   31   5  65 110       1   1     0    0      0

# 2   38   5  70 148       1   4     0    0      0



dat[complete.cases(dat), ]

#   dage ded dht dwt marital inc smoke time number

# 1   31   5  65 110       1   1     0    0      0

# 2   38   5  70 148       1   4     0    0      0

DATA

dat <- read.table(text = "dage ded dht dwt marital inc smoke time number

31   5  65 110       1   1     0    0      0

38   5  70 148       1   4     0    0      0

32   1  99 999       1   2     1    1      1

28   4  99 999       1  98     3    4      2

35   4  99 999       1   7     0    0      0

33   4  98 998       1  99     0    0      0",

                  header = TRUE)

answered Jan 2 at 2:30

www

28k112343

You can replace 99 and 999 with NA first.

dat[dat == 99 | dat == 999] <- NA

And then use na.omit or complete.cases.

na.omit(dat)

#   dage ded dht dwt marital inc smoke time number

# 1   31   5  65 110       1   1     0    0      0

# 2   38   5  70 148       1   4     0    0      0



dat[complete.cases(dat), ]

#   dage ded dht dwt marital inc smoke time number

# 1   31   5  65 110       1   1     0    0      0

# 2   38   5  70 148       1   4     0    0      0

DATA

dat <- read.table(text = "dage ded dht dwt marital inc smoke time number

31   5  65 110       1   1     0    0      0

38   5  70 148       1   4     0    0      0

32   1  99 999       1   2     1    1      1

28   4  99 999       1  98     3    4      2

35   4  99 999       1   7     0    0      0

33   4  98 998       1  99     0    0      0",

                  header = TRUE)

answered Jan 2 at 2:30

www

28k112343

answered Jan 2 at 2:30

www

28k112343

answered Jan 2 at 2:30

www

28k112343

answered Jan 2 at 2:30

www

28k112343

add a comment |

If your dataframe is called df1:

require(dplyr)

filter_all(df1, all_vars(.!=99 & .!=999))

Result:

  dage ded dht dwt marital inc smoke time number

1   31   5  65 110       1   1     0    0      0

2   38   5  70 148       1   4     0    0      0

answered Jan 2 at 1:06

iod

4,1512723

add a comment |

If your dataframe is called df1:

require(dplyr)

filter_all(df1, all_vars(.!=99 & .!=999))

Result:

  dage ded dht dwt marital inc smoke time number

1   31   5  65 110       1   1     0    0      0

2   38   5  70 148       1   4     0    0      0

answered Jan 2 at 1:06

iod

4,1512723

add a comment |

If your dataframe is called df1:

require(dplyr)

filter_all(df1, all_vars(.!=99 & .!=999))

Result:

  dage ded dht dwt marital inc smoke time number

1   31   5  65 110       1   1     0    0      0

2   38   5  70 148       1   4     0    0      0

answered Jan 2 at 1:06

iod

4,1512723

If your dataframe is called df1:

require(dplyr)

filter_all(df1, all_vars(.!=99 & .!=999))

Result:

  dage ded dht dwt marital inc smoke time number

1   31   5  65 110       1   1     0    0      0

2   38   5  70 148       1   4     0    0      0

answered Jan 2 at 1:06

iod

4,1512723

answered Jan 2 at 1:06

iod

4,1512723

answered Jan 2 at 1:06

iod

4,1512723

answered Jan 2 at 1:06

iod

4,1512723

add a comment |

Here's a solution using any() and apply() that doesn't require any supplemental packages:

#fake data

d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))

#subset rows that don't contain a 99 or 999

d[!apply(d, 1, function(x) any(x %in% c(99,999))),]

Yields:

answered Jan 2 at 1:08

Chase

50.4k12118153

add a comment |

Here's a solution using any() and apply() that doesn't require any supplemental packages:

#fake data

d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))

#subset rows that don't contain a 99 or 999

d[!apply(d, 1, function(x) any(x %in% c(99,999))),]

Yields:

answered Jan 2 at 1:08

Chase

50.4k12118153

add a comment |

Here's a solution using any() and apply() that doesn't require any supplemental packages:

#fake data

d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))

#subset rows that don't contain a 99 or 999

d[!apply(d, 1, function(x) any(x %in% c(99,999))),]

Yields:

answered Jan 2 at 1:08

Chase

50.4k12118153

Here's a solution using any() and apply() that doesn't require any supplemental packages:

#fake data

d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))

#subset rows that don't contain a 99 or 999

d[!apply(d, 1, function(x) any(x %in% c(99,999))),]

Yields:

answered Jan 2 at 1:08

Chase

50.4k12118153

answered Jan 2 at 1:08

Chase

50.4k12118153

answered Jan 2 at 1:08

Chase

50.4k12118153

answered Jan 2 at 1:08

Chase

50.4k12118153

add a comment |

Create data.frame as shown in original question:

df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 

5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L, 

148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L, 

1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L, 

3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L, 

0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")

data.table solution:

library(data.table)

dt <- as.data.table(df)

dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]

base R solution:

 df[!apply(df, 1, function(x) any(x %in% c(99,999))),]

dplyr solution:

require(dplyr)

filter_all(df, all_vars(.!=99 & .!=999))

Benchmarks:

microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0], 

base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),], 

dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)

# Unit: microseconds

  #expr      min       lq      mean    median        uq        max neval

  #dt    588.000  645.801  701.4309  675.6005  723.2515   5203.801 10000

  #base  264.601  296.901  324.2588  314.4005  335.7020   3435.600 10000

  #dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000

edited Jan 2 at 5:30

answered Jan 2 at 2:33

Khaynes

717721

All the 3 solutions are already included. Am I missing something ?

– Ronak Shah
Jan 2 at 4:36

1

@RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).

– Khaynes
Jan 2 at 5:11

add a comment |

Create data.frame as shown in original question:

df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 

5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L, 

148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L, 

1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L, 

3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L, 

0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")

data.table solution:

library(data.table)

dt <- as.data.table(df)

dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]

base R solution:

 df[!apply(df, 1, function(x) any(x %in% c(99,999))),]

dplyr solution:

require(dplyr)

filter_all(df, all_vars(.!=99 & .!=999))

Benchmarks:

microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0], 

base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),], 

dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)

# Unit: microseconds

  #expr      min       lq      mean    median        uq        max neval

  #dt    588.000  645.801  701.4309  675.6005  723.2515   5203.801 10000

  #base  264.601  296.901  324.2588  314.4005  335.7020   3435.600 10000

  #dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000

edited Jan 2 at 5:30

answered Jan 2 at 2:33

Khaynes

717721

All the 3 solutions are already included. Am I missing something ?

– Ronak Shah
Jan 2 at 4:36

1

@RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).

– Khaynes
Jan 2 at 5:11

add a comment |

Create data.frame as shown in original question:

df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 

5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L, 

148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L, 

1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L, 

3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L, 

0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")

data.table solution:

library(data.table)

dt <- as.data.table(df)

dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]

base R solution:

 df[!apply(df, 1, function(x) any(x %in% c(99,999))),]

dplyr solution:

require(dplyr)

filter_all(df, all_vars(.!=99 & .!=999))

Benchmarks:

microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0], 

base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),], 

dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)

# Unit: microseconds

  #expr      min       lq      mean    median        uq        max neval

  #dt    588.000  645.801  701.4309  675.6005  723.2515   5203.801 10000

  #base  264.601  296.901  324.2588  314.4005  335.7020   3435.600 10000

  #dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000

edited Jan 2 at 5:30

answered Jan 2 at 2:33

Khaynes

717721

Create data.frame as shown in original question:

df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 

5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L, 

148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L, 

1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L, 

3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L, 

0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")

data.table solution:

library(data.table)

dt <- as.data.table(df)

dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]

base R solution:

 df[!apply(df, 1, function(x) any(x %in% c(99,999))),]

dplyr solution:

require(dplyr)

filter_all(df, all_vars(.!=99 & .!=999))

Benchmarks:

microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0], 

base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),], 

dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)

# Unit: microseconds

  #expr      min       lq      mean    median        uq        max neval

  #dt    588.000  645.801  701.4309  675.6005  723.2515   5203.801 10000

  #base  264.601  296.901  324.2588  314.4005  335.7020   3435.600 10000

  #dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000

edited Jan 2 at 5:30

answered Jan 2 at 2:33

Khaynes

717721

edited Jan 2 at 5:30

answered Jan 2 at 2:33

Khaynes

717721

answered Jan 2 at 2:33

Khaynes

717721

answered Jan 2 at 2:33

Khaynes

717721

All the 3 solutions are already included. Am I missing something ?

– Ronak Shah
Jan 2 at 4:36

1

@RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).

– Khaynes
Jan 2 at 5:11

add a comment |

All the 3 solutions are already included. Am I missing something ?

– Ronak Shah
Jan 2 at 4:36

1

@RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).

– Khaynes
Jan 2 at 5:11

All the 3 solutions are already included. Am I missing something ?

– Ronak Shah
Jan 2 at 4:36

@RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).

– Khaynes
Jan 2 at 5:11

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu