Delete rows from a dataframe from multi variables in a database
I have the following data.frame
:
dage ded dht dwt marital inc smoke time number
31 5 65 110 1 1 0 0 0
38 5 70 148 1 4 0 0 0
32 1 99 999 1 2 1 1 1
28 4 99 999 1 98 3 4 2
35 4 99 999 1 7 0 0 0
33 4 98 998 1 99 0 0 0
I want to remove any row that has the number 99
or 999
(or both).
data.frame structure:
df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L,
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")
r dataframe
add a comment |
I have the following data.frame
:
dage ded dht dwt marital inc smoke time number
31 5 65 110 1 1 0 0 0
38 5 70 148 1 4 0 0 0
32 1 99 999 1 2 1 1 1
28 4 99 999 1 98 3 4 2
35 4 99 999 1 7 0 0 0
33 4 98 998 1 99 0 0 0
I want to remove any row that has the number 99
or 999
(or both).
data.frame structure:
df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L,
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")
r dataframe
It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.
– iod
Jan 2 at 1:08
add a comment |
I have the following data.frame
:
dage ded dht dwt marital inc smoke time number
31 5 65 110 1 1 0 0 0
38 5 70 148 1 4 0 0 0
32 1 99 999 1 2 1 1 1
28 4 99 999 1 98 3 4 2
35 4 99 999 1 7 0 0 0
33 4 98 998 1 99 0 0 0
I want to remove any row that has the number 99
or 999
(or both).
data.frame structure:
df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L,
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")
r dataframe
I have the following data.frame
:
dage ded dht dwt marital inc smoke time number
31 5 65 110 1 1 0 0 0
38 5 70 148 1 4 0 0 0
32 1 99 999 1 2 1 1 1
28 4 99 999 1 98 3 4 2
35 4 99 999 1 7 0 0 0
33 4 98 998 1 99 0 0 0
I want to remove any row that has the number 99
or 999
(or both).
data.frame structure:
df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L,
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")
r dataframe
r dataframe
edited Jan 2 at 5:41
Khaynes
717721
717721
asked Jan 2 at 0:51
Christos VarvarrigosChristos Varvarrigos
274
274
It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.
– iod
Jan 2 at 1:08
add a comment |
It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.
– iod
Jan 2 at 1:08
It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.
– iod
Jan 2 at 1:08
It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.
– iod
Jan 2 at 1:08
add a comment |
5 Answers
5
active
oldest
votes
Using rowSums
df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]
ded dht dwt
1 5 65 110
2 5 70 148
6 4 98 998
add a comment |
You can replace 99
and 999
with NA
first.
dat[dat == 99 | dat == 999] <- NA
And then use na.omit
or complete.cases
.
na.omit(dat)
# dage ded dht dwt marital inc smoke time number
# 1 31 5 65 110 1 1 0 0 0
# 2 38 5 70 148 1 4 0 0 0
dat[complete.cases(dat), ]
# dage ded dht dwt marital inc smoke time number
# 1 31 5 65 110 1 1 0 0 0
# 2 38 5 70 148 1 4 0 0 0
DATA
dat <- read.table(text = "dage ded dht dwt marital inc smoke time number
31 5 65 110 1 1 0 0 0
38 5 70 148 1 4 0 0 0
32 1 99 999 1 2 1 1 1
28 4 99 999 1 98 3 4 2
35 4 99 999 1 7 0 0 0
33 4 98 998 1 99 0 0 0",
header = TRUE)
add a comment |
If your dataframe is called df1
:
require(dplyr)
filter_all(df1, all_vars(.!=99 & .!=999))
Result:
dage ded dht dwt marital inc smoke time number
1 31 5 65 110 1 1 0 0 0
2 38 5 70 148 1 4 0 0 0
add a comment |
Here's a solution using any()
and apply()
that doesn't require any supplemental packages:
#fake data
d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))
#subset rows that don't contain a 99 or 999
d[!apply(d, 1, function(x) any(x %in% c(99,999))),]
Yields:
a b
2 2 1
3 3 2
add a comment |
Create data.frame as shown in original question:
df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L,
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")
data.table
solution:
library(data.table)
dt <- as.data.table(df)
dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]
base
R solution:
df[!apply(df, 1, function(x) any(x %in% c(99,999))),]
dplyr
solution:
require(dplyr)
filter_all(df, all_vars(.!=99 & .!=999))
Benchmarks:
microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0],
base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),],
dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)
# Unit: microseconds
#expr min lq mean median uq max neval
#dt 588.000 645.801 701.4309 675.6005 723.2515 5203.801 10000
#base 264.601 296.901 324.2588 314.4005 335.7020 3435.600 10000
#dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000
All the 3 solutions are already included. Am I missing something ?
– Ronak Shah
Jan 2 at 4:36
1
@RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).
– Khaynes
Jan 2 at 5:11
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54000128%2fdelete-rows-from-a-dataframe-from-multi-variables-in-a-database%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
Using rowSums
df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]
ded dht dwt
1 5 65 110
2 5 70 148
6 4 98 998
add a comment |
Using rowSums
df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]
ded dht dwt
1 5 65 110
2 5 70 148
6 4 98 998
add a comment |
Using rowSums
df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]
ded dht dwt
1 5 65 110
2 5 70 148
6 4 98 998
Using rowSums
df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]
ded dht dwt
1 5 65 110
2 5 70 148
6 4 98 998
answered Jan 2 at 1:12
Wen-BenWen-Ben
117k83469
117k83469
add a comment |
add a comment |
You can replace 99
and 999
with NA
first.
dat[dat == 99 | dat == 999] <- NA
And then use na.omit
or complete.cases
.
na.omit(dat)
# dage ded dht dwt marital inc smoke time number
# 1 31 5 65 110 1 1 0 0 0
# 2 38 5 70 148 1 4 0 0 0
dat[complete.cases(dat), ]
# dage ded dht dwt marital inc smoke time number
# 1 31 5 65 110 1 1 0 0 0
# 2 38 5 70 148 1 4 0 0 0
DATA
dat <- read.table(text = "dage ded dht dwt marital inc smoke time number
31 5 65 110 1 1 0 0 0
38 5 70 148 1 4 0 0 0
32 1 99 999 1 2 1 1 1
28 4 99 999 1 98 3 4 2
35 4 99 999 1 7 0 0 0
33 4 98 998 1 99 0 0 0",
header = TRUE)
add a comment |
You can replace 99
and 999
with NA
first.
dat[dat == 99 | dat == 999] <- NA
And then use na.omit
or complete.cases
.
na.omit(dat)
# dage ded dht dwt marital inc smoke time number
# 1 31 5 65 110 1 1 0 0 0
# 2 38 5 70 148 1 4 0 0 0
dat[complete.cases(dat), ]
# dage ded dht dwt marital inc smoke time number
# 1 31 5 65 110 1 1 0 0 0
# 2 38 5 70 148 1 4 0 0 0
DATA
dat <- read.table(text = "dage ded dht dwt marital inc smoke time number
31 5 65 110 1 1 0 0 0
38 5 70 148 1 4 0 0 0
32 1 99 999 1 2 1 1 1
28 4 99 999 1 98 3 4 2
35 4 99 999 1 7 0 0 0
33 4 98 998 1 99 0 0 0",
header = TRUE)
add a comment |
You can replace 99
and 999
with NA
first.
dat[dat == 99 | dat == 999] <- NA
And then use na.omit
or complete.cases
.
na.omit(dat)
# dage ded dht dwt marital inc smoke time number
# 1 31 5 65 110 1 1 0 0 0
# 2 38 5 70 148 1 4 0 0 0
dat[complete.cases(dat), ]
# dage ded dht dwt marital inc smoke time number
# 1 31 5 65 110 1 1 0 0 0
# 2 38 5 70 148 1 4 0 0 0
DATA
dat <- read.table(text = "dage ded dht dwt marital inc smoke time number
31 5 65 110 1 1 0 0 0
38 5 70 148 1 4 0 0 0
32 1 99 999 1 2 1 1 1
28 4 99 999 1 98 3 4 2
35 4 99 999 1 7 0 0 0
33 4 98 998 1 99 0 0 0",
header = TRUE)
You can replace 99
and 999
with NA
first.
dat[dat == 99 | dat == 999] <- NA
And then use na.omit
or complete.cases
.
na.omit(dat)
# dage ded dht dwt marital inc smoke time number
# 1 31 5 65 110 1 1 0 0 0
# 2 38 5 70 148 1 4 0 0 0
dat[complete.cases(dat), ]
# dage ded dht dwt marital inc smoke time number
# 1 31 5 65 110 1 1 0 0 0
# 2 38 5 70 148 1 4 0 0 0
DATA
dat <- read.table(text = "dage ded dht dwt marital inc smoke time number
31 5 65 110 1 1 0 0 0
38 5 70 148 1 4 0 0 0
32 1 99 999 1 2 1 1 1
28 4 99 999 1 98 3 4 2
35 4 99 999 1 7 0 0 0
33 4 98 998 1 99 0 0 0",
header = TRUE)
answered Jan 2 at 2:30
wwwwww
28k112343
28k112343
add a comment |
add a comment |
If your dataframe is called df1
:
require(dplyr)
filter_all(df1, all_vars(.!=99 & .!=999))
Result:
dage ded dht dwt marital inc smoke time number
1 31 5 65 110 1 1 0 0 0
2 38 5 70 148 1 4 0 0 0
add a comment |
If your dataframe is called df1
:
require(dplyr)
filter_all(df1, all_vars(.!=99 & .!=999))
Result:
dage ded dht dwt marital inc smoke time number
1 31 5 65 110 1 1 0 0 0
2 38 5 70 148 1 4 0 0 0
add a comment |
If your dataframe is called df1
:
require(dplyr)
filter_all(df1, all_vars(.!=99 & .!=999))
Result:
dage ded dht dwt marital inc smoke time number
1 31 5 65 110 1 1 0 0 0
2 38 5 70 148 1 4 0 0 0
If your dataframe is called df1
:
require(dplyr)
filter_all(df1, all_vars(.!=99 & .!=999))
Result:
dage ded dht dwt marital inc smoke time number
1 31 5 65 110 1 1 0 0 0
2 38 5 70 148 1 4 0 0 0
answered Jan 2 at 1:06
iodiod
4,1512723
4,1512723
add a comment |
add a comment |
Here's a solution using any()
and apply()
that doesn't require any supplemental packages:
#fake data
d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))
#subset rows that don't contain a 99 or 999
d[!apply(d, 1, function(x) any(x %in% c(99,999))),]
Yields:
a b
2 2 1
3 3 2
add a comment |
Here's a solution using any()
and apply()
that doesn't require any supplemental packages:
#fake data
d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))
#subset rows that don't contain a 99 or 999
d[!apply(d, 1, function(x) any(x %in% c(99,999))),]
Yields:
a b
2 2 1
3 3 2
add a comment |
Here's a solution using any()
and apply()
that doesn't require any supplemental packages:
#fake data
d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))
#subset rows that don't contain a 99 or 999
d[!apply(d, 1, function(x) any(x %in% c(99,999))),]
Yields:
a b
2 2 1
3 3 2
Here's a solution using any()
and apply()
that doesn't require any supplemental packages:
#fake data
d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))
#subset rows that don't contain a 99 or 999
d[!apply(d, 1, function(x) any(x %in% c(99,999))),]
Yields:
a b
2 2 1
3 3 2
answered Jan 2 at 1:08
ChaseChase
50.4k12118153
50.4k12118153
add a comment |
add a comment |
Create data.frame as shown in original question:
df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L,
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")
data.table
solution:
library(data.table)
dt <- as.data.table(df)
dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]
base
R solution:
df[!apply(df, 1, function(x) any(x %in% c(99,999))),]
dplyr
solution:
require(dplyr)
filter_all(df, all_vars(.!=99 & .!=999))
Benchmarks:
microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0],
base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),],
dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)
# Unit: microseconds
#expr min lq mean median uq max neval
#dt 588.000 645.801 701.4309 675.6005 723.2515 5203.801 10000
#base 264.601 296.901 324.2588 314.4005 335.7020 3435.600 10000
#dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000
All the 3 solutions are already included. Am I missing something ?
– Ronak Shah
Jan 2 at 4:36
1
@RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).
– Khaynes
Jan 2 at 5:11
add a comment |
Create data.frame as shown in original question:
df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L,
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")
data.table
solution:
library(data.table)
dt <- as.data.table(df)
dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]
base
R solution:
df[!apply(df, 1, function(x) any(x %in% c(99,999))),]
dplyr
solution:
require(dplyr)
filter_all(df, all_vars(.!=99 & .!=999))
Benchmarks:
microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0],
base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),],
dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)
# Unit: microseconds
#expr min lq mean median uq max neval
#dt 588.000 645.801 701.4309 675.6005 723.2515 5203.801 10000
#base 264.601 296.901 324.2588 314.4005 335.7020 3435.600 10000
#dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000
All the 3 solutions are already included. Am I missing something ?
– Ronak Shah
Jan 2 at 4:36
1
@RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).
– Khaynes
Jan 2 at 5:11
add a comment |
Create data.frame as shown in original question:
df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L,
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")
data.table
solution:
library(data.table)
dt <- as.data.table(df)
dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]
base
R solution:
df[!apply(df, 1, function(x) any(x %in% c(99,999))),]
dplyr
solution:
require(dplyr)
filter_all(df, all_vars(.!=99 & .!=999))
Benchmarks:
microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0],
base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),],
dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)
# Unit: microseconds
#expr min lq mean median uq max neval
#dt 588.000 645.801 701.4309 675.6005 723.2515 5203.801 10000
#base 264.601 296.901 324.2588 314.4005 335.7020 3435.600 10000
#dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000
Create data.frame as shown in original question:
df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L,
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")
data.table
solution:
library(data.table)
dt <- as.data.table(df)
dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]
base
R solution:
df[!apply(df, 1, function(x) any(x %in% c(99,999))),]
dplyr
solution:
require(dplyr)
filter_all(df, all_vars(.!=99 & .!=999))
Benchmarks:
microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0],
base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),],
dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)
# Unit: microseconds
#expr min lq mean median uq max neval
#dt 588.000 645.801 701.4309 675.6005 723.2515 5203.801 10000
#base 264.601 296.901 324.2588 314.4005 335.7020 3435.600 10000
#dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000
edited Jan 2 at 5:30
answered Jan 2 at 2:33
KhaynesKhaynes
717721
717721
All the 3 solutions are already included. Am I missing something ?
– Ronak Shah
Jan 2 at 4:36
1
@RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).
– Khaynes
Jan 2 at 5:11
add a comment |
All the 3 solutions are already included. Am I missing something ?
– Ronak Shah
Jan 2 at 4:36
1
@RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).
– Khaynes
Jan 2 at 5:11
All the 3 solutions are already included. Am I missing something ?
– Ronak Shah
Jan 2 at 4:36
All the 3 solutions are already included. Am I missing something ?
– Ronak Shah
Jan 2 at 4:36
1
1
@RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).
– Khaynes
Jan 2 at 5:11
@RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).
– Khaynes
Jan 2 at 5:11
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54000128%2fdelete-rows-from-a-dataframe-from-multi-variables-in-a-database%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.
– iod
Jan 2 at 1:08