Delete rows from a dataframe from multi variables in a database












1















I have the following data.frame:



dage ded dht dwt marital inc smoke time number
31 5 65 110 1 1 0 0 0
38 5 70 148 1 4 0 0 0
32 1 99 999 1 2 1 1 1
28 4 99 999 1 98 3 4 2
35 4 99 999 1 7 0 0 0
33 4 98 998 1 99 0 0 0


I want to remove any row that has the number 99 or 999 (or both).



data.frame structure:



df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")









share|improve this question

























  • It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.

    – iod
    Jan 2 at 1:08
















1















I have the following data.frame:



dage ded dht dwt marital inc smoke time number
31 5 65 110 1 1 0 0 0
38 5 70 148 1 4 0 0 0
32 1 99 999 1 2 1 1 1
28 4 99 999 1 98 3 4 2
35 4 99 999 1 7 0 0 0
33 4 98 998 1 99 0 0 0


I want to remove any row that has the number 99 or 999 (or both).



data.frame structure:



df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")









share|improve this question

























  • It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.

    – iod
    Jan 2 at 1:08














1












1








1








I have the following data.frame:



dage ded dht dwt marital inc smoke time number
31 5 65 110 1 1 0 0 0
38 5 70 148 1 4 0 0 0
32 1 99 999 1 2 1 1 1
28 4 99 999 1 98 3 4 2
35 4 99 999 1 7 0 0 0
33 4 98 998 1 99 0 0 0


I want to remove any row that has the number 99 or 999 (or both).



data.frame structure:



df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")









share|improve this question
















I have the following data.frame:



dage ded dht dwt marital inc smoke time number
31 5 65 110 1 1 0 0 0
38 5 70 148 1 4 0 0 0
32 1 99 999 1 2 1 1 1
28 4 99 999 1 98 3 4 2
35 4 99 999 1 7 0 0 0
33 4 98 998 1 99 0 0 0


I want to remove any row that has the number 99 or 999 (or both).



data.frame structure:



df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")






r dataframe






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 2 at 5:41









Khaynes

717721




717721










asked Jan 2 at 0:51









Christos VarvarrigosChristos Varvarrigos

274




274













  • It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.

    – iod
    Jan 2 at 1:08



















  • It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.

    – iod
    Jan 2 at 1:08

















It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.

– iod
Jan 2 at 1:08





It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.

– iod
Jan 2 at 1:08












5 Answers
5






active

oldest

votes


















0














Using rowSums



df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]
ded dht dwt
1 5 65 110
2 5 70 148
6 4 98 998





share|improve this answer































    1














    You can replace 99 and 999 with NA first.



    dat[dat == 99 | dat == 999] <- NA


    And then use na.omit or complete.cases.



    na.omit(dat)
    # dage ded dht dwt marital inc smoke time number
    # 1 31 5 65 110 1 1 0 0 0
    # 2 38 5 70 148 1 4 0 0 0

    dat[complete.cases(dat), ]
    # dage ded dht dwt marital inc smoke time number
    # 1 31 5 65 110 1 1 0 0 0
    # 2 38 5 70 148 1 4 0 0 0


    DATA



    dat <- read.table(text = "dage ded dht dwt marital inc smoke time number
    31 5 65 110 1 1 0 0 0
    38 5 70 148 1 4 0 0 0
    32 1 99 999 1 2 1 1 1
    28 4 99 999 1 98 3 4 2
    35 4 99 999 1 7 0 0 0
    33 4 98 998 1 99 0 0 0",
    header = TRUE)





    share|improve this answer































      0














      If your dataframe is called df1:



      require(dplyr)
      filter_all(df1, all_vars(.!=99 & .!=999))


      Result:



        dage ded dht dwt marital inc smoke time number
      1 31 5 65 110 1 1 0 0 0
      2 38 5 70 148 1 4 0 0 0





      share|improve this answer































        0














        Here's a solution using any() and apply() that doesn't require any supplemental packages:



        #fake data
        d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))
        #subset rows that don't contain a 99 or 999
        d[!apply(d, 1, function(x) any(x %in% c(99,999))),]


        Yields:



          a b
        2 2 1
        3 3 2





        share|improve this answer































          0














          Create data.frame as shown in original question:



          df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 
          5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
          148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
          1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
          3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
          0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")


          data.table solution:



          library(data.table)
          dt <- as.data.table(df)
          dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]


          base R solution:



           df[!apply(df, 1, function(x) any(x %in% c(99,999))),]


          dplyr solution:



          require(dplyr)
          filter_all(df, all_vars(.!=99 & .!=999))


          Benchmarks:



          microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0], 
          base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),],
          dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)
          # Unit: microseconds
          #expr min lq mean median uq max neval
          #dt 588.000 645.801 701.4309 675.6005 723.2515 5203.801 10000
          #base 264.601 296.901 324.2588 314.4005 335.7020 3435.600 10000
          #dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000





          share|improve this answer


























          • All the 3 solutions are already included. Am I missing something ?

            – Ronak Shah
            Jan 2 at 4:36






          • 1





            @RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).

            – Khaynes
            Jan 2 at 5:11











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54000128%2fdelete-rows-from-a-dataframe-from-multi-variables-in-a-database%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          5 Answers
          5






          active

          oldest

          votes








          5 Answers
          5






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          0














          Using rowSums



          df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]
          ded dht dwt
          1 5 65 110
          2 5 70 148
          6 4 98 998





          share|improve this answer




























            0














            Using rowSums



            df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]
            ded dht dwt
            1 5 65 110
            2 5 70 148
            6 4 98 998





            share|improve this answer


























              0












              0








              0







              Using rowSums



              df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]
              ded dht dwt
              1 5 65 110
              2 5 70 148
              6 4 98 998





              share|improve this answer













              Using rowSums



              df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]
              ded dht dwt
              1 5 65 110
              2 5 70 148
              6 4 98 998






              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered Jan 2 at 1:12









              Wen-BenWen-Ben

              117k83469




              117k83469

























                  1














                  You can replace 99 and 999 with NA first.



                  dat[dat == 99 | dat == 999] <- NA


                  And then use na.omit or complete.cases.



                  na.omit(dat)
                  # dage ded dht dwt marital inc smoke time number
                  # 1 31 5 65 110 1 1 0 0 0
                  # 2 38 5 70 148 1 4 0 0 0

                  dat[complete.cases(dat), ]
                  # dage ded dht dwt marital inc smoke time number
                  # 1 31 5 65 110 1 1 0 0 0
                  # 2 38 5 70 148 1 4 0 0 0


                  DATA



                  dat <- read.table(text = "dage ded dht dwt marital inc smoke time number
                  31 5 65 110 1 1 0 0 0
                  38 5 70 148 1 4 0 0 0
                  32 1 99 999 1 2 1 1 1
                  28 4 99 999 1 98 3 4 2
                  35 4 99 999 1 7 0 0 0
                  33 4 98 998 1 99 0 0 0",
                  header = TRUE)





                  share|improve this answer




























                    1














                    You can replace 99 and 999 with NA first.



                    dat[dat == 99 | dat == 999] <- NA


                    And then use na.omit or complete.cases.



                    na.omit(dat)
                    # dage ded dht dwt marital inc smoke time number
                    # 1 31 5 65 110 1 1 0 0 0
                    # 2 38 5 70 148 1 4 0 0 0

                    dat[complete.cases(dat), ]
                    # dage ded dht dwt marital inc smoke time number
                    # 1 31 5 65 110 1 1 0 0 0
                    # 2 38 5 70 148 1 4 0 0 0


                    DATA



                    dat <- read.table(text = "dage ded dht dwt marital inc smoke time number
                    31 5 65 110 1 1 0 0 0
                    38 5 70 148 1 4 0 0 0
                    32 1 99 999 1 2 1 1 1
                    28 4 99 999 1 98 3 4 2
                    35 4 99 999 1 7 0 0 0
                    33 4 98 998 1 99 0 0 0",
                    header = TRUE)





                    share|improve this answer


























                      1












                      1








                      1







                      You can replace 99 and 999 with NA first.



                      dat[dat == 99 | dat == 999] <- NA


                      And then use na.omit or complete.cases.



                      na.omit(dat)
                      # dage ded dht dwt marital inc smoke time number
                      # 1 31 5 65 110 1 1 0 0 0
                      # 2 38 5 70 148 1 4 0 0 0

                      dat[complete.cases(dat), ]
                      # dage ded dht dwt marital inc smoke time number
                      # 1 31 5 65 110 1 1 0 0 0
                      # 2 38 5 70 148 1 4 0 0 0


                      DATA



                      dat <- read.table(text = "dage ded dht dwt marital inc smoke time number
                      31 5 65 110 1 1 0 0 0
                      38 5 70 148 1 4 0 0 0
                      32 1 99 999 1 2 1 1 1
                      28 4 99 999 1 98 3 4 2
                      35 4 99 999 1 7 0 0 0
                      33 4 98 998 1 99 0 0 0",
                      header = TRUE)





                      share|improve this answer













                      You can replace 99 and 999 with NA first.



                      dat[dat == 99 | dat == 999] <- NA


                      And then use na.omit or complete.cases.



                      na.omit(dat)
                      # dage ded dht dwt marital inc smoke time number
                      # 1 31 5 65 110 1 1 0 0 0
                      # 2 38 5 70 148 1 4 0 0 0

                      dat[complete.cases(dat), ]
                      # dage ded dht dwt marital inc smoke time number
                      # 1 31 5 65 110 1 1 0 0 0
                      # 2 38 5 70 148 1 4 0 0 0


                      DATA



                      dat <- read.table(text = "dage ded dht dwt marital inc smoke time number
                      31 5 65 110 1 1 0 0 0
                      38 5 70 148 1 4 0 0 0
                      32 1 99 999 1 2 1 1 1
                      28 4 99 999 1 98 3 4 2
                      35 4 99 999 1 7 0 0 0
                      33 4 98 998 1 99 0 0 0",
                      header = TRUE)






                      share|improve this answer












                      share|improve this answer



                      share|improve this answer










                      answered Jan 2 at 2:30









                      wwwwww

                      28k112343




                      28k112343























                          0














                          If your dataframe is called df1:



                          require(dplyr)
                          filter_all(df1, all_vars(.!=99 & .!=999))


                          Result:



                            dage ded dht dwt marital inc smoke time number
                          1 31 5 65 110 1 1 0 0 0
                          2 38 5 70 148 1 4 0 0 0





                          share|improve this answer




























                            0














                            If your dataframe is called df1:



                            require(dplyr)
                            filter_all(df1, all_vars(.!=99 & .!=999))


                            Result:



                              dage ded dht dwt marital inc smoke time number
                            1 31 5 65 110 1 1 0 0 0
                            2 38 5 70 148 1 4 0 0 0





                            share|improve this answer


























                              0












                              0








                              0







                              If your dataframe is called df1:



                              require(dplyr)
                              filter_all(df1, all_vars(.!=99 & .!=999))


                              Result:



                                dage ded dht dwt marital inc smoke time number
                              1 31 5 65 110 1 1 0 0 0
                              2 38 5 70 148 1 4 0 0 0





                              share|improve this answer













                              If your dataframe is called df1:



                              require(dplyr)
                              filter_all(df1, all_vars(.!=99 & .!=999))


                              Result:



                                dage ded dht dwt marital inc smoke time number
                              1 31 5 65 110 1 1 0 0 0
                              2 38 5 70 148 1 4 0 0 0






                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered Jan 2 at 1:06









                              iodiod

                              4,1512723




                              4,1512723























                                  0














                                  Here's a solution using any() and apply() that doesn't require any supplemental packages:



                                  #fake data
                                  d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))
                                  #subset rows that don't contain a 99 or 999
                                  d[!apply(d, 1, function(x) any(x %in% c(99,999))),]


                                  Yields:



                                    a b
                                  2 2 1
                                  3 3 2





                                  share|improve this answer




























                                    0














                                    Here's a solution using any() and apply() that doesn't require any supplemental packages:



                                    #fake data
                                    d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))
                                    #subset rows that don't contain a 99 or 999
                                    d[!apply(d, 1, function(x) any(x %in% c(99,999))),]


                                    Yields:



                                      a b
                                    2 2 1
                                    3 3 2





                                    share|improve this answer


























                                      0












                                      0








                                      0







                                      Here's a solution using any() and apply() that doesn't require any supplemental packages:



                                      #fake data
                                      d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))
                                      #subset rows that don't contain a 99 or 999
                                      d[!apply(d, 1, function(x) any(x %in% c(99,999))),]


                                      Yields:



                                        a b
                                      2 2 1
                                      3 3 2





                                      share|improve this answer













                                      Here's a solution using any() and apply() that doesn't require any supplemental packages:



                                      #fake data
                                      d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))
                                      #subset rows that don't contain a 99 or 999
                                      d[!apply(d, 1, function(x) any(x %in% c(99,999))),]


                                      Yields:



                                        a b
                                      2 2 1
                                      3 3 2






                                      share|improve this answer












                                      share|improve this answer



                                      share|improve this answer










                                      answered Jan 2 at 1:08









                                      ChaseChase

                                      50.4k12118153




                                      50.4k12118153























                                          0














                                          Create data.frame as shown in original question:



                                          df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 
                                          5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
                                          148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
                                          1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
                                          3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
                                          0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")


                                          data.table solution:



                                          library(data.table)
                                          dt <- as.data.table(df)
                                          dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]


                                          base R solution:



                                           df[!apply(df, 1, function(x) any(x %in% c(99,999))),]


                                          dplyr solution:



                                          require(dplyr)
                                          filter_all(df, all_vars(.!=99 & .!=999))


                                          Benchmarks:



                                          microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0], 
                                          base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),],
                                          dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)
                                          # Unit: microseconds
                                          #expr min lq mean median uq max neval
                                          #dt 588.000 645.801 701.4309 675.6005 723.2515 5203.801 10000
                                          #base 264.601 296.901 324.2588 314.4005 335.7020 3435.600 10000
                                          #dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000





                                          share|improve this answer


























                                          • All the 3 solutions are already included. Am I missing something ?

                                            – Ronak Shah
                                            Jan 2 at 4:36






                                          • 1





                                            @RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).

                                            – Khaynes
                                            Jan 2 at 5:11
















                                          0














                                          Create data.frame as shown in original question:



                                          df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 
                                          5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
                                          148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
                                          1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
                                          3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
                                          0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")


                                          data.table solution:



                                          library(data.table)
                                          dt <- as.data.table(df)
                                          dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]


                                          base R solution:



                                           df[!apply(df, 1, function(x) any(x %in% c(99,999))),]


                                          dplyr solution:



                                          require(dplyr)
                                          filter_all(df, all_vars(.!=99 & .!=999))


                                          Benchmarks:



                                          microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0], 
                                          base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),],
                                          dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)
                                          # Unit: microseconds
                                          #expr min lq mean median uq max neval
                                          #dt 588.000 645.801 701.4309 675.6005 723.2515 5203.801 10000
                                          #base 264.601 296.901 324.2588 314.4005 335.7020 3435.600 10000
                                          #dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000





                                          share|improve this answer


























                                          • All the 3 solutions are already included. Am I missing something ?

                                            – Ronak Shah
                                            Jan 2 at 4:36






                                          • 1





                                            @RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).

                                            – Khaynes
                                            Jan 2 at 5:11














                                          0












                                          0








                                          0







                                          Create data.frame as shown in original question:



                                          df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 
                                          5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
                                          148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
                                          1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
                                          3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
                                          0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")


                                          data.table solution:



                                          library(data.table)
                                          dt <- as.data.table(df)
                                          dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]


                                          base R solution:



                                           df[!apply(df, 1, function(x) any(x %in% c(99,999))),]


                                          dplyr solution:



                                          require(dplyr)
                                          filter_all(df, all_vars(.!=99 & .!=999))


                                          Benchmarks:



                                          microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0], 
                                          base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),],
                                          dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)
                                          # Unit: microseconds
                                          #expr min lq mean median uq max neval
                                          #dt 588.000 645.801 701.4309 675.6005 723.2515 5203.801 10000
                                          #base 264.601 296.901 324.2588 314.4005 335.7020 3435.600 10000
                                          #dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000





                                          share|improve this answer















                                          Create data.frame as shown in original question:



                                          df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 
                                          5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L,
                                          148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L,
                                          1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L,
                                          3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L,
                                          0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")


                                          data.table solution:



                                          library(data.table)
                                          dt <- as.data.table(df)
                                          dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]


                                          base R solution:



                                           df[!apply(df, 1, function(x) any(x %in% c(99,999))),]


                                          dplyr solution:



                                          require(dplyr)
                                          filter_all(df, all_vars(.!=99 & .!=999))


                                          Benchmarks:



                                          microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0], 
                                          base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),],
                                          dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)
                                          # Unit: microseconds
                                          #expr min lq mean median uq max neval
                                          #dt 588.000 645.801 701.4309 675.6005 723.2515 5203.801 10000
                                          #base 264.601 296.901 324.2588 314.4005 335.7020 3435.600 10000
                                          #dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000






                                          share|improve this answer














                                          share|improve this answer



                                          share|improve this answer








                                          edited Jan 2 at 5:30

























                                          answered Jan 2 at 2:33









                                          KhaynesKhaynes

                                          717721




                                          717721













                                          • All the 3 solutions are already included. Am I missing something ?

                                            – Ronak Shah
                                            Jan 2 at 4:36






                                          • 1





                                            @RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).

                                            – Khaynes
                                            Jan 2 at 5:11



















                                          • All the 3 solutions are already included. Am I missing something ?

                                            – Ronak Shah
                                            Jan 2 at 4:36






                                          • 1





                                            @RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).

                                            – Khaynes
                                            Jan 2 at 5:11

















                                          All the 3 solutions are already included. Am I missing something ?

                                          – Ronak Shah
                                          Jan 2 at 4:36





                                          All the 3 solutions are already included. Am I missing something ?

                                          – Ronak Shah
                                          Jan 2 at 4:36




                                          1




                                          1





                                          @RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).

                                          – Khaynes
                                          Jan 2 at 5:11





                                          @RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).

                                          – Khaynes
                                          Jan 2 at 5:11


















                                          draft saved

                                          draft discarded




















































                                          Thanks for contributing an answer to Stack Overflow!


                                          • Please be sure to answer the question. Provide details and share your research!

                                          But avoid



                                          • Asking for help, clarification, or responding to other answers.

                                          • Making statements based on opinion; back them up with references or personal experience.


                                          To learn more, see our tips on writing great answers.




                                          draft saved


                                          draft discarded














                                          StackExchange.ready(
                                          function () {
                                          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54000128%2fdelete-rows-from-a-dataframe-from-multi-variables-in-a-database%23new-answer', 'question_page');
                                          }
                                          );

                                          Post as a guest















                                          Required, but never shown





















































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown

































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown







                                          Popular posts from this blog

                                          Can a sorcerer learn a 5th-level spell early by creating spell slots using the Font of Magic feature?

                                          Does disintegrating a polymorphed enemy still kill it after the 2018 errata?

                                          A Topological Invariant for $pi_3(U(n))$