Generate a dummy-variable























I have trouble generating the following dummy variables in R.

I'm analyzing yearly time series data (time period 1948-2009). I have two questions:

  1. How do I generate a dummy variable for observation #10, i.e. for year 1957 (value = 1 in 1957 and zero otherwise)?

  2. How do I generate a dummy variable that is zero before 1957 and takes the value 1 from 1957 onwards through 2009?


























migrated from stats.stackexchange.com Aug 14 '12 at 12:55


This question came from our site for people interested in statistics, machine learning, data analysis, data mining, and data visualization.












































      r r-faq














      edited Oct 16 '17 at 9:47









      Jaap











      asked Aug 2 '12 at 23:07









      Pantera



























          15 Answers






























          Another option, which can work better if you have many variables, is to combine factor and model.matrix:

          > year.f = factor(year)
          > dummies = model.matrix(~year.f)

          This will include an intercept column (all ones) and one column for each of the years in your data set except one, which will be the "default" (reference) level absorbed into the intercept.

          You can change how the "default" is chosen through the contrasts.arg argument of model.matrix.

          To omit the intercept, you can drop the first column, which leaves the k-1 dummies, or add + 0 to the end of the formula, which gives one dummy for every year including the reference level.



          Hope this is useful.
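          For the question's own setting, here is a minimal sketch, assuming one observation per year from 1948 to 2009 stored in a vector called year. It builds the dummy matrix with and without an intercept, and it uses relevel() to make 1957 the reference level (relevel() is an alternative to contrasts.arg, not something the answer above uses).

          ```r
          # Assumed data: one observation per year, 1948-2009 (62 years)
          year <- 1948:2009
          year.f <- factor(year)

          # With intercept: 1948 (the first level) is the reference, so intercept + 61 dummies
          d1 <- model.matrix(~ year.f)

          # Without intercept: one indicator column for every level (62 columns)
          d0 <- model.matrix(~ year.f + 0)

          # The 1957 column of d0 is the single-year dummy from question 1
          dummy1957 <- unname(d0[, "year.f1957"])
          which(dummy1957 == 1)  # observation 10

          # relevel() changes the reference level, so 1957 no longer gets its own column
          year.r <- relevel(year.f, ref = "1957")
          d2 <- model.matrix(~ year.r)
          ```

          With a single factor, the "+ 0" version is a full set of indicators (every row sums to 1), which is handy for picking out individual years; the intercept version is what lm() builds internally.
          
          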

























          • what if you want to generate dummy variables for all (instead of k-1) with no intercept?
            – Fernando Hoces De La Guardia
            Mar 27 '15 at 16:52






          • Note that model.matrix() accepts multiple variables to transform into dummies: model.matrix(~ var1 + var2, data = df). Again, just be sure that they are factors.
            – slizb
            May 1 '15 at 19:32






          • @Synergist table(1:n, factor), where factor is the original variable and n is its length
            – Fernando Hoces De La Guardia
            Jun 3 '15 at 15:43






          • @Synergist that table is an n x k matrix with all k indicator variables (instead of k-1)
            – Fernando Hoces De La Guardia
            Jun 3 '15 at 15:49






          • @FernandoHocesDeLaGuardia You can remove the intercept from a formula with either + 0 or - 1. So model.matrix(~ year.f + 0) will give dummy variables without a reference level.
            – Gregor
            Jan 6 '16 at 20:16































          The simplest way to produce these dummy variables is something like the following:



          > print(year)
          [1] 1956 1957 1957 1958 1958 1959
          > dummy <- as.numeric(year == 1957)
          > print(dummy)
          [1] 0 1 1 0 0 0
          > dummy2 <- as.numeric(year >= 1957)
          > print(dummy2)
          [1] 0 1 1 1 1 1


          More generally, you can use ifelse to choose between two values depending on a condition. So if instead of a 0-1 dummy variable, for some reason you wanted to use, say, 4 and 7, you could use ifelse(year == 1957, 4, 7).
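          Applied to the question's full range, the same comparisons give both requested dummies. This sketch assumes the series has one row per year from 1948 to 2009; with that layout, 1957 is observation 10 and the step dummy equals 1 for the last 53 years.

          ```r
          # Assumed layout: one row per year, 1948-2009
          year <- 1948:2009

          # Question 1: pulse dummy, 1 only in 1957
          d.1957 <- as.numeric(year == 1957)

          # Question 2: step dummy, 0 before 1957 and 1 from 1957 onwards
          d.from1957 <- as.numeric(year >= 1957)

          # Side-by-side view of the rows around the break (1955-1959)
          cbind(year, d.1957, d.from1957)[8:12, ]
          ```
          
          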















































            Using dummies::dummy():



            library(dummies)

            # example data
            df1 <- data.frame(id = 1:4, year = 1991:1994)

            df1 <- cbind(df1, dummy(df1$year, sep = "_"))

            df1
            #   id year df1_1991 df1_1992 df1_1993 df1_1994
            # 1  1 1991        1        0        0        0
            # 2  2 1992        0        1        0        0
            # 3  3 1993        0        0        1        0
            # 4  4 1994        0        0        0        1




























            • Maybe adding "fun= factor" in function dummy can help if that is the meaning of the variable.
              – Filippo Mazza
              Mar 8 '17 at 10:35












            • @FilippoMazza I prefer to keep them as integer, yes, we could set factor if needed.
              – zx8754
              Mar 8 '17 at 10:51










            • how do you remove df1 before each dummy column header names?
              – mike
              Jun 10 '17 at 22:47










            • @mike colnames(df1) <- gsub("df1_", "", fixed = TRUE, colnames(df1))
              – zx8754
              Jun 11 '17 at 5:01
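            If you would rather not depend on the dummies package, a base R sketch with model.matrix() produces the same layout. The year_ prefix below is my own choice (it also avoids the df1_ prefix discussed in the comments); adjust it as needed.

            ```r
            df1 <- data.frame(id = 1:4, year = 1991:1994)

            # One 0/1 column per distinct year; "+ 0" keeps all levels (no reference level)
            d <- model.matrix(~ factor(year) + 0, data = df1)

            # Replace the default "factor(year)1991"-style names with year_1991, ...
            colnames(d) <- paste("year", levels(factor(df1$year)), sep = "_")

            df1 <- cbind(df1, d)
            df1
            ```
            
            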































            Package mlr includes createDummyFeatures for this purpose:



            library(mlr)
            df <- data.frame(var = sample(c("A", "B", "C"), 10, replace = TRUE))
            df

            #    var
            # 1    B
            # 2    A
            # 3    C
            # 4    B
            # 5    C
            # 6    A
            # 7    C
            # 8    A
            # 9    B
            # 10   C

            createDummyFeatures(df, cols = "var")

            #    var.A var.B var.C
            # 1      0     1     0
            # 2      1     0     0
            # 3      0     0     1
            # 4      0     1     0
            # 5      0     0     1
            # 6      1     0     0
            # 7      0     0     1
            # 8      1     0     0
            # 9      0     1     0
            # 10     0     0     1


            Note that createDummyFeatures drops the original variable.
            https://www.rdocumentation.org/packages/mlr/versions/2.9/topics/createDummyFeatures























            • Enrique, I've tried installing the package, but it doesn't seem to be working after doing library(mlr). I get the following error:«Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : there is no package called ‘ggvis’ In addition: Warning message: package ‘mlr’ was built under R version 3.2.5 Error: package or namespace load failed for ‘mlr’»
              – An old man in the sea.
              Apr 13 '17 at 11:17






            • you need to install 'ggvis' first
              – Ted Mosby
              Jul 26 at 20:01































            What I normally do to work with this kind of dummy variable is:

            (1) How do I generate a dummy variable for observation #10, i.e. for year 1957 (value = 1 in 1957 and zero otherwise)?

            data$factor_year_1 <- factor(with(data, ifelse(year == 1957, 1, 0)))

            (2) How do I generate a dummy variable that is zero before 1957 and takes the value 1 from 1957 onwards through 2009?

            data$factor_year_2 <- factor(with(data, ifelse(year < 1957, 0, 1)))

            Then I can introduce this factor as a dummy variable in my models. For example, to see whether there is a long-term trend in a variable y:

            summary(lm(y ~ t, data = data))


            Hope this helps!
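            To show the step dummy actually entering a model, here is a sketch on simulated data (the series, the trend slope of 0.1 and the level shift of 5 are all made up for illustration). It fits a linear trend plus the 1957 step dummy, coded as a factor as in the answer above, so lm() reports the shift as the coefficient post1.

            ```r
            set.seed(42)
            year <- 1948:2009
            t <- seq_along(year)  # time trend: 1, 2, ..., 62

            # Step dummy from question 2, stored as a factor with levels "0" and "1"
            post <- factor(ifelse(year < 1957, 0, 1))

            # Made-up outcome: linear trend plus a level shift of 5 from 1957 on, plus noise
            y <- 0.1 * t + 5 * (year >= 1957) + rnorm(length(year), sd = 0.5)

            fit <- lm(y ~ t + post)
            coef(fit)  # the "post1" coefficient estimates the level shift
            ```

            Swapping post for a factor built with year == 1957 gives the single-year (pulse) version of the same model.
            
            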

















































              The other answers here offer direct routes to accomplish this task, one that many models (e.g. lm) will do for you internally anyway. Nonetheless, here are ways to make dummy variables with Max Kuhn's popular caret and recipes packages. While somewhat more verbose, they both scale easily to more complicated situations and fit neatly into their respective frameworks.





              caret::dummyVars



              With caret, the relevant function is dummyVars, which has a predict method to apply it on a data frame:





              df <- data.frame(letter = rep(c('a', 'b', 'c'), each = 2),
                               y = 1:6)

              library(caret)

              dummy <- dummyVars(~ ., data = df, fullRank = TRUE)

              dummy
              #> Dummy Variable Object
              #>
              #> Formula: ~.
              #> 2 variables, 1 factors
              #> Variables and levels will be separated by '.'
              #> A full rank encoding is used

              predict(dummy, df)
              #>   letter.b letter.c y
              #> 1        0        0 1
              #> 2        0        0 2
              #> 3        1        0 3
              #> 4        1        0 4
              #> 5        0        1 5
              #> 6        0        1 6




              recipes::step_dummy



              With recipes, the relevant function is step_dummy:



              library(recipes)

              dummy_recipe <- recipe(y ~ letter, df) %>%
                step_dummy(letter)

              dummy_recipe
              #> Data Recipe
              #>
              #> Inputs:
              #>
              #> role #variables
              #> outcome 1
              #> predictor 1
              #>
              #> Steps:
              #>
              #> Dummy variables from letter


              Depending on context, extract the data with prep and either bake or juice:



              # Prep and bake on new data...
              dummy_recipe %>%
                prep() %>%
                bake(df)
              #> # A tibble: 6 x 3
              #> y letter_b letter_c
              #> <int> <dbl> <dbl>
              #> 1 1 0 0
              #> 2 2 0 0
              #> 3 3 1 0
              #> 4 4 1 0
              #> 5 5 0 1
              #> 6 6 0 1

              # ...or use `retain = TRUE` and `juice` to extract training data
              dummy_recipe %>%
                prep(retain = TRUE) %>%
                juice()
              #> # A tibble: 6 x 3
              #> y letter_b letter_c
              #> <int> <dbl> <dbl>
              #> 1 1 0 0
              #> 2 2 0 0
              #> 3 3 1 0
              #> 4 4 1 0
              #> 5 5 0 1
              #> 6 6 0 1
















































                I read this on the Kaggle forum:

                # Generate example data frame with a character column
                example <- as.data.frame(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))
                names(example) <- "strcol"

                # For every unique value in the string column, create a new 1/0 column.
                # This is what factors do "under the hood" automatically when passed to a function requiring numeric data
                for (level in unique(example$strcol)) {
                  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
                }
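                The loop can also be written without adding the columns one at a time. This is a sketch using sapply() on the same example data; it sorts the distinct values first, so the dummy columns come out in alphabetical order rather than in order of first appearance.

                ```r
                example <- data.frame(strcol = c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"),
                                      stringsAsFactors = FALSE)

                # Distinct values in alphabetical order
                lv <- sort(unique(example$strcol))

                # One 0/1 integer column per distinct value; sapply() returns a 10 x 7 matrix
                d <- sapply(lv, function(l) as.integer(example$strcol == l))
                colnames(d) <- paste("dummy", lv, sep = "_")

                example <- cbind(example, d)
                ```
                
                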














































                  If you want K dummy variables instead of K-1, try:

                  dummies = table(1:length(year), as.factor(year))



























                  • the resulting table cannot be used as a data.frame. If that's a problem, use as.data.frame.matrix(dummies) to translate it into one
                    – sheß
                    Mar 27 at 19:21
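                  Putting the answer and the comment together, here is a small sketch with made-up years. It uses seq_along(year) in place of 1:length(year); both give the same result here, but seq_along() also behaves sensibly for an empty vector.

                  ```r
                  year <- c(1956, 1957, 1957, 1958)

                  # Rows are observations, columns are the K distinct years (all K dummies)
                  dummies <- table(seq_along(year), as.factor(year))

                  # A table is not a data frame; convert it, keeping one column per year
                  dummies.df <- as.data.frame.matrix(dummies)
                  dummies.df
                  ```
                  
                  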

































                  For the use case presented in the question, you can also just multiply the logical condition by 1 (or, even better, by 1L, which gives an integer instead of a double):



                  # example data
                  df1 <- data.frame(yr = 1951:1960)

                  # create the dummies
                  df1$is.1957 <- 1L * (df1$yr == 1957)
                  df1$after.1957 <- 1L * (df1$yr >= 1957)


                  which gives:




                  > df1
                       yr is.1957 after.1957
                  1  1951       0          0
                  2  1952       0          0
                  3  1953       0          0
                  4  1954       0          0
                  5  1955       0          0
                  6  1956       0          0
                  7  1957       1          1
                  8  1958       0          1
                  9  1959       0          1
                  10 1960       0          1





                  For the use cases presented in, for example, the answers of @zx8754 and @Sotos, there are still some other options which, in my opinion, haven't been covered yet.



                  1) Make your own make_dummies function

                  # example data
                  df2 <- data.frame(id = 1:5, year = c(1991:1994, 1992))

                  # create a function
                  make_dummies <- function(v, prefix = '') {
                    s <- sort(unique(v))
                    d <- outer(v, s, function(v, s) 1L * (v == s))
                    colnames(d) <- paste0(prefix, s)
                    d
                  }

                  # bind the dummies to the original data frame
                  cbind(df2, make_dummies(df2$year, prefix = 'y'))


                  which gives:




                    id year y1991 y1992 y1993 y1994
                  1 1 1991 1 0 0 0
                  2 2 1992 0 1 0 0
                  3 3 1993 0 0 1 0
                  4 4 1994 0 0 0 1
                  5 5 1992 0 1 0 0



                  2) Use the dcast function from either data.table or reshape2:

                  dcast(df2, id + year ~ year, fun.aggregate = length)


                  which gives:




                    id year 1991 1992 1993 1994
                  1 1 1991 1 0 0 0
                  2 2 1992 0 1 0 0
                  3 3 1993 0 0 1 0
                  4 4 1994 0 0 0 1
                  5 5 1992 0 1 0 0



                  However, this will not work when there are duplicate values in the column for which the dummies have to be created. In that case, a specific aggregation function is needed for dcast, and the result of dcast needs to be merged back to the original:

                  # example data
                  df3 <- data.frame(var = c("B", "C", "A", "B", "C"))

                  # aggregation function to get dummy values
                  f <- function(x) as.integer(length(x) > 0)

                  # reshape to wide with the custom aggregation function and merge back to the original
                  merge(df3, dcast(df3, var ~ var, fun.aggregate = f), by = 'var', all.x = TRUE)

                  which gives (note that the result is ordered according to the by column):




                    var A B C
                  1 A 1 0 0
                  2 B 0 1 0
                  3 B 0 1 0
                  4 C 0 0 1
                  5 C 0 0 1



                  3) Use the spread function from tidyr (with mutate from dplyr):

                  library(dplyr)
                  library(tidyr)

                  df2 %>%
                    mutate(v = 1, yr = year) %>%
                    spread(yr, v, fill = 0)


                  which gives:




                    id year 1991 1992 1993 1994
                  1 1 1991 1 0 0 0
                  2 2 1992 0 1 0 0
                  3 3 1993 0 0 1 0
                  4 4 1994 0 0 0 1
                  5 5 1992 0 1 0 0

















































                    The ifelse function is best for simple logic like this:

                    > x <- seq(1950, 1960, 1)
                    > ifelse(x == 1957, 1, 0)
                    [1] 0 0 0 0 0 0 0 1 0 0 0
                    > ifelse(x >= 1957, 1, 0)
                    [1] 0 0 0 0 0 0 0 1 1 1 1

                    If you want it to return character data instead, you can do that too:

                    > ifelse(x == 1957, "foo", "bar")
                    [1] "bar" "bar" "bar" "bar" "bar" "bar" "bar" "foo" "bar" "bar" "bar"
                    > ifelse(x <= 1957, "foo", "bar")
                    [1] "foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo" "bar" "bar" "bar"

                    Nested ifelse calls handle categorical variables with more than two values:

                    > ifelse(x == 1957, "foo", ifelse(x == 1958, "bar", "baz"))
                    [1] "baz" "baz" "baz" "baz" "baz" "baz" "baz" "foo" "bar" "baz" "baz"

                    This is the most straightforward option.

















































                      Another way is to use mtabulate from the qdapTools package, i.e.

                      df <- data.frame(var = sample(c("A", "B", "C"), 5, replace = TRUE))
                      df
                      #   var
                      # 1   C
                      # 2   A
                      # 3   C
                      # 4   B
                      # 5   B

                      library(qdapTools)
                      mtabulate(df$var)

                      which gives:




                        A B C
                      1 0 0 1
                      2 1 0 0
                      3 0 0 1
                      4 0 1 0
                      5 0 1 0















































                        I use a function like this (for data.table):

                        library(data.table)

                        # For a data.table object and a factor variable var.name, this function creates dummy variables named "var.name: (level1)"
                        # (the new columns are logical, TRUE/FALSE, rather than 0/1)
                        factorToDummy <- function(dtable, var.name){
                          stopifnot(is.data.table(dtable))
                          stopifnot(var.name %in% names(dtable))
                          stopifnot(is.factor(dtable[, get(var.name)]))

                          dtable[, paste0(var.name, ": ", levels(get(var.name)))] -> new.names
                          dtable[, (new.names) := transpose(lapply(get(var.name), FUN = function(x){x == levels(get(var.name))})) ]

                          cat(paste("\nAdded dummy variables: ", paste0(new.names, collapse = ", ")))
                        }

                        Usage:

                        data <- data.table(data)
                        data[, x := droplevels(x)]
                        factorToDummy(data, "x")
















































                          Convert your data to a data.table, then create the dummy by reference: initialize the column to 0 for all rows and set it to 1 for the rows selected by the filter:



                          library(data.table)

                          dt <- as.data.table(your.dataframe.or.whatever)
                          dt[, is.1957 := 0]
                          dt[year == 1957, is.1957 := 1]


                          Proof-of-concept toy example:



                          library(data.table)

                          dt <- as.data.table(cbind(c(1, 1, 1), c(2, 2, 3)))
                          dt[, is.3 := 0]
                          dt[V2 == 3, is.3 := 1]
















































                            Hi, I wrote this general function to generate a dummy variable; it essentially replicates the replace function in Stata.

                            If x is the data frame and I want a dummy variable called a that takes the value 1 when x$b takes the value c (and 0 otherwise), I use:

                            introducedummy <- function(x, a, b, c) {
                              g <- c(a, b, c)
                              n <- nrow(x)
                              newcol <- g[1]              # name of the new dummy column
                              p <- colnames(x)
                              p2 <- c(p, newcol)          # column names after adding the dummy
                              new1 <- numeric(n)
                              state <- x[, g[2]]          # the column being tested
                              interest <- g[3]            # the value that maps to 1
                              for (i in 1:n) {
                                if (state[i] == interest) {
                                  new1[i] = 1
                                } else {
                                  new1[i] = 0
                                }
                              }
                              x$added <- new1
                              colnames(x) <- p2
                              x
                            }
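                            The loop can be replaced by a single vectorized comparison. This is a sketch with the same arguments; introduce_dummy is a hypothetical name, not from the answer: x is a data frame, a the new column's name, b the column to test and c the value that maps to 1. One behavioral difference: rows where x[[b]] is NA get NA here, whereas the loop version stops with an error at if (NA).

                            ```r
                            # Hypothetical vectorized equivalent of introducedummy()
                            introduce_dummy <- function(x, a, b, c) {
                              # 1 where column b equals c, 0 elsewhere (NA stays NA)
                              x[[a]] <- as.numeric(x[[b]] == c)
                              x
                            }

                            d <- data.frame(year = 1955:1959)
                            d <- introduce_dummy(d, "is.1957", "year", 1957)
                            d
                            ```
                            
                            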
















































                              Another way to do it is to use ifelse:

                              ifelse(year >= 1957, 1, 0)



























                                protected by Jaap Oct 16 '17 at 9:47

















                                edited Jun 24 at 13:37

























                                answered Aug 3 '12 at 1:24









                                David J. Harris









                                • What if you want to generate dummy variables for all levels (instead of k-1) with no intercept?
                                  – Fernando Hoces De La Guardia
                                  Mar 27 '15 at 16:52






                                • Note that model.matrix() accepts multiple variables to transform into dummies: model.matrix(~ var1 + var2, data = df). Again, just be sure that they are factors.
                                  – slizb
                                  May 1 '15 at 19:32






                                • @Synergist table(1:n, factor), where factor is the original variable and n is its length.
                                  – Fernando Hoces De La Guardia
                                  Jun 3 '15 at 15:43






                                • @Synergist That table is an n x k matrix with all k indicator variables (instead of k-1).
                                  – Fernando Hoces De La Guardia
                                  Jun 3 '15 at 15:49






                                • @FernandoHocesDeLaGuardia You can remove the intercept from a formula with either + 0 or - 1, so model.matrix(~ year.f + 0) will give dummy variables without a reference level.
                                  – Gregor
                                  Jan 6 '16 at 20:16
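The two techniques mentioned in the comments can be sketched as follows. The data frame and its variables var1 and var2 are hypothetical toy data. model.matrix() codes several factors at once, giving an intercept plus k-1 columns per factor. table(1:n, factor) instead gives all k indicators for a single factor.

```r
# Hypothetical toy data with two factors
df <- data.frame(var1 = factor(c("a", "b", "a")),
                 var2 = factor(c("x", "x", "y")))

# Several factors at once: intercept plus k-1 columns per factor
mm <- model.matrix(~ var1 + var2, data = df)
colnames(mm)   # "(Intercept)" "var1b" "var2y"

# table(1:n, factor): an n x k table with all k indicators for one factor
tab <- table(seq_len(nrow(df)), df$var1)
unclass(tab)
```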







































                                The simplest way to produce these dummy variables is something like the following:



                                > print(year)
                                [1] 1956 1957 1957 1958 1958 1959
                                > dummy <- as.numeric(year == 1957)
                                > print(dummy)
                                [1] 0 1 1 0 0 0
                                > dummy2 <- as.numeric(year >= 1957)
                                > print(dummy2)
                                [1] 0 1 1 1 1 1
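Here are the same two comparisons applied to the question's full sample, assuming one row per year from 1948 to 2009. They give the pulse dummy for question 1 and the step dummy for question 2. The names pulse and step are only illustrative.

```r
# Assumption: yearly data, one observation per year, 1948-2009
year <- 1948:2009

pulse <- as.numeric(year == 1957)  # question 1: 1 in 1957 only
step  <- as.numeric(year >= 1957)  # question 2: 0 before 1957, 1 from 1957 on

dat <- data.frame(year, pulse, step)
which(dat$pulse == 1)  # 10, i.e. observation #10
sum(dat$step == 0)     # 9 years before the break (1948-1956)
sum(dat$step == 1)     # 53 years from 1957 to 2009
```

Once your own response series y is added to dat, both columns can go straight into a regression, e.g. lm(y ~ pulse + step, data = dat). Here y is a hypothetical placeholder.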


                                More generally, you can use ifelse to choose between two values depending on a condition. So if instead of a 0-1 dummy variable, for some reason you wanted to use, say, 4 and 7, you could use ifelse(year == 1957, 4, 7).
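Here is a quick check of the ifelse() version, using the six-year example vector printed above. With 4 and 7 it gives the recoded values. With 1 and 0 it reproduces the as.numeric() dummy exactly.

```r
year <- c(1956, 1957, 1957, 1958, 1958, 1959)

# 4 where the condition is TRUE, 7 everywhere else
coded <- ifelse(year == 1957, 4, 7)
coded  # 7 4 4 7 7 7

# With 1 and 0, ifelse() matches the as.numeric() step dummy
identical(ifelse(year >= 1957, 1, 0), as.numeric(year >= 1957))  # TRUE
```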
















                                    answered Aug 2 '12 at 23:38









                                    Martin O'Leary




































                                        Using dummies::dummy():



                                        library(dummies)

                                        # example data
                                        df1 <- data.frame(id = 1:4, year = 1991:1994)

                                        df1 <- cbind(df1, dummy(df1$year, sep = "_"))

                                        df1
                                        # id year df1_1991 df1_1992 df1_1993 df1_1994
                                        # 1 1 1991 1 0 0 0
                                        # 2 2 1992 0 1 0 0
                                        # 3 3 1993 0 0 1 0
                                        # 4 4 1994 0 0 0 1
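If the dummies package is not available, the following base R sketch builds a similar set of one-column-per-year indicators using model.matrix() with + 0. The year_ prefix for the column names is an arbitrary choice. It is not what dummy() produces.

```r
# example data, as in the answer above
df1 <- data.frame(id = 1:4, year = 1991:1994)

# One 0/1 column per distinct year, no intercept
ind <- model.matrix(~ factor(year) + 0, data = df1)
colnames(ind) <- paste0("year_", levels(factor(df1$year)))

df1 <- cbind(df1, ind)
df1
```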




























                                        • Maybe adding fun = factor to the dummy function can help, if that is the meaning of the variable.
                                          – Filippo Mazza
                                          Mar 8 '17 at 10:35












                                        • @FilippoMazza I prefer to keep them as integers, but yes, we could make them factors if needed.
                                          – zx8754
                                          Mar 8 '17 at 10:51










                                        • How do you remove the df1 prefix from each dummy column name?
                                          – mike
                                          Jun 10 '17 at 22:47










                                        • @mike colnames(df1) <- gsub("df1_", "", fixed = TRUE, colnames(df1))
                                          – zx8754
                                          Jun 11 '17 at 5:01























                                        edited Jul 23 at 10:26

























                                        answered Oct 31 '16 at 13:34









                                        zx8754




































                                        The mlr package includes createDummyFeatures for this purpose:



                                        library(mlr)
                                        df <- data.frame(var = sample(c("A", "B", "C"), 10, replace = TRUE))
                                        df

                                        # var
                                        # 1 B
                                        # 2 A
                                        # 3 C
                                        # 4 B
                                        # 5 C
                                        # 6 A
                                        # 7 C
                                        # 8 A
                                        # 9 B
                                        # 10 C

                                        createDummyFeatures(df, cols = "var")

                                        # var.A var.B var.C
                                        # 1 0 1 0
                                        # 2 1 0 0
                                        # 3 0 0 1
                                        # 4 0 1 0
                                        # 5 0 0 1
                                        # 6 1 0 0
                                        # 7 0 0 1
                                        # 8 1 0 0
                                        # 9 0 1 0
                                        # 10 0 0 1


                                        Note that createDummyFeatures drops the original variable.
                                        https://www.rdocumentation.org/packages/mlr/versions/2.9/topics/createDummyFeatures
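For comparison, here is a base R sketch that gives the same kind of output without loading mlr. It adds set.seed() for reproducibility. It also fixes the factor levels, so all three columns appear even if a letter is never sampled. Both are additions that are not part of the original example. The var.A style names are set by hand to mirror the output above.

```r
set.seed(42)  # added for reproducibility; not in the original example
df <- data.frame(var = factor(sample(c("A", "B", "C"), 10, replace = TRUE),
                              levels = c("A", "B", "C")))

# One indicator per level; like createDummyFeatures, the result
# no longer contains the original column
dm <- model.matrix(~ var + 0, data = df)
colnames(dm) <- paste0("var.", levels(df$var))
as.data.frame(dm)
```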























                                        • Enrique, I've tried installing the package, but it doesn't seem to work after running library(mlr). I get the following error: «Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : there is no package called ‘ggvis’ In addition: Warning message: package ‘mlr’ was built under R version 3.2.5 Error: package or namespace load failed for ‘mlr’»
                                          – An old man in the sea.
                                          Apr 13 '17 at 11:17






                                        • You need to install 'ggvis' first.
                                          – Ted Mosby
                                          Jul 26 at 20:01

























                                        answered Nov 10 '16 at 16:54









                                        Enrique Pérez Herrero









                                        • 1




                                          Enrique, I've tried installing the package, but it doesn't seem to be working after doing library(mlr). I get the following error:«Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : there is no package called ‘ggvis’ In addition: Warning message: package ‘mlr’ was built under R version 3.2.5 Error: package or namespace load failed for ‘mlr’»
                                          – An old man in the sea.
                                          Apr 13 '17 at 11:17






                                        • 1




                                          you need to install 'ggvis' first
                                          – Ted Mosby
                                          Jul 26 at 20:01



























                                        This is what I normally do to work with this kind of dummy variable:



                                        (1) How do I generate a dummy variable for observation #10, i.e. for year 1957 (value = 1 at 1957 and zero otherwise)?



                                        # 1 in 1957, 0 in every other year
                                        data$factor_year_1 <- factor(with(data, ifelse(year == 1957, 1, 0)))
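A minimal self-contained sketch, assuming the data contain one row per year from 1948 to 2009 in order (62 rows) in a column named year. Coding the dummy with as.integer() skips the factor conversion, and which() confirms that 1957 is indeed observation #10. The column name d1957 is hypothetical.

```r
# Hypothetical data: one row per year, 1948-2009
data <- data.frame(year = 1948:2009)

# 1 in 1957, 0 in every other year (logical TRUE/FALSE coerced to 1/0)
data$d1957 <- as.integer(data$year == 1957)

# The single 1 sits at row 10, i.e. observation #10
which(data$d1957 == 1)
# [1] 10
```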


                                        (2) How do I generate a dummy variable which is zero before 1957 and takes the value 1 from 1957 onwards to 2009?



                                        # 0 before 1957, 1 from 1957 onwards
                                        data$factor_year_2 <- factor(with(data, ifelse(year < 1957, 0, 1)))
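The same step dummy as a plain integer column, again assuming one row per year from 1948 to 2009 (the column name after1957 is hypothetical). The sketch also checks a handy identity: the step dummy is the running sum (cumsum) of the 1957 spike dummy from question (1).

```r
# Hypothetical data: one row per year, 1948-2009
data <- data.frame(year = 1948:2009)

# 0 before 1957, 1 from 1957 onwards
data$after1957 <- as.integer(data$year >= 1957)

# Equivalently, the step dummy is the running sum of the 1957 spike dummy
spike <- as.integer(data$year == 1957)
stopifnot(identical(data$after1957, as.integer(cumsum(spike))))

# 9 zeros (1948-1956) and 53 ones (1957-2009)
table(data$after1957)
```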


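                                        Then, I can introduce this factor as a dummy variable in my models. For example, to see whether there is a long-term trend in a variable y while allowing for a shift from 1957 onwards:


                                        # t is assumed to be a time index (e.g. 1, 2, ...) stored in data
                                        summary(lm(y ~ t + factor_year_2, data = data))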
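To see the dummy at work, here is a sketch on simulated data. The intercept, trend slope, shift size and noise level are made up for illustration and are not taken from the question. y follows a linear trend in t plus a jump of 5 from 1957, and lm() recovers that jump as the coefficient on the step dummy. An integer dummy is used here; with the factor version above, the same coefficient would be labelled factor_year_21.

```r
set.seed(1)  # reproducible simulated noise

# Hypothetical data: yearly observations 1948-2009
data <- data.frame(year = 1948:2009)
data$t <- seq_len(nrow(data))                    # time index 1, 2, ..., 62
data$after1957 <- as.integer(data$year >= 1957)  # step dummy

# Simulated response: intercept 2, slope 0.1 per year, level shift of 5 from 1957
data$y <- 2 + 0.1 * data$t + 5 * data$after1957 + rnorm(nrow(data), sd = 0.1)

fit <- lm(y ~ t + after1957, data = data)
summary(fit)

# Estimated shift; should be close to the simulated value of 5
coef(fit)["after1957"]
```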


                                        Hope this helps!














                                            edited May 7 '14 at 16:48

























                                            answered Aug 3 '12 at 9:44









                                            Ricardo González-Gil




































                                                The other answers here offer direct routes to accomplish this task, which many models (e.g. lm) will do for you internally anyway. Nonetheless, here are ways to make dummy variables with Max Kuhn's popular caret and recipes packages. While somewhat more verbose, both scale easily to more complicated situations and fit neatly into their respective frameworks.





                                                caret::dummyVars



                                                With caret, the relevant function is dummyVars, which has a predict method to apply it to a data frame:





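                                                # stringsAsFactors = TRUE keeps letter a factor; since R 4.0.0 strings are no longer converted by default
                                                df <- data.frame(letter = rep(c('a', 'b', 'c'), each = 2),
                                                                 y = 1:6,
                                                                 stringsAsFactors = TRUE)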

                                                library(caret)

                                                dummy <- dummyVars(~ ., data = df, fullRank = TRUE)

                                                dummy
                                                #> Dummy Variable Object
                                                #>
                                                #> Formula: ~.
                                                #> 2 variables, 1 factors
                                                #> Variables and levels will be separated by '.'
                                                #> A full rank encoding is used

                                                predict(dummy, df)
                                                #>   letter.b letter.c y
                                                #> 1        0        0 1
                                                #> 2        0        0 2
                                                #> 3        1        0 3
                                                #> 4        1        0 4
                                                #> 5        0        1 5
                                                #> 6        0        1 6
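For comparison, base R's model.matrix(), which lm() uses internally to build its design matrix, produces the same full-rank treatment coding without any extra package. Only the column names differ (letterb rather than letter.b). A minimal sketch that re-creates the df used above:

```r
df <- data.frame(letter = rep(c('a', 'b', 'c'), each = 2),
                 y = 1:6, stringsAsFactors = TRUE)

# Treatment contrasts: level 'a' is the reference, so only b and c get columns
mm <- model.matrix(~ letter + y, data = df)

# Drop the intercept column to line up with caret's full-rank output
mm[, -1]
```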




                                                recipes::step_dummy



                                                With recipes, the relevant function is step_dummy:



                                                library(recipes)

                                                dummy_recipe <- recipe(y ~ letter, df) %>%
                                                    step_dummy(letter)

                                                dummy_recipe
                                                #> Data Recipe
                                                #>
                                                #> Inputs:
                                                #>
                                                #>       role #variables
                                                #>    outcome          1
                                                #>  predictor          1
                                                #>
                                                #> Steps:
                                                #>
                                                #> Dummy variables from letter


                                                Depending on context, extract the data with prep and either bake or juice:



                                                # Prep and bake on new data...
                                                dummy_recipe %>%
                                                    prep() %>%
                                                    bake(df)
                                                #> # A tibble: 6 x 3
                                                #>       y letter_b letter_c
                                                #>   <int>    <dbl>    <dbl>
                                                #> 1     1        0        0
                                                #> 2     2        0        0
                                                #> 3     3        1        0
                                                #> 4     4        1        0
                                                #> 5     5        0        1
                                                #> 6     6        0        1

                                                # ...or use `retain = TRUE` and `juice` to extract training data
                                                # (newer recipes versions supersede juice() with bake(new_data = NULL))
                                                dummy_recipe %>%
                                                    prep(retain = TRUE) %>%
                                                    juice()
                                                #> # A tibble: 6 x 3
                                                #>       y letter_b letter_c
                                                #>   <int>    <dbl>    <dbl>
                                                #> 1     1        0        0
                                                #> 2     2        0        0
                                                #> 3     3        1        0
                                                #> 4     4        1        0
                                                #> 5     5        0        1
                                                #> 6     6        0        1













                                                    edited Apr 16 at 19:27

























                                                    answered Dec 17 '17 at 21:59









                                                    alistaire




































                                                        I read this on the Kaggle forum:



                                                        # Generate an example data frame with a character column
                                                        example <- data.frame(strcol = c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))

                                                        # For every unique value in the string column, create a new 1/0 column.
                                                        # This is what factors do "under the hood" automatically when passed to
                                                        # functions requiring numeric data.
                                                        for (level in unique(example$strcol)) {
                                                            example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
                                                        }
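The same columns can also be built without growing the data frame inside a loop. This is a sketch using sapply(), one column per unique value in order of first appearance. Unlike the loop, it stores integer 0/1 values rather than the doubles that ifelse() returns.

```r
example <- data.frame(strcol = c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))

lvls <- unique(example$strcol)  # values in order of first appearance

# Each column is a 0/1 indicator for one value; the result is a 10 x 7 matrix
dummies <- sapply(lvls, function(lv) as.integer(example$strcol == lv))
colnames(dummies) <- paste("dummy", lvls, sep = "_")

example <- cbind(example, dummies)
head(example, 3)
```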















                                                            answered May 16 '15 at 10:37









                                                            skpro19




































If you want K dummy variables (one per level) instead of K-1, try:



dummies <- table(seq_along(year), as.factor(year))


                                                                Best,
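A runnable sketch of this table() approach on the question's own data, assuming one observation per year from 1948 to 2009 (the vector name year is taken from the answer). seq_along(year) behaves like 1:length(year) here but also handles an empty vector correctly; the final conversion is the one suggested in the comment on this answer.

```r
# Assumed data: one observation per year, 1948-2009 (62 years)
year <- 1948:2009

# Rows = observations, columns = distinct years; every row contains a single 1
dummies <- table(seq_along(year), as.factor(year))

# A table is not a data.frame; convert it (column names stay "1948", ..., "2009")
dummies_df <- as.data.frame.matrix(dummies)

# The 1957 column is the question's first dummy: it is 1 only at observation #10
unname(which(dummies_df[["1957"]] == 1))
# 10
```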
















                                                                answered Mar 27 '15 at 17:45









                                                                Fernando Hoces De La Guardia













• The resulting table cannot be used as a data.frame. If that's a problem, use as.data.frame.matrix(dummies) to convert it into one.
  – sheß
  Mar 27 at 19:21

































For the use case presented in the question, you can also simply multiply the logical condition by 1 (or, maybe even better, by 1L, which keeps the result an integer):



                                                                # example data
                                                                df1 <- data.frame(yr = 1951:1960)

                                                                # create the dummies
                                                                df1$is.1957 <- 1L * (df1$yr == 1957)
                                                                df1$after.1957 <- 1L * (df1$yr >= 1957)


                                                                which gives:




> df1
     yr is.1957 after.1957
1  1951       0          0
2  1952       0          0
3  1953       0          0
4  1954       0          0
5  1955       0          0
6  1956       0          0
7  1957       1          1
8  1958       0          1
9  1959       0          1
10 1960       0          1
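The same two lines carry over unchanged to the question's full sample. A sketch assuming a data frame with one row per year from 1948 to 2009 (the name dat is hypothetical); as.integer() is shown as an equivalent of multiplying by 1L.

```r
# Hypothetical data frame: one row per year of the question's sample
dat <- data.frame(yr = 1948:2009)

# Pulse dummy: 1 in 1957 only (observation #10), 0 otherwise
dat$is.1957 <- as.integer(dat$yr == 1957)

# Step dummy: 0 before 1957, 1 from 1957 through 2009
dat$after.1957 <- as.integer(dat$yr >= 1957)

# as.integer() and 1L * (...) give identical integer vectors
identical(dat$is.1957, 1L * (dat$yr == 1957))
# TRUE
```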





For the use cases presented in, for example, the answers by @zx8754 and @Sotos, there are still some other options that, in my opinion, haven't been covered yet.



1) Make your own make_dummies function



# example data
df2 <- data.frame(id = 1:5, year = c(1991:1994, 1992))

# create a function: one 0/1 column per sorted unique value of v
make_dummies <- function(v, prefix = '') {
  s <- sort(unique(v))
  d <- outer(v, s, function(v, s) 1L * (v == s))
  colnames(d) <- paste0(prefix, s)
  d
}

# bind the dummies to the original dataframe
cbind(df2, make_dummies(df2$year, prefix = 'y'))


                                                                which gives:




  id year y1991 y1992 y1993 y1994
1  1 1991     1     0     0     0
2  2 1992     0     1     0     0
3  3 1993     0     0     1     0
4  4 1994     0     0     0     1
5  5 1992     0     1     0     0
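Because make_dummies() compares every row against the sorted unique values, it also copes with repeated values in a column that has no id, and since it only uses cbind() the original row order is kept. A quick check, with make_dummies() copied from above so the snippet runs on its own; the data frame df_chr is made up for illustration.

```r
# make_dummies() as defined above, repeated so this snippet is self-contained
make_dummies <- function(v, prefix = '') {
  s <- sort(unique(v))
  d <- outer(v, s, function(v, s) 1L * (v == s))
  colnames(d) <- paste0(prefix, s)
  d
}

# Made-up example: a character column with repeated values and no id
df_chr <- data.frame(var = c("B", "C", "A", "B", "C"), stringsAsFactors = FALSE)

# One 0/1 column per distinct value; rows stay in their original order
res4 <- cbind(df_chr, make_dummies(df_chr$var))
res4
```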



2) Use the dcast function from either data.table or reshape2



library(reshape2)  # or data.table, after converting df2 with as.data.table()
dcast(df2, id + year ~ year, fun.aggregate = length)


                                                                which gives:




  id year 1991 1992 1993 1994
1  1 1991    1    0    0    0
2  2 1992    0    1    0    0
3  3 1993    0    0    1    0
4  4 1994    0    0    0    1
5  5 1992    0    1    0    0
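If loading reshape2 or data.table is not an option, base R's xtabs() gives the same counts for this example: the formula ~ id + year cross-tabulates the two columns, so each cell holds a number of rows, much like fun.aggregate = length. This is a sketch that assumes id is unique and already sorted, which is what lets cbind() line the rows up with df2.

```r
df2 <- data.frame(id = 1:5, year = c(1991:1994, 1992))

# Count rows for every id/year combination (one row per id, one column per year)
counts <- xtabs(~ id + year, data = df2)

# Convert the table to a data frame; its rows follow the sorted ids
wide <- as.data.frame.matrix(counts)

# Valid only because df2$id is unique and sorted, so the rows line up
res5 <- cbind(df2, wide)
res5
```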



However, this will not work when there are duplicate values in the column for which the dummies have to be created and no id column to tell those rows apart. In that case, a specific aggregation function is needed for dcast, and the result of dcast needs to be merged back into the original:



# example data
df3 <- data.frame(var = c("B", "C", "A", "B", "C"))

# aggregation function to get dummy values
f <- function(x) as.integer(length(x) > 0)

# reshape to wide with the custom aggregation function and merge back to the original
merge(df3, dcast(df3, var ~ var, fun.aggregate = f), by = 'var', all.x = TRUE)


which gives (note that merge() orders the result by the by column, var):




  var A B C
1   A 1 0 0
2   B 0 1 0
3   B 0 1 0
4   C 0 0 1
5   C 0 0 1
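If the original row order matters, one option is to remember it in a helper column before merging and to sort on it afterwards. In this sketch the helper column row_id is hypothetical, and the wide 0/1 lookup table is built in base R as a stand-in for the dcast() step, so the snippet runs without reshape2.

```r
# Same example data as above, plus a hypothetical helper column recording the original order
df3 <- data.frame(var = c("B", "C", "A", "B", "C"), stringsAsFactors = FALSE)
df3$row_id <- seq_len(nrow(df3))

# Wide 0/1 lookup with one row per level (base-R stand-in for the dcast() step)
levs <- sort(unique(df3$var))
lookup <- data.frame(levs, 1L * outer(levs, levs, "=="), stringsAsFactors = FALSE)
names(lookup) <- c("var", levs)

# merge() sorts by the key; ordering on row_id afterwards undoes that
res6 <- merge(df3, lookup, by = "var", all.x = TRUE)
res6 <- res6[order(res6$row_id), c("var", levs)]
rownames(res6) <- NULL
res6
```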



3) Use the spread function from tidyr (with mutate from dplyr)



library(dplyr)
library(tidyr)

df2 %>%
  mutate(v = 1, yr = year) %>%
  spread(yr, v, fill = 0)


                                                                which gives:




  id year 1991 1992 1993 1994
1  1 1991    1    0    0    0
2  2 1992    0    1    0    0
3  3 1993    0    0    1    0
4  4 1994    0    0    0    1
5  5 1992    0    1    0    0
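In newer versions of tidyr, spread() is superseded by pivot_wider(). If you would rather stay in base R, stats::reshape() can do the same long-to-wide step. This is a sketch: the helper columns yr and v mirror the mutate() call above, and the resulting column names (v.1991, ...) come from reshape()'s default separator ".".

```r
df2 <- data.frame(id = 1:5, year = c(1991:1994, 1992))

# Same helper columns as the mutate() step: a copy of year as the key, v = 1 as the value
long <- transform(df2, yr = year, v = 1L)

# One row per id/year pair, one v.<year> column per distinct year
wide <- reshape(long, direction = "wide", idvar = c("id", "year"),
                timevar = "yr", v.names = "v")

# Cells without a match come back as NA; fill them with 0, like fill = 0 in spread()
wide[is.na(wide)] <- 0L
wide
```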






                                                                share|improve this answer



























                                                                  up vote
                                                                  5
                                                                  down vote













                                                                  For the usecase as presented in the question, you can also just multiply the logical condition with 1 (or maybe even better, with 1L):



                                                                  # example data
                                                                  df1 <- data.frame(yr = 1951:1960)

                                                                  # create the dummies
                                                                  df1$is.1957 <- 1L * (df1$yr == 1957)
                                                                  df1$after.1957 <- 1L * (df1$yr >= 1957)


                                                                  which gives:




                                                                  > df1
                                                                  yr is.1957 after.1957
                                                                  1 1951 0 0
                                                                  2 1952 0 0
                                                                  3 1953 0 0
                                                                  4 1954 0 0
                                                                  5 1955 0 0
                                                                  6 1956 0 0
                                                                  7 1957 1 1
                                                                  8 1958 0 1
                                                                  9 1959 0 1
                                                                  10 1960 0 1





                                                                  For the usecases as presented in for example the answers of @zx8754 and @Sotos, there are still some other options which haven't been covered yet imo.



                                                                  1) Make your own make_dummies-function



                                                                  # example data
                                                                  df2 <- data.frame(id = 1:5, year = c(1991:1994,1992))

                                                                  # create a function
                                                                  make_dummies <- function(v, prefix = '') {
                                                                  s <- sort(unique(v))
                                                                  d <- outer(v, s, function(v, s) 1L * (v == s))
                                                                  colnames(d) <- paste0(prefix, s)
                                                                  d
                                                                  }

                                                                  # bind the dummies to the original dataframe
                                                                  cbind(df2, make_dummies(df2$year, prefix = 'y'))


                                                                  which gives:




                                                                    id year y1991 y1992 y1993 y1994
                                                                  1 1 1991 1 0 0 0
                                                                  2 2 1992 0 1 0 0
                                                                  3 3 1993 0 0 1 0
                                                                  4 4 1994 0 0 0 1
                                                                  5 5 1992 0 1 0 0



                                                                  2) use the dcast-function from either data.table or reshape2



                                                                   dcast(df2, id + year ~ year, fun.aggregate = length)


                                                                  which gives:




                                                                    id year 1991 1992 1993 1994
                                                                  1 1 1991 1 0 0 0
                                                                  2 2 1992 0 1 0 0
                                                                  3 3 1993 0 0 1 0
                                                                  4 4 1994 0 0 0 1
                                                                  5 5 1992 0 1 0 0



                                                                  However, this will not work when there are duplicate values in the column for which the dummies have to be created. In the case a specific aggregation function is needed for dcast and the result of of dcast need to be merged back to the original:



                                                                  # example data
                                                                  df3 <- data.frame(var = c("B", "C", "A", "B", "C"))

                                                                  # aggregation function to get dummy values
                                                                  f <- function(x) as.integer(length(x) > 0)

                                                                  # reshape to wide with the cumstom aggregation function and merge back to the original
                                                                  merge(df3, dcast(df3, var ~ var, fun.aggregate = f), by = 'var', all.x = TRUE)


                                                                  which gives (note that the result is order according to the by column):




                                                                    var A B C
                                                                  1 A 1 0 0
                                                                  2 B 0 1 0
                                                                  3 B 0 1 0
                                                                  4 C 0 0 1
                                                                  5 C 0 0 1



                                                                  3) use the spread-function from tidyr (with mutate from dplyr)



                                                                  library(dplyr)
                                                                  library(tidyr)

                                                                  df2 %>%
                                                                  mutate(v = 1, yr = year) %>%
                                                                  spread(yr, v, fill = 0)


                                                                  which gives:




                                                                    id year 1991 1992 1993 1994
                                                                  1 1 1991 1 0 0 0
                                                                  2 2 1992 0 1 0 0
                                                                  3 3 1993 0 0 1 0
                                                                  4 4 1994 0 0 0 1
                                                                  5 5 1992 0 1 0 0






                                                                  share|improve this answer

























                                                                    up vote
                                                                    5
                                                                    down vote










                                                                    up vote
                                                                    5
                                                                    down vote









                                                                    For the usecase as presented in the question, you can also just multiply the logical condition with 1 (or maybe even better, with 1L):



                                                                    # example data
                                                                    df1 <- data.frame(yr = 1951:1960)

                                                                    # create the dummies
                                                                    df1$is.1957 <- 1L * (df1$yr == 1957)
                                                                    df1$after.1957 <- 1L * (df1$yr >= 1957)


                                                                    which gives:




                                                                    > df1
                                                                    yr is.1957 after.1957
                                                                    1 1951 0 0
                                                                    2 1952 0 0
                                                                    3 1953 0 0
                                                                    4 1954 0 0
                                                                    5 1955 0 0
                                                                    6 1956 0 0
                                                                    7 1957 1 1
                                                                    8 1958 0 1
                                                                    9 1959 0 1
                                                                    10 1960 0 1





                                                                    For the usecases as presented in for example the answers of @zx8754 and @Sotos, there are still some other options which haven't been covered yet imo.



                                                                    1) Make your own make_dummies-function



                                                                    # example data
                                                                    df2 <- data.frame(id = 1:5, year = c(1991:1994,1992))

                                                                    # create a function
                                                                    make_dummies <- function(v, prefix = '') {
                                                                    s <- sort(unique(v))
                                                                    d <- outer(v, s, function(v, s) 1L * (v == s))
                                                                    colnames(d) <- paste0(prefix, s)
                                                                    d
                                                                    }

                                                                    # bind the dummies to the original dataframe
                                                                    cbind(df2, make_dummies(df2$year, prefix = 'y'))


                                                                    which gives:




                                                                      id year y1991 y1992 y1993 y1994
                                                                    1 1 1991 1 0 0 0
                                                                    2 2 1992 0 1 0 0
                                                                    3 3 1993 0 0 1 0
                                                                    4 4 1994 0 0 0 1
                                                                    5 5 1992 0 1 0 0



                                                                    2) use the dcast-function from either data.table or reshape2



                                                                     dcast(df2, id + year ~ year, fun.aggregate = length)


                                                                    which gives:




                                                                      id year 1991 1992 1993 1994
                                                                    1 1 1991 1 0 0 0
                                                                    2 2 1992 0 1 0 0
                                                                    3 3 1993 0 0 1 0
                                                                    4 4 1994 0 0 0 1
                                                                    5 5 1992 0 1 0 0



                                                                    However, this will not work when there are duplicate values in the column for which the dummies have to be created. In the case a specific aggregation function is needed for dcast and the result of of dcast need to be merged back to the original:



                                                                    # example data
                                                                    df3 <- data.frame(var = c("B", "C", "A", "B", "C"))

                                                                    # aggregation function to get dummy values
                                                                    f <- function(x) as.integer(length(x) > 0)

                                                                    # reshape to wide with the cumstom aggregation function and merge back to the original
                                                                    merge(df3, dcast(df3, var ~ var, fun.aggregate = f), by = 'var', all.x = TRUE)


                                                                    which gives (note that the result is order according to the by column):




                                                                      var A B C
                                                                    1 A 1 0 0
                                                                    2 B 0 1 0
                                                                    3 B 0 1 0
                                                                    4 C 0 0 1
                                                                    5 C 0 0 1



                                                                    3) use the spread-function from tidyr (with mutate from dplyr)



                                                                    library(dplyr)
                                                                    library(tidyr)

                                                                    df2 %>%
                                                                    mutate(v = 1, yr = year) %>%
                                                                    spread(yr, v, fill = 0)


                                                                    which gives:




                                                                      id year 1991 1992 1993 1994
                                                                    1 1 1991 1 0 0 0
                                                                    2 2 1992 0 1 0 0
                                                                    3 3 1993 0 0 1 0
                                                                    4 4 1994 0 0 0 1
                                                                    5 5 1992 0 1 0 0






                                                                    share|improve this answer














                                                                    For the usecase as presented in the question, you can also just multiply the logical condition with 1 (or maybe even better, with 1L):



                                                                    # example data
                                                                    df1 <- data.frame(yr = 1951:1960)

                                                                    # create the dummies
                                                                    df1$is.1957 <- 1L * (df1$yr == 1957)
                                                                    df1$after.1957 <- 1L * (df1$yr >= 1957)


                                                                    which gives:




> df1
     yr is.1957 after.1957
1  1951       0          0
2  1952       0          0
3  1953       0          0
4  1954       0          0
5  1955       0          0
6  1956       0          0
7  1957       1          1
8  1958       0          1
9  1959       0          1
10 1960       0          1
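Applied to the question's full sample period, the same idea gives both requested dummies directly. A minimal base-R sketch, assuming one observation per year from 1948 to 2009 (the vector `yr` and the names `impulse_1957` and `step_1957` are hypothetical):

```r
# Hypothetical year index for the question's sample, 1948-2009 (62 years)
yr <- 1948:2009

# Impulse dummy: 1 in 1957 only (observation #10), 0 otherwise
impulse_1957 <- as.integer(yr == 1957)

# Step dummy: 0 before 1957, 1 from 1957 through 2009
step_1957 <- as.integer(yr >= 1957)

which(impulse_1957 == 1)  # 10
sum(step_1957)            # 53
```

If the series is stored as a `ts` object, `time(series)` returns the same kind of year index, so `as.integer(time(series) >= 1957)` should work the same way.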





For the use cases presented in, for example, the answers by @zx8754 and @Sotos, there are still some other options that, in my opinion, haven't been covered yet.



1) Make your own make_dummies function



                                                                    # example data
                                                                    df2 <- data.frame(id = 1:5, year = c(1991:1994,1992))

                                                                    # create a function
make_dummies <- function(v, prefix = '') {
  s <- sort(unique(v))                            # distinct values, one column each
  d <- outer(v, s, function(v, s) 1L * (v == s))  # 1L where the row value equals the column value
  colnames(d) <- paste0(prefix, s)
  d
}

                                                                    # bind the dummies to the original dataframe
                                                                    cbind(df2, make_dummies(df2$year, prefix = 'y'))


                                                                    which gives:




                                                                      id year y1991 y1992 y1993 y1994
                                                                    1 1 1991 1 0 0 0
                                                                    2 2 1992 0 1 0 0
                                                                    3 3 1993 0 0 1 0
                                                                    4 4 1994 0 0 0 1
                                                                    5 5 1992 0 1 0 0
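Base R's `model.matrix()` can build the same indicator columns without a helper function. A sketch of that swapped-in technique (not the `make_dummies` approach above), assuming the same `df2`; the `y` prefix just mirrors the `make_dummies` call:

```r
df2 <- data.frame(id = 1:5, year = c(1991:1994, 1992))

# "- 1" drops the intercept, so every distinct year gets its own 0/1 column
m <- model.matrix(~ factor(year) - 1, data = df2)

# Rename "factor(year)1991" etc. to "y1991" etc.
colnames(m) <- sub("factor(year)", "y", colnames(m), fixed = TRUE)

cbind(df2, m)
```

Unlike `make_dummies`, `model.matrix()` returns doubles, and with the default `na.action` it drops rows with a missing `year`, which would make the `cbind()` fail.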



2) Use the dcast function from either data.table or reshape2



library(reshape2)  # or use data.table, whose dcast() expects a data.table (see setDT())

dcast(df2, id + year ~ year, fun.aggregate = length)


                                                                    which gives:




                                                                      id year 1991 1992 1993 1994
                                                                    1 1 1991 1 0 0 0
                                                                    2 2 1992 0 1 0 0
                                                                    3 3 1993 0 0 1 0
                                                                    4 4 1994 0 0 0 1
                                                                    5 5 1992 0 1 0 0



However, this will not work when there are duplicate values in the column for which the dummies have to be created (and, unlike df2, no unique identifier column). In that case, a specific aggregation function is needed for dcast, and the result of dcast needs to be merged back to the original:



                                                                    # example data
                                                                    df3 <- data.frame(var = c("B", "C", "A", "B", "C"))

                                                                    # aggregation function to get dummy values
                                                                    f <- function(x) as.integer(length(x) > 0)

# reshape to wide with the custom aggregation function and merge back to the original
                                                                    merge(df3, dcast(df3, var ~ var, fun.aggregate = f), by = 'var', all.x = TRUE)


which gives (note that the result is ordered according to the by column):




                                                                      var A B C
                                                                    1 A 1 0 0
                                                                    2 B 0 1 0
                                                                    3 B 0 1 0
                                                                    4 C 0 0 1
                                                                    5 C 0 0 1
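If the original row order matters, one base-R alternative (my own sketch, not part of the dcast approach) is to compare the column against each distinct value; the names `lv` and `dummies` are illustrative:

```r
# stringsAsFactors = FALSE keeps var as character in older R versions too
df3 <- data.frame(var = c("B", "C", "A", "B", "C"), stringsAsFactors = FALSE)

# Distinct values, sorted, become the dummy column names
lv <- sort(unique(df3$var))

# One integer 0/1 column per value; sapply() names the columns after lv
dummies <- sapply(lv, function(l) as.integer(df3$var == l))

# Rows stay in their original order: B, C, A, B, C
cbind(df3, dummies)
```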



3) Use the spread function from tidyr (with mutate from dplyr); note that since tidyr 1.0.0, spread is superseded by pivot_wider



                                                                    library(dplyr)
                                                                    library(tidyr)

df2 %>%
  mutate(v = 1, yr = year) %>%
  spread(yr, v, fill = 0)


                                                                    which gives:




                                                                      id year 1991 1992 1993 1994
                                                                    1 1 1991 1 0 0 0
                                                                    2 2 1992 0 1 0 0
                                                                    3 3 1993 0 0 1 0
                                                                    4 4 1994 0 0 0 1
                                                                    5 5 1992 0 1 0 0















                                                                    edited Jul 8 at 17:30

























                                                                    answered Feb 13 at 18:38









                                                                    Jaap

                                                                    53.9k20116127


























                                                                        up vote
                                                                        4
                                                                        down vote













                                                                        The ifelse function is best for simple logic like this.



x <- seq(1950, 1960, 1)

ifelse(x == 1957, 1, 0)
# [1] 0 0 0 0 0 0 0 1 0 0 0
ifelse(x >= 1957, 1, 0)
# [1] 0 0 0 0 0 0 0 1 1 1 1
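For the step dummy in the question (0 before 1957, 1 from 1957 on), `ifelse()` and a plain coercion of the logical condition give the same 0/1 values; the difference is the storage type. A quick check (`d_ifelse` and `d_int` are illustrative names):

```r
x <- seq(1950, 1960, 1)

# ifelse() with yes = 1 and no = 0 returns a double vector
d_ifelse <- ifelse(x >= 1957, 1, 0)

# as.integer() on the logical condition returns an integer vector
d_int <- as.integer(x >= 1957)

all(d_ifelse == d_int)  # TRUE
typeof(d_ifelse)        # "double"
typeof(d_int)           # "integer"
```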


If you want it to return character data instead, you can do that too:



x <- seq(1950, 1960, 1)

ifelse(x == 1957, "foo", "bar")
# [1] "bar" "bar" "bar" "bar" "bar" "bar" "bar" "foo" "bar" "bar" "bar"
ifelse(x <= 1957, "foo", "bar")
# [1] "foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo" "bar" "bar" "bar"


Nesting ifelse calls gives a categorical variable with more than two levels:



x <- seq(1950, 1960, 1)

ifelse(x == 1957, "foo", ifelse(x == 1958, "bar", "baz"))
# [1] "baz" "baz" "baz" "baz" "baz" "baz" "baz" "foo" "bar" "baz" "baz"


                                                                        This is the most straightforward option.
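When the nesting grows beyond two or three levels, a named lookup vector is one alternative to chaining `ifelse()` calls. A sketch reproducing the nested example above; `lookup` and the fallback handling for unmatched years are my own illustration:

```r
x <- seq(1950, 1960, 1)

# Map specific years to labels; the names must be character
lookup <- c("1957" = "foo", "1958" = "bar")

# Years not present in the lookup come back as NA ...
out <- unname(lookup[as.character(x)])

# ... and receive the default label
out[is.na(out)] <- "baz"

out
```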














                                                                            edited Dec 9 '15 at 22:52

























                                                                            answered Dec 9 '15 at 22:41









                                                                            Alex Thompson

                                                                            35818


























                                                                                up vote
                                                                                2
                                                                                down vote













Another way is to use mtabulate from the qdapTools package:



df <- data.frame(var = sample(c("A", "B", "C"), 5, replace = TRUE))  # random, so your values will differ
df
#   var
# 1   C
# 2   A
# 3   C
# 4   B
# 5   B

library(qdapTools)
mtabulate(df$var)


which gives:




                                                                                  A B C
                                                                                1 0 0 1
                                                                                2 1 0 0
                                                                                3 0 0 1
                                                                                4 0 1 0
                                                                                5 0 1 0
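Without an extra package, base R's `table()` can build the same indicator matrix by cross-tabulating the row index against the variable. A sketch of that base-R alternative, with fixed data instead of `sample()` so the result is reproducible:

```r
df <- data.frame(var = c("C", "A", "C", "B", "B"), stringsAsFactors = FALSE)

# Cross-tabulate row number against value: each row gets a single 1
tab <- table(seq_along(df$var), df$var)

# Turn the two-way table into a wide data frame with columns A, B, C
dummies <- as.data.frame.matrix(tab)

dummies
```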
















                                                                                    answered Oct 6 '17 at 6:32









                                                                                    Sotos

                                                                                    26.9k51540


























                                                                                        up vote
                                                                                        1
                                                                                        down vote













I use the following function (for data.table):



library(data.table)

# For a data.table object and a factor variable var.name, this function creates
# dummy variables named "var.name: (level1)", "var.name: (level2)", ...
factorToDummy <- function(dtable, var.name){
  stopifnot(is.data.table(dtable))
  stopifnot(var.name %in% names(dtable))
  stopifnot(is.factor(dtable[, get(var.name)]))

  dtable[, paste0(var.name, ": ", levels(get(var.name)))] -> new.names
  dtable[, (new.names) := transpose(lapply(get(var.name), FUN = function(x){x == levels(get(var.name))})) ]

  cat(paste0("\nAdded dummy variables: ", paste0(new.names, collapse = ", "), "\n"))
}


                                                                                        Usage:



                                                                                        data <- data.table(data)
                                                                                        data[, x:= droplevels(x)]
                                                                                        factorToDummy(data, "x")
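For a plain data.frame, a similar helper can be written in base R. This is a hypothetical counterpart (`factor_to_dummy_df` is not part of any package) that returns integer 0/1 columns instead of logicals and reuses the "var.name: level" naming:

```r
# Hypothetical base-R counterpart of factorToDummy() for data.frames.
# Returns a copy of df with one integer 0/1 column per factor level.
factor_to_dummy_df <- function(df, var.name) {
  stopifnot(is.data.frame(df))
  stopifnot(var.name %in% names(df))
  stopifnot(is.factor(df[[var.name]]))

  f <- df[[var.name]]
  for (lev in levels(f)) {
    # Column name follows the "var.name: level" pattern
    df[[paste0(var.name, ": ", lev)]] <- as.integer(f == lev)
  }
  df
}

# Example with a small factor column
d <- factor_to_dummy_df(data.frame(x = factor(c("a", "b", "a"))), "x")
d
```

Unlike the data.table version, which adds columns by reference, this one returns a modified copy, so the result has to be assigned.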






























                                                                                            up vote
                                                                                            1
                                                                                            down vote










                                                                                            up vote
                                                                                            1
                                                                                            down vote









                                                                                            I use such a function (for data.table):



                                                                                            # Ta funkcja dla obiektu data.table i zmiennej var.name typu factor tworzy dummy variables o nazwach "var.name: (level1)"
                                                                                            factorToDummy <- function(dtable, var.name){
                                                                                            stopifnot(is.data.table(dtable))
                                                                                            stopifnot(var.name %in% names(dtable))
                                                                                            stopifnot(is.factor(dtable[, get(var.name)]))

                                                                                            dtable[, paste0(var.name,": ",levels(get(var.name)))] -> new.names
                                                                                            dtable[, (new.names) := transpose(lapply(get(var.name), FUN = function(x){x == levels(get(var.name))})) ]

                                                                                            cat(paste("nDodano zmienne dummy: ", paste0(new.names, collapse = ", ")))
                                                                                            }


                                                                                            Usage:



                                                                                            data <- data.table(data)
                                                                                            data[, x:= droplevels(x)]
                                                                                            factorToDummy(data, "x")





                                                                                            share|improve this answer














                                                                                            I use such a function (for data.table):



                                                                                            # Ta funkcja dla obiektu data.table i zmiennej var.name typu factor tworzy dummy variables o nazwach "var.name: (level1)"
                                                                                            factorToDummy <- function(dtable, var.name){
                                                                                            stopifnot(is.data.table(dtable))
                                                                                            stopifnot(var.name %in% names(dtable))
                                                                                            stopifnot(is.factor(dtable[, get(var.name)]))

                                                                                            dtable[, paste0(var.name,": ",levels(get(var.name)))] -> new.names
                                                                                            dtable[, (new.names) := transpose(lapply(get(var.name), FUN = function(x){x == levels(get(var.name))})) ]

                                                                                            cat(paste("nDodano zmienne dummy: ", paste0(new.names, collapse = ", ")))
                                                                                            }


                                                                                            Usage:



                                                                                            data <- data.table(data)
                                                                                            data[, x:= droplevels(x)]
                                                                                            factorToDummy(data, "x")






                                                                                            share|improve this answer














                                                                                            share|improve this answer



                                                                                            share|improve this answer








                                                                                            edited Aug 18 '15 at 9:58

























                                                                                            answered Aug 18 '15 at 9:50









                                                                                            Maciej Mozolewski

                                                                                            113

















Convert your data to a data.table, then create the dummy by reference with := and use row filtering to set it to 1 only in the rows you want:



library(data.table)

dt <- as.data.table(your.dataframe.or.whatever)
dt[, is.1957 := 0]               # add the column, 0 in every row
dt[year == 1957, is.1957 := 1]   # set it to 1 only where year == 1957


                                                                                                Proof-of-concept toy example:



library(data.table)

dt <- as.data.table(cbind(c(1, 1, 1), c(2, 2, 3)))  # columns are named V1 and V2
dt[, is.3 := 0]                                      # new column, 0 in every row
dt[V2 == 3, is.3 := 1]                               # 1 only where V2 == 3
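The same by-reference approach also covers the second part of the question, a step dummy that is 0 before 1957 and 1 from 1957 onward. A minimal sketch, assuming a data.table with a year column; the table itself and the column name post.1957 are hypothetical:

```r
library(data.table)

# Hypothetical annual data covering the question's sample period
dt <- data.table(year = 1948:2009)

# Impulse dummy: 1 only in 1957 (observation 10), built in a single step
dt[, is.1957 := as.integer(year == 1957)]

# Step dummy: 0 before 1957, 1 from 1957 through 2009
dt[, post.1957 := as.integer(year >= 1957)]

print(dt[8:12])
```

Writing the comparison directly inside as.integer() avoids the separate "set to 0, then overwrite" step, while still adding the columns by reference.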





                                                                                                    edited May 9 at 23:31

























                                                                                                    answered Feb 15 at 3:48









                                                                                                    wordsforthewise

                                                                                                    2,97722446

















I wrote this general function to generate a dummy variable; it essentially replicates Stata's replace command.

If x is the data frame and I want a dummy variable called a that takes the value 1 when x$b equals c (and 0 otherwise), I call introducedummy(x, "a", "b", c):



# x: data frame
# a: name of the new dummy column (string)
# b: name of the existing column to test (string)
# c: value(s) of x[[b]] for which the dummy is 1
introducedummy <- function(x, a, b, c){
  # vectorised comparison instead of a row-by-row loop;
  # %in% returns FALSE (so the dummy is 0) where x[[b]] is NA
  x[[a]] <- as.integer(x[[b]] %in% c)
  x
}
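Since the function is meant to mirror Stata's generate/replace workflow, here is a sketch of that two-step idiom written directly in base R, without a helper function. The data frame df and the column names d1957 and post1957 are hypothetical:

```r
# Hypothetical annual data frame for the question's sample period
df <- data.frame(year = 1948:2009)

# Stata: generate d1957 = 0
df$d1957 <- 0
# Stata: replace d1957 = 1 if year == 1957
df$d1957[df$year == 1957] <- 1

# Stata: generate post1957 = 0, then replace post1957 = 1 if year >= 1957
df$post1957 <- 0
df$post1957[df$year >= 1957] <- 1

print(df[8:12, ])
```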





                                                                                                            edited Feb 6 '15 at 18:00









                                                                                                            mor

                                                                                                            2,2171228














                                                                                                            answered Feb 6 '15 at 17:18









                                                                                                            kangkan Dc

                                                                                                            465

















Another way to do it is to use ifelse, which returns 1 where the condition holds and 0 elsewhere:

ifelse(year >= 1957, 1, 0)
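Because the question is about an annual time series, here is a sketch that builds both dummies with ifelse from the time index of a ts object. The series y is a made-up placeholder covering 1948 to 2009:

```r
# Placeholder annual series covering 1948-2009 (62 observations)
y <- ts(seq_len(62), start = 1948)
yr <- as.numeric(time(y))   # 1948, 1949, ..., 2009

# Impulse dummy: 1 in 1957 (observation 10) only, 0 otherwise
d1957 <- ts(ifelse(yr == 1957, 1, 0), start = start(y))

# Step dummy: 0 before 1957, 1 from 1957 through 2009
step1957 <- ts(ifelse(yr >= 1957, 1, 0), start = start(y))

print(window(cbind(y, d1957, step1957), start = 1955, end = 1959))
```

Here as.integer(yr >= 1957) would give the same values as the ifelse() call, stored as integers instead of doubles.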





                                                                                                                    edited May 9 at 23:54









                                                                                                                    dee-see

                                                                                                                    18.6k34479














                                                                                                                    answered May 9 at 21:09









                                                                                                                    Sophia J

                                                                                                                    4910





















                                                                                                                        protected by Jaap Oct 16 '17 at 9:47





