up vote
57
down vote

favorite

I have trouble generating the following dummy-variables in R:

I'm analyzing yearly time series data (time period 1948-2009). I have two questions:

How do I generate a dummy variable for observation #10, i.e. for year 1957 (value = 1 at 1957 and zero otherwise)?

How do I generate a dummy variable which is zero before 1957 and takes the value 1 from 1957 and onwards to 2009?

edited Oct 16 '17 at 9:47

Jaap

53.9k20116127

asked Aug 2 '12 at 23:07

Pantera

391145

migrated from stats.stackexchange.com Aug 14 '12 at 12:55

This question came from our site for people interested in statistics, machine learning, data analysis, data mining, and data visualization.


r r-faq


15 Answers

up vote
84
down vote

Another option that can work better if you have many variables is factor and model.matrix.

> year.f = factor(year)

> dummies = model.matrix(~year.f)

This will include an intercept column (all ones) and one column for each of the years in your data set except one, which will be the "default" or intercept value.

You can change how the "default" is chosen by messing with contrasts.arg in model.matrix.

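Also, if you want to omit the intercept, you can just drop the first column or add +0 to the end of the formula. Note that the two are not equivalent: dropping the first column leaves k-1 dummy columns, while +0 returns one column for each of the k years.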
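As a rough illustration of the points above, here is a small sketch using the six-value year vector from another answer on this page. The object names with_intercept, all_levels and releveled are made up for this example, and choosing 1957 as the reference level is just an assumption to show how contrasts.arg can be used.

```r
year <- c(1956, 1957, 1957, 1958, 1958, 1959)
year.f <- factor(year)

# Default: an intercept column plus k - 1 indicator columns
# (the first level, 1956, is the reference)
with_intercept <- model.matrix(~ year.f)
colnames(with_intercept)

# "+ 0" removes the intercept, so every level gets its own column
all_levels <- model.matrix(~ year.f + 0)
colnames(all_levels)

# contrasts.arg can change the reference level; here 1957 (the 2nd level)
releveled <- model.matrix(
  ~ year.f,
  contrasts.arg = list(year.f = contr.treatment(levels(year.f), base = 2))
)
colnames(releveled)
```

With a single factor, the "+ 0" version has exactly one 1 in every row, which is the full set of k indicators asked about in the comments.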

Hope this is useful.

edited Jun 24 at 13:37

answered Aug 3 '12 at 1:24

David J. Harris

985610

4

what if you want to generate dummy variables for all (instead of k-1) with no intercept?
– Fernando Hoces De La Guardia
Mar 27 '15 at 16:52

1

note that model.matrix( ) accepts multiple variables to transform into dummies: model.matrix( ~ var1 + var2, data = df) Again, just be sure that they are factors.
– slizb
May 1 '15 at 19:32

3

@Synergist table(1:n, factor). Where factor is the original variable and n is its length
– Fernando Hoces De La Guardia
Jun 3 '15 at 15:43

1

@Synergist that table is a n x k matrix with all k indicator variables (instead of k-1)
– Fernando Hoces De La Guardia
Jun 3 '15 at 15:49

4

@FernandoHocesDeLaGuardia You can remove the intercept from a formula either with + 0 or - 1. So model.matrix(~ year.f + 0) will give dummy variables without a reference level.
– Gregor
Jan 6 '16 at 20:16


up vote
45
down vote

The simplest way to produce these dummy variables is something like the following:

> print(year)

[1] 1956 1957 1957 1958 1958 1959

> dummy <- as.numeric(year == 1957)

> print(dummy)

[1] 0 1 1 0 0 0

> dummy2 <- as.numeric(year >= 1957)

> print(dummy2)

[1] 0 1 1 1 1 1

More generally, you can use ifelse to choose between two values depending on a condition. So if instead of a 0-1 dummy variable, for some reason you wanted to use, say, 4 and 7, you could use ifelse(year == 1957, 4, 7).
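Applied to the series in the question, the same idea might look like the sketch below. It assumes the years are stored in a plain vector running from 1948 to 2009; the names d1957, from1957 and alt are made up for illustration.

```r
year <- 1948:2009

# Question 1: 1 only in 1957 (the 10th observation), 0 otherwise
d1957 <- as.integer(year == 1957)

# Question 2: 0 before 1957, 1 from 1957 through 2009
from1957 <- as.integer(year >= 1957)

which(d1957 == 1)  # 10
sum(from1957)      # 53

# ifelse() with arbitrary values instead of 0/1
alt <- ifelse(year == 1957, 4, 7)
```

Using as.integer instead of as.numeric only changes the storage type; either works as a regressor.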

answered Aug 2 '12 at 23:38

Martin O'Leary

91169


up vote
29
down vote

Using dummies::dummy():

library(dummies)



# example data

df1 <- data.frame(id = 1:4, year = 1991:1994)



df1 <- cbind(df1, dummy(df1$year, sep = "_"))



df1

#   id year df1_1991 df1_1992 df1_1993 df1_1994

# 1  1 1991        1        0        0        0

# 2  2 1992        0        1        0        0

# 3  3 1993        0        0        1        0

# 4  4 1994        0        0        0        1

edited Jul 23 at 10:26

answered Oct 31 '16 at 13:34

zx8754

28.5k76394

Maybe adding "fun= factor" in function dummy can help if that is the meaning of the variable.
– Filippo Mazza
Mar 8 '17 at 10:35

@FilippoMazza I prefer to keep them as integer, yes, we could set factor if needed.
– zx8754
Mar 8 '17 at 10:51

How do you remove the df1 prefix from each dummy column name?
– mike
Jun 10 '17 at 22:47

@mike colnames(df1) <- gsub("df1_", "", fixed = TRUE, colnames(df1))
– zx8754
Jun 11 '17 at 5:01
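A small base R sketch of the renaming step from the comment above, run on a plain character vector so it does not need the dummies package. The vector cols is a stand-in for colnames(df1), and anchoring the pattern with ^ is a variation on the gsub call in the comment, not the commenter's exact code.

```r
cols <- c("id", "year", "df1_1991", "df1_1992", "df1_1993", "df1_1994")

# Anchor the pattern at the start so only a leading "df1_" is stripped
clean <- sub("^df1_", "", cols)
clean
```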


up vote
15
down vote

Package mlr includes createDummyFeatures for this purpose:

library(mlr)

df <- data.frame(var = sample(c("A", "B", "C"), 10, replace = TRUE))

df



# var

# 1    B

# 2    A

# 3    C

# 4    B

# 5    C

# 6    A

# 7    C

# 8    A

# 9    B

# 10   C



createDummyFeatures(df, cols = "var")



# var.A var.B var.C

# 1      0     1     0

# 2      1     0     0

# 3      0     0     1

# 4      0     1     0

# 5      0     0     1

# 6      1     0     0

# 7      0     0     1

# 8      1     0     0

# 9      0     1     0

# 10     0     0     1

createDummyFeatures drops the original variable.
https://www.rdocumentation.org/packages/mlr/versions/2.9/topics/createDummyFeatures
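If the original column should stay next to the indicators, one possible workaround is sketched below. It uses base R's model.matrix rather than mlr, so it is an alternative technique and not part of createDummyFeatures; the fixed example data and the name df_with_dummies are assumptions for illustration.

```r
df <- data.frame(var = c("B", "A", "C", "B", "C"))

# One indicator column per level; "+ 0" drops the intercept
dummies <- model.matrix(~ var + 0, data = df)

# Keep the original column by binding the indicators next to it
df_with_dummies <- cbind(df, dummies)
df_with_dummies
```

Note that the column names come out as varA, varB, varC rather than var.A, var.B, var.C.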

answered Nov 10 '16 at 16:54

Enrique Pérez Herrero

1,69321520

1

Enrique, I've tried installing the package, but it doesn't seem to be working after doing library(mlr). I get the following error:«Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : there is no package called ‘ggvis’ In addition: Warning message: package ‘mlr’ was built under R version 3.2.5 Error: package or namespace load failed for ‘mlr’»
– An old man in the sea.
Apr 13 '17 at 11:17

1

you need to install 'ggvis' first
– Ted Mosby
Jul 26 at 20:01


up vote
9
down vote

What I normally do to work with this kind of dummy variables is:

(1) how do I generate a dummy variable for observation #10, i.e. for year 1957 (value = 1 at 1957 and zero otherwise)

data$factor_year_1 <- factor(with(data, ifelse(year == 1957, 1, 0)))

(2) how do I generate a dummy-variable which is zero before 1957 and takes the value 1 from 1957 and onwards to 2009?

data$factor_year_2 <- factor(with(data, ifelse(year < 1957, 0, 1)))

Then, I can introduce this factor as a dummy variable in my models. For example, to see whether there is a long-term trend in a variable y:

summary ( lm ( y ~ t,  data = data ) )
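To actually include the dummy in the regression, the factor has to appear on the right-hand side of the formula. The sketch below shows one way this might look; the data are simulated purely for illustration (the trend of 0.1 per year and the level shift of 5 are invented numbers), and t is assumed to be a simple time index.

```r
set.seed(1)
data <- data.frame(year = 1948:2009)
data$t <- seq_len(nrow(data))
data$factor_year_2 <- factor(ifelse(data$year < 1957, 0, 1))

# Simulated response: linear trend plus a level shift from 1957 onwards
data$y <- 0.1 * data$t + 5 * (data$year >= 1957) + rnorm(nrow(data))

# The coefficient named "factor_year_21" estimates the level shift
fit <- lm(y ~ t + factor_year_2, data = data)
coef(fit)
```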

Hope this helps!

edited May 7 '14 at 16:48

answered Aug 3 '12 at 9:44

Ricardo González-Gil

987


up vote
9
down vote

The other answers here offer direct routes to accomplish this task—one that many models (e.g. lm) will do for you internally anyway. Nonetheless, here are ways to make dummy variables with Max Kuhn's popular caret and recipes packages. While somewhat more verbose, they both scale easily to more complicated situations, and fit neatly into their respective frameworks.

`caret::dummyVars`

With caret, the relevant function is dummyVars, which has a predict method to apply it on a data frame:

df <- data.frame(letter = rep(c('a', 'b', 'c'), each = 2),

                 y = 1:6)



library(caret)



dummy <- dummyVars(~ ., data = df, fullRank = TRUE)



dummy

#> Dummy Variable Object

#> 

#> Formula: ~.

#> 2 variables, 1 factors

#> Variables and levels will be separated by '.'

#> A full rank encoding is used



predict(dummy, df)

#>   letter.b letter.c y

#> 1        0        0 1

#> 2        0        0 2

#> 3        1        0 3

#> 4        1        0 4

#> 5        0        1 5

#> 6        0        1 6

`recipes::step_dummy`

With recipes, the relevant function is step_dummy:

library(recipes)



dummy_recipe <- recipe(y ~ letter, df) %>% 

    step_dummy(letter)



dummy_recipe

#> Data Recipe

#> 

#> Inputs:

#> 

#>       role #variables

#>    outcome          1

#>  predictor          1

#> 

#> Steps:

#> 

#> Dummy variables from letter

Depending on context, extract the data with prep and either bake or juice:

# Prep and bake on new data...

dummy_recipe %>% 

    prep() %>% 

    bake(df)

#> # A tibble: 6 x 3

#>       y letter_b letter_c

#>   <int>    <dbl>    <dbl>

#> 1     1        0        0

#> 2     2        0        0

#> 3     3        1        0

#> 4     4        1        0

#> 5     5        0        1

#> 6     6        0        1



# ...or use `retain = TRUE` and `juice` to extract training data

dummy_recipe %>% 

    prep(retain = TRUE) %>% 

    juice()

#> # A tibble: 6 x 3

#>       y letter_b letter_c

#>   <int>    <dbl>    <dbl>

#> 1     1        0        0

#> 2     2        0        0

#> 3     3        1        0

#> 4     4        1        0

#> 5     5        0        1

#> 6     6        0        1

edited Apr 16 at 19:27

answered Dec 17 '17 at 21:59

alistaire

31k43561


up vote
7
down vote

I read this on the kaggle forum:

#Generate example dataframe with character column

example <- as.data.frame(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))

names(example) <- "strcol"



#For every unique value in the string column, create a new 1/0 column

#This is what Factors do "under-the-hood" automatically when passed to function requiring numeric data

for(level in unique(example$strcol)){

  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)

}
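The same result can probably be had without an explicit loop. The sketch below is one possible vectorised variant using sapply; the names levels_found and dummies are made up here, and sorting the levels first is an extra step that the loop above does not do.

```r
example <- data.frame(strcol = c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))

# One 1/0 column per distinct value, built in a single sapply() call
levels_found <- sort(unique(example$strcol))
dummies <- sapply(levels_found, function(level) as.integer(example$strcol == level))
colnames(dummies) <- paste("dummy", levels_found, sep = "_")

example <- cbind(example, dummies)
example
```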

answered May 16 '15 at 10:37

skpro19

15124


up vote
5
down vote

If you want to get K dummy variables, instead of K-1, try:

dummies = table(1:length(year),as.factor(year))

Best,

answered Mar 27 '15 at 17:45

Fernando Hoces De La Guardia

168413

the resulting table cannot be used as a data.frame. If that's a problem, use as.data.frame.matrix(dummies) to translate it into one
– sheß
Mar 27 at 19:21
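Putting the answer and the comment together, a minimal sketch could look like this. The year vector is borrowed from another answer on this page, seq_along(year) is used in place of 1:length(year), and dummies_df is a made-up name.

```r
year <- c(1956, 1957, 1957, 1958, 1958, 1959)

# n x k table with one indicator column per distinct year
dummies <- table(seq_along(year), as.factor(year))

# Convert the table to a regular data frame, keeping the wide layout
dummies_df <- as.data.frame.matrix(dummies)
dummies_df
```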


up vote
5
down vote

For the use case presented in the question, you can also just multiply the logical condition by 1 (or maybe even better, by 1L):

# example data

df1 <- data.frame(yr = 1951:1960)



# create the dummies

df1$is.1957 <- 1L * (df1$yr == 1957)

df1$after.1957 <- 1L * (df1$yr >= 1957)

which gives:

> df1

     yr is.1957 after.1957

1  1951       0          0

2  1952       0          0

3  1953       0          0

4  1954       0          0

5  1955       0          0

6  1956       0          0

7  1957       1          1

8  1958       0          1

9  1959       0          1

10 1960       0          1
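The reason 1L may be preferable is only the storage type of the result, as this short check suggests (the vector yr is just example data):

```r
yr <- 1951:1960

typeof(1 * (yr == 1957))        # "double"
typeof(1L * (yr == 1957))       # "integer"
typeof(as.integer(yr == 1957))  # "integer"
```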

For the use cases presented in, for example, the answers of @zx8754 and @Sotos, there are some other options which haven't been covered yet.

1) Make your own make_dummies-function

# example data

df2 <- data.frame(id = 1:5, year = c(1991:1994,1992))



# create a function

make_dummies <- function(v, prefix = '') {

  s <- sort(unique(v))

  d <- outer(v, s, function(v, s) 1L * (v == s))

  colnames(d) <- paste0(prefix, s)

  d

}



# bind the dummies to the original dataframe

cbind(df2, make_dummies(df2$year, prefix = 'y'))

which gives:

  id year y1991 y1992 y1993 y1994

1  1 1991     1     0     0     0

2  2 1992     0     1     0     0

3  3 1993     0     0     1     0

4  4 1994     0     0     0     1

5  5 1992     0     1     0     0

2) use the dcast-function from either data.table or reshape2

 dcast(df2, id + year ~ year, fun.aggregate = length)

which gives:

  id year 1991 1992 1993 1994

1  1 1991    1    0    0    0

2  2 1992    0    1    0    0

3  3 1993    0    0    1    0

4  4 1994    0    0    0    1

5  5 1992    0    1    0    0

However, this will not work when there are duplicate values in the column for which the dummies have to be created. In that case, a specific aggregation function is needed for dcast, and the result of dcast needs to be merged back to the original:

# example data

df3 <- data.frame(var = c("B", "C", "A", "B", "C"))



# aggregation function to get dummy values

f <- function(x) as.integer(length(x) > 0)



# reshape to wide with the custom aggregation function and merge back to the original

merge(df3, dcast(df3, var ~ var, fun.aggregate = f), by = 'var', all.x = TRUE)

which gives (note that the result is ordered according to the by column):

  var A B C

1   A 1 0 0

2   B 0 1 0

3   B 0 1 0

4   C 0 0 1

5   C 0 0 1

3) use the spread-function from tidyr (with mutate from dplyr)

library(dplyr)

library(tidyr)



df2 %>% 

  mutate(v = 1, yr = year) %>% 

  spread(yr, v, fill = 0)

which gives:

  id year 1991 1992 1993 1994

1  1 1991    1    0    0    0

2  2 1992    0    1    0    0

3  3 1993    0    0    1    0

4  4 1994    0    0    0    1

5  5 1992    0    1    0    0

edited Jul 8 at 17:30

answered Feb 13 at 18:38

Jaap

53.9k20116127


up vote
4
down vote

The ifelse function is best for simple logic like this.

> x <- seq(1950, 1960, 1)

> ifelse(x == 1957, 1, 0)

 [1] 0 0 0 0 0 0 0 1 0 0 0

> ifelse(x >= 1957, 1, 0)

 [1] 0 0 0 0 0 0 0 1 1 1 1

Also, if you want it to return character data then you can do so.

> x <- seq(1950, 1960, 1)



    ifelse(x == 1957, "foo", "bar")

    ifelse(x <= 1957, "foo", "bar")



>  [1] "bar" "bar" "bar" "bar" "bar" "bar" "bar" "foo" "bar" "bar" "bar"

>  [1] "foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo" "bar" "bar" "bar"

Categorical variables with nesting...

> x <- seq(1950, 1960, 1)



    ifelse(x == 1957, "foo", ifelse(x == 1958, "bar","baz"))



>  [1] "baz" "baz" "baz" "baz" "baz" "baz" "baz" "foo" "bar" "baz" "baz"

This is the most straightforward option.

edited Dec 9 '15 at 22:52

answered Dec 9 '15 at 22:41

Alex Thompson

35818


up vote
2
down vote

Another way is to use mtabulate from the qdapTools package:

df <- data.frame(var = sample(c("A", "B", "C"), 5, replace = TRUE))

  var

#1   C

#2   A

#3   C

#4   B

#5   B



library(qdapTools)

mtabulate(df$var)

which gives,

answered Oct 6 '17 at 6:32

Sotos

26.9k51540


up vote
1
down vote

I use such a function (for data.table):

# For a data.table object and a factor variable var.name, this function creates dummy variables named "var.name: (level1)"

factorToDummy <- function(dtable, var.name){

  stopifnot(is.data.table(dtable))

  stopifnot(var.name %in% names(dtable))

  stopifnot(is.factor(dtable[, get(var.name)]))



  dtable[, paste0(var.name,": ",levels(get(var.name)))] -> new.names

  dtable[, (new.names) := transpose(lapply(get(var.name), FUN = function(x){x == levels(get(var.name))})) ]



  cat(paste("\nAdded dummy variables: ", paste0(new.names, collapse = ", ")))

}

Usage:

data <- data.table(data)

data[, x:= droplevels(x)]

factorToDummy(data, "x")

edited Aug 18 '15 at 9:58

answered Aug 18 '15 at 9:50

Maciej Mozolewski

113


up vote
1
down vote

Convert your data to a data.table and use set by reference and row filtering

library(data.table)



dt <- as.data.table(your.dataframe.or.whatever)

dt[, is.1957 := 0]

dt[year == 1957, is.1957 := 1]

Proof-of-concept toy example:

library(data.table)



dt <- as.data.table(cbind(c(1, 1, 1), c(2, 2, 3)))

dt[, is.3 := 0]

dt[V2 == 3, is.3 := 1]

edited May 9 at 23:31

answered Feb 15 at 3:48

wordsforthewise

2,97722446


up vote
0
down vote

Hi, I wrote this general function to generate a dummy variable; it essentially replicates the replace function in Stata.

If x is the data frame and I want a dummy variable called a that takes the value 1 when x$b takes the value c:

introducedummy<-function(x,a,b,c){

   g<-c(a,b,c)

  n<-nrow(x)

  newcol<-g[1]

  p<-colnames(x)

  p2<-c(p,newcol)

  new1<-numeric(n)

  state<-x[,g[2]]

  interest<-g[3]

  for(i in 1:n){

    if(state[i]==interest){

      new1[i]=1

    }

    else{

      new1[i]=0

    }

  }

    x$added<-new1

    colnames(x)<-p2

    x

  }
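For comparison, the loop can probably be replaced by a single vectorised comparison. The sketch below is a hypothetical rewrite (introduce_dummy is not the function above), keeping the same argument order. One difference to be aware of: a missing value in x[[b]] produces NA here, whereas the loop version would stop with an error.

```r
introduce_dummy <- function(x, a, b, c) {
  # Compare the whole column at once; as.integer maps TRUE/FALSE to 1/0
  x[[a]] <- as.integer(x[[b]] == c)
  x
}

# Usage: add a column "is_1957" that is 1 where year equals 1957
df <- data.frame(year = 1955:1959)
df <- introduce_dummy(df, "is_1957", "year", 1957)
df$is_1957  # 0 0 1 0 0
```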

edited Feb 6 '15 at 18:00

mor

2,2171228

answered Feb 6 '15 at 17:18

kangkan Dc

465


up vote
0
down vote

Another way to do it is to use ifelse:

ifelse(year < 1965 , 1, 0)

edited May 9 at 23:54

dee-see

18.6k34479

answered May 9 at 21:09

Sophia J

4910


protected by Jaap Oct 16 '17 at 9:47


15 Answers
15

active

oldest

votes

15 Answers
15

active

oldest

votes

up vote
84
down vote

Another option that can work better if you have many variables is factor and model.matrix.

> year.f = factor(year)

> dummies = model.matrix(~year.f)

This will include an intercept column (all ones) and one column for each of the years in your data set except one, which will be the "default" or intercept value.

You can change how the "default" is chosen by messing with contrasts.arg in model.matrix.

Also, if you want to omit the intercept, you can just drop the first column or add +0 to the end of the formula.

Hope this is useful.

edited Jun 24 at 13:37

answered Aug 3 '12 at 1:24

David J. Harris

985610

4

what if you want to generate dummy variables for all (instead of k-1) with no intercept?
– Fernando Hoces De La Guardia
Mar 27 '15 at 16:52

1

note that model.matrix( ) accepts multiple variables to transform into dummies: model.matrix( ~ var1 + var2, data = df) Again, just be sure that they are factors.
– slizb
May 1 '15 at 19:32

3

@Synergist table(1:n, factor). Where factor is the original variable and n is its length
– Fernando Hoces De La Guardia
Jun 3 '15 at 15:43

1

@Synergist that table is a n x k matrix with all k indicator variables (instead of k-1)
– Fernando Hoces De La Guardia
Jun 3 '15 at 15:49

4

@FernandoHocesDeLaGuardia You can remove the intercept from a formula either with + 0 or - 1. So model.matrix(~ year.f + 0) will give a give dummy variables without a reference level.
– Gregor
Jan 6 '16 at 20:16

|
show 5 more comments

up vote
84
down vote

Another option that can work better if you have many variables is factor and model.matrix.

> year.f = factor(year)

> dummies = model.matrix(~year.f)

This will include an intercept column (all ones) and one column for each of the years in your data set except one, which will be the "default" or intercept value.

You can change how the "default" is chosen by messing with contrasts.arg in model.matrix.

Also, if you want to omit the intercept, you can just drop the first column or add +0 to the end of the formula.

Hope this is useful.

edited Jun 24 at 13:37

answered Aug 3 '12 at 1:24

David J. Harris

985610

4

what if you want to generate dummy variables for all (instead of k-1) with no intercept?
– Fernando Hoces De La Guardia
Mar 27 '15 at 16:52

1

note that model.matrix( ) accepts multiple variables to transform into dummies: model.matrix( ~ var1 + var2, data = df) Again, just be sure that they are factors.
– slizb
May 1 '15 at 19:32

3

@Synergist table(1:n, factor). Where factor is the original variable and n is its length
– Fernando Hoces De La Guardia
Jun 3 '15 at 15:43

1

@Synergist that table is a n x k matrix with all k indicator variables (instead of k-1)
– Fernando Hoces De La Guardia
Jun 3 '15 at 15:49

4

@FernandoHocesDeLaGuardia You can remove the intercept from a formula either with + 0 or - 1. So model.matrix(~ year.f + 0) will give a give dummy variables without a reference level.
– Gregor
Jan 6 '16 at 20:16

|
show 5 more comments

up vote
84
down vote

Another option that can work better if you have many variables is factor and model.matrix.

> year.f = factor(year)

> dummies = model.matrix(~year.f)

This will include an intercept column (all ones) and one column for each of the years in your data set except one, which will be the "default" or intercept value.

You can change how the "default" is chosen by messing with contrasts.arg in model.matrix.

Also, if you want to omit the intercept, you can just drop the first column or add +0 to the end of the formula.

Hope this is useful.

edited Jun 24 at 13:37

answered Aug 3 '12 at 1:24

David J. Harris

985610

Another option that can work better if you have many variables is factor and model.matrix.

> year.f = factor(year)

> dummies = model.matrix(~year.f)

This will include an intercept column (all ones) and one column for each of the years in your data set except one, which will be the "default" or intercept value.

You can change how the "default" is chosen by messing with contrasts.arg in model.matrix.

Also, if you want to omit the intercept, you can just drop the first column or add +0 to the end of the formula.

Hope this is useful.

edited Jun 24 at 13:37

answered Aug 3 '12 at 1:24

David J. Harris

985610

edited Jun 24 at 13:37

answered Aug 3 '12 at 1:24

David J. Harris

985610

answered Aug 3 '12 at 1:24

David J. Harris

985610

answered Aug 3 '12 at 1:24

David J. Harris

985610

4

what if you want to generate dummy variables for all (instead of k-1) with no intercept?
– Fernando Hoces De La Guardia
Mar 27 '15 at 16:52

1

note that model.matrix( ) accepts multiple variables to transform into dummies: model.matrix( ~ var1 + var2, data = df) Again, just be sure that they are factors.
– slizb
May 1 '15 at 19:32

3

@Synergist table(1:n, factor). Where factor is the original variable and n is its length
– Fernando Hoces De La Guardia
Jun 3 '15 at 15:43

1

@Synergist that table is a n x k matrix with all k indicator variables (instead of k-1)
– Fernando Hoces De La Guardia
Jun 3 '15 at 15:49

4

@FernandoHocesDeLaGuardia You can remove the intercept from a formula either with + 0 or - 1. So model.matrix(~ year.f + 0) will give a give dummy variables without a reference level.
– Gregor
Jan 6 '16 at 20:16

|
show 5 more comments

4

what if you want to generate dummy variables for all (instead of k-1) with no intercept?
– Fernando Hoces De La Guardia
Mar 27 '15 at 16:52

1

note that model.matrix( ) accepts multiple variables to transform into dummies: model.matrix( ~ var1 + var2, data = df) Again, just be sure that they are factors.
– slizb
May 1 '15 at 19:32

3

@Synergist table(1:n, factor). Where factor is the original variable and n is its length
– Fernando Hoces De La Guardia
Jun 3 '15 at 15:43

1

@Synergist that table is a n x k matrix with all k indicator variables (instead of k-1)
– Fernando Hoces De La Guardia
Jun 3 '15 at 15:49

4

@FernandoHocesDeLaGuardia You can remove the intercept from a formula either with + 0 or - 1. So model.matrix(~ year.f + 0) will give a give dummy variables without a reference level.
– Gregor
Jan 6 '16 at 20:16

what if you want to generate dummy variables for all (instead of k-1) with no intercept?
– Fernando Hoces De La Guardia
Mar 27 '15 at 16:52

note that model.matrix( ) accepts multiple variables to transform into dummies: model.matrix( ~ var1 + var2, data = df) Again, just be sure that they are factors.
– slizb
May 1 '15 at 19:32

@Synergist table(1:n, factor). Where factor is the original variable and n is its length
– Fernando Hoces De La Guardia
Jun 3 '15 at 15:43

@Synergist that table is a n x k matrix with all k indicator variables (instead of k-1)
– Fernando Hoces De La Guardia
Jun 3 '15 at 15:49

@FernandoHocesDeLaGuardia You can remove the intercept from a formula either with + 0 or - 1. So model.matrix(~ year.f + 0) will give a give dummy variables without a reference level.
– Gregor
Jan 6 '16 at 20:16

|
show 5 more comments

up vote
45
down vote

The simplest way to produce these dummy variables is something like the following:

> print(year)

[1] 1956 1957 1957 1958 1958 1959

> dummy <- as.numeric(year == 1957)

> print(dummy)

[1] 0 1 1 0 0 0

> dummy2 <- as.numeric(year >= 1957)

> print(dummy2)

[1] 0 1 1 1 1 1

answered Aug 2 '12 at 23:38

Martin O'Leary

91169

add a comment |

up vote
45
down vote

The simplest way to produce these dummy variables is something like the following:

> print(year)

[1] 1956 1957 1957 1958 1958 1959

> dummy <- as.numeric(year == 1957)

> print(dummy)

[1] 0 1 1 0 0 0

> dummy2 <- as.numeric(year >= 1957)

> print(dummy2)

[1] 0 1 1 1 1 1

answered Aug 2 '12 at 23:38

Martin O'Leary

91169

add a comment |

up vote
45
down vote

The simplest way to produce these dummy variables is something like the following:

> print(year)

[1] 1956 1957 1957 1958 1958 1959

> dummy <- as.numeric(year == 1957)

> print(dummy)

[1] 0 1 1 0 0 0

> dummy2 <- as.numeric(year >= 1957)

> print(dummy2)

[1] 0 1 1 1 1 1

answered Aug 2 '12 at 23:38

Martin O'Leary

91169

The simplest way to produce these dummy variables is something like the following:

> print(year)

[1] 1956 1957 1957 1958 1958 1959

> dummy <- as.numeric(year == 1957)

> print(dummy)

[1] 0 1 1 0 0 0

> dummy2 <- as.numeric(year >= 1957)

> print(dummy2)

[1] 0 1 1 1 1 1

answered Aug 2 '12 at 23:38

Martin O'Leary

91169

answered Aug 2 '12 at 23:38

Martin O'Leary

91169

answered Aug 2 '12 at 23:38

Martin O'Leary

91169

answered Aug 2 '12 at 23:38

Martin O'Leary

91169

add a comment |

up vote
29
down vote

Using dummies::dummy():

library(dummies)



# example data

df1 <- data.frame(id = 1:4, year = 1991:1994)



df1 <- cbind(df1, dummy(df1$year, sep = "_"))



df1

#   id year df1_1991 df1_1992 df1_1993 df1_1994

# 1  1 1991        1        0        0        0

# 2  2 1992        0        1        0        0

# 3  3 1993        0        0        1        0

# 4  4 1994        0        0        0        1

edited Jul 23 at 10:26

answered Oct 31 '16 at 13:34

zx8754

28.5k76394

Maybe adding "fun= factor" in function dummy can help if that is the meaning of the variable.
– Filippo Mazza
Mar 8 '17 at 10:35

@FilippoMazza I prefer to keep them as integer, yes, we could set factor if needed.
– zx8754
Mar 8 '17 at 10:51

how do you remove df1 before each dummy column header names?
– mike
Jun 10 '17 at 22:47

@mike colnames(df1) <- gsub("df1_", "", fixed = TRUE, colnames(df1))
– zx8754
Jun 11 '17 at 5:01

add a comment |

up vote
29
down vote

Using dummies::dummy():

library(dummies)



# example data

df1 <- data.frame(id = 1:4, year = 1991:1994)



df1 <- cbind(df1, dummy(df1$year, sep = "_"))



df1

#   id year df1_1991 df1_1992 df1_1993 df1_1994

# 1  1 1991        1        0        0        0

# 2  2 1992        0        1        0        0

# 3  3 1993        0        0        1        0

# 4  4 1994        0        0        0        1

edited Jul 23 at 10:26

answered Oct 31 '16 at 13:34

zx8754

28.5k76394

Maybe adding "fun= factor" in function dummy can help if that is the meaning of the variable.
– Filippo Mazza
Mar 8 '17 at 10:35

@FilippoMazza I prefer to keep them as integer, yes, we could set factor if needed.
– zx8754
Mar 8 '17 at 10:51

how do you remove df1 before each dummy column header names?
– mike
Jun 10 '17 at 22:47

@mike colnames(df1) <- gsub("df1_", "", fixed = TRUE, colnames(df1))
– zx8754
Jun 11 '17 at 5:01

add a comment |

up vote
29
down vote

Using dummies::dummy():

library(dummies)



# example data

df1 <- data.frame(id = 1:4, year = 1991:1994)



df1 <- cbind(df1, dummy(df1$year, sep = "_"))



df1

#   id year df1_1991 df1_1992 df1_1993 df1_1994

# 1  1 1991        1        0        0        0

# 2  2 1992        0        1        0        0

# 3  3 1993        0        0        1        0

# 4  4 1994        0        0        0        1

edited Jul 23 at 10:26

answered Oct 31 '16 at 13:34

zx8754

28.5k76394

Using dummies::dummy():

library(dummies)



# example data

df1 <- data.frame(id = 1:4, year = 1991:1994)



df1 <- cbind(df1, dummy(df1$year, sep = "_"))



df1

#   id year df1_1991 df1_1992 df1_1993 df1_1994

# 1  1 1991        1        0        0        0

# 2  2 1992        0        1        0        0

# 3  3 1993        0        0        1        0

# 4  4 1994        0        0        0        1

edited Jul 23 at 10:26

answered Oct 31 '16 at 13:34

zx8754

28.5k76394

edited Jul 23 at 10:26

answered Oct 31 '16 at 13:34

zx8754

28.5k76394

answered Oct 31 '16 at 13:34

zx8754

28.5k76394

answered Oct 31 '16 at 13:34

zx8754

28.5k76394

Maybe adding "fun= factor" in function dummy can help if that is the meaning of the variable.
– Filippo Mazza
Mar 8 '17 at 10:35

@FilippoMazza I prefer to keep them as integer, yes, we could set factor if needed.
– zx8754
Mar 8 '17 at 10:51

how do you remove df1 before each dummy column header names?
– mike
Jun 10 '17 at 22:47

@mike colnames(df1) <- gsub("df1_", "", fixed = TRUE, colnames(df1))
– zx8754
Jun 11 '17 at 5:01

add a comment |

Maybe adding "fun= factor" in function dummy can help if that is the meaning of the variable.
– Filippo Mazza
Mar 8 '17 at 10:35

@FilippoMazza I prefer to keep them as integer, yes, we could set factor if needed.
– zx8754
Mar 8 '17 at 10:51

how do you remove df1 before each dummy column header names?
– mike
Jun 10 '17 at 22:47

@mike colnames(df1) <- gsub("df1_", "", fixed = TRUE, colnames(df1))
– zx8754
Jun 11 '17 at 5:01

Maybe adding "fun= factor" in function dummy can help if that is the meaning of the variable.
– Filippo Mazza
Mar 8 '17 at 10:35

@FilippoMazza I prefer to keep them as integer, yes, we could set factor if needed.
– zx8754
Mar 8 '17 at 10:51

how do you remove df1 before each dummy column header names?
– mike
Jun 10 '17 at 22:47

@mike colnames(df1) <- gsub("df1_", "", fixed = TRUE, colnames(df1))
– zx8754
Jun 11 '17 at 5:01

add a comment |

up vote
15
down vote

Package mlr includes createDummyFeatures for this purpose:

library(mlr)

df <- data.frame(var = sample(c("A", "B", "C"), 10, replace = TRUE))

df



# var

# 1    B

# 2    A

# 3    C

# 4    B

# 5    C

# 6    A

# 7    C

# 8    A

# 9    B

# 10   C



createDummyFeatures(df, cols = "var")



# var.A var.B var.C

# 1      0     1     0

# 2      1     0     0

# 3      0     0     1

# 4      0     1     0

# 5      0     0     1

# 6      1     0     0

# 7      0     0     1

# 8      1     0     0

# 9      0     1     0

# 10     0     0     1

createDummyFeatures drops original variable.
https://www.rdocumentation.org/packages/mlr/versions/2.9/topics/createDummyFeatures

answered Nov 10 '16 at 16:54

Enrique Pérez Herrero

1,69321520

1

Enrique, I've tried installing the package, but it doesn't seem to be working after doing library(mlr). I get the following error:«Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : there is no package called ‘ggvis’ In addition: Warning message: package ‘mlr’ was built under R version 3.2.5 Error: package or namespace load failed for ‘mlr’»
– An old man in the sea.
Apr 13 '17 at 11:17

1

you need to install 'ggvis' first
– Ted Mosby
Jul 26 at 20:01

add a comment |

up vote
9
down vote

What I normally do to work with this kind of dummy variable is:

(1) How do I generate a dummy variable for observation #10, i.e. for year 1957 (value = 1 at 1957 and zero otherwise)?

data$factor_year_1 <- factor(with(data, ifelse(year == 1957, 1, 0)))

(2) How do I generate a dummy variable which is zero before 1957 and takes the value 1 from 1957 onwards to 2009?

data$factor_year_2 <- factor(with(data, ifelse(year < 1957, 0, 1)))

Then I can include such a factor as a dummy variable in my models. For example, to check for a long-term trend in a variable y (with t a time index in data) while allowing for a level shift from 1957 onwards:

summary(lm(y ~ t + factor_year_2, data = data))
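
To make this concrete, here is a self-contained sketch on simulated data covering 1948-2009. The seed, the trend slope, the size of the level shift and the noise level are all made up for illustration; only the two dummy definitions come from the answer above.

```r
set.seed(42)  # hypothetical seed, for reproducibility only
data <- data.frame(year = 1948:2009)
data$t <- seq_along(data$year)  # time index 1..62

# Pulse dummy: 1 in 1957 only; step dummy: 1 from 1957 onwards
data$factor_year_1 <- factor(ifelse(data$year == 1957, 1, 0))
data$factor_year_2 <- factor(ifelse(data$year < 1957, 0, 1))

# Simulated response: linear trend plus a level shift of 5 from 1957 onwards
data$y <- 0.1 * data$t + 5 * (data$year >= 1957) + rnorm(nrow(data), sd = 0.5)

fit <- lm(y ~ t + factor_year_2, data = data)
coef(fit)  # "factor_year_21" is the estimated level shift
```

Because the dummy is stored as a factor with levels "0" and "1", R names its coefficient factor_year_21 (variable name followed by the level).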

Hope this helps!

edited May 7 '14 at 16:48

answered Aug 3 '12 at 9:44

Ricardo González-Gil

987

add a comment |

up vote
9
down vote

`caret::dummyVars`

With caret, the relevant function is dummyVars, which has a predict method to apply it on a data frame:

df <- data.frame(letter = rep(c('a', 'b', 'c'), each = 2),
                 y = 1:6)

library(caret)

dummy <- dummyVars(~ ., data = df, fullRank = TRUE)

dummy
#> Dummy Variable Object
#> 
#> Formula: ~.
#> 2 variables, 1 factors
#> Variables and levels will be separated by '.'
#> A full rank encoding is used

predict(dummy, df)
#>   letter.b letter.c y
#> 1        0        0 1
#> 2        0        0 2
#> 3        1        0 3
#> 4        1        0 4
#> 5        0        1 5
#> 6        0        1 6
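
As a side note on why fullRank = TRUE drops one level: with an intercept in the model, a complete set of K indicator columns is linearly dependent, since the indicators sum to the intercept column. The following base R sketch (not caret code; the object names are my own) checks this on the same toy data.

```r
df <- data.frame(letter = factor(rep(c("a", "b", "c"), each = 2)),
                 y = 1:6)

# All K = 3 indicator columns (no intercept)
full <- model.matrix(~ letter - 1, data = df)

# Adding an intercept column makes the design rank deficient:
# the three indicators sum to the column of ones
X_all <- cbind(intercept = 1, full)
qr(X_all)$rank    # 3, although X_all has 4 columns

# Intercept plus K - 1 indicators is full rank
reduced <- model.matrix(~ letter, data = df)
qr(reduced)$rank  # 3, equal to ncol(reduced)
```

So a K - 1 encoding is the usual choice for models with an intercept, while all K columns can be fine for methods that do not need a full-rank design.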

`recipes::step_dummy`

With recipes, the relevant function is step_dummy:

library(recipes)

dummy_recipe <- recipe(y ~ letter, df) %>%
    step_dummy(letter)

dummy_recipe
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor          1
#> 
#> Steps:
#> 
#> Dummy variables from letter

Depending on context, extract the data with prep and either bake or juice:

# Prep and bake on new data...
dummy_recipe %>%
    prep() %>%
    bake(df)
#> # A tibble: 6 x 3
#>       y letter_b letter_c
#>   <int>    <dbl>    <dbl>
#> 1     1        0        0
#> 2     2        0        0
#> 3     3        1        0
#> 4     4        1        0
#> 5     5        0        1
#> 6     6        0        1

# ...or use `retain = TRUE` and `juice` to extract training data
dummy_recipe %>%
    prep(retain = TRUE) %>%
    juice()
#> # A tibble: 6 x 3
#>       y letter_b letter_c
#>   <int>    <dbl>    <dbl>
#> 1     1        0        0
#> 2     2        0        0
#> 3     3        1        0
#> 4     4        1        0
#> 5     5        0        1
#> 6     6        0        1

edited Apr 16 at 19:27

answered Dec 17 '17 at 21:59

alistaire

31k43561

add a comment |

up vote
7
down vote

I read this on the kaggle forum:

# Generate an example data frame with a character column
example <- as.data.frame(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))
names(example) <- "strcol"

# For every unique value in the string column, create a new 1/0 column.
# This is roughly what modelling functions do under the hood when they are given a factor.
for (level in unique(example$strcol)) {
  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
}
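
The same idea can also be written without an explicit loop. This is just a sketch of one possible vectorised variant, not what the Kaggle post used; the names lv and dummy_mat are mine.

```r
strcol <- c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F")
lv <- sort(unique(strcol))

# sapply() builds one 0/1 column per level; the result is an integer matrix
dummy_mat <- sapply(lv, function(l) as.integer(strcol == l))
colnames(dummy_mat) <- paste("dummy", lv, sep = "_")

# Bind the indicator columns to the original column
example <- cbind(data.frame(strcol = strcol), dummy_mat)
example
```

Sorting the levels first gives a predictable column order (dummy_A, dummy_B, ...), whereas the loop above creates columns in order of first appearance.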

answered May 16 '15 at 10:37

skpro19

15124

add a comment |

up vote
5
down vote

If you want to get K dummy variables, instead of K-1, try:

dummies <- table(seq_along(year), as.factor(year))

Best,

answered Mar 27 '15 at 17:45

Fernando Hoces De La Guardia

168413

the resulting table cannot be used as a data.frame. If that's a problem, use as.data.frame.matrix(dummies) to translate it into one
– sheß
Mar 27 at 19:21
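
Putting the answer and the comment together, a small worked sketch might look like this. The year vector is made-up example data; everything else is base R.

```r
year <- c(1991, 1992, 1993, 1994, 1992)  # hypothetical example data

# One row per observation, one column per distinct year
dummies <- table(seq_along(year), as.factor(year))

# Convert the table to a regular data frame, as suggested in the comment
dummies_df <- as.data.frame.matrix(dummies)
dummies_df
```

Each row contains a single 1, in the column of that observation's year, so all K levels are represented.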

add a comment |

up vote
5
down vote

For the use case presented in the question, you can also just multiply the logical condition by 1 (or, perhaps even better, by 1L, which keeps the result an integer):

# example data
df1 <- data.frame(yr = 1951:1960)

# create the dummies
df1$is.1957 <- 1L * (df1$yr == 1957)
df1$after.1957 <- 1L * (df1$yr >= 1957)

which gives:

> df1
     yr is.1957 after.1957
1  1951       0          0
2  1952       0          0
3  1953       0          0
4  1954       0          0
5  1955       0          0
6  1956       0          0
7  1957       1          1
8  1958       0          1
9  1959       0          1
10 1960       0          1
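
Applied to the full period from the question (1948-2009), the same trick could look like the sketch below. Here as.integer() is used instead of multiplying by 1L, which should be equivalent; the column name from.1957 is my own choice.

```r
df <- data.frame(yr = 1948:2009)

# Pulse dummy: 1 in 1957 only
df$is.1957 <- as.integer(df$yr == 1957)

# Step dummy: 0 before 1957, 1 from 1957 up to 2009
df$from.1957 <- as.integer(df$yr >= 1957)

which(df$is.1957 == 1)  # 10, i.e. observation #10
table(df$from.1957)     # 9 zeros (1948-1956), 53 ones (1957-2009)
```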

For the use cases presented in, for example, the answers of @zx8754 and @Sotos, there are some other options which haven't been covered yet.

1) Make your own make_dummies function

# example data
df2 <- data.frame(id = 1:5, year = c(1991:1994, 1992))

# create a function
make_dummies <- function(v, prefix = '') {
  s <- sort(unique(v))
  d <- outer(v, s, function(v, s) 1L * (v == s))
  colnames(d) <- paste0(prefix, s)
  d
}

# bind the dummies to the original dataframe
cbind(df2, make_dummies(df2$year, prefix = 'y'))

which gives:

  id year y1991 y1992 y1993 y1994
1  1 1991     1     0     0     0
2  2 1992     0     1     0     0
3  3 1993     0     0     1     0
4  4 1994     0     0     0     1
5  5 1992     0     1     0     0

2) Use the dcast function from either data.table or reshape2

library(reshape2)  # or data.table, after converting df2 with setDT()

dcast(df2, id + year ~ year, fun.aggregate = length)

which gives:

  id year 1991 1992 1993 1994
1  1 1991    1    0    0    0
2  2 1992    0    1    0    0
3  3 1993    0    0    1    0
4  4 1994    0    0    0    1
5  5 1992    0    1    0    0

When there is no id column, you can aggregate to 0/1 values and merge the result back:

# example data
df3 <- data.frame(var = c("B", "C", "A", "B", "C"))

# aggregation function to get dummy values
f <- function(x) as.integer(length(x) > 0)

# reshape to wide with the custom aggregation function and merge back to the original
merge(df3, dcast(df3, var ~ var, fun.aggregate = f), by = 'var', all.x = TRUE)

which gives (note that the result is ordered according to the by column):

  var A B C
1   A 1 0 0
2   B 0 1 0
3   B 0 1 0
4   C 0 0 1
5   C 0 0 1

3) Use the spread function from tidyr (with mutate from dplyr)

library(dplyr)
library(tidyr)

df2 %>%
  mutate(v = 1, yr = year) %>%
  spread(yr, v, fill = 0)

which gives:

  id year 1991 1992 1993 1994
1  1 1991    1    0    0    0
2  2 1992    0    1    0    0
3  3 1993    0    0    1    0
4  4 1994    0    0    0    1
5  5 1992    0    1    0    0

edited Jul 8 at 17:30

answered Feb 13 at 18:38

Jaap

53.9k20116127

add a comment |

up vote
4
down vote

The ifelse function works well for simple logic like this.

x <- seq(1950, 1960, 1)

ifelse(x == 1957, 1, 0)
#  [1] 0 0 0 0 0 0 0 1 0 0 0

ifelse(x >= 1957, 1, 0)
#  [1] 0 0 0 0 0 0 0 1 1 1 1

Also, if you want it to return character data, you can do so.

ifelse(x == 1957, "foo", "bar")
#  [1] "bar" "bar" "bar" "bar" "bar" "bar" "bar" "foo" "bar" "bar" "bar"

ifelse(x >= 1957, "foo", "bar")
#  [1] "bar" "bar" "bar" "bar" "bar" "bar" "bar" "foo" "foo" "foo" "foo"

Categorical variables with nesting:

ifelse(x == 1957, "foo", ifelse(x == 1958, "bar", "baz"))
#  [1] "baz" "baz" "baz" "baz" "baz" "baz" "baz" "foo" "bar" "baz" "baz"
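
Once there are more than two or three categories, nested ifelse calls get hard to read. One alternative, sketched here with the same toy labels, is to start from a default value and overwrite it by logical indexing; the variable names lab and nested are my own.

```r
x <- 1950:1960

# Start with the default label, then overwrite the special cases
lab <- rep("baz", length(x))
lab[x == 1957] <- "foo"
lab[x == 1958] <- "bar"

# Same result as the nested ifelse() version
nested <- ifelse(x == 1957, "foo", ifelse(x == 1958, "bar", "baz"))
identical(lab, nested)  # TRUE
```

Each extra category is then one more assignment line rather than one more level of nesting.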

This is the most straightforward option.
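Applied to the series in the question, the two dummies might look like the sketch below. The data frame dat and the column names d1957, from1957 and from1957_int are made up for illustration. The last assignment relies on the fact that as.integer() on a logical comparison gives the same 0/1 coding as ifelse(cond, 1, 0):

```r
# hypothetical data frame covering the question's period, 1948-2009
dat <- data.frame(year = 1948:2009)

# 1 only in 1957 (observation #10), 0 otherwise
dat$d1957 <- ifelse(dat$year == 1957, 1, 0)

# 0 before 1957, 1 from 1957 onwards
dat$from1957 <- ifelse(dat$year >= 1957, 1, 0)

# same coding without ifelse: coerce the logical comparison to integer
dat$from1957_int <- as.integer(dat$year >= 1957)

# rows around the break, to inspect the result
dat[9:11, ]
```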

edited Dec 9 '15 at 22:52

answered Dec 9 '15 at 22:41

Alex Thompson

35818

add a comment |

up vote
2
down vote

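Another way is to use mtabulate from the qdapTools package:

# sample() is random, so the values drawn will differ between runs

df <- data.frame(var = sample(c("A", "B", "C"), 5, replace = TRUE))

#  var

#1   C

#2   A

#3   C

#4   B

#5   B



library(qdapTools)

mtabulate(df$var)

which gives a data frame with one row per observation and one column per distinct value (A, B and C), holding 1 in the column that matches the observation's value and 0 elsewhere.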
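The same kind of 0/1 table can be sketched in base R without qdapTools. The vector below is fixed to the five values printed above rather than drawn with sample(), and the names v, lev and dummies are only illustrative:

```r
# the five values shown above, hard-coded so the result is reproducible
v <- c("C", "A", "C", "B", "B")

# distinct values, in sorted order, become the column names
lev <- sort(unique(v))

# one indicator column per distinct value: 1 where v equals it, 0 elsewhere
dummies <- sapply(lev, function(l) as.integer(v == l))

dummies
```

This returns an integer matrix with columns A, B and C; wrap it in as.data.frame() if a data frame is needed.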

answered Oct 6 '17 at 6:32

Sotos

26.9k51540

add a comment |

up vote
1
down vote

I use the following function (for data.table):

library(data.table)

# For a data.table object and a factor variable var.name, this function creates dummy variables named "var.name: (level1)"

factorToDummy <- function(dtable, var.name){

  stopifnot(is.data.table(dtable))

  stopifnot(var.name %in% names(dtable))

  stopifnot(is.factor(dtable[, get(var.name)]))



  # names of the new columns, one per factor level

  dtable[, paste0(var.name,": ",levels(get(var.name)))] -> new.names

  # add the columns by reference; each holds TRUE where the row has that level

  dtable[, (new.names) := transpose(lapply(get(var.name), FUN = function(x){x == levels(get(var.name))})) ]



  cat(paste("\nAdded dummy variables: ", paste0(new.names, collapse = ", ")))

}

Usage:

data <- data.table(data)

data[, x:= droplevels(x)]

factorToDummy(data, "x")
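To see what columns of this shape look like, here is a small self-contained sketch on toy data. It does not call factorToDummy. Instead it builds similarly named columns with data.table's set(), storing 0/1 integers rather than the TRUE/FALSE values the function above produces. The table dt and its column x are invented for the example:

```r
library(data.table)

# toy data (hypothetical): one factor column with three levels
dt <- data.table(x = factor(c("a", "b", "a", "c")))

# add one 0/1 column per level, by reference
for (lev in levels(dt$x)) {
  set(dt, j = paste0("x: ", lev), value = as.integer(dt$x == lev))
}

dt
```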

edited Aug 18 '15 at 9:58

answered Aug 18 '15 at 9:50

Maciej Mozolewski

113

add a comment |

up vote
1
down vote

Convert your data to a data.table, then assign by reference with := and row filtering:

library(data.table)



dt <- as.data.table(your.dataframe.or.whatever)

dt[, is.1957 := 0]

dt[year == 1957, is.1957 := 1]

Proof-of-concept toy example:

library(data.table)



dt <- as.data.table(cbind(c(1, 1, 1), c(2, 2, 3)))

dt[, is.3 := 0]

dt[V2 == 3, is.3 := 1]
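The two assignments can also be collapsed into one by coercing the comparison to integer. This is a sketch on made-up data covering the question's years; the table dt and the column from.1957 are assumptions for illustration:

```r
library(data.table)

# hypothetical yearly series, 1948-2009
dt <- data.table(year = 1948:2009)

# 1 in 1957 only, 0 otherwise
dt[, is.1957 := as.integer(year == 1957)]

# 0 before 1957, 1 from 1957 onwards
dt[, from.1957 := as.integer(year >= 1957)]

# rows around the break, to inspect the result
dt[year %in% 1956:1958]
```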

edited May 9 at 23:31

answered Feb 15 at 3:48

wordsforthewise

2,97722446

add a comment |

up vote
0
down vote

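I wrote this general function to generate a dummy variable; it essentially replicates the replace function in Stata.

If x is the data frame and I want a dummy variable called a that takes the value 1 when x$b takes the value c (a and b are passed as column names in quotes):

introducedummy <- function(x, a, b, c) {

  # x: data frame; a: name of the new dummy column;

  # b: name of the column to test; c: value for which the dummy is 1

  g <- c(a, b, c)

  n <- nrow(x)

  newcol <- g[1]

  p <- colnames(x)

  p2 <- c(p, newcol)

  new1 <- numeric(n)

  state <- x[, g[2]]

  interest <- g[3]

  for (i in seq_len(n)) {

    if (state[i] == interest) {

      new1[i] <- 1

    } else {

      new1[i] <- 0

    }

  }

  # append the dummy, then rename the appended column to a

  x$added <- new1

  colnames(x) <- p2

  x

}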
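The loop can be replaced by a single vectorised comparison. The following is a sketch of an equivalent helper, not the author's code; the name introducedummy_vec and the toy data frame are hypothetical:

```r
introducedummy_vec <- function(x, a, b, c) {
  # x: data frame; a: name of the new column;
  # b: name of the column to test; c: value that should map to 1
  x[[a]] <- as.numeric(x[[b]] == c)
  x
}

# toy data: the dummy is 1 in 1957 and 0 in every other year
toy <- data.frame(year = 1955:1959)
toy <- introducedummy_vec(toy, "d1957", "year", 1957)
toy
```

Unlike the loop, this version returns NA instead of stopping with an error when the tested column contains missing values.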

edited Feb 6 '15 at 18:00

mor

2,2171228

answered Feb 6 '15 at 17:18

kangkan Dc

465

add a comment |

up vote
0
down vote

Another way to do it is to use ifelse with whatever cutoff year you need, for example:

ifelse(year < 1965 , 1, 0)

edited May 9 at 23:54

dee-see

18.6k34479

answered May 9 at 21:09

Sophia J

4910

add a comment |

protected by Jaap Oct 16 '17 at 9:47
