recode/replace multiple values in a shared data column to a single value across data frames


I hope I haven't missed it, but I haven't been able to find a working solution to this problem.
I have a set of data frames with a shared column. In each data frame this column contains transcription errors: some errors appear in several data frames, others in only one, and several distinct errors can stand for the same correct value.
I would like to replace/recode the transcription errors (bad_values) with the correct values (good_values) across all data frames.

I have tried nesting the map*() family of functions across lists of data frames, bad_values, and good_values to do this, among other things. Here is an example:

library(tidyverse)

df1 = data.frame(grp = c("a1","a.","a.",rep("b",7)), measure = rnorm(10))
df2 = data.frame(grp = c(rep("as", 3), "b2",rep("a",22)), measure = rnorm(26))
df3 = data.frame(grp = c(rep("b-",3),rep("bq",2),"a", rep("a.", 3)), measure = 1:9)

df_list = list(df1, df2, df3)
bad_values = list(c("a1","a.","as"), c("b2","b-","bq"))
good_values = list("a", "b")

dfs = map(df_list, function(x) {
  x %>% mutate(grp = plyr::mapvalues(grp, bad_values, rep(good_values,length(bad_values))))
})

I didn't necessarily expect this to work beyond a single good-bad value pair. However, I thought nesting another call to map*() within it might work:

dfs = map(df_list, function(x) {
x %>% mutate(grp = map2(bad_values, good_values, function(x,y) {
recode(grp, bad_values = good_values)})
})

I have tried a number of other approaches, none of which have worked.

Ultimately, I would like to go from a set of data frames with errors, as here:

[[1]]
  grp    measure
1  a1  0.5582253
2  a.  0.3400904
3  a. -0.2200824
4   b -0.7287385
5   b -0.2128275
6   b  1.9030766

[[2]]
  grp    measure
1  as  1.6148772
2  as  0.1090853
3  as -1.3714180
4  b2 -0.1606979
5   a  1.1726395
6   a -0.3201150

[[3]]
  grp measure
1  b-       1
2  b-       2
3  b-       3
4  bq       4
5  bq       5
6   a       6

To a list of 'fixed' data frames, as such:

[[1]]
  grp    measure
1   a -0.7671052
2   a  0.1781247
3   a -0.7565773
4   b -0.3606900
5   b  1.9264804
6   b  0.9506608

[[2]]
  grp     measure
1   a  1.45036125
2   a -2.16715639
3   a  0.80105611
4   b  0.24216723
5   a  1.33089426
6   a -0.08388404

[[3]]
  grp measure
1   b       1
2   b       2
3   b       3
4   b       4
5   b       5
6   a       6

Any help would be very much appreciated.

asked Jan 3 at 6:20

Jim Junker

304

r dplyr lapply purrr


3 Answers

Here is an option using tidyverse with recode_factor. When there are multiple elements to change, create a named list of key/value pairs (each bad value as a name, its good value as the element) and use recode_factor to match the old values and change them to the new levels.

library(tidyverse)

keyval <- setNames(rep(good_values, lengths(bad_values)), unlist(bad_values))

out <- map(df_list, ~ .x %>%
                  mutate(grp = recode_factor(grp, !!! keyval)))

-output

out
#[[1]]
#   grp     measure
#1    a -1.63295876
#2    a  0.03859976
#3    a -0.46541610
#4    b -0.72356671
#5    b -1.11552841
#6    b  0.99352861
#....

#[[2]]
#   grp     measure
#1    a  1.26536789
#2    a -0.48189740
#3    a  0.23041056
#4    b -1.01324689
#5    a -1.41586086
#6    a  0.59026463
#....

#[[3]]
#  grp measure
#1   b       1
#2   b       2
#3   b       3
#4   b       4
#5   b       5
#6   a       6
#....

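NOTE: recode_factor returns a factor. When grp is already a factor (the data.frame() default before R 4.0.0), the class of the column is unchanged; on R 4.0.0 and later, where data.frame() creates character columns by default, grp is converted from character to factor.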

str(out)
#List of 3
# $ :'data.frame':  10 obs. of  2 variables:
#  ..$ grp    : Factor w/ 2 levels "a","b": 1 1 1 2 2 2 2 2 2 2
#  ..$ measure: num [1:10] -1.633 0.0386 -0.4654 -0.7236 -1.1155 ...
# $ :'data.frame':  26 obs. of  2 variables:
#  ..$ grp    : Factor w/ 2 levels "a","b": 1 1 1 2 1 1 1 1 1 1 ...
#  ..$ measure: num [1:26] 1.265 -0.482 0.23 -1.013 -1.416 ...
# $ :'data.frame':  9 obs. of  2 variables:
#  ..$ grp    : Factor w/ 2 levels "a","b": 2 2 2 2 2 1 1 1 1
#  ..$ measure: int [1:9] 1 2 3 4 5 6 7 8 9
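If the column should stay character rather than become a factor, dplyr's recode() accepts the same spliced key/value list. The following is a minimal sketch on a toy vector (the vector grp below is made up for illustration and is not the asker's data); it only contrasts the return types of the two functions.

```r
library(dplyr)

bad_values  <- list(c("a1", "a.", "as"), c("b2", "b-", "bq"))
good_values <- list("a", "b")

# Named list: names are the bad values, elements are their replacements.
keyval <- setNames(rep(good_values, lengths(bad_values)), unlist(bad_values))

# Toy input (an assumption for illustration only).
grp <- c("a1", "a.", "b", "bq", "a")

# recode() leaves unmatched values alone and keeps the character class.
fixed_chr <- recode(grp, !!!keyval)

# recode_factor() applies the same mapping but returns a factor.
fixed_fct <- recode_factor(grp, !!!keyval)

class(fixed_chr)
class(fixed_fct)
```

Which of the two is preferable probably depends on whether later steps expect a factor (for example, for modelling) or plain strings.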

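Once we have the key/value list, it can also be used with base R functions. The lookup has to cover values that are already correct as well (otherwise they come back as NULL and are dropped by unlist), and grp should be indexed as character rather than by its factor codes:

lookup <- unlist(c(keyval, setNames(good_values, unlist(good_values))))
out1 <- lapply(df_list, transform, grp = unname(lookup[as.character(grp)]))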
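As a quick check of the lookup idea on toy data, here is a minimal base R sketch. The helper name recode_lookup and the two small data frames are hypothetical, introduced only for illustration; the helper touches only entries whose value appears among the lookup names, so correct values pass through unchanged.

```r
bad_values  <- list(c("a1", "a.", "as"), c("b2", "b-", "bq"))
good_values <- list("a", "b")

# Character lookup vector: names are bad values, elements are replacements.
lookup <- setNames(rep(unlist(good_values), lengths(bad_values)),
                   unlist(bad_values))

# Hypothetical helper: recode matched entries, leave everything else as is.
recode_lookup <- function(x, lookup) {
  x   <- as.character(x)        # guard against factor input
  hit <- x %in% names(lookup)   # entries that need fixing
  x[hit] <- lookup[x[hit]]
  x
}

# Toy data frames (assumptions for illustration only).
df_list <- list(
  data.frame(grp = c("a1", "a.", "b"), measure = 1:3),
  data.frame(grp = c("as", "b2", "a"), measure = 4:6)
)

fixed <- lapply(df_list, function(d) {
  d$grp <- recode_lookup(d$grp, lookup)
  d
})
```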

edited Jan 3 at 7:38

answered Jan 3 at 7:23

akrun

419k13207284


Any reason mapping a case_when statement wouldn't work?

library(tidyverse)

df_list %>%
  map(~ mutate_if(.x, is.factor, as.character)) %>% # convert factor to character
  map(~ mutate(.x, grp = case_when(grp %in% bad_values[[1]] ~ good_values[[1]],
                                   grp %in% bad_values[[2]] ~ good_values[[2]],
                                   TRUE ~ grp)))

I could see it working for your reprex but possibly not the greater problem.
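If the larger problem has more than two bad/good pairs, writing one case_when branch per pair stops scaling. One possible generalization, sketched here under the assumption that bad_values and good_values are parallel lists of equal length, folds over the pairs with purrr::reduce2. The helper name fix_grp and the toy data frames are hypothetical.

```r
library(purrr)

bad_values  <- list(c("a1", "a.", "as"), c("b2", "b-", "bq"))
good_values <- list("a", "b")

# Hypothetical helper: apply every bad -> good replacement in turn.
# reduce2() passes the running result (acc) plus one element from each list.
fix_grp <- function(x) {
  reduce2(
    bad_values, good_values,
    function(acc, bad, good) replace(acc, acc %in% bad, good),
    .init = as.character(x)
  )
}

# Toy data frames (assumptions for illustration only).
df_list <- list(
  data.frame(grp = c("a1", "a.", "b"), measure = 1:3),
  data.frame(grp = c("b-", "bq", "a"), measure = 4:6)
)

fixed <- map(df_list, function(d) {
  d$grp <- fix_grp(d$grp)
  d
})
```

Adding a third group then only means appending one element to each of the two lists.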

answered Jan 3 at 6:52

zack

3,4131322


A base R option if you have a lot of good_values and bad_values and it is not practical to write out each pair individually.

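lapply(df_list, function(x) {
  # work on a character copy so new values are not limited to existing factor levels
  vec = as.character(x[['grp']])
  # overwrite each set of bad values with its good value; <<- updates vec in this function
  mapply(function(p, q) vec[vec %in% p] <<- q, bad_values, good_values)
  transform(x, grp = vec)
})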





#[[1]]
#   grp      measure
#1    a -0.648146527
#2    a -0.004722549
#3    a -0.943451194
#4    b -0.709509396
#5    b -0.719434286
#....

#[[2]]
#   grp     measure
#1    a  1.03131291
#2    a -0.85558910
#3    a -0.05933911
#4    b  0.67812934
#5    a  3.23854093
#6    a  1.31688645
#7    a  1.87464048
#8    a  0.90100179
#....

#[[3]]
#  grp measure
#1   b       1
#2   b       2
#3   b       3
#4   b       4
#5   b       5
#....

Here, for every list element, we extract its grp column, replace any bad_values found with the corresponding good_values, and return the corrected data frame.
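One caveat may be worth checking: if grp is stored as a factor, assigning a value that is not already one of its levels produces NA with a warning. The sketch below, which uses a hypothetical helper fix_as_character and a toy factor, converts to character first and uses a plain for loop in place of mapply with <<-.

```r
bad_values  <- list(c("a1", "a.", "as"), c("b2", "b-", "bq"))
good_values <- list("a", "b")

# Hypothetical helper: loop over the bad/good pairs on a character copy.
fix_as_character <- function(x, bad_values, good_values) {
  x <- as.character(x)  # "a" need not be an existing factor level
  for (i in seq_along(bad_values)) {
    x[x %in% bad_values[[i]]] <- good_values[[i]]
  }
  x
}

# Toy factor whose levels ("a.", "a1", "b") do not include "a".
f <- factor(c("a1", "a.", "b"))

fixed <- fix_as_character(f, bad_values, good_values)
```

The result is a character vector; wrapping it in factor() afterwards would restore a factor with only the corrected levels.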

answered Jan 3 at 7:11

Ronak Shah

45.3k104267
