How to reshape data from long to wide format

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

211

I'm having trouble rearranging the following data frame:

set.seed(45)

dat1 <- data.frame(

    name = rep(c("firstName", "secondName"), each=4),

    numbers = rep(1:4, 2),

    value = rnorm(8)

    )



dat1

       name  numbers      value

1  firstName       1  0.3407997

2  firstName       2 -0.7033403

3  firstName       3 -0.3795377

4  firstName       4 -0.7460474

5 secondName       1 -0.8981073

6 secondName       2 -0.3347941

7 secondName       3 -0.5013782

8 secondName       4 -0.1745357

I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:

     name          1          2          3         4

1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474

5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

I've looked at melt and cast and a few other things, but none seem to do the job.

edited Mar 30 at 14:25

NelsonGon

3,9464834

asked May 4 '11 at 22:27

Steve

1,95972628

3

possible duplicate of Reshape three column data frame to matrix

– Frank
Oct 8 '13 at 20:53

4

@Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.

– smci
Apr 11 '14 at 5:21

3

@smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.

– Frank
Apr 11 '14 at 16:49

2

^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.

– smci
Apr 11 '14 at 21:03

2

@smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?

– Frank
Apr 11 '14 at 21:13

|
show 2 more comments

211

I'm having trouble rearranging the following data frame:

set.seed(45)

dat1 <- data.frame(

    name = rep(c("firstName", "secondName"), each=4),

    numbers = rep(1:4, 2),

    value = rnorm(8)

    )



dat1

       name  numbers      value

1  firstName       1  0.3407997

2  firstName       2 -0.7033403

3  firstName       3 -0.3795377

4  firstName       4 -0.7460474

5 secondName       1 -0.8981073

6 secondName       2 -0.3347941

7 secondName       3 -0.5013782

8 secondName       4 -0.1745357

I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:

     name          1          2          3         4

1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474

5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

I've looked at melt and cast and a few other things, but none seem to do the job.

edited Mar 30 at 14:25

NelsonGon

3,9464834

asked May 4 '11 at 22:27

Steve

1,95972628

3

possible duplicate of Reshape three column data frame to matrix

– Frank
Oct 8 '13 at 20:53

4

@Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.

– smci
Apr 11 '14 at 5:21

3

@smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.

– Frank
Apr 11 '14 at 16:49

2

^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.

– smci
Apr 11 '14 at 21:03

2

@smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?

– Frank
Apr 11 '14 at 21:13

|
show 2 more comments

211

I'm having trouble rearranging the following data frame:

set.seed(45)

dat1 <- data.frame(

    name = rep(c("firstName", "secondName"), each=4),

    numbers = rep(1:4, 2),

    value = rnorm(8)

    )



dat1

       name  numbers      value

1  firstName       1  0.3407997

2  firstName       2 -0.7033403

3  firstName       3 -0.3795377

4  firstName       4 -0.7460474

5 secondName       1 -0.8981073

6 secondName       2 -0.3347941

7 secondName       3 -0.5013782

8 secondName       4 -0.1745357

I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:

     name          1          2          3         4

1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474

5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

I've looked at melt and cast and a few other things, but none seem to do the job.

edited Mar 30 at 14:25

NelsonGon

3,9464834

asked May 4 '11 at 22:27

Steve

1,95972628

I'm having trouble rearranging the following data frame:

set.seed(45)

dat1 <- data.frame(

    name = rep(c("firstName", "secondName"), each=4),

    numbers = rep(1:4, 2),

    value = rnorm(8)

    )



dat1

       name  numbers      value

1  firstName       1  0.3407997

2  firstName       2 -0.7033403

3  firstName       3 -0.3795377

4  firstName       4 -0.7460474

5 secondName       1 -0.8981073

6 secondName       2 -0.3347941

7 secondName       3 -0.5013782

8 secondName       4 -0.1745357

I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:

     name          1          2          3         4

1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474

5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

I've looked at melt and cast and a few other things, but none seem to do the job.

r reshape r-faq

edited Mar 30 at 14:25

NelsonGon

3,9464834

asked May 4 '11 at 22:27

Steve

1,95972628

edited Mar 30 at 14:25

NelsonGon

3,9464834

asked May 4 '11 at 22:27

Steve

1,95972628

edited Mar 30 at 14:25

NelsonGon

3,9464834

edited Mar 30 at 14:25

NelsonGon

3,9464834

edited Mar 30 at 14:25

NelsonGon

3,9464834

asked May 4 '11 at 22:27

Steve

1,95972628

asked May 4 '11 at 22:27

Steve

1,95972628

asked May 4 '11 at 22:27

Steve

1,95972628

3

possible duplicate of Reshape three column data frame to matrix

– Frank
Oct 8 '13 at 20:53

4

@Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.

– smci
Apr 11 '14 at 5:21

3

@smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.

– Frank
Apr 11 '14 at 16:49

2

^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.

– smci
Apr 11 '14 at 21:03

2

@smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?

– Frank
Apr 11 '14 at 21:13

|
show 2 more comments

3

possible duplicate of Reshape three column data frame to matrix

– Frank
Oct 8 '13 at 20:53

4

@Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.

– smci
Apr 11 '14 at 5:21

3

@smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.

– Frank
Apr 11 '14 at 16:49

2

^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.

– smci
Apr 11 '14 at 21:03

2

@smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?

– Frank
Apr 11 '14 at 21:13

possible duplicate of Reshape three column data frame to matrix

– Frank
Oct 8 '13 at 20:53

@Frank: this is a much better title. long-form and wide-form are the standard terms used. The other answer cannot be found by searching on those terms.

– smci
Apr 11 '14 at 5:21

@smci Having a better title is not a good reason not to link the questions, is it? If the questions are essentially the same, it's better for future visitors that they be linked so that all the answers can be found easily. You could mark the duplicate in the other direction instead, I suppose... Also, I do not know why you have made new tags here.

– Frank
Apr 11 '14 at 16:49

^^ @Frank - this answer cannot be found by searching on the terms users are likely to use. It's not a question of aesthetics.

– smci
Apr 11 '14 at 21:03

@smci By "this answer" you mean... the one I linked to? Yeah, I agree that it is unlikely but someone might land on one or the other in their search, and we want that person to find all the relevant answers (here and there). It's the exact same question so the answers in both places are relevant -- that's why it should be marked as a dupe. The titles don't matter; and which one is a dupe of the other doesn't matter either (as far as I can think of). If you're talking about the tags, maybe you should take it to meta...?

– Frank
Apr 11 '14 at 21:13

|
show 2 more comments

9 Answers
9

active

oldest

votes

212

Using reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

answered May 4 '11 at 23:20

Chase

50.8k12120153

13

+1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)

– aL3xa
May 5 '11 at 0:07

116

Good luck figuring out the arguments you need though

– hadley
May 5 '11 at 1:40

3

reshape is an outstanding example for a horrible function API. It is very close to useless.

– NoBackingDown
Oct 26 '17 at 15:18

9

The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)

– Brian D
Nov 17 '17 at 17:11

2

I would say base R still wins vote-wise by a factor of about 2 to 1

– vonjd
Nov 22 '18 at 15:14

|
show 2 more comments

116

The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.

library(tidyr)

spread(dat1, key = numbers, value = value)

From github,

tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.

Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.

edited Jul 29 '15 at 16:39

answered Jul 29 '14 at 19:37

Gregor

68.5k1096182

5

Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.

– Jake
Apr 12 '17 at 13:01

add a comment |

You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is

library(reshape)

cast(dat1, name ~ numbers)

Or using reshape2

library(reshape2)

dcast(dat1, name ~ numbers)

edited May 26 '15 at 14:52

David Arenburg

79.3k1197165

answered May 4 '11 at 22:42

Ista

7,93112526

7

+1 And use reshape2 for a performance gain.

– Andrie
May 6 '11 at 11:56

2

It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.

– thelatemail
Jun 21 '17 at 22:37

add a comment |

Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions

(Reference: Efficient reshaping using data.tables)

library(data.table)



setDT(dat1)

dcast(dat1, name ~ numbers, value.var = "value")



#          name          1          2         3         4

# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078

# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814

And, as of data.table v1.9.6 we can cast on multiple columns

## add an extra column

dat1[, value2 := value * 2]



## cast multiple value columns

dcast(dat1, name ~ numbers, value.var = c("value", "value2"))



#          name    value_1    value_2   value_3   value_4   value2_1   value2_2 value2_3  value2_4

# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078  0.3672866 -1.6712572 3.190562 0.6590155

# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814 -1.6409368  0.9748581 1.476649 1.1515627

edited Mar 27 '16 at 22:51

answered Mar 27 '16 at 22:35

SymbolixAU

16.9k33292

5

data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!

– joel.wilson
Aug 31 '17 at 12:06

What if I wanted to take the max?

– T.Fung
Mar 19 at 22:54

@T.Fung I don't understand what you're asking. Might be best to open a new question?

– SymbolixAU
Mar 19 at 23:54

@SymbolixAU in op's question 'name' and 'numbers' are unique combinations. What if they were not and I wanted to fetch the max value for each combination after pivoting? Not a problem if too fiddly a question. Just food for thoughts. Thank you.

– T.Fung
Mar 20 at 10:11

add a comment |

Using your example dataframe, we could:

xtabs(value ~ name + numbers, data = dat1)

edited Sep 2 '16 at 7:37

zx8754

30.8k766105

answered May 4 '11 at 22:58

Jim M.

4,66421531

2

this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages

– cloudscomputes
Oct 20 '17 at 4:44

add a comment |

Other two options:

Base package:

df <- unstack(dat1, form = value ~ numbers)

rownames(df) <- unique(dat1$name)

df

sqldf package:

library(sqldf)

sqldf('SELECT name,

      MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1, 

      MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,

      MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,

      MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4

      FROM dat1

      GROUP BY name')

answered Jul 14 '15 at 17:44

mpalanco

7,70813643

add a comment |

Using base R aggregate function:

aggregate(value ~ name, dat1, I)



# name           value.1  value.2  value.3  value.4

#1 firstName      0.4145  -0.4747   0.0659   -0.5024

#2 secondName    -0.8259   0.1669  -0.8962    0.1681

edited Dec 25 '17 at 4:05

Onyambu

16.1k1523

answered Sep 2 '16 at 7:52

Ronak Shah

47.1k104269

add a comment |

There's very powerful new package from genius data scientists at Win-Vector (folks that made vtreat, seplyr and replyr) called cdata. It implements "coordinated data" principles described in this document and also in this blog post. The idea is that regardless how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's a excerpt from the recent blog post by John Mount:

The whole system is based on two primitives or operators
cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
operators have pivot, un-pivot, one-hot encode, transpose, moving
multiple rows and columns, and many other transforms as simple special
cases.

It is easy to write many different operations in terms of the
cdata primitives. These operators can work-in memory or at big data
scale (with databases and Apache Spark; for big data use the
cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
variants). The transforms are controlled by a control table that
itself is a diagram of (or picture of) the transform.

We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.

library(cdata)

# first build the control table

pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset

                        columnToTakeKeysFrom = 'numbers', # this will become column headers

                        columnToTakeValuesFrom = 'value', # this contains data

                        sep="_")                          # optional for making column names



# perform the move of data to columns

dat_wide <- moveValuesToColumnsD(tallTable =  dat1, # reference to dataset

                    keyColumns = c('name'),         # this(these) column(s) should stay untouched 

                    controlTable = pivotControlTable# control table above

                    ) 

dat_wide



#>         name  numbers_1  numbers_2  numbers_3  numbers_4

#> 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474

#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

answered Dec 23 '17 at 23:01

dmi3kno

1,855822

add a comment |

The base reshape function works perfectly fine:

df <- data.frame(

  year   = c(rep(2000, 12), rep(2001, 12)),

  month  = rep(1:12, 2),

  values = rnorm(24)

)

df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")

df_wide

Where

idvar is the column of classes that separates rows

timevar is the column of classes to cast wide

v.names is the column containing numeric values

direction specifies wide or long format

the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

If no idvar exists, create one before using the reshape() function:

df$id   <- c(rep("year1", 12), rep("year2", 12))

df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")

df_wide

Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.

edited Aug 29 '18 at 3:00

SymbolixAU

16.9k33292

answered Aug 2 '18 at 23:50

Adam Erickson

2,0241720

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f5890584%2fhow-to-reshape-data-from-long-to-wide-format%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

9 Answers
9

active

oldest

votes

9 Answers
9

active

oldest

votes

212

Using reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

answered May 4 '11 at 23:20

Chase

50.8k12120153

13

+1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)

– aL3xa
May 5 '11 at 0:07

116

Good luck figuring out the arguments you need though

– hadley
May 5 '11 at 1:40

3

reshape is an outstanding example for a horrible function API. It is very close to useless.

– NoBackingDown
Oct 26 '17 at 15:18

9

The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)

– Brian D
Nov 17 '17 at 17:11

2

I would say base R still wins vote-wise by a factor of about 2 to 1

– vonjd
Nov 22 '18 at 15:14

|
show 2 more comments

212

Using reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

answered May 4 '11 at 23:20

Chase

50.8k12120153

13

+1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)

– aL3xa
May 5 '11 at 0:07

116

Good luck figuring out the arguments you need though

– hadley
May 5 '11 at 1:40

3

reshape is an outstanding example for a horrible function API. It is very close to useless.

– NoBackingDown
Oct 26 '17 at 15:18

9

The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)

– Brian D
Nov 17 '17 at 17:11

2

I would say base R still wins vote-wise by a factor of about 2 to 1

– vonjd
Nov 22 '18 at 15:14

|
show 2 more comments

212

Using reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

answered May 4 '11 at 23:20

Chase

50.8k12120153

Using reshape function:

reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

answered May 4 '11 at 23:20

Chase

50.8k12120153

answered May 4 '11 at 23:20

Chase

50.8k12120153

answered May 4 '11 at 23:20

Chase

50.8k12120153

answered May 4 '11 at 23:20

Chase

50.8k12120153

13

+1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)

– aL3xa
May 5 '11 at 0:07

116

Good luck figuring out the arguments you need though

– hadley
May 5 '11 at 1:40

3

reshape is an outstanding example for a horrible function API. It is very close to useless.

– NoBackingDown
Oct 26 '17 at 15:18

9

The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)

– Brian D
Nov 17 '17 at 17:11

2

I would say base R still wins vote-wise by a factor of about 2 to 1

– vonjd
Nov 22 '18 at 15:14

|
show 2 more comments

13

+1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)

– aL3xa
May 5 '11 at 0:07

116

Good luck figuring out the arguments you need though

– hadley
May 5 '11 at 1:40

3

reshape is an outstanding example for a horrible function API. It is very close to useless.

– NoBackingDown
Oct 26 '17 at 15:18

9

The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)

– Brian D
Nov 17 '17 at 17:11

2

I would say base R still wins vote-wise by a factor of about 2 to 1

– vonjd
Nov 22 '18 at 15:14

+1 and you don't need to rely on external packages, since reshape comes with stats. Not to mention that it's faster! =)

– aL3xa
May 5 '11 at 0:07

116

Good luck figuring out the arguments you need though

– hadley
May 5 '11 at 1:40

reshape is an outstanding example for a horrible function API. It is very close to useless.

– NoBackingDown
Oct 26 '17 at 15:18

The reshape comments and similar argument names aren't all that helpful. However, I have found that for long to wide, you need to provide data = your data.frame, idvar = the variable that identifies your groups, v.names = the variables that will become multiple columns in wide format, timevar = the variable containing the values that will be appended to v.names in wide format, direction = wide, and sep = "_". Clear enough? ;)

– Brian D
Nov 17 '17 at 17:11

I would say base R still wins vote-wise by a factor of about 2 to 1

– vonjd
Nov 22 '18 at 15:14

|
show 2 more comments

116

The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.

library(tidyr)

spread(dat1, key = numbers, value = value)

From github,

tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.

Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.

edited Jul 29 '15 at 16:39

answered Jul 29 '14 at 19:37

Gregor

68.5k1096182

5

Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.

– Jake
Apr 12 '17 at 13:01

add a comment |

116

The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.

library(tidyr)

spread(dat1, key = numbers, value = value)

From github,

tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.

Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.

edited Jul 29 '15 at 16:39

answered Jul 29 '14 at 19:37

Gregor

68.5k1096182

5

Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.

– Jake
Apr 12 '17 at 13:01

add a comment |

116

The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.

library(tidyr)

spread(dat1, key = numbers, value = value)

From github,

tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.

Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.

edited Jul 29 '15 at 16:39

answered Jul 29 '14 at 19:37

Gregor

68.5k1096182

The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.

library(tidyr)

spread(dat1, key = numbers, value = value)

From github,

tidyr is a reframing of reshape2 designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis.

Just as reshape2 did less than reshape, tidyr does less than reshape2. It's designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.

edited Jul 29 '15 at 16:39

answered Jul 29 '14 at 19:37

Gregor

68.5k1096182

edited Jul 29 '15 at 16:39

answered Jul 29 '14 at 19:37

Gregor

68.5k1096182

answered Jul 29 '14 at 19:37

Gregor

68.5k1096182

answered Jul 29 '14 at 19:37

Gregor

68.5k1096182

5

Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.

– Jake
Apr 12 '17 at 13:01

add a comment |

5

Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.

– Jake
Apr 12 '17 at 13:01

Just wanted to add a link to the R Cookbook page that discusses the use of these functions from tidyr and reshape2. It provides good examples and explanations.

– Jake
Apr 12 '17 at 13:01

add a comment |

You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is

library(reshape)

cast(dat1, name ~ numbers)

Or using reshape2

library(reshape2)

dcast(dat1, name ~ numbers)

edited May 26 '15 at 14:52

David Arenburg

79.3k1197165

answered May 4 '11 at 22:42

Ista

7,93112526

7

+1 And use reshape2 for a performance gain.

– Andrie
May 6 '11 at 11:56

2

It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.

– thelatemail
Jun 21 '17 at 22:37

add a comment |

You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is

library(reshape)

cast(dat1, name ~ numbers)

Or using reshape2

library(reshape2)

dcast(dat1, name ~ numbers)

edited May 26 '15 at 14:52

David Arenburg

79.3k1197165

answered May 4 '11 at 22:42

Ista

7,93112526

7

+1 And use reshape2 for a performance gain.

– Andrie
May 6 '11 at 11:56

2

It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.

– thelatemail
Jun 21 '17 at 22:37

add a comment |

You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is

library(reshape)

cast(dat1, name ~ numbers)

Or using reshape2

library(reshape2)

dcast(dat1, name ~ numbers)

edited May 26 '15 at 14:52

David Arenburg

79.3k1197165

answered May 4 '11 at 22:42

Ista

7,93112526

You can do this with the reshape() function, or with the melt() / cast() functions in the reshape package. For the second option, example code is

library(reshape)

cast(dat1, name ~ numbers)

Or using reshape2

library(reshape2)

dcast(dat1, name ~ numbers)

edited May 26 '15 at 14:52

David Arenburg

79.3k1197165

answered May 4 '11 at 22:42

Ista

7,93112526

edited May 26 '15 at 14:52

David Arenburg

79.3k1197165

edited May 26 '15 at 14:52

David Arenburg

79.3k1197165

edited May 26 '15 at 14:52

David Arenburg

79.3k1197165

answered May 4 '11 at 22:42

Ista

7,93112526

answered May 4 '11 at 22:42

Ista

7,93112526

answered May 4 '11 at 22:42

Ista

7,93112526

7

+1 And use reshape2 for a performance gain.

– Andrie
May 6 '11 at 11:56

2

It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.

– thelatemail
Jun 21 '17 at 22:37

add a comment |

7

+1 And use reshape2 for a performance gain.

– Andrie
May 6 '11 at 11:56

2

It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.

– thelatemail
Jun 21 '17 at 22:37

+1 And use reshape2 for a performance gain.

– Andrie
May 6 '11 at 11:56

It might be worth noting that just using cast or dcast will not work nicely if you don't have a clear "value" column. Try dat <- data.frame(id=c(1,1,2,2),blah=c(8,4,7,6),index=c(1,2,1,2)); dcast(dat, id ~ index); cast(dat, id ~ index) and you will not get what you expect. You need to explicitly note the value/value.var - cast(dat, id ~ index, value="blah") and dcast(dat, id ~ index, value.var="blah") for instance.

– thelatemail
Jun 21 '17 at 22:37

add a comment |

Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions

(Reference: Efficient reshaping using data.tables)

library(data.table)



setDT(dat1)

dcast(dat1, name ~ numbers, value.var = "value")



#          name          1          2         3         4

# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078

# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814

And, as of data.table v1.9.6 we can cast on multiple columns

## add an extra column

dat1[, value2 := value * 2]



## cast multiple value columns

dcast(dat1, name ~ numbers, value.var = c("value", "value2"))



#          name    value_1    value_2   value_3   value_4   value2_1   value2_2 value2_3  value2_4

# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078  0.3672866 -1.6712572 3.190562 0.6590155

# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814 -1.6409368  0.9748581 1.476649 1.1515627

edited Mar 27 '16 at 22:51

answered Mar 27 '16 at 22:35

SymbolixAU

16.9k33292

5

data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!

– joel.wilson
Aug 31 '17 at 12:06

What if I wanted to take the max?

– T.Fung
Mar 19 at 22:54

@T.Fung I don't understand what you're asking. Might be best to open a new question?

– SymbolixAU
Mar 19 at 23:54

@SymbolixAU in op's question 'name' and 'numbers' are unique combinations. What if they were not and I wanted to fetch the max value for each combination after pivoting? Not a problem if too fiddly a question. Just food for thoughts. Thank you.

– T.Fung
Mar 20 at 10:11

add a comment |

Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions

(Reference: Efficient reshaping using data.tables)

library(data.table)



setDT(dat1)

dcast(dat1, name ~ numbers, value.var = "value")



#          name          1          2         3         4

# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078

# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814

And, as of data.table v1.9.6 we can cast on multiple columns

## add an extra column

dat1[, value2 := value * 2]



## cast multiple value columns

dcast(dat1, name ~ numbers, value.var = c("value", "value2"))



#          name    value_1    value_2   value_3   value_4   value2_1   value2_2 value2_3  value2_4

# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078  0.3672866 -1.6712572 3.190562 0.6590155

# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814 -1.6409368  0.9748581 1.476649 1.1515627

edited Mar 27 '16 at 22:51

answered Mar 27 '16 at 22:35

SymbolixAU

16.9k33292

5

data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!

– joel.wilson
Aug 31 '17 at 12:06

What if I wanted to take the max?

– T.Fung
Mar 19 at 22:54

@T.Fung I don't understand what you're asking. Might be best to open a new question?

– SymbolixAU
Mar 19 at 23:54

@SymbolixAU in op's question 'name' and 'numbers' are unique combinations. What if they were not and I wanted to fetch the max value for each combination after pivoting? Not a problem if too fiddly a question. Just food for thoughts. Thank you.

– T.Fung
Mar 20 at 10:11

add a comment |

Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions

(Reference: Efficient reshaping using data.tables)

library(data.table)



setDT(dat1)

dcast(dat1, name ~ numbers, value.var = "value")



#          name          1          2         3         4

# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078

# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814

And, as of data.table v1.9.6 we can cast on multiple columns

## add an extra column

dat1[, value2 := value * 2]



## cast multiple value columns

dcast(dat1, name ~ numbers, value.var = c("value", "value2"))



#          name    value_1    value_2   value_3   value_4   value2_1   value2_2 value2_3  value2_4

# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078  0.3672866 -1.6712572 3.190562 0.6590155

# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814 -1.6409368  0.9748581 1.476649 1.1515627

edited Mar 27 '16 at 22:51

answered Mar 27 '16 at 22:35

SymbolixAU

16.9k33292

Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions

(Reference: Efficient reshaping using data.tables)

library(data.table)



setDT(dat1)

dcast(dat1, name ~ numbers, value.var = "value")



#          name          1          2         3         4

# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078

# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814

And, as of data.table v1.9.6 we can cast on multiple columns

## add an extra column

dat1[, value2 := value * 2]



## cast multiple value columns

dcast(dat1, name ~ numbers, value.var = c("value", "value2"))



#          name    value_1    value_2   value_3   value_4   value2_1   value2_2 value2_3  value2_4

# 1:  firstName  0.1836433 -0.8356286 1.5952808 0.3295078  0.3672866 -1.6712572 3.190562 0.6590155

# 2: secondName -0.8204684  0.4874291 0.7383247 0.5757814 -1.6409368  0.9748581 1.476649 1.1515627

edited Mar 27 '16 at 22:51

answered Mar 27 '16 at 22:35

SymbolixAU

16.9k33292

edited Mar 27 '16 at 22:51

answered Mar 27 '16 at 22:35

SymbolixAU

16.9k33292

answered Mar 27 '16 at 22:35

SymbolixAU

16.9k33292

answered Mar 27 '16 at 22:35

SymbolixAU

16.9k33292

5

data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!

– joel.wilson
Aug 31 '17 at 12:06

What if I wanted to take the max?

– T.Fung
Mar 19 at 22:54

@T.Fung I don't understand what you're asking. Might be best to open a new question?

– SymbolixAU
Mar 19 at 23:54

@SymbolixAU in op's question 'name' and 'numbers' are unique combinations. What if they were not and I wanted to fetch the max value for each combination after pivoting? Not a problem if too fiddly a question. Just food for thoughts. Thank you.

– T.Fung
Mar 20 at 10:11

add a comment |

5

data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!

– joel.wilson
Aug 31 '17 at 12:06

What if I wanted to take the max?

– T.Fung
Mar 19 at 22:54

@T.Fung I don't understand what you're asking. Might be best to open a new question?

– SymbolixAU
Mar 19 at 23:54

@SymbolixAU in op's question 'name' and 'numbers' are unique combinations. What if they were not and I wanted to fetch the max value for each combination after pivoting? Not a problem if too fiddly a question. Just food for thoughts. Thank you.

– T.Fung
Mar 20 at 10:11

data.table approach is the best ! very efficient ... you will see the difference when name is a combination of 30-40 columns !!

– joel.wilson
Aug 31 '17 at 12:06

What if I wanted to take the max?

– T.Fung
Mar 19 at 22:54

@T.Fung I don't understand what you're asking. Might be best to open a new question?

– SymbolixAU
Mar 19 at 23:54

@SymbolixAU in op's question 'name' and 'numbers' are unique combinations. What if they were not and I wanted to fetch the max value for each combination after pivoting? Not a problem if too fiddly a question. Just food for thoughts. Thank you.

– T.Fung
Mar 20 at 10:11

add a comment |

Using your example dataframe, we could:

xtabs(value ~ name + numbers, data = dat1)

edited Sep 2 '16 at 7:37

zx8754

30.8k766105

answered May 4 '11 at 22:58

Jim M.

4,66421531

2

this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages

– cloudscomputes
Oct 20 '17 at 4:44

add a comment |

Using your example dataframe, we could:

xtabs(value ~ name + numbers, data = dat1)

edited Sep 2 '16 at 7:37

zx8754

30.8k766105

answered May 4 '11 at 22:58

Jim M.

4,66421531

2

this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages

– cloudscomputes
Oct 20 '17 at 4:44

add a comment |

Using your example dataframe, we could:

xtabs(value ~ name + numbers, data = dat1)

edited Sep 2 '16 at 7:37

zx8754

30.8k766105

answered May 4 '11 at 22:58

Jim M.

4,66421531

Using your example dataframe, we could:

xtabs(value ~ name + numbers, data = dat1)

edited Sep 2 '16 at 7:37

zx8754

30.8k766105

answered May 4 '11 at 22:58

Jim M.

4,66421531

edited Sep 2 '16 at 7:37

zx8754

30.8k766105

edited Sep 2 '16 at 7:37

zx8754

30.8k766105

edited Sep 2 '16 at 7:37

zx8754

30.8k766105

answered May 4 '11 at 22:58

Jim M.

4,66421531

answered May 4 '11 at 22:58

Jim M.

4,66421531

answered May 4 '11 at 22:58

Jim M.

4,66421531

2

this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages

– cloudscomputes
Oct 20 '17 at 4:44

add a comment |

2

this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages

– cloudscomputes
Oct 20 '17 at 4:44

this one is good, but the result is of format table which not may be not so easy to handle as data.frame or data.table, both has plenty of packages

– cloudscomputes
Oct 20 '17 at 4:44

add a comment |

Other two options:

Base package:

df <- unstack(dat1, form = value ~ numbers)

rownames(df) <- unique(dat1$name)

df

sqldf package:

library(sqldf)

sqldf('SELECT name,

      MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1, 

      MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,

      MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,

      MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4

      FROM dat1

      GROUP BY name')

answered Jul 14 '15 at 17:44

mpalanco

7,70813643

add a comment |

Other two options:

Base package:

df <- unstack(dat1, form = value ~ numbers)

rownames(df) <- unique(dat1$name)

df

sqldf package:

library(sqldf)

sqldf('SELECT name,

      MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1, 

      MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,

      MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,

      MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4

      FROM dat1

      GROUP BY name')

answered Jul 14 '15 at 17:44

mpalanco

7,70813643

add a comment |

Other two options:

Base package:

df <- unstack(dat1, form = value ~ numbers)

rownames(df) <- unique(dat1$name)

df

sqldf package:

library(sqldf)

sqldf('SELECT name,

      MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1, 

      MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,

      MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,

      MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4

      FROM dat1

      GROUP BY name')

answered Jul 14 '15 at 17:44

mpalanco

7,70813643

Other two options:

Base package:

df <- unstack(dat1, form = value ~ numbers)

rownames(df) <- unique(dat1$name)

df

sqldf package:

library(sqldf)

sqldf('SELECT name,

      MAX(CASE WHEN numbers = 1 THEN value ELSE NULL END) x1, 

      MAX(CASE WHEN numbers = 2 THEN value ELSE NULL END) x2,

      MAX(CASE WHEN numbers = 3 THEN value ELSE NULL END) x3,

      MAX(CASE WHEN numbers = 4 THEN value ELSE NULL END) x4

      FROM dat1

      GROUP BY name')

answered Jul 14 '15 at 17:44

mpalanco

7,70813643

answered Jul 14 '15 at 17:44

mpalanco

7,70813643

answered Jul 14 '15 at 17:44

mpalanco

7,70813643

answered Jul 14 '15 at 17:44

mpalanco

7,70813643

add a comment |

Using base R aggregate function:

aggregate(value ~ name, dat1, I)



# name           value.1  value.2  value.3  value.4

#1 firstName      0.4145  -0.4747   0.0659   -0.5024

#2 secondName    -0.8259   0.1669  -0.8962    0.1681

edited Dec 25 '17 at 4:05

Onyambu

16.1k1523

answered Sep 2 '16 at 7:52

Ronak Shah

47.1k104269

add a comment |

Using base R aggregate function:

aggregate(value ~ name, dat1, I)



# name           value.1  value.2  value.3  value.4

#1 firstName      0.4145  -0.4747   0.0659   -0.5024

#2 secondName    -0.8259   0.1669  -0.8962    0.1681

edited Dec 25 '17 at 4:05

Onyambu

16.1k1523

answered Sep 2 '16 at 7:52

Ronak Shah

47.1k104269

add a comment |

Using base R aggregate function:

aggregate(value ~ name, dat1, I)



# name           value.1  value.2  value.3  value.4

#1 firstName      0.4145  -0.4747   0.0659   -0.5024

#2 secondName    -0.8259   0.1669  -0.8962    0.1681

edited Dec 25 '17 at 4:05

Onyambu

16.1k1523

answered Sep 2 '16 at 7:52

Ronak Shah

47.1k104269

Using base R aggregate function:

aggregate(value ~ name, dat1, I)



# name           value.1  value.2  value.3  value.4

#1 firstName      0.4145  -0.4747   0.0659   -0.5024

#2 secondName    -0.8259   0.1669  -0.8962    0.1681

edited Dec 25 '17 at 4:05

Onyambu

16.1k1523

answered Sep 2 '16 at 7:52

Ronak Shah

47.1k104269

edited Dec 25 '17 at 4:05

Onyambu

16.1k1523

edited Dec 25 '17 at 4:05

Onyambu

16.1k1523

edited Dec 25 '17 at 4:05

Onyambu

16.1k1523

answered Sep 2 '16 at 7:52

Ronak Shah

47.1k104269

answered Sep 2 '16 at 7:52

Ronak Shah

47.1k104269

answered Sep 2 '16 at 7:52

Ronak Shah

47.1k104269

add a comment |

The whole system is based on two primitives or operators
cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
operators have pivot, un-pivot, one-hot encode, transpose, moving
multiple rows and columns, and many other transforms as simple special
cases.

It is easy to write many different operations in terms of the
cdata primitives. These operators can work-in memory or at big data
scale (with databases and Apache Spark; for big data use the
cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
variants). The transforms are controlled by a control table that
itself is a diagram of (or picture of) the transform.

We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.

library(cdata)

# first build the control table

pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset

                        columnToTakeKeysFrom = 'numbers', # this will become column headers

                        columnToTakeValuesFrom = 'value', # this contains data

                        sep="_")                          # optional for making column names



# perform the move of data to columns

dat_wide <- moveValuesToColumnsD(tallTable =  dat1, # reference to dataset

                    keyColumns = c('name'),         # this(these) column(s) should stay untouched 

                    controlTable = pivotControlTable# control table above

                    ) 

dat_wide



#>         name  numbers_1  numbers_2  numbers_3  numbers_4

#> 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474

#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

answered Dec 23 '17 at 23:01

dmi3kno

1,855822

add a comment |

The whole system is based on two primitives or operators
cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
operators have pivot, un-pivot, one-hot encode, transpose, moving
multiple rows and columns, and many other transforms as simple special
cases.

It is easy to write many different operations in terms of the
cdata primitives. These operators can work-in memory or at big data
scale (with databases and Apache Spark; for big data use the
cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
variants). The transforms are controlled by a control table that
itself is a diagram of (or picture of) the transform.

We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.

library(cdata)

# first build the control table

pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset

                        columnToTakeKeysFrom = 'numbers', # this will become column headers

                        columnToTakeValuesFrom = 'value', # this contains data

                        sep="_")                          # optional for making column names



# perform the move of data to columns

dat_wide <- moveValuesToColumnsD(tallTable =  dat1, # reference to dataset

                    keyColumns = c('name'),         # this(these) column(s) should stay untouched 

                    controlTable = pivotControlTable# control table above

                    ) 

dat_wide



#>         name  numbers_1  numbers_2  numbers_3  numbers_4

#> 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474

#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

answered Dec 23 '17 at 23:01

dmi3kno

1,855822

add a comment |

The whole system is based on two primitives or operators
cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
operators have pivot, un-pivot, one-hot encode, transpose, moving
multiple rows and columns, and many other transforms as simple special
cases.

It is easy to write many different operations in terms of the
cdata primitives. These operators can work-in memory or at big data
scale (with databases and Apache Spark; for big data use the
cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
variants). The transforms are controlled by a control table that
itself is a diagram of (or picture of) the transform.

We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.

library(cdata)

# first build the control table

pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset

                        columnToTakeKeysFrom = 'numbers', # this will become column headers

                        columnToTakeValuesFrom = 'value', # this contains data

                        sep="_")                          # optional for making column names



# perform the move of data to columns

dat_wide <- moveValuesToColumnsD(tallTable =  dat1, # reference to dataset

                    keyColumns = c('name'),         # this(these) column(s) should stay untouched 

                    controlTable = pivotControlTable# control table above

                    ) 

dat_wide



#>         name  numbers_1  numbers_2  numbers_3  numbers_4

#> 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474

#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

answered Dec 23 '17 at 23:01

dmi3kno

1,855822

The whole system is based on two primitives or operators
cdata::moveValuesToRowsD() and cdata::moveValuesToColumnsD(). These
operators have pivot, un-pivot, one-hot encode, transpose, moving
multiple rows and columns, and many other transforms as simple special
cases.

It is easy to write many different operations in terms of the
cdata primitives. These operators can work-in memory or at big data
scale (with databases and Apache Spark; for big data use the
cdata::moveValuesToRowsN() and cdata::moveValuesToColumnsN()
variants). The transforms are controlled by a control table that
itself is a diagram of (or picture of) the transform.

We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.

library(cdata)

# first build the control table

pivotControlTable <- buildPivotControlTableD(table = dat1, # reference to dataset

                        columnToTakeKeysFrom = 'numbers', # this will become column headers

                        columnToTakeValuesFrom = 'value', # this contains data

                        sep="_")                          # optional for making column names



# perform the move of data to columns

dat_wide <- moveValuesToColumnsD(tallTable =  dat1, # reference to dataset

                    keyColumns = c('name'),         # this(these) column(s) should stay untouched 

                    controlTable = pivotControlTable# control table above

                    ) 

dat_wide



#>         name  numbers_1  numbers_2  numbers_3  numbers_4

#> 1  firstName  0.3407997 -0.7033403 -0.3795377 -0.7460474

#> 2 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357

answered Dec 23 '17 at 23:01

dmi3kno

1,855822

answered Dec 23 '17 at 23:01

dmi3kno

1,855822

answered Dec 23 '17 at 23:01

dmi3kno

1,855822

answered Dec 23 '17 at 23:01

dmi3kno

1,855822

add a comment |

The base reshape function works perfectly fine:

df <- data.frame(

  year   = c(rep(2000, 12), rep(2001, 12)),

  month  = rep(1:12, 2),

  values = rnorm(24)

)

df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")

df_wide

Where

idvar is the column of classes that separates rows

timevar is the column of classes to cast wide

v.names is the column containing numeric values

direction specifies wide or long format

the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

If no idvar exists, create one before using the reshape() function:

df$id   <- c(rep("year1", 12), rep("year2", 12))

df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")

df_wide

Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.

edited Aug 29 '18 at 3:00

SymbolixAU

16.9k33292

answered Aug 2 '18 at 23:50

Adam Erickson

2,0241720

add a comment |

The base reshape function works perfectly fine:

df <- data.frame(

  year   = c(rep(2000, 12), rep(2001, 12)),

  month  = rep(1:12, 2),

  values = rnorm(24)

)

df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")

df_wide

Where

idvar is the column of classes that separates rows

timevar is the column of classes to cast wide

v.names is the column containing numeric values

direction specifies wide or long format

the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

If no idvar exists, create one before using the reshape() function:

df$id   <- c(rep("year1", 12), rep("year2", 12))

df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")

df_wide

Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.

edited Aug 29 '18 at 3:00

SymbolixAU

16.9k33292

answered Aug 2 '18 at 23:50

Adam Erickson

2,0241720

add a comment |

The base reshape function works perfectly fine:

df <- data.frame(

  year   = c(rep(2000, 12), rep(2001, 12)),

  month  = rep(1:12, 2),

  values = rnorm(24)

)

df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")

df_wide

Where

idvar is the column of classes that separates rows

timevar is the column of classes to cast wide

v.names is the column containing numeric values

direction specifies wide or long format

the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

If no idvar exists, create one before using the reshape() function:

df$id   <- c(rep("year1", 12), rep("year2", 12))

df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")

df_wide

Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.

edited Aug 29 '18 at 3:00

SymbolixAU

16.9k33292

answered Aug 2 '18 at 23:50

Adam Erickson

2,0241720

The base reshape function works perfectly fine:

df <- data.frame(

  year   = c(rep(2000, 12), rep(2001, 12)),

  month  = rep(1:12, 2),

  values = rnorm(24)

)

df_wide <- reshape(df, idvar="year", timevar="month", v.names="values", direction="wide", sep="_")

df_wide

Where

idvar is the column of classes that separates rows

timevar is the column of classes to cast wide

v.names is the column containing numeric values

direction specifies wide or long format

the optional sep argument is the separator used in between timevar class names and v.names in the output data.frame.

If no idvar exists, create one before using the reshape() function:

df$id   <- c(rep("year1", 12), rep("year2", 12))

df_wide <- reshape(df, idvar="id", timevar="month", v.names="values", direction="wide", sep="_")

df_wide

Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.

edited Aug 29 '18 at 3:00

SymbolixAU

16.9k33292

answered Aug 2 '18 at 23:50

Adam Erickson

2,0241720

edited Aug 29 '18 at 3:00

SymbolixAU

16.9k33292

edited Aug 29 '18 at 3:00

SymbolixAU

16.9k33292

edited Aug 29 '18 at 3:00

SymbolixAU

16.9k33292

answered Aug 2 '18 at 23:50

Adam Erickson

2,0241720

answered Aug 2 '18 at 23:50

Adam Erickson

2,0241720

answered Aug 2 '18 at 23:50

Adam Erickson

2,0241720

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu