Number of columns required to reach a minimum sum, by row
I have a data frame with rows as time and columns as principal components (PC1 to PC10). An example can be found in the answer provided here: Rolling PCA
For each row, I want to extract the number of PCs required to reach a cumulative sum of at least 0.90. In the example table, summing over the first three columns reaches at least 0.90 in every row, so I want to extract the number 3 into a separate column. In my specific case, the number of columns required to reach 0.9 varies by row.
An example of the result I want is in the last column (PC_N).
r
can you do a little example with expected outcome?
– Andre Elrico
Nov 19 '18 at 12:23
I just added a table to show the answer I need. Thanks!
– Prasanna S
Nov 19 '18 at 12:50
edited Nov 19 '18 at 13:12
asked Nov 19 '18 at 12:21
Prasanna S
3 Answers
data: (you should provide ready to use data)
set.seed(1337)
df1 <- as.data.frame(matrix(runif(6*4), 6, 4))
code:
df1$PC_N <-
apply(df1[1:4], 1, function(x) {which(cumsum(x) >= .9)[1]})
result:
# V1 V2 V3 V4 PC_N
#1 0.8455612 0.5753591 0.04045594 0.1168015 2
#2 0.3623455 0.7868502 0.34512398 0.5304800 2
#3 0.9092146 0.5210399 0.48515698 0.2770135 1
#4 0.6730770 0.1798602 0.45335329 0.7649627 3
#5 0.3068619 0.3963743 0.98232933 0.9653852 3
#6 0.2104455 0.7860896 0.42140667 0.7954002 2
further detail:
apply(                         # use apply over rows (1)
  df1[1:4],                    # apply only on PC1 to PC4 (first to 4th col)
  1,                           # go row-wise
  function(x) {
    which(cumsum(x) >= .9)[1]  # get the first index at which the cumulative sum is at least 0.9
  })                           # the end
Make sure you read up on the functions used, e.g. ?which, ?apply, ...
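One edge case may be worth checking before relying on the one-liner: when no cumulative sum reaches the threshold, which() returns an empty vector and [1] turns that into NA. A minimal sketch, assuming a hypothetical helper name (first_reaching) and made-up values that are not from the question:

```r
# Hypothetical helper (not part of the answer): index of the first cumulative
# sum that reaches `threshold`, or NA if no prefix of x gets there.
first_reaching <- function(x, threshold = 0.9) {
  which(cumsum(x) >= threshold)[1]
}

reaches     <- c(0.60, 0.25, 0.10, 0.05)  # cumulative sums: 0.60 0.85 0.95 1.00
falls_short <- c(0.30, 0.20, 0.10, 0.05)  # cumulative sums top out at 0.65

first_reaching(reaches)      # 3
first_reaching(falls_short)  # NA
```

So rows that never reach 0.9 show up as NA in PC_N rather than raising an error.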
I like your solution better than mine. May I only suggest cumsum(sort(x, decreasing = T)) to generalise for cases where the elements are not sorted in decreasing value?
– Milan Valášek
Nov 19 '18 at 13:04
no, I leave it up to the end-user to input sorted data or not.
– Andre Elrico
Nov 19 '18 at 13:12
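To make the difference discussed in these comments concrete, here is a small sketch on a made-up row whose values are not in decreasing order. The vector and the variable names are assumptions for illustration only:

```r
# Made-up row whose values are NOT in decreasing order.
x <- c(0.05, 0.10, 0.25, 0.60)

# Counting columns left to right, exactly as they are stored:
in_order <- which(cumsum(x) >= 0.9)[1]

# Counting from the largest value down, as suggested in the comment:
largest_first <- which(cumsum(sort(x, decreasing = TRUE)) >= 0.9)[1]

in_order       # 4
largest_first  # 3
```

Which of the two is right depends on the question being asked: "how many leading columns" versus "how few columns, in any order".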
I'd write a function that returns the number of elements of a vector needed to add up to at least .9 (ignoring NA values) and then apply it row-wise to the appropriate columns of df:
get.length <- function(x) {
  x <- sort(x, decreasing = TRUE)  # largest first; sort() drops NAs by default
  n <- 0      # number of elements used so far
  total <- 0  # their running sum
  # keep adding the next largest element until .9 is reached or x runs out
  while (total < .9 && n < length(x)) {
    n <- n + 1
    total <- total + x[n]
  }
  if (total < .9) NA else n
}
The function starts with the maximum value of the vector and, if the running sum is still less than .9, adds the next largest and repeats. Once .9 is reached, it returns the number of elements needed to sum to at least .9. If the elements never reach .9, it returns NA.
Note. Even though your PCs will decrease in value, the function works even if the elements are not sorted in decreasing order.
You can apply the function to the chosen columns of your data frame df (col_indices holds their indices) like this:
apply(df[ , col_indices], 1, get.length)
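How col_indices gets built is left open above. One possible sketch, assuming the PC columns share a "PC" name prefix and the data frame also carries a non-numeric column. The data, the column names, and the stand-in function count_to_threshold are all hypothetical. Selecting only the numeric columns matters because apply() converts its input to a matrix, and a character column would turn every value into a string.

```r
# Illustrative data only; column names and values are assumptions.
df <- data.frame(
  week = c("W1", "W2", "W3"),
  PC1  = c(0.95, 0.50, 0.30),
  PC2  = c(0.03, 0.30, 0.30),
  PC3  = c(0.02, 0.20, 0.20)
)

# Pick the PC columns by name so the character column is left out.
col_indices <- grep("^PC", names(df))

# Stand-in counting function for this sketch: largest values first,
# NA when the threshold is never reached.
count_to_threshold <- function(x, threshold = 0.9) {
  which(cumsum(sort(x, decreasing = TRUE)) >= threshold)[1]
}

# Loop over the rows of the numeric matrix; vapply checks that each row
# yields exactly one number.
pcs <- as.matrix(df[, col_indices])
df$PC_N <- vapply(seq_len(nrow(pcs)),
                  function(i) as.numeric(count_to_threshold(pcs[i, ])),
                  numeric(1))

df$PC_N  # 1 3 NA
```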
I suspect you are likely to have a prcomp object rather than a dataframe, but no matter:
exampldf <- data.frame(PC1 = c(0.97, 0.40, 0.85, 0.75),
                       PC2 = c(0.01, 0.20, 0.10, 0.10),
                       PC3 = c(0.01, 0.20, 0.03, 0.10),
                       PC4 = c(0.01, 0.20, 0.02, 0.05))
rownames(exampldf) <- c("WEEK1", "WEEK2", "WEEK3", "WEEK4")
library(matrixStats)
exampldf$PC_N <- 1 + rowSums(rowCumsums(as.matrix(exampldf)) < 0.9)
produces
> exampldf
PC1 PC2 PC3 PC4 PC_N
WEEK1 0.97 0.01 0.01 0.01 1
WEEK2 0.40 0.20 0.20 0.20 4
WEEK3 0.85 0.10 0.03 0.02 2
WEEK4 0.75 0.10 0.10 0.05 3
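If adding matrixStats as a dependency is not an option, the same counting idea can be sketched in base R. This is an assumed equivalent, not something from the answer itself; it also shows one way to handle rows that never reach 0.9, which the "count and add one" trick would otherwise report as one more than the number of columns.

```r
exampldf <- data.frame(PC1 = c(0.97, 0.40, 0.85, 0.75),
                       PC2 = c(0.01, 0.20, 0.10, 0.10),
                       PC3 = c(0.01, 0.20, 0.03, 0.10),
                       PC4 = c(0.01, 0.20, 0.02, 0.05))

m <- as.matrix(exampldf)

# apply() over rows returns one COLUMN per input row, so transpose to get
# the row-wise cumulative sums back in the original orientation.
cums <- t(apply(m, 1, cumsum))

# Count the cumulative sums still below 0.9, then add one.
pc_n <- 1 + rowSums(cums < 0.9)
unname(pc_n)  # 1 4 2 3

# A row that never reaches 0.9 would get ncol(m) + 1; mark it NA instead.
pc_n[pc_n > ncol(m)] <- NA
```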
edited Nov 19 '18 at 13:03
answered Nov 19 '18 at 12:58
Andre Elrico
answered Nov 19 '18 at 12:57
Milan Valášek
answered Nov 19 '18 at 13:14
Henry