Matlab: Euclidean norm (or difference) between two vectors
I'd like to calculate the Euclidean distance between a vector G and each row of an array C, while dividing each row by a value in a vector GSD. What I've done seems very inefficient. What's my biggest overhead?
Could I speed it up?
m=1E7;
G=1E5*rand(1,8);
C=1E5*[zeros(m,1),rand(m,8)];
GSD=10*rand(1,8);
%I've taken the log10 of the values because G and C are very large in magnitude.
%Don't know if it's worth it.
for i=1:m
dG(i,1)=norm((log10(G)-log10(C(i,2:end)))/log10(GSD));
end
edited yesterday
asked yesterday
HCAI
2 Answers
accepted
You can use pdist2(x,y) to calculate the pairwise distances between all rows of x and y, so your example would be something like
dG = pdist2(log10(G),log10(C(:,2:end)),'mahalanobis',diag(log10(GSD).^2));
where the argument pair 'mahalanobis',diag(log10(GSD).^2) puts log10(GSD) as weights on the Euclidean distance, which is known as the Mahalanobis distance. The weights are squared because pdist2 expects a covariance matrix there.
Implicit expansion
In newer MATLAB releases, you can also just use implicit expansion, since the first input is only a single row vector:
dG = sqrt(sum(((log10(G)-log10(C(:,2:9)))./log10(GSD)).^2,2));
This is probably a tad faster. I do, however, prefer the pdist2 solution, as I find it clearer.
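As a small sanity check of how a diagonal covariance matrix acts as per-element weights, the sketch below evaluates the Mahalanobis formula by hand and compares it with a weighted Euclidean distance. The vectors x, y and s are made-up illustration values, not data from the question, and the check assumes the covariance matrix holds the squared weights on its diagonal. It does not call pdist2, so it needs no toolbox.

```matlab
% Hypothetical small inputs, chosen only for illustration.
x = [1 2 3];          % one observation (row vector)
y = [2 0 7];          % another observation
s = [0.5 2 4];        % per-element scale factors (weights)

% Mahalanobis distance with covariance matrix S = diag(s.^2):
% d^2 = (x - y) * inv(S) * (x - y)'
S = diag(s.^2);
d_mahal = sqrt((x - y) / S * (x - y)');

% Weighted Euclidean distance: divide each difference by its scale factor
d_weighted = sqrt(sum(((x - y) ./ s).^2));

% The two values should agree up to rounding error
fprintf('%g  %g\n', d_mahal, d_weighted);
```

If the diagonal held s instead of s.^2, the first value would correspond to dividing by sqrt(s) rather than by s.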
edited yesterday
answered yesterday
Nicky Mattsson
Thanks Nicky. Here my vector has one element fewer than each row of my array. Would it be faster to create a new dummy variable Cdum=C(:,2:9);?
– HCAI
yesterday
No, that does not matter. You will have to index into it anyway.
– Nicky Mattsson
yesterday
Sorry, I don't understand. What do you mean by 'you will have to index into it anyway'?
– HCAI
yesterday
That is not the problem. C(:,2:9) should work; the variable C is 10 long in your example, so C(:,2:9)~=C(:,2:end). The problem is that I misinterpreted the use of GSD. Give me a second to fix it.
– Nicky Mattsson
yesterday
I made a mistake, size(C)=[m,9].
– HCAI
yesterday
Floating point should handle the large magnitude of the input data: up to a certain point with single data, and for any reasonable value with double data.
realmax('single')
ans =
3.4028e+38
realmax('double')
ans =
1.7977e+308
With values in the ±1e5 range, the squared Euclidean distance stays far below these limits: even a sum over all 1e7 values would only reach the order of 1e17 (5+5+7), and each row here sums just 8 of them. Both formats will handle that with ease.
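To put a number on this, here is a minimal sketch of the worst case for a single row in single precision. It assumes 8 differences per row, each at most 1e5 in magnitude, as in the question's example. The variable names are made up for illustration.

```matlab
% Worst case for one row: 8 differences, each at most 1e5 in magnitude.
n = 8;
worst = single(1e5);
sq = n * worst^2;      % squared distance, about 8e10, still of class single

fprintf('squared distance: %g\n', sq);
fprintf('fraction of realmax(''single''): %g\n', sq / realmax('single'));
disp(isfinite(sq));    % displays 1 (true) when no overflow occurred
```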
In any case, you should vectorize the code to remove the loop (which MATLAB has a history of handling very inefficiently, especially in older versions).
With newer versions (R2016b and later), simply use:
tmp=(log10(G)-log10(C(:,2:end)))./log10(GSD);
dG = sqrt(sum(tmp.^2,2)); %row-by-row norm
Note that you have to use ./, which is element-wise division, not /, which is matrix right division.
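The difference between the two operators is easy to see on two short row vectors. The values below are arbitrary and chosen only for illustration.

```matlab
a = [2 4 6];
b = [1 2 3];

elementwise = a ./ b;   % 1-by-3 result: each element divided separately
matrixdiv   = a / b;    % 1-by-1 result: least-squares solution x of x*b = a

disp(size(elementwise));
disp(size(matrixdiv));

% The norm of a scalar is just its absolute value, so norm(a / b) is not
% a distance over the individual elements.
disp(norm(matrixdiv));
```

With / inside the loop from the question, each dG(i,1) would therefore be the absolute value of a single fitted scalar rather than a norm over the 8 weighted differences.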
The following code will also work in older versions:
tmp=bsxfun(@rdivide,bsxfun(@minus,log10(G),log10(C(:,2:end))),log10(GSD));
dG = sqrt(sum(tmp.^2,2)); %row-by-row norm
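One way to gain confidence in the vectorized forms is to compare them against a loop on a small problem. The sketch below does that with m reduced to 1000. The loop is a corrected variant of the one in the question (preallocated, using ./), which is my assumption about the intended computation. The helper relerr is a made-up name.

```matlab
m = 1000;                          % small m so the loop finishes quickly
G = 1e5 * rand(1, 8);
C = 1e5 * [zeros(m, 1), rand(m, 8)];
GSD = 10 * rand(1, 8);

% Reference: explicit loop with preallocation and element-wise division
dG_loop = zeros(m, 1);
for i = 1:m
    dG_loop(i) = norm((log10(G) - log10(C(i, 2:end))) ./ log10(GSD));
end

% bsxfun version
tmp = bsxfun(@rdivide, bsxfun(@minus, log10(G), log10(C(:, 2:end))), log10(GSD));
dG_bsx = sqrt(sum(tmp.^2, 2));

% Implicit expansion (R2016b and later)
dG_imp = sqrt(sum(((log10(G) - log10(C(:, 2:end))) ./ log10(GSD)).^2, 2));

% Largest relative deviation from the loop result
relerr = @(d) max(abs(d - dG_loop) ./ dG_loop);
fprintf('bsxfun: %g, implicit: %g\n', relerr(dG_bsx), relerr(dG_imp));
```

Wrapping each variant in tic/toc at a larger m gives a rough timing comparison on your own machine.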
I believe, however, that the use of log10 is a mathematical error: the result dG will not be the Euclidean norm. You should stick with the root of the sum of squares of the weighted difference:
dG = sqrt(sum(bsxfun(@rdivide,bsxfun(@minus,G,C(:,2:end)),GSD).^2,2)); % all versions
dG = sqrt(sum(((G-C(:,2:end))./GSD).^2,2)); %R2016b and later
edited yesterday
answered yesterday
Brice
Thank you very much, Brice, for these answers. Why do you say that the log10 is a mathematical error? I don't absolutely have to use the Euclidean distance, but it does need to be something that takes the values of GSD into consideration. GSD actually holds the standard deviations of each point in G. Each row of C is a prediction of G.
– HCAI
yesterday