Matlab: Euclidean norm (or difference) between two vectors


I'd like to calculate the Euclidean distance between a vector G and each row of an array C, while dividing each component of the difference by the corresponding value in a vector GSD. What I've done seems very inefficient. What's my biggest overhead, and could I speed it up?

m=1E7;
G=1E5*rand(1,8);
C=1E5*[zeros(m,1),rand(m,8)];
GSD=10*rand(1,8);

%I've taken the log10 of the values because G and C are very large in magnitude.
%Don't know if it's worth it.

for i=1:m
    dG(i,1)=norm((log10(G)-log10(C(i,2:end)))/log10(GSD));
end

edited yesterday

asked yesterday

HCAI

54841337


2 Answers

Accepted answer

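You can use pdist2(X,Y) to calculate the pairwise distances between the rows of X and the rows of Y, so your example would be something like

dG = pdist2(log10(C(:,2:end)),log10(G),'mahalanobis',diag(log10(GSD).^2));

where the arguments 'mahalanobis',diag(log10(GSD).^2) supply a diagonal covariance matrix, so each squared difference is divided by the corresponding entry of log10(GSD).^2. A Mahalanobis distance with a diagonal covariance is the same as the standardized Euclidean distance, which pdist2 also offers as 'seuclidean' with log10(GSD) as the scale vector. Passing the rows of C as the first argument returns dG as an m-by-1 column.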
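As a sanity check on that equivalence, here is a small sketch that avoids pdist2 (so it needs no toolbox) and computes both forms directly. The test data is hypothetical: it keeps the question's shapes but uses only 5 rows, and draws GSD from 1 + 9*rand(1,8) so that log10(GSD) is never zero (an assumption, not the question's setup).

```matlab
% Hypothetical test data with the question's shapes but only m = 5 rows
m   = 5;
G   = 1e5*rand(1,8);
C   = 1e5*[zeros(m,1), rand(m,8)];
GSD = 1 + 9*rand(1,8);          % assumption: keeps log10(GSD) away from 0

s = log10(GSD);                                   % 1-by-8 scale vector
D = bsxfun(@minus, log10(C(:,2:end)), log10(G));  % m-by-8 differences

% Mahalanobis form: sqrt(d * inv(S) * d') for each row d, with S = diag(s.^2)
S = diag(s.^2);
dMahal = sqrt(sum((D / S) .* D, 2));

% Standardized Euclidean form: divide each difference by its scale first
dScaled = sqrt(sum(bsxfun(@rdivide, D, s).^2, 2));

% The two agree up to rounding error
disp(max(abs(dMahal - dScaled)))
```

With the Statistics and Machine Learning Toolbox installed, pdist2(log10(C(:,2:end)), log10(G), 'mahalanobis', S) should return the same m-by-1 vector.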

Implicit expansion

In newer MATLAB releases (R2016b and later), you can also rely on implicit expansion, since log10(G) is a single row vector that is expanded against every row of C:

dG = sqrt(sum(((log10(G)-log10(C(:,2:9)))./log10(GSD)).^2,2));

This is probably a little faster; I do, however, prefer the pdist2 solution because I find it clearer.
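To see what implicit expansion does in that one-liner, a minimal sketch with hypothetical small data (R2016b or later): the 1-by-8 row log10(G) is subtracted from every row of the m-by-8 block, exactly as if it had first been replicated with repmat.

```matlab
m = 4;                                  % small hypothetical size
G = 1e5*rand(1,8);
C = 1e5*[zeros(m,1), rand(m,8)];

% Implicit expansion: the 1-by-8 row is paired with each of the m rows
D1 = log10(G) - log10(C(:,2:9));

% Explicit equivalent: replicate the row m times, then subtract
D2 = repmat(log10(G), m, 1) - log10(C(:,2:9));

disp(size(D1))        % 4 8
disp(isequal(D1, D2)) % 1 (true)
```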

edited yesterday

answered yesterday

Nicky Mattsson

2,247625

Thanks Nicky. Here my vector has one element fewer than each row of my array. Would it be faster to create a new dummy variable Cdum=C(:,2:9);?
– HCAI
yesterday

No, that does not matter; you will have to index into it anyway.
– Nicky Mattsson
yesterday

Sorry, I don't understand. What do you mean by 'you will have to index into it anyway'?
– HCAI
yesterday

That is not the problem; C(:,2:9) should work. The variable C is 10 long in your example, so C(:,2:9)~=C(:,2:end). The problem is that I misinterpreted the use of GSD. Give me a second to fix it.
– Nicky Mattsson
yesterday

I made a mistake, size(C)=[m,9].
– HCAI
yesterday


Second answer

Floating-point arithmetic should handle the large magnitude of the input data: up to a point with single precision, and for any reasonable value with double precision.

realmax('single')
ans =
  3.4028e+38

realmax('double')
ans =
  1.7977e+308

With values of up to 1e5, each squared difference is at most about 1e10 (5+5 orders of magnitude), so the squared distance for one 8-element row stays below 1e11, and even summed over all 1e7 rows it would only reach about 1e18. Both formats handle that with ease.
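The bound is simple arithmetic; a sketch using the question's sizes (1e7 rows, 8 compared columns from C(:,2:end), values up to 1e5):

```matlab
m      = 1e7;     % rows in C (from the question)
n      = 8;       % columns compared per row, C(:,2:end)
maxVal = 1e5;     % largest value in G and C

perRow  = n * maxVal^2;    % worst-case squared distance of one row: 8e10
allRows = m * perRow;      % worst case summed over every row: 8e17

fprintf('per row: %g, all rows: %g\n', perRow, allRows);
fprintf('realmax single: %g, double: %g\n', realmax('single'), realmax('double'));
```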

In any case, you should vectorize the code to remove the loop, since MATLAB has a history of handling loops inefficiently, especially in older versions.

With newer versions (R2016b and later), simply use:

tmp=(log10(G)-log10(C(:,2:end)))./log10(GSD);

dG = sqrt(sum(tmp.^2,2)); %row-by-row norm

Note that you have to use ./, which is element-wise division, not /, which is matrix right division.
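This is also why the question's loop does not compute what it intends: log10(G)-log10(C(i,2:end)) and log10(GSD) are both 1-by-8 rows, so / solves a least-squares problem and returns a single scalar, and norm then just returns its absolute value. A small illustration with hypothetical values:

```matlab
a = [1 2 3];
b = [2 4 8];

q = a / b;    % matrix right division: least-squares scalar x with x*b close to a
r = a ./ b;   % element-wise division

disp(size(q))   % 1 1  (a scalar, here 34/84)
disp(r)         % 0.5000 0.5000 0.3750
```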

The following code works in every version, since bsxfun predates implicit expansion:

tmp=bsxfun(@rdivide,bsxfun(@minus,log10(G),log10(C(:,2:end))),log10(GSD));

dG = sqrt(sum(tmp.^2,2)); %row-by-row norm
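A rough way to measure the gain on your own machine, as a sketch: m is reduced to a hypothetical 2e4 so the loop finishes quickly, GSD is drawn from 1 + 9*rand(1,8) to keep log10(GSD) nonzero, and, unlike the question's loop, dG is preallocated and divided element-wise. No particular timings are claimed; they depend on the MATLAB version and hardware.

```matlab
m   = 2e4;                               % hypothetical, much smaller than 1e7
G   = 1e5*rand(1,8);
C   = 1e5*[zeros(m,1), rand(m,8)];
GSD = 1 + 9*rand(1,8);                   % assumption: avoids log10(GSD) == 0

% Loop version: preallocated output, element-wise ./
tic
dLoop = zeros(m,1);
for i = 1:m
    dLoop(i) = norm((log10(G) - log10(C(i,2:end))) ./ log10(GSD));
end
tLoop = toc;

% Vectorized version that works in all releases (bsxfun)
tic
tmp  = bsxfun(@rdivide, bsxfun(@minus, log10(G), log10(C(:,2:end))), log10(GSD));
dVec = sqrt(sum(tmp.^2, 2));
tVec = toc;

fprintf('loop: %.4f s, vectorized: %.4f s\n', tLoop, tVec);
fprintf('largest difference between the two: %g\n', max(abs(dLoop - dVec)));
```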

I believe, however, that the use of log10 is a mathematical error: the result dG will not be the Euclidean norm of the weighted difference. You should stick with the square root of the sum of the squared weighted differences:

dG = sqrt(sum(bsxfun(@rdivide,bsxfun(@minus,G,C(:,2:end)),GSD).^2,2)); % all versions

dG = sqrt(sum(((G-C(:,2:end))./GSD).^2,2)); % R2016b and later
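On why the log10 version measures something different: log10(G) - log10(C) equals log10(G./C), so it depends only on the ratio between observation and prediction, not on the absolute error that standard deviations in GSD would describe (the asker notes in a comment that GSD holds standard deviations). A small illustration with hypothetical numbers, where both predictions are 10 percent too high:

```matlab
G = [100 1000];       % hypothetical observations
P = [110 1100];       % hypothetical predictions, each 10 percent too high

absDiff = G - P                          % [-10 -100]: very different errors
logDiff = log10(G) - log10(P)            % both about -0.0414: identical

% The log difference is just the log of the ratio
max(abs(logDiff - log10(G ./ P)))        % rounding error only
```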

edited yesterday

answered yesterday

Brice

7366

Thank you very much Brice for these answers. Why do you say that the log10 is a mathematical error? I don't absolutely have to use the Euclidean distance, but it does need to be something that takes the values of GSD into account. GSD actually holds the standard deviations of each point in G, and each row of C is a prediction of G.
– HCAI
yesterday
