Matlab: Euclidean norm (or difference) between two vectors











I'd like to calculate the Euclidean distance between a vector G and each row of an array C, while dividing each element of the difference by the corresponding value in a vector GSD. What I've done seems very inefficient. What's my biggest overhead, and could I speed it up?



m=1E7;
G=1E5*rand(1,8);
C=1E5*[zeros(m,1),rand(m,8)];
GSD=10*rand(1,8);

%I've taken the log10 of the values because G and C are very large in magnitude.
%Don't know if it's worth it.

for i=1:m
dG(i,1)=norm((log10(G)-log10(C(i,2:end)))/log10(GSD));
end









Tags: matlab

edited yesterday
asked yesterday by HCAI

          2 Answers
          accepted






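          You can use pdist2(x,y) to calculate the pairwise distances between all rows of x and all rows of y, so your example would be something like

          dG = pdist2(log10(G),log10(C(:,2:end)),'mahalanobis',diag(log10(GSD).^2));

          where the argument pair 'mahalanobis',diag(log10(GSD).^2) supplies a diagonal covariance matrix, so that each coordinate difference is divided by the corresponding element of log10(GSD). This weighted Euclidean distance is a special case of the Mahalanobis distance. Note that pdist2 requires the Statistics and Machine Learning Toolbox and here returns a 1-by-m row rather than an m-by-1 column.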
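The relation between a diagonal covariance matrix and per-coordinate weights can be checked by hand without pdist2. The following is a minimal sketch using base MATLAB only; the vectors x, y and w are hypothetical values chosen for illustration, with w playing the role of log10(GSD).

```matlab
% Hypothetical small inputs, chosen only for illustration.
x = [1 2 3];        % reference row vector
y = [2 4 7];        % one row to compare against
w = [0.5 2 4];      % per-coordinate weights (role of log10(GSD))

d = x - y;          % coordinate differences

% Weighted Euclidean distance: divide each difference by its weight.
dWeighted = sqrt(sum((d ./ w).^2));

% Mahalanobis distance with covariance S is sqrt(d * inv(S) * d').
S = diag(w.^2);
dMahal = sqrt((d / S) * d');

% With S = diag(w) instead, each difference is scaled by sqrt(w), not w.
dMahalUnsquared = sqrt((d / diag(w)) * d');
```

Under these assumptions the first two values agree, while the third differs, which suggests the covariance matrix has to hold the squared weights to reproduce a plain division by w.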



          Implicit expansion



          In newer MATLAB releases (R2016b and later), you can also just use implicit expansion, since the first operand is a single row vector.



          dG = sqrt(sum(((log10(G)-log10(C(:,2:9)))./log10(GSD)).^2,2));


          This is probably a tad faster. I do, however, prefer the pdist2 solution, as I find it clearer.
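To see that the one-line form computes the same thing as a row-by-row loop, here is a minimal sketch. It assumes a much smaller m than in the question so that it runs quickly, and a GSD range of 2 to 10 (an assumption made here so that log10(GSD) stays away from zero); the loop uses element-wise division.

```matlab
% Small hypothetical problem size so the check runs quickly.
m   = 1000;
G   = 1e5 * rand(1, 8);
C   = 1e5 * [zeros(m, 1), rand(m, 8)];
GSD = 2 + 8 * rand(1, 8);   % assumption: values in (2, 10), so log10(GSD) > 0.3

lG = log10(G);
lS = log10(GSD);

% Reference: explicit loop with element-wise division and preallocated output.
dLoop = zeros(m, 1);
for i = 1:m
    dLoop(i) = norm((lG - log10(C(i, 2:end))) ./ lS);
end

% Vectorized: implicit expansion (R2016b and later) broadcasts the 1-by-8 rows.
dVec = sqrt(sum(((lG - log10(C(:, 2:end))) ./ lS).^2, 2));

% Largest relative disagreement between the two results.
maxRelDiff = max(abs(dLoop - dVec) ./ max(1, abs(dLoop)));
```

Wrapping each variant in tic/toc or timeit would give a rough idea of the speed difference on a particular machine.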






          edited yesterday
          answered yesterday by Nicky Mattsson

          • Thanks Nicky. Here my vector has one element fewer than the rows of my array. Would it be faster to create a new dummy variable Cdum=C(:,2:9);?
            – HCAI
            yesterday










          • No, that does not matter. You will have to index into it anyway.
            – Nicky Mattsson
            yesterday










          • Sorry, I don't understand. What do you mean by 'you will have to index into it anyway'?
            – HCAI
            yesterday












          • That is not the problem; C(:,2:9) should work. The variable C is 10 columns wide in your example, so C(:,2:9)~=C(:,2:end). The problem is that I misinterpreted the use of GSD. Give me a second to fix it.
            – Nicky Mattsson
            yesterday










          • I made a mistake, size(C)=[m,9].
            – HCAI
            yesterday


















          Floating point should handle the large magnitude of the input data: up to a certain point with single-precision data, and for any reasonable value with double-precision data.



          realmax('single')
          ans =
          3.4028e+38

          realmax('double')
          ans =
          1.7977e+308


          With 1e7 values in the 1e5 range, even a sum of squares over every value would be on the order of 1e17 (5+5+7). Each row here has only 8 entries, so a single squared Euclidean distance is at most about 8e10. Both are far below realmax for either format.
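As a rough check of the magnitudes involved, the sketch below assumes the worst case for one row in this setup, namely 8 coordinate differences of 1e5 each, and compares the squared distance against realmax in both precisions. It only addresses overflow, not rounding accuracy.

```matlab
% Worst case for one row: 8 coordinates, each difference as large as 1e5.
worstDiff = 1e5 * ones(1, 8);

sqDouble = sum(worstDiff.^2);            % squared distance in double precision
sqSingle = sum(single(worstDiff).^2);    % same quantity in single precision

% Both comparisons should come out true: no overflow in either format.
fitsDouble = sqDouble < realmax('double');
fitsSingle = sqSingle < realmax('single');
```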



          In any case, you should vectorize the code to remove the loop (which MATLAB has a history of handling very inefficiently, especially in older versions).



          With newer releases (R2016b and later), simply use:



          tmp=(log10(G)-log10(C(:,2:end)))./log10(GSD);
          dG = sqrt(sum(tmp.^2,2)); %row-by-row norm


          Note that you have to use ./, which is element-wise division, not /, which is matrix right division.
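The difference between the two operators is easy to see on a tiny example. This sketch uses hypothetical vectors a and b; for two row vectors of equal length, a / b returns a single number (the least-squares solution x of x*b = a), whereas a ./ b keeps one result per element.

```matlab
a = [1 2 3];
b = [1 1 1];

elementWise = a ./ b;   % 1-by-3: each element of a divided by the matching element of b
matrixRight = a / b;    % 1-by-1: least-squares solution x of x*b = a

normElementWise = norm(elementWise);   % norm of the three scaled differences
normMatrixRight = norm(matrixRight);   % just the absolute value of one scalar
```

With / inside the loop, norm is therefore applied to a scalar, which is not the weighted distance that was intended.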



          The following code works in older releases as well:



          tmp=bsxfun(@rdivide,bsxfun(@minus,log10(G),log10(C(:,2:end))),log10(GSD));
          dG = sqrt(sum(tmp.^2,2)); %row-by-row norm
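The two forms should give the same numbers. Here is a minimal sketch on a tiny hypothetical data set (the values of G, C and GSD are made up, and C has no leading column to skip), which runs both variants side by side; the implicit-expansion line needs R2016b or later.

```matlab
G   = [10 100 1000];              % hypothetical 1-by-3 reference
C   = [20 200 2000; 5 50 500];    % two hypothetical rows to compare
GSD = [2 4 8];                    % hypothetical weights

% bsxfun form: expands the 1-by-3 rows against the 2-by-3 matrix explicitly.
tmpBsx = bsxfun(@rdivide, bsxfun(@minus, log10(G), log10(C)), log10(GSD));

% Implicit-expansion form (R2016b and later).
tmpImp = (log10(G) - log10(C)) ./ log10(GSD);

dBsx = sqrt(sum(tmpBsx.^2, 2));   % row-by-row norm, 2-by-1
dImp = sqrt(sum(tmpImp.^2, 2));   % row-by-row norm, 2-by-1
```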


          I believe, however, that the use of log10 is a mathematical error: the result dG will not be the Euclidean norm of the weighted difference. You should stick with the square root of the sum of the squared weighted differences:



          dG = sqrt(sum(bsxfun(@rdivide,bsxfun(@minus,G,C(:,2:end)),GSD).^2,2)); % all versions
          dG = sqrt(sum(((G-C(:,2:end))./GSD).^2,2)); %R2016b and later
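To illustrate why the log-based quantity is a different measure, consider a hypothetical case in which every prediction is exactly twice its target. The sketch below uses made-up values and unit weights (an assumption, to isolate the effect of the logarithm): the raw distance is dominated by the largest coordinate, while the log distance only sees the ratio.

```matlab
G   = [10 1000];     % hypothetical targets
C   = [20 2000];     % hypothetical predictions, each twice the target
GSD = [1 1];         % unit weights, to isolate the effect of the log

% Raw weighted differences: dominated by the large second coordinate.
rawDiff = (G - C) ./ GSD;
dRaw    = sqrt(sum(rawDiff.^2));

% Log differences: both coordinates contribute log10(1/2), i.e. the ratio.
logDiff = log10(G) - log10(C);
dLog    = sqrt(sum(logDiff.^2));
```

Which of the two is appropriate depends on whether absolute or relative deviations matter for the application.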





          edited yesterday
          answered yesterday by Brice

          • Thank you very much, Brice, for these answers. Why do you say that the log10 is a mathematical error? I don't absolutely have to use the Euclidean distance, but it does need to be something that takes the values of GSD into consideration. GSD actually holds the standard deviations of each point in G. Each row of C is a prediction of G.
            – HCAI
            yesterday

















