Hubs in high-dimensional point clouds (distribution of “being within nearest neighbours of how_many other...












2














The hubness problem is a phenomenon mentioned in this paper, for instance. It claims that for random data point clouds in high-dimensional spaces, there will be "many" hubs -- that is, points that are within k nearest neighbors of many other points.



We define, for each point $x$, the number $N_k(x)$ to be the number of other points $y$ such that $x$ belongs to $k$ nearest members of $y$. Radonovic claims that $N_k(x)$ has approximately normal distribution for random point clouds in low dimensional manifolds, but a highly skewed distribution with long right-tail in high dimensions. Radonovic illustrates this with uniformly distributed points in $[0,1]^N$.



I tried to reproduce some of that in Python and came to a surprising observation. While the $N_k(x)$ distribution is indeed skewed in high dimensions when generating uniform point cloud in the cube and using the Euclidean distance function, I didn't observe this when generating uniform point clouds in the sphere.



Experiment with a cube (uniformly generated 2000 points in $[0,1]^{500}$)
enter image description hereenter image description here



The same experiment with a sphere (uniformly generated 2000 points in $S^{500}$)
enter image description hereenter image description here



My intuition is that there should be no difference between hub-histograms in an $n$-cube or an $n$-sphere, because this hubeness seems to be a local phenomenon. Unless I have some stupid mistake in the code, my intuition seems to be wrong.



Generating more points, I think I could see some skewness in the sphere as well, but only very slightly, if at all.



So here are my questions:





  1. How is the cube different from a sphere? Could the difference be only explained by boundaries, corners or corners of corners?

  2. Any insight into this hubeness-phenomena?










share|cite|improve this question





























    2














    The hubness problem is a phenomenon mentioned in this paper, for instance. It claims that for random data point clouds in high-dimensional spaces, there will be "many" hubs -- that is, points that are within k nearest neighbors of many other points.



    We define, for each point $x$, the number $N_k(x)$ to be the number of other points $y$ such that $x$ belongs to $k$ nearest members of $y$. Radonovic claims that $N_k(x)$ has approximately normal distribution for random point clouds in low dimensional manifolds, but a highly skewed distribution with long right-tail in high dimensions. Radonovic illustrates this with uniformly distributed points in $[0,1]^N$.



    I tried to reproduce some of that in Python and came to a surprising observation. While the $N_k(x)$ distribution is indeed skewed in high dimensions when generating uniform point cloud in the cube and using the Euclidean distance function, I didn't observe this when generating uniform point clouds in the sphere.



    Experiment with a cube (uniformly generated 2000 points in $[0,1]^{500}$)
    enter image description hereenter image description here



    The same experiment with a sphere (uniformly generated 2000 points in $S^{500}$)
    enter image description hereenter image description here



    My intuition is that there should be no difference between hub-histograms in an $n$-cube or an $n$-sphere, because this hubeness seems to be a local phenomenon. Unless I have some stupid mistake in the code, my intuition seems to be wrong.



    Generating more points, I think I could see some skewness in the sphere as well, but only very slightly, if at all.



    So here are my questions:





    1. How is the cube different from a sphere? Could the difference be only explained by boundaries, corners or corners of corners?

    2. Any insight into this hubeness-phenomena?










    share|cite|improve this question



























      2












      2








      2


      1





      The hubness problem is a phenomenon mentioned in this paper, for instance. It claims that for random data point clouds in high-dimensional spaces, there will be "many" hubs -- that is, points that are within k nearest neighbors of many other points.



      We define, for each point $x$, the number $N_k(x)$ to be the number of other points $y$ such that $x$ belongs to $k$ nearest members of $y$. Radonovic claims that $N_k(x)$ has approximately normal distribution for random point clouds in low dimensional manifolds, but a highly skewed distribution with long right-tail in high dimensions. Radonovic illustrates this with uniformly distributed points in $[0,1]^N$.



      I tried to reproduce some of that in Python and came to a surprising observation. While the $N_k(x)$ distribution is indeed skewed in high dimensions when generating uniform point cloud in the cube and using the Euclidean distance function, I didn't observe this when generating uniform point clouds in the sphere.



      Experiment with a cube (uniformly generated 2000 points in $[0,1]^{500}$)
      enter image description hereenter image description here



      The same experiment with a sphere (uniformly generated 2000 points in $S^{500}$)
      enter image description hereenter image description here



      My intuition is that there should be no difference between hub-histograms in an $n$-cube or an $n$-sphere, because this hubeness seems to be a local phenomenon. Unless I have some stupid mistake in the code, my intuition seems to be wrong.



      Generating more points, I think I could see some skewness in the sphere as well, but only very slightly, if at all.



      So here are my questions:





      1. How is the cube different from a sphere? Could the difference be only explained by boundaries, corners or corners of corners?

      2. Any insight into this hubeness-phenomena?










      share|cite|improve this question















      The hubness problem is a phenomenon mentioned in this paper, for instance. It claims that for random data point clouds in high-dimensional spaces, there will be "many" hubs -- that is, points that are within k nearest neighbors of many other points.



      We define, for each point $x$, the number $N_k(x)$ to be the number of other points $y$ such that $x$ belongs to $k$ nearest members of $y$. Radonovic claims that $N_k(x)$ has approximately normal distribution for random point clouds in low dimensional manifolds, but a highly skewed distribution with long right-tail in high dimensions. Radonovic illustrates this with uniformly distributed points in $[0,1]^N$.



      I tried to reproduce some of that in Python and came to a surprising observation. While the $N_k(x)$ distribution is indeed skewed in high dimensions when generating uniform point cloud in the cube and using the Euclidean distance function, I didn't observe this when generating uniform point clouds in the sphere.



      Experiment with a cube (uniformly generated 2000 points in $[0,1]^{500}$)
      enter image description hereenter image description here



      The same experiment with a sphere (uniformly generated 2000 points in $S^{500}$)
      enter image description hereenter image description here



      My intuition is that there should be no difference between hub-histograms in an $n$-cube or an $n$-sphere, because this hubeness seems to be a local phenomenon. Unless I have some stupid mistake in the code, my intuition seems to be wrong.



      Generating more points, I think I could see some skewness in the sphere as well, but only very slightly, if at all.



      So here are my questions:





      1. How is the cube different from a sphere? Could the difference be only explained by boundaries, corners or corners of corners?

      2. Any insight into this hubeness-phenomena?







      statistics euclidean-geometry experimental-mathematics






      share|cite|improve this question















      share|cite|improve this question













      share|cite|improve this question




      share|cite|improve this question








      edited Nov 21 '18 at 12:43

























      asked Nov 21 '18 at 10:21









      Peter Franek

      7,21631935




      7,21631935






















          0






          active

          oldest

          votes











          Your Answer





          StackExchange.ifUsing("editor", function () {
          return StackExchange.using("mathjaxEditing", function () {
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
          });
          });
          }, "mathjax-editing");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "69"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          noCode: true, onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3007531%2fhubs-in-high-dimensional-point-clouds-distribution-of-being-within-nearest-nei%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          0






          active

          oldest

          votes








          0






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes
















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Mathematics Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.





          Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


          Please pay close attention to the following guidance:


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3007531%2fhubs-in-high-dimensional-point-clouds-distribution-of-being-within-nearest-nei%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Can a sorcerer learn a 5th-level spell early by creating spell slots using the Font of Magic feature?

          Does disintegrating a polymorphed enemy still kill it after the 2018 errata?

          A Topological Invariant for $pi_3(U(n))$