Hubs in high-dimensional point clouds (distribution of “being within nearest neighbours of how_many other...
The hubness problem is a phenomenon mentioned in this paper, for instance. It claims that for random data point clouds in high-dimensional spaces, there will be "many" hubs -- that is, points that are within k nearest neighbors of many other points.
We define, for each point $x$, the number $N_k(x)$ to be the number of other points $y$ such that $x$ belongs to $k$ nearest members of $y$. Radonovic claims that $N_k(x)$ has approximately normal distribution for random point clouds in low dimensional manifolds, but a highly skewed distribution with long right-tail in high dimensions. Radonovic illustrates this with uniformly distributed points in $[0,1]^N$.
I tried to reproduce some of that in Python and came to a surprising observation. While the $N_k(x)$ distribution is indeed skewed in high dimensions when generating uniform point cloud in the cube and using the Euclidean distance function, I didn't observe this when generating uniform point clouds in the sphere.
Experiment with a cube (uniformly generated 2000 points in $[0,1]^{500}$)
The same experiment with a sphere (uniformly generated 2000 points in $S^{500}$)
My intuition is that there should be no difference between hub-histograms in an $n$-cube or an $n$-sphere, because this hubeness seems to be a local phenomenon. Unless I have some stupid mistake in the code, my intuition seems to be wrong.
Generating more points, I think I could see some skewness in the sphere as well, but only very slightly, if at all.
So here are my questions:
How is the cube different from a sphere? Could the difference be only explained by boundaries, corners or corners of corners?- Any insight into this hubeness-phenomena?
statistics euclidean-geometry experimental-mathematics
add a comment |
The hubness problem is a phenomenon mentioned in this paper, for instance. It claims that for random data point clouds in high-dimensional spaces, there will be "many" hubs -- that is, points that are within k nearest neighbors of many other points.
We define, for each point $x$, the number $N_k(x)$ to be the number of other points $y$ such that $x$ belongs to $k$ nearest members of $y$. Radonovic claims that $N_k(x)$ has approximately normal distribution for random point clouds in low dimensional manifolds, but a highly skewed distribution with long right-tail in high dimensions. Radonovic illustrates this with uniformly distributed points in $[0,1]^N$.
I tried to reproduce some of that in Python and came to a surprising observation. While the $N_k(x)$ distribution is indeed skewed in high dimensions when generating uniform point cloud in the cube and using the Euclidean distance function, I didn't observe this when generating uniform point clouds in the sphere.
Experiment with a cube (uniformly generated 2000 points in $[0,1]^{500}$)
The same experiment with a sphere (uniformly generated 2000 points in $S^{500}$)
My intuition is that there should be no difference between hub-histograms in an $n$-cube or an $n$-sphere, because this hubeness seems to be a local phenomenon. Unless I have some stupid mistake in the code, my intuition seems to be wrong.
Generating more points, I think I could see some skewness in the sphere as well, but only very slightly, if at all.
So here are my questions:
How is the cube different from a sphere? Could the difference be only explained by boundaries, corners or corners of corners?- Any insight into this hubeness-phenomena?
statistics euclidean-geometry experimental-mathematics
add a comment |
The hubness problem is a phenomenon mentioned in this paper, for instance. It claims that for random data point clouds in high-dimensional spaces, there will be "many" hubs -- that is, points that are within k nearest neighbors of many other points.
We define, for each point $x$, the number $N_k(x)$ to be the number of other points $y$ such that $x$ belongs to $k$ nearest members of $y$. Radonovic claims that $N_k(x)$ has approximately normal distribution for random point clouds in low dimensional manifolds, but a highly skewed distribution with long right-tail in high dimensions. Radonovic illustrates this with uniformly distributed points in $[0,1]^N$.
I tried to reproduce some of that in Python and came to a surprising observation. While the $N_k(x)$ distribution is indeed skewed in high dimensions when generating uniform point cloud in the cube and using the Euclidean distance function, I didn't observe this when generating uniform point clouds in the sphere.
Experiment with a cube (uniformly generated 2000 points in $[0,1]^{500}$)
The same experiment with a sphere (uniformly generated 2000 points in $S^{500}$)
My intuition is that there should be no difference between hub-histograms in an $n$-cube or an $n$-sphere, because this hubeness seems to be a local phenomenon. Unless I have some stupid mistake in the code, my intuition seems to be wrong.
Generating more points, I think I could see some skewness in the sphere as well, but only very slightly, if at all.
So here are my questions:
How is the cube different from a sphere? Could the difference be only explained by boundaries, corners or corners of corners?- Any insight into this hubeness-phenomena?
statistics euclidean-geometry experimental-mathematics
The hubness problem is a phenomenon mentioned in this paper, for instance. It claims that for random data point clouds in high-dimensional spaces, there will be "many" hubs -- that is, points that are within k nearest neighbors of many other points.
We define, for each point $x$, the number $N_k(x)$ to be the number of other points $y$ such that $x$ belongs to $k$ nearest members of $y$. Radonovic claims that $N_k(x)$ has approximately normal distribution for random point clouds in low dimensional manifolds, but a highly skewed distribution with long right-tail in high dimensions. Radonovic illustrates this with uniformly distributed points in $[0,1]^N$.
I tried to reproduce some of that in Python and came to a surprising observation. While the $N_k(x)$ distribution is indeed skewed in high dimensions when generating uniform point cloud in the cube and using the Euclidean distance function, I didn't observe this when generating uniform point clouds in the sphere.
Experiment with a cube (uniformly generated 2000 points in $[0,1]^{500}$)
The same experiment with a sphere (uniformly generated 2000 points in $S^{500}$)
My intuition is that there should be no difference between hub-histograms in an $n$-cube or an $n$-sphere, because this hubeness seems to be a local phenomenon. Unless I have some stupid mistake in the code, my intuition seems to be wrong.
Generating more points, I think I could see some skewness in the sphere as well, but only very slightly, if at all.
So here are my questions:
How is the cube different from a sphere? Could the difference be only explained by boundaries, corners or corners of corners?- Any insight into this hubeness-phenomena?
statistics euclidean-geometry experimental-mathematics
statistics euclidean-geometry experimental-mathematics
edited Nov 21 '18 at 12:43
asked Nov 21 '18 at 10:21
Peter Franek
7,21631935
7,21631935
add a comment |
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3007531%2fhubs-in-high-dimensional-point-clouds-distribution-of-being-within-nearest-nei%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Mathematics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3007531%2fhubs-in-high-dimensional-point-clouds-distribution-of-being-within-nearest-nei%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown