Stochastic Neighbor Embedding (SNE) - How to Understand the Cost Function of the Kullback Leibler (KL)...












In the paper 'Stochastic Neighbor Embedding', the cost function in terms of the Kullback-Leibler (KL) divergence is



$$
C = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}} = \sum_i \operatorname{KL}(P_i \| Q_i).
$$
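For concreteness, here is a small numerical sketch of this cost (my own illustration, not code from the paper), using Gaussian neighbor probabilities with a fixed bandwidth rather than the per-point $\sigma_i$ the paper tunes via perplexity:

```python
import numpy as np

def sne_cost(X, Y, sigma=1.0):
    """Illustrative SNE cost: sum_i KL(P_i || Q_i), where P comes from the
    high-dimensional points X and Q from the low-dimensional embedding Y.
    A single fixed bandwidth is used for simplicity."""
    def neighbor_probs(Z, s):
        # Squared pairwise distances, then Gaussian affinities
        D = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
        W = np.exp(-D / (2 * s ** 2))
        np.fill_diagonal(W, 0.0)              # p_{ii} = 0 by convention
        return W / W.sum(axis=1, keepdims=True)

    P = neighbor_probs(X, sigma)              # high-dimensional space
    Q = neighbor_probs(Y, 1.0)                # low-dimensional embedding
    mask = P > 0                              # skip the zeroed diagonal
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))
```

If the "embedding" is just the data itself (same bandwidth), $P = Q$ and the cost is zero; any mismatch makes it strictly positive, since each row term is a KL divergence.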



Question 1:
Why does the paper say that SNE is an improvement over methods like LLE or SOM, in which widely separated data-points can be 'collapsed' as near neighbors?



To minimize the cost, a large $q_{ij}$ always corresponds to a small distance in the low-dimensional space, which can also cause a 'collapse'. Or does 'collapse' mean something else?



Question 2:
Similarly, I do not understand how SNE keeps the images of widely separated objects relatively far apart.










  • You should not use acronyms without writing them out at the first usage.
    – MrYouMath
    Aug 26 '18 at 7:11










  • Thanks, I've changed it.
    – Doris
    Aug 26 '18 at 9:59
















probability optimization machine-learning divergence






edited Aug 26 '18 at 13:47









Royi

asked Aug 26 '18 at 3:08









Doris

1 Answer













Question 1



The paper discusses this in the following quote:




Notice that making $q_{ij}$ large when $p_{ij}$ is small wastes some of the probability mass in the $q$
distribution so there is a cost for modeling a big distance in the high-dimensional space with
a small distance in the low-dimensional space, though it is less than the cost of modeling
a small distance with a big one. In this respect, SNE is an improvement over methods
like LLE [4] or SOM [5] in which widely separated data-points can be “collapsed” as near
neighbors in the low-dimensional space. The intuition is that while SNE emphasizes local
distances, its cost function cleanly enforces both keeping the images of nearby objects
nearby and keeping the images of widely separated objects relatively far apart.




Ideally, to minimize the KL divergence, we should have $q_{ij} \approx p_{ij}$. Their point is that there is an asymmetry with respect to increasing one compared to the other. Note that $p$ is inversely related to distance in the high-dimensional (HD) space, while $q$ is inversely related to distance in the low-dimensional (LD) embedding space. A large $q_{ij}$ with a small $p_{ij}$ means modelling a large HD distance with a small LD distance. This penalty is borne out in the other terms of the sum, where $q_{k\ell}$ will be too small because some of the density of $q$ was "wasted" on the pair $i,j$. On the other hand, modelling a small HD distance with a large LD distance means having a large $p_{ij}$ with a small $q_{ij}$, which makes the $i,j$ term itself rather punishing. I think their point is that LLE and SOM lack this asymmetry, which makes such collapsing more likely in those methods.
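To make the asymmetry concrete, here is a toy per-term comparison (my own numbers, not from the paper). The direct cost of giving a small $q$ to a large $p$ far exceeds the direct cost of giving a large $q$ to a small $p$; the latter's real damage arrives only indirectly, through the normalization of $q$ shrinking the other terms' $q_{k\ell}$:

```python
import math

def kl_term(p, q):
    # One term of the cost sum: p * log(p / q)
    return p * math.log(p / q)

# Large p_ij modeled with a small q_ij (nearby points mapped far apart):
heavy = kl_term(0.5, 0.01)   # ~1.96, a punishing term on its own
# Small p_ij modeled with a large q_ij (far points mapped nearby):
mild = kl_term(0.01, 0.5)    # ~-0.04, almost free in this term alone
assert heavy > mild
```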



Question 2



Basically reiterating: to keep the images of widely separated objects relatively far apart, we want $q_{ij}$ to be small when $p_{ij}$ is small.



If the HD distance is large, then $p_{ij}$ will be small, which makes it seem as if that term does not matter much, since it is $p_{ij}\log(p_{ij}/q_{ij})$. However, if the corresponding LD distance is small (meaning we did not keep the points far apart), then $q_{ij}$ will be large. Due to the normalization of $q$, this means we have "wasted" some of its probability mass here, so other terms will be higher as a result. Hence the method is driven to avoid this.
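A small numerical illustration of this "wasted mass" effect (hypothetical probabilities, for intuition only): spending $q$ mass on a far-apart pair necessarily starves the true neighbors, and the total divergence for that point goes up:

```python
import numpy as np

def kl(p, q):
    # KL(P_i || Q_i) = sum_j p_j * log(p_j / q_j)
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

# Hypothetical P_i over four neighbors; the last one is far away in high-D.
p = np.array([0.40, 0.30, 0.29, 0.01])

q_far_kept_far = np.array([0.40, 0.30, 0.29, 0.01])     # matches p: zero cost
q_far_mapped_near = np.array([0.25, 0.20, 0.15, 0.40])  # mass wasted on far pair

assert kl(p, q_far_kept_far) < kl(p, q_far_mapped_near)
```

Because each row of $q$ sums to one, the large $q_{ij}$ spent on the far pair comes straight out of the $q$'s of the genuine neighbors, inflating their $p\log(p/q)$ terms.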






        answered Jan 13 at 22:15









user3658307
