How to find the mode of a continuous distribution from a sample?
First, my background is not in math.
My objective is to find the value that occurs most frequently in a data sample OR the value that is most likely.
Let's say my sample is [1,5,6,6,7,10]. Finding the mode for this sample is simple (the mode is 6).
But if I change the sample to [1,5,6,7,10], I don't know how to find the mode. The result I want is 6, since 6 is the most probable value.
The problem is, I don't even know what to google (I tried for hours), and even when I find something that MAY be the answer (kernel density estimation, continuous probability distribution), I don't understand what the hell they're talking about.
My actual data set consists of hundreds of floating-point values saved in Excel. I would appreciate it if someone could demonstrate this in Excel.
probability statistics
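A minimal sketch makes the difficulty concrete. It is written in Python as an assumption (the thread names no language), and `sample_modes` is a hypothetical helper: counting exact repeats finds 6 in the first sample, but in the second sample every value occurs exactly once, so counting alone ties all five values.

```python
from collections import Counter

def sample_modes(values):
    """Return every value that occurs most often, together with that count."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top), top

print(sample_modes([1, 5, 6, 6, 7, 10]))  # ([6], 2)
print(sample_modes([1, 5, 6, 7, 10]))     # ([1, 5, 6, 7, 10], 1): no single mode
```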
Do you have to find the mode? There are other averages which are significantly easier to compute when you have float data, for example the mean or the median.
– Chris Taylor, Nov 18 '11 at 11:44
If your samples are from what you believe is a continuous distribution, then it is almost certain that the "hundreds of data (in floats)" are all distinct numbers (as in your second example), so the data sample has no mode. You could try sorting and binning the data, say into 20 bins of equal width between the min and the max (e.g. ask Excel to make a histogram of the sample values), and finding the bin with the largest number of samples. The center point of that bin is an estimate of the mode.
– Dilip Sarwate, Nov 18 '11 at 13:04
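The binning approach just described can be sketched in a few lines of Python. This is an illustration under assumptions: `histogram_mode` is a hypothetical name, bins are half-open except the last one (which includes the maximum), and ties go to the lowest bin. In Excel the same idea corresponds to counting values per equal-width bin (for example with FREQUENCY) and taking the tallest bin.

```python
def histogram_mode(values, bins=20):
    """Estimate the mode as the center of the most populated equal-width bin."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # every value identical: that value is the mode
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Map v to a bin index; the maximum value is placed in the last bin.
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    # max() returns the first (lowest) bin among equally full ones.
    best = max(range(bins), key=counts.__getitem__)
    return lo + (best + 0.5) * width
```

For example, `histogram_mode([1, 5, 6, 7, 10], bins=3)` returns 5.5, the center of [4, 7), but only because its tie with [7, 10] is broken toward the lower bin; a different bin count moves the estimate.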
Mean and median are totally unsuitable for my data. The problem with a frequency histogram is that it's hard to find the optimal bin width. I'm writing a program, so it's critical that this feature works automatically, without manual tuning. Does anybody know the solution? Does anybody know where I can ask this question? Thanks.
– Syaiful Nizam Yahya, Nov 20 '11 at 0:42
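One common, fully automatic way to choose the bin width (not something proposed in this thread) is the Freedman-Diaconis rule, h = 2 · IQR · n^(-1/3), where IQR is the interquartile range. A sketch, assuming Python 3.8+ for `statistics.quantiles` and at least two values; `freedman_diaconis_bins` is a hypothetical helper and the square-root fallback for zero spread is an arbitrary choice:

```python
import math
import statistics

def freedman_diaconis_bins(values):
    """Suggest an equal-width bin count via the Freedman-Diaconis rule.

    Bin width h = 2 * IQR * n**(-1/3); the bin count is ceil((max - min) / h).
    """
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles; needs Python 3.8+
    h = 2 * (q3 - q1) * n ** (-1 / 3)
    span = max(values) - min(values)
    if h <= 0 or span == 0:
        # Zero spread: fall back to the square-root rule (an arbitrary choice).
        return max(1, math.ceil(math.sqrt(n)))
    return max(1, math.ceil(span / h))
```

For the integers 1 to 100 it suggests 5 bins. Such rules reduce the guesswork but are still heuristics; the estimated mode can shift with the bin placement.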
Not sure you fully grasped the content of @Dilip's answer, so let me repeat it: the data samples you are considering will have NO MODE whatsoever. It is not that people do not know the solution; people know that there is no solution (and if you ask the same question elsewhere, every correct answer you get will state the same thing).
– Did, Nov 27 '11 at 11:28
I don't have an Excel solution for you. This is a near duplicate of stats.stackexchange.com/questions/19952/…, except that you are asking for an Excel method. The key point is that you are trying to estimate the density of your data along whatever dimension you are measuring.
– Muhammad Qadri, Apr 7 '12 at 4:10
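Kernel density estimation, which the question mentions, is one way to do this: it places a smooth bump on every data point and adds them up, f(x) = (1/(n·h)) · Σ K((x − x_i)/h), and the estimated mode is the x where f peaks. The sketch below is a minimal illustration, not a tuned implementation. It assumes a Gaussian kernel K, picks the bandwidth h with the normal-reference rule of thumb h = 1.06 · s · n^(-1/5) (s = sample standard deviation), and searches an evenly spaced grid between the min and max; `kde_mode` is a hypothetical name.

```python
import math
import statistics

def kde_mode(values, bandwidth=None, grid_points=1000):
    """Estimate the mode as the highest point of a Gaussian KDE on a grid."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # every value identical: that value is the mode
    n = len(values)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption): h = 1.06 * s * n**(-1/5)
        bandwidth = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    best_x, best_height = lo, -1.0
    # A Gaussian KDE's maximum always lies between min and max, so the grid covers it.
    for k in range(grid_points + 1):
        x = lo + (hi - lo) * k / grid_points
        # Sum of Gaussian bumps centered on the data; the constant 1/(n*h*sqrt(2*pi))
        # is dropped because it does not change where the maximum is.
        height = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if height > best_height:
            best_x, best_height = x, height
    return best_x
```

With a bandwidth of 0.5, the peak for [1, 5, 6, 7, 10] sits at 6 (up to the grid spacing); with the default rule-of-thumb bandwidth it lands somewhat above 6, because the nearer outlier at 10 pulls it right. The bandwidth plays the same role as the histogram's bin width and remains a modeling choice.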
edited Dec 20 '18 at 1:10 by Stephane Bersier
asked Nov 18 '11 at 10:06 by Syaiful Nizam Yahya
1 Answer
For the record, here are sketches of some general solutions that also work for high-dimensional distributions (they are probably too complex for the asker, though; some form of kernel density estimation is much simpler and works reasonably well):
Train an f-GAN with reverse KL divergence, without giving any random input to the generator (i.e. force it to be deterministic).
Train an f-GAN with reverse KL divergence, move the input distribution to the generator towards a Dirac delta function as training progresses, and add a gradient penalty to the generator loss function.
Train a (differentiable) generative model that can tractably evaluate an approximation of the pdf at any point (I believe that e.g. a VAE, a flow-based model, or an autoregressive model would do). Then use some type of optimization (some flavor of gradient ascent can be used if model inference is differentiable) to find a maximum of that approximation.
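The last idea (fit a density model, then climb its gradient) can be tried without a neural network by swapping in a Gaussian KDE as the density model. The mean-shift update below is a standard fixed-point form of gradient ascent on a Gaussian KDE: each step moves to the kernel-weighted mean of the data. It is a stand-in for, not an implementation of, the VAE, flow-based, or autoregressive approach above; `mean_shift_mode`, the rule-of-thumb bandwidth, and starting from the median are assumptions.

```python
import math
import statistics

def mean_shift_mode(values, bandwidth=None, start=None, tol=1e-9, max_iter=500):
    """Climb a Gaussian KDE to a local maximum using mean-shift steps.

    Assumes at least two distinct values when no bandwidth is given.
    """
    n = len(values)
    if bandwidth is None:
        # Normal-reference rule of thumb as an assumed default bandwidth.
        bandwidth = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    # Starting at the median is an arbitrary choice; other starts may reach other peaks.
    x = statistics.median(values) if start is None else start
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values]
        total = sum(weights)
        if total == 0:
            break  # x is so far from the data that every weight underflowed
        # Move to the kernel-weighted mean of the data: an uphill step on the KDE.
        new_x = sum(w * v for w, v in zip(weights, values)) / total
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x
```

Like any gradient ascent, this finds a local maximum; for a density with several peaks, running it from several starting points and keeping the highest result is a common workaround.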
I believe these solutions converge as the sample size and the "network approximation power" increase (assuming training works well).
– Stephane Bersier, Dec 20 '18 at 1:12
answered Dec 20 '18 at 1:08 by Stephane Bersier