How to find the mode of a continuous distribution from a sample?

First, my background is not math.

My objective is to find the value that occurs most frequently in a data sample OR the value that is most likely.

Let's say my sample is [1,5,6,6,7,10]. Finding the mode for this sample is simple (the mode is 6).

But if I change the sample to [1,5,6,7,10], I don't know how to find the mode. The result I want is 6, since 6 is the most probable value.

The problem is that I don't even know what to google (I tried for hours), and even when I find something that MAY be the answer (kernel density estimation, continuous probability distribution), I don't understand what the hell they're talking about.

The actual data consists of hundreds of values (floats) saved in Excel. I would appreciate it if someone could demo this in Excel.

edited Dec 20 '18 at 1:10

Stephane Bersier


asked Nov 18 '11 at 10:06

Syaiful Nizam Yahya


Do you have to find the mode? There are other averages which are significantly easier to compute when you have float data, for example the mean or the median.
– Chris Taylor
Nov 18 '11 at 11:44

1

If your samples are from what you believe is a continuous distribution, then it is almost certain that the "hundreds of data (in floats)" are all distinct numbers (as in your second example) and there is no mode of that data sample. You could try sorting and binning the data, say into 20 bins of equal width between min and max (e.g. ask Excel to make a histogram of the data sample values), and finding the bin with the largest number of data samples. The center point of that bin is an estimate of the mode.
– Dilip Sarwate
Nov 18 '11 at 13:04

The mean and median are totally unsuitable for my data. The problem with a frequency histogram is that it's hard to find the optimal bin width. I'm writing a program, so it's critical for me to have this feature working independently. Doesn't anybody know the solution? Does anybody know where I can ask this question? Thanks.
– Syaiful Nizam Yahya
Nov 20 '11 at 0:42

1

Not sure you fully grasped the content of @Dilip's answer, so let me repeat it: the data samples you are considering will have NO MODE whatsoever. It is not that people do not know the solution; people know that there is no solution (and if you ask the same question elsewhere, every correct answer you get will state the same thing).
– Did
Nov 27 '11 at 11:28

I don't have an Excel solution for you. This is a near duplicate of stats.stackexchange.com/questions/19952/… except that you are asking for an Excel method. The key fact here is that you are trying to estimate the density of your data along whatever your dimension is.
– Muhammad Qadri
Apr 7 '12 at 4:10
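
For what it's worth, the binning idea from Dilip Sarwate's comment above can be sketched in a few lines of Python (not Excel). This is only a rough illustration: the function name `histogram_mode`, the default of 20 bins, and the rule of taking the first fullest bin on ties are assumptions of this sketch, not something stated in the thread. It splits the range between min and max into equal-width bins and returns the center of the fullest one.

```python
def histogram_mode(data, bins=20):
    """Estimate the mode as the center of the fullest equal-width bin."""
    lo, hi = min(data), max(data)
    if lo == hi:
        # All values are identical, so that value is trivially the mode.
        return float(lo)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The maximum value would land in bin index `bins`; clamp it
        # into the last bin so every bin has the same width.
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    # Assumption of this sketch: on ties, take the first fullest bin.
    best = counts.index(max(counts))
    return lo + (best + 0.5) * width
```

With the sample from the question and 3 bins, `histogram_mode([1, 5, 6, 7, 10], bins=3)` gives 5.5, the center of the bin covering 4 to 7. The estimate moves when the number of bins changes, which is exactly the bin-width problem raised in the comments.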


probability statistics

1 Answer

For the record, here are some general solution sketches that also work for high-dimensional distributions (probably too complex for the asker, though; some sort of kernel density estimation is much simpler and reasonably good):

Train an f-GAN with reverse KL divergence, without giving any random input to the generator (i.e. force it to be deterministic).

Train an f-GAN with reverse KL divergence, move the input distribution to the generator towards a Dirac delta function as training progresses, and add a gradient penalty to the generator loss function.

Train a (differentiable) generative model that can tractably evaluate an approximation of the pdf at any point (I believe that e.g. a VAE, a flow-based model, or an autoregressive model would do). Then use some type of optimization (some flavor of gradient ascent can be used if model inference is differentiable) to find a maximum of that approximation.
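
For the one-dimensional case in the question, the simpler kernel density estimation route mentioned at the top of this answer can be sketched as below. It follows the same pattern as the third sketch (build an approximation of the pdf, then look for its maximum), but with a Gaussian KDE as the density model and a plain grid search instead of gradient ascent. The function name `kde_mode`, the grid size, and the normal-reference bandwidth rule `1.06 * sd * n**(-1/5)` are all assumptions of this sketch; a different bandwidth can give a different answer.

```python
import math
import statistics


def kde_mode(data, bandwidth=None, grid_points=1001):
    """Estimate the mode as the grid point where a Gaussian KDE is highest."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    if bandwidth is None:
        # Assumption: normal-reference rule of thumb for the bandwidth.
        bandwidth = 1.06 * statistics.stdev(data) * n ** (-0.2)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive (is the data constant?)")

    norm = n * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        # Average of Gaussian bumps centered on each data point.
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                   for xi in data) / norm

    # With equal-width Gaussian kernels the highest peak lies between the
    # smallest and largest data point, so the grid only covers that range.
    lo, hi = min(data), max(data)
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]
    return max(grid, key=density)
```

For the sample `[1, 5, 6, 7, 10]` from the question, this returns a value near 6 under the default bandwidth, which matches the intuition in the question even though no value repeats.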

answered Dec 20 '18 at 1:08

Stephane Bersier


I believe these solutions converge as the sample size and the "network approximation power" increase (assuming training works well).
– Stephane Bersier
Dec 20 '18 at 1:12
