Tests for log-normal distribution
$begingroup$
I have a very large dataset (about 100,000 entries) and would like to determine whether a log-normal distribution is an appropriate model for it.
I made a QQ plot in R, and the log-normal distribution doesn't seem to fit.
However, statistical tests do not reject the hypothesis that the underlying distribution is log-normal.
Can you recommend a powerful test, or should I trust the QQ plot?
Many thanks in advance.
statistics hypothesis-testing
$endgroup$
$begingroup$
As a side note: one question in engineering is how well a log-normal model works. In other words, do the deviations in the QQ plot matter? Phrased another way: do you want an absolute answer or an adequate description? Many physical processes have no neat formulation, yet standard formulations can describe the results adequately, always erring on the side of caution, of course.
$endgroup$
– rrogers
Jan 8 at 18:09
$begingroup$
I would need an adequate description... I should guarantee that using a lognormal distribution is not wrong. =)
$endgroup$
– Calculator123
Jan 8 at 19:22
$begingroup$
"Not wrong"? You need to give a calculable formulation of what that means. I am not criticizing :) With real-world measurements you are never certain, unless you test 100% of the population, and then the whole issue is moot. Normally "not wrong" means something like how many outliers the log-normal QQ plot shows. 100,000 is a nice number: why not break the data up, fit the log-normal model to each part, and compare the fits? In statistics, you would assign the data to the parts at random. In engineering, you would break it up into blocks and fit each one, checking for "drift".
$endgroup$
– rrogers
Jan 8 at 21:13
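The split-and-compare idea in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `fit_lognormal` and `fits_by_block` are made up here, the fit is the usual maximum-likelihood one (mean and standard deviation of the logged data), "blocks" is read as consecutive chunks in the original order, and the data are synthetic log-normal draws standing in for the real 100,000 entries.

```python
import math
import random
import statistics


def fit_lognormal(values):
    """Return (mu, sigma) of log(values): the maximum-likelihood log-normal fit."""
    logs = [math.log(v) for v in values]
    return statistics.fmean(logs), statistics.pstdev(logs)


def fits_by_block(values, n_blocks, shuffle, seed=0):
    """Fit a log-normal to each of n_blocks equal-sized pieces of the data.

    shuffle=True assigns points to blocks at random; shuffle=False keeps the
    original order, so drift shows up as block-to-block parameter changes.
    Leftover points (when the length is not divisible) are ignored.
    """
    data = list(values)
    if shuffle:
        random.Random(seed).shuffle(data)
    size = len(data) // n_blocks
    return [fit_lognormal(data[i * size:(i + 1) * size]) for i in range(n_blocks)]


# Synthetic stand-in for the real data (assumption): 100,000 log-normal draws.
rng = random.Random(42)
sample = [rng.lognormvariate(1.0, 0.5) for _ in range(100_000)]

for label, shuffle in (("random blocks", True), ("ordered blocks", False)):
    fits = fits_by_block(sample, n_blocks=10, shuffle=shuffle)
    mus = [mu for mu, _ in fits]
    sigmas = [sigma for _, sigma in fits]
    print(f"{label}: mu in [{min(mus):.3f}, {max(mus):.3f}], "
          f"sigma in [{min(sigmas):.3f}, {max(sigmas):.3f}]")
```

If the spread of the fitted parameters across ordered blocks is clearly wider than across random blocks, that would hint at drift rather than at a poor distributional choice.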
$begingroup$
Incidentally, I usually pick a model (say, log-normal) and do the fitting. Then I take the residuals from the (supposedly) perfect log-normal curve, either additively or multiplicatively, and examine them for a normal distribution, both as a frequency plot and as a normal QQ plot. Forgive me if I misuse QQ plot terminology. When I learned to use plots to visualize and validate, they were called "normal probability" plots, which I still like and use mentally. With QQ plots I always have to reread the exact differences.
$endgroup$
– rrogers
Jan 8 at 21:21
$begingroup$
Using the normal test on the residuals assures me that whatever factors are left in the original data will be very hard to untangle. The last time around I went for a chi-squared approach to estimate unknown residuals, but I don't recommend it. You should probably wait for a real statistician to answer. My in-depth knowledge of testing is superficial; I only know the techniques that have served me.
$endgroup$
– rrogers
Jan 8 at 21:30
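One possible way to put numbers on the normal-probability-plot check described in these comments is sketched below. It relies on the fact that data are log-normal exactly when their logarithms are normal, so the check is done on the standardized logs. The helper names `lognormal_qq` and `qq_summary`, the plotting positions (i + 0.5)/n, and the use of the QQ correlation as a summary are choices made for this sketch, not something prescribed in the thread; the gamma sample is only there as a contrasting, non-log-normal example.

```python
import math
import random
import statistics


def lognormal_qq(values):
    """Return (theoretical, observed) quantiles for a log-normal QQ plot.

    Works on log(values): if the data are log-normal, the standardized logs
    follow a standard normal and the points fall near the line y = x.
    """
    logs = sorted(math.log(v) for v in values)
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)
    n = len(logs)
    std_normal = statistics.NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    observed = [(x - mu) / sigma for x in logs]
    return theoretical, observed


def qq_summary(theoretical, observed):
    """Pearson correlation of the QQ points and the largest residual from y = x."""
    mt = statistics.fmean(theoretical)
    mo = statistics.fmean(observed)
    cov = sum((t - mt) * (o - mo) for t, o in zip(theoretical, observed))
    var_t = sum((t - mt) ** 2 for t in theoretical)
    var_o = sum((o - mo) ** 2 for o in observed)
    r = cov / math.sqrt(var_t * var_o)
    max_resid = max(abs(o - t) for t, o in zip(theoretical, observed))
    return r, max_resid


# Synthetic examples (assumption): one log-normal sample, one gamma sample.
rng = random.Random(1)
lognormal_sample = [rng.lognormvariate(0.0, 1.0) for _ in range(20_000)]
gamma_sample = [rng.gammavariate(2.0, 1.0) for _ in range(20_000)]

for name, data in (("log-normal", lognormal_sample), ("gamma", gamma_sample)):
    r, max_resid = qq_summary(*lognormal_qq(data))
    print(f"{name}: QQ correlation = {r:.5f}, largest residual = {max_resid:.3f}")
```

The residuals from the line y = x play the role of the "residuals from the perfect log-normal curve"; where along the plot they grow (centre or tails) is usually more informative than any single summary number.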
asked Jan 8 at 18:00 by Calculator123
1 Answer
$begingroup$
You have a rather large data set.
"If we can apply" is not a well-posed problem, but as an engineer, I'd ask "how well can a log-normal distribution model this data?".
I'd try the following:
- Estimate mean and variance from the data set.
- From these, compute estimated parameters for the log-normal distribution.
- Use a smoothing technique such as kernel density estimation to obtain values (or a plot) of the empirical distribution.
- Compare the plot/points obtained with the ones computed for the log-normal distribution.
The last part is a bit more "art" than "technique": you have to judge whether the fit is good or not.
$endgroup$
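The four steps above could look roughly like the sketch below. Several details are assumptions made for illustration: the function names are invented, the parameters are derived from the sample mean m and variance v by the method of moments (sigma^2 = ln(1 + v/m^2), mu = ln(m) - sigma^2/2), the kernel is Gaussian with a normal-reference bandwidth h = 1.06 * s * n^(-1/5), and the data are a small synthetic sample so the pure-Python kernel sum stays fast.

```python
import math
import random
import statistics


def lognormal_params_from_moments(values):
    """Method of moments: turn the sample mean and variance into (mu, sigma)."""
    m = statistics.fmean(values)
    v = statistics.pvariance(values, m)
    sigma2 = math.log(1.0 + v / (m * m))
    return math.log(m) - sigma2 / 2.0, math.sqrt(sigma2)


def lognormal_pdf(x, mu, sigma):
    """Density of the log-normal distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def gaussian_kde(values, grid):
    """Gaussian kernel density estimate evaluated at each grid point."""
    n = len(values)
    h = 1.06 * statistics.pstdev(values) * n ** (-0.2)  # rule-of-thumb bandwidth
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
            for x in grid]


# Synthetic stand-in for the real data (assumption): 5,000 log-normal draws.
rng = random.Random(7)
sample = [rng.lognormvariate(0.0, 0.4) for _ in range(5_000)]

mu_hat, sigma_hat = lognormal_params_from_moments(sample)
grid = [0.3 + 0.1 * k for k in range(25)]  # evaluation points from 0.3 to 2.7
kde = gaussian_kde(sample, grid)
model = [lognormal_pdf(x, mu_hat, sigma_hat) for x in grid]
max_gap = max(abs(k - m) for k, m in zip(kde, model))

print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
print(f"largest |KDE - log-normal pdf| on the grid = {max_gap:.3f}")
```

A plain Gaussian kernel smooths across zero, so for strongly skewed data the comparison near the origin is less trustworthy; smoothing the logged data instead is one alternative. How large a gap still counts as acceptable remains the judgment call mentioned in the answer.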
answered Jan 23 at 13:03 by Mefitico