Tests for log-normal distribution

I have a very large dataset (about 100,000 entries) and would like to know whether a lognormal distribution can be applied to it.

I used a QQ plot in R, but the lognormal distribution doesn't seem to fit.

BUT: statistical tests do not reject the hypothesis that the underlying distribution is log-normal.

Do you have a powerful test in mind, or should I trust the QQ plot?

Many thanks in advance.

asked Jan 8 at 18:00

Calculator123

As a side note: in engineering, one question is how well a log-normal model works. In other words, do the deviations in the QQ plot matter? Phrased another way: do you want an absolute answer or an adequate description? Many physical processes have no neat formulation, yet standard formulations can describe the results adequately, always erring on the side of caution, of course.
– rrogers
Jan 8 at 18:09

I would need an adequate description... I should guarantee that the usage of a lognormal distribution is not wrong. =)
– Calculator123
Jan 8 at 19:22

"Not wrong"? You need to give a calculable formulation of what that means. I am not criticizing :) With real-world measurements you are never certain, unless you test 100% of the population, and then the whole issue is moot. Normally "not wrong" means something like how many outliers you have in the log-normal QQ plot. 100,000 is a nice number: why not break the data up, fit the lognormal model to each part, and compare the fits? In statistics you would assign the data to the parts at random; in engineering you would break it up into consecutive blocks and fit each one, checking for "drift".
– rrogers
Jan 8 at 21:13
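A minimal sketch of the block-wise check suggested above, not a definitive procedure. It assumes each part is fitted by taking the mean and standard deviation of the logged values; the function names, the number of blocks and the synthetic sample are illustrative assumptions, not something from the thread.

```python
import math
import random
import statistics


def fit_lognormal(values):
    """Fit a lognormal via the mean and standard deviation of log(values)."""
    logs = [math.log(v) for v in values]
    return statistics.fmean(logs), statistics.stdev(logs)


def blockwise_fits(values, n_blocks, shuffle=False, seed=0):
    """Fit each of n_blocks equal-sized parts (any remainder at the end is dropped).

    shuffle=False keeps consecutive blocks (the "drift" check);
    shuffle=True assigns the data to the parts at random.
    """
    data = list(values)
    if shuffle:
        random.Random(seed).shuffle(data)
    size = len(data) // n_blocks
    return [fit_lognormal(data[i * size:(i + 1) * size]) for i in range(n_blocks)]


if __name__ == "__main__":
    # Synthetic stand-in for the real data: lognormal with mu = 1.0, sigma = 0.5.
    rng = random.Random(42)
    sample = [rng.lognormvariate(1.0, 0.5) for _ in range(100_000)]

    for label, shuffle in (("consecutive blocks", False), ("random split", True)):
        print(label)
        for mu_hat, sigma_hat in blockwise_fits(sample, 10, shuffle=shuffle):
            print(f"  mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

If the fitted parameters wander from one consecutive block to the next but agree across the random split, that points to drift in the data rather than to a poor distributional fit.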

Incidentally, I usually pick a model (say log-normal) and do the fitting. Then I take the residuals from the (supposedly) perfect log-normal curve, either additively or multiplicatively, and examine them for a normal distribution, both as a frequency plot and as a normal QQ plot. You might have to forgive me if I misuse QQ plot terminology. When I learned about using plots to visualize and validate, they were called "normal probability" plots, which I still like and use mentally. With QQ plots I always have to reread the exact differences.
– rrogers
Jan 8 at 21:21
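The comment does not pin down how the residuals are defined, so the following is only a sketch under one assumed reading: take logs, fit a normal distribution, and measure how far each sorted log value lies from the normal quantile it should match. The plotting position `(i + 0.5) / n` and all names are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def qq_residuals(values):
    """Return (theoretical, observed, residual) triples for a normal QQ plot of log(values).

    Residuals are additive on the log scale, i.e. multiplicative on the original scale.
    """
    logs = sorted(math.log(v) for v in values)
    n = len(logs)
    fitted = NormalDist(fmean(logs), stdev(logs))
    points = []
    for i, observed in enumerate(logs):
        p = (i + 0.5) / n  # plotting position; other conventions exist
        theoretical = fitted.inv_cdf(p)
        points.append((theoretical, observed, observed - theoretical))
    return points


if __name__ == "__main__":
    # Synthetic stand-in for the real data.
    rng = random.Random(0)
    sample = [rng.lognormvariate(0.0, 1.0) for _ in range(100_000)]
    points = qq_residuals(sample)
    n = len(points)
    core = points[n // 100: n - n // 100]  # central 98% of the points
    print("largest |residual| overall:", max(abs(r) for _, _, r in points))
    print("largest |residual| in core:", max(abs(r) for _, _, r in core))
```

Looking at the core and the tails separately helps decide whether a poor-looking QQ plot is driven by a handful of extreme points or by a systematic bend across the whole range.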

Using the normal test on the residuals assures me that whatever factors are left in the original data will be very hard to untangle. The last time around I went for a chi-squared approach to estimate unknown residuals, but I don't recommend it. You should probably wait for a real statistician for answers. My in-depth knowledge of testing is superficial; I only know the techniques that have served me.
– rrogers
Jan 8 at 21:30

statistics hypothesis-testing

1 Answer

You have a rather large data set.

"If we can apply" is not a well-posed problem, but as an engineer, I'd ask "how well can a log-normal distribution model this data?"

I'd try the following:

Estimate the mean and variance from the data set.

From these, compute the estimated parameters of the log-normal distribution.

Use a smoothing technique such as kernel density estimation to obtain values (or a plot) of the empirical distribution.

Compare the plot/points obtained with the ones computed for the fitted log-normal distribution.

The last part, deciding whether the fit is good or not, is a bit more "art" than "technique".
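The steps above can be sketched as follows, under stated assumptions rather than as the answer's exact method. Moment matching uses the standard lognormal relations: with sample mean m and variance v, sigma^2 = ln(1 + v / m^2) and mu = ln(m) - sigma^2 / 2. The Gaussian kernel, the Silverman rule-of-thumb bandwidth, the evaluation grid and the synthetic sample are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def lognormal_params_from_moments(values):
    """Moment matching: convert the sample mean m and variance v of the raw data
    into (mu, sigma) via sigma^2 = ln(1 + v / m^2) and mu = ln(m) - sigma^2 / 2."""
    m = statistics.fmean(values)
    v = statistics.variance(values)
    sigma2 = math.log(1.0 + v / (m * m))
    return math.log(m) - sigma2 / 2.0, math.sqrt(sigma2)


def lognormal_pdf(x, mu, sigma):
    """Density of the lognormal distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def gaussian_kde(values, grid):
    """Gaussian kernel density estimate on `grid` (Silverman rule-of-thumb bandwidth)."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = min(statistics.stdev(values), (q3 - q1) / 1.34)
    h = 0.9 * spread * n ** (-0.2)
    norm = n * h * math.sqrt(2.0 * math.pi)
    return [sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values) / norm for x in grid]


if __name__ == "__main__":
    # Synthetic stand-in for the real data, kept small because this KDE is O(n * grid).
    rng = random.Random(7)
    sample = [rng.lognormvariate(0.0, 0.5) for _ in range(5_000)]
    mu_hat, sigma_hat = lognormal_params_from_moments(sample)
    grid = [0.25 * k for k in range(1, 17)]  # 0.25, 0.50, ..., 4.00
    kde = gaussian_kde(sample, grid)
    print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
    for x, k in zip(grid, kde):
        fitted = lognormal_pdf(x, mu_hat, sigma_hat)
        print(f"x = {x:4.2f}  kde = {k:.4f}  lognormal = {fitted:.4f}")
```

Moment-based estimates can be noisy for heavy-tailed data; fitting the mean and standard deviation of the logged values is a common alternative. A kernel estimate on the raw scale is also biased near zero, so small differences there should not be over-interpreted.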

answered Jan 23 at 13:03

Mefitico
