Tests for log-normal distribution
























$begingroup$


I have a very large dataset (about 100,000 entries) and would like to know whether a log-normal distribution can be applied to it.

I used a QQ plot in R, but the log-normal distribution doesn't seem to fit.

However, statistical tests do not reject the hypothesis that the underlying distribution is log-normal.

Do you have a powerful test in mind, or should I trust the QQ plot?



Many thanks in advance.



















$endgroup$












  • $begingroup$
    As a side note: one question in engineering is how well a log-normal plot works. In other words, do the deviations in the QQ plot matter? Put another way: do you want an absolute answer or an adequate description? Many physical processes don't have a neat formulation, yet standard formulations can describe the results adequately, always erring on the side of caution, of course.
    $endgroup$
    – rrogers
    Jan 8 at 18:09










  • $begingroup$
    I need an adequate description... I have to be able to guarantee that using a log-normal distribution is not wrong. =)
    $endgroup$
    – Calculator123
    Jan 8 at 19:22










  • $begingroup$
    "Not wrong"? You need to give a calculable formulation of what that means. I am not criticizing :) With real-world measurements you are never certain, unless you do 100% testing, and then the whole issue is moot. Normally "not wrong" is something like how many outliers the log-normal QQ plot shows. 100,000 is a nice number: why not break the data up, fit the log-normal model to each part, and compare the fits? In statistics, you would assign the data to the parts at random. In engineering, you would break it up into blocks and fit each one, checking for "drift".
    $endgroup$
    – rrogers
    Jan 8 at 21:13










  • $begingroup$
    Incidentally, I usually pick a model (say log-normal) and do the fitting. Then I take the residuals from the (supposedly) perfect log-normal curve, either additively or multiplicatively, and examine them for a normal distribution, both as a frequency plot and as a normal QQ plot. You might have to forgive me if I misuse QQ plot terminology. When I learned to use plots to visualize and validate, they were "normal probability" plots, which I still like and use mentally. With QQ plots I always have to reread the exact differences.
    $endgroup$
    – rrogers
    Jan 8 at 21:21










  • $begingroup$
    Using the normality test on the residuals assures me that whatever factors are left in the original data will be very hard to untangle. The last time around I went for a chi-squared approach to estimate the unknown residuals, but I don't recommend it. You should probably wait for a real statistician to answer. My knowledge of testing is superficial; I only know the techniques that have served me.
    $endgroup$
    – rrogers
    Jan 8 at 21:30
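
One possible reading of the suggestions in the comments above, as a minimal Python sketch (standard library only). The helper names `fit_lognormal`, `split_blocks` and `qq_correlation`, the number of blocks, and the simulated stand-in data are all assumptions made for illustration; none of them come from the thread. The sketch fits a log-normal to each part of the data, once with ordered blocks (to look for drift) and once with random assignment, and summarizes the normal QQ plot of the log-values by a single correlation coefficient.

```python
import math
import random
import statistics

def fit_lognormal(values):
    """Estimate (mu, sigma) as the mean and standard deviation of log(values)."""
    logs = [math.log(v) for v in values]
    return statistics.fmean(logs), statistics.stdev(logs)

def split_blocks(values, k):
    """Split values into k contiguous blocks of near-equal size, keeping their order."""
    n = len(values)
    return [values[i * n // k:(i + 1) * n // k] for i in range(k)]

def qq_correlation(values):
    """Correlation between the sorted log-values and standard normal quantiles.

    Values close to 1 mean the normal QQ plot of the logs is close to a straight line.
    """
    logs = sorted(math.log(v) for v in values)
    n = len(logs)
    std_normal = statistics.NormalDist()
    theo = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, my = statistics.fmean(theo), statistics.fmean(logs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(theo, logs))
    sxx = sum((x - mx) ** 2 for x in theo)
    syy = sum((y - my) ** 2 for y in logs)
    return sxy / math.sqrt(sxx * syy)

# Simulated stand-in for the real dataset (assumption: strictly positive values).
random.seed(1)
data = [random.lognormvariate(1.0, 0.4) for _ in range(10000)]

# Blocks in the original order: differing fits would hint at drift.
block_fits = [fit_lognormal(b) for b in split_blocks(data, 5)]

# Random assignment to parts: differing fits would hint at a poor model.
shuffled = random.sample(data, len(data))
random_fits = [fit_lognormal(b) for b in split_blocks(shuffled, 5)]

# One QQ summary number per block.
block_qq = [qq_correlation(b) for b in split_blocks(data, 5)]
```

With real data, the interesting output is how much the fitted `(mu, sigma)` pairs differ between parts, and whether any block's QQ correlation drops noticeably below the others. What counts as "too different" still has to be decided by the application.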























statistics hypothesis-testing
















asked Jan 8 at 18:00









Calculator123























1 Answer






























$begingroup$

You have a rather large data set.



"If we can apply" is not a well-posed problem, but as an engineer, I'd ask "how well can a log-normal distribution model this data?".



I'd try the following:




  1. Estimate mean and variance from the data set.

  2. From these, compute estimated parameters for the log-normal distribution.

  3. Use a smoothing technique such as kernel density estimation to obtain values (or a plot) of the empirical distribution.

  4. Compare the plot/points obtained with the ones computed for the log-normal distribution.


The last part, deciding whether the fit is good or not, is a bit more "art" than "technique".
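
The four steps above can be sketched in Python with the standard library only. This is an illustrative sketch, not the answerer's code: the function names, the simulated sample, the Silverman rule-of-thumb bandwidth, the evaluation grid, and the "largest gap" summary are all assumptions. The parameter formulas are the usual method-of-moments relations for a log-normal with mean m and variance v: sigma^2 = ln(1 + v/m^2) and mu = ln(m) - sigma^2/2.

```python
import math
import random
import statistics

def lognormal_params_from_moments(data):
    """Steps 1-2: method-of-moments (mu, sigma) from the sample mean and variance."""
    m = statistics.fmean(data)
    v = statistics.variance(data)
    sigma2 = math.log(1.0 + v / (m * m))
    mu = math.log(m) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)

def lognormal_pdf(x, mu, sigma):
    """Density of a log-normal distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def gaussian_kde(data, bandwidth):
    """Step 3: return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return density

# Simulated stand-in for the real dataset (assumption).
random.seed(0)
sample = [random.lognormvariate(0.0, 0.5) for _ in range(2000)]

mu_hat, sigma_hat = lognormal_params_from_moments(sample)

# Rule-of-thumb bandwidth (assumption; other choices are possible).
bw = 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)
kde = gaussian_kde(sample, bw)

# Step 4: compare the smoothed data density with the fitted log-normal on a grid.
grid = [0.25 * k for k in range(1, 13)]
max_gap = max(abs(kde(x) - lognormal_pdf(x, mu_hat, sigma_hat)) for x in grid)
```

A plain Gaussian kernel smooths across zero, so the estimate is biased near the left edge of positive data. Running the same comparison on the log scale (kernel estimate of the log-values against a normal density) is one way to sidestep that. Either way, judging whether `max_gap` or the overlaid plot is "good enough" remains the judgment call the answer describes.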















$endgroup$























        answered Jan 23 at 13:03









        Mefitico






























