Explain why the empirical distribution function $F_n$ is a reasonable approximation of $F_X$ for large $n$.
Suppose you have a dataset $x_1, . . . , x_n$ which is a realization of a random sample from a distribution with distribution function $F_X$. Explain why the empirical distribution function $F_n$ is a reasonable approximation of $F_X$ for large $n$.




Can someone please explain or prove this statement? I don't know how to approach it. Thanks in advance!
probability statistics
asked Jan 26 at 22:25
Luke Marci

  • Look at the Law of Large Numbers. – user321627, Jan 26 at 23:32

1 Answer
First you need to state clearly what the empirical distribution function $F_n$ is. It is the function $F_n\colon \mathbb{R} \longrightarrow [0,1]$ given by
$$F_n(t)=\frac{1}{n}\cdot \#\{k\colon x_k\le t\},$$
that is, the number of observations smaller than or equal to $t$ divided by the total number of observations. Note that since you can't predict the value of each $X_k$ precisely, before taking a sample the value $F_n(t)$ is a random variable (in fact, you get a different random variable for each value of $t$ considered).
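The definition above transcribes directly into code. Here is a minimal Python sketch (the function name and the sample values are my own choices, not part of the answer):

```python
def empirical_cdf(sample, t):
    """F_n(t) = (1/n) * #{k : x_k <= t}."""
    return sum(1 for x in sample if x <= t) / len(sample)

sample = [2.1, 0.5, 3.7, 1.4, 0.9]
print(empirical_cdf(sample, 1.5))  # 3 of the 5 observations are <= 1.5, so 0.6
```

For any fixed dataset this is an ordinary step function of $t$; the randomness only enters when the sample itself is regarded as random.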



The trick here is to define the variables $U_1,U_2,\ldots,U_n$ as
$$U_k=\begin{cases}1 & X_k\le t\\ 0 & X_k>t,\end{cases}$$
which are Bernoulli variables with parameter
$$p=P(X_k\le t)=F_X(t)$$
for every $k$; in particular, they are identically distributed with finite mean and variance.



Since these variables are also clearly independent, the (weak) Law of Large Numbers applies, which means that
$$\frac{1}{n}\sum_{k=1}^n U_k\to E(U_k)$$
(convergence in probability). Observe that the left-hand side is exactly the empirical distribution function evaluated at $t$, and, since the mean of a Bernoulli variable equals its parameter, the right-hand side is the true distribution function of $X$ at $t$. That is,
$$F_n(t)\to F_X(t)$$
(in probability), and this holds for every $t$. So, in this sense, for larger and larger $n$ the values of the empirical distribution function approximate the values of the true distribution function at each point $t\in\mathbb{R}$.
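The convergence is easy to see in a quick simulation. The sketch below (my own illustration; the Uniform(0, 1) distribution is chosen only because its CDF is simply $F_X(t)=t$, and the evaluation point $t=0.3$ is arbitrary) computes $F_n(t)$ as the average of the indicators $U_k$ and watches the error shrink as $n$ grows:

```python
import random

random.seed(42)
t = 0.3  # evaluation point; F_X(t) = t for Uniform(0, 1)

for n in (10, 1_000, 100_000):
    sample = [random.random() for _ in range(n)]
    # F_n(t) is the sample mean of the Bernoulli indicators U_k = 1{x_k <= t}
    f_n = sum(1 for x in sample if x <= t) / n
    print(f"n={n:>7}  |F_n(t) - F_X(t)| = {abs(f_n - t):.4f}")
```

By the variance of the Bernoulli mean, the typical error is on the order of $\sqrt{F_X(t)(1-F_X(t))/n}$, so multiplying $n$ by 100 shrinks it by roughly a factor of 10.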
edited Jan 27 at 9:01

answered Jan 27 at 5:53 – Alejandro Nasif Salum

  • thank you a lot for the answer and the long explanation, now it's very clear, have a nice day! – Luke Marci, Jan 27 at 8:26