Given some statistical measures, reconstruct a list of numbers
Suppose I have a list of numbers $(y_1, y_2, y_3, \dots, y_N)$ with these properties:
$$ \sum_{i=1}^{N} y_i = 13{,}776{,}663, $$
$$ \bar{y} = \dfrac{1}{N} \sum_{i=1}^{N} y_i = 17{,}135, $$
$$ s^2 = \dfrac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y})^2 = 139{,}147^2. $$
That list has these numbers:
- The lowest is $19$.
- The $5$th percentile is $336$.
- The $25$th percentile is $800$.
- The median is $1{,}668$.
- The $75$th percentile is $5{,}050$.
- The $95$th percentile is $30{,}295$.
- The highest is $2{,}627{,}319$.
These percentiles give you some idea of the distribution of the numbers. I can already construct a list that has that mean and standard deviation if it doesn't matter whether any $y_i$ is less than zero or whether the list follows the described distribution. The problem I face is to construct a list with mean $\bar y$ and standard deviation $s$ subject to the condition that every $y_i$ is greater than zero (it doesn't have to follow that distribution, and it doesn't have to contain those particular numbers).
So I am looking for a way to do that. If anybody has any ideas about this, I'm happy to hear them!
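For reference, the unconstrained version described above can be sketched as follows. This is only one common way to do it (an affine rescaling of an arbitrary starting list) and not necessarily the method the asker used; the function name `rescale_to` and the starting list are made up for illustration. It matches the mean and the sample standard deviation up to rounding, but it pushes some values below zero, which is exactly the difficulty the question is about.

```python
import statistics

def rescale_to(values, target_mean, target_sd):
    """Shift and scale `values` so they get the requested mean and sample sd."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (N - 1 denominator)
    return [target_mean + (v - m) * target_sd / sd for v in values]

# Arbitrary starting list with N = 804 entries (an assumption for the demo).
start = [float(i) for i in range(1, 805)]
ys = rescale_to(start, 17_135, 139_147)

print(round(statistics.mean(ys)), round(statistics.stdev(ys)))
print(min(ys) < 0)  # True here: the rescaled list contains negative values
```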
statistics standard-deviation means
asked Nov 21 '18 at 3:13
David
3 Answers
The solution is probably not unique, and you would want to find it numerically. I would use the approach behind the Datasaurus dataset.
The first step is to find $N$. From the first two equations you get $N \approx 804$. Since $N$ does not come out as an exact integer, that is a first indication that these numbers are only approximations. The last equation gives you $\overline{y^2}$. Now choose $y_1 = 19$ and $y_{804} = 2627319$. You can now recalculate $\bar y$ and $\overline{y^2}$ without those values. Put $401$ values on the median and the remaining $401$ at a single value chosen so that the average (or the sum) is your desired value. Obviously, $\overline{y^2}$ is going to be wrong. Move one value from the median down, somewhere in the lower 5th percentile. To keep the same average, you must move at least one value from the upper group upward. Check whether moving one value or two values higher improves your $\overline{y^2}$. Repeat this procedure until all your conditions are met.
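The first steps, recovering $N$ and the mean of the squares, can be checked with a few lines of Python. This is a small sketch that uses only the figures quoted in the question; the variable names are illustrative. It relies on the identity $\sum y_i^2 = (N-1)s^2 + N\bar y^2$, and it also computes the sum and sum of squares left over for the other $N - 2$ values once the minimum and maximum are fixed.

```python
total = 13_776_663
y_mean = 17_135
y_sd = 139_147

n_exact = total / y_mean   # not an integer, so the inputs must be rounded
N = round(n_exact)

# Sum of squares implied by the sample variance:
# sum(y^2) = (N - 1) * s^2 + N * ybar^2
sum_sq = (N - 1) * y_sd**2 + N * y_mean**2
mean_sq = sum_sq / N       # the mean of y squared

# Budget left for the remaining N - 2 values after fixing min and max.
rest_sum = N * y_mean - 19 - 2_627_319
rest_sum_sq = sum_sq - 19**2 - 2_627_319**2

print(n_exact, N)
print(mean_sq, rest_sum, rest_sum_sq)
```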
answered Nov 21 '18 at 3:51
Andrei
Follow the link to the code on AutoDesk Research.
– Andrei
Nov 21 '18 at 3:53
I like this approach. If I succeed, I will mark your answer as the correct answer (of course, if there is no better answer).
– David
Nov 21 '18 at 3:58
So every number is definitely positive, because the lowest is 19.
This seems tractable to me, assuming it's possible. My recommendation is to simply start with an arbitrary list satisfying the list of conditions at the bottom of the question (the minimum, the percentiles, and the maximum). These can be thought of as fixed "milestones". Then simply move the other numbers around until you satisfy the mean and standard deviation.
By choosing which elements to move (i.e. the largest elements, the smallest elements, or ones in the middle), and whether to move them up or down, you can increase or decrease the mean and standard deviation as necessary. With some thought (or some experimentation), you'll be able to figure out what to do from here.
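One way to experiment with this idea is a simple random search, sketched below. It is an illustration under assumptions, not a method given in this answer: the function `tune_spread`, its parameters, and the toy data are all hypothetical. Each step raises one value by $d$ and lowers another by $d$, so the mean never changes, and the move is kept only if the sum of squares gets closer to the value implied by the target standard deviation. Indices listed in `frozen` play the role of the fixed milestones. The sketch keeps values inside `[lo, hi]` but does not try to preserve percentiles.

```python
import random
import statistics

def tune_spread(values, target_sd, lo, hi, frozen=(), steps=20_000, seed=0):
    """Mean-preserving random moves that pull the sample sd toward target_sd."""
    rng = random.Random(seed)
    ys = list(values)
    n = len(ys)
    mean = sum(ys) / n
    # With the mean fixed, the sample sd is determined by the sum of squares.
    target_ss = (n - 1) * target_sd**2 + n * mean**2
    ss = sum(v * v for v in ys)
    frozen_set = set(frozen)
    free = [i for i in range(n) if i not in frozen_set]
    for _ in range(steps):
        i, j = rng.sample(free, 2)
        room = min(hi - ys[i], ys[j] - lo)  # keeps both values in [lo, hi]
        if room <= 0:
            continue
        d = rng.uniform(0, room)
        # Change in the sum of squares when ys[i] += d and ys[j] -= d.
        new_ss = ss + 2 * d * (ys[i] - ys[j]) + 2 * d * d
        if abs(new_ss - target_ss) < abs(ss - target_ss):
            ys[i] += d
            ys[j] -= d
            ss = new_ss
    return ys

# Toy example: 50 equal values, first one frozen, target sample sd of 3.
out = tune_spread([10.0] * 50, target_sd=3.0, lo=1.0, hi=100.0, frozen=[0])
print(statistics.mean(out), statistics.stdev(out), min(out))
```

Because only pairs are moved, the mean has to be right before the search starts; the search then adjusts the spread alone.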
answered Nov 21 '18 at 3:58
Nate 8
Ok, I managed to find an answer to my question. I wanted to do it numerically and I used Python. Here is the code:

    import random
    import statistics as stat

    from scipy.optimize import root

    random.seed(210)  # fix the random draws so the run is repeatable

    N = 804
    y_mean = 17_135
    y_sd = 139_147
    median = 1668
    lowest = 19
    highest_95 = 30295

    l_nums = []
    rango1 = range(lowest, median)      # integers from the lowest value up to the median
    rango2 = range(median, highest_95)  # integers from the median up to the 95th percentile

    # N/2 values below the median
    for _ in range(N // 2):
        numero = random.choice(rango1)
        l_nums.append(numero)

    # N/2 - 3 values above the median; three slots are left for the
    # highest value and the two numbers solved for below
    for _ in range(N // 2 - 3):
        numero = random.choice(rango2)
        l_nums.append(numero)

    l_nums.append(2627319)  # the highest value
    print(len(l_nums))
    print(stat.mean(l_nums), stat.stdev(l_nums))

    def equations(x):
        # Residuals of the mean and variance conditions for the unknowns x[0], x[1]
        a = sum(l_nums)
        b = sum((v - y_mean)**2 for v in l_nums)
        f = [a + x[0] + x[1] - y_mean * N,
             b + (x[0] - y_mean)**2 + (x[1] - y_mean)**2 - y_sd**2 * (N - 1)]
        return f

    x_sol = root(equations, [5e8, 5e8], method='lm')
    print(x_sol.fun)  # residuals at the solution
    print(x_sol.x)    # the two remaining numbers
    l_nums.extend(x_sol.x)
    print(len(l_nums))
    print(stat.mean(l_nums), stat.stdev(l_nums))

Here is how the code works. First, find $N$; in this case $N = 804$. Create two ranges of integers, one between $19$ and the median, the other between the median and $30{,}295$. In Python:

    rango1 = range(lowest, median)
    rango2 = range(median, highest_95)

From `rango1`, draw $N/2$ numbers at random and put them in a list. Then, from `rango2`, draw $N/2 - 3$ numbers at random and add them to that list. Now you have a list with $801$ numbers. The highest number is $2{,}627{,}319$; add it as well:

    l_nums.append(2627319)

To find the last two numbers, $x$ and $y$, you have to solve two equations,
$$\frac{x+y+a}{N}=\bar y,$$
$$\dfrac{(x-\bar y)^2+(y-\bar y)^2+ b}{N-1}=s^2,$$
where $a$ is the sum of the $802$ numbers already in the list and $b$ is the sum of their squared deviations from $\bar y$. That is done with SciPy. In my case, I have to add the line `random.seed(210)` in order to get the exact results, which depends on the operating system and the computer. Without that line, the results are close.
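The same two equations can also be solved in closed form, which makes it easy to see when an exact answer exists. The sketch below is an alternative to the numerical solver, not what the code above does; the function name `last_two` and the toy data are hypothetical. Writing $u = x - \bar y$ and $v = y - \bar y$, the equations become $u + v = S$ and $u^2 + v^2 = Q$ with $S = N\bar y - a - 2\bar y$ and $Q = (N-1)s^2 - b$, so $(u - v)^2 = 2Q - S^2$. A real solution exists only when $2Q \ge S^2$; otherwise no pair of real numbers satisfies both equations exactly.

```python
import math

def last_two(partial, n, y_mean, y_sd):
    """Two values that complete `partial` to length n with the given mean and
    sample sd, or None if no real pair satisfies both conditions."""
    a = sum(partial)
    b = sum((v - y_mean) ** 2 for v in partial)
    s = n * y_mean - a - 2 * y_mean   # (x - mean) + (y - mean)
    q = (n - 1) * y_sd**2 - b         # (x - mean)^2 + (y - mean)^2
    disc = 2 * q - s * s              # equals ((x - mean) - (y - mean))^2
    if disc < 0:
        return None
    half_gap = math.sqrt(disc) / 2
    return y_mean + s / 2 - half_gap, y_mean + s / 2 + half_gap

# Toy check: [1, 2, 3, 6, 8] has mean 4 and sample variance 8.5.
pair = last_two([1.0, 2.0, 3.0], 5, 4.0, math.sqrt(8.5))
print(pair)

# An infeasible request: the partial list is already too spread out.
print(last_two([1.0, 2.0, 3.0], 5, 4.0, 1.0))
```

Checking the sign of `disc` before calling a solver shows whether a given random draw of the first $802$ numbers can be completed exactly.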
answered Nov 22 '18 at 1:31
David