What does preprocessing.scale() do? How does it work?

Python 3.5, preprocessing from sklearn

df = quandl.get('WIKI/GOOGL')

X = np.array(df)

X = preprocessing.scale(X)

edited Feb 19 '17 at 8:41

Chris Martin

23.6k450106

asked Feb 19 '17 at 8:39

0x Tps

3218

Have you looked at the documentation?
– Chris Martin
Feb 19 '17 at 8:42

Yeah, but I can't understand what it is doing to the values of X.
– 0x Tps
Feb 19 '17 at 9:04

I believe it subtracts the mean and divides by the standard deviation of your dataset along a given axis.
– pbreach
Feb 19 '17 at 9:22

Here is another link that can help.
– Ganesh_
Sep 23 '17 at 16:04

python python-3.x machine-learning scikit-learn


2 Answers

The preprocessing.scale() function puts your data on one scale. This is helpful with largely sparse datasets, that is, in simple words, when your data is vastly spread out. For example, the values of X may look like this:

X = [1, 4, 400, 10000, 100000]

The issue with sparsity is that the data is very biased or, in statistical terms, skewed. Scaling therefore brings all your values onto one scale, eliminating the sparsity. As for how it works mathematically, it follows the same concept as normalization and standardization: by default, each column has its mean subtracted and is then divided by its standard deviation. You can research those terms for more detail, but to make life simpler, the sklearn function does everything for you!
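As a rough sketch of the arithmetic, the snippet below standardizes the example values by hand and compares the result with preprocessing.scale called with its default arguments. It assumes numpy and scikit-learn are installed; the variable names (manual, scaled) are illustrative only.

```python
import numpy as np
from sklearn import preprocessing

# Example values from the answer, treated as one feature (a single column).
X = np.array([1, 4, 400, 10000, 100000], dtype=float).reshape(-1, 1)

# Manual standardization: subtract the column mean, then divide by the
# population standard deviation (ddof=0, numpy's default).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

# scikit-learn's helper with default arguments (works column by column).
scaled = preprocessing.scale(X)

print(np.allclose(manual, scaled))   # do both approaches agree?
print(scaled.ravel())                # the standardized values
print(scaled.mean(axis=0), scaled.std(axis=0))
```

The scaled column should come out with a mean of about 0 and a standard deviation of about 1, while the ordering and relative spacing of the values stay the same.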

edited Feb 19 '17 at 20:51

answered Feb 19 '17 at 20:45

Deepak M

3111415

After scaling this data will still be skewed. It will just be a lot closer to zero. Also an array of numbers cannot be biased unless there is some ground truth this is trying to represent.
– Richard Rast
Dec 4 '18 at 18:20
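The point about skew can be checked numerically. The sketch below uses a hand-written skewness helper (a hypothetical function, not part of sklearn) and standardizes by hand, which is assumed here to match what preprocessing.scale does by default.

```python
import numpy as np

def skewness(values):
    """Skewness as the mean cubed z-score (population moments)."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))

X = np.array([1, 4, 400, 10000, 100000], dtype=float)

# Standardize by hand: subtract the mean, divide by the standard deviation.
X_scaled = (X - X.mean()) / X.std()

# Shifting and rescaling by a positive constant does not change skewness.
print(skewness(X), skewness(X_scaled))
```

Both printed numbers should agree up to floating-point error, which illustrates that scaling changes the location and spread of the data but not the shape of its distribution.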


Scaling the data brings all your values onto one scale, eliminating the sparsity, and it follows the same concept as normalization and standardization.
To see the effect, you can call describe on the DataFrame before and after processing:

df.describe()

# Here X has already been preprocessed with preprocessing.scale()
df2 = pandas.DataFrame(X)
df2.describe()

You will see that df2 has a mean of approximately 0 and a standard deviation of approximately 1 in each column.
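For a version that runs without downloading the stock data, here is a minimal sketch that uses a synthetic DataFrame as a stand-in. The column names, sizes, and random values are made up for illustration; numpy, pandas, and scikit-learn are assumed to be installed.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Synthetic stand-in for the stock data: two columns on very different scales.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.normal(loc=800.0, scale=50.0, size=200),
    "volume": rng.normal(loc=2e6, scale=3e5, size=200),
})

# Standardize each column, then wrap the result in a DataFrame again.
X = preprocessing.scale(np.array(df))
df2 = pd.DataFrame(X, columns=df.columns)

# Compare the summary statistics before and after scaling.
print(df.describe().loc[["mean", "std"]])
print(df2.describe().loc[["mean", "std"]])
```

Note that describe reports the sample standard deviation (ddof=1), whereas the scaling divides by the population standard deviation (ddof=0), so the reported std is slightly above 1 rather than exactly 1.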

answered Nov 19 '18 at 21:05

T D Nguyen

3,28222347
