Why uppercase for $X$ and lowercase for $y$?

Why is it that most of the time (on many websites, in articles, or in demonstrations) the feature variables (columns) are denoted by an upper-case 'X', whereas the target variable is a lower-case 'y'?

It looks more like a coding standard to me. For example:

    X = df.iloc[:, :-1]  # every column except the last: the features
    y = df.iloc[:, -1]   # the last column: the target

Just curious because I hardly ever use just a single letter to represent a variable storing meaningful data.

asked Jan 27 at 15:39 by ranit.b; edited Feb 19 at 12:20 by amoeba

Consistency with linear algebra notation, I guess? The features usually form a matrix (typically denoted by uppercase), whereas the labels are usually 1-D, forming a column vector (typically denoted by lowercase).
– galoosh33, Jan 27 at 15:49

"I hardly ever use just a single letter to represent a variable storing meaningful data." The problem is that in common math typesetting, product signs are not written. Thus, it is unclear whether an expression $pi$ represents a single variable called "pi" or the product of two separate variables "p" and "i", $pi = p \times i$. To avoid this confusion, math-heavy disciplines very rarely use variables containing multiple letters. (When you implement an algorithm, yes, it is very good practice to replace single-letter variables with multi-letter ones, if only for easier search and replace.)
– Stephan Kolassa, Jan 27 at 21:21

@StephanKolassa But in programming, a variable named pi never means a product of p and i (unless, of course, you write pi = p * i or similar). Even in languages that allow juxtaposition for products, the factors are spaced out (p i, for example). (I personally think that allowing omission of the product symbol was one of the biggest blunders in the history of math notation, since it restricts variable names to one letter, so more or less meaningful designations are not possible without using indices; and the one-letter names quickly run out, which forces us to use Greek, Gothic, etc.)
– trolley813, Jan 28 at 7:58

@trolley813: The OP was explicitly asking about "websites, articles or demonstration", not programming, but then proceeded to note (correctly) that this is not the convention in programming. There is simply a category confusion here, which I pointed out.
– Stephan Kolassa, Jan 28 at 9:02
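The point raised in the comments, that implementations often swap single-letter math names for descriptive ones, can be sketched as follows. This is only an illustration: the toy DataFrame, its column names, and the names `feature_matrix` and `target_vector` are hypothetical and do not come from the question, and the last column is assumed to hold the target.

```python
import pandas as pd

# Hypothetical toy data; by assumption the last column is the target.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "age_years": [30, 41, 25, 37],
    "weight_kg": [52.0, 61.0, 68.0, 80.0],
})

# Math-style names, as in the question.
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# The same split with descriptive names, which are easier to search for.
feature_matrix = df.iloc[:, :-1]   # DataFrame with 4 rows and 2 columns
target_vector = df.iloc[:, -1]     # Series with 4 entries

print(feature_matrix.shape, target_vector.shape)  # (4, 2) (4,)
```

Both spellings hold exactly the same data; which one reads better probably depends on whether the surrounding code follows a paper's notation or a codebase's naming rules.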

machine-learning classification python cross-validation scikit-learn

2 Answers

The question of why $X$ and $y$ are popular choices in mathematical notation has been answered on the History of Science and Mathematics SE site: "Why are X and Y commonly used as mathematical placeholders?" (In short: because Descartes said so!)

In terms of Linear Algebra, it is extremely common to use capital Latin letters for matrices (e.g. design matrix $X$) and lowercase Latin letters for vectors (response vector $y$). Standard textbooks on the use of matrices in Statistics (e.g. Matrix Algebra Useful for Statistics by Searle, Matrix Algebra From a Statistician's Perspective by Harville and Matrix Algebra: Theory, Computations, and Applications in Statistics by Gentle) utilise this convention too, so it has become a standard way to denote things.
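As a minimal sketch of this convention, the design matrix `X` below is a two-dimensional array with one row per observation, while the response `y` is a one-dimensional vector. The data values and the name `beta` are made up for illustration and are not part of the original answer.

```python
import numpy as np

# Hypothetical data: 4 observations, an intercept column plus one feature.
X = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [1.0, 2.0],
    [1.0, 3.0],
])                                   # design matrix, shape (4, 2)
y = np.array([1.0, 3.0, 5.0, 7.0])   # response vector, shape (4,)

# Ordinary least squares: find beta minimising ||X @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(X.ndim, y.ndim)  # 2 1
print(beta)            # approximately [1, 2], since y = 1 + 2 * feature here
```

Written this way, the code mirrors the textbook expression $y = X\beta$ symbol for symbol, which is likely one reason the upper-case/lower-case pair carried over into data-science code.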

answered Jan 27 at 17:51 by usεr11852; edited Jan 27 at 20:41

Before you collect any data values on the feature and target variables, these variables can be considered to be random variables provided a random mechanism will be used to select the subjects who will generate these values. In that case, the correct notation for these variables is Y and X (i.e., upper case letters for both).

Recall that the value of a random variable is unknown prior to collecting the data, though its behaviour in the long run can be predicted using probability laws. However, once we collect the data, that value becomes known.

After you collect all desired data values on the feature and target variables, you can use the lower case notation to denote the collection of data values corresponding to the target variable (y) and the feature variables (x). If you have a single feature variable, x is a vector of data values. If you have multiple feature variables, x is a matrix of data values, having one column per feature variable. Usually, y is a vector of data values.

So the upper case notation refers to "random (hence unknown)", while the lower case notation refers to "known". Alternatively, the upper case notation refers to "before collecting the data", while the lower case notation refers to "after collecting the data".
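To illustrate the distinction, a common way of writing a regression function uses both cases at once. The linear form on the right-hand side is only an assumed example model, not something stated in this answer:

$$
\mathbb{E}[\,Y \mid X = x\,] = \beta_0 + \beta_1 x
$$

Here $Y$ and $X$ are the random variables (before the data are collected), while $x$ is a particular observed value of the feature.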

Sadly, the literature is not at all consistent in the use of this notation, which is why you see the (y,X) notation you mention in your question.

answered Jan 27 at 17:27 by Isabella Ghement
