Why uppercase for $X$ and lowercase for $y$?












2












$begingroup$


Why is it most of the time (in many websites, articles or demonstration) the feature variable (columns) is denoted by a upper-case 'X' whereas the target variable is a lower-case 'y'?



Looks more like a coding standard to me.
Ex.



X = df.iloc[:, :-1]
y = df.iloc[:, -1]


Just curious because I hardly ever use just a single letter to represent a variable storing meaningful data.










share|cite|improve this question











$endgroup$








  • 5




    $begingroup$
    Consistency with linear algebra notation, I guess? The features usually form a matrix (typically denoted by uppercase) whereas the labels are usually 1d, forming a column vector (typically denoted by lowercase).
    $endgroup$
    – galoosh33
    Jan 27 at 15:49






  • 1




    $begingroup$
    "I hardly ever use just a single letter to represent a variable storing meaningful data." - the problem is that in common math typesetting, product signs are not written. Thus, it is unclear whether an expression $pi$ represents a single variable called "pi", or the product of two separate variables "p" and "i", $pi=ptimes i$. To avoid this confusion, math-heavy disciplines very rarely use variables containing multiple letters. (When you implement an algorithm, yes, it is very good practice to replace single-letter variables by multi-letter ones, if only for easier search&replace.)
    $endgroup$
    – Stephan Kolassa
    Jan 27 at 21:21










  • $begingroup$
    @StephanKolassa But in programming, a variable named pi does never mean a product of p and i (of course, unless you write pi = p * i or similarly). Even in the languages that allow juxtaposition for product they will be spaced out (p i, for example). (I personally think that allowing omission of the product symbol was of the strongest blunders in math notation history, since it restricts the variable name to one letter, so more or less meaningful designations are not possible without using indices; and the 1-letter names quickly run out, which forces us to use Greek, Gothic etc.)
    $endgroup$
    – trolley813
    Jan 28 at 7:58






  • 1




    $begingroup$
    @trolley813: the OP was explicitly asking about "websites , articles or demonstration", not programming - but then proceeded to noting (correctly) that this is not the convention in programming. There is simply a category confusion here, which I pointed out.
    $endgroup$
    – Stephan Kolassa
    Jan 28 at 9:02
















2












$begingroup$


Why is it most of the time (in many websites, articles or demonstration) the feature variable (columns) is denoted by a upper-case 'X' whereas the target variable is a lower-case 'y'?



Looks more like a coding standard to me.
Ex.



X = df.iloc[:, :-1]
y = df.iloc[:, -1]


Just curious because I hardly ever use just a single letter to represent a variable storing meaningful data.










share|cite|improve this question











$endgroup$








  • 5




    $begingroup$
    Consistency with linear algebra notation, I guess? The features usually form a matrix (typically denoted by uppercase) whereas the labels are usually 1d, forming a column vector (typically denoted by lowercase).
    $endgroup$
    – galoosh33
    Jan 27 at 15:49






  • 1




    $begingroup$
    "I hardly ever use just a single letter to represent a variable storing meaningful data." - the problem is that in common math typesetting, product signs are not written. Thus, it is unclear whether an expression $pi$ represents a single variable called "pi", or the product of two separate variables "p" and "i", $pi=ptimes i$. To avoid this confusion, math-heavy disciplines very rarely use variables containing multiple letters. (When you implement an algorithm, yes, it is very good practice to replace single-letter variables by multi-letter ones, if only for easier search&replace.)
    $endgroup$
    – Stephan Kolassa
    Jan 27 at 21:21










  • $begingroup$
    @StephanKolassa But in programming, a variable named pi does never mean a product of p and i (of course, unless you write pi = p * i or similarly). Even in the languages that allow juxtaposition for product they will be spaced out (p i, for example). (I personally think that allowing omission of the product symbol was of the strongest blunders in math notation history, since it restricts the variable name to one letter, so more or less meaningful designations are not possible without using indices; and the 1-letter names quickly run out, which forces us to use Greek, Gothic etc.)
    $endgroup$
    – trolley813
    Jan 28 at 7:58






  • 1




    $begingroup$
    @trolley813: the OP was explicitly asking about "websites , articles or demonstration", not programming - but then proceeded to noting (correctly) that this is not the convention in programming. There is simply a category confusion here, which I pointed out.
    $endgroup$
    – Stephan Kolassa
    Jan 28 at 9:02














2












2








2


0



$begingroup$


Why is it most of the time (in many websites, articles or demonstration) the feature variable (columns) is denoted by a upper-case 'X' whereas the target variable is a lower-case 'y'?



Looks more like a coding standard to me.
Ex.



X = df.iloc[:, :-1]
y = df.iloc[:, -1]


Just curious because I hardly ever use just a single letter to represent a variable storing meaningful data.










share|cite|improve this question











$endgroup$




Why is it most of the time (in many websites, articles or demonstration) the feature variable (columns) is denoted by a upper-case 'X' whereas the target variable is a lower-case 'y'?



Looks more like a coding standard to me.
Ex.



X = df.iloc[:, :-1]
y = df.iloc[:, -1]


Just curious because I hardly ever use just a single letter to represent a variable storing meaningful data.







machine-learning classification python cross-validation scikit-learn






share|cite|improve this question















share|cite|improve this question













share|cite|improve this question




share|cite|improve this question








edited Feb 19 at 12:20









amoeba

61.5k15206266




61.5k15206266










asked Jan 27 at 15:39









ranit.branit.b

286




286








  • 5




    $begingroup$
    Consistency with linear algebra notation, I guess? The features usually form a matrix (typically denoted by uppercase) whereas the labels are usually 1d, forming a column vector (typically denoted by lowercase).
    $endgroup$
    – galoosh33
    Jan 27 at 15:49






  • 1




    $begingroup$
    "I hardly ever use just a single letter to represent a variable storing meaningful data." - the problem is that in common math typesetting, product signs are not written. Thus, it is unclear whether an expression $pi$ represents a single variable called "pi", or the product of two separate variables "p" and "i", $pi=ptimes i$. To avoid this confusion, math-heavy disciplines very rarely use variables containing multiple letters. (When you implement an algorithm, yes, it is very good practice to replace single-letter variables by multi-letter ones, if only for easier search&replace.)
    $endgroup$
    – Stephan Kolassa
    Jan 27 at 21:21










  • $begingroup$
    @StephanKolassa But in programming, a variable named pi does never mean a product of p and i (of course, unless you write pi = p * i or similarly). Even in the languages that allow juxtaposition for product they will be spaced out (p i, for example). (I personally think that allowing omission of the product symbol was of the strongest blunders in math notation history, since it restricts the variable name to one letter, so more or less meaningful designations are not possible without using indices; and the 1-letter names quickly run out, which forces us to use Greek, Gothic etc.)
    $endgroup$
    – trolley813
    Jan 28 at 7:58






  • 1




    $begingroup$
    @trolley813: the OP was explicitly asking about "websites , articles or demonstration", not programming - but then proceeded to noting (correctly) that this is not the convention in programming. There is simply a category confusion here, which I pointed out.
    $endgroup$
    – Stephan Kolassa
    Jan 28 at 9:02














  • 5




    $begingroup$
    Consistency with linear algebra notation, I guess? The features usually form a matrix (typically denoted by uppercase) whereas the labels are usually 1d, forming a column vector (typically denoted by lowercase).
    $endgroup$
    – galoosh33
    Jan 27 at 15:49






  • 1




    $begingroup$
    "I hardly ever use just a single letter to represent a variable storing meaningful data." - the problem is that in common math typesetting, product signs are not written. Thus, it is unclear whether an expression $pi$ represents a single variable called "pi", or the product of two separate variables "p" and "i", $pi=ptimes i$. To avoid this confusion, math-heavy disciplines very rarely use variables containing multiple letters. (When you implement an algorithm, yes, it is very good practice to replace single-letter variables by multi-letter ones, if only for easier search&replace.)
    $endgroup$
    – Stephan Kolassa
    Jan 27 at 21:21










  • $begingroup$
    @StephanKolassa But in programming, a variable named pi does never mean a product of p and i (of course, unless you write pi = p * i or similarly). Even in the languages that allow juxtaposition for product they will be spaced out (p i, for example). (I personally think that allowing omission of the product symbol was of the strongest blunders in math notation history, since it restricts the variable name to one letter, so more or less meaningful designations are not possible without using indices; and the 1-letter names quickly run out, which forces us to use Greek, Gothic etc.)
    $endgroup$
    – trolley813
    Jan 28 at 7:58






  • 1




    $begingroup$
    @trolley813: the OP was explicitly asking about "websites , articles or demonstration", not programming - but then proceeded to noting (correctly) that this is not the convention in programming. There is simply a category confusion here, which I pointed out.
    $endgroup$
    – Stephan Kolassa
    Jan 28 at 9:02








5




5




$begingroup$
Consistency with linear algebra notation, I guess? The features usually form a matrix (typically denoted by uppercase) whereas the labels are usually 1d, forming a column vector (typically denoted by lowercase).
$endgroup$
– galoosh33
Jan 27 at 15:49




$begingroup$
Consistency with linear algebra notation, I guess? The features usually form a matrix (typically denoted by uppercase) whereas the labels are usually 1d, forming a column vector (typically denoted by lowercase).
$endgroup$
– galoosh33
Jan 27 at 15:49




1




1




$begingroup$
"I hardly ever use just a single letter to represent a variable storing meaningful data." - the problem is that in common math typesetting, product signs are not written. Thus, it is unclear whether an expression $pi$ represents a single variable called "pi", or the product of two separate variables "p" and "i", $pi=ptimes i$. To avoid this confusion, math-heavy disciplines very rarely use variables containing multiple letters. (When you implement an algorithm, yes, it is very good practice to replace single-letter variables by multi-letter ones, if only for easier search&replace.)
$endgroup$
– Stephan Kolassa
Jan 27 at 21:21




$begingroup$
"I hardly ever use just a single letter to represent a variable storing meaningful data." - the problem is that in common math typesetting, product signs are not written. Thus, it is unclear whether an expression $pi$ represents a single variable called "pi", or the product of two separate variables "p" and "i", $pi=ptimes i$. To avoid this confusion, math-heavy disciplines very rarely use variables containing multiple letters. (When you implement an algorithm, yes, it is very good practice to replace single-letter variables by multi-letter ones, if only for easier search&replace.)
$endgroup$
– Stephan Kolassa
Jan 27 at 21:21












$begingroup$
@StephanKolassa But in programming, a variable named pi does never mean a product of p and i (of course, unless you write pi = p * i or similarly). Even in the languages that allow juxtaposition for product they will be spaced out (p i, for example). (I personally think that allowing omission of the product symbol was of the strongest blunders in math notation history, since it restricts the variable name to one letter, so more or less meaningful designations are not possible without using indices; and the 1-letter names quickly run out, which forces us to use Greek, Gothic etc.)
$endgroup$
– trolley813
Jan 28 at 7:58




$begingroup$
@StephanKolassa But in programming, a variable named pi does never mean a product of p and i (of course, unless you write pi = p * i or similarly). Even in the languages that allow juxtaposition for product they will be spaced out (p i, for example). (I personally think that allowing omission of the product symbol was of the strongest blunders in math notation history, since it restricts the variable name to one letter, so more or less meaningful designations are not possible without using indices; and the 1-letter names quickly run out, which forces us to use Greek, Gothic etc.)
$endgroup$
– trolley813
Jan 28 at 7:58




1




1




$begingroup$
@trolley813: the OP was explicitly asking about "websites , articles or demonstration", not programming - but then proceeded to noting (correctly) that this is not the convention in programming. There is simply a category confusion here, which I pointed out.
$endgroup$
– Stephan Kolassa
Jan 28 at 9:02




$begingroup$
@trolley813: the OP was explicitly asking about "websites , articles or demonstration", not programming - but then proceeded to noting (correctly) that this is not the convention in programming. There is simply a category confusion here, which I pointed out.
$endgroup$
– Stephan Kolassa
Jan 28 at 9:02










2 Answers
2






active

oldest

votes


















14












$begingroup$

The question about why $X$ and $y$ are popular choices in mathematical notions has been answered in the History of Science and Mathematics SE website: Why are X and Y commonly used as mathematical placeholders? (In short: cause Descartes said so!)



In terms of Linear Algebra, it is extremely common to use capital Latin letters for matrices (e.g. design matrix $X$) and lowercase Latin letters for vectors (response vector $y$). Standard textbooks on the use of matrices in Statistics (e.g. Matrix Algebra Useful for Statistics by Searle, Matrix Algebra From a Statistician's Perspective by Harville and Matrix Algebra: Theory, Computations, and Applications in Statistics by Gentle) utilise this convention too, so it has become a standard way to denote things.






share|cite|improve this answer











$endgroup$





















    3












    $begingroup$

    Before you collect any data values on the feature and target variables, these variables can be considered to be random variables provided a random mechanism will be used to select the subjects who will generate these values. In that case, the correct notation for these variables is Y and X (i.e., upper case letters for both).



    Recall that the value of a random variable is unknown prior to collecting the data, though its behaviour in the long run can be predicted using probability laws. However, once we collect the data, that value becomes known.



    After you collect all desired data values on the feature and target variables, you can use the lower case notation to denote the collection of data values corresponding to the target variable (y) and the feature variables (x). If you have a single feature variable, x is a vector of data values. If you have multiple feature variables, x is a matrix of data values, having one column per feature variable. Usually, y is a vector of data values.



    So the upper case notation refers to "random (hence unknown)", while the lower case notation refers to "known". Alternatively, the upper case notation refers to "before collecting the data", while the lower case notation refers to "after collecting the data".



    Sadly, the literature is not at all consistent in the use of this notation, which is why you see the (y,X) notation you mention in your question.






    share|cite|improve this answer









    $endgroup$













      Your Answer





      StackExchange.ifUsing("editor", function () {
      return StackExchange.using("mathjaxEditing", function () {
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
      });
      });
      }, "mathjax-editing");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "65"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f389395%2fwhy-uppercase-for-x-and-lowercase-for-y%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      14












      $begingroup$

      The question about why $X$ and $y$ are popular choices in mathematical notions has been answered in the History of Science and Mathematics SE website: Why are X and Y commonly used as mathematical placeholders? (In short: cause Descartes said so!)



      In terms of Linear Algebra, it is extremely common to use capital Latin letters for matrices (e.g. design matrix $X$) and lowercase Latin letters for vectors (response vector $y$). Standard textbooks on the use of matrices in Statistics (e.g. Matrix Algebra Useful for Statistics by Searle, Matrix Algebra From a Statistician's Perspective by Harville and Matrix Algebra: Theory, Computations, and Applications in Statistics by Gentle) utilise this convention too, so it has become a standard way to denote things.






      share|cite|improve this answer











      $endgroup$


















        14












        $begingroup$

        The question about why $X$ and $y$ are popular choices in mathematical notions has been answered in the History of Science and Mathematics SE website: Why are X and Y commonly used as mathematical placeholders? (In short: cause Descartes said so!)



        In terms of Linear Algebra, it is extremely common to use capital Latin letters for matrices (e.g. design matrix $X$) and lowercase Latin letters for vectors (response vector $y$). Standard textbooks on the use of matrices in Statistics (e.g. Matrix Algebra Useful for Statistics by Searle, Matrix Algebra From a Statistician's Perspective by Harville and Matrix Algebra: Theory, Computations, and Applications in Statistics by Gentle) utilise this convention too, so it has become a standard way to denote things.






        share|cite|improve this answer











        $endgroup$
















          14












          14








          14





          $begingroup$

          The question about why $X$ and $y$ are popular choices in mathematical notions has been answered in the History of Science and Mathematics SE website: Why are X and Y commonly used as mathematical placeholders? (In short: cause Descartes said so!)



          In terms of Linear Algebra, it is extremely common to use capital Latin letters for matrices (e.g. design matrix $X$) and lowercase Latin letters for vectors (response vector $y$). Standard textbooks on the use of matrices in Statistics (e.g. Matrix Algebra Useful for Statistics by Searle, Matrix Algebra From a Statistician's Perspective by Harville and Matrix Algebra: Theory, Computations, and Applications in Statistics by Gentle) utilise this convention too, so it has become a standard way to denote things.






          share|cite|improve this answer











          $endgroup$



          The question about why $X$ and $y$ are popular choices in mathematical notions has been answered in the History of Science and Mathematics SE website: Why are X and Y commonly used as mathematical placeholders? (In short: cause Descartes said so!)



          In terms of Linear Algebra, it is extremely common to use capital Latin letters for matrices (e.g. design matrix $X$) and lowercase Latin letters for vectors (response vector $y$). Standard textbooks on the use of matrices in Statistics (e.g. Matrix Algebra Useful for Statistics by Searle, Matrix Algebra From a Statistician's Perspective by Harville and Matrix Algebra: Theory, Computations, and Applications in Statistics by Gentle) utilise this convention too, so it has become a standard way to denote things.







          share|cite|improve this answer














          share|cite|improve this answer



          share|cite|improve this answer








          edited Jan 27 at 20:41

























          answered Jan 27 at 17:51









          usεr11852usεr11852

          19.5k14275




          19.5k14275

























              3












              $begingroup$

              Before you collect any data values on the feature and target variables, these variables can be considered to be random variables provided a random mechanism will be used to select the subjects who will generate these values. In that case, the correct notation for these variables is Y and X (i.e., upper case letters for both).



              Recall that the value of a random variable is unknown prior to collecting the data, though its behaviour in the long run can be predicted using probability laws. However, once we collect the data, that value becomes known.



              After you collect all desired data values on the feature and target variables, you can use the lower case notation to denote the collection of data values corresponding to the target variable (y) and the feature variables (x). If you have a single feature variable, x is a vector of data values. If you have multiple feature variables, x is a matrix of data values, having one column per feature variable. Usually, y is a vector of data values.



              So the upper case notation refers to "random (hence unknown)", while the lower case notation refers to "known". Alternatively, the upper case notation refers to "before collecting the data", while the lower case notation refers to "after collecting the data".



              Sadly, the literature is not at all consistent in the use of this notation, which is why you see the (y,X) notation you mention in your question.






              share|cite|improve this answer









              $endgroup$


















                3












                $begingroup$

                Before you collect any data values on the feature and target variables, these variables can be considered to be random variables provided a random mechanism will be used to select the subjects who will generate these values. In that case, the correct notation for these variables is Y and X (i.e., upper case letters for both).



                Recall that the value of a random variable is unknown prior to collecting the data, though its behaviour in the long run can be predicted using probability laws. However, once we collect the data, that value becomes known.



                After you collect all desired data values on the feature and target variables, you can use the lower case notation to denote the collection of data values corresponding to the target variable (y) and the feature variables (x). If you have a single feature variable, x is a vector of data values. If you have multiple feature variables, x is a matrix of data values, having one column per feature variable. Usually, y is a vector of data values.



                So the upper case notation refers to "random (hence unknown)", while the lower case notation refers to "known". Alternatively, the upper case notation refers to "before collecting the data", while the lower case notation refers to "after collecting the data".



                Sadly, the literature is not at all consistent in the use of this notation, which is why you see the (y,X) notation you mention in your question.






                share|cite|improve this answer









                $endgroup$
















                  3












                  3








                  3





                  $begingroup$

                  Before you collect any data values on the feature and target variables, these variables can be considered to be random variables provided a random mechanism will be used to select the subjects who will generate these values. In that case, the correct notation for these variables is Y and X (i.e., upper case letters for both).



                  Recall that the value of a random variable is unknown prior to collecting the data, though its behaviour in the long run can be predicted using probability laws. However, once we collect the data, that value becomes known.



                  After you collect all desired data values on the feature and target variables, you can use the lower case notation to denote the collection of data values corresponding to the target variable (y) and the feature variables (x). If you have a single feature variable, x is a vector of data values. If you have multiple feature variables, x is a matrix of data values, having one column per feature variable. Usually, y is a vector of data values.



                  So the upper case notation refers to "random (hence unknown)", while the lower case notation refers to "known". Alternatively, the upper case notation refers to "before collecting the data", while the lower case notation refers to "after collecting the data".



                  Sadly, the literature is not at all consistent in the use of this notation, which is why you see the (y,X) notation you mention in your question.






                  share|cite|improve this answer









                  $endgroup$



                  Before you collect any data values on the feature and target variables, these variables can be considered to be random variables provided a random mechanism will be used to select the subjects who will generate these values. In that case, the correct notation for these variables is Y and X (i.e., upper case letters for both).



                  Recall that the value of a random variable is unknown prior to collecting the data, though its behaviour in the long run can be predicted using probability laws. However, once we collect the data, that value becomes known.



                  After you collect all desired data values on the feature and target variables, you can use the lower case notation to denote the collection of data values corresponding to the target variable (y) and the feature variables (x). If you have a single feature variable, x is a vector of data values. If you have multiple feature variables, x is a matrix of data values, having one column per feature variable. Usually, y is a vector of data values.



                  So the upper case notation refers to "random (hence unknown)", while the lower case notation refers to "known". Alternatively, the upper case notation refers to "before collecting the data", while the lower case notation refers to "after collecting the data".



                  Sadly, the literature is not at all consistent in the use of this notation, which is why you see the (y,X) notation you mention in your question.







                  share|cite|improve this answer












                  share|cite|improve this answer



                  share|cite|improve this answer










                  answered Jan 27 at 17:27









                  Isabella GhementIsabella Ghement

                  7,558422




                  7,558422






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Cross Validated!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      Use MathJax to format equations. MathJax reference.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f389395%2fwhy-uppercase-for-x-and-lowercase-for-y%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      The term 'EXEC' is not recognized as the name of a cmdlet Powershell

                      NPM command prompt closes immediately [closed]

                      Error binding properties and functions in emscripten