Derivation of Linear Regression using Normal Equations
I was going through Andrew Ng's course on ML and had a doubt regarding one of the steps while deriving the solution for linear regression using the normal equations.
Normal equation: $\theta=(X^TX)^{-1}X^TY$
While deriving, there's this step:
$\frac{\delta}{\delta\theta}\theta^TX^TX\theta = X^TX\frac{\delta}{\delta\theta}\theta^T\theta$
But doesn't pulling $X^TX$ out of the derivative like that require matrix multiplication to be commutative?
Thanks
matrix-calculus linear-regression
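As a quick numerical sanity check of the normal equation above, here is a minimal sketch (assuming `numpy` and small random test data, none of which come from the original post) comparing the closed-form solution with numpy's least-squares solver:

```python
# Minimal sketch: verify theta = (X^T X)^{-1} X^T Y against numpy's lstsq.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))   # design matrix: 100 samples, 4 features
Y = rng.normal(size=100)        # targets

theta_normal_eq = np.linalg.inv(X.T @ X) @ X.T @ Y    # closed-form normal equation
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)   # reference solution

print(np.allclose(theta_normal_eq, theta_lstsq))      # expect True
```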
asked Jan 13 at 19:13 by Rish1618
2 Answers
Given two symmetric matrices $(A, B)$, consider the following scalar functions and their gradients
$$\eqalign{
\alpha &= \theta^TA\theta &\implies \frac{\partial\alpha}{\partial\theta}=2A\theta \cr
\beta &= \theta^TB\theta &\implies \frac{\partial\beta}{\partial\theta}=2B\theta \cr
}$$
It's not terribly illuminating, but you can write the second gradient in terms of the first, i.e.
$$\frac{\partial\beta}{\partial\theta} = BA^{-1}\frac{\partial\alpha}{\partial\theta}$$
For the purposes of your question, $A=I$ and $B=X^TX$.
answered Jan 13 at 21:30 by greg
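To convince yourself numerically that $\frac{\partial}{\partial\theta}\theta^TA\theta = 2A\theta$ for symmetric $A$, here is a minimal finite-difference sketch (assuming `numpy`; the matrix and vector are small random test values):

```python
# Minimal sketch: finite-difference check that the gradient of theta^T A theta
# equals 2*A*theta when A is symmetric.
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.normal(size=(n, n))
A = (M + M.T) / 2                      # symmetrize A
theta = rng.normal(size=n)

f = lambda t: t @ A @ t                # the scalar quadratic form

eps = 1e-6
grad_fd = np.array([
    (f(theta + eps * np.eye(n)[k]) - f(theta - eps * np.eye(n)[k])) / (2 * eps)
    for k in range(n)
])

print(np.allclose(grad_fd, 2 * A @ theta, atol=1e-5))   # expect True
```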
Although that equality is true, it does not give insight into why it is true.
There are many ways to compute that gradient, but here is a direct approach that simply computes all the partial derivatives individually.
Let $A$ be a symmetric matrix. (In your context, $A = X^\top X$.)
The partial derivative of $\theta^\top A \theta = \sum_i \sum_j A_{ij} \theta_i \theta_j$ with respect to $\theta_k$ is
$$\frac{\partial}{\partial \theta_k} \theta^\top A \theta = \sum_i \sum_j A_{ij} \frac{\partial}{\partial \theta_k}(\theta_i \theta_j) = A_{kk} \cdot 2 \theta_k + \sum_{i \ne k} A_{ik} \theta_i + \sum_{j \ne k} A_{kj} \theta_j = 2\sum_i A_{ki} \theta_i = 2 (A \theta)_k$$
Stacking the partial derivatives into a vector gives you the gradient, so
$$\nabla_\theta\, \theta^\top A \theta = 2 A \theta.$$
answered Jan 13 at 19:29 by angryavian
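Here is a minimal sketch (assuming `numpy`; the data are small random test values) that mirrors the componentwise computation above and checks each partial derivative against $2(A\theta)_k$:

```python
# Minimal sketch: reproduce the componentwise partial derivatives and
# check each one equals 2*(A @ theta)[k] for symmetric A.
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.normal(size=(n, n))
A = (M + M.T) / 2                      # symmetric A, as in the answer
theta = rng.normal(size=n)

for k in range(n):
    # A_kk * 2*theta_k + sum_{i != k} A_ik*theta_i + sum_{j != k} A_kj*theta_j
    partial_k = (
        A[k, k] * 2 * theta[k]
        + sum(A[i, k] * theta[i] for i in range(n) if i != k)
        + sum(A[k, j] * theta[j] for j in range(n) if j != k)
    )
    assert np.isclose(partial_k, 2 * (A @ theta)[k])

print("componentwise partials match 2*(A @ theta)")
```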