Derivative of an Inverse of a Matrix

I have the following loss function.



$$\|\theta - (X^T X)^{-1} X^T y\|_2^2$$



$$X \text{ is a matrix, } \theta \text{ and } y \text{ are known vectors.}$$



I have another constraint for $X$, which is $X = f(\lambda)$ for some function $f$ that I didn't include here.



The idea is that I want to initialize $\lambda$ to some random vector, compute $X = f(\lambda)$, and then use gradient descent or some other iterative method to minimize the loss function above by updating $\lambda$ at each step. However, I am having trouble deriving the gradient of this loss function in a form I could use in such an iterative algorithm.



How would I do this?
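
For concreteness, the loop I have in mind looks roughly like the sketch below, with a central-difference gradient standing in for the analytic gradient I am asking about (the helper names and the way $f$, $\theta$, $y$ are passed around are purely illustrative):

```python
import numpy as np

def loss(lam, f, theta, y):
    """phi(lambda) = ||theta - (X^T X)^{-1} X^T y||_2^2 with X = f(lambda)."""
    X = f(lam)
    theta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X^T X)^{-1} X^T y
    return float(np.sum((theta - theta_hat) ** 2))

def numerical_grad(lam, f, theta, y, eps=1e-6):
    """Central-difference gradient of the loss with respect to lambda."""
    g = np.zeros_like(lam)
    for j in range(lam.size):
        e = np.zeros_like(lam)
        e[j] = eps
        g[j] = (loss(lam + e, f, theta, y)
                - loss(lam - e, f, theta, y)) / (2 * eps)
    return g

def minimize(f, theta, y, k, steps=500, lr=1e-2, seed=0):
    lam = np.random.default_rng(seed).normal(size=k)  # random initial lambda
    for _ in range(steps):
        lam = lam - lr * numerical_grad(lam, f, theta, y)  # descent step
    return lam
```
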
optimization matrix-calculus gradient-descent
asked Nov 20 '18 at 18:10
codemirel

113
  • Wasn't that posted like 15 minutes ago?
    – Rebellos
    Nov 20 '18 at 18:12
1 Answer
Define some new variables
$$\eqalign{
M &= (X^TX)^{-1}X^T \cr
p &= My - \theta \cr
}$$

and their differentials
$$\eqalign{
dM &= (X^TX)^{-1}\,dX^T - (X^TX)^{-1}\,d(X^TX)\,(X^TX)^{-1}X^T \cr
   &= (X^TX)^{-1}\,dX^T - (X^TX)^{-1}\,dX^T\,XM - M\,dX\,M \cr
dp &= dM\,y \cr
}$$
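
As a sanity check, the second line can be compared against a finite-difference directional derivative. A small self-contained sketch (the random $X$ and direction $dX$ are arbitrary test data, not from the original question):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 3
X = rng.normal(size=(n, d))
dX = rng.normal(size=(n, d))   # an arbitrary direction of perturbation
t = 1e-6

def make_M(X):
    return np.linalg.solve(X.T @ X, X.T)   # M = (X^T X)^{-1} X^T

M = make_M(X)
B = np.linalg.inv(X.T @ X)

# Analytic differential of M in the direction dX (second line above)
dM = B @ dX.T - B @ dX.T @ X @ M - M @ dX @ M

# Finite-difference directional derivative for comparison
dM_fd = (make_M(X + t * dX) - make_M(X - t * dX)) / (2 * t)
print(np.allclose(dM, dM_fd, atol=1e-5))   # expect True
```
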

Write the cost function in terms of these new variables.

Then find its differential and gradient.
$$\eqalign{
\phi &= p:p \cr\cr
d\phi
&= 2p:dp \cr
&= 2p:dM\,y \cr
&= 2py^T:dM \cr
&= 2py^T:(X^TX)^{-1}\,dX^T - 2py^T:(X^TX)^{-1}\,dX^T\,XM - 2py^T:M\,dX\,M \cr
&= 2(X^TX)^{-1}py^T:dX^T - 2(X^TX)^{-1}py^TM^TX^T:dX^T - 2M^Tpy^TM^T:dX \cr
&= 2\Big(yp^T(X^TX)^{-1} - XMyp^T(X^TX)^{-1} - M^Tpy^TM^T\Big):dX \cr
&= 2\Big(yp^T(X^TX)^{-1} - XMyp^T(X^TX)^{-1} - M^Tpy^TM^T\Big):\frac{\partial X}{\partial\lambda_k}\,d\lambda_k \cr\cr
\frac{\partial\phi}{\partial\lambda_k}
&= 2\Big(yp^T(X^TX)^{-1} - XMyp^T(X^TX)^{-1} - M^Tpy^TM^T\Big):\frac{\partial X}{\partial\lambda_k} \cr\cr
}$$

The colon is a convenient product notation for the trace, i.e. $\,A:B={\rm Tr}(A^TB)$.

Rules for rearranging terms in a colon product follow from the properties of the trace.
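
For what it's worth, here is a minimal numerical sketch of the final formula, checked against finite differences. The affine map used for $f$ (with made-up parameters `W` and `b`) is only a hypothetical stand-in for the unspecified $f(\lambda)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 8, 3, 5   # X is n x d, lambda has k entries

# Hypothetical stand-in for the unspecified f: a fixed affine map of lambda.
W = rng.normal(size=(n * d, k))
b = rng.normal(size=n * d)

def f(lam):
    return (W @ lam + b).reshape(n, d)   # X = f(lambda)

theta = rng.normal(size=d)
y = rng.normal(size=n)

def phi(lam):
    X = f(lam)
    p = np.linalg.solve(X.T @ X, X.T @ y) - theta   # p = My - theta
    return float(p @ p)

def grad_phi(lam):
    X = f(lam)
    B = np.linalg.inv(X.T @ X)
    M = B @ X.T                 # M = (X^T X)^{-1} X^T
    p = M @ y - theta
    # G = dphi/dX, the matrix appearing in the last line above
    G = 2 * (np.outer(y, p) @ B
             - X @ M @ np.outer(y, p) @ B
             - M.T @ np.outer(p, y) @ M.T)
    # Chain rule: dphi/dlambda_j = G : dX/dlambda_j (Frobenius product)
    return np.array([np.sum(G * W[:, j].reshape(n, d)) for j in range(k)])

lam = rng.normal(size=k)
eps = 1e-6
fd = np.array([(phi(lam + eps * e) - phi(lam - eps * e)) / (2 * eps)
               for e in np.eye(k)])
print(np.allclose(grad_phi(lam), fd, atol=1e-4))   # expect True
```
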
        edited Nov 21 '18 at 4:50
        answered Nov 21 '18 at 2:52
        greg