Matrix derivatives, problem with dimensions


























I'm trying to find the derivative of the function
$$L = f \cdot y, \qquad f = X \cdot W + b$$

Matrix shapes: $X.\mathrm{shape}=(1, m)$, $W.\mathrm{shape}=(m, 10)$, $b.\mathrm{shape}=(1, 10)$, $y.\mathrm{shape}=(10, 1)$.
I'm looking for $\frac{\partial L}{\partial W}$.
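
For concreteness, here is a minimal NumPy sketch of this setup (the value of m and the random data are just placeholders):

    import numpy as np

    m = 5                              # placeholder size
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1, m))    # shape (1, m)
    W = rng.standard_normal((m, 10))   # shape (m, 10)
    b = rng.standard_normal((1, 10))   # shape (1, 10)
    y = rng.standard_normal((10, 1))   # shape (10, 1)

    f = X @ W + b                      # shape (1, 10)
    L = (f @ y).item()                 # scalar loss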



According to the chain rule:
$$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial f} \frac{\partial f}{\partial W}$$

Separately, we can find:
$$\frac{\partial L}{\partial f} = y$$
$$\frac{\partial f}{\partial W} = X$$



The problem is that, according to my formula, the dimensions of $\frac{\partial L}{\partial W}$ come out as $(10, m)$; however, they should coincide with the dimensions of $W$, i.e. $(m, 10)$.



I was also advised to find the differential of $L$:

$$dL = d(f \cdot y) = d(f) \cdot y = d(X \cdot W + b)\, y = X \cdot dW \cdot y$$
But I do not understand how to get the derivative $\frac{\partial L}{\partial W}$ from this.










Tags: matrices, derivatives, chain-rule






asked Jan 13 at 18:48 by Dmitry Denisov

          1 Answer

Let's use a convention where a lowercase Latin letter always represents a column vector, an uppercase Latin letter a matrix, and a Greek letter a scalar.



Using this convention, your equations are
$$\eqalign{
f &= W^Tx + b \cr
\lambda &= f^Ty \cr
}$$

As you have noted, the differential of the scalar function is
$$\eqalign{
d\lambda &= df^Ty = (dW^Tx)^Ty = x^TdW\,y \cr
}$$

Let's develop that a bit further by introducing the trace function:
$$\eqalign{
d\lambda &= {\rm Tr}(x^TdW\,y) = {\rm Tr}(yx^TdW) \cr
}$$

Then, depending on your preferred layout convention, the gradient is either
$$\eqalign{
\frac{\partial\lambda}{\partial W} &= yx^T \quad{\rm or}\quad xy^T \cr
}$$

Since you expected the dimensions of the gradient to be those of $W$, it sounds like your preferred layout is $xy^T$.
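
As a quick numerical sanity check, here is a minimal NumPy sketch (the sizes and the random data are just placeholders) comparing a central-difference approximation of $\partial\lambda/\partial W$ with the outer product $xy^T$:

    import numpy as np

    m, n = 5, 10
    rng = np.random.default_rng(0)
    x = rng.standard_normal((m, 1))   # column vector (the x of this answer, i.e. X^T)
    W = rng.standard_normal((m, n))
    b = rng.standard_normal((n, 1))
    y = rng.standard_normal((n, 1))

    def loss(W):
        f = W.T @ x + b               # shape (n, 1)
        return (f.T @ y).item()       # scalar lambda

    # Central-difference gradient, entry by entry
    eps = 1e-6
    num_grad = np.zeros_like(W)
    for i in range(m):
        for j in range(n):
            Wp = W.copy(); Wp[i, j] += eps
            Wm = W.copy(); Wm[i, j] -= eps
            num_grad[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

    analytic = x @ y.T                # xy^T, shape (m, n) -- same shape as W
    print(np.allclose(num_grad, analytic, atol=1e-5))   # expect True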



Also note that $\frac{\partial f}{\partial W}\neq X$. That gradient is a 3rd order tensor, while $X$ is just a 2nd order tensor (aka a matrix). The presence of these 3rd and 4th order tensors as intermediate quantities in the chain rule can make it difficult or impossible to use in practice.



          The differential approach suggested by your advisor is often simpler because the differential of a matrix is just another matrix quantity, which is easy to handle.
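
To make the step from differential to derivative explicit (the step the question asks about): once the differential has been arranged into the canonical form $d\lambda = {\rm Tr}(G^T\,dW)$, the matrix $G$ is the gradient in the layout whose shape matches $W$. Here

$$d\lambda = {\rm Tr}\big((xy^T)^T\,dW\big) \qquad\Longrightarrow\qquad \frac{\partial\lambda}{\partial W} = xy^T.$$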






answered Jan 14 at 19:41 (edited Jan 14 at 19:53) – greg
























• Thank you very much for your answer, it has become clearer to me now! I have 2 questions about your solution: 1) Do I understand correctly that you introduced the trace function because $d\lambda$ is a scalar, so $\mathrm{scalar} = {\rm Tr}(\mathrm{scalar})$? 2) $\frac{\partial f}{\partial W}$ is a 3rd order tensor. Maybe in this case you know how the chain rule works in neural networks? I get the derivative from the previous layer and should multiply it by the derivative of the current layer according to the chain rule. However, as you mentioned, $\frac{\partial f}{\partial W}$ is now a 3rd order tensor, so how can we apply the chain rule?
  – Dmitry Denisov, Jan 14 at 22:13












• And even if $X$ is a matrix, then $f$ is also a matrix, and we should take a matrix-by-matrix derivative?
  – Dmitry Denisov, Jan 15 at 12:14












• @DmitryDenisov 1) Yes, ${\rm Tr}(\mathrm{scalar}) = \mathrm{scalar}$. 2) The gradient really is a 3rd order tensor. The point is that you never need to calculate 3rd order (vector-by-matrix) or 4th order (matrix-by-matrix) derivatives, and the programs you write will never calculate such quantities either. These online notes are worth a read.
  – greg, Jan 15 at 18:04












• In this row, `nabla_w[-1] = np.dot(delta, activations[-2].transpose())`, they set $\frac{\partial L}{\partial W}$ equal to $X^T \cdot \delta$, so it doesn't look like the chain rule. I.e. in that case they should also have calculated the derivative via the differential on paper; however, they stated that the chain rule is a universal approach.
  – Dmitry Denisov, Jan 16 at 9:57
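
Regarding the last comment, here is a minimal sketch (illustrative names, a single fully connected layer, the answer's column-vector convention) of how backpropagation avoids 3rd order tensors in practice: the upstream derivative $\delta = \partial L/\partial f$ is propagated as an ordinary vector, and the weight gradient is formed directly as an outer product with the layer input.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 5, 10
    x = rng.standard_normal((m, 1))    # layer input (column vector)
    W = rng.standard_normal((m, n))
    b = rng.standard_normal((n, 1))
    y = rng.standard_normal((n, 1))

    # Forward pass: f = W^T x + b,  L = f^T y
    f = W.T @ x + b
    L = (f.T @ y).item()

    # Backward pass -- no 3rd order tensor is ever materialized
    delta = y                          # dL/df for this particular loss
    grad_W = x @ delta.T               # outer product, shape (m, n) == W.shape
    grad_b = delta                     # shape (n, 1) == b.shape
    grad_x = W @ delta                 # passed on to the previous layer, shape (m, 1)

If, as is common, the code quoted in the comment stores $W$ with shape (outputs, inputs), its `nabla_w[-1]` line computes the same outer product with the two factors in the opposite order.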












