$\begingroup$

My question is regarding an assignment our professor gave us. At one point we have to find the gradient of a function $L : \mathbb{R}^{n,n} \times \mathbb{R}^n \rightarrow \mathbb{R}$, $L(M,m) = \|M\|_F^2 + m^T (Mz - g)$.

As differentiation of functions taking matrices as arguments can get incredibly cumbersome, we were instructed to resort to "high level differentiation", i.e. treat matrices and vectors as scalars while keeping in mind the intricacies of not having a scalar. This high level differentiation works most of the time, but in this case it does not.

Specifically, the partial derivative with respect to $M$ is problematic. I would get:

$\partial_ML(M,m) = 2 * M + m^T * I * z = 2 * M + m^T * z$, where $I$ is the identity.

This is obviously incorrect, as can be seen from the dimension mismatch of adding a scalar to a matrix. Unfortunately, the master solution does not give any explanation; it only states:

$\partial_ML(M,m) = 2 * M + m * z^T$.

What am I missing?

Thanks in advance,
Felix

$\endgroup$
  • $\begingroup$ There are two possible questions to answer here: 1) why does it not work in this case, or 2) how do you get to the "master solution"? Both are valid and should be distinguished, and I don't know exactly which one you are after. $\endgroup$ Commented Jan 24, 2020 at 20:08
  • $\begingroup$ You are correct, I should have been more precise. If possible, I would like to know both things, i.e. (1) why it does not work in this case, and (2) how to do it correctly (i.e. how to get to the master solution). $\endgroup$ Commented Jan 24, 2020 at 20:14

1 Answer

$\begingroup$

(1) why it does not work in this case

Matrix multiplication couples its factors in a more complicated way than scalar multiplication does: the factors do not commute, so $M$ cannot simply be dropped from $m^T M z$ the way a scalar factor could. Following the calculation below should make this clearer.
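One way to see where the scalar shortcut breaks is to write the change of $L$ for a small perturbation $dM$ and read off the gradient from the Frobenius inner product $\langle A, B \rangle_F = \sum_{ij} A_{ij} B_{ij}$. This is only a sketch, and the differential notation $dL$, $dM$ is introduced here for illustration; it is not taken from the assignment. \begin{align} dL &= 2 \langle M, dM \rangle_F + m^T (dM)\, z \\ &= 2 \langle M, dM \rangle_F + \sum_{ij} m_i \, dM_{ij} \, z_j \\ &= \langle 2M + m z^T, dM \rangle_F \,. \end{align}

The factor to the left of $M$ stays on the left (transposed) and the factor to the right stays on the right (transposed), so "removing" $M$ from $m^T M z$ leaves the outer product $m z^T$, not the scalar $m^T z$.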

(2) how to do it correctly (i.e. how to get to the master solution)

To calculate the derivative w.r.t. a matrix $M$, one can differentiate w.r.t. a specific matrix element $M_{ij}$ (see wikipedia). The differentiation then proceeds exactly as it would w.r.t. a vector element: \begin{align} L(M,m) &= \|M\|_F^2 +m^T (Mz-g) = \sum_{ij} (M_{ij}^2 + m_i M_{ij} z_j) -m^Tg \\ \implies \partial_{M_{ij}}L &= 2M_{ij} + m_i z_j \,. \end{align}

Finally, collect the derivatives w.r.t. the individual matrix elements to get the derivative w.r.t. the full matrix: \begin{align} \partial_M L &= \begin{bmatrix} \partial_{M_{11}} L & \dots & \partial_{M_{1n}} L \\ \vdots & \ddots & \vdots \\ \partial_{M_{n1}} L & \dots & \partial_{M_{nn}} L \end{bmatrix} \\ &= \begin{bmatrix} 2M_{11} + m_1 z_1 & \dots & 2M_{1n} + m_1 z_n \\ \vdots & \ddots & \vdots \\ 2M_{n1} + m_n z_1 & \dots & 2M_{nn} + m_n z_n \end{bmatrix} \\ &= 2M + m z^T \,. \end{align} The last step uses that the outer product $m z^T$ is the $n \times n$ matrix with entries $(m z^T)_{ij} = m_i z_j$, whereas $m^T z$ would be a scalar. This agrees with the master solution.
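The element-wise formula $\partial_{M_{ij}} L = 2M_{ij} + m_i z_j$ can be sanity-checked numerically. The following is a minimal sketch in plain Python that compares it against central finite differences; the function names, the random test data, the size `n = 3`, and the step `eps` are all assumptions made for illustration.

```python
import random

def L(M, m, z, g):
    """L(M, m) = ||M||_F^2 + m^T (M z - g), with z and g held fixed."""
    n = len(m)
    frob = sum(M[i][j] ** 2 for i in range(n) for j in range(n))
    Mz = [sum(M[i][j] * z[j] for j in range(n)) for i in range(n)]
    return frob + sum(m[i] * (Mz[i] - g[i]) for i in range(n))

def grad_analytic(M, m, z):
    """Entry (i, j) is 2*M_ij + m_i*z_j, i.e. the matrix 2M + m z^T."""
    n = len(m)
    return [[2.0 * M[i][j] + m[i] * z[j] for j in range(n)] for i in range(n)]

def grad_numeric(M, m, z, g, eps=1e-5):
    """Central finite differences, perturbing one entry of M at a time."""
    n = len(m)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            old = M[i][j]
            M[i][j] = old + eps
            up = L(M, m, z, g)
            M[i][j] = old - eps
            down = L(M, m, z, g)
            M[i][j] = old  # restore the entry before moving on
            G[i][j] = (up - down) / (2.0 * eps)
    return G

# Hypothetical test data: any fixed z, g and any point (M, m) would do.
random.seed(0)
n = 3
M = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
m = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
g = [random.gauss(0, 1) for _ in range(n)]

A = grad_analytic(M, m, z)
N = grad_numeric(M, m, z, g)
max_err = max(abs(A[i][j] - N[i][j]) for i in range(n) for j in range(n))
print("largest entry-wise difference:", max_err)
```

Since $L$ is quadratic in $M$, the central difference has no truncation error here, so the printed difference should be at the level of floating-point round-off.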

As a side note: see the different layouts for derivatives w.r.t. matrices.

$\endgroup$
