矩阵求导

uf(x)=f(x)u\nabla_\mathbf{u}f(\mathbf x)=\nabla f(\mathbf x)\cdot \mathbf u

雅各比矩阵和雅各比行列式(Jacobian matrix)

雅各比矩阵是所有向量值函数的一阶导数。 When the matrix is a square matrix, both the matrix and its determinant are referred to as the Jacobian in literature.

假设:f:RnRmf:R^n\to R^m是以n维向量为自变量,以m维向量为函数值函数,f的雅各比矩阵J是一个m×nm\times n的矩阵。定义为:(每个梯度向量转置的合并)

或者分量形式:
Jij=fixjJ_{ij}=\frac{\partial f_i}{\partial x_j}

This matrix, whose entries are functions of x, is also denoted by DfDf, JfJ_f, and (f1,,fm)(x1,,xn)\frac{\partial(f_1,\ldots,f_m)}{\partial(x_1,\ldots,x_n)}

那么,If p is a point in ℝn and f is differentiable at p, then its derivative is given by Jf(p)J_f(p). In this case, the linear map described by Jf(p)J_f(p) is the best linear approximation of f near the point p, in the sense that
f(x)=f(p)+Jf(p)(xp)+o(xp)f(x)=f(p)+J_f(p)(x-p)+o(||x-p||)

矩阵求导记号:

yx=[yx1yx2yxn].\frac{\partial y}{\partial \mathbf{x}} = \left[ \frac{\partial y}{\partial x_1} \frac{\partial y}{\partial x_2} \cdots \frac{\partial y}{\partial x_n} \right].

yx=[y1xy2xymx].\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x}\\ \frac{\partial y_2}{\partial x}\\ \vdots\\ \frac{\partial y_m}{\partial x} \end{bmatrix}.

yx=[y1x1y1x2y1xny2x1y2x2y2xnymx1ymx2ymxn].\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n}\\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}.

yX=[yx11yx21yxp1yx12yx22yxp2yx1qyx2qyxpq].\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}}\\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}.

The following definitions are only provided in numerator-layout notation:(分子展开记法,上述都沿用这个记法)

Yx=[y11xy12xy1nxy21xy22xy2nxym1xym2xymnx].\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x}\\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}.

dX=[dx11dx12dx1ndx21dx22dx2ndxm1dxm2dxmn].d\mathbf{X} = \begin{bmatrix} dx_{11} & dx_{12} & \cdots & dx_{1n}\\ dx_{21} & dx_{22} & \cdots & dx_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ dx_{m1} & dx_{m2} & \cdots & dx_{mn} \end{bmatrix}.

矩阵对标量

Y=(yij(t))p×qY=(y_{ij}(t))_{p\times q}
Yt=(yij(t)t)\frac{\partial Y}{\partial t}=(\frac{\partial y_{ij}(t)}{\partial t})

X,Y是关于t的函数。
性质:

标量对矩阵

(矩阵的标量函数对矩阵的导数)
y=f(X)y=f(X)p×qp\times q阶矩阵X的标量函数,y对X的导数定义为:
yX=(yxij)\frac{\partial y}{\partial X}=(\frac{\partial y}{x_{ij}})
称为y的梯度矩阵。

性质:

标量对向量(最常见情形)

若x,a为列向量,u,v是x的向量值函数,以下表示对x求导,其中内侧的'表示转置,外侧的表示求导。

矩阵的迹

参考资料

  1. Matrix calculus,wikipedia
  2. 张伟平 矩阵代数
  3. Jacobian matirx and determinant